Virtual hash table

Imagine a hash that can grow to too much data to be kept in ram.
Should I rely on virtual memory?
I have ideas how to make a mutable hash-table that is stored in a file and allows
the garbage collector to remove values from ram when no longer accessible,
for normally not all values are used at the same time.
The values will be restricted to serializable ones, of course.
But: may be such a tool already exists?
If not, would one want a file stored hash-table?

As an example: I want to simulate a CPU with a large memory.
I can divide the memory in pages, store non empty pages in the file-hash and keep a few pages only in ram.

Stepping outside my area of expertise, this seems like the kind of thing databases do to avoid keeping all the data in memory (btrees on disk, or something?).

1 Like

Yeah, I'd use sqlite or another database if the data set is that big. (I've been thinking about writing a module that uses the dict generic interface with a sqlite backend for a persistent key-value store, actually)

1 Like

Thanks to shawnw and benknobble.

I had a look to sqlite in db, but it seems rather complicated for a simple user as me.

As an example: I want to simulate a CPU with a large memory.
I can divide the memory in pages, store non empty pages in the file-hash and keep a few pages only in ram.

If the strategy is to keep the pages in the database, then I think
the complexity is comparable to have the swapped out pages
in files. Performance wise I am unsure what's the best solution.

Suggestion: Often I'll start with the "dumbest" implementation that could possibly work. Put it behind an API so that someday/maybe it's easier to change the approach.

Here maybe there could be a store.rkt file that provides put-page and get-page functions. Users of those functions don't know/care how they work exactly.

Here the simplest thing might be, use the filesystem directly. Consider the filesystem is a hash table from filename to data. So try using that: put-data creates/updates/writes a page file under some dir. get-data reads the page file. The filenames are (say) page numbers.

That's it. Not clever.

Reading/writing files might turn out to be fast enough. Plus your OS will give you some level of caching for file reads.

So you could do that initially, then turn back to the "more interesting" parts of the CPU simulator. Someday, if the performance is too slow, then you could look at things like sqlite (and you'd probably feel more motivated to do so, then). OTOH maybe the performance is OK, and this is work you never need to do.

I don't know if this is helpful advice but I wanted to offer the suggestion.

2 Likes

To greghendershott

Thanks, I have seen (many years ago) a system using a directory as a hash to store data as separate files within that directory. It’s an option.

Thanks again, Jos

I suppose this got off-track, but I'd intended this to be less "use a DB" and more "see if the algorithms and implementations in DBs are suitable for your use-case."

Something like this:
Have to test more.

#lang racket/base

#|____________________________________________________________________________________________________

`VHT’ is short for ‘virtual hash table’.
A VHT is like a mutable hash but each value is stored in a file.
The files are located together in a directory.
For keys symbols are used. They must form legal filenames when transformed to strings.
".vhf" is added as file-extension.
Values must be serializable.
A no longer accessible value is garbage collectable, but remains in the VHT.
______________________________________________________________________________________________________

procedure : (VHT-open ‹path› [#:exists ‹exists›]) --> VHT?
‹path› : path-string?
‹exists› : (or/c 'error 'replace 'update) = 'error

‹exists = 'error
Creates a new directory for the VHT to be returned.
An exception is raised if the directory already exists.

‹exists› = 'replace
If directory ‹path› already exists, it is deleted, its content included.
A new directory is made for the VHT.

‹exists› = 'update
Directory ‹path› must already exists and is accepted as VHT directory.
______________________________________________________________________________________________________

procedure : (VHT? ‹obj›) --> boolean?
‹obj› : any/c

Returns #t if ‹obj› is a VHT, else returns #f.
______________________________________________________________________________________________________

procedure : (VHT-path ‹vht›) --> (or/c symbol? #f)
‹vht› : VHT?

Returns the full path of the directory in which the VHT is stored.
Returns #f if the VHT has been deleted.
______________________________________________________________________________________________________

procedure : (VHT-set! ‹vht› ‹key› ‹datum›) --> void?
‹vht› : VHT?
‹key› : symbol? (acceptable as filename).
‹datum› : serializable?

Stores the ‹datum›. If the ‹key› is already present in the ‹vhf›,
the old datum is replaced by the new ‹datum›.
______________________________________________________________________________________________________

procedure : (VHT-remove! ‹vht› ‹key›) --> void?
‹vht› : VHT?
‹key› : symbol? (acceptable as filename)

Removes the datum associated with ‹key› from the ‹vht›.
If the ‹key› has no associated value, nothing happens.
______________________________________________________________________________________________________

procedure : (VHT-ref ‹vht› ‹key›) --> any/c
‹vht› : VHT?
‹key› : symbol?

Retrieves a datum that was entered in the ‹vhf› with the ‹key›.
Raises an exception if the ‹key› cannot be found.
______________________________________________________________________________________________________

procedure : (VHT-delete ‹vht›) --> void
‹vht› : (or/c path-string? VHT?)

Deletes the directory of ‹vht›. If ‹vht› is a VHT, (VHT-path ‹vht›) is set to #f.
______________________________________________________________________________________________________

syntax : (with-VHT ‹arg› ‹path› [#:exists ‹exists›] ‹expr› ...)
‹path› : path-string?
‹exists› : (or/c 'error 'replace 'update) = 'error

Same as (let ((‹arg› (VHT-open ‹path› #:exists ‹exists›))) ‹expr› ...).
______________________________________________________________________________________________________

pocedure : (call-with-VHT ‹path› [#:exists ‹exists›] ‹proc›) --> any
‹path› : path-string?
‹exists› : (or/c 'error 'replace 'update) = 'error
‹proc› : (-> VHT? any)

Same as (‹proc› (VHT-open ‹path› #:exists ‹exists›)).
____________________________________________________________________________________________________|#

(require
  (only-in racket/serialize serialize deserialize serializable?)
  (only-in racket delete-directory/files)
  (for-syntax racket/base))

(struct VHT ((path #:mutable)))

(define (VHT-open path #:exists (exists 'error))
  (unless (path-string? path) (raise-argument-error 'VHT-open "path-string?" path))
  (define cpath (path->complete-path path))
  (define dir-exists (directory-exists? cpath))
  (case exists
    ((error)
     (when dir-exists (error 'VHT-open "new VHT requested, but it already exists: ~s" cpath))
     (make-directory cpath)
     (VHT cpath))
    ((replace)
     (delete-directory/files cpath #:must-exist? #f)
     (make-directory cpath)
     (VHT cpath))
    ((update)
     (unless dir-exists (error 'VHT-open "existing VHT requested, but not found: ~s" cpath))
     (VHT cpath))
    (else (raise-argument-error 'VHT-open "(or/c 'error 'replace 'update)" exists))))

(define (VHT-delete vht/path)
  (cond
    ((VHT? vht/path)
     (define path (VHT-path vht/path))
     (when path
       (delete-directory/files path #:must-exist? #f)
       (set-VHT-path! vht/path #f)))
    (else
      (unless (path-string? vht/path) (raise-argument-error 'VHT-delete "path-string?" vht/path))
      (delete-directory/files (path->complete-path vht/path) #:must-exist? #f))))

(define (VHT-set! vht key datum)
  (unless (VHT? vht) (raise-argument-error 'VHT-set! "VHT?" vht))
  (unless (serializable? datum) (raise-argument-error 'VHT-set! "serializable?" datum))
  (unless (symbol? key) (raise-argument-error 'VHT-set! "symbol?" key))
  (define str (symbol->string key))
  (define path (path-replace-extension (build-path (VHT-path vht) (symbol->string key)) ".vht"))
  (unless (path-string? path) (error 'VHT-put! "unacceptable key: ~s" key))
  (define (fail exn) (error "cannot deserialize: ~s~n~a" datum (exn-message exn)))
  (define serialized-datum (with-handlers ((exn:fail? fail)) (serialize datum)))
  (with-output-to-file path #:exists 'replace (λ () (write (serialize datum)))))

(define (VHT-ref vht key)
  (unless (VHT? vht) (raise-argument-error 'VHT-ref! "VHT?" vht))
  (unless (symbol? key) (raise-argument-error 'VHT-ref "symbol?" key))
  (define filename (symbol->string key))
  (unless (path-string? filename) (error 'VHT-ref "unacceptable key: ~s" key))
  (define path (path-replace-extension (build-path (VHT-path vht) filename) ".vht"))
  (unless (file-exists? path) (error 'VHT-ref "unknown key: ~s" key))
  (with-input-from-file path (λ () (deserialize (read)))))

(define (VHT-remove vht key)
  (unless (VHT? vht) (raise-argument-error 'VHT-ref! "VHT?" vht))
  (unless (symbol? key) (raise-argument-error 'VHT-ref "symbol?" key))
  (define filename (symbol->string key))
  (unless (path-string? filename) (error 'VHT-ref "unacceptable key: ~s" key))
  (define path (path-replace-extension (build-path (VHT-path vht) filename) ".vht"))
  (when (file-exists? path) (delete-file path)))

(define-syntax (with-VHT stx)
  (syntax-case stx ()
    ((_ vht path #:exists exists expr ...)
     #'(let ((vht (VHT-open path #:exists exists))) expr ...))
    ((_ vht path expr ...)
     #'(let ((vht (VHT-open path))) expr ...))))

(define (call-with-VHT path #:exists (exists 'error) proc)
  (proc (VHT-open path #:exists exists)))
;_____________________________________________________________________________________________________
; Examples

(define my-vht (VHT-open "vht" #:exists 'error))
(VHT-set! my-vht 'aap 'aap)
(VHT-set! my-vht 'noot 'noot)
(VHT-ref my-vht 'aap)
(VHT-ref my-vht 'noot)

(with-VHT my-vht "vht" #:exists 'replace
  (VHT-set! my-vht 'aap 'aap)
  (VHT-set! my-vht 'noot 'noot)
  (writeln (VHT-ref my-vht 'aap))
  (writeln (VHT-ref my-vht 'noot)))

(with-VHT my-vht "vht2" #:exists 'replace
  (VHT-set! my-vht 'aap 'aap)
  (VHT-set! my-vht 'noot 'noot)
  (writeln (VHT-ref my-vht 'aap))
  (writeln (VHT-ref my-vht 'noot))
  (VHT-remove my-vht 'aap)
  (with-handlers ((exn:fail? (λ (exn) (displayln (exn-message exn))))) (VHT-ref my-vht 'aap)))

(call-with-VHT "vht3" #:exists 'replace
  (λ (vht)
    (VHT-set! vht 'aap 'aap)
    (VHT-set! vht 'noot 'noot)
    (writeln (VHT-ref vht 'aap))
    (VHT-ref vht 'noot)))
                       
(VHT-delete my-vht)
;(VHT-delete "vht2")
(VHT-delete "vht3")
(VHT-delete "vht3")

(Hi Jos, it’s so good to see emails from you 30+ years later …)

Please find below a rewrite using contracts and in-lined purpose statements (the style I had in “How to Design Racket Programs” .. never finished).

— Using contracts disentangles the checking code from the work-performing code.
— Using contracts and combining them with your purpose statements creates as much checkable and well-documented interface as possible.
— I assume you know all this so I just did the work for you.

— I do wonder whether an AI could generate standard API documentation from such a (provide ..) form.

— Going through the exercise exposed a couple of small mistakes in your (very nice!!!) prose-form specifications.
— I also left two notes behind. One is about something I didn’t fix. Another one is a design concern (and I am beginning to think your choice is the right one).

Thanks for sharing — Matthias

p.s. I urge you to turn the examples into actual unit tests. (I ran the examples via raco test VMT.rkt ; rm -rf vht/ vht2/ but one should be able to do this from inside Emacs or DrRacket or VSCode or whatever.)

#lang racket/base

;; `VHT’ is short for ‘virtual hash table’.
;; A VHT is like a mutable hash but each value is stored in a file.
;; The files are located together in a directory.
;; For keys symbols are used. They must form legal filenames when transformed to strings.
;; ".vhf" is added as file-extension.
;; Values must be serializable.
;; A no longer accessible value is garbage collectable, but remains in the VHT.

(require (only-in racket/contract contract-out ->i -> any/c or/c))
(require (only-in racket/format ~a))

(provide

 VHT?

 ;; __________________________________________________________________________________________________
 ;; 
 ;; procedure : (VHT-path ‹vht›) --> (or/c symbol? #f)
 ;; ‹vht› : VHT?
 ;; 
 ;; Returns the full path of the directory in which the VHT is stored.
 ;; Returns #f if the VHT has been deleted.

 ;; MF Note 1: the result contract is plain wrong unless you `set!` the default predicate
 ;; MF Mote 2: making the path field mutable feels wrong as a design here
 ;;            but you're not exporting set-VHT-path! so it's at least safe. 

 VHT-path
 

 (contract-out
  (VHT-open
   
   ;; ‹exists = 'error
   ;; Creates a new directory for the VHT to be returned.
   ;; 
   ;; ‹exists› = 'replace
   ;; If directory ‹path› already exists, it is deleted, its content included.
   ;; A new directory is made for the VHT.
   ;; 
   ;; ‹exists› = 'update
   ;; Directory ‹path› is accepted as VHT directory.
   
   (->i ([path path-string?]) (#:exists [exists (or/c 'error 'replace 'update)])
        #:pre/name (path exists) "new VHT requested, but it already exists"
        (when (equal? exists 'error) (not (directory-exists? (path->complete-path path))))
        #:pre/name (path exists) "existing VHT requested, but not found: ~s"
        (when (equal? exists 'update) (not (directory-exists? (path->complete-path path))))
        (r VHT?)))

  [VHT-delete

   ;; Deletes the directory of ‹vht›. If ‹vht› is a VHT, (VHT-path ‹vht›) is set to #f.
   
   (-> (or/c VHT? path-string?) void?)]

  [VHT-set!
   
   ;; Stores the ‹datum›. If the ‹key› is already present in the ‹vhf›,
   ;; the old datum is replaced by the new ‹datum›.
   
   (->i ([vht VHT?] [key symbol?] [datum serializable?])
        #:pre/name (vht key) "unacceptable key"
        (path-string? (path-replace-extension (build-path (VHT-path vht) (~a key)) ".vht"))
        (r void?))]

  [VHT-ref
   
   ;; Retrieves a datum that was entered in the ‹vhf› with the ‹key›.
   
   (->i ([vht VHT?] [key symbol?])
        #:pre/name (key) "unacceptable key"
        (path-string? (symbol->string key))
        (r any/c))]

  [VHT-remove

   ;; Removes the datum associated with ‹key› from the ‹vht›.
   ;; If the ‹key› has no associated value, nothing happens.
   
   (->i ([vht VHT?] [key symbol?])
        #:pre/name (key) "unacceptable key"
        (path-string? (symbol->string key))
        (r any/c))])

 ;; __________________________________________________________________________________________________
 ;; 
 ;; syntax : (with-VHT ‹arg› ‹path› [#:exists ‹exists›] ‹expr› ...)
 ;; ‹path› : path-string?
 ;; ‹exists› : (or/c 'error 'replace 'update) = 'error
 ;; 
 ;; Same as (let ((‹arg› (VHT-open ‹path› #:exists ‹exists›))) ‹expr› ...).

 with-VHT

 (contract-out
  [call-with-VHT
   
   ;; Same as (f (VHT-open ‹path› #:exists ‹exists›))`. 

   (->i ([path path-string?] [f (-> VHT? any/c)]) (#:exists (exists (or/c 'error 'replace 'update)))
        (r any/c))])

 )
 
;; ---------------------------------------------------------------------------------------------------
(require
  (only-in racket/serialize serialize deserialize serializable?)
  (only-in racket delete-directory/files)
  (for-syntax racket/base))

(struct VHT ((path #:mutable)))

(define (VHT-open path #:exists (exists 'error))
  (define cpath (path->complete-path path))
  (case exists
    ((error)
     (make-directory cpath)
     (VHT cpath))
    ((replace)
     (delete-directory/files cpath #:must-exist? #f)
     (make-directory cpath)
     (VHT cpath))
    ((update)
     (VHT cpath))))

(define (VHT-delete vht/path)
  (cond
    ((VHT? vht/path)
     (define path (VHT-path vht/path))
     (when path
       (delete-directory/files path #:must-exist? #f)
       (set-VHT-path! vht/path #f)))
    (else
     (delete-directory/files (path->complete-path vht/path) #:must-exist? #f))))

(define (VHT-set! vht key datum)
  (define path (path-replace-extension (build-path (VHT-path vht) (symbol->string key)) ".vht"))
  (define (fail exn) (error "cannot deserialize: ~s~n~a" datum (exn-message exn)))
  (define serialized-datum (with-handlers ((exn:fail? fail)) (serialize datum)))
  (with-output-to-file path #:exists 'replace (λ () (write (serialize datum)))))

(define (VHT-ref vht key)
  (define filename (symbol->string key))
  (define path (path-replace-extension (build-path (VHT-path vht) filename) ".vht"))
  (unless (file-exists? path) (error 'VHT-ref "unknown key: ~s" key))
  (with-input-from-file path (λ () (deserialize (read)))))

(define (VHT-remove vht key)
  (define filename (symbol->string key))
  (define path (path-replace-extension (build-path (VHT-path vht) filename) ".vht"))
  (when (file-exists? path) (delete-file path)))

;; ---------------------------------------------------------------------------------------------------
;; syntax for examples 

(define-syntax (with-VHT stx)
  (syntax-case stx ()
    ((_ vht path #:exists exists expr ...)
     #'(let ((vht (VHT-open path #:exists exists))) expr ...))
    ((_ vht path expr ...)
     #'(let ((vht (VHT-open path))) expr ...))))

(define (call-with-VHT path #:exists (exists 'error) proc)
  (proc (VHT-open path #:exists exists)))
;_____________________________________________________________________________________________________
; Examples

(define my-vht (VHT-open "vht" #:exists 'error))
(VHT-set! my-vht 'aap 'aap)
(VHT-set! my-vht 'noot 'noot)
(VHT-ref my-vht 'aap)
(VHT-ref my-vht 'noot)

(with-VHT my-vht "vht" #:exists 'replace
  (VHT-set! my-vht 'aap 'aap)
  (VHT-set! my-vht 'noot 'noot)
  (writeln (VHT-ref my-vht 'aap))
  (writeln (VHT-ref my-vht 'noot)))

(with-VHT my-vht "vht2" #:exists 'replace
  (VHT-set! my-vht 'aap 'aap)
  (VHT-set! my-vht 'noot 'noot)
  (writeln (VHT-ref my-vht 'aap))
  (writeln (VHT-ref my-vht 'noot))
  (VHT-remove my-vht 'aap)
  (with-handlers ((exn:fail? (λ (exn) (displayln (exn-message exn))))) (VHT-ref my-vht 'aap)))

(call-with-VHT "vht3" #:exists 'replace
               (λ (vht)
                 (VHT-set! vht 'aap 'aap)
                 (VHT-set! vht 'noot 'noot)
                 (writeln (VHT-ref vht 'aap))
                 (VHT-ref vht 'noot)))
                       
(VHT-delete my-vht)
;(VHT-delete "vht2")
(VHT-delete "vht3")
(VHT-delete "vht3")

Hi Matthias

Thanks very much!!!

Indeed, the contract on VHT-path is wrong, should be - path? -.
About the mutable field: Its mutability is not required if I restrict the argument of
VHT-delete to paths (not VHTs) or remove this procedure entirely. I am not yet sure how to decide.

//Using contracts disentangles the checking code from the work-performing code//
Sure, but I think the performance is low because of the overhead of opening and closing a file each time VHT-ref or VHT-set! is called. I think the checks take a very small amount of overall time. Nevertheless, I take your advice seriously. I am working on a VHT that uses file positioning in one single random accessible file, but this is a much more complicated job (requires an index to be added to the file
when closing it (possibly a separate file), requires administration of unoccupied space after removing data, and so on) I guess file positioning is faster than opening and closing files. In the old days CDC's FORTRAN had nice tools for such files.

The documentation should be put in a scribble file and should include examples, I think. The documentation is with the code because that is the way I work: First: find out what you want, second: design and feasibility tests, third: document, fourth: possibly return to the first step, last: code. However, examples and documenting with scribble goes not well without the code. That's why I start with prose preceding the code, just to awaken my mind and to have ALL documentation and design available while coding.

Thanks again, Jos

The contracts are merely reformulations of the conditions that were in your original code.

What do you mean by ‘performance is low’? Are you saying moving the conditions into contracts caused a slow-down?

— Matthias

On the contrary, moving conditions into contracts speed up, I think.

What I wanted to say is that such speed up may be very small compared to the overall time due to closing and opening files all the time.

Jos

Yes, the performance of your implementation is definitely determined by the file ops.
That’s why you hide behind a contracted interface and when the performance
bottleneck kicks in, replace the implementation but not the interface.

FWIW, I even moved your error messages into the contract system so that would
mostly stay the same.

You may also consider imposing a trace contract on the interface of the following shape:

VHT-open . (VHT-set! | VHT-ref | VHT-remove)* VHT-delete

You could even specialize the repetition to the keys that VHT-set! introduces.

[[ https://github.com/camoy/trace-contract ]]

The default-stuffer from the Racket web server uses this approach to store serialized continuations that are too large to put in URLs directly: it saves them in files under ~/.urls/, where the name of the file is the MD5 hash of the content.

The underlying functions are provided for use in non-default "stuffers". The web-server/stuffers/store module (source) provides an interface for key–value stores and an implementation for directory-backed storage. The hash-stuffer function (source) then implements hash-addressed storage on top of it.

1 Like

The idea of a trace contract is interesting. I’ll look into that.

Separating the interface from the implementation is interesting too.

324ADA0EF8D34B2E9DDEC96E1494687E.png

Jos, in case you want to give that SQLite option a try, and you're willing to use 'dict-*' methods instead of specifically named ones, here you go:

#lang racket

(require db)
(require struct-plus-plus)
(require racket/runtime-path)

; drp allows us to access the backing store by reference to this file
; instead of to the working directory
(define-runtime-path here ".")
(define backing-store (build-path here "vht.db"))

; This will create the database if it's not there already
(define dbh (sqlite3-connect #:database backing-store #:mode 'create))

; Not actually necessary to use struct++ here but it's my module and I like it
(struct++ virtual-hash-table ([dbh connection?])
          #:methods gen:dict
          [(define (dict-ref dict key  [default (thunk (error "key not found" key))])
             ; The explicit exclusion of exn:break isn't necessary
             ; since I'm only trapping things that are both exn:fail
             ; and have a particular message, but I find it a good
             ; reminder to myself
             (with-handlers ([exn:break? raise] ; never trap user breaks (e.g. ^C)
                             [(λ (e)
                                (and (exn:fail? e)
                                     (regexp-match #"query-value: query returned wrong number of rows"
                                                   (exn-message e))))
                              (λ (e)
                                (if (procedure? default)
                                    (default)
                                    default))
                              ]
                             [any/c (λ (e) (pretty-print e))]
                             )
               (query-value (virtual-hash-table.dbh dict) "select value from data where key = ?" key)))
           (define (dict-set! dict key val)
             (query-exec (virtual-hash-table.dbh dict)
                         "insert or replace into data (key, value) values (?, ?)"
                         key val))
           (define (dict-remove! dict key)
             (query-exec (virtual-hash-table.dbh dict) "delete from data where key = ?" key))
           (define (dict-count dict)
             (query-value (virtual-hash-table.dbh dict) "select count(1) from data"))])

(define vht (virtual-hash-table++ #:dbh dbh)) ; keyword constructor from struct++
(define (defatalized-dict-ref dict key)
  (with-handlers ([any/c identity]) ; no default throws exn, defatalize it and return the exn
    (dict-ref dict key)))

; Make sure to start from a clean slate if you run this multiple times
(query-exec (virtual-hash-table.dbh vht) "delete from data")


(dict-ref vht "first_name" "no such key 'first_name' so here is a string default value")
(dict-ref vht "first_name" (thunk "no such key 'first_name', have a procedurally-generated default value"))
(displayln (~a "if no default value provided, dict-ref for a non-existent key gives: "
               (defatalized-dict-ref vht "first_name")))

; Insert the key, then demonstrate that it was inserted
(dict-set! vht "first_name" "David")
(displayln (~a "value of 'first_name' key: " (dict-ref vht "first_name")))

; Remove the key, demonstrate it was removed
(dict-remove! vht "first_name")
(displayln (~a "trying to access first_name key which was just removed, the result should be struct:exn:fail. Result:  "
               (defatalized-dict-ref vht "first_name")))

; Insert a couple of things, then count how many items in the table.
(dict-set! vht "first_name" "David")
(dict-set! vht "last_name" "Storrs")
(displayln (~a "the hash has " (dict-count vht) " items"))

On my machine this prints:

"no such key 'first_name' so here is a string default value"
"no such key 'first_name', have a procedurally-generated default value"
if no default value provided, dict-ref for a non-existent key gives: #(struct:exn:fail key not found "first_name" #<continuation-mark-set>)
value of 'first_name' key: David
trying to access first_name key which was just removed, the result should be struct:exn:fail. Result:  #(struct:exn:fail key not found "first_name" #<continuation-mark-set>)
the hash has 2 items
test.rkt> 
"no such key 'first_name' so here is a string default value"
"no such key 'first_name', have a procedurally-generated default value"
if no default value provided, dict-ref for a non-existent key gives: #(struct:exn:fail key not found "first_name" #<continuation-mark-set>)
value of 'first_name' key: David
trying to access first_name key which was just removed, the result should be struct:exn:fail. Result:  #(struct:exn:fail key not found "first_name" #<continuation-mark-set>)
the hash has 2 items

There's a limitation with my SQLite-backed implementation from before: it is limited to keys and values of types that SQLite can handle. Here's a version that allows you to use any value that can be (de)serialized into something that SQLite can understand, provided that you specify the appropriate (de)serializer functions. I've included to/from JSON versions as examples. I've also expanded the list of operations the 'dict' can perform.

This, of course, creates a separate concern: if you use (e.g.) a list as a key in your VHT then when you round-trip it through the database what you get back will not be eq? to what you put in. This will be an issue with pretty much any backing store you use that isn't RAM.

This time around, struct++ is very much required.

#lang racket

(require db)
(require struct-plus-plus)
(require racket/runtime-path)
(require json)

(define (serialize/json val)
  (with-output-to-string (thunk (write-json val))))

(define (deserialize/json val)
  (with-input-from-string val (thunk (read-json))))

(struct++ virtual-hash-table ([(filepath (make-temporary-file (~a "racket-virtual-hash-table-~a-backing-store-sqlite3.db")))
                               path-string?]
                              [(serializer   identity) (procedure-arity-includes/c 1)]
                              [(deserializer identity) (procedure-arity-includes/c 1)]
                              ; Don't set this, it's autogenerated
                              [(dbh #f)  (or/c #f connection?)])
          (#:rule ("create the DB connection"
                   #:transform dbh (filepath)
                   [; don't trust the user, make sure the connection goes to the right file
                    (when dbh (disconnect dbh))
                    (define connect (sqlite3-connect #:database filepath #:mode 'create))
                    (query-exec connect "create table if not exists data (key unique, value)")
                    connect]))


          #:methods gen:dict
          [(define (dict-has-key? dict key)
             (= 1 (query-value (virtual-hash-table.dbh dict)
                               "select count(1) from data where key = ?"
                               ((virtual-hash-table.serializer dict) key))))

           (define (dict-empty? dict)
             (zero? (dict-count dict)))

           (define (dict-keys dict)
             (define s (virtual-hash-table.serializer dict))
             (map s (query-rows (virtual-hash-table.dbh dict) "select key from data")))

           (define (dict-values dict)
             (define d (virtual-hash-table.deserializer dict))
             (map d (query-rows (virtual-hash-table.dbh dict) "select value from data")))

           (define (dict->list dict)
             (define deserialize (virtual-hash-table.deserializer dict))
             (for/list ([row (query-rows (virtual-hash-table.dbh dict)
                                         "select key, value from data")])
               (match row
                 [(vector key val) (list (deserialize key) (deserialize val))]))
             )

           (define (dict-ref dict key  [default (thunk (error "key not found" key))])
             (define serialize   (virtual-hash-table.serializer dict))
             (define deserialize (virtual-hash-table.deserializer dict))
             ; The explicit exclusion of exn:break isn't necessary
             ; since I'm only trapping things that are both exn:fail
             ; and have a particular message, but I find it a good
             ; reminder to myself
             (with-handlers ([exn:break? raise] ; never trap user breaks (e.g. ^C)
                             [(λ (e)
                                (and (exn:fail? e)
                                     (regexp-match #"query-value: query returned wrong number of rows"
                                                   (exn-message e))))
                              (λ (e)
                                (if (procedure? default)
                                    (default)
                                    default))
                              ]
                             [any/c (λ (e) (pretty-print e))])
               (deserialize
                (query-value (virtual-hash-table.dbh dict)
                             "select value from data where key = ?"
                             (serialize key)))))

           (define (dict-set! dict key raw-val)
             (define serialize (virtual-hash-table.serializer dict))
             (query-exec (virtual-hash-table.dbh dict)
                         "insert or replace into data (key, value) values (?, ?)"
                         (serialize key) (serialize raw-val)))

           (define (dict-remove! dict key)
             (define serialize (virtual-hash-table.serializer dict))
             (query-exec (virtual-hash-table.dbh dict) "delete from data where key = ?"
                         (serialize key)))

           (define (dict-count dict)
             (query-value (virtual-hash-table.dbh dict) "select count(1) from data"))

           (define (dict-clear! dict)
             (query-exec (virtual-hash-table.dbh dict) "delete from data"))

           ])

; Convenience method, only used for the demo
(define (defatalized-dict-ref dict key)
  (with-handlers ([any/c identity]) ; no default throws exn, defatalize it and return the exn
    (dict-ref dict key)))

(module+ main
  ; drp allows us to access the backing store by reference to this file
  ; instead of to the working directory
  (define-runtime-path here ".")
  (define backing-store (build-path here "vht.db"))

  {define (run-demo vht)
    (displayln (~a "\nThe backing store for this vht is: " (virtual-hash-table.filepath vht)))

    ; Make sure to start from a clean slate if you run this multiple times.
    ;
    ; NB: Were I writing this as a module, I would not export or document
    ; the 'dbh' field; it is private.  Clearing the table can better be
    ; accomplished using dict-clear (see below) but I wanted to
    ; demonstrate the low-level version first.
    (query-exec (virtual-hash-table.dbh vht) "delete from data")

    (displayln (~a "\nThe vht is empty?: " (dict-empty? vht)))

    (displayln (~a "dict has key 'first_name'?: " (dict-has-key? vht "first_name")))
    (displayln  (dict-ref vht "first_name" "No such key 'first_name' so here is a string default value"))
    (displayln (dict-ref vht "first_name" (thunk "No such key 'first_name', have a procedurally-generated default value")))
    (displayln (~a "If no default value provided, dict-ref for a non-existent key gives: "
                   (defatalized-dict-ref vht "first_name")))

    ; Insert the key, then demonstrate that it was inserted
    (dict-set! vht "first_name" "David")
    (display "Value of 'first_name' key after insert: ") (println (dict-ref vht "first_name"))

    ; Remove the key, demonstrate it was removed
    (displayln "Removing 'first_name' key...")
    (dict-remove! vht "first_name")
    (display "Trying to access first_name key which was just removed, the result should be struct:exn:fail. Result:  ") (println (defatalized-dict-ref vht "first_name"))

    (displayln "Adding a bunch of items including some non-primitive types that SQLite cannot handle by default...")
    (define alphabet-hash (hash 'a "apple" 'b "ball")) ; this is a hash, not a hasheq
    (dict-set! vht "first_name" "David")
    (dict-set! vht "last_name" "Storrs")
    (dict-set! vht 17 "seventeen (has a non-string key)")
    (dict-set! vht (hash 'a "apple" 'b "ball") "used alphabet hash as key")
    (dict-set! vht "using alphabet hash as value" alphabet-hash)
    (displayln (~a "The dict has " (dict-count vht) " items"))
    (display "dict->list: ")  (pretty-print (dict->list vht))
    (display (~a "(dict-ref vht alphabet-hash #f) gives: ")) (println  (dict-ref vht alphabet-hash #f))
    (displayln "Clearing the dict...")
    (dict-clear! vht)
    (displayln (~a "The dict has " (dict-count vht) " items"))

    }

  ;; (displayln "Running the demo using a vht with a specified backing store: ")
  ;; (run-demo (virtual-hash-table++ #:filepath backing-store))
  ;; (displayln "")
  ;; (displayln "Running the demo using a vht with a self-generated backing store: ")
  ;; (run-demo (virtual-hash-table++))

  (displayln "\nRunning the demo using a vht with a self-generated backing store that (de)serializes its keys and values from/to JSON so that you can store things other than primitives")
  (run-demo (virtual-hash-table++ #:serializer serialize/json #:deserializer deserialize/json))
  )

On my machine I get:

Running the demo using a vht with a self-generated backing store that 
(de)serializes its keys and values from/to JSON so that you can store things 
other than primitives

The backing store for this vht is: /var/folders/2n/05gry83x6wd2cdwtq4l73bbm0000gn/T/racket-virtual-hash-table-17157112291715711229621-backing-store-sqlite3.db

The vht is empty?: #t
dict has key 'first_name'?: #f
No such key 'first_name' so here is a string default value
No such key 'first_name', have a procedurally-generated default value
If no default value provided, dict-ref for a non-existent key gives: #(struct:exn:fail key not found "first_name" #<continuation-mark-set>)
Value of 'first_name' key after insert: "David"
Removing 'first_name' key...
Trying to access first_name key which was just removed, the result should be exn:fail. Result:  (exn:fail "key not found \"first_name\"" #<continuation-mark-set>)
Adding a bunch of items including some non-primitive types that SQLite cannot handle by default...
The dict has 5 items
dict->list: '(("first_name" "David")
              ("last_name" "Storrs")
              (17 "seventeen (has a non-string key)")
              (#hasheq((a . "apple") (b . "ball")) "used alphabet hash as key")
              ("using alphabet hash as value"
               #hasheq((a . "apple") (b . "ball"))))
(dict-ref vht alphabet-hash #f) gives: "used alphabet hash as key"
Clearing the dict...
The dict has 0 items

Thanks,

I know nothing about SQLite, but can use your post as an example when I would decide to learn more about SQLite..

Jos

If you're familiar with any relational database (e.g. PostgreSQL, MySQL, Oracle...) then you can do SQLite just fine. It has a couple of things to be aware of -- the database is in one file, which means that it's trivial to back up, but multiple writers at a time are slow because they block each other.