Would it make sense to add a function for inverting a hash table to the standard library?

If you have a hash table, there's a pretty reasonable "inverse" to that table: in the original, you have keys mapping to values. The obvious inverse would be a hash table whose keys are the values in the original table, and the values are lists of keys in the original table with that value. (Or perhaps sets, or some other suitable data structure.)

Given that there are functions in the standard library for things like intersecting and unioning hash tables, it seems like this inverse operation would belong there, too.

FWIW, I wrote this today:

(define (invert-hash-table ht)
  (define values-to-key-list (make-immutable-hash
                  (for/list ([val (set->list (apply set (hash-values ht)))])
                    (cons val empty))))
  (for/fold ([acc values-to-key-list])
            ([pair (hash->list ht)])
    (let* ([val (cdr pair)]
           [key (car pair)]
           [keylist (hash-ref acc val)])
      (hash-set acc val (cons key keylist)))))

(1) is there a more standard, easy way to do this? Am I missing some existing library function or one-liner?

(2) If not, would it make sense to add the above function (or something equivalent) to the standard library?

(3) A question/feedback: it was counterintuitive to me to discover that I needed to use car and cdr for the pairs, and that first and second did not work.

That seems weird to me. I know car and cdr (and I even know their etymology as 'contents of the address/decrement register') but I find it more readable to see first and second. And my guess was that first and second were just syntactic sugar / aliases for car and cdr. Although I do see some ambiguity for "second" for a list, even a list of two elements: is "second" the second element, or the list containing just the second element (that is, the rest of the list)...?

To me it doesn't make sense to include operations the data structure can't implement efficiently. If I have a program that need an "inverse" I just use two hash tables.

Do you have concrete examples where an inverse is needed?

  1. A question/feedback: it was counterintuitive to me to discover that I needed to use car and cdr for the pairs, and that first and second did not work.

The names are somewhat historical, but car and cdr signals that one works with "a data structure built with pairs" - whereas first and rest (and second`) signal that one works with lists (which is a subset of "a data structure built with pairs").

Before contracts the names first and rest was just a help for readers of the program.
After contracts were introduced, first and rest check whether they receive lists.

Note: second is the same as a cadr that accepts lists only.

1 Like

For cases where the same equality and hash functions can be used for both the keys and values, and values are unique, I'd just use

(define (invert-hash-table ht)
  (hash-map/copy ht (lambda (k v) (values v k))))

Edit: One that lets you specify which kind of hash table to return and stores keys as a list per distinct value:

(define (invert-hash-table ht #:kind [kind 'equal])
  (for/fold ([new-ht (case kind
                       [(equal) (hash)]
                       [(eqv) (hasheqv)]
                       [(eq) (hasheq)]
                       [(equal-always) (hashalw)])])
            ([(k v) (in-hash ht)])
    (hash-update new-ht v (curry cons k) '())))

(hash-update makes folding over things while accumulating results in a hash table so easy)

2 Likes

There's a lot of variation in how this could work, to the point where I'm not sure it merits its own function. All of these questions could have different answers:

  • Do you want a mutable or immutable hash table?
  • How do you want key comparisons to work? equal?, eqv?, eq?, equal-always?, or something else?
  • How do you want to handle duplicates? Error? Take the first or last one? Merge them into a collection?
  • And if you do want to gather all duplicates into a collection, do you want a list? A vector? A set?

For that reason, I would reach for either for/hash to solve this problem in simple cases or transducers and reducers in complex cases. Your invert-hash-table definition can be implemented with transducers like this:

(define (invert-hash-table ht)
  (transduce (in-hash-entries ht)
             (bisecting entry-value entry-key)
             (grouping into-list)
             #:into into-hash))

; returns (hash 1 '(a b) 2 '(e c) 3 '(d))
(invert-hash-table (hash 'a 1 'b 1 'c 2 'd 3 'e 2))
3 Likes

Skip the extra set->list, you don't need it.

(for/list ([val (apply set (hash-values ht))]) …)

You could equally do something like (edit: GMail eats whitespace on outgoing messages):

(for/fold ([h (hash)])
          ([pair (hash->list ht)])
  (match-define (cons k v) pair)
  (hash-update h v (curry cons k) empty))
1 Like

Those are good points. It does look like, since there are so many questions, and so many different answers, that any library function for inverting a hash table that attempted to serve all those different needs would at best be very complex, and possibly slow because of all the overhead of trying to be all things to all people. It could also just be difficult to use because anyone trying to use it for their specific situation would have to wade through all the documentation and examples for completely different situations.

Oh wow, transducers look pretty cool. That transduce expression is nicely readable.

1 Like