Can I reliably use integers directly as indices for immutable hash table iteration?

MartianMoorGrove · March 11, 2023, 11:12pm

For immutable hash tables, can I reliably use integers ranged 0 to (- (hash-count ht) 1) directly as indices for the hash-iterate- variant procedures rather than return indices from hash-iterate-first and hash-iterate-next? I cannot see why they're not reliable, but maybe I'm missing something.

Using the integers directly gives me about a 4% speed boost on my computer, and I could reference hash table entries in a pseudorandom manner that may be useful in certain algorithms.

Clarification: my question includes the unsafe versions of those procedures.

gus-massa · March 12, 2023, 2:26am

I don't think so. Does this trick work in CS and BC?

IIRC, a few years ago, @stchang and me added some optimization to hash-iterate. The previous representation was a list of sub-hashes, and the new representation was the numbers telling which branch to choose while you descend in the secret tree that represent the hash, all encoded as a digits of a fixnum in hexadecimal (or something like that). So the numbers were increasing, but there were gaps here and there. I don't remember the details now, and they may have changed in this year and they may change in the future.

I don't understand what you want to do. Something like this?

#lang racket

(define h (hash 1 2 3 4 5 6 7 8 9 0))

(for ([i (in-range 3)])
  (display (hash-iterate-value h
                               (random (hash-count h)))))

; ==> 600
; ==> 204
; ==> 664
; ==> 220
; ==> 064

MartianMoorGrove · March 12, 2023, 4:26am

I don't know if it works in BC. It has been working for me in CS, but I wanted to make sure that it always will.

The gaps do appear in the mutable hash tables, but I was hoping they didn't exist in immutable ones.

Your example is close but I separate the two ideas. I would like to use them similarly to this:

For iteration

#lang racket

(define h (hash 1 2 3 4 5 6 7 8 9 0))

(for ([i (in-range 3)])
  (displayln (hash-iterate-value h i)))

For random referencing

#lang racket

(define h (hash 1 2 3 4 5 6 7 8 9 0))

(define last-index (- (hash-count h) 1))

(define approximate-random-flonum-index (* (random) last-index))

(define random-flonum-index (round approximate-random-flonum-index))

(define random-index (inexact->exact random-flonum-index))

(hash-iterate-value h random-index)

For that last example, I do it that way so that I can generate indices larger than what random can on its own.

alexh · March 12, 2023, 7:44am

So you are effectively asking for a guarantee that the internal implementation of the immutable hash tables in Racket CS will never change?

This would not only prevent using a more performant implementation in the future, should one be discovered, but also prevent fixing any defects that might exist in the current implementation.

Alex.

benknoble · March 12, 2023, 4:04pm

I don't know if you need other hash-table features (in which case, you may need to pay the complexity of maintaining parallel structures over the same data), but if you need integers in (range n) as indices and fast random access, why not use a vector? That is, after all, what they are designed for.

LiberalArtist · March 12, 2023, 4:53pm

MartianMoorGrove:

For random referencing

#lang racket

(define h (hash 1 2 3 4 5 6 7 8 9 0))

(define last-index (- (hash-count h) 1))

(define approximate-random-flonum-index (* (random) last-index))

(define random-flonum-index (round approximate-random-flonum-index))

(define random-index (inexact->exact random-flonum-index))

(hash-iterate-value h random-index)

For that last example, I do it that way so that I can generate indices larger than what random can on its own.

This example is interesting.

For generating indexes larger than random can do, consider random-natural from math/base. (It would be good to link to that from the random docs!)

For getting a random entry from a hash table, I think this is the best you can do with current APIs:

#lang racket

(require math/base)

(define/contract (random-key+value hsh)
  (-> (and/c hash? (not/c hash-empty?))
      (values any/c any/c))
  (let loop ([target (random-natural (hash-count hsh))]
             [pos (hash-iterate-first hsh)])
    (if (zero? target)
        (hash-iterate-key+value hsh pos)
        (loop (sub1 target) (hash-iterate-next hsh pos)))))

(random-key+value #hash([a . 1] [b . 2] ["x" . "y"] [42 . -1/3]))

I think a hash-iterate-nth function could be added without constraining future improvements to the internal representation of hash tables, but I'm not sure that it would be useful enough to justify new primitives.

On your overall question, though, the docs for hash-iterate-first and hash-iterate-next are pretty explicit that indexes are not guaranteed to be consecutive integers.

MartianMoorGrove · March 12, 2023, 10:24pm

I do not know that the direct use of integers on immutable hash tables would ever be incompatible with future changes or improvements to their internal implementations. Here are two questions that could address that and should close this topic:

Are hash-iterate-first and hash-iterate-next only needed to skip over empty hash table buckets that are left over from mutating removals?
Will immutable hash tables "never" have empty buckets due to their functional nature?

I am hoping for a guarantee that this topic's subject functionality is reliable right now. If it is and the previous questions are answered with yes, I would likely create a Feature Request asking for the functionality to be guaranteed into the future. Being able to access or traverse a hash table almost like it's a vector would allow for some optimizations. With that said, I want my hope to be doomed here if it's going to be, rather than on GitHub.

MartianMoorGrove · March 12, 2023, 10:52pm

My context is that I'm using immutable hash tables to represent vertex edges in a weighted graph implementation. Fast random access would be useful for probing a vertex's table of edges without knowing any of the keys/neighbors, either for content sampling or random entry into graph traversal.

I may indeed have to maintain something in parallel if I can't reliably use integers directly as indices.

shhyou · March 12, 2023, 10:53pm

But immutable hash tables can still be created via hash-add and hash-remove. Would those two operations leave gaps in the hash tables?

MartianMoorGrove · March 12, 2023, 11:12pm

Thanks for mentioning that procedure! I will test it out.

Your example is a valid option for me if I can't use integers directly as indices.

Yeah I'm not sure hash-iterate-nth would have a enough widespread usage to justify a new primitive.

I was hoping immutable hash tables could be an exception to the current documentation. My two new questions in a previous post may determine that, eventually.

@shhyou I just saw your reply so I'll include mine to you here. I have not encountered gaps when using hash-set or hash-remove, but was hoping for assurances that it never happens.

mflatt · March 12, 2023, 11:51pm

The hash-iterate-first and hash-iterate-next operations are not intended merely to skip over empty slots. They do depend on the implementation, and we've experimented with different encodings of position in different implementations, as @gus-massa says. So, allowing a number between 0 and the key count doesn't seem like something we'd want to guarantee.

Note that unsafe versions of the operation are specific to different hash table configurations (immutable vs. immutable and weak vs. non-weak), and those don't even promise an exact integer repetition of the iteration. The implementation of immutable hash tables does not use an integer for unsafe-immutable-hash-iterate-first, for example.

I can imagine adding a immutable-hash-nth operation on immutable hash tables that would turn out to be the same for immutable hash tables currently as hash-iterate-key+value, though. It wouldn't be guaranteed to have exactly the same performance in the future.

hendrikboom3 · March 13, 2023, 12:23am

While we're on the dsubject of hash tables, can we please have mutable hach tables in typed Racket? That might require a few other changes, in typed hash tables. such as that hashtable becoms a union of Mutablehashtable and immutablehashtable.

-- hendrik

MartianMoorGrove · March 13, 2023, 12:54am

That implicitly provides a good, solid answer of no overall, so my side of this topic is wrapped up. Thank you!

jbclements · March 13, 2023, 6:18am

I think I must be misunderstanding your question. Specifically, it looks to me like Typed Racket already has mutable hash tables?

#lang typed/racket

(define a : (Mutable-HashTable Symbol Natural) (make-hash))

(hash-set! a 'a 134)
(hash-ref a 'a)

Apologies if I'm misunderstanding you.

hendrikboom3 · March 13, 2023, 10:38am

Interesting. I'm going to have to look again at my attempt to use them and figure out why it failed.
Thanks for showing me this.

-- hendrik

dstorrs · March 15, 2023, 9:17pm

Have you looked at the Racket Generic Graph Library library? It will do a lot of the heavy lifting for you. There's also an immutable cyclic data structure discussed at the bottom of 4.10 Pairs and Lists although I suspect that isn't what you're looking for.

MartianMoorGrove · March 21, 2023, 2:29am

I just noticed your comment and apologize for my delay in response.

Yes, I have looked at the Racket Generic Graph Library. I decided to write my own, more specialized library that better suits another project I'm working on.

Thank you for that second link! I was unaware that placeholders for immutable hash tables were added to Racket. I may find a use for them.

Topic		Replies	Views
Hash-table name? General	2	282	February 13, 2023
Typed Racket: the good and the bad Show & Tell	2	447	December 19, 2021
How to turn a list into a hash table? Questions & Answers question	2	579	October 31, 2022
Check-eq? not behaving properly in Typed Racket Questions & Answers	3	155	September 20, 2023
Type Checker: Inference for polymorphic keyword functions not supported Questions & Answers question , typed-racket	6	435	December 21, 2021

Can I reliably use integers directly as indices for immutable hash table iteration?

Related topics