[TIL] hash-ref is very fast on hasheq

Laurent.O · October 16, 2022, 5:25pm

Not a question per se, just a positive remark: I was wondering whether hash-ref was fast for fixed large sets of fixnums, or if a custom binary search could do better. Answer: It's plenty fast: it can answer 1 000 000 queries on a set of 10 000 000 fixnums in less than 200ms in total.

For reference:

gist.github.com

https://gist.github.com/Metaxal/97a9f582b0c84c8aa1c4aebeb77dc88e

hasheq-ref-vs-bin-search.rkt

#lang racket
(require racket/fixnum
         math/base)

;;; Conclusion: hasheq-ref is (slightly) faster than a custom binary search
;;; (It's actually probably implemented as a binary search, with some more optimization,
;;;  as the bin search is within a factor 2)

(#%declare #:unsafe)

This file has been truncated. show original

The custom binary search is actually slower, by a factor 2 or so.

At least I don't have to waste time on this, so thank you, implementors

sschwarzer · October 16, 2022, 5:38pm

If you use a hasheq, I'd expect keys to be compared by identity (eq?), not by hash function results or a numerical comparison.

If key equality is checked with eq?, I wonder whether it's a good idea to use fixnums as keys. Probably fixnums aren't interned/reused. (I'm only guessing that they aren't.)

JustinZed · October 16, 2022, 5:53pm

From the Racket docs, " As a special case among numbers, two fixnums that are = are also the same according to eq?." 4.1 Equality

alexh · October 16, 2022, 11:05pm

There are a few extra things you might want to consider in an actual application:

Performance of creating a 10 000 000 fixnum hash vs creating and sorting a 10 000 000 fxvector
The number of lookups you use in your test has to match what you expect from your application. In an extreme case, if your application needs to do just a few lookups and you already have all the data in a list, it might be faster to scan the list in linear time than to construct a hash and use the much faster hash-ref
The number of elements you search for also has to match what you expect from your application. You might find that, if you have only a few thousand elements the performance profile changes.

I suspect (without checking) that, for sets of 100 elements or less, vector-memq might be faster than both the binary search in a fxvector and a hash-ref...

Alex.

simonls · October 17, 2022, 3:36pm

Additionally if you have very many elements (and depending on how dense those are distributed) it may make sense to use a bloom filter. It can tell you that something is not in the set with certainty, but it only can tell you whether something could be in the set.

So usually it is used to quickly decide that something isn't in the set and if it might be in the set you need to check some other data structure.
I mostly mention this, because I like the idea of using a bloom filter, so far I haven't had a case where it worked out to improve performance.

gus-massa · October 17, 2022, 7:14pm

I agree.

[Ignoring some minor details, like negative fixnums and the 32bits version of CS.]

In BC, the "even pointers" are real pointers to objects, and the odd "pointers" are fake pointers that are an encoded fixnums, where the N is internally represented as 2N+1.

In CS, the fixnum N is internally represented as 8N, and the other 6 remainders modulo 8 are pointers to real objects+something, where something indicate the kind of the the object, like a cons, of a box, a struct,... And if you were counting, there is 1 additional remainder that indicate that it's an special value like #t, #f, ... or a char.

So, fixnum are not interned, but imagining them as interned is a good approximation.

Topic		Replies	Views
_fixnum slow? (despite docs) Questions & Answers	11	270	December 26, 2024
Racket set performance General performance	11	550	December 27, 2021
Can I reliably use integers directly as indices for immutable hash table iteration? Questions & Answers question , racket , hash-tables	16	485	March 21, 2023
Typed Racket: the good and the bad Show & Tell	2	448	December 19, 2021
RFC: hash table pattern matching General	68	1934	January 10, 2024

[TIL] hash-ref is very fast on hasheq

Related topics