Racket set performance

I ported a Python version of Advent of Code Day 25 to this Racket version, and I was surprised by what appears to be the relatively poor performance of Racket's set implementation.

The Python version runs in about 1.5 seconds, and the Racket version runs in about 10 seconds. I'll try and put together a simpler example comparing the two, but if anyone has some insight into the Racket set performance, please share. I realize Python's set implementation is in C, but I would still expect Racket to be within a factor of 2 to 4, or so.

My main solution uses a vector and runs in 1.1 seconds, but I'm still interested in seeing if I can get the Racket set version closer to the Python speed.

If I remember correctly, there seemed to be a performance issue with Racket hash tables that gave some slow timings on some Computer Language Benchmarks Game entries, so if the set is implemented with a hash table (likely?) then maybe it's related.

UPDATE: I found the Computer Language Benchmarks Game program that mentions the hash table performance issue in its comments. Even with that issue, Racket was only 1.4x slower, despite using 1 core vs. Python's 4 cores.

3 Likes

One piece of low-hanging fruit would be, instead of equal?-based sets, using make-mutable-seteqv to get the much cheaper equality check. You could also try replacing in-set with in-mutable-set, which should get you more optimized (though less generic) iteration code; a small sketch of both swaps follows below.

(Of course, I’d really recommend using in-immutable-set, since the sets are never actually mutated after they are constructed, but that would be a different program, though it could be interesting to see a comparison.)
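
For concreteness, here is a minimal sketch of both swaps (made-up variable names rather than the original program, and assuming #lang racket so that racket/set is available), using the mutable-seteqv constructor:

;; eqv?-based mutable set instead of an equal?-based one
(define east-set (mutable-seteqv))           ; was (mutable-set)
(set-add! east-set (make-rectangular 3 7))   ; exact complex keys work fine with eqv?
(for ([c (in-mutable-set east-set)])         ; was (in-set east-set)
  (displayln c))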

Using set! on module-level variables in move-all! will trigger various pessimizations.

Potentially a big contributor is that list-ref is O(n): using it inside a loop to treat a list as a random-access data structure is never good. Why not try something like the following (with apologies for any formatting errors from mobile):

(for/fold ([east-set (seteqv)]
           [south-set (seteqv)])
          ([line (in-list lines)]
           [y (in-naturals)]
           #:when #t
           [char (in-string line)]
           [x (in-naturals)])
  (match char
    [#\>
     (values (set-add east-set (+ x (* y +i)))
             south-set)]
    [#\v
     (values east-set
             (set-add south-set (+ x (* y +i))))]
    [_
     (values east-set south-set)]))

-Philip

4 Likes

Hello!

I ported a Python version of Advent of Code Day 25 to this Racket version, and I was surprised by what appears to be the relatively poor performance of Racket's set implementation.

I'm curious about your solution; can you post it?
I'd like to see how you used sets.

My solution, written in OpenLisp (a pure interpreter), runs in about 4 seconds; it uses an array for the map and conses up an auxiliary list...
(I can post it later today if anyone is interested.)

Cheers

(P.S. Sorry, I received the post by email, and in my email client it was not obvious that you had already supplied a link to the code. Please disregard. Happy holidays!)

2 Likes

I made a small comparison.

Python:

s1 = set([1, 2, 3])
s2 = set([2, 4, 6])

measure(lambda: s1.union(s2))
measure(lambda: s1.intersection(s2))

Racket:

(define s1 (set 1 2 3))
(define s2 (set 4 5 6))

(measure (lambda ()
           (set-union s1 s2)))
(measure (lambda ()
           (set-intersect s1 s2)))

and here is the output:

➜ racket s.rkt
2.582904052734375
3.471212158203125
➜ python3 s.py
1.2732043266296387
0.9760482311248779

I don't see that being outside your factor of 2 to 4. Perhaps you can provide your implementation so we can find the real reason?

It appears fixing list-ref doesn’t help much. There are only ~100 lines, so list-ref is fast enough for this particular input. But if there are more lines, list-ref could definitely become the bottleneck.

FWIW, in Python, it’s idiomatic to use enumerate over range(len(...)). In Racket, we could similarly use in-indexed. Initializing the sets could then be written as:

  (for* ([(line y) (in-indexed lines)]
         [(ch x) (in-indexed line)])
    (match ch
      [#\> (set-add! east-set (make-rectangular x y))]
      [#\v (set-add! south-set (make-rectangular x y))]
      [_   (void)]))

On my machine, the original program takes 7s to run.

Switching from mutable-set to mutable-seteqv reduces the running time from 7s to 3.5s.

Switching from mutable-seteqv to a mutable hasheqv that maps each element to #t further reduces the running time by about 0.4s (but you need to change set-add! and set-member? too).
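
For anyone following along, here is a rough sketch of that hash-as-set change (made-up names, not the actual program): a mutable hasheqv whose keys are the elements, each mapped to #t.

(define east-set (make-hasheqv))                 ; was (mutable-seteqv)

(hash-set! east-set (make-rectangular 3 7) #t)   ; replaces (set-add! east-set ...)
(hash-has-key? east-set (make-rectangular 3 7))  ; replaces (set-member? east-set ...)
(for ([c (in-hash-keys east-set)])               ; replaces (in-set east-set)
  (displayln c))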

3 Likes

I probably should've mentioned I only used list-ref because of being in "puzzle mode". In other words, I knew the impact was minimal, and it simplified the code somewhat :slight_smile: My primary solution does beat the Python version (barely).

I appreciate the tip about using eqv - my original time (median of 3 timings) was 10.3 seconds. Switching to mutable-seteqv reduced it to 4.66 seconds, as you mentioned. I originally read your post incorrectly: I thought you got the total time down to 0.4 seconds with hasheqv, so I was pretty excited - then I realized you meant it just got 0.4 seconds faster. Mine was 4.07 seconds with hasheqv.

So, it looks like Racket is about 2.8x slower than Python when using hasheqv. I think this is probably acceptable, but I may investigate further because I think that data structure is pretty important to many programs.

One piece of low-hanging fruit would be, instead of equal?-based sets, using make-mutable-seteqv to get the much cheaper equality check.

Thanks for mentioning this - good tip.

I’d really recommend using in-immutable-set , since the sets are never actually mutated after they are constructed

Good point re: mutation. My original port of the Python version didn't create new "destination" sets, so when I switched to doing that, I left the mutable sets because it was easier, and I doubted that using immutable sets would be any faster.

I was extremely pleased with using Racket for Advent of Code. I think this set/hash performance was the only disappointment in quite a bit of coding!

Also, @dannypsnl, as you mentioned, you made a "small comparison" - I think it's too small to gain any performance insight. I'll try to create two simple versions later, one for Python and one for Racket, that demonstrate the performance difference with sets. I think the Advent of Code puzzle has roughly 10,000 elements.
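
As a very rough sketch of what the Racket half of such a comparison might look like (the element values and repetition count here are arbitrary, just to get into the ~10,000-element range):

#lang racket

(define N 10000)
(define s1 (for/seteqv ([i (in-range N)]) i))          ; 0 .. 9999
(define s2 (for/seteqv ([i (in-range N)]) (* 2 i)))    ; 0, 2, .. 19998 (half overlap)

(collect-garbage)
(time (for ([_ (in-range 100)])
        (set-intersect s1 s2)))
(collect-garbage)
(time (for ([_ (in-range 100)])
        (set-union s1 s2)))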

1 Like

A couple more variants using nested hasheq:

;; represent a set of points as a hasheq mapping x to an inner hasheq mapping y to #t
(define mutable-set make-hasheq)

(define (in-set s)
  (for*/list ([(x hx) s] [(y hy) hx])
    (make-rectangular x y)))

(define (set-add! s v)
  (hash-set! (hash-ref! s (real-part v) make-hasheq)
             (imag-part v)
             #t))

(define (set-member? s v)
  (hash-has-key? (hash-ref s (real-part v) make-hasheq) 
                 (imag-part v)))

This takes 770ms on my machine, but in-set will create a potentially long list, so it’s not ideal.

We can change in-set to

(define (in-set s)
  (for*/stream ([(x hx) s] [(y hy) hx])
    (make-rectangular x y)))

This takes 1.1s.

An efficient solution would use define-sequence-syntax, so that the sequence falls back to for*/stream when it's not used directly in a for form and expands via :do-in when it appears directly in a for clause. This should make it even faster than the for*/list variant.
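
To illustrate the mechanism (on a flat hash-as-set rather than the nested variant above, and with a hypothetical in-hash-set name): the expression form below falls back to the existing in-hash-keys, while the for-clause form expands through :do-in.

#lang racket
(require (for-syntax racket/base))   ; for syntax-case in the clause transformer

(define-sequence-syntax in-hash-set
  ;; used as an expression: just behave like in-hash-keys (generic, slower)
  (lambda () #'in-hash-keys)
  ;; used directly in a for clause: expand to a tight :do-in loop
  (lambda (stx)
    (syntax-case stx ()
      [[(elem) (_ ht-expr)]
       #'[(elem)
          (:do-in ([(ht) ht-expr])                        ; evaluate the hash once
                  (unless (hash? ht)
                    (raise-argument-error 'in-hash-set "hash?" ht))
                  ([pos (hash-iterate-first ht)])         ; loop state: current position
                  pos                                     ; stop when the position is #f
                  ([(elem) (hash-iterate-key ht pos)])    ; bind the element
                  #t #t                                   ; no pre/post guards
                  ((hash-iterate-next ht pos)))]]         ; advance
      [_ #f])))

;; the same identifier works in both contexts:
(define s (hasheq 'a #t 'b #t 'c #t))
(for/list ([v (in-hash-set s)]) v)    ; directly in a clause: fast :do-in path
(let ([seq (in-hash-set s)])          ; expression position: in-hash-keys fallback
  (for/list ([v seq]) v))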

1 Like

I also wouldn't expect immutable sets to be any faster: if anything, they might be slightly slower. But, just to clarify, in this part of my comment:

I was specifically referring to this code:

        (set! east-set s*)
        (set! south-set s*))

The Racket and Chez Scheme compilers do very little optimization of variables assigned to with set! (preferring to expend their effort elsewhere), which can confuse people. Chez boxes all such variables on the heap (to facilitate a more simple and efficient implementation of closures for the common set!-free case), so even referencing such variables is a little more expensive. Racket quickly abandons inlining optimizations when set! gets involved, enough so that (set! proc proc) is an idiom to prevent proc from being inlined (for testing etc.), even though a compiler that wanted to could eliminate that assignment altogether. I am not certain, but I suspect that set! may also defeat at least some optimizations that depend on inferring the type of a variable, e.g. to eliminate redundant checks.
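
For example, here is a generic sketch (made-up names, not the actual move-all!) of replacing module-level set! by threading the sets through the loop as accumulators:

;; hypothetical stand-in for move-all!: returns the new sets instead of set!-ing them
(define (step east south)
  (values (set-add east (random 1000)) south))

;; the driver threads both sets through a for/fold, so no module-level
;; variable is ever assigned with set!
(define-values (final-east final-south)
  (for/fold ([east (seteqv)]
             [south (seteqv)])
            ([i (in-range 100)])
    (step east south)))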

I think some of the optimizations from define-sequence-syntax do not occur with in-indexed, which is why I typically would write such code as:

  (for ([line (in-list lines)]
        [y (in-naturals)]
        #:when #t
        [ch (in-string line)]
        [x (in-naturals)])
    (match ch
      [#\> (set-add! east-set (make-rectangular x y))]
      [#\v (set-add! south-set (make-rectangular x y))]
      [_   (void)]))

where #:when #t serves to create a nested iteration context.

I think part of the cost of racket/set is that generic dispatch is relatively expensive and is less optimizable than calls to known functions, and there aren't versions of set-add/set-add!/etc. that are specialized to "hash sets", analogous to in-immutable-set/in-mutable-set for in-set. It would be interesting to put a number on the cost of generics, perhaps by copying the "hash set" implementation from racket/set to create a non-generic drop-in replacement.
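
One rough way to start putting a number on that cost (arbitrary sizes, not a rigorous benchmark): time the generic set-member? against a direct hash-has-key? over the same elements.

#lang racket

(define N 10000)
(define st (for/seteqv  ([i (in-range N)]) i))
(define ht (for/hasheqv ([i (in-range N)]) (values i #t)))

(collect-garbage)
(time (for* ([_ (in-range 100)] [i (in-range N)])   ; generic dispatch via racket/set
        (set-member? st i)))
(collect-garbage)
(time (for* ([_ (in-range 100)] [i (in-range N)])   ; direct hash operation
        (hash-has-key? ht i)))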

I incorporated most of the feedback I received here into this version:

Advent of Code Day 25 in Racket w/ sets

It matches the Python version closely, and it's more idiomatic Racket. Timings are as follows (using the macOS time command at the command line):

Python = 1.5 seconds
Racket = 4.1 seconds (update: 1.38 seconds now)

So Racket is now only 2.7x slower, which seems more acceptable.

UPDATE: Wow - I switched from a complex number to a plain number, which allowed me to use hasheq, and that got the time down to 1.38 seconds! :slight_smile:
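
For reference, here is a guess at the kind of key change meant here (the real code is in the linked version; width is a made-up grid width): pack (x, y) into a single fixnum so an eq?-based table applies.

(define width 139)                        ; hypothetical grid width
(define (pos x y) (+ x (* y width)))      ; instead of (make-rectangular x y)

(define seen (make-hasheq))
(hash-set! seen (pos 3 7) #t)
(hash-has-key? seen (pos 3 7))            ; => #t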

1 Like

in-indexed should work efficiently when it appears directly within a for loop. It actually compiles to the same code that you wrote, provided that we use (in-indexed (in-list lines)) and (in-indexed (in-string line)).

Relevant code:

And here’s some benchmark code that shows it does use :do-in:

#lang racket

(define xs (build-list 10000000 values))

(define-syntax-rule (test blob)
  (begin
    (collect-garbage)
    (collect-garbage)
    (collect-garbage)
    (time blob)))

(test (for/sum ([(a i) (in-indexed (in-list xs))]) i))
(test (for/sum ([(a i) (in-indexed xs)]) i))
(test (for/sum ([(a i) (in-indexed (in-list xs))]) i))
(test (for/sum ([(a i) (in-indexed xs)]) i))
(test (for/sum ([(a i) (in-indexed (in-list xs))]) i))
(test (for/sum ([(a i) (in-indexed xs)]) i))

2 Likes

Ugh, thanks: I realized I've fallen into this state of confusion before, because I forget that the standard language saying that an …

application can provide better performance for number iteration when it appears directly in a for clause.

does include recursive expansion of other define-sequence-syntax forms.

(Though, actually, it looks like the docs for in-indexed don't have that note anyway.)