A micro-benchmark

soegaard · October 9, 2023, 9:30pm

I haven't studied macros yet.
What makes these things macros?

In "vector-wraps.rkt" you will see:

(define-syntax-rule (define-vector-wraps <pattern> ...) <template>)

This defines define-vector-wraps as a macro.

In this case the macro is defined using a "rule".
When define-vector-wraps is used, the expander matches the macro call to the patterns.
It then constructs the code in the template by substituting syntax from the patterns.

In this case the template is rather long, since it consists of definitions of quite a few functions and macros. In the case of f64vector we get definitions of in-f64vector, unsafe-f64vector-copy!, for/f64vector, for*/f64vector, and f64vector-copy.
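
To make the pattern/template idea concrete, here is a minimal sketch of a rule-based macro that expands into a definition. The macro `define-vector-sum` and the name `my-flvector-sum` are hypothetical, invented for illustration; this is not the real `define-vector-wraps`, which takes many more arguments.

```racket
#lang racket/base
(require racket/flonum)

;; Hypothetical macro for illustration only (not from vector-wraps.rkt).
;; Pattern:  (define-vector-sum name length-fn ref-fn)
;; Template: a function definition built from the matched pieces.
(define-syntax-rule (define-vector-sum name length-fn ref-fn)
  (define (name vec)
    (for/fold ([sum 0.0]) ([i (in-range (length-fn vec))])
      (+ sum (ref-fn vec i)))))

;; The expander substitutes the three arguments into the template,
;; producing a definition of my-flvector-sum.
(define-vector-sum my-flvector-sum flvector-length flvector-ref)

(my-flvector-sum (flvector 1.0 2.0 3.0)) ; prints 6.0
```

Because the name to define is passed in at the use site, one macro can stamp out the same family of definitions for several vector types.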

shawnw · October 9, 2023, 9:51pm

As already mentioned, in my extra-srfi-libs package.

raco pkg install extra-srfi-libs

or the equivalent in the DrRacket package manager window to install.

lewisl · October 9, 2023, 10:00pm

Should that have been (define-vector-wraps "f64vector")?

soegaard · October 9, 2023, 10:07pm

Yes! The "f64lvector" ought to be "f64vector". I think the string is used in error messages.

countvajhula · October 11, 2023, 8:04pm

(I haven't followed this thread in detail so apologies if this has been covered)

If you can implement gen:sequence for the data structure, you can use the Generic Collections library's in and other generic sequence utilities provided there. For built-in Racket data structures that aren't already supported, it may be necessary to submit a pull request to the library to add a default implementation.

countvajhula · October 11, 2023, 8:09pm

This looks great. I submitted a request to support in-VM benchmarking to hyperfine some time back, but they decided it was out of scope for the project. It'll be nice to have a visualization tool like this in Racket as @soegaard said.

rogert · October 12, 2023, 4:26pm

Prompted by @sschwarzer's questions:

...I produced some variants in ChezScheme (Racket's backend). My idiomatic baseline is flvector-sum:

rogerturner/A-micro-benchmark/blob/main/flvector-bench.ss#L20-35

          (define (flvector-sum flvec)
            (let loop ([len (flvector-length flvec)] [i 0] [sum 0.0])
              (if (fx<? i len)
                (loop len (fx1+ i) (fl+ sum (flvector-ref flvec i)))
                (fl+ sum))))
          
          (define (flvector-bench make update sum len flval)
            (let ([flvec (make len flval)])
              (update flvec 0 flval)
              (sum flvec)))
          
          (define (repeat make update sum description)
            (display description)
            (display (flvector-bench make update sum 100000 0.5)) (newline)
            (time
              (do ([i 0 (fx+ i 1)])
                  ((fx= i 1000))
                (flvector-bench make update sum 100000 0.5))))
          
          (define (run)
            (repeat make-flvector  flvector-set!  flvector-sum  "baseline ")

The fl.vector library implements simple lazy allocation and manual loop unrolling, with fl.vector-sum and fl*vector-sum variants. Sample timings:

*Version*         *Timings in seconds for 1000 executions*
baseline           .19
unroll make        .14
unroll sum         .12
unroll both        .07
fl. +unroll        .07
fl. "lazy"         .000007

(all produce the correct result)
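
For readers who want to try the unrolling idea from Racket rather than Chez Scheme, here is a rough sketch. It is not the code from the fl.vector library (which is not shown in this post); the function names and the unroll factor of 4 are assumptions chosen for illustration.

```racket
#lang racket/base
(require racket/flonum racket/fixnum)

;; Baseline: one element per loop iteration.
(define (flvector-sum/baseline v)
  (define len (flvector-length v))
  (let loop ([i 0] [sum 0.0])
    (if (fx< i len)
        (loop (fx+ i 1) (fl+ sum (flvector-ref v i)))
        sum)))

;; Manually unrolled by 4 (hypothetical factor): four elements per
;; iteration, then a tail loop for the leftover 0-3 elements.
;; Elements are added in the same order as in the baseline.
(define (flvector-sum/unrolled v)
  (define len (flvector-length v))
  (define limit (fx- len (fxremainder len 4)))
  (let loop ([i 0] [sum 0.0])
    (if (fx< i limit)
        (loop (fx+ i 4)
              (fl+ (fl+ (fl+ (fl+ sum (flvector-ref v i))
                             (flvector-ref v (fx+ i 1)))
                        (flvector-ref v (fx+ i 2)))
                   (flvector-ref v (fx+ i 3))))
        (let tail ([i i] [sum sum])
          (if (fx< i len)
              (tail (fx+ i 1) (fl+ sum (flvector-ref v i)))
              sum)))))
```

Whether unrolling helps in Racket CS the way it does in the timings above is something to measure, for example by wrapping repeated calls in `time`.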

Zeb · October 13, 2023, 3:28am

Why is the lazy result so much faster than anything else we've seen so far?

rogert · October 13, 2023, 9:59am

The conventional way to allocate a vector in Scheme is (make-vector length fill);
the fl.vector library just saves length and fill, so fl.vector-sum can produce the sum by multiplication:

rogerturner/A-micro-benchmark/blob/main/fl.vector.ss#L24

          #| a Fl.vector is either
             a Pair: (Fixnum . Flonum)  [length . fill value]
                  or (Flvector . Unused)
             or a Flvector  [fl.vector- procedures can be applied to flvector arguments] |#
             
          (define (make-fl.vector len flval)       ;; Fixnum Flonum -> Fl.vector
            (cons len flval))
            
          (define (fl.vector-sum flvec)            ;; Fl.vector -> Flonum
            (if (pair? flvec)
              (let ([cf (car flvec)])
                (if (fixnum? cf)
                  (* cf (cdr flvec))
                  (fl*vector-sum cf)))
              (fl*vector-sum flvec)))
          
          (define (fl.vector-set! flvec i flval)   ;; Fl.vector Fixnum Flonum ->
            (if (pair? flvec)
              (let ([cf (car flvec)])
                (when (fixnum? cf)
                  (set-car! flvec (make-fl*vector cf (cdr flvec))))

Allocation of the 800 KB vector is in fl.vector-set!; some sample stats in flvector-bench:

rogerturner/A-micro-benchmark/blob/main/flvector-bench.ss#L104

          unroll both 50000.0
          (time (do ((...)) ...))
              91 collections
              0.075367584s elapsed cpu time, including 0.002303000s collecting
              0.075367000s elapsed real time, including 0.002339000s collecting
              800091696 bytes allocated, including 801707520 bytes reclaimed
          fl. +unroll 50000.0
          (time (do ((...)) ...))
              91 collections
              0.075227834s elapsed cpu time, including 0.002293000s collecting
              0.075224000s elapsed real time, including 0.002343000s collecting
              800107696 bytes allocated, including 801723424 bytes reclaimed
          fl. "lazy" 50000.0
          (time (do ((...)) ...))
              no collections
              0.000007041s elapsed cpu time
              0.000007000s elapsed real time
              32000 bytes allocated
          |#

...show about 800 MB of total memory allocation for the eager runs, as expected (1000 runs × 100,000 flonums × 8 bytes each), but only 32 KB for the "lazy" run.

fl.vector contains versions of the basic vector procedures (-ref, -length, etc), which
should produce correct results for either flvector or fl.vector arguments.
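
The same lazy-allocation idea can be sketched in Racket. This is an analogue under stated assumptions, not a port of fl.vector: Racket pairs are immutable, so the sketch uses a struct with one mutable field instead of `set-car!`, and every name in it (`lazy-flvector` and friends) is hypothetical.

```racket
#lang racket/base
(require racket/flonum)

;; Hypothetical analogue of the fl.vector idea. `data` stays #f until
;; the first write, at which point the real flvector is allocated.
(struct lazy-flvector (len fill [data #:mutable]))

(define (make-lazy-flvector len fill)
  (lazy-flvector len fill #f))

(define (lazy-flvector-ref v i)
  (define data (lazy-flvector-data v))
  (cond
    [data (flvector-ref data i)]
    [(and (<= 0 i) (< i (lazy-flvector-len v))) (lazy-flvector-fill v)]
    [else (error 'lazy-flvector-ref "index out of range: ~a" i)]))

(define (lazy-flvector-set! v i x)
  ;; The first write forces allocation of the full vector.
  (unless (lazy-flvector-data v)
    (set-lazy-flvector-data!
     v (make-flvector (lazy-flvector-len v) (lazy-flvector-fill v))))
  (flvector-set! (lazy-flvector-data v) i x))

(define (lazy-flvector-sum v)
  (define data (lazy-flvector-data v))
  (if data
      (for/fold ([sum 0.0]) ([x (in-flvector data)])
        (fl+ sum x))
      ;; Never written to: length times fill, with no allocation.
      (fl* (->fl (lazy-flvector-len v)) (lazy-flvector-fill v))))
```

One caveat: for fill values that are not exactly representable, length times fill can differ in the last bits from adding the fill value repeatedly, so the two paths are not guaranteed to agree bit for bit. With 0.5, as in the benchmark, both give exactly 50000.0.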

It may be worth noting that the ChezScheme compiler (and hence Racket) can inline
procedures from a user library, so uses are comparable in execution to standard library procedures.
