Recently had a thought experiment since I'm making a 2d game, which is heavy on numerical manipulation of 2d points.
[I'm using Chez Scheme for the game, but that's not too relevant to this post].
The question was, there's quite a few ways we can represent 2d points. Which way is generally the "most efficient"?
The contenders are:
Good old cons pairs, with x in the car and y in the cdr
2-long flvectors, with x as the first element and y in the second
Structs/records, with x in the first slot and y in the second
Complex numbers, with x in the real part and y in the imaginary part
I made a microbenchmark:
Expand
#lang racket/base
(require racket/flonum)
(define (deal)
;; assumption: random's performance is mostly uniform
(exact->inexact (random -5000 5000)))
(define test-count 5000000)
(define (pair-add a b)
(cons (fl+ (car a) (car b)) (fl+ (cdr a) (cdr b))))
(define (pair-unit p)
(define x (car p))
(define y (cdr p))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(cons (fl/ x norm) (fl/ y norm)))
(define (pair-scale p scalar)
(cons (fl* (car p) scalar) (fl* (cdr p) scalar)))
(define (test-pair)
(define points
(for/vector ([i (in-range test-count)])
(cons (deal) (deal))))
(define toadd (cons (deal) (deal)))
(define adds (for/vector ([p (in-vector points)])
(pair-add p toadd)))
(define norms (for/vector ([p (in-vector points)])
(pair-unit p)))
(define scalar (deal))
(define scales (for/vector ([p (in-vector points)])
(pair-scale p scalar)))
(list (vector-length adds) (vector-length norms) (vector-length scales)))
(define (flvector-add a b)
(flvector (fl+ (flvector-ref a 0) (flvector-ref b 0))
(fl+ (flvector-ref a 1) (flvector-ref b 1))))
(define (flvector-unit p)
(define x (flvector-ref p 0))
(define y (flvector-ref p 1))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(flvector (fl/ x norm) (fl/ y norm)))
(define (flvector-scale p scalar)
(flvector (fl* (flvector-ref p 0) scalar) (fl* (flvector-ref p 1) scalar)))
(define (test-flvector)
(define points
(for/vector ([i (in-range test-count)])
(flvector (deal) (deal))))
(define toadd (flvector (deal) (deal)))
(define adds (for/vector ([p (in-vector points)])
(flvector-add p toadd)))
(define norms (for/vector ([p (in-vector points)])
(flvector-unit p)))
(define scalar (deal))
(define scales (for/vector ([p (in-vector points)])
(flvector-scale p scalar)))
(list (vector-length adds) (vector-length norms) (vector-length scales)))
(struct vec2 (x y))
(define (struct-add a b)
(vec2 (fl+ (vec2-x a) (vec2-y b))
(fl+ (vec2-y a) (vec2-y b))))
(define (struct-unit p)
(define x (vec2-x p))
(define y (vec2-y p))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(vec2 (fl/ x norm) (fl/ y norm)))
(define (struct-scale p scalar)
(vec2 (fl* (vec2-x p) scalar) (fl* (vec2-y p) scalar)))
(define (test-struct)
(define points
(for/vector ([i (in-range test-count)])
(vec2 (deal) (deal))))
(define toadd (vec2 (deal) (deal)))
(define adds (for/vector ([p (in-vector points)])
(struct-add p toadd)))
(define norms (for/vector ([p (in-vector points)])
(struct-unit p)))
(define scalar (deal))
(define scales (for/vector ([p (in-vector points)])
(struct-scale p scalar)))
(list (vector-length adds) (vector-length norms) (vector-length scales)))
(define (complex-unit p)
(define x (flreal-part p))
(define y (flimag-part p))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(make-flrectangular (fl/ x norm) (fl/ y norm)))
(define (complex-scale p scalar)
(make-flrectangular (fl* (flreal-part p) scalar) (fl* (flimag-part p) scalar)))
(define (test-complex)
(define points
(for/vector ([i (in-range test-count)])
(make-flrectangular (deal) (deal))))
(define toadd (make-flrectangular (deal) (deal)))
(define adds (for/vector ([p (in-vector points)])
(+ p toadd)))
(define norms (for/vector ([p (in-vector points)])
(complex-unit p)))
(define scalar (deal))
(define scales (for/vector ([p (in-vector points)])
(complex-scale p scalar)))
(list (vector-length adds) (vector-length norms) (vector-length scales)))
(define (main)
(display "Pairs:")
(collect-garbage 'major)
(time (displayln (test-pair)))
(display "flvectors:")
(collect-garbage 'major)
(time (displayln (test-flvector)))
(display "structs:")
(collect-garbage 'major)
(time (displayln (test-struct)))
(display "complex:")
(collect-garbage 'major)
(time (displayln (test-complex))))
(module* main #f
(main))
For each approach, I generate 5 million points, then exercise the approach as follows:
Randomly generate another point (once) and add it to the 5 million points
Normalize each point to have magnitude 1
Scale each point by a scalar
Here's the results I saw:
Pairs:
cpu time: 5390 real time: 5396 gc time: 4113
flvectors:
cpu time: 3272 real time: 3275 gc time: 1969
structs:
cpu time: 6421 real time: 6428 gc time: 5085
complex:
cpu time: 3697 real time: 3700 gc time: 2223
Not scientifically rigorous benchmarking, though the general trend appears reproducible across multiple runs. It seems flvectors are the most competitive, even though reading from them may potentially allocate every single time (in this microbenchmark, it's unlikely to allocate since the elements are immediately processed and put into another flvector).
Complexes are surprisingly competitive as well, likely for the same reasons, with cons cells and records coming in quite a bit further behind.
Just putting this out there as a curiosity. The improvement would be 1) use a real benchmarking library to do more iterations and 2) Test the scenario where reads from the structure are not immediately consumed locally and are passed elsewhere, possibly forcing flvectors and complexes to allocate.
I was going to ask if you'd measured memory usage too. My suspicion is that flvectors will use the most memory - the vectors have to include their length after all in addition to their payload, unlike the other approaches.
(Historically I use complex numbers for Cartesian coordinates)
My guess is that with flvectors, you're helping the compiler unbox as much as possible. When you have code like:
The compiler knows that the result of flvector-ref will always be a flonum, so it should be able to avoid (un)boxing any intermediate values. (Indeed, since the values come from and go back into flvectors, they should never be boxed at all.)
In contrast, with:
The contents of the pair can be anything, so some internal flonum? check has to be inserted before unboxing, and there has to be boxing and unboxing to put the flonums in and out of the pair, so that's an extra level of indirection.
i tested that changing racket/base in racket almost multiply the time by 2... i will try to reconsider using racket/base insteatd racket in all my programs when possible
update:
../Scheme-PLUS-for-Racket/main/Scheme-PLUS-for-Racket/bracket-apply.rkt:138:3: lambda: unbound identifier;
also, no #%app syntax transformer is bound in the transformer phase in: lambda
../Scheme-PLUS-for-Racket/main/Scheme-PLUS-for-Racket/assignment.rkt:125:3: lambda: unbound identifier;
also, no #%app syntax transformer is bound in the transformer phase in: lambda
the failure comes fast
as i see that having only one submodule with racket cause the penality on the whole time i will stop modifications here...
I repeated the benchmark in Chez Scheme for good measure. Similar results, except that complexes are relatively slower, and all of the benchmarks are much faster than Racket.
Run with chez --program main.ss
Expand
#!chezscheme
(import (chezscheme))
(define rnd (make-pseudo-random-generator))
(define (deal)
;; assumption: random's performance is mostly uniform
(inexact (- (pseudo-random-generator-next! rnd 10000) 5000)))
(define test-count 5000000)
(define (init-vec proc)
(define ret (make-vector test-count 0))
(do [(i 0 (add1 i))] [(>= i test-count)]
(vector-set! ret i (proc)))
ret)
(define (pair-add a b)
(cons (fl+ (car a) (car b)) (fl+ (cdr a) (cdr b))))
(define (pair-unit p)
(define x (car p))
(define y (cdr p))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(cons (fl/ x norm) (fl/ y norm)))
(define (pair-scale p scalar)
(cons (fl* (car p) scalar) (fl* (cdr p) scalar)))
(define (test-pair)
(define points
(init-vec (lambda () (cons (deal) (deal)))))
(define toadd (cons (deal) (deal)))
(define adds
(vector-map (lambda (p) (pair-add p toadd)) points))
(define norms
(vector-map pair-unit points))
(define scalar (deal))
(define scales
(vector-map (lambda (p) (pair-scale p scalar)) points))
(list (vector-ref adds 0) (vector-ref norms 0) (vector-ref scales 0)))
(define (flvector-add a b)
(flvector (fl+ (flvector-ref a 0) (flvector-ref b 0))
(fl+ (flvector-ref a 1) (flvector-ref b 1))))
(define (flvector-unit p)
(define x (flvector-ref p 0))
(define y (flvector-ref p 1))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(flvector (fl/ x norm) (fl/ y norm)))
(define (flvector-scale p scalar)
(flvector (fl* (flvector-ref p 0) scalar) (fl* (flvector-ref p 1) scalar)))
(define (test-flvector)
(define points
(init-vec (lambda () (flvector (deal) (deal)))))
(define toadd (flvector (deal) (deal)))
(define adds
(vector-map (lambda (p) (flvector-add p toadd)) points))
(define norms
(vector-map flvector-unit points))
(define scalar (deal))
(define scales
(vector-map (lambda (p) (flvector-scale p scalar)) points))
(list (vector-ref adds 0) (vector-ref norms 0) (vector-ref scales 0)))
(define-record-type vec2
(fields x y))
(define (struct-add a b)
(make-vec2 (fl+ (vec2-x a) (vec2-y b))
(fl+ (vec2-y a) (vec2-y b))))
(define (struct-unit p)
(define x (vec2-x p))
(define y (vec2-y p))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(make-vec2 (fl/ x norm) (fl/ y norm)))
(define (struct-scale p scalar)
(make-vec2 (fl* (vec2-x p) scalar) (fl* (vec2-y p) scalar)))
(define (test-struct)
(define points
(init-vec (lambda () (make-vec2 (deal) (deal)))))
(define toadd (make-vec2 (deal) (deal)))
(define adds
(vector-map (lambda (p) (struct-add p toadd)) points))
(define norms
(vector-map struct-unit points))
(define scalar (deal))
(define scales
(vector-map (lambda (p) (struct-scale p scalar)) points))
(list (vector-ref adds 0) (vector-ref norms 0) (vector-ref scales 0)))
(define-record lvec2 ((immutable double x) (immutable double y)))
(define (lstruct-add a b)
(make-lvec2 (fl+ (lvec2-x a) (lvec2-y b))
(fl+ (lvec2-y a) (lvec2-y b))))
(define (lstruct-unit p)
(define x (lvec2-x p))
(define y (lvec2-y p))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(make-lvec2 (fl/ x norm) (fl/ y norm)))
(define (lstruct-scale p scalar)
(make-lvec2 (fl* (lvec2-x p) scalar) (fl* (lvec2-y p) scalar)))
(define (test-lstruct)
(define points
(init-vec (lambda () (make-lvec2 (deal) (deal)))))
(define toadd (make-lvec2 (deal) (deal)))
(define adds
(vector-map (lambda (p) (lstruct-add p toadd)) points))
(define norms
(vector-map lstruct-unit points))
(define scalar (deal))
(define scales
(vector-map (lambda (p) (lstruct-scale p scalar)) points))
(list (vector-ref adds 0) (vector-ref norms 0) (vector-ref scales 0)))
(define (complex-unit p)
(define x (cfl-real-part p))
(define y (cfl-imag-part p))
(define norm (flsqrt (fl+ (fl* x x) (fl* y y))))
(fl-make-rectangular (fl/ x norm) (fl/ y norm)))
(define (complex-scale p scalar)
(fl-make-rectangular (fl* (cfl-real-part p) scalar) (fl* (cfl-imag-part p) scalar)))
(define (test-complex)
(define points
(init-vec (lambda () (fl-make-rectangular (deal) (deal)))))
(define toadd (fl-make-rectangular (deal) (deal)))
(define adds
(vector-map (lambda (p) (+ p toadd)) points))
(define norms
(vector-map complex-unit points))
(define scalar (deal))
(define scales
(vector-map (lambda (p) (complex-scale p scalar)) points))
(list (vector-length adds) (vector-length norms) (vector-length scales)))
(define (main)
(display "Pairs:")
(collect 4 4)
(time (display (test-pair)))
(display "flvectors:")
(collect 4 4)
(time (display (test-flvector)))
(display "structs:")
(collect 4 4)
(time (display (test-struct)))
(display "legacy structs:")
(collect 4 4)
(time (display (test-lstruct)))
(display "complex:")
(collect 4 4)
(time (display (test-complex))))
(main)
Results:
Pairs:((-5184.0 . 2842.0) (-0.999775853931432 . -0.021171723965606796) (8744375.0 . 185175.0))(time (display (test-pair)))
30 collections
0.928861577s elapsed cpu time, including 0.351799512s collecting
0.932841115s elapsed real time, including 0.353317558s collecting
1262891296 bytes allocated, including 982887776 bytes reclaimed
flvectors:(#vfl(-5858.0 604.0) #vfl(-0.7817194788876338 0.6236302240331575) #vfl(-950232.0 758064.0))(time (display (test-flvector)))
40 collections
0.807444927s elapsed cpu time, including 0.213906192s collecting
0.811447900s elapsed real time, including 0.215408842s collecting
1582897888 bytes allocated, including 1382896624 bytes reclaimed
structs:(#[#{vec2 ifg6mo42xdds2u625sqj3o7da-0} -3295.0 -253.0] #[#{vec2 ifg6mo42xdds2u625sqj3o7da-0} -0.9998308911299209 -0.018389919585158274] #[#{vec2 ifg6mo42xdds2u625sqj3o7da-0} -3557652.0 -65436.0])(time (display (test-struct)))
40 collections
1.124347418s elapsed cpu time, including 0.510183295s collecting
1.130014981s elapsed real time, including 0.513124652s collecting
1582913600 bytes allocated, including 1222884928 bytes reclaimed
legacy structs:(#[#{lvec2 ifg6mo42xdds2u625sqj3o7da-1} 4683.0 3760.0] #[#{lvec2 ifg6mo42xdds2u625sqj3o7da-1} 0.7906723880365014 0.612239475039511] #[#{lvec2 ifg6mo42xdds2u625sqj3o7da-1} -2065450.0 -1599335.0])(time (display (test-lstruct)))
40 collections
0.850581782s elapsed cpu time, including 0.310781624s collecting
0.854529447s elapsed real time, including 0.312154663s collecting
1102912016 bytes allocated, including 902884032 bytes reclaimed
complex:(5000000 5000000 5000000)(time (display (test-complex)))
203 collections
2.162937535s elapsed cpu time, including 1.601356059s collecting
2.172449054s elapsed real time, including 1.608902293s collecting
1867875456 bytes allocated, including 880127936 bytes reclaimed
I'm not sure yet all of the reasons why your Chez code is faster, but one big way to speed up the Racket code is to pass #:length test-count to all of the uses of for/vector so that they don't reallocate the vector repeatedly.
It's surprising that Racket is slower than Chez Scheme. The main issue seems to be that schemify in Racket adds some lets in functions like pair-unit to force left-to-right evaluation, and cp0 is not able to eliminate them at the Chez Scheme level, which leads to extra boxing of intermediate flonum values.
I don't immediately know how to fix that, so it will require more investigation.
The two things that I now think make the difference:
(random -5000 5000) has to find the current random-number generator through a parameter on every call; changing it to (random -5000 5000 rnd) with (define rnd (make-pseudo-random-generator)) saves most of the non-GC time
Racket's GC heuristic causes it to send more time collecting in exchange for lower peak memory use: 1.5 GB peak for Racket versus 2.3 GB peak for Chez Scheme.
i changed from racket to racket/base in all the Scheme+ modules (33 modules) and i got a 15% speed up , with the vector+ test:
Pairs:(5000000 5000000 5000000)
cpu time: 15231 real time: 15861 gc time: 6993
flvectors:(5000000 5000000 5000000)
cpu time: 12117 real time: 13728 gc time: 4894
structs:(5000000 5000000 5000000)
cpu time: 18013 real time: 19114 gc time: 9114
complex:(5000000 5000000 5000000)
cpu time: 12242 real time: 12782 gc time: 5120
vectors:(5000000 5000000 5000000)
cpu time: 16761 real time: 17469 gc time: 9229
vectors+:(5000000 5000000 5000000)
cpu time: 22887 real time: 24188 gc time: 9661
cpu time: 22887 with racket/base versus cpu time: 26124 with racket, in the future i will start all my code with racket/base instead of racket.
note: globally my MacOS M1 system seems on the knees ,on all the data types , this system has ran 3x times slower than what i read in the post, this has nothing to see with Racket. I should reboot , the system seems overloaded with no reason: