I was looking at the performance impact of parameters for use in a language project and ran into some odd performance characteristics: accessing struct constructors through a parameterize-introduced binding appears to be 10-20% faster than using them directly, while struct accesses are about 10-20% slower when the accessor procedure is referenced through a parameter.
With direct access, it takes about 4.5s to construct a complete binary tree of depth 25, and about 0.4s to compute its sum. If, instead, we access the struct constructors and accessors through a parameter binding:
about 4.1s to construct the tree, and 0.6s to traverse and compute the sum.
A longer sum computation time makes sense, as there's an additional layer of indirection when accessing the parameter procedure, but... why is construction faster?
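For context, a minimal sketch of what accessing the constructor and an accessor "through a parameter binding" might look like (the names here are illustrative, not the exact code from the original experiment):

```racket
#lang racket

(struct tree (l r v))

;; Hypothetical reconstruction: route the constructor and an accessor
;; through parameters, so every call pays a parameter lookup plus an
;; indirect call.
(define current-make-tree (make-parameter tree))
(define current-tree-v (make-parameter tree-v))

;; Direct: (tree #f #f 42) and (tree-v t)
;; Indirect: look up the parameter, then call the procedure it holds.
(define t ((current-make-tree) #f #f 42))
(define v ((current-tree-v) t))
v
```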
To thicken the plot, it seems that just wrapping the code block in parameterize with some dummy parameter is sufficient to get a performance improvement on construction:
#lang racket

(define p (make-parameter #f))

(struct tree (l r v))

(define (rand-tree depth)
  (if (<= depth 1)
      (tree #f #f (random 1000))
      (tree (rand-tree (sub1 depth))
            (rand-tree (sub1 depth))
            (random 1000))))

(define (sum todo acc)
  (match todo
    ['() acc]
    [(cons h t)
     (if h
         (sum (cons (tree-l h) (cons (tree-r h) t))
              (+ acc (tree-v h)))
         (sum t acc))]))

(define (print-time) (println (current-inexact-monotonic-milliseconds)))

(parameterize ([p #f])
  (print-time)
  (define t1 (rand-tree 25))
  (print-time)
  (println (sum `(,t1) 0))
  (print-time))
This has timings of:
354.8463
4192.6203
16761484464
4550.8924
So, about 3.8s to construct (4192.6 - 354.8 ms). This consistently outperforms the parameter-less version.
Maybe these are threading- or cache-related behaviors?
I see some memoization code in the fast example that's not in the slow example. Is that intentional? It's probably the reason for the performance difference.
Here’s my third version, which has no parameterize, but with even greater speed up!
#lang racket

(struct tree (l r v))

(define gen (current-pseudo-random-generator))

(define (rand-tree depth)
  (if (<= depth 1)
      (tree #f #f (random 1000 gen))
      (tree (rand-tree (sub1 depth))
            (rand-tree (sub1 depth))
            (random 1000 gen))))

(time (rand-tree 25))
;; cpu time: 2230 real time: 2242 gc time: 1781
The issue is that every time you call random without the second argument, Racket has to look up the generator parameter. My understanding is that in the first version, Racket has to traverse multiple frames to find the parameter, while in the second version the use of parameterize copies the parameter to a frame that can be accessed faster. But if you access the generator directly, as in the third version, then there's no need to look anything up (inside the loop).
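A quick way to see that per-call cost (a sketch; absolute numbers will vary by machine) is to time a tight loop of random with and without the explicit generator argument:

```racket
#lang racket

(define gen (current-pseudo-random-generator))

;; Without the generator argument, random has to fetch
;; current-pseudo-random-generator on every call.
(define (spin-implicit n)
  (for ([i (in-range n)]) (random 1000)))

;; With an explicit generator, there is no parameter lookup in the loop.
(define (spin-explicit n)
  (for ([i (in-range n)]) (random 1000 gen)))

(time (spin-implicit 1000000))
(time (spin-explicit 1000000))
```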
Based on a quick look at the implementation of parameters in racket/src/cs/rumble/parameter.ss, I don't think the speed difference is due to the nearest parameterization containing an entry for the current-pseudo-random-generator parameter. The parameterization appears to start out as an empty hasheq and only contains the parameters added by parameterize.
My best guess is that the difference is due to the continuation-mark lookup step that fetches the parameterization.
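To make that mechanism concrete (this is a surface-level sketch, not the internal implementation): parameterize installs a fresh parameterization, and fetching the current one is the step that goes through continuation marks.

```racket
#lang racket

(define p (make-parameter 'outer))
(define outer-pz (current-parameterization))

(define result
  (parameterize ([p 'inner])
    ;; current-parameterization fetches the installed parameterization
    ;; (via a continuation mark); parameterize installed a new one.
    (list (eq? outer-pz (current-parameterization))             ; #f: fresh parameterization
          (call-with-parameterization outer-pz (lambda () (p))) ; 'outer
          (p))))                                                ; 'inner

result
```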
Update: Here's a version that lets you insert extra frames to push the parameterize continuation-mark frame as far away as you like:
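A minimal sketch of such a frame-inserting harness (the deepen helper here is illustrative, not the exact code from the post):

```racket
#lang racket

(define p (make-parameter #f))

;; deepen: build n extra non-tail continuation frames between the
;; parameterize mark and the call to thunk, so the mark lookup has to
;; scan past them.
(define (deepen n thunk)
  (if (zero? n)
      (thunk)
      ;; wrapping the recursive call keeps this frame non-tail
      (vector-ref (vector (deepen (sub1 n) thunk)) 0)))

(define r
  (parameterize ([p 'marked])
    (deepen 1000 (lambda () (p)))))
r
```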
which is practically the same. (Also, holy GC workload!) Thanks for the heads-up about time.
The memoization in the second example was intentional; otherwise, calling through the parameter procedure seems to eat quite a few cycles. This is probably why @sorawee's third example is so much faster, assuming random does the parameter dereference on each call. Still not clear on why @sorawee's second example is faster, though @sorawee's and @ryanc's hypotheses are very much appreciated.