Using futures (via the wonderfully simple for/async), I'm seeing a long sequence of numbers like these:
cpu: 256599 real: 4585 gc: 3065
I'm quite happy with the speedup I already get, but it appears that the GC is taking a lot of time (I suppose the 3065 ms of GC are 'real' ms), so I'm hoping I could shave some more seconds off it.
But I don't yet see where in my code I could reduce the GC time.
The code isn't opensource yet so I can't share it unfortunately, but it uses the following operations:
- untyped flonum and fixnum operations (and not using math/flonum), including
- a few struct get/set, but that's not in the inner loops
- a number of defines for intermediate flonum values
- no closure, no named let loops, no lambda
- no generic arithmetic operations
- Racket CS 220.127.116.11
I've also read the docs on memory management. There is also some valuable information about compiler hints for fixnums and flonums.
Is there any additional general advice on how to reduce the GC time?
lets make any difference?
Could Typed Racket help with this?
In the following, is there a way to give a compiler hint about the fact that idx is a fixnum, and that grad is a flonum, to avoid boxing?
(for ([idx (in-list a-list-of-fixnums)]
      [grad (in-list a-list-of-flonums)])
If not, would turning the lists into flvectors and using in-flvector do the trick?
Fixnums don't allocate, so there's no need to worry about them taking space in the heap. For flonums, you'll really need to use more specialized data structures to avoid boxing.
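For what it's worth, here is a minimal sketch of what "more specialized data structures" could look like for the loop in the question. The data and the weighted-sum function are made up for illustration; only the racket/fixnum and racket/flonum operations are real. Whether the accumulator actually stays unboxed depends on the compiler and the Racket version, so treat this as something to measure rather than a guarantee.

```racket
#lang racket/base
(require racket/fixnum racket/flonum)

;; Hypothetical data standing in for the lists in the question.
(define a-list-of-fixnums '(0 1 2 3))
(define a-list-of-flonums '(0.5 1.5 2.5 3.5))

;; Convert once, outside the hot loop.
(define idxs
  (for/fxvector #:length (length a-list-of-fixnums)
                ([i (in-list a-list-of-fixnums)])
    i))
(define grads
  (for/flvector #:length (length a-list-of-flonums)
                ([g (in-list a-list-of-flonums)])
    g))

;; Hypothetical inner loop: flonums are read straight out of the
;; flvector and combined with flonum-specific operations only.
(define (weighted-sum idxs grads)
  (for/fold ([acc 0.0])
            ([idx (in-fxvector idxs)]
             [grad (in-flvector grads)])
    (fl+ acc (fl* (fx->fl idx) grad))))

(weighted-sum idxs grads)
```

The conversion itself allocates, so this only pays off if the vectors are built once and then iterated many times.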
So is there any GC-difference between (for ([idx (in-list a-fixnum-list)]) ...) and (for ([idx (in-fxvector a-fxvector)]) ...), for example with respect to cons cells?
The list will feature N separate allocations, and N-1 pointers, so GC traversal will take somewhat longer. It will take about 2N space. The fxvector will take about N space, and have no interior pointers, so it will take only constant time to traverse in the GC. A plain vector with fixnums in it will also take about N space, but the whole vector will be traversed by the GC.
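If you want to check this on your own machine, a rough sketch along these lines might help. The size n and the helper names (sum-list, sum-fxvector, major-gc-ms) are invented for the example; current-gc-milliseconds and collect-garbage are standard. The numbers will vary a lot with the machine and with whatever else is live in the heap, so this is only a way to compare the two layouts, not a benchmark.

```racket
#lang racket/base
(require racket/fixnum)

;; Hypothetical size; large enough for the difference to be visible.
(define n 1000000)

;; Same fixnums, two layouts: n cons cells vs. one pointer-free block.
(define as-list (for/list ([i (in-range n)]) i))
(define as-fxvector
  (for/fxvector #:length n ([i (in-range n)]) i))

(define (sum-list l)
  (for/fold ([s 0]) ([x (in-list l)]) (fx+ s x)))

(define (sum-fxvector v)
  (for/fold ([s 0]) ([x (in-fxvector v)]) (fx+ s x)))

;; Hypothetical helper: milliseconds spent in one forced major collection
;; with the current heap contents.
(define (major-gc-ms)
  (define before (current-gc-milliseconds))
  (collect-garbage)
  (- (current-gc-milliseconds) before))

(printf "major GC with both structures live: ~a ms\n" (major-gc-ms))
```

To compare the two, keep only one of the structures live in a fresh run and call major-gc-ms again.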
Turning a few lists into *vectors helped, along with using unsafe-fl operations, but not as much as I hoped:
cpu: 207335 real: 3864 gc: 2600
I'll take it anyway.
If anyone has more advice, I'm happy to try a few things.