Futures and garbage collection

Using futures (via the wonderfully simple for/async), I'm seeing a long sequence of numbers like these:

cpu: 256599 real: 4585 gc: 3065

I'm quite happy with the speedup I already get, but it appears that the GC is taking a lot of time (I suppose the 3065 ms of GC are 'real' ms), so I'm hoping I could shave some more seconds off it.
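For reference, the three figures printed by time can also be captured programmatically with time-apply. This is only a minimal sketch: make-flonum-list is a hypothetical allocating workload, not the code discussed here. As far as I recall, the Racket Reference describes the gc figure as CPU time that is included in the cpu figure, so it is worth checking the time-apply docs before treating it as 'real' ms.

```racket
#lang racket/base

;; Hypothetical allocating workload: builds a list of n flonums.
(define (make-flonum-list n)
  (for/list ([i (in-range n)])
    (exact->inexact i)))

;; time-apply returns the results as a list, followed by the
;; cpu, real and gc times in milliseconds.
(define-values (results cpu-ms real-ms gc-ms)
  (time-apply make-flonum-list (list 100000)))

;; Same layout as the line printed by (time ...).
(printf "cpu: ~a real: ~a gc: ~a\n" cpu-ms real-ms gc-ms)
```

Having the numbers as values makes it easier to compare the GC share before and after a change.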

But I don't yet see where in my code I could reduce the GC time.

The code isn't open source yet, so unfortunately I can't share it, but it uses the following operations:

  • untyped flonum and fixnum operations (and not using math/flonum), including flvector-set!, in-flvector and fxquotient
  • for/fold, for/async, define, unless, when, in-range, in-list
  • a few struct get/set, but that's not in the inner loops
  • a number of defines for intermediate flonum values
  • no closure, no named let loops, no lambda
  • no generic arithmetic operations
  • Racket CS

I've also read the docs on memory management. There is also some valuable information about compiler hints for fixnums and flonums.

Is there any additional general advice on how to reduce the GC time?
Would turning defines into lets make any difference?
Could Typed Racket help with this?

In the following, is there a way to give a compiler hint about the fact that idx is a fixnum, and that grad is a flonum, to avoid boxing?

(for ([idx (in-list a-list-of-fixnums)]
      [grad (in-list a-list-of-flonums)])

If not, would turning the lists into fxvectors and flvectors and using in-fxvector and in-flvector do the trick?

If you can put those in an flvector[1] and an fxvector[2], you can use in-flvector[3] and in-fxvector[4].

[1] https://docs.racket-lang.org/reference/flonums.html#(part._flvectors)
[2] https://docs.racket-lang.org/reference/fixnums.html#(part._fxvectors)
[3] https://docs.racket-lang.org/reference/flonums.html#(def._((lib._racket%2Fflonum..rkt)._in-flvector))
[4] https://docs.racket-lang.org/reference/fixnums.html#(def._((lib._racket%2Ffixnum..rkt)._in-fxvector))
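A minimal sketch of that conversion, assuming the lists can be converted once outside the hot loop. The input data and the weights flvector are hypothetical stand-ins, not the original code:

```racket
#lang racket/base
(require racket/flonum racket/fixnum)

;; Hypothetical inputs standing in for the two lists in the question.
(define a-list-of-fixnums (list 0 2 1))
(define a-list-of-flonums (list 0.5 1.5 2.5))

;; One-time conversion, done outside the inner loop.
(define idxs
  (for/fxvector #:length (length a-list-of-fixnums)
    ([i (in-list a-list-of-fixnums)])
    i))
(define grads
  (for/flvector #:length (length a-list-of-flonums)
    ([g (in-list a-list-of-flonums)])
    g))

;; Hypothetical accumulator updated by the loop.
(define weights (make-flvector 3 0.0))

;; Same shape as the loop in the question, iterating over the
;; specialized vectors instead of the lists.
(for ([idx (in-fxvector idxs)]
      [grad (in-flvector grads)])
  (flvector-set! weights idx (fl+ (flvector-ref weights idx) grad)))
```

The conversion itself allocates, so it presumably only pays off when the same data is traversed many times.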


Fixnums don't allocate, so there's no need to worry about them taking space in the heap. For flonums, you'll really need to use more specialized data structures to avoid boxing.


So is there any GC-difference between (for ([idx (in-list a-fixnum-list)]) ...) and (for ([idx (in-fxvector a-fxvector)]) ...), for example with respect to cons cells?

The list will feature N separate allocations, and N-1 pointers, so GC traversal will take somewhat longer. It will take about 2N space. The fxvector will take about N space, and have no interior pointers, so it will take only constant time to traverse in the GC. A plain vector with fixnums in it will also take about N space, but the whole vector will be traversed by the GC.
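To make the three representations concrete, here is a small sketch that builds the same N fixnums as a list, a plain vector and an fxvector, and sums each one. The names and the value of n are made up for illustration; the comments only restate the space and traversal estimates given above.

```racket
#lang racket/base
(require racket/fixnum)

(define n 1000)

;; N cons cells, each pointing to the next: about 2N space.
(define as-list (for/list ([i (in-range n)]) i))

;; One allocation of about N space; the GC still scans every slot.
(define as-vector (for/vector #:length n ([i (in-range n)]) i))

;; One allocation of about N space with no interior pointers.
(define as-fxvector (for/fxvector #:length n ([i (in-range n)]) i))

(define (sum-list l)
  (for/fold ([s 0]) ([x (in-list l)]) (fx+ s x)))
(define (sum-vector v)
  (for/fold ([s 0]) ([x (in-vector v)]) (fx+ s x)))
(define (sum-fxvector v)
  (for/fold ([s 0]) ([x (in-fxvector v)]) (fx+ s x)))
```

All three loops compute the same sum; only the layout seen by the collector differs.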


Turning a few lists into *vectors helped, along with using unsafe-fl operations, but not as much as I hoped:

cpu: 207335 real: 3864 gc: 2600

I'll take it anyway.
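For anyone trying the same thing, this is roughly what the unsafe-fl variant of an inner loop looks like. It is a sketch only: fl-dot is a hypothetical helper, and it assumes both arguments really are flvectors of the same length, since the unsafe operations skip those checks.

```racket
#lang racket/base
(require racket/flonum
         (only-in racket/unsafe/ops
                  unsafe-fl+ unsafe-fl*
                  unsafe-flvector-ref unsafe-flvector-length))

;; Hypothetical helper: dot product of two flvectors.
;; Unsafe: the caller must guarantee xs and ys are flvectors
;; and that ys is at least as long as xs.
(define (fl-dot xs ys)
  (for/fold ([acc 0.0])
            ([i (in-range (unsafe-flvector-length xs))])
    (unsafe-fl+ acc
                (unsafe-fl* (unsafe-flvector-ref xs i)
                            (unsafe-flvector-ref ys i)))))

(define xs (flvector 1.0 2.0 3.0))
(define ys (flvector 4.0 5.0 6.0))

(fl-dot xs ys) ; 1*4 + 2*5 + 3*6 = 32.0
```

Keeping the safe fl+ and flvector-ref versions around for testing makes it easier to catch a wrong index before switching to the unsafe ones.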

If anyone has more advice, I'm happy to try a few things.