It looks like the non-shared versions involve the creation of a large byte string via `make-bytes`. The large-string allocation triggers a check of whether the current thread has any custodian limits, and that check involves `current-thread`, which blocks futures but not parallel threads.
I found that explanation by using the future visualizer. By default, the future visualizer only shows that `current-thread` was called, but not why. I set the new `PLT_FUTURE_TRACE_DEPTH` environment variable (which I forgot to document, but will!) to 10 to get more context information, and that showed `make-bytes` as the culprit.
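For anyone who wants to reproduce this, here is a minimal sketch. It assumes the problematic pattern is simply several futures that each allocate a large byte string. The 16 MiB size, the name `big-size`, and the helper `alloc-and-read` are hypothetical illustrations. The post does not say what size counts as "large" or exactly what the non-shared versions look like.

```racket
#lang racket/base
;; Hypothetical reproduction sketch: several futures each allocate a
;; large byte string with make-bytes. Per the diagnosis above, such an
;; allocation checks custodian limits via current-thread, which can
;; block a future.
(require racket/future) ; provides future and touch

;; Assumed "large" size (16 MiB); the real threshold is not stated.
(define big-size (* 16 1024 1024))

;; Hypothetical helper: allocate a byte string filled with i and read
;; back its last byte, so the allocation can't be optimized away.
(define (alloc-and-read i)
  (define bs (make-bytes big-size i))
  (bytes-ref bs (sub1 big-size)))

;; Start one future per index, then touch each to wait for its result.
(define results
  (let ([fs (for/list ([i (in-range 4)])
              (future (lambda () (alloc-and-read i))))])
    (map touch fs)))
```

To see why the futures block, you could wrap the `results` expression in `visualize-futures` from the `future-visualizer` library and run the program with `PLT_FUTURE_TRACE_DEPTH=10` set in the environment. If the explanation above applies, the trace should attribute the `current-thread` block to `make-bytes`.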
An emerging theme here and in "Why is scheduling this future from a thread slower than scheduling it from the main thread? - #3 by jrkalyan" is that futures have various issues that could be resolved with more work, but parallel threads may avoid some of those issues in the first place.