Why is scheduling this future from a thread slower than scheduling it from the main thread?

When you create the future and immediately touch it within the same thread, the future scheduler is unlikely to pick up the future before touch is called. So the thread that calls touch simply runs the future's computation itself (i.e., in the green/coroutine thread).

When you create a future and then create a thread to touch it, there's enough delay between creating the future and the new thread reaching touch that the future scheduler has likely already picked up the future to run in parallel. So the thread waits for the result instead of running the future's computation itself.
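To make the two patterns concrete, here is a minimal sketch. It assumes Racket's `racket/future` library (`future`, `touch`). The `work` function and the input size are hypothetical placeholders, not taken from the original question. Timings depend on the scheduler and the machine, so this shows the shape of the comparison, not a guaranteed result.

```racket
#lang racket/base
(require racket/future)

;; Hypothetical CPU-bound workload, for illustration only.
;; The sum stays within fixnum range for the sizes used here.
(define (work n)
  (for/sum ([i (in-range n)]) i))

;; Pattern 1: create the future and touch it immediately in the same thread.
;; The touching thread is likely to run the computation itself.
(define (touch-directly n)
  (define f (future (lambda () (work n))))
  (touch f))

;; Pattern 2: create the future, then touch it from a newly created thread.
;; By the time that thread reaches touch, the future scheduler has likely
;; started the future in parallel, so the thread waits for its result.
(define (touch-from-thread n)
  (define f (future (lambda () (work n))))
  (define result #f)
  (thread-wait (thread (lambda () (set! result (touch f)))))
  result)

(time (touch-directly 10000000))
(time (touch-from-thread 10000000))
```

Both functions return the same value; only where the computation runs (and therefore the timing) may differ between the two patterns.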

It certainly seems like there's room for improvement in the scheduler heuristics here!

Meanwhile, I can't help thinking that you just want parallel threads. If you'd like to try out that route, see Help test via snapshots: parallel threads.