Transitive thread suspension for pausing computations made up of nested threads

I'm interested in running a pause-able computation as a separate thread that may itself spawn its own threads. I'd like to be able to pause the parent thread, and have it transitively pause the child threads it has spawned (as well as any of their descendants). What approaches do people use to achieve this?

I'm aware that I can set up transitive resumption by calling (thread-resume child parent) after spawning a child. This is close to the kind of relationship I'd like to set up, but for suspension as well.
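For anyone following along, here is a minimal sketch of that benefactor relationship. The names spin, parent and child are just placeholders for illustration. It shows the asymmetry I'm describing: suspending the parent leaves the child running, while resuming the parent also resumes the child.

```racket
#lang racket/base

;; spin is a hypothetical stand-in for real work: it just sleeps in a loop.
(define (spin)
  (let loop ()
    (sleep 0.01)
    (loop)))

(define parent (thread spin))
(define child (thread spin))

;; Register parent as a benefactor of child: whenever parent is later
;; resumed from a suspended state, child is resumed as well.
(thread-resume child parent)

;; Suspension is NOT transitive: only parent stops here.
(thread-suspend parent)
(define child-running-while-parent-suspended? (thread-running? child))

;; Resumption IS transitive: resuming parent also resumes child.
(thread-suspend child)
(thread-resume parent)
(define child-running-after-parent-resumed? (thread-running? child))

(kill-thread parent)
(kill-thread child)
```

With this setup I would expect both flags to come out true, which is exactly the gap: there is no built-in counterpart that makes the first one false.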

I am also aware that I could spawn all the child threads using thread/suspend-to-kill with a particular custodian, then shut down that custodian to suspend everything, and call (thread-resume parent new-custodian) to resume everything. However, using thread/suspend-to-kill will prevent me from being able to permanently stop a thread, interfering with some uses of thread-wait.
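A small sketch of that custodian-based approach, using a single thread to keep it short. The names spin, cust and new-cust are placeholders. It also shows the drawback: kill-thread on such a thread only suspends it, so the thread never counts as dead and a thread-wait on it would not return.

```racket
#lang racket/base

;; spin is a hypothetical stand-in for real work.
(define (spin)
  (let loop ()
    (sleep 0.01)
    (loop)))

(define cust (make-custodian))
(define t
  (parameterize ([current-custodian cust])
    (thread/suspend-to-kill spin)))

;; Shutting down the custodian suspends t instead of terminating it.
(custodian-shutdown-all cust)
(define running-after-shutdown? (thread-running? t))
(define dead-after-shutdown? (thread-dead? t))

;; Handing t a fresh custodian lets it run again.
(define new-cust (make-custodian))
(thread-resume t new-cust)
(define running-after-resume? (thread-running? t))

;; The drawback: "killing" t only suspends it, so it is never dead.
(kill-thread t)
(define dead-after-kill? (thread-dead? t))
```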

Is there already a convenient way to do this that I'm overlooking?

Otherwise, what I had in mind was to set up a power grid-like directed graph. Threads are running if and only if they are receiving power.

Power would transmit from an input node to an output node whenever the input node is both receiving power itself and is not currently switched off.

The nodes would be threads, junctions (that can be switched off) used for composing groups, and root power sources (that can also be switched off). Newly-created threads and junctions can be attached to one or more input nodes. To suspend a network of threads, we switch off all of its root power sources. For the pause-able computation situation I described, I'd have one root power source I could turn off at any time.

One nice thing about this approach is that it can remember any internal child thread suspensions that were made, so that when power is turned off and then restored again, those suspended threads remain suspended. That is, to perform an internal suspension, we switch off one or more internal junctions, whose off state is remembered regardless of how power flows. In contrast, normal transitive thread-resume will undo those internal suspensions, causing everything to resume, which may not be desired.
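To make the idea concrete, here is a rough model of just the power-flow rules. Nothing here touches real threads. The struct and function names (node, receiving?, transmitting?, and so on) are invented for this sketch, and it assumes the graph is acyclic. A real version would presumably call thread-suspend and thread-resume whenever a thread node's receiving? status changes.

```racket
#lang racket/base

;; kind is 'root, 'junction or 'thread; inputs is a list of upstream nodes;
;; on? is the node's own switch, remembered regardless of power flow.
(struct node (kind inputs [on? #:mutable]))

(define (make-root) (node 'root '() #t))
(define (make-junction . inputs) (node 'junction inputs #t))
(define (make-thread-node . inputs) (node 'thread inputs #t))

;; A root always has power available; any other node receives power
;; when at least one of its inputs is transmitting.
(define (receiving? n)
  (or (eq? (node-kind n) 'root)
      (for/or ([in (in-list (node-inputs n))])
        (transmitting? in))))

;; A node transmits only if it is receiving power and is switched on.
(define (transmitting? n)
  (and (node-on? n) (receiving? n)))

;; Example network: root -> j -> a, and root -> b.
(define root (make-root))
(define j (make-junction root))
(define a (make-thread-node j))
(define b (make-thread-node root))

;; Snapshot of which thread nodes would currently be running.
(define (status) (list (receiving? a) (receiving? b)))

(define initially (status))
(set-node-on?! j #f)      ; internal suspension of a
(define j-off (status))
(set-node-on?! root #f)   ; pause the whole computation
(define root-off (status))
(set-node-on?! root #t)   ; restore power: a stays suspended
(define root-back-on (status))
```

The last snapshot is the point of the design: after power is restored, b runs again while a stays off because junction j remembers its own switch state.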

Anyway, I'm interested in hearing about related thread organizational patterns and ideas. Is my desire to have pause-able computations dubious? Should I be thinking about this problem in a different way? Please let me know.

My own experience with something like this is limited to one similar case. I built the “stop doing stuff” and “resume doing stuff” threads via an explicit pattern implemented with channels. I somehow sensed it gave me the right kind of explicit control. (The pattern just listens for events on channels every so often and otherwise performs work as needed. When the ‘stop’ event comes in, it switches to just listening on this channel.)
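A minimal sketch of what such a cooperative pattern might look like; this is a reconstruction for illustration, not the original code. The helper make-worker, the 'stop and 'resume commands, and the ticks counter are all assumed names.

```racket
#lang racket/base

;; make-worker is a hypothetical helper. control is a channel carrying
;; 'stop or 'resume; do-work performs one small unit of work.
(define (make-worker control do-work)
  (thread
   (lambda ()
     (let running ()
       (case (sync/timeout 0 control)   ; poll for a command without blocking
         [(stop)
          (let paused ()
            (case (sync control)        ; block until the next command
              [(resume) (running)]
              [else (paused)]))]
         [else (do-work) (running)])))))

(define control (make-channel))
(define ticks 0)
(define worker
  (make-worker control
               (lambda ()
                 (set! ticks (add1 ticks))
                 (sleep 0.001))))

(sleep 0.05)
(channel-put control 'stop)    ; returns once the worker has taken the command
(define ticks-at-stop ticks)
(sleep 0.05)
(define ticks-while-stopped ticks)
(channel-put control 'resume)
(sleep 0.05)
(define ticks-after-resume ticks)
(kill-thread worker)
```

The trade-off is that the worker has to cooperate: it only notices a stop request between units of work.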

One use case I'm interested in is to be able to run semi-untrusted code that I can observe, suspend, and resume without requiring cooperation. Particularly, I'd like to be able to run multiple untrusted parents that may communicate with one another, share kill-safe abstractions, and have it be possible to suspend one parent without harming any kill-safe abstractions it has created that are still in use by the other parents.

It seems possible to set up a supervising/monitoring thread for each of the untrusted threads, and have the monitors cooperate in the way you're suggesting, by sending the monitors commands to issue to the untrusted threads. However, it seems difficult to give a monitor enough visibility into whether its untrusted thread has multiple benefactors (as in (thread-resume thd benefactor)), so that it only acts on a suspend command when all the benefactors are also suspended.

One option would be to replace all of the important thread operators and constructors (and possibly custodians as well) with a set that are monitor-friendly, and expose only these to the untrusted code. It seems like many of these facilities would need to be replaced.

- If untrusted code is a zo file (dusty deck, as we used to say), this idea won’t work (unless you patch at a very low level).
- If untrusted code is source and/or written in a prescribed language in the Racket world, “linking” them to new implementations of thread-* should work.

But how would this arrangement deal with process or system, that is, separate OS processes spawned from the untrusted code? Spawn a process, set up a pipe, get the work done there. This scenario isn’t covered.

That's a good point: this won't work for arbitrary untrusted code.

The scenario I'm interested in is one where either I know the untrusted code lives entirely in Racket, or I have intentionally handed the code a capability for limited interaction with external processes. More specifically, I provide every capability that the code can access.

I will probably go with the approach of linking with replacements, as in your second bullet point.
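As a rough illustration of what a replacement for thread could look like, here is a sketch in which every spawned thread is recorded in a group, so the whole family can be suspended and resumed together. All names here (group, current-group, group-thread, group-suspend!, group-resume!) are invented for the sketch. It ignores kill-safety, never prunes dead threads, and assumes group-suspend! is called from outside the group.

```racket
#lang racket/base

;; A group records every thread spawned through group-thread.
(struct group (lock [threads #:mutable]))
(define (make-group) (group (make-semaphore 1) '()))

;; New threads inherit the creator's parameterization, so descendants
;; spawned with group-thread land in the same group automatically.
(define current-group (make-parameter #f))

;; Replacement for thread: create and register under the group's lock.
(define (group-thread thunk)
  (define g (current-group))
  (call-with-semaphore
   (group-lock g)
   (lambda ()
     (define t (thread thunk))
     (set-group-threads! g (cons t (group-threads g)))
     t)))

(define (group-suspend! g)
  (call-with-semaphore
   (group-lock g)
   (lambda () (for-each thread-suspend (group-threads g)))))

(define (group-resume! g)
  (call-with-semaphore
   (group-lock g)
   (lambda () (for-each thread-resume (group-threads g)))))

;; Demo: a parent that spawns one child, both in group g.
(define (spin) (let loop () (sleep 0.01) (loop)))
(define g (make-group))
(define child-ch (make-channel))
(define parent
  (parameterize ([current-group g])
    (group-thread
     (lambda ()
       (channel-put child-ch (group-thread spin))
       (spin)))))
(define child (channel-get child-ch))

(group-suspend! g)
(define running-while-suspended
  (list (thread-running? parent) (thread-running? child)))
(group-resume! g)
(define running-after-resume
  (list (thread-running? parent) (thread-running? child)))
```

Holding the lock during suspension is meant to keep a member from registering a new child halfway through; a member that is blocked on the lock simply gets suspended there and registers its child after the group is resumed.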

Gregr wrote “More specifically, I provide every capability that the code can access.”

If this is about capability programming, take a look at Shill:

[osdi14-mdkc (PDF)](https://users.cs.northwestern.edu/~chrdimo/pubs/osdi14-mdkc.pdf)

Yes, the code doesn’t look parenthetical but don’t let this distract you. It’s all implemented in Racket.

Shill looks like an excellent resource. Thank you for the recommendation!

We can use engines to get an approximation of what I would like. By using engines, we are able to turn groups of concurrent computations on or off independently of a thread's resumed/suspended status.
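For readers who have not used racket/engine, a tiny sketch of the primitive this builds on. The counter and the 50 ms budgets are arbitrary choices for illustration. An engine runs a computation until a timeout or event fires, then pauses it, and a later engine-run picks up where it left off.

```racket
#lang racket/base
(require racket/engine)

(define counter 0)

;; The engine's procedure receives one argument (for temporarily
;; disabling suspension), which this sketch ignores.
(define eng
  (engine
   (lambda (_)
     (let loop ()
       (set! counter (add1 counter))
       (sleep 0.001)
       (loop)))))

;; Run for up to 50 ms; the result is #f because the loop never finishes.
(define finished? (engine-run 50 eng))

;; Between runs the computation should make no progress.
(define counter-after-first-run counter)
(sleep 0.05)
(define counter-while-paused counter)

;; A second run continues the same computation.
(engine-run 50 eng)
(define counter-after-second-run counter)

(engine-kill eng)
```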

Here is an interactive example of the idea. Note the tricky details around the call to engine-run.

(Edited for kill-safety.)

#lang racket/base
(require racket/engine racket/match)

(struct thread-manager (t.handler ch.command ch.off ch.on))

(define (make-thread-manager)
  (let ((ch.command (make-channel)) (ch.off (make-channel)) (ch.on (make-channel)))
    (define (command cmd) (if cmd (on) (off)))
    (define (on)
      (displayln "manager is on")
      (sync (handle-evt (channel-put-evt ch.on (void))
                        (lambda (_) (on)))
            (handle-evt ch.command command)))
    (define (off)
      (displayln "manager is off")
      (sync (handle-evt (channel-put-evt ch.off (void))
                        (lambda (_) (off)))
            (handle-evt ch.command command)))
    (thread-manager (thread/suspend-to-kill on) ch.command ch.off ch.on)))

(define (thread-manager-off tm)
  (thread-resume (thread-manager-t.handler tm) (current-thread))
  (channel-put (thread-manager-ch.command tm) #f))

(define (thread-manager-on tm)
  (thread-resume (thread-manager-t.handler tm) (current-thread))
  (channel-put (thread-manager-ch.command tm) #t))

(define (make-thread tm proc)
  (match-define (thread-manager t.handler _ ch.off ch.on) tm)
  (let ((eng (engine (lambda (_) (proc)))))
    (thread
     (lambda ()
       (let ((self (current-thread)))
         (thread-resume t.handler self)
         (let loop ()
           (sync ch.on)
           ;; If we do not wrap the engine-run call with call-in-nested-thread
           ;; (or some other mediating thread), and the outer thread is
           ;; suspended, the call to engine-run will itself be affected by the
           ;; suspend, and will no longer be watching for the stop event.  This
           ;; means the engine will keep running when it should be suspended.
           (unless (call-in-nested-thread
                    (lambda ()
                      (engine-run (choice-evt (thread-suspend-evt self) ch.off)
                                  eng)))
             (loop))))))))

(define ((work name latency))
  (let loop ()
    (displayln name)
    (sleep latency)
    (loop)))

(define tm (make-thread-manager))
(define t1 (make-thread tm (work 't1 1)))
(define t2 (make-thread tm (work 't2 2)))
(define t3 (make-thread tm (work 't3 3)))

(void
 (thread
  (lambda ()
    (let loop ()
      (displayln "will suspend t3 soon")
      (sleep 10)
      (thread-suspend t3)
      (displayln "suspended t3")
      (displayln "will resume t3 soon")
      (sleep 10)
      (displayln "resuming t3")
      (thread-resume t3)
      (loop)))))

(displayln "listening for on/off commands")
(let loop ()
  (match (read)
    ((? eof-object?) (void))
    ('off
     (displayln "turning manager off")
     (thread-manager-off tm)
     (loop))
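    ('on
     (displayln "turning manager on")
     (thread-manager-on tm)
     (loop))
    (other
     ;; Ignore unrecognized input instead of raising a match error.
     (printf "unrecognized command: ~s\n" other)
     (loop))))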