Does custodian-shutdown-all block until threads are suspended/killed?

I have some code that is using custodian-shutdown-all in order to cancel an operation running in a previously spawned thread. I had made the assumption that custodian-shutdown-all would not return until the thread(s) it is managing has been suspended/killed. Is that the case? I see some evidence that my assumption is not true.

1 Like

See https://docs.racket-lang.org/reference/subprocess.html#%28def.%28%28quote.~23~25kernel%29._current-subprocess-custodian-mode%29%29 and https://docs.racket-lang.org/reference/subprocess.html#%28def.%28%28quote.~23~25kernel%29._subprocess-group-enabled%29%29

For a sample use, see https://github.com/mfelleisen/SwDev/blob/master/Testing/harness.rkt

Yes, that should be the case.

@EmEf points to information about processes outside the Racket process, where the issue is more complicated. But the question should be straightforward for Racket threads.

An example would be welcome if things don't seem to be working right.

I tried to manually force synchronization with thread-wait, which still didn't resolve my problem, so perhaps there is something else happening here.

The application is a browser with a text% widget and editor-canvas% that is updated in a new thread for each request. The race condition occurs when I attempt to cancel a previous request and start a new one. I see two different error conditions. In the first I get this contract violation:

cancelling request: (#<thread:...ses/src/widgets.rkt:643:25> #<input-port:gopher.club>)
goto-gopher: suika.erkin.party, /links, 1, 498
gopher selector: /links
; sequence-contract-violation: negative: method insert cannot be called, except in states (unlocked), args "       "
; Context (plain; to see better errortrace context, re-run with C-u prefix):
;   /home/jonathan/git/molasses/src/widgets.rkt:64:0 insert-directory-line
;   /home/jonathan/git/molasses/src/widgets.rkt:87:0 goto-gopher
;   /home/jonathan/git/molasses/src/widgets.rkt:643:25

In the second case I don't get an error but the text% widget contains text from the cancelled request preceding the new request's contents. The new request should have cleared the previous contents before making its updates.

The full code is here, but the most relevant portion is this piece from my browser-text% widget:

    (define/public (cancel-request)
        (when (custodian? thread-custodian)
            (eprintf "cancelling request: ~a~n" (custodian-managed-list thread-custodian (current-custodian)))
            (custodian-shutdown-all thread-custodian)
            ;; without this the editor gets stuck in no refresh mode
            (when (in-edit-sequence?)
                (end-edit-sequence))
            (set! thread-custodian #f)
            ;; update status message
            (send (get-canvas) update-status "Ready")))
      
    (define/private (load-page req [initial-selection-pos #f] #:back? [back? #f])
      (define (make-history-updater old-url old-position)
        (if back?
            (lambda () (pop-history))
            (lambda ()
              ;; add previous page to history
              (when old-url
                (if old-position
                    ;; also save the position of the selection that we are following so we can return to it
                    (push-history (struct-copy browser-url old-url [selection-pos old-position]))
                    (push-history old-url))))))
      
      ;; this will shutdown the previous custodian on every page load.
      ;; seems wasteful not to re-use the custodian if we aren't actually interrupting
      ;; the previous thread's work.
      (cancel-request)
      (set! thread-custodian (make-custodian))

      (define update-history (make-history-updater current-url
                                                   (if selection
                                                       (get-snip-position selection)
                                                       #f)))
      (send (get-canvas) update-status "Loading...")
      
      (parameterize ([current-custodian thread-custodian])
        (cond
          [(equal? (request-protocol req) 'gopher)
           (update-history)
           (set! current-url (browser-url req initial-selection-pos))
           (send (get-canvas) update-address req)
           (thread (thunk
                    (goto-gopher req this initial-selection-pos)
                    (send (get-canvas) update-status "Ready")))]
          [(equal? (request-protocol req) 'gemini)
           (thread (thunk
                    (define terminal-request (goto-gemini req this))
                    (unless (void? terminal-request)
                      (update-history)
                      (set! current-url (browser-url terminal-request #f))
                      (send (get-canvas) update-address terminal-request)
                      (send (get-canvas) update-status "Ready"))))]
          [else
           ;; TODO display error to user?
           (eprintf "Invalid request protocol!~n")])))

In this case I see the problem occasionally when cancelling a gopher request. I'm not that familiar with custodians and threads in Racket, so perhaps I`m just doing something wrong. My intent is to terminate the previous request thread before starting a new one so that updates to the browser text and canvas widgets don't conflict. I think I have all my text% updates enclosed in edit sequences, which I think is something the GUI system requires.

Are text% updates actually executed in a separate GUI thread that isn't being cancelled by my custodian?

Modifying a displayed text% in a thread other than the eventspce's handler thread is tricky. From the way your code uses edit sequences, I bet you're familiar with the information in 5 Editors. Still, even if it turns out that the text% object handles concurrent updates following a particular pattern, I don't think the implementation accounts for terminating a thread; "kill safety" is a higher bar than "thread safety". Meanwhile, the eventspace's handler thread is the part that will refresh the screen, and that's separate from the thread your code uses to update a text%.

Overall, I'm not sure what's going wrong, but killing a thread that modifies text% is probably not supported. The problem much more likely to be in that area instead of custodian-shutdown-all somehow not terminating a thread.

1 Like

Thank you for that explanation @mflatt. It sounds like I will need to separate the network request portion from the UI update, so that the spawned thread only handles the network request. This shouldn't be too complicated, but if anyone has any other thoughts I'd love to hear them.

This is my first real GUI app in Racket so there's lots to learn!

1 Like

My solution for now is to set a flag in my request thread after the network request has completed but before text% updates are made. When cancelling a request, if the flag is set I call thread-wait to wait for the text% updates to complete rather than calling custodian-shutdown-all.

This appears to be working well so far. At some point I may want to revisit this, but for now this avoids the problem of killing a thread while it is updating an editor<%>.

1 Like