Creating singleton tasks in a job manager

tl;dr: I'd like a way to ensure that a particular instance of a struct is eq? across threads so it can be used for concurrency management. What would the best way be?

Medium version:

If a struct goes into a parameter, then each thread can end up with its own value instead of a shared one, which is a problem if you want to use it to ensure singleton state. My first thought on how to resolve the issue would be this:

#lang racket

(require racket/splicing)

(struct job-manager () #:transparent)
(splicing-let ([thing (job-manager)])
  (define (get-foo) thing))

I believe this would ensure that all threads are using the same (i.e. eq?) job-manager instance. Is there a better way?
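For reference, here's a self-contained version of that snippet with a quick check that a freshly spawned thread really does see the same (eq?) instance:

```racket
#lang racket

(require racket/splicing)

(struct job-manager () #:transparent)
(splicing-let ([thing (job-manager)])
  (define (get-foo) thing))

;; A separate thread sees the very same instance:
(define ch (make-channel))
(thread (λ () (channel-put ch (get-foo))))
(eq? (get-foo) (sync ch))   ; => #t
```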

Long version:

The majordomo2 package is a task manager written by yours truly. I want to be able to offer a guarantee that, if requested, only one instance of a task can be running at a time; an example of why you might want this would be if you were using it to run a backup on your files -- you don't want to accidentally have two separate threads attempting to back up your files and potentially interfering with one another.

The way the module currently works is that you create a majordomo instance and then tell it to run tasks. For example:

(define jarvis (start-majordomo))
(define result-channel
  (add-task jarvis build-email-greetings "hi there" '(alice bob charlie)))
(define result (sync result-channel))

Each task results in two threads: a worker that does the task and a manager that keeps an eye on the worker, restarting it if requested and appropriate. The threads are 'out in the wild' with no cross-communication and no way to address them aside from waiting for a value to come back on the result channel. As a result, if you did this:

(add-task jarvis run-backup root-path)
(add-task jarvis run-backup root-path)

...then you would end up with two instances of your backup running simultaneously.

My first thought on how to prevent this was to have add-task accept a #:singleton id argument. This would cause the majordomo instance (jarvis, in the above example) to keep an internal hash where it could track the fact that this function is already running and refuse to start another task that uses the same id. There would need to be some concurrency management, such as using a semaphore to add tasks so we don't have a race condition.
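A rough sketch of what that bookkeeping might look like. The names here (run-singleton-task!, singleton-ids, and so on) are made up for illustration and are not majordomo2's actual API:

```racket
#lang racket

;; Hypothetical sketch of the #:singleton idea: an internal hash tracks
;; which ids are running, and a semaphore serializes registration so two
;; threads can't claim the same id at once.

(define singleton-guard (make-semaphore 1))   ; serializes registration
(define singleton-ids  (make-hash))           ; id -> #t while running

;; Runs thunk in a new thread if id is free, returning the thread;
;; returns #f (refusing to start) if a task with that id is running.
(define (run-singleton-task! id thunk)
  (define claimed?
    (call-with-semaphore singleton-guard
      (λ ()
        (cond [(hash-ref singleton-ids id #f) #f]
              [else (hash-set! singleton-ids id #t) #t]))))
  (and claimed?
       (thread
        (λ ()
          (dynamic-wind
            void
            thunk
            (λ () ; release the id even if the task raises
              (call-with-semaphore singleton-guard
                (λ () (hash-remove! singleton-ids id)))))))))
```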

In order for this to work we would need to be sure that everyone who wanted to run a singleton task was running them within the same majordomo instance, so there would need to be a singleton for that. I started to put it in a parameter and then realized that wasn't viable. Am I overthinking this? Is the little hack that I did at top the best solution?

Quick question: when you say "threads", you mean green threads as created by Racket's thread function (from racket/base), and not OS-level threads, as created by places, right?

If I'm right about this, then I believe there should be no problem with simply sharing state between threads.

Here's an example of what I'm thinking of:

#lang racket

(define sema (make-semaphore 0))

(thread
 (λ ()
   (sleep 5)
   (semaphore-post sema)))

(thread
 (λ ()
   (semaphore-wait sema)
   (printf "got the lock!\n")))

I like @jbclements' answer: just do the synchronization with lexical scoping in the implementation of the task that needs it.

Overall I don't like singletons; I think most of the time they are trying to automate something that doesn't need to be automated. If somebody creates two job managers, runs similar jobs with them, and those jobs interfere, I see that as user error: simply don't use two instances of job managers to start backups.

I am more likely to use a library that has simple, predictable behavior than one that tries to predict every way it will ever be used and specially adapt to that. I much prefer a dumb library that I can use intelligently, because I can easily understand what it does, over an overly "intelligent" library that assumes I am "dumb" and makes me feel that way, because it changes so much internally that I can't predict how it reacts when I "turn some knobs".

The thing with things that are too "managed" is: where does it end?
If you build in some kind of global task unification, sure, that works... until somebody starts two Racket processes (or requires your module in a new, empty namespace, creating a separate module instance) and thus creates two unsynchronized instances again.
Basically, what I am saying is: unless you are implementing some kind of secure programming language that does clever things behind the scenes to give you additional invariants, I would go the route of simply telling the user how to use the library.

I also think something to "unify" things may make sense as a separate library. Maybe your library could have a parameter that normally does nothing, but through which you can plug in that "unification" functionality, implemented by another library.

The good thing about doing it that way is that people who don't need it can use the library without it, and those who do can customize it to what they actually need. For example, one person may use that parameter to "unify" tasks with a semaphore; somebody else may use a sort-of-lock implemented via state in their database (maybe they have an application spanning multiple machines that share the same database); another might use a lock file in a shared filesystem.

Your library shouldn't have to anticipate every need. I know it isn't necessarily easy to create flexible APIs that are usable like this, but it is always a joy as a user when it is possible to combine different libraries in a simple way. Maybe parameters aren't the right way to do that; it depends on your library.
Another thing to consider is that maybe you don't need this at all, if instead you make it easy to wrap your API, adding some kind of pre-check that does the "unification" in the wrapper.
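To make the "plug in the unification" idea concrete, here's one possible shape for it: a guard parameter whose default does nothing. All names here (current-task-guard, run-job) are hypothetical:

```racket
#lang racket

;; The parameter holds a hook: a procedure taking an id and a thunk that
;; decides how (or whether) to run it.  The default just runs the thunk.
(define current-task-guard
  (make-parameter (λ (id thunk) (thunk))))

(define (run-job id thunk)
  ((current-task-guard) id thunk))

;; A user who wants semaphore-based locking can plug one in:
(define lock (make-semaphore 1))
(parameterize ([current-task-guard
                (λ (id thunk) (call-with-semaphore lock thunk))])
  (run-job 'backup (λ () (displayln "backing up..."))))
```

Other users could install a guard backed by a database lock or a lock file instead, without the core library knowing anything about those mechanisms.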

That was more text than I expected to write. Please don't feel discouraged from implementing your library in whatever way you want; this is my preferred approach, and you may have different needs and preferences.
I hope it is helpful food for thought.


Yes, I meant threads in the Racket sense. Your example seems solid -- I can simply provide the sema if I want other files / modules to share it.
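Sharing it could look something like this, with the semaphore living in its own (sub)module so everyone who requires it gets the same instance. The module and function names are made up:

```racket
#lang racket

;; A submodule owns the semaphore; module-level bindings are
;; instantiated once, so every requiring module and every thread
;; sees the same instance.
(module shared-sync racket
  (provide backup-sema)
  (define backup-sema (make-semaphore 1)))

(require 'shared-sync)

;; Only one of these threads holds the semaphore at a time.
(define (guarded-work name)
  (call-with-semaphore backup-sema
    (λ () (printf "~a has the lock\n" name) (sleep 0.1))))

(define ts (for/list ([n '(a b c)]) (thread (λ () (guarded-work n)))))
(for-each thread-wait ts)
```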

I prefer a library that works in a simple and predictable way by default but gives me some well-documented keyword arguments to automate common behavior. There are only so many times I want to write (sort (sync (run-job make-emails)) string<?) when I could simply write (run-job make-emails #:sort string<? #:sync? #t). If for no other reason, the latter has much better end weight.

This is a perfect example of "Hey, save me from writing boilerplate and make it easier for me to not screw up by simply letting me pass '#:singleton? #t' and then you sort it out."