It is really difficult to say anything general, so if you can describe the data and the dependencies between the different units of work in more detail, it is more likely that somebody can give you hints on potential tweaks.
Do you have a worker per core or a worker per task?
The former is what I would expect; the latter could waste a lot of resources if you have, say, 20,000 tasks but only 8/16/64 cores (whatever it is).
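To make that concrete, here is a minimal sketch of the worker-per-core pattern with places: a fixed pool of `(processor-count)` workers that tasks are dealt out to, instead of one place per task. The squaring is just a stand-in for real work, and all the names are made up for illustration:

```racket
#lang racket
(require racket/place racket/future)

;; One worker place; squaring a number stands in for real work.
(define (start-worker)
  (place ch
    (let loop ()
      (define task (place-channel-get ch))
      (unless (eq? task 'stop)
        (place-channel-put ch (sqr task))
        (loop)))))

;; Run `tasks` (numbers here) on a fixed pool of (processor-count)
;; workers instead of spawning one place per task.
(define (run-all tasks)
  (define n (processor-count))
  (define workers (build-list n (λ (_) (start-worker))))
  ;; Deal tasks out round-robin, remembering who got what.
  (define buckets (make-vector n '()))
  (for ([t (in-list tasks)] [i (in-naturals)])
    (define b (modulo i n))
    (vector-set! buckets b (cons t (vector-ref buckets b))))
  ;; Send every worker its share up front...
  (for ([w (in-list workers)] [i (in-naturals)])
    (for ([t (in-list (vector-ref buckets i))])
      (place-channel-put w t)))
  ;; ...then collect the results (grouped per worker, not globally ordered).
  (define results
    (append*
     (for/list ([w (in-list workers)] [i (in-naturals)])
       (for/list ([_ (in-list (vector-ref buckets i))])
         (place-channel-get w)))))
  (for ([w (in-list workers)]) (place-channel-put w 'stop))
  results)

(module+ main
  (displayln (run-all (range 20))))
```

Because place channels are asynchronous, the parent can push all the tasks without blocking and the workers run in parallel; the pool size stays bounded no matter how many tasks there are.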
If you can tell us some numbers (e.g. core counts, the number of tasks, how much data is shared between tasks) and describe how the tasks relate to each other as a graph, that would make the whole discussion more useful.
If you have high core counts, you should take a look at loci as a replacement for places; it works better in that scenario.
I also made a topic about something related that might be of interest:
My biggest question, however, is: does every task need to access one single data structure?
If so, then that seems like a problem that isn't very parallelizable.
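For what it's worth, with places a single shared structure usually ends up as one "owner" place that everything funnels through, which is exactly where the serialization shows up. A made-up sketch:

```racket
#lang racket
(require racket/place)

;; One place owns the data structure; every access by every task
;; becomes a message round-trip through this single mailbox.
(define (start-owner)
  (place ch
    (define table (make-hash))            ; the single shared structure
    (let loop ()
      (match (place-channel-get ch)
        [(list 'put k v) (hash-set! table k v) (loop)]
        [(list 'get k)   (place-channel-put ch (hash-ref table k #f)) (loop)]
        ['stop           (void)]))))

(module+ main
  (define owner (start-owner))
  (place-channel-put owner (list 'put 'answer 42))
  (place-channel-put owner (list 'get 'answer))
  (displayln (place-channel-get owner))   ; requests queue up one by one
  (place-channel-put owner 'stop))
```

No matter how many workers you add, they all wait in line at that one mailbox, so the structure itself caps your throughput.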
In a better-case scenario, you want to be able to cluster tightly coupled tasks into groups, have each group finish its part, and then send its results on to groups further down the "pipeline".
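As a toy example of that shape, here is a two-stage pipeline: a group of independent "mapper" places each works on its own chunk, and a downstream "reducer" place combines the partial results, with the parent forwarding data between the stages. The chunk sizes and the map/reduce bodies are invented for illustration:

```racket
#lang racket
(require racket/place)

;; Group A: each place transforms its own chunk, no shared state.
(define (start-mapper)
  (place ch
    (define chunk (place-channel-get ch))
    (place-channel-put ch (map sqr chunk))))

;; Group B: one downstream place combines the partial results.
(define (start-reducer)
  (place ch
    (define parts (place-channel-get ch))
    (place-channel-put ch (apply + (append* parts)))))

(module+ main
  (define chunks (list (range 0 10) (range 10 20)))
  (define mappers (map (λ (c) (start-mapper)) chunks))
  (for ([m mappers] [c chunks]) (place-channel-put m c))
  ;; The parent shuttles each group's output down the pipeline.
  (define parts (for/list ([m mappers]) (place-channel-get m)))
  (define reducer (start-reducer))
  (place-channel-put reducer parts)
  (displayln (place-channel-get reducer)))
```

The key property is that the mappers never touch each other's data; only their finished results flow downstream.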
Also, this topic kind of screams for a shout-out to Amdahl's law (Wikipedia has a good article on it).
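Roughly: if a fraction $p$ of the work can be parallelized over $N$ cores, the best possible speedup is

$$S(N) = \frac{1}{(1 - p) + \frac{p}{N}}$$

so with, say, $p = 0.9$ and $N = 64$ you get $S \approx 8.8$, and no core count will ever push it past $1/(1 - p) = 10$. (Those numbers are just an illustration, not from your workload.) That is why the serial part, like a single shared data structure, dominates so quickly.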