Delay deserialization of Data in workers until actual usage. #3998
Conversation
In some cases deserialization is not thread safe, and it would be good to only deserialize when no other work is being done at the same time. This change tries to achieve that, in part, by delaying unpickling until the data actually needs to be sent to the Executor. In the case of a single-threaded TPE (ThreadPoolExecutor), that means non-thread-safe code can be unpickled safely. Maybe we even want to move unpickling into the Executor, but then we need the executor to support this, and we might want to make sure we don't unpickle the same data many times. This of course needs a gross hack of looking at frames, but at least I hope this will get some conversation started. This also breaks the computation of per-type bandwidth, unless we delay the bandwidth calculation per type. There are also some issues because clients also indirectly call get_data_from_worker, so for example await client.submit(lambda x: x + 1, 10) will return a Serialized object instead of the expected result...
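To make the delayed-unpickling idea concrete, here is a minimal, hypothetical sketch (not the actual code in this PR) of a wrapper that keeps the pickled bytes and only unpickles right before the task runs on the executor; LazySerialized and submit_with_lazy_args are made-up names, and real distributed messages go through distributed.protocol headers and frames rather than raw pickle.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


class LazySerialized:
    """Hold pickled bytes and only unpickle on first access (illustrative)."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self._obj = None
        self._unpickled = False

    def get(self):
        if not self._unpickled:
            self._obj = pickle.loads(self._payload)
            self._unpickled = True
        return self._obj


def submit_with_lazy_args(executor, fn, *args):
    """Unpickle arguments inside the executor, right before running fn.

    With a single-threaded ThreadPoolExecutor this means unpickling can
    never overlap with user task code, which is the property the PR is after.
    """
    def run():
        real = [a.get() if isinstance(a, LazySerialized) else a for a in args]
        return fn(*real)

    return executor.submit(run)


if __name__ == "__main__":
    data = LazySerialized(pickle.dumps([1, 2, 3]))
    with ThreadPoolExecutor(max_workers=1) as tpe:
        print(submit_with_lazy_args(tpe, sum, data).result())  # 6
```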
My other idea was to use a custom serializer/deserializer that would be lazy, the problems with that being that:
I'm also wondering about keeping the serialized data in the worker in case it's needed somewhere else, as then the worker wouldn't need to re-serialise it to be sent...
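As a rough sketch of that last point (all names made up), the worker-side holder could retain the original bytes even after deserializing, so forwarding the data to another worker reuses them instead of re-serialising:

```python
import pickle


class CachedSerialized:
    """Keep both the serialized bytes and, lazily, the live object."""

    def __init__(self, payload: bytes):
        self.payload = payload      # kept around so it can be forwarded as-is
        self._obj = None
        self._have_obj = False

    def deserialize(self):
        if not self._have_obj:
            self._obj = pickle.loads(self.payload)
            self._have_obj = True
        return self._obj

    def to_wire(self) -> bytes:
        # No pickle.dumps needed: reuse the bytes we originally received.
        return self.payload
```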
Yeah, this seems like a big enough hack that we would probably reject it without a lot of evidence that it was necessary in a variety of situations. In general I would recommend that any group facing things like this look into improving their serialization, or using locks to protect finicky resources. You could also look into using a synchronous executor if you wanted everything to be in one thread. The closest common situation I can recall that sounds like this one is dealing with TensorFlow graphs, which don't like being deserialized in one thread and then used in another. Any fix like this would have, I think, enough unexpected results that I'd be very hesitant to consider it.
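For the lock suggestion, a minimal user-side pattern, assuming the finicky object can tolerate being created and used under one shared lock, might look like the following; FINICKY_LOCK and the process method are illustrative, not anything Dask provides:

```python
import pickle
import threading

# One process-wide lock shared by whatever deserializes the thread-unsafe
# objects and by the tasks that use them.
FINICKY_LOCK = threading.Lock()


def guarded_deserialize(payload: bytes):
    """Deserialize a thread-unsafe object while no task is touching one."""
    with FINICKY_LOCK:
        return pickle.loads(payload)


def guarded_task(finicky_obj, x):
    """A task that only calls into the thread-unsafe object under the lock."""
    with FINICKY_LOCK:
        return finicky_obj.process(x)  # .process() is a placeholder method
```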
Do you mean the inspect-stack hack, or keeping the object serialized until necessary? I can likely find another way to pass the information that data should be lazily deserialized, but mostly I wanted to know whether it's worth looking into. Also yes, the group is looking into improving thread safety and (de)serialisation of objects, is already on a single-threaded TPE to minimize issues, and is looking into locks.
In general, lazy deserialization. Or if we do want to do this that's fine, but we need to survey other use cases first, see how important it is, figure out what a convention might be that makes sense, etc. This feels like tacking on a feature for a single user's use case fairly deep into the guts of Dask. My guess is that there is some more general solution somewhere that lets this group get what they want, and doesn't add an odd corner case into the core.
Alternatively, if this is the right solution, then I think that we need to build a case for it. The best case I can think of that is similar to this is TensorFlow/Keras, as mentioned above.
I thought we could already control serialization/deserialization in a separate thread with the config option distributed.comm.offload. Does that not work?
I don't believe it does. The core of the problem is that some types of objects don't like having work done on them at the same time as other instances are being deserialized. Deactivating offload already forces the deserialisation to be done in the main loop, but the ThreadPool executor is still running, and that's not good. My goal here is to reduce the chance of deserialisation happening while the TPE is doing work. Either the objects need to hold a lock while in use that blocks deserialisation, or I need dask/distributed to not deserialize while computing. Right now this is hard, as unpacking messages unpacks the data as well.
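For reference, this is roughly how offloading can be disabled in configuration, based on my understanding of the distributed.comm.offload setting (normally a byte threshold above which (de)serialization moves to the offload thread); check your distributed version's defaults:

```python
import dask

# Disable offloading of (de)serialization to the separate "offload" thread,
# so frames are deserialized in the worker's main event loop instead.
dask.config.set({"distributed.comm.offload": False})

# Combined with a single-threaded worker, e.g. one started with
#   dask-worker <scheduler-address> --nthreads 1
# this narrows, but does not remove, the window in which deserialization
# can overlap with running task code, which is the problem described above.
```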
There is a shared interest in delaying deserialization, I think. For example, we discussed delaying deserialization to reduce memory usage in rapidsai/dask-cuda#342 (comment). I think how that would look may differ a bit from how it is currently implemented here, but the overall objective seems agreeable.
Yes, in rapidsai/dask-cuda#342 (comment) the plan is to delay the deserialization until the data is accessed by the task. It even enables tasks to coordinate deserializations with other work explicitly. Currently I am on vacation but I will start working on this when I get back next week. |
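A hedged sketch of the proxy direction (illustrative only, not the dask-cuda implementation): an object that holds the serialized payload and only unpickles it on first attribute access.

```python
import pickle


class DeserializeOnAccess:
    """Proxy that unpickles its payload the first time it is actually used."""

    def __init__(self, payload: bytes):
        self._payload = payload
        self._target = None

    def _materialize(self):
        if self._target is None:
            self._target = pickle.loads(self._payload)
        return self._target

    def __getattr__(self, name):
        # Only called for attributes missing on the proxy itself, so normal
        # attribute access on the proxy is forwarded to the real object.
        return getattr(self._materialize(), name)


if __name__ == "__main__":
    proxy = DeserializeOnAccess(pickle.dumps([3, 1, 2]))
    print(proxy.count(1))  # first attribute access triggers unpickling -> 1
```

Special (dunder) methods are not routed through __getattr__, which is one reason a truly transparent proxy takes quite a bit more work than this sketch, and why exposing such proxies to users is the concern raised below.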
The problem with Dask not being aware of this is that users then get exposed to the proxy objects you suggest in rapidsai/dask-cuda#342 (comment); I think it would be good to have a solution that works regardless of the types or the serializer/deserializer involved. I completely agree that the implementation here, with its stack inspection, is horrible. Please ping me on any work you are doing on this in dask-cuda; I'm happy to help make it more generic.
I wonder if we can push deserialization into |
Does this have the potential to stall some tasks or duplicate work if the same deserialized data is required by two tasks?
I think that deserializing data when we get it is more sensible in the common case. I want to make sure that we're not mucking about with sensible behaviors because of a few odd cases.
Agreed, these are disadvantages of the approach. We can mitigate them by making the proxy object as transparent as possible, but you are right, it shouldn't be on by default.
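On the stall/duplicate-work question above, one mitigation (sketched here with made-up names) is to deserialize each key at most once and share the result between tasks, accepting that concurrent requests briefly queue behind a lock:

```python
import pickle
import threading

_cache = {}                    # key -> deserialized object
_cache_lock = threading.Lock()


def deserialize_once(key, payload: bytes):
    """Return the deserialized value for key, unpickling at most once.

    Two tasks asking for the same key share one object instead of each
    paying for, and possibly racing on, the deserialization.
    """
    with _cache_lock:
        if key not in _cache:
            _cache[key] = pickle.loads(payload)
        return _cache[key]
```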
#4307 should allow users to not deserialize and run tasks at the same time |
Closing as stale. |