
Conversation

@Carreau (Contributor) commented Jul 28, 2020

In some cases deserialization is not thread safe, and it would be good to
deserialize only when no other work is being done at the same time.

This tries to achieve that, in part, by delaying unpickling until the data
actually needs to be handed to the Executor. With a single-threaded
ThreadPoolExecutor, that means non-thread-safe objects can be unpickled
safely. Maybe we even want to move unpickling into the Executor itself, but
then we need the executor to support this, and we would want to make sure we
don't unpickle the same data many times.

This of course relies on a gross hack of inspecting stack frames, but I hope
it will at least get a conversation started.

This also breaks the computation of per-type bandwidth, unless we delay the
per-type bandwidth calculation.

There are also some issues because clients indirectly call get_data_from_worker; for example, await client.submit(lambda x: x + 1, 10) will return a Serialized object instead of the expected result...
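
As a rough illustration of the "keep data serialized until it is handed to the Executor" idea, independent of distributed's internals: LazySerialized and maybe_materialize below are hypothetical names used only for this sketch, not part of this PR or of distributed.

```python
import pickle


class LazySerialized:
    """Hypothetical wrapper that keeps pickled bytes and unpickles on first use."""

    _unset = object()

    def __init__(self, payload: bytes):
        self.payload = payload
        self._value = self._unset

    def materialize(self):
        # Unpickle only the first time the value is requested, ideally at a
        # point where no task is running on the (single-threaded) executor.
        if self._value is self._unset:
            self._value = pickle.loads(self.payload)
        return self._value


def maybe_materialize(obj):
    # Helper a worker could call right before handing arguments to the
    # executor, so non-thread-safe objects are unpickled at a safe point.
    return obj.materialize() if isinstance(obj, LazySerialized) else obj
```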

Carreau added 2 commits July 28, 2020 12:56
@Carreau (Contributor, Author) commented Jul 28, 2020

My other idea was to use a custom serializer/deserializer that would be lazy, the problems being that:

  1. This still needs hooks in the right places to deserialize just in time.
  2. This appears to make things beyond just the deps/data go through the deserializer, but it's unclear, as this was mostly failing in a number of places.

I'm also wondering about keeping the serialized data on the worker in case it's needed somewhere else, since then the worker wouldn't need to re-serialize it before sending it on...

@mrocklin (Member)
Yeah, this seems like a big enough hack that we would probably reject it without a lot of evidence that it was necessary in a variety of situations. In general I would recommend that any group facing things like this look into improving their serialization, or using locks to protect finicky resources. You could also look into using a synchronous executor if you wanted everything to be in one thread.

The closest common situation I can recall that sounds like this one is dealing with TensorFlow graphs, which don't like being deserialized in one thread and then used in another.

Any fix like this would have, I think, enough unexpected results that I'd be very hesitant to consider it.
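
For the "use locks to protect finicky resources" route, here is a minimal user-side sketch (no distributed API involved): the fragile object routes its own unpickling and its use through one process-wide lock, so deserialization cannot overlap with use of another instance. FragileResource and _locked_build are made-up names for this example.

```python
import pickle
import threading

# Hypothetical user-side approach: one process-wide lock shared between
# deserialization and use of the fragile resource.
_RESOURCE_LOCK = threading.Lock()


def _locked_build(config):
    # Runs at unpickling time; hold the lock while the non-thread-safe
    # construction happens.
    with _RESOURCE_LOCK:
        return FragileResource(config)


class FragileResource:
    """Example object whose construction/unpickling is not thread safe."""

    def __init__(self, config):
        self.config = config

    def __reduce__(self):
        # Route unpickling through the lock-protected builder above.
        return (_locked_build, (self.config,))

    def use(self, x):
        # Hold the same lock while the resource is in use, so no other
        # instance can be deserialized concurrently.
        with _RESOURCE_LOCK:
            return (self.config, x)


if __name__ == "__main__":
    restored = pickle.loads(pickle.dumps(FragileResource({"device": 0})))
    print(restored.use(42))
```

Alternatively, starting the worker single-threaded (for example dask-worker --nthreads 1) keeps task execution on one thread, although deserialization driven by the comm machinery can still overlap with running tasks, which is the gap this PR is trying to close.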

@Carreau (Contributor, Author) commented Jul 28, 2020

> Yeah, this seems like a big enough hack that we would probably reject it without a lot of evidence that it was necessary in a variety of situations.

Do you mean the stack inspection, or keeping objects serialized until necessary?

I can likely find another way to pass the information that data should be lazily deserialized; I mostly wanted to know whether it's worth looking into.

Also yes, the group is looking into improving thread safety and (de)serialization of their objects, is already on a single-threaded ThreadPoolExecutor to minimize issues, and is looking into locks.

@mrocklin (Member)
In general, lazy deserialization. If we do want to do this, that's fine, but we need to survey other use cases first, see how important it is, figure out what convention might make sense, etc. This feels like tacking on a feature for a single user's use case fairly deep in the guts of Dask.

My guess is that there is some more general solution somewhere that lets this group get what they want, and doesn't add an odd corner case into the core.

@mrocklin (Member)
Alternatively, if this is the right solution, then I think that we need to build a case for it. The best case I can think of that is similar to this is TensorFlow/Keras, as mentioned above.

@quasiben (Member)
I thought we could already control doing serialization/deserialization in a separate thread with the config option distributed.comm.offload. Does that not work?
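
For reference, that option is normally toggled through dask's configuration; a minimal example (whether it actually helps here is discussed in the reply below):

```python
import dask

# Disable offloading of (de)serialization to the separate comm thread, so it
# happens on the worker's event loop instead.
dask.config.set({"distributed.comm.offload": False})
```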

@Carreau (Contributor, Author) commented Jul 28, 2020

> Does that not work?

I don't believe it does; the core of the problem is that some types of objects don't like having work done on them at the same time as other instances are being deserialized.

Deactivating offload already forces deserialization to happen in the main loop, but the ThreadPoolExecutor is still running at that point, and that's not good. My goal here is to reduce the chance of deserialization happening while the executor is doing work.

Either the objects need to hold a lock while in use that blocks deserialization, OR I need dask/distributed to not deserialize while computing.

Right now this is hard, as unpacking messages unpacks the data as well.

@jakirkham (Member)
There is a shared interest in delaying deserialization, I think. For example, we discussed delaying deserialization to reduce memory usage in rapidsai/dask-cuda#342 (comment). How that would look may differ a bit from how it is currently implemented here, but the overall objective seems agreeable.

cc @madsbk @pentschev

@madsbk (Contributor) commented Jul 30, 2020

> There is a shared interest in delaying deserialization, I think. For example, we discussed delaying deserialization to reduce memory usage in rapidsai/dask-cuda#342 (comment). How that would look may differ a bit from how it is currently implemented here, but the overall objective seems agreeable.
>
> cc @madsbk @pentschev

Yes, in rapidsai/dask-cuda#342 (comment) the plan is to delay deserialization until the data is accessed by the task. It even enables tasks to coordinate deserialization with other work explicitly.

Currently I am on vacation but I will start working on this when I get back next week.

@Carreau (Contributor, Author) commented Aug 3, 2020

The problem with dask not being aware of this is that users then get exposed to the proxy objects you suggest in rapidsai/dask-cuda#342 (comment).

I think it would be good to have a solution that works regardless of the types or the serializer/deserializer involved.

I completely agree that the stack-inspection implementation here is horrible.

Please ping me on any work you are doing on this in dask-cuda; I'm happy to help make it more generic.

@jakirkham (Member)
I wonder if we can push deserialization into apply_function and apply_function_actor, which are where the actual function winds up being called for computation on the Executor. This should delay deserialization until the data is going to be used.
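
As a sketch of what that could look like, assuming arguments can still arrive wrapped in distributed.protocol.Serialized: maybe_deserialize and apply_function_sketch are hypothetical names, and the real apply_function does far more bookkeeping (timing, thread state, error handling) than this.

```python
from distributed.protocol import Serialized, deserialize


def maybe_deserialize(obj):
    # Unwrap a still-serialized argument just before the user function runs.
    if isinstance(obj, Serialized):
        return deserialize(obj.header, obj.frames)
    return obj


def apply_function_sketch(function, args, kwargs):
    # Rough outline of "deserialize just in time": materialize the inputs on
    # the executor thread, immediately before calling the task.
    args = [maybe_deserialize(a) for a in args]
    kwargs = {k: maybe_deserialize(v) for k, v in kwargs.items()}
    return function(*args, **kwargs)
```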

@Carreau (Contributor, Author) commented Aug 4, 2020

> I wonder if we can push deserialization into apply_function and apply_function_actor,

Does this have the potential to stall some tasks or duplicate work if the same deserialized data is required by two tasks?

@mrocklin (Member) commented Aug 4, 2020

I think that deserializing data when we get it is more sensible in the common case. I want to make sure that we're not mucking about with sensible behaviors because of a few odd cases.

@madsbk (Contributor) commented Aug 4, 2020

> The problem with dask not being aware of this is that users then get exposed to the proxy objects you suggest in rapidsai/dask-cuda#342 (comment).

Agreed, these are disadvantages of the approach. We can mitigate this by making the proxy object as transparent as possible, but you are right, it shouldn't be on by default.
However, if the application is really sensitive to serialization and deserialization, you probably want to coordinate the work explicitly anyway?
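
For what "as transparent as possible" might mean in practice, here is a minimal sketch of a proxy that only unpickles its payload on first attribute access. TransparentProxy is a made-up name, and the actual dask-cuda proxy object is considerably more involved, so this only shows the shape of the idea.

```python
import pickle


class TransparentProxy:
    """Illustrative proxy that deserializes its payload on first attribute access."""

    def __init__(self, payload: bytes):
        object.__setattr__(self, "_payload", payload)
        object.__setattr__(self, "_obj", None)

    def _materialize(self):
        # Unpickle lazily and cache the result.
        if object.__getattribute__(self, "_obj") is None:
            obj = pickle.loads(object.__getattribute__(self, "_payload"))
            object.__setattr__(self, "_obj", obj)
        return object.__getattribute__(self, "_obj")

    def __getattr__(self, name):
        # Forward any attribute or method access to the deserialized object.
        return getattr(self._materialize(), name)

    def __repr__(self):
        return repr(self._materialize())
```

For example, TransparentProxy(pickle.dumps([1, 2, 3])).count(2) only unpickles the list at that call; double-underscore protocols (len(), indexing, operators) would need explicit forwarding, which is part of what makes a truly transparent proxy hard.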

@jrbourbeau (Member)
#4307 should allow users to avoid deserializing and running tasks at the same time.

Base automatically changed from master to main March 8, 2021 19:04
@Carreau (Contributor, Author) commented Oct 17, 2023

Closing as stale.

@Carreau closed this Oct 17, 2023