Annotate `extract_serialize` (for Cythonization) #4283

jakirkham · 2020-11-27T05:31:34Z

This adds a few unintrusive type annotations to `extract_serialize` so that Cython can generate better-optimized code when this function is Cythonized. When run as pure Python, the performance remains the same.

Pure Python:

In [1]: from distributed.protocol.serialize import extract_serialize

In [2]: data = 1_000_000 * b"abc"
   ...: msg = 11 * [10 * [2 * [5 * [data]]]]

In [3]: %timeit extract_serialize(msg)
789 µs ± 6.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Cython:

In [1]: from distributed.protocol.serialize import extract_serialize

In [2]: data = 1_000_000 * b"abc"
   ...: msg = 11 * [10 * [2 * [5 * [data]]]]

In [3]: %timeit extract_serialize(msg)
481 µs ± 5.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
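
For reference, the arithmetic behind the two timings above can be worked out as follows. This is only a sketch that uses the mean values reported in the two `%timeit` runs; the variable names are made up for illustration.

```python
# Mean timings copied from the two %timeit runs above (microseconds).
pure_python_us = 789.0
cython_us = 481.0

# Number of leaf references in msg = 11 * [10 * [2 * [5 * [data]]]].
# Every leaf is the same 3,000,000-byte object, so nothing is copied.
leaves = 11 * 10 * 2 * 5

speedup = pure_python_us / cython_us          # how many times faster
time_saved = 1 - cython_us / pure_python_us   # fraction of time removed

print(f"leaves visited: {leaves}")
print(f"speedup: {speedup:.2f}x")
print(f"time saved: {time_saved:.0%}")
```

That is roughly a 1.64x speedup (about 39% less time) on a message with 1100 leaves.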

Provides some unintrusive type annotations for variables in `extract_serialize`. Cython can parse these annotations and use them to optimize the generated code. Because the annotations are ordinary Python syntax, this remains valid Python code that behaves the same when run uncompiled.
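
As a rough illustration of the annotation style described here, the sketch below annotates local variables with builtin types. It is not the code from this PR: `count_bytes` is a hypothetical helper, and the exact optimizations Cython applies depend on the Cython version.

```python
# Hypothetical example, NOT the actual extract_serialize implementation.
# Plain variable annotations like these are ordinary Python, so this file
# runs unchanged without Cython; Cython may use them to pick typed code paths.

def count_bytes(x, total: int = 0) -> int:
    """Sum the sizes of all bytes/bytearray leaves in nested lists and dicts."""
    typ: type = type(x)  # exact-type check instead of isinstance
    if typ is bytes or typ is bytearray:
        return total + len(x)
    if typ is list:
        x_l: list = x  # typed alias: tells Cython this is exactly a list
        for v in x_l:
            total = count_bytes(v, total)
    elif typ is dict:
        x_d: dict = x  # typed alias: tells Cython this is exactly a dict
        for v in x_d.values():
            total = count_bytes(v, total)
    return total


msg = [b"ab", {"k": bytearray(b"cde")}, [[b"f"]]]
print(count_bytes(msg))
```

In pure Python the annotations on `typ`, `x_l`, and `x_d` are not enforced at runtime, which is why behavior is unchanged when the module is not compiled.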

jakirkham · 2020-11-30T19:08:48Z

cc @mrocklin @quasiben

mrocklin · 2020-11-30T19:14:28Z

As a PR this seems fine to me.

At a larger scale it would obviously be grand if we could avoid having pyx files. My guess is that this will be hard to achieve while still getting optimal performance. I'd be very happy to learn otherwise though.

jakirkham · 2020-11-30T19:45:56Z

Right. I want to see how far we get with this strategy as...

- It allows easier collaboration in the near term
- Assuming this style is agreeable we can merge things in as needed
- Things still work if someone doesn't have Cython (though they don't get the perf bump then)
- Installation remains simple for the pure Python case
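
One way to see which mode a module is running in is Cython's `cython.compiled` flag, which is `False` when the code runs uncompiled. The sketch below assumes Cython may or may not be installed; `running_compiled` is a hypothetical helper name, not something from this PR.

```python
# Hypothetical helper, assuming Cython is an optional dependency.
def running_compiled() -> bool:
    """Return True only when this module was compiled by Cython."""
    try:
        import cython  # shim module shipped with Cython
    except ImportError:
        # No Cython installed: the module can only be running as pure Python.
        return False
    return bool(cython.compiled)


mode = "Cython-compiled" if running_compiled() else "pure Python"
print(f"running as: {mode}")
```

Either way the module imports and runs; only the performance differs.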

At least when I looked at the C code this generated, it was pretty much as good as I would get even if I wrote it in Cython directly. My guess is that pure Python mode in Cython has improved a lot since Antoine and you last tried it. So it seems worth exploring again for that reason alone (in addition to the benefits listed above).

If we still find we need .pyx files for a few cases, we can certainly explore that after we have squeezed out as much performance as we can get through this process. My guess is that will only affect a very small amount of code (if any), which will give us a lot of options in terms of how we handle that code (like it could become a small optional dependency for example).

jrbourbeau

Let's give this a shot, thanks @jakirkham

jakirkham · 2020-11-30T20:06:44Z

Thanks all! 😄

mrocklin · 2020-11-30T20:31:01Z

This approach excites me. I would be very happy if I could continue developing the scheduler in pure Python.

jakirkham added 4 commits November 26, 2020 18:24

Require path argument for _extract_serialize

ee85e4a

Handle bytes and bytearray separately

3410640

Check bytes and bytearray size internally

de9b102

jrbourbeau approved these changes Nov 30, 2020


jrbourbeau merged commit e9cd97f into dask:master Nov 30, 2020

jakirkham deleted the annotate_extract_serialize branch November 30, 2020 20:05

This was referenced Dec 1, 2020

line_profiler results on 4 workers (w/o stealing) over 20 iterations quasiben/dask-scheduler-performance#20

Open

Annotate ClientState for Cythonization #4290

Merged
