Skip to content

Conversation

@gjoseph92
Copy link
Collaborator

While looking into #5083 I happened to notice that the dashboard felt very sluggish. I profiled with py-spy and discovered that the scheduler was spending 20% of runtime calculaing sum(map(len, group._dependencies)) < 5! A quick print statement showed some task groups depended on 25,728 other groups (each of size 1). We can easily skip those.

I originally had this conditional in the PR but we removed it for simplicity: #4967 (comment); turns out it was relevant after all!

Screen Shot 2021-07-23 at 3 27 20 PM

FYI @d-v-b this solves your performance regression from #5083.

cc @mrocklin @jrbourbeau

  • Closes #xxxx
  • Tests added / passed
  • Passes black distributed / flake8 distributed / isort distributed

While looking into dask#5083 I happened to notice that the dashboard felt very sluggish. I profiled with py-spy and discovered that the scheduler was spending 20% of runtime calculaing `sum(map(len, group._dependencies)) < 5`! A quick print statement showed some task groups depended on 25,728 other groups (each of size 1). We can easily skip those.

I originally had this conditional in dask#4967 but we removed it for simplicity: dask#4967 (comment); turns out it was relevant after all!
@mrocklin mrocklin merged commit 9c30f38 into dask:main Jul 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants