@Illviljan Illviljan commented May 15, 2023

  • ds.chunks loops over all the variables, so compute it once instead of once per variable.
  • It is faster to create a meta dataframe once than to let dask guess it 2000 times.
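
The first point above can be illustrated with a small toy sketch. It does not use xarray or dask; `ToyDataset`, `rechunk_slow` and `rechunk_fast` are hypothetical names invented for this illustration, and the only assumption carried over from the PR is that reading `chunks` walks every variable.

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset whose `chunks` property is costly."""

    def __init__(self, variables):
        # variables: mapping of variable name -> {dimension name: chunk sizes}
        self.variables = variables
        self.chunks_calls = 0  # counts how often the property is evaluated

    @property
    def chunks(self):
        # Every access loops over all variables, like the behaviour described above.
        self.chunks_calls += 1
        merged = {}
        for var_chunks in self.variables.values():
            merged.update(var_chunks)
        return merged


def rechunk_slow(ds):
    # Evaluates ds.chunks once per variable: quadratic work overall.
    return {name: ds.chunks for name in ds.variables}


def rechunk_fast(ds):
    # Evaluates ds.chunks once and reuses the result for every variable.
    chunks = ds.chunks
    return {name: chunks for name in ds.variables}


variables = {f"var{i}": {"x": (10, 10), "y": (5,)} for i in range(2000)}

slow_ds = ToyDataset(variables)
slow_result = rechunk_slow(slow_ds)
slow_calls = slow_ds.chunks_calls

fast_ds = ToyDataset(variables)
fast_result = rechunk_fast(fast_ds)
fast_calls = fast_ds.chunks_calls

print(slow_calls, fast_calls)
```

Both functions return the same mapping; only the number of property evaluations differs (2000 versus 1 in this toy setup).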

@Illviljan Illviljan mentioned this pull request May 15, 2023
6 tasks
@Illviljan Illviljan added the run-benchmark Run the ASV benchmark workflow label May 16, 2023

dask_array = var.set_dims(ordered_dims).chunk(self.chunks).data
series = dd.from_array(dask_array.reshape(-1), columns=[name])
if has_many_dims:

Contributor

Is this really that impactful? Can we optimize set_dims instead?

Contributor Author

I think I'll save the has_many_dims paths for a future PR. It might introduce bugs if we don't consistently chunk with the same shape.

Illviljan commented May 21, 2023

        before           after         ratio
     [05c7888d]       [d135ab97]
-      2.47±0.02s          806±6ms     0.33  pandas.ToDataFrameDask.time_to_dataframe
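
As a quick check of the ratio column, the reported timings can be divided directly. This sketch uses only the central values from the row above and ignores the ± spread.

```python
# Central values from the benchmark row above (uncertainties ignored).
before_s = 2.47   # seconds, commit 05c7888d
after_s = 0.806   # seconds (806 ms), commit d135ab97

ratio = after_s / before_s    # what the "ratio" column reports
speedup = before_s / after_s  # the same change expressed as a speedup factor

print(f"ratio={ratio:.2f}, speedup={speedup:.1f}x")
```

This prints `ratio=0.33, speedup=3.1x`, matching the 0.33 in the table.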

@Illviljan Illviljan added the plan to merge Final call for comments label May 24, 2023
@Illviljan Illviljan merged commit 609a901 into pydata:main May 25, 2023
@dcherian dcherian mentioned this pull request Jun 15, 2023
19 tasks

Labels: plan to merge, run-benchmark, topic-performance
