
Added more metadata to Dask Dataframe creation #19

Merged
alxmrs merged 3 commits into main from
fix-fast-zarr-open
Feb 19, 2024

Conversation

@alxmrs
Owner

@alxmrs alxmrs commented Feb 19, 2024

Fixed #17. It looks like the dataset is, in fact, opened lazily; it is `len(era5_df)` that requires a full scan. I opened #18 to address the length issue.

It should return right away since we want to convert chunks lazily. From the profile traces, it looks like `to_dd` converts the chunks right away.
I found that either `from_delayed` or `from_map` took forever to get the length of era5. This looks like a more fundamental issue with Dask Dataframes. Instead, I checked how fast it was to get columns.
@alxmrs alxmrs merged commit 7da2184 into main Feb 19, 2024
@alxmrs alxmrs deleted the fix-fast-zarr-open branch February 19, 2024 11:14

Development

Successfully merging this pull request may close these issues.

Opening large Zarr datasets should be lazy (and fast)
