This repository was archived by the owner on Mar 6, 2023. It is now read-only.

Handle large area processing#49

Merged
sophieherrmann merged 19 commits into master from update-load-helper on Oct 6, 2021

Conversation

@ValentinaHutter
Collaborator

odc_load_helper now only changes nodata values to np.nan
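The change described above could look roughly like the following sketch (hypothetical function and parameter names, not the actual odc_load_helper code; the key point is that the masking stays lazy on dask-backed arrays):

```python
import numpy as np
import xarray as xr


def mask_nodata(data: xr.DataArray, nodata_value: float) -> xr.DataArray:
    """Replace nodata pixels with np.nan.

    DataArray.where is lazy on dask-backed arrays, so no pixel values
    are pulled into memory here; only the task graph grows.
    """
    return data.where(data != nodata_value, np.nan)
```

With a dask-backed cube this keeps CPU and memory usage low, because the replacement is only evaluated when the result is finally computed or written out.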

@ValentinaHutter ValentinaHutter self-assigned this Sep 8, 2021
@ValentinaHutter ValentinaHutter changed the title Changed odc_load_helper to improve CPU and MEM USAGE WIP: Handle large area processing Sep 10, 2021
Contributor

@sophieherrmann sophieherrmann left a comment


There are some nice improvements!

Have you already checked all the other functions to make sure attributes are passed through properly?

Comment thread src/openeo_processes/comparison.py Outdated
Comment thread src/openeo_processes/comparison.py
Comment thread src/openeo_processes/cubes.py
Comment thread src/openeo_processes/cubes.py
Comment thread src/openeo_processes/cubes.py Outdated
Comment thread src/openeo_processes/math.py Outdated
Comment thread src/openeo_processes/math.py
Comment thread src/openeo_processes/utils.py Outdated
@sophieherrmann
Contributor

I just found this: http://xarray.pydata.org/en/stable/generated/xarray.save_mfdataset.html

It could improve the speed of writing NetCDF files.

@ValentinaHutter
Collaborator Author

> I just found this: http://xarray.pydata.org/en/stable/generated/xarray.save_mfdataset.html
>
> It could improve the speed of writing NetCDF files.

Thanks, I just inserted that :)
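The save_mfdataset call suggested above looks roughly like this sketch, adapted from the xarray docs (toy data; how the PR actually splits the datacube may differ):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset; real openEO datacubes are much larger and dask-backed.
ds = xr.Dataset(
    {"t2m": ("time", np.arange(6.0))},
    coords={"time": pd.date_range("2020-01-01", periods=6, freq="180D")},
)

# Split the dataset along years and write each group to its own NetCDF
# file in a single call; with dask-backed data the writes can run in
# parallel instead of serializing one big to_netcdf call.
years, datasets = zip(*ds.groupby("time.year"))
paths = [f"t2m_{year}.nc" for year in years]
xr.save_mfdataset(datasets, paths)
```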

Member

@clausmichele clausmichele left a comment


Why are you splitting the request into two separate ones? This should be more flexible: split the request into parts of a size you know is fine. Otherwise it might work for size x = 100, split into x1 = 50 and x2 = 50, but what if x = 1000? (The numbers are just examples.)
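The more flexible splitting described here could be sketched as follows (a hypothetical helper, not code from this PR: the number of sub-requests grows with the extent instead of being fixed at two):

```python
def split_extent(start, stop, max_size):
    """Split the interval [start, stop) into consecutive pieces,
    each no larger than max_size, so arbitrarily large requests
    decompose into a variable number of safe-sized sub-requests."""
    pieces = []
    pos = start
    while pos < stop:
        pieces.append((pos, min(pos + max_size, stop)))
        pos += max_size
    return pieces
```

For x = 100 with a safe size of 50 this yields two pieces, but for x = 1000 it yields twenty, rather than two pieces of 500 that might still be too large.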

@ValentinaHutter
Collaborator Author

Thanks for reviewing it, that was just a test to try out splitting it in two parts, but I will remove the change, as it is not improving the process.

@ValentinaHutter
Collaborator Author

To handle large areas, I had a look at all the processes. The processes that still need an update to work for large areas are sort and order. The issue is described here: #52

@sophieherrmann sophieherrmann changed the title WIP: Handle large area processing Handle large area processing Oct 6, 2021
@sophieherrmann
Contributor

As this PR already provides a number of new features / bug fixes which are urgently needed, I'll merge it now.
The main new feature is that nearly all jobs now run completely on dask, with no direct access to the array values. This also makes large-area jobs possible. Exceptions are described by @ValentinaHutter in #49 (comment) and will be solved in a separate PR.

A connected issue is that the fit_curve process is computationally quite expensive, especially for large areas. This issue is documented in #53 and will also be addressed in a separate PR.
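The "no direct access to the array values" point can be illustrated with a small sketch (illustrative only, not code from the PR): operations on a dask-backed DataArray only build a task graph, while touching .values materializes everything in memory, which is exactly what breaks for large areas.

```python
import dask.array as da
import xarray as xr

# A dask-backed cube: only the task graph exists in memory, not the data.
cube = xr.DataArray(
    da.random.random((2000, 2000), chunks=(500, 500)),
    dims=("y", "x"),
)

# These operations stay lazy: nothing is computed or loaded yet.
result = (cube * 2).mean(dim="x")

# Accessing result.values here would compute and load the whole result;
# computation should instead be deferred to the final write step
# (e.g. to_netcdf), letting dask process the cube chunk by chunk.
assert isinstance(result.data, da.Array)
```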

@sophieherrmann sophieherrmann merged commit 9727cc0 into master Oct 6, 2021
@ValentinaHutter ValentinaHutter deleted the update-load-helper branch May 2, 2022 07:00
