Simplify and lazify broadcast_to_shape#5307
Conversation
You can tell by using an undocumented dask.array function dask.array.utils.meta_from_array. I'm not sure why it's not documented though, maybe it's not supposed to be that public. >>> import dask.array as da
>>> import numpy as np
>>> a = da.array(np.ma.array([1.], mask=[1]))
>>> isinstance(da.utils.meta_from_array(a), np.ma.MaskedArray)
True |
|
Thanks @bouweandela that is golden knowledge. I just opened dask/dask#10278 to get more clarity on the status of that function. |
b4bb546 to
e7e8a41
Compare
|
I think this is now ready for review. I made a helper function to check whether an array is both lazy and masked type, as I figured it could come in useful elsewhere in Iris. |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #5307 +/- ##
==========================================
+ Coverage 89.35% 89.37% +0.01%
==========================================
Files 89 89
Lines 22417 22419 +2
Branches 5382 5380 -2
==========================================
+ Hits 20030 20036 +6
+ Misses 1639 1637 -2
+ Partials 748 746 -2
☔ View full report in Codecov by Sentry. |
|
Thanks @ESadek-MO! |
* working for all except masked lazy * use moveaxis * handle lazy masked case * add tests for is_lazy_masked_data * whatsnew * check compute isn't called * update docstring
* Simplify and lazify broadcast_to_shape (#5307) * working for all except masked lazy * use moveaxis * handle lazy masked case * add tests for is_lazy_masked_data * whatsnew * check compute isn't called * update docstring * add whatnew patch footnote --------- Co-authored-by: Ruth Comer <10599679+rcomer@users.noreply.github.com>
🚀 Pull Request
Description
Attempt to resolve #5295, following my own suggestion at #5295 (comment). It turned out I didn't need to use
da.broadcast_todirectly - this is automatic due to dask's support for NEP13/18. We do still need to handle masked arrays separately because they don't (yet) support NEP13/18.This is still not working for the case where the array is both lazy and masked (see failing new test) because the mask gets ignored by
da.broadcast_to. We could apply a similar approach to that for the numpy masked arrays, but you can't tell whether a dask array is masked until you realise it. This seems like it must be a common problem. Anyone know a solution before I attempt to re-invent the wheel?Performance Note
For a simple array I found this new implementation takes nearly twice as long as
main. However we are only talking about ~4 microseconds, and it does not appear to depend on array size, so maybe it's OK?Consult Iris pull request check list