Improving API: part 1: functionality for input pyfive.high_level.Dataset#241
Improving API: part 1: functionality for input pyfive.high_level.Dataset#241valeriupredoi merged 30 commits intopyfivefrom
Conversation
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. Thanks for integrating Codecov - We've got you covered ☂️ |
|
Before I review the code itself, for me the big issue is the high level API. In #241 we suggest that we could just do things like Given the library code itself has to explicitly implement That means, I'd prefer to see us explicitly allowing |
yes, that is indeed on the TODO list - I was planning on tackling that in part 2 PR ie review and merge this first, then part 2 explicit stats, and then part 3 clean up (get rid of all those versions etc) 🍺 |
activestorage/active.py
Outdated
| """ Get the netcdf file and it's b-tree""" | ||
| """ Get the netcdf file and its b-tree""" | ||
| ncvar = self.ncvar | ||
| # in all cases we need an open netcdf file to get at attributes |
There was a problem hiding this comment.
I am not sure it's an open netcdf file any more is it?
activestorage/active.py
Outdated
| @@ -310,10 +307,6 @@ def _get_selection(self, *args): | |||
| # hopefully fix pyfive to get a dtype directly | |||
There was a problem hiding this comment.
I am not sure the docstring for this method, and this comment are quite right any more.
activestorage/active.py
Outdated
| @@ -362,13 +355,6 @@ def _from_storage(self, ds, indexer, chunks, out_shape, out_dtype, compressor, f | |||
| # Because we do this, we need to read the dataset b-tree now, not as we go, so | |||
| # it is already in cache. If we remove the thread pool from here, we probably | |||
| # wouldn't need to do it before the first one. | |||
There was a problem hiding this comment.
This comment is irrelevant now?
There was a problem hiding this comment.
This comment is irrelevant now?
That's my understanding
| chunk = chunk.reshape(-1, order='A') | ||
| chunk = chunk.reshape(shape, order=order) | ||
| else: | ||
| class storeinfo: pass |
There was a problem hiding this comment.
Why don't we import the class so it's more obvious what it's for?
There was a problem hiding this comment.
I tried this just now, there is unfortunately flakiness involved:
E AssertionError: assert 265.90347 == array(258.62814, dtype=float32)
E + where array(258.62814, dtype=float32) = <built-in function array>(258.62814, dtype='float32')
E + where <built-in function array> = np.array
tests/unit/test_active.py:104: AssertionError
----------------------------------------------------------------- Captured stdout call ------------------------------------------------------------------
Treating input <HDF5 dataset "TREFHT": shape (12, 4, 8), type "float32"> as variable object.
Reducing chunk of object <class 'pyfive.high_level.Dataset'>
Reducing chunk of object <class 'pyfive.high_level.Dataset'>XXX StI byte offset 12394 StI size 128
XXX
StI byte offset 12522actual offset StI size12394 actual size 128
128
actual offset 12522 actual size 128
with implement:
from pyfive.h5d import StoreInfo as storeinfo
storeinfo.byte_offset = offset
storeinfo.size = size
print("XXX", "StI byte offset", storeinfo.byte_offset, "StI size", storeinfo.size)
print("actual offset", offset, "actual size", size)There was a problem hiding this comment.
StoreInfo does not seem to be a safe class the way it is implemented at the moment, since it gets frozen in at times, and doesn't update on the go
bnlawrence
left a comment
There was a problem hiding this comment.
Most of my comments are about documentation. This looks good to go as part of a sequence of events.
davidhassell
left a comment
There was a problem hiding this comment.
Thnaks V - Just some very minor suggestions
activestorage/active.py
Outdated
| @@ -362,13 +355,6 @@ def _from_storage(self, ds, indexer, chunks, out_shape, out_dtype, compressor, f | |||
| # Because we do this, we need to read the dataset b-tree now, not as we go, so | |||
| # it is already in cache. If we remove the thread pool from here, we probably | |||
| # wouldn't need to do it before the first one. | |||
There was a problem hiding this comment.
This comment is irrelevant now?
That's my understanding
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
Co-authored-by: David Hassell <davidhassell@users.noreply.github.com>
| av._method = "min" | ||
| assert av.method([3,444]) == 3 | ||
| av_slice_min = av[3:5] | ||
| assert av_slice_min == np.array(249.6583, dtype="float32") |
There was a problem hiding this comment.
@bnlawrence @davidhassell this test works a treat, many thanks for reminding me about my own work that I forgot 🤣 I am still keeping the actual real-world test with Reductionist for now though, just so we are fully covered, until the end (ie when we done with work o Pyfive)
Description
Contribution towards #231
This allows for a type
pyfive.high_level.Datasetto be Active-ed. eg (the test I included):Before you get started
Checklist
[ ] Any changed dependencies have been added or removed correctly (if need be)