Fixed MemoryStore.list_dir#2117
Conversation
Ensures that nested children are listed properly.
src/zarr/testing/store.py
Outdated
| await store.set("foo/c/d/2", Buffer.from_bytes(b"\x01")) | ||
| await store.set("foo/c/d/3", Buffer.from_bytes(b"\x01")) | ||
|
|
||
| keys_expected = ["foo"] |
There was a problem hiding this comment.
Ah, this is failing on the S3 tests.
keys_expected = ["foo"]
keys_observed = [k async for k in store.list_dir("")]
> assert set(keys_observed) == set(keys_expected), keys_observed
E AssertionError: ['test/foo']It's including self.path (the bucket name?) in listing. Based on my reading of the tests, that should not be included, but I'm not 100% sure. 90940a0 has that fix
dstansby
left a comment
There was a problem hiding this comment.
I'm struggling to get my head around the new logic. Reading #2116, is the goal here to just not return any duplicate values? If so, I wonder if it would be easier to keep track of the values that have been returned, and only yield a value if it hasn't been returned yet (otherwise, check the next value). This way it would also truly generate the values one at a time, instead of the new approach that computes all the keys before yielding them.
|
Thanks for the review!
Effectively yes. Duplicate keys were a consequence of the previous implementation not handling arrays nested in child groups properly.
3c845d9 has an approach that's a middle ground between my previous fix and your suggestion. I think it's a lot simpler. It does still generate all the keys before yield them, but I'm not too worried about that. This is an in-memory store, so things should be pretty small and fast, and this matches the implementation of the non-prefix case. |
dstansby
left a comment
There was a problem hiding this comment.
👍 looks good to me. I think if you drop the last bit of the assert statement, pytest is clever and will report the difference between the two sets in it's output when the test fails.
Co-authored-by: David Stansby <dstansby@gmail.com>
src/zarr/testing/store.py
Outdated
| await store.set("foo/c/1", Buffer.from_bytes(b"\x01")) | ||
| await store.set("foo/c/d/1", Buffer.from_bytes(b"\x01")) | ||
| await store.set("foo/c/d/2", Buffer.from_bytes(b"\x01")) | ||
| await store.set("foo/c/d/3", Buffer.from_bytes(b"\x01")) |
There was a problem hiding this comment.
I don't think its super material to the test but I believe the intent of this test was to have foo be an array c be the subdirectory containing chunks. Perhaps it would be better to change this to something more like:
| await store.set("foo/c/1", Buffer.from_bytes(b"\x01")) | |
| await store.set("foo/c/d/1", Buffer.from_bytes(b"\x01")) | |
| await store.set("foo/c/d/2", Buffer.from_bytes(b"\x01")) | |
| await store.set("foo/c/d/3", Buffer.from_bytes(b"\x01")) | |
| await store.set("foo/bar/c/1", Buffer.from_bytes(b"\x01")) | |
| await store.set("foo/bar/c/2", Buffer.from_bytes(b"\x01")) | |
| await store.set("foo/spam/c/1", Buffer.from_bytes(b"\x01")) | |
| await store.set("foo/spam/c/2", Buffer.from_bytes(b"\x01")) |
There was a problem hiding this comment.
Right... IIUC, if you have a key at foo/zarr.json with a node_type="array", that implies you won't have any keys at foo/bar/zarr.json (a group or a node), since arrays can't have children?
I tried to update this in 7414b10, to make sure that the test store we construct is at least compatible with how a Zarr hierarchy would be laid out.
/foo # group
/c # array
/group-0
/group-1
/array-0
/array-1
/array-2
Ensures that nested children are listed properly.
Closes #2116