test: standardize IO tests and improve assertions#4923
test: standardize IO tests and improve assertions#4923wjones127 merged 6 commits intolance-format:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4923 +/- ##
==========================================
+ Coverage 81.64% 81.67% +0.03%
==========================================
Files 333 334 +1
Lines 131594 132492 +898
Branches 131594 132492 +898
==========================================
+ Hits 107444 108219 +775
- Misses 20550 20637 +87
- Partials 3600 3636 +36
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
westonpace
left a comment
There was a problem hiding this comment.
This is definitely an improvement, thanks!
I think the only thing I still find annoying is the need to setup the object_store_wrapper. Thoughts for future PR: I think (could be wrong) that we wrap all object stores today with the tracing object store anyways. I wonder if it would be possible to just put IOP tracking inside the tracing object store. Then the dataset can just have an io_stats method that returns stats for the dataset's usage of the object store (or even global object store stats, if the object store is shared amongst datasets).
| fn list(&self, prefix: Option<&Path>) -> BoxStream<'static, OSResult<ObjectMeta>> { | ||
| let _guard = self.hop_guard(); | ||
| self.record_read("list", prefix.cloned().unwrap_or_default(), 0, None); | ||
| self.target.list(prefix) | ||
| } | ||
|
|
||
| fn list_with_offset( | ||
| &self, | ||
| prefix: Option<&Path>, | ||
| offset: &Path, | ||
| ) -> BoxStream<'static, OSResult<ObjectMeta>> { | ||
| self.record_read( | ||
| "list_with_offset", | ||
| prefix.cloned().unwrap_or_default(), | ||
| 0, | ||
| None, | ||
| ); | ||
| self.target.list_with_offset(prefix, offset) | ||
| } | ||
|
|
||
| async fn list_with_delimiter(&self, prefix: Option<&Path>) -> OSResult<ListResult> { | ||
| let _guard = self.hop_guard(); | ||
| self.record_read( | ||
| "list_with_delimiter", | ||
| prefix.cloned().unwrap_or_default(), | ||
| 0, | ||
| None, | ||
| ); | ||
| self.target.list_with_delimiter(prefix).await | ||
| } |
There was a problem hiding this comment.
Doesn't have to be this PR but I wonder if we should start recording lists as a separate operation entirely (e.g. not reads)
It's not just that. You also have to to use the
Hmm, now that's an interesting idea. Yeah we could have some sort of test feature that enables collection of IO requests. And then it wouldn't require much setup. I'll think more about that. |
* `IOTracker` is now public in `lance-io`, so it can be re-used
throughout the codebase.
* Added new assertions `assert_io_eq!()`, `assert_io_lt!()` and
`assert_io_gt!()` which will print out the list of requests in case of
failure:
```rust
thread 'dataset::tests::test_load_manifest_iops' panicked at rust/lance/src/dataset.rs:2882:9:
assertion failed: `(left == right)`: Expected read_iops to be 3, got 2. Requests: [
IORequest(method=list, path="test/_versions"),
IORequest(method=get_opts, path="test/_versions/1.manifest"),
]
Diff < left / right > :
<2
>3
```
IOTrackeris now public inlance-io, so it can be re-used throughout the codebase.assert_io_eq!(),assert_io_lt!()andassert_io_gt!()which will print out the list of requests in case of failure: