Conversation
🤖: Benchmark completed
🤖: Benchmark completed
🤖: Benchmark completed
TL;DR: I don't see any substantial change here (which means the test has passed!)
# Which issue does this PR close?

- Part of #7983
- Part of #8000
- Closes #8677

I am also working on a blog post about this: #8035

# TODOs

- [x] Rewrite `test_cache_projection_excludes_nested_columns` in terms of higher level APIs (#8754)
- [x] Benchmarks
- [x] Benchmarks with DataFusion: apache/datafusion#18385

# Rationale for this change

A new `ParquetPushDecoder` was implemented in #7997.

I need to refactor the async and sync readers to use the new push decoder in order to:

1. Avoid the [xkcd standards effect](https://xkcd.com/927/) (aka there are now three control loops)
2. Prove that the push decoder works (by passing all the tests of the other two)
3. Set the stage for further improving filter pushdown with a single control loop

<img width="400" alt="image" src="https://github.com/user-attachments/assets/e6886ee9-58b3-4a1e-8e88-9d2d03132b19" />

# What changes are included in this PR?

1. Refactor the `ParquetRecordBatchStream` to use `ParquetPushDecoder`

# Are these changes tested?

Yes, by the existing CI tests.

I also ran several benchmarks, both in arrow-rs and in DataFusion, and I do not see any substantial performance difference (as expected):

- apache/datafusion#18385

# Are there any user-facing changes?

No

---------

Co-authored-by: Vukasin Stefanovic <vukasin.stefanovic92@gmail.com>
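For readers unfamiliar with the push-decoder design, the sketch below illustrates the general idea of the "single control loop" mentioned above: the decoder never performs I/O itself, it only reports which byte ranges it needs, and the caller (sync or async) fetches and pushes them. All type and method names here (`PushDecoder`, `DecodeStep`, `try_decode`, `push_data`, `drive`) are hypothetical stand-ins for illustration, not the actual `ParquetPushDecoder` API.

```rust
use std::ops::Range;

/// Hypothetical result of a single decode attempt.
enum DecodeStep<T> {
    /// The decoder needs these byte ranges before it can make progress.
    NeedsData(Vec<Range<u64>>),
    /// A fully decoded item (e.g. a RecordBatch) is ready.
    Ready(T),
    /// All data has been decoded.
    Finished,
}

/// Hypothetical push decoder: it never performs I/O itself.
trait PushDecoder<T> {
    fn try_decode(&mut self) -> DecodeStep<T>;
    fn push_data(&mut self, range: Range<u64>, bytes: Vec<u8>);
}

/// A single control loop that drives any `PushDecoder`, regardless of
/// whether the bytes come from a local file (sync) or object storage (async).
fn drive<T, D, F>(decoder: &mut D, mut fetch: F) -> Vec<T>
where
    D: PushDecoder<T>,
    F: FnMut(Range<u64>) -> Vec<u8>,
{
    let mut out = Vec::new();
    loop {
        match decoder.try_decode() {
            DecodeStep::NeedsData(ranges) => {
                // The caller decides how to satisfy the request: read from a
                // local file, issue an object store GET, await a future, etc.
                for range in ranges {
                    let bytes = fetch(range.clone());
                    decoder.push_data(range, bytes);
                }
            }
            DecodeStep::Ready(item) => out.push(item),
            DecodeStep::Finished => return out,
        }
    }
}
```

With a loop of this shape, the sync reader can satisfy `fetch` with a blocking file read while the async stream awaits its I/O before pushing bytes, so both front ends can share one decoding control loop.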
The new code shipped in arrow 57.1.0, so this PR is no longer needed.
This is a test PR to confirm that the changes to `ParquetRecordBatchStream` in terms of the PushDecoder (arrow-rs#8159) do not cause any performance regressions.
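Since the point of the test is that callers should see no behavioral or performance change, here is a minimal sketch of the public async reader API that the refactor leaves untouched. It assumes the `parquet` crate with its `arrow` and `async` features plus `tokio` and `futures`; the file path is a placeholder.

```rust
use futures::TryStreamExt;
use parquet::arrow::ParquetRecordBatchStreamBuilder;
use tokio::fs::File;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // "data.parquet" is a placeholder path.
    let file = File::open("data.parquet").await?;

    // The builder/stream API is the same before and after the refactor;
    // only the internal control loop changed.
    let stream = ParquetRecordBatchStreamBuilder::new(file)
        .await?
        .with_batch_size(8192)
        .build()?;

    let batches: Vec<_> = stream.try_collect().await?;
    println!("read {} record batches", batches.len());
    Ok(())
}
```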