refactor test_cache_projection_excludes_nested_columns to use high level APIs #8754
Merged
alamb merged 1 commit into apache:main on Oct 31, 2025
Author (Contributor): @XiangpengHao can I trouble you for a review of this PR (as you wrote the original version of the test)?
Branch updated from 185af9f to 1512116.
XiangpengHao approved these changes on Oct 31, 2025.
XiangpengHao (Contributor) left a comment:
Looks good to me! This is much more readable than before 💯
Author (Contributor): Thank you @XiangpengHao
alamb added a commit that referenced this pull request on Nov 10, 2025:
# Which issue does this PR close?

- Part of #7983
- Part of #8000
- closes #8677

I am also working on a blog post about this: #8035

# TODOs

- [x] Rewrite `test_cache_projection_excludes_nested_columns` in terms of higher level APIs (#8754)
- [x] Benchmarks
- [x] Benchmarks with DataFusion: apache/datafusion#18385

# Rationale for this change

A new `ParquetPushDecoder` was implemented in #7997.

I need to refactor the async and sync readers to use the new push decoder in order to:

1. Avoid the [xkcd standards effect](https://xkcd.com/927/) (aka there are now three control loops)
2. Prove that the push decoder works (by passing all the tests of the other two)
3. Set the stage for improving filter pushdown further with a single control loop

# What changes are included in this PR?

1. Refactor the `ParquetRecordBatchStream` to use `ParquetPushDecoder`

# Are these changes tested?

Yes, by the existing CI tests.

I also ran several benchmarks, both in arrow-rs and in DataFusion, and I do not see any substantial performance difference (as expected):

- apache/datafusion#18385

# Are there any user-facing changes?

No

---------

Co-authored-by: Vukasin Stefanovic <vukasin.stefanovic92@gmail.com>
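To make the "single control loop" idea concrete, here is a minimal, std-only sketch of the push-decoder pattern. It assumes a toy "file" split into fixed-size row groups; `ToyPushDecoder`, `DecodeResult`, and `drive` are hypothetical names for illustration and are not the `parquet` crate's actual `ParquetPushDecoder` API. The decoder never does I/O itself: it only reports which byte range it needs next, and one driver loop supplies the bytes.

```rust
use std::ops::Range;

/// Hypothetical decoder output: the decoder reports what it needs
/// instead of performing I/O itself.
#[derive(Debug, PartialEq)]
pub enum DecodeResult {
    /// The decoder needs this byte range before it can make progress.
    NeedsData(Range<usize>),
    /// One decoded "batch" (here simply the bytes of one row group).
    Data(Vec<u8>),
    /// Everything has been decoded.
    Finished,
}

/// Toy decoder that splits a file into fixed-size "row groups".
pub struct ToyPushDecoder {
    row_groups: Vec<Range<usize>>,
    next: usize,
    buffered: Option<Vec<u8>>,
}

impl ToyPushDecoder {
    pub fn new(file_len: usize, row_group_len: usize) -> Self {
        assert!(row_group_len > 0, "row_group_len must be non-zero");
        let row_groups: Vec<Range<usize>> = (0..file_len)
            .step_by(row_group_len)
            .map(|start| start..(start + row_group_len).min(file_len))
            .collect();
        Self { row_groups, next: 0, buffered: None }
    }

    /// The caller pushes the bytes for the range the decoder asked for.
    pub fn push_data(&mut self, data: Vec<u8>) {
        self.buffered = Some(data);
    }

    /// Advance the state machine as far as possible without any I/O.
    pub fn try_decode(&mut self) -> DecodeResult {
        if let Some(data) = self.buffered.take() {
            self.next += 1;
            return DecodeResult::Data(data);
        }
        match self.row_groups.get(self.next) {
            Some(range) => DecodeResult::NeedsData(range.clone()),
            None => DecodeResult::Finished,
        }
    }
}

/// The single control loop. A sync reader and an async stream would
/// differ only in how `fetch` obtains the bytes.
pub fn drive<F>(decoder: &mut ToyPushDecoder, mut fetch: F) -> Vec<Vec<u8>>
where
    F: FnMut(Range<usize>) -> Vec<u8>,
{
    let mut batches = Vec::new();
    loop {
        match decoder.try_decode() {
            DecodeResult::NeedsData(range) => decoder.push_data(fetch(range)),
            DecodeResult::Data(batch) => batches.push(batch),
            DecodeResult::Finished => return batches,
        }
    }
}
```

Because all decoding state lives in the decoder, the sync and async readers no longer need their own copies of the decode logic; they only differ in the I/O they perform inside the loop.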
alamb added a commit that referenced this pull request on Nov 10, 2025:
# Which issue does this PR close?

- Follow on to #8754

# Rationale for this change

While working on #8754 I found the current formulation a bit awkward, so let's fix that.

# What changes are included in this PR?

Move the builder configuration to a trait so the tests read more fluently. This is totally unnecessary; it just makes me feel better about the tests.

# Are these changes tested?

Yes, by CI

# Are there any user-facing changes?

No, this is test only
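One common way to "move the builder configuration to a trait" is a test-only extension trait. The sketch below is hypothetical: `ReaderBuilder`, `TestBuilderExt`, and the preset method names are invented for illustration and are not the names used in the follow-up PR. Each trait method bundles a piece of test setup under a descriptive name, so it chains like an ordinary builder call.

```rust
/// Hypothetical reader builder used only for this illustration.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ReaderBuilder {
    pub batch_size: usize,
    pub projection: Vec<usize>,
    pub cache_enabled: bool,
}

impl ReaderBuilder {
    pub fn with_batch_size(mut self, batch_size: usize) -> Self {
        self.batch_size = batch_size;
        self
    }

    pub fn with_projection(mut self, projection: Vec<usize>) -> Self {
        self.projection = projection;
        self
    }

    pub fn with_cache(mut self, enabled: bool) -> Self {
        self.cache_enabled = enabled;
        self
    }
}

/// Test-only extension trait: named setup presets that chain like
/// regular builder methods, so each test reads as one fluent expression.
pub trait TestBuilderExt: Sized {
    /// Use a tiny batch size so tests exercise multiple batches.
    fn with_small_batches(self) -> Self;
    /// Project only the first two columns (an arbitrary example preset).
    fn with_first_two_columns(self) -> Self;
}

impl TestBuilderExt for ReaderBuilder {
    fn with_small_batches(self) -> Self {
        self.with_batch_size(2)
    }

    fn with_first_two_columns(self) -> Self {
        self.with_projection(vec![0, 1])
    }
}
```

A test then reads as `ReaderBuilder::default().with_small_batches().with_first_two_columns().with_cache(true)`. Because the trait can live in the test code, the library's public builder API is unchanged: Rust allows implementing a local trait for a type defined in another crate.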
Which issue does this PR close?
- `ParquetRecordBatchStream` (async API) in terms of the PushDecoder #8677
- `ParquetRecordBatchStream` in terms of the PushDecoder #8159

Rationale for this change
I am reworking how the parquet decoder's state machine works in #8159.

One of the unit tests, `test_cache_projection_excludes_nested_columns`, uses non-public APIs that I am changing. Rather than rewrite it in terms of other non-public APIs, I think it would be better to express this test in terms of public APIs.
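The test body itself is not shown on this page, so the following is only a hypothetical, std-only model of what "testing through public APIs" buys here. It assumes a private caching policy (cache a column only if the filter already decoded it, the output also needs it, and it is not nested) and checks that policy solely through a public metrics value. `Column`, `ReadMetrics`, `read`, and `cache_projection` are invented names, not the `parquet` crate's API.

```rust
/// Hypothetical schema column: a name and whether it is a nested type
/// (e.g. a struct or list) rather than a primitive.
#[derive(Debug, Clone)]
pub struct Column {
    pub name: &'static str,
    pub nested: bool,
}

/// Hypothetical public, observable metrics a reader might expose.
#[derive(Debug, Default, PartialEq)]
pub struct ReadMetrics {
    pub rows_from_cache: usize,
    pub rows_from_file: usize,
}

/// Public entry point: tests observe the caching policy only via the
/// returned metrics, never by calling `cache_projection` directly.
pub fn read(schema: &[Column], filter_cols: &[usize], output_cols: &[usize], rows: usize) -> ReadMetrics {
    let cached = cache_projection(schema, filter_cols, output_cols);
    let mut metrics = ReadMetrics::default();
    for col in output_cols {
        if cached.contains(col) {
            metrics.rows_from_cache += rows;
        } else {
            metrics.rows_from_file += rows;
        }
    }
    metrics
}

/// Private policy (assumed for illustration): cache a column only if the
/// filter already decoded it, the output needs it, and it is not nested.
fn cache_projection(schema: &[Column], filter_cols: &[usize], output_cols: &[usize]) -> Vec<usize> {
    filter_cols
        .iter()
        .copied()
        .filter(|c| output_cols.contains(c) && !schema[*c].nested)
        .collect()
}
```

Because assertions only touch `read` and `ReadMetrics`, the private helper can be restructured freely. Deleting the `!schema[*c].nested` condition changes the counts and makes such a check fail, which is the same "comment it out and confirm the test fails" sanity check described in this PR.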
What changes are included in this PR?
Refactor `test_cache_projection_excludes_nested_columns` to use high level APIs

Are these changes tested?
They are run in CI.

I also verified that this test covers the intended functionality by commenting that functionality out and then running the test:

`cargo test --all-features --test arrow_reader`

The test fails, as expected.
Are there any user-facing changes?
No, these are test-only changes.