[IcebergIO] Support column pruning #34856
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #34856 +/- ##
============================================
+ Coverage 54.53% 56.48% +1.95%
- Complexity 1479 3301 +1822
============================================
Files 1010 1182 +172
Lines 160455 181555 +21100
Branches 1079 3409 +2330
============================================
+ Hits 87500 102553 +15053
- Misses 70857 75738 +4881
- Partials 2098 3264 +1166
Flags with carried forward coverage won't be shown.
assign set of reviewers
Assigning reviewers. If you would like to opt out of this review, comment R: @m-trieu for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments).
| </td> | ||
| <td> | ||
| <code>list[<span style="color: green;">str</span>]</code> | ||
| </td> |
do we need to specify the required Beam SDK version?
Hmmm, not sure how to do this considering the file is auto-generated. Maybe we can include it in the schema field's description, but that doesn't seem very clean.
Won't it work, even on SDKs that don't understand it yet?
only on Dataflow Runner V2 -- it'll fail if a user tries experimenting with any other runner + old SDK
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergScanConfig.java
Outdated
Please update CHANGES.md
Thanks Kenn
Ahh forgot to update CHANGES. I'll open another PR to do that
Part of #34789
Allows users to pass a list of field names to either keep or drop when reading from an Iceberg table.
For example, say we have a table with columns colA, colB, colC, colD, colE. Either of the following will produce the same output:
- keep: ["colA", "colE"]
- drop: ["colB", "colC", "colD"]

keep and drop are mutually exclusive, and an error will be thrown if both are specified.
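
The keep/drop semantics described here can be illustrated with a small standalone sketch. This is not the code in this PR: `select_columns` is a hypothetical helper, the error type is an assumption, and returning the surviving columns in table order is also an assumption made for the illustration.

```python
from typing import List, Optional, Sequence


def select_columns(
    table_columns: Sequence[str],
    keep: Optional[Sequence[str]] = None,
    drop: Optional[Sequence[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the columns a pruned read would project."""
    # The PR description says specifying both options is an error;
    # ValueError is an assumed stand-in for whatever the connector throws.
    if keep and drop:
        raise ValueError("'keep' and 'drop' are mutually exclusive")
    if keep:
        wanted = set(keep)
        # Assumption: the result preserves the table's column order.
        return [c for c in table_columns if c in wanted]
    if drop:
        unwanted = set(drop)
        return [c for c in table_columns if c not in unwanted]
    # Neither option set: read every column.
    return list(table_columns)


columns = ["colA", "colB", "colC", "colD", "colE"]
kept = select_columns(columns, keep=["colA", "colE"])
dropped = select_columns(columns, drop=["colB", "colC", "colD"])
```

Under these assumptions, `kept` and `dropped` hold the same two columns, which mirrors the equivalence stated in the description.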