@ahmedabu98
Contributor

Part of #34789

Allows users to pass a list of field names to either keep or drop when reading from an Iceberg table.

For example, say we have a table with columns colA, colB, colC, colD, colE. Either of the following will produce the same output:

  • keep: ["colA", "colE"]
  • drop: ["colB", "colC", "colD"]

keep and drop are mutually exclusive; an error is thrown if both are specified.
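The keep/drop semantics described above can be sketched in plain Python. This is only an illustration of the described behavior, not the IcebergIO implementation: the helper name `select_columns` is hypothetical, and treating an empty list the same as an unset option is an assumption.

```python
from typing import Optional


def select_columns(
    all_columns: list[str],
    keep: Optional[list[str]] = None,
    drop: Optional[list[str]] = None,
) -> list[str]:
    """Hypothetical helper: return the columns that survive pruning, in table order."""
    # The two options are mutually exclusive, as stated in the PR description.
    # Assumption: an empty list counts as "not specified".
    if keep and drop:
        raise ValueError("'keep' and 'drop' are mutually exclusive; specify only one.")
    if keep:
        wanted = set(keep)
        return [col for col in all_columns if col in wanted]
    if drop:
        unwanted = set(drop)
        return [col for col in all_columns if col not in unwanted]
    # Neither option given: read every column.
    return list(all_columns)


# The example from the description: both configurations select the same columns.
columns = ["colA", "colB", "colC", "colD", "colE"]
kept = select_columns(columns, keep=["colA", "colE"])
dropped = select_columns(columns, drop=["colB", "colC", "colD"])
assert kept == dropped == ["colA", "colE"]
```

Passing both options to this sketch raises a `ValueError`, mirroring the error the description says is thrown.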

@codecov

codecov bot commented May 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 56.48%. Comparing base (5288c54) to head (d9c3153).
Report is 2 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #34856      +/-   ##
============================================
+ Coverage     54.53%   56.48%   +1.95%     
- Complexity     1479     3301    +1822     
============================================
  Files          1010     1182     +172     
  Lines        160455   181555   +21100     
  Branches       1079     3409    +2330     
============================================
+ Hits          87500   102553   +15053     
- Misses        70857    75738    +4881     
- Partials       2098     3264    +1166     
Flag Coverage Δ
java 70.59% <ø> (+1.93%) ⬆️

@ahmedabu98
Contributor Author

assign set of reviewers

@github-actions
Contributor

github-actions bot commented May 6, 2025

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @m-trieu for label java.
R: @liferoad for label website.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

</td>
<td>
<code>list[<span style="color: green;">str</span>]</code>
</td>
Contributor

Do we need to specify the required Beam SDK version?

Contributor Author

Hmm, I'm not sure how to do this, considering the file is auto-generated. Maybe we can include it in the schema field's description, but that doesn't seem very clean.

Member

Won't it work, even on SDKs that don't understand it yet?

Contributor Author

Only on Dataflow Runner V2. It'll fail if a user tries experimenting with any other runner and an old SDK.

@ahmedabu98 changed the title from "[IcebergIO] support selecting some fields to either keep or drop" to "[IcebergIO] Support column pruning" on May 6, 2025

@liferoad
Contributor

liferoad commented May 7, 2025

Please update CHANGES.md

@ahmedabu98
Contributor Author

Thanks Kenn

@ahmedabu98 merged commit 399ae72 into apache:master on May 7, 2025
27 of 29 checks passed

@ahmedabu98
Contributor Author

Ahh, I forgot to update CHANGES.md. I'll open another PR to do that.
