[IcebergIO] Support filter pushdown during reads #34827
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/FilterUtils.java
Assigning reviewers. If you would like to opt out of this review, comment:
R: @robertwb for label java.
Available commands:
The PR bot will only process comments in the main thread (not review comments).
Hi @ahmedabu98, are you still trying to fix the tests or is this truly ready for review again? Thanks!
The test failures are unrelated to this change; this is ready for review.
Hi @robertwb and @kennknowles, please review when you get a chance. Thanks!
chamikaramj left a comment
Thanks!
 * Utilities that convert between a SQL filter expression and an Iceberg {@link Expression}. Uses
 * Apache Calcite semantics.
 *
 * <p>Note: Only supports top-level fields (i.e. cannot reference nested fields).
Let's make sure we clearly fail for unsupported queries.
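One way to fail clearly is to validate every column the filter references before building any Iceberg expression, and to throw a descriptive error for anything the converter cannot handle. The following is a minimal, hypothetical sketch (the class and method names are not from the PR). It assumes that a dotted name such as `address.city` denotes a nested field; the actual FilterUtils works on identifiers parsed by Calcite, so its detection logic may differ.

```java
import java.util.Set;

/** Hypothetical sketch: reject filter column references that the pushdown cannot handle. */
class FilterColumnValidator {

  /**
   * Checks that every referenced column is a top-level field of the table schema.
   * Assumption: a dotted name refers to a nested field, which is unsupported.
   */
  static void validateColumns(Set<String> referencedColumns, Set<String> topLevelFields) {
    for (String column : referencedColumns) {
      if (column.contains(".")) {
        throw new UnsupportedOperationException(
            "Filter references nested field '" + column + "'; only top-level fields are supported.");
      }
      if (!topLevelFields.contains(column)) {
        throw new IllegalArgumentException(
            "Filter references unknown field '" + column + "'. Available fields: " + topLevelFields);
      }
    }
  }

  public static void main(String[] args) {
    Set<String> schema = Set.of("colA", "date");
    validateColumns(Set.of("colA", "date"), schema); // passes silently
    try {
      validateColumns(Set.of("address.city"), schema);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Failing before the scan starts means an unsupported query surfaces as a clear error at pipeline construction, not as silently unfiltered output.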
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/ScanTaskReader.java
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/FilterUtils.java
        call,
        schema);
  case NOT_EQ:
    return convertFieldAndLiteral(
Let's try to do 100% test coverage for this file.
I tried to do that in FilterUtilsTest. Let me know if anything is missing.
sdks/java/io/iceberg/src/test/java/org/apache/beam/sdk/io/iceberg/FilterUtilsTest.java
sdks/java/io/iceberg/src/test/java/org/apache/beam/sdk/io/iceberg/IcebergIOReadTest.java
chamikaramj left a comment
LGTM. Thanks!
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergScanConfig.java
The failing test is unrelated; merging now.
Is there an ETA for when this change will be officially released?
Part of #34789
Allows users to pass a SQL expression to filter files and rows when scanning. For example:
"colA" = 'SUCCESS' AND "date" < '2025-05-06'
Uses Apache Calcite to parse SQL expressions (see the reference documentation: https://calcite.apache.org/docs/reference.html).
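To show what the conversion does to the example filter, here is a dependency-free, hypothetical sketch. It is not the PR's implementation, which parses with Apache Calcite and builds Iceberg `Expression` objects. It handles only comparisons of a double-quoted column with a single-quoted literal, joined by `AND`, and renders the result as a string named after Iceberg's `Expressions` factory methods (`equal`, `lessThan`, `and`, ...). Anything outside that subset throws, in line with the review request to fail clearly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: convert a conjunctive SQL filter into an Iceberg-style predicate tree. */
class SimpleFilterSketch {

  // One comparison term: "column" <op> 'literal'
  private static final Pattern TERM =
      Pattern.compile("\\s*\"([^\"]+)\"\\s*(=|<>|!=|<=|>=|<|>)\\s*'([^']*)'\\s*");

  /** Maps a SQL comparison operator to the name of the matching Iceberg Expressions method. */
  static String operationFor(String op) {
    switch (op) {
      case "=": return "equal";
      case "<>":
      case "!=": return "notEqual";
      case "<": return "lessThan";
      case "<=": return "lessThanOrEqual";
      case ">": return "greaterThan";
      case ">=": return "greaterThanOrEqual";
      default: throw new UnsupportedOperationException("Unsupported operator: " + op);
    }
  }

  /**
   * Converts comparisons joined by AND into nested and(...) calls.
   * Limitation: splitting on AND ignores quoting, so a literal containing " and " breaks it.
   */
  static String convert(String filter) {
    List<String> predicates = new ArrayList<>();
    for (String term : filter.split("(?i)\\s+AND\\s+")) {
      Matcher m = TERM.matcher(term);
      if (!m.matches()) {
        throw new UnsupportedOperationException("Unsupported filter term: " + term.trim());
      }
      predicates.add(operationFor(m.group(2)) + "(" + m.group(1) + ", " + m.group(3) + ")");
    }
    // Fold left: a AND b AND c becomes and(and(a, b), c).
    String result = predicates.get(0);
    for (int i = 1; i < predicates.size(); i++) {
      result = "and(" + result + ", " + predicates.get(i) + ")";
    }
    return result;
  }

  public static void main(String[] args) {
    // prints: and(equal(colA, SUCCESS), lessThan(date, 2025-05-06))
    System.out.println(convert("\"colA\" = 'SUCCESS' AND \"date\" < '2025-05-06'"));
  }
}
```

Iceberg can evaluate such a predicate against file-level column statistics to skip whole data files, and then again per row. That is why a single expression filters both files and rows during the scan.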