-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Switch usage to DataFileSet / DeleteFileSet #11158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6b35421 to
9d75725
Compare
5c25dd3 to
10150bf
Compare
1e0b413 to
54cffb4
Compare
4dbf58c to
73638a6
Compare
70dcd1a to
6fc4e35
Compare
1e6dfd7 to
8002680
Compare
8002680 to
a2688b8
Compare
95d12ef to
08ccda9
Compare
88e1d58 to
52c5899
Compare
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
| private final CharSequenceSet newDataFilePaths = CharSequenceSet.empty(); | ||
| private final CharSequenceSet newDeleteFilePaths = CharSequenceSet.empty(); | ||
| private final DataFileSet newDataFiles = DataFileSet.create(); | ||
| private final DeleteFileSet newDeleteFiles = DeleteFileSet.create(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need these extra collections? Can't we use sets in newDataFilesBySpec and newDeleteFilesBySpec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm handling this already in 4d42f18. I just didn't want to introduce too many changes/refactorings as the PR is already quite large
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @nastra offline, overall feel like the change is good but I think it's worth running some benchmarks as a sanity check. There really shouldn't be much of a change afterwards but it's more so to make sure there's not some regression we're not expecting.
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/ManifestFilterManager.java
Outdated
Show resolved
Hide resolved
52c5899 to
ae3bcac
Compare
83c6052 to
0df0066
Compare
0df0066 to
c91f326
Compare
@amogh-jahagirdar I added benchmark results to the PR description |
|
|
||
| // delete file2 | ||
| table.newDelete().deleteFile(file2.path()).commit(); | ||
| table.newDelete().deleteFile(file2).commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change required? There should be no behavior change, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a required change, but I wanted to move existing places to the non-path-based delete path
| table.newAppend().appendFile(file1).appendFile(file2).commit(); | ||
|
|
||
| // delete file2 | ||
| table.newDelete().deleteFile(file2.path()).commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a required change, but I wanted to move existing places to the non-path-based delete path
| private final PartitionSet deleteFilePartitions; | ||
| private final PartitionSet dropPartitions; | ||
| private final CharSequenceSet deletePaths = CharSequenceSet.empty(); | ||
| private final Set<F> deleteFiles = newFileSet(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we co-locate deleteFiles and deleteFilePartitions as deleteFilePartitions essentially contains a set of partitions derived from deleteFiles.
|
Actually, the comments are minor and I verified that changes in tests are cosmetic locally. I'll go ahead and merge this one to unblock the subsequent PR. |
|
Thanks, @nastra! Thanks for reviewing, @amogh-jahagirdar! |
Uh oh!
There was an error while loading. Please reload this page.