-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Implement FanoutPositionDeleteWriter #3377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jackye1995
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall looks good to me, just some nit in testing.
|
Thanks for the review @jackye1995 ! |
|
@marton-bod, what is the use case for this type of writers? The spec requires the position delete records to be sorted by |
|
|
||
| @Override | ||
| protected FileWriter<PositionDelete<T>, DeleteWriteResult> newWriter(PartitionSpec spec, StructLike partition) { | ||
| return new RollingPositionDeleteWriter<>(writerFactory, fileFactory, io, targetFileSizeInBytes, spec, partition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support rolling ORC writers? Other writers such as ClusteredPositionDeleteWriter have a check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does not fail right now as we disable ORC tests in TestPartitioningWriters. We can enable them now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it seems that some of these tests can be re-enabled for ORC now, whereas previously they has an initial Assume to not run against ORC.
As an example:
| Assume.assumeFalse("ORC delete files are not supported", fileFormat == FileFormat.ORC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since ORC: Add DeleteWriteBuilder for format v2 (#3250) went in, we don't need any special treatment for ORC format in the DeleteWriters. That means we could remove the Assume.assumeFalse() checks as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about enabling them in a separate PR?
|
This looks mostly good to me. We need to fix ORC writers and enable appropriate tests. I'd like to also know a little bit more about the use case. |
|
Thanks for reviewing @aokolnychyi! You're right about the spec mandating that delete entries must be sorted by As for the I think there might still be some utility in keeping this writer implementation as well for the problems similar to the one described above, but I'll leave it up to you. What do you think? |
|
@marton-bod, I would rather not include this if we don't know that it is definitely needed. Otherwise it would be easy to start using it even though there are better ways to prepare the data so that it isn't needed. Does that make sense? If there's a way around adding a class, then we should avoid adding it. |
|
Fanout writers may be inefficient so I'd rather use That being said, we should probably still enable ORC tests in |
|
@rdblue @aokolnychyi Makes sense! Thanks for reviewing, I'll close this PR. |
Implement new
FanoutPositionDeleteWriter, similar to the already-existingFanoutDataWriterclass.This new DeleteWriter implementation should provide an alternative to
ClusteredPositionDeleteWriterfor those users who are not inserting their data clustered by partition spec/partition values.@aokolnychyi @openinx @pvary - can you please take a look? Thanks!