fix delegated smoosh writer and some new facilities for segment writeout medium #12132
Merged
clintropolis merged 2 commits into apache:master on Jan 11, 2022
…out medium

changes:
* fixed issue with delegated `SmooshedWriter` when writing files that look like paths, causing `NoSuchFileException` exceptions when attempting to open a channel to the file
* `FileSmoosher.addWithSmooshedWriter` when _not_ delegating now checks that it is still open when closing, making it a no-op if already closed (allowing column serializers to add additional files and avoid delegated mode if they are finished writing out their own content and need to add additional files)
* add `makeChildWriteOutMedium` to `SegmentWriteOutMedium` interface, which allows users of a shared medium to clean up `WriteOutBytes` if they fully control the lifecycle. there are no callers of this yet, adding for future functionality
* `OnHeapByteBufferWriteOutBytes` can now be marked as not open, so `OnHeapMemorySegmentWriteOutMedium` can now behave identically to other medium implementations
This pull request fixes 1 alert when merging b09a563 into 2299eb3 - view on LGTM.com fixed alerts:
maytasm approved these changes Jan 8, 2022
cheddar approved these changes Jan 10, 2022
Contributor
cheddar
left a comment
One kinda nitty comment. Other than that, LGTM
private List<File> filesInProcess = new ArrayList<>();
// delegated smooshedWriter creates a new temporary file per file added. we use a counter to name these temporary
// files, and map the file number to the file name so we don't have to escape the file names (e.g. names with '/')
private int delegateFileCounter = 0;
Contributor
While this is likely only ever used in a single-threaded context, the extra overhead of an AtomicLong is basically nothing and it makes for a bit nicer guarantees, so I'd suggest using that instead.
Member
Author
this class definitely isn't thread-safe, but i don't have a strong preference either way so went ahead and changed it
This pull request fixes 1 alert when merging e734b7b into 2a41b7b - view on LGTM.com fixed alerts:
sachinsagare pushed a commit to pinterest/druid that referenced this pull request Oct 28, 2022
…out medium (apache#12132)

* fix delegated smoosh writer and some new facilities for segment writeout medium

changes:
* fixed issue with delegated `SmooshedWriter` when writing files that look like paths, causing `NoSuchFileException` exceptions when attempting to open a channel to the file
* `FileSmoosher.addWithSmooshedWriter` when _not_ delegating now checks that it is still open when closing, making it a no-op if already closed (allowing column serializers to add additional files and avoid delegated mode if they are finished writing out their own content and need to add additional files)
* add `makeChildWriteOutMedium` to `SegmentWriteOutMedium` interface, which allows users of a shared medium to clean up `WriteOutBytes` if they fully control the lifecycle. there are no callers of this yet, adding for future functionality
* `OnHeapByteBufferWriteOutBytes` can now be marked as not open, so `OnHeapMemorySegmentWriteOutMedium` can now behave identically to other medium implementations

* fix to address nit - use AtomicLong

(cherry picked from commit 7cf9192)
sachinsagare pushed a commit to sachinsagare/druid that referenced this pull request Nov 2, 2022
Description
This PR fixes a bug in `FileSmoosher` when using `addWithSmooshedWriter` in "delegated" mode, which writes to temporary files until the "parent" `SmooshedWriter` is closed, for file names which look like valid paths, e.g. 'foo/bar'.

Prior to the changes in this PR, the added test case would explode with a `NoSuchFileException` when attempting to open a channel to the temporary file. The fix uses a counter to make the temp file name, and a map of the counter value to the actual file name when merging the delegated files back into the parent file.
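
As a rough illustration of the counter-plus-map idea, here is a minimal standalone sketch. It is not the code from this PR: the class `CounterNamedTempFiles`, its methods, and the `delegate-N.tmp` naming scheme are all hypothetical, and it only shows why naming temp files by a counter sidesteps logical names that contain '/'.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration only; this is NOT Druid's FileSmoosher.
final class CounterNamedTempFiles
{
  private final Path baseDir;
  // Counter used to name temp files (AtomicLong, as suggested in review).
  private final AtomicLong counter = new AtomicLong(0);
  // Counter value -> logical file name, which may contain '/'.
  private final Map<Long, String> logicalNames = new LinkedHashMap<>();

  CounterNamedTempFiles(Path baseDir)
  {
    this.baseDir = baseDir;
  }

  // Writes content to a temp file named from the counter, never from the logical name.
  Path write(String logicalName, byte[] content) throws IOException
  {
    long fileNum = counter.getAndIncrement();
    logicalNames.put(fileNum, logicalName);
    Path tmp = tempPath(fileNum);
    Files.write(tmp, content);
    return tmp;
  }

  // Reads every temp file back under its logical name, then deletes the temp file.
  Map<String, byte[]> mergeAll() throws IOException
  {
    Map<String, byte[]> merged = new LinkedHashMap<>();
    for (Map.Entry<Long, String> entry : logicalNames.entrySet()) {
      Path tmp = tempPath(entry.getKey());
      merged.put(entry.getValue(), Files.readAllBytes(tmp));
      Files.delete(tmp);
    }
    logicalNames.clear();
    return merged;
  }

  private Path tempPath(long fileNum)
  {
    return baseDir.resolve("delegate-" + fileNum + ".tmp");
  }

  public static void main(String[] args) throws IOException
  {
    Path dir = Files.createTempDirectory("smoosh-sketch");
    CounterNamedTempFiles files = new CounterNamedTempFiles(dir);
    // Using "foo/bar" directly as a file name would require a "foo" directory to exist.
    Path tmp = files.write("foo/bar", "hello".getBytes(StandardCharsets.UTF_8));
    System.out.println(tmp.getFileName()); // prints "delegate-0.tmp"
    Map<String, byte[]> merged = files.mergeAll();
    System.out.println(new String(merged.get("foo/bar"), StandardCharsets.UTF_8)); // prints "hello"
  }
}
```

Because the on-disk name never includes the logical name, no escaping of characters such as '/' is needed; the map is the only place the two are associated.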
There are also a couple of other improvements for some future changes that I would like to make. The first is that `FileSmoosher.addWithSmooshedWriter`, when not delegating, now checks that it is still open inside of `close`, making it a no-op if already closed (allowing column serializers to add additional files and avoid delegated mode if they are finished writing out their own content and need to add additional files).

Additionally, `SegmentWriteOutMedium` has a new method, `makeChildWriteOutMedium`, that allows callers to free up `WriteOutBytes` when they are using a shared medium and are fully in control of the lifecycle of the `WriteOutBytes`. The contract of this method is that the child medium is registered to the closer of the parent, so it need not be explicitly closed, but it may also be explicitly closed to allow freeing the backing resources of `WriteOutBytes` created by the medium. When using a shared medium, such as is done when serializing columns during segment creation, these resources typically cannot be released until the medium itself is closed.

The type of use case this method is targeting is actually present in `IntermediateColumnarLongsSerializer`, which does not actually use the `SegmentWriteOutMedium` until `getSerializedSize`/`writeTo` is called, where the contents are immediately written to the channel. However, there is little benefit to changing it to use this method, because only a very small number of additional files are created when writing out each column, so there is not much to gain by releasing them early. Something like #10628 however, which creates a lot of files and immediately dumps them in the channel, could greatly reduce the number of open files on the system if there were, for example, a very large number of columns.

Finally, `OnHeapMemorySegmentWriteOutMedium` previously would never mark its `WriteOutBytes` as 'closed', meaning that callers could indefinitely write to one of these buffers even after the medium was closed. This seemed to allow breaking the contract of `SegmentWriteOutMedium` by using `WriteOutBytes` after it is closed; the underlying `ByteBufferWriteOutBytes` has methods to check that the medium is open, but the heap implementation had no way to set it as not open. I have changed this so that all implementations of `SegmentWriteOutMedium` behave identically by pushing the `free` method down to `ByteBufferWriteOutBytes` (and overriding it in the direct implementation).

This PR has: