Skip to content

Fix race between canHandle() and addSegment() in StorageLocation#8114

Merged
leventov merged 8 commits intoapache:masterfrom
jihoonson:fix-storage-location
Jul 27, 2019
Merged

Fix race between canHandle() and addSegment() in StorageLocation#8114
leventov merged 8 commits intoapache:masterfrom
jihoonson:fix-storage-location

Conversation

@jihoonson
Copy link
Copy Markdown
Contributor

@jihoonson jihoonson commented Jul 19, 2019

Description

Historicals can use multiple threads to load segments in parallel from deep storage now (#7088, #4966). This could lead an incorrect computation of remaining space in StorageLocation.

The current pattern to use StorageLocation is:

if (location.canHandle(segment)) {
  // load segment files into the location
  location.addSegment(segment);
}

Even though each of canHandle() and addSegment() is synchronized, they are not atomically executed which could lead a wrong estimation of the available space.

This PR fixes this problem to add a new method, reserve(), to StorageLocation. reserve() basically does what canHandle() and addSegment() atomically. If some error occurs after reserve(), the caller should call remove(segment) to release the reserved space.


This PR has:

  • been self-reviewed.
  • added unit tests or modified existing tests to cover new code paths.

@jihoonson jihoonson added the Bug label Jul 19, 2019
@leventov
Copy link
Copy Markdown
Member

@jihoonson please don't remove "this PR has been reviewed with concurrency checklist" item from the list since this PR does have relation to concurrency.

@leventov leventov self-requested a review July 20, 2019 07:51
@jihoonson
Copy link
Copy Markdown
Contributor Author

@leventov added back.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm 👍


public boolean canHandle(DataSegment segment)
/**
* This method is available for only unit tests. Production code must use {@link #reserve} instead.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This javadoc is semi confusing because reserve calls canHandle?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to make it clear.

* In native parallel indexing, phase 1 tasks store segment files in local storage of middleManagers
* and phase 2 tasks read those files via HTTP.
*
* <p>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: were these <p> added accidentally?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, these are actually inserted by Intellij's auto indentation based on the code style. It seems legit?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think there is maybe some reason why this isn't in the code style - i remember being told to remove these in the past and to manually adjust intellij to not insert them, I'll see if I can find the reason.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason we usually avoid them is that they look silly, and also aren't adding anything useful since Druid javadocs are mostly read in source form anyway. It's not a big deal either way. Maybe one day someone will add a style rule for it and get rid of them all.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Removed it.

* Remove a segment dir from this location. The segment size is subtracted from currSize.
*/
public synchronized void addSegmentDir(File segmentDir, DataSegment segment)
public synchronized void removeSegmentDir(File segmentDir, DataSegment segment)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking about it a bit, it doesn't seem particularly beneficial for StorageLocation to have to be aware of DataSegment and SegmentId, since DataSegment is only used as a shorthand to get the size and SegmentId, and the SegmentId is only used for logging. It seems like reworking this might make the implementation of StorageLocation a bit more simple and allow IntermediaryDataManager and SegmentLoaderLocalCacheManager to use the same reserve and remove methods. I would consider this optional however, up to you if you want to investigate.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already looked into that way. The thing is, the way to use StorageLocation is different in IntermediaryDataManager and SegmentLoaderLocalCacheManager. In IntermediaryDataManager, the compressed segment files are stored and registered in StorageLocation. In SegmentLoaderLocalCacheManager, the uncompressed segment directories are registered in StorageLocation. This led to use different size computations in removeFile(File) and removeSegmentDir(File, DataSegment). I wanted to make sure that the caller must be aware of what it's registering and call the right method.

I guess we could have an abstract StorageLocation class and its two implementations for different use cases. But I'm not sure it's super beneficial at this point because StorageLocation is still pretty simple.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, sgtm 👍

return;
}
}
catch (Exception e) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can catch narrower exception type?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to print something in the log file so that operators could notice that something happened.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally considered an antipattern to catch everything just in case unless the exception is rethrown in the catch block. Could catch only IOExceptions?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was confused with what part you were saying. It sounds good. Fixed.

}
}
catch (Exception e) {
// Print only log here to try other locations as well.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Print only log" - a strange phrase. Perhaps it should be just "Only log"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.


private final File path;
private final long maxSize;
private final long maxSize; // in bytes
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can call this field maxSizeBytes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private final long maxSize; // in bytes
private final long freeSpaceToKeep;

// Set of files stored under the given path. All accesses must be synchronized with currSize.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please turn this comment into a Javadoc comment.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private final Set<File> files = new HashSet<>();

private volatile long currSize = 0;
// Current total size of files in bytes. All accesses must be synchronized with files.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


private volatile long currSize = 0;
// Current total size of files in bytes. All accesses must be synchronized with files.
private long currSize = 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"All accesses must be synchronized with files." can be replaced with @GuardedBy("this"). Other fields in this class should be annotated, too. See https://github.com/code-review-checklists/java-concurrency#guarded-by

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can call this field currSizeBytes

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good convention. Added.

boolean canHandle(SegmentId segmentId, long segmentSize)
{
if (available() < segment.getSize()) {
if (available() < segmentSize) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to call this method "availableSizeBytes"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

* This method is available for only unit tests. Production code must use {@link #reserve} instead.
*/
@VisibleForTesting
boolean canHandle(SegmentId segmentId, long segmentSize)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be annotated @GuardedBy("this"). See https://github.com/code-review-checklists/java-concurrency#guarded-by

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍

* Remove a segment dir from this location. The segment size is subtracted from currSize.
*/
public synchronized void addSegmentDir(File segmentDir, DataSegment segment)
public synchronized void removeSegmentDir(File segmentDir, DataSegment segment)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation, sgtm 👍

@jihoonson
Copy link
Copy Markdown
Contributor Author

@leventov do you have more comments on this pr?

return;
}
}
catch (Exception e) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is generally considered an antipattern to catch everything just in case unless the exception is rethrown in the catch block. Could catch only IOExceptions?

// Current total size of files in bytes. All accesses must be synchronized with files.
private long currSize = 0;
/**
* Current total size of files in bytes. All accesses must be synchronized with files.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"All accesses must be synchronized with files." sentence is strange. It seems to me that it should be just removed, GuardedBy already says everything needed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.


// Set of files stored under the given path. All accesses must be synchronized with currSize.
/**
* Set of files stored under the given path. All accesses must be synchronized with currSizeBytes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same about "All accesses must be synchronized with currSizeBytes."

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be under {@link #path}.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private static final EmittingLogger log = new EmittingLogger(SegmentLoaderLocalCacheManager.class);
private static final Comparator<StorageLocation> COMPARATOR = (left, right) ->
Longs.compare(right.available(), left.available());
Longs.compare(right.availableSizeBytes(), left.availableSizeBytes());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparator.comparingLong(StorageLocation::availableSizeBytes).reversed() would be clearer

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

private static final Comparator<StorageLocation> COMPARATOR = (left, right) ->
Longs.compare(right.availableSizeBytes(), left.availableSizeBytes());
private static final Comparator<StorageLocation> COMPARATOR = Comparator
.comparing(StorageLocation::availableSizeBytes)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be comparingLong?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah. Fixed.

@leventov leventov merged commit adf7baf into apache:master Jul 27, 2019
@clintropolis clintropolis added this to the 0.16.0 milestone Aug 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants