-
Notifications
You must be signed in to change notification settings - Fork 478
fixes split bug related to files w/o data for tablet #5833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Attempting to split a tablet that had files that did not have data for the
tablet would cause an error. There were two bugs. First bug was the
splits code would fail if a file went to zero child tablets. Second bug
was if a file had a fence range that was disjoint from data in the file,
then the FencedRFile code would fail. This happened be cause the code
would compute a range where the start was after the end.
Both of these situations can occur over time with concurrent splits, merges, and
bulk imports. For example the following could happen.
1. bulk import calculates tablets tha files go to
2. split add more tablets
3. bulk import adds files to the ranges it calculated before the split
happened. This could result in a tablet pointing to a file that has
no data for it.
4. Tablets are merged and fence ranges are added. If the file has no
data in the tablet range, then the fence range will be disjoint w/
the range of data in the file.
To fix this a new FileRange class was added that represents a tablet
range or an empty range. This code replaces two method for getting a
files first and last row that returned null when the file was empty.
The null was really confusing, explicitly representing empty in the
class makes the code easier to understand.
Using this new FileRange class, the split code and fenced rfile code
were fixed.
These problems were found when running the bulk randomwalk test.
| } | ||
|
|
||
| @Override | ||
| public Text getFirstRow() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where one of the two bugs was. This method and getLastRow could compute a last row that was before the start row when the fence did not overlap the first and last row in the file. Now the new code should return an empty FileRange for this case.
| @Override | ||
| public Text getFirstRow() throws IOException { | ||
| if (currentReaders.length == 0) { | ||
| return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This old code would return null when the file was empty. Now it returns something that explicitly means empty. Other code seemed confused about what null meant and would expand it to an infinite range, which was harmless and resulted in files with no data going to all children tablets when a tablet split. The new code returns some more specific than null to denote empty file.
| double numOverlapping = newTablets.keySet().stream().map(KeyExtent::toDataRange) | ||
| .filter(range -> range.clip(fileRange, true) != null).count(); | ||
|
|
||
| Preconditions.checkState(numOverlapping > 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was one of the two bugs fixed. This code would throw an exception when a file overlapped zero child tablets during tablet split. However there are legitimate reasons this can happen. The new code logs a debug that the file does not overlap any tablets and drops the file.
Looked back at what 2.1 code did for this case. 2.1 code only splits a tablets into two tablets. In main it splits into 2 or more tablets in a single operation. The 2.1 code always assigned a file to one tablet or both when splitting, it never dropped a file.
|
When running bulk random walk test yesterday the test failed with data corruption. Not sure if these changes are the cause because without these changes the test usually fails because it hits the bugs this is fixing. So could be an existing problem. Plan to keep running the test. Also plan to merge #5786 into this branch before running more test, saw the failure w/o #5786 merged. |
| this.empty = false; | ||
| } | ||
|
|
||
| public FileRange expand(FileRange other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment about what this method does would be useful. I'm assuming that this method expands this FileRange to cover another?
core/src/main/java/org/apache/accumulo/core/iteratorsImpl/system/SequenceFileIterator.java
Outdated
Show resolved
Hide resolved
…em/SequenceFileIterator.java Co-authored-by: Dave Marion <dlmarion@apache.org>
Attempting to split a tablet that had files that did not have data for the tablet would cause an error. There were two bugs. First bug was the splits code would fail if a file went to zero child tablets. Second bug was if a file had a fence range that was disjoint from data in the file, then the FencedRFile code would fail. This happened be cause the code would compute a range where the start was after the end.
Both of these situations can occur over time with concurrent splits, merges, and bulk imports. For example the following could happen.
To fix this a new FileRange class was added that represents a tablet range or an empty range. This code replaces two method for getting a files first and last row that returned null when the file was empty. The null was really confusing, explicitly representing empty in the class makes the code easier to understand.
Using this new FileRange class, the split code and fenced rfile code were fixed.
These problems were found when running the bulk randomwalk test.