fixes split bug related to files w/o data for tablet #5833

keith-turner · 2025-08-27T19:10:57Z

Attempting to split a tablet that had files with no data for the tablet would cause an error. There were two bugs. First, the split code would fail if a file went to zero child tablets. Second, if a file had a fence range that was disjoint from the data in the file, the FencedRFile code would fail. This happened because the code would compute a range whose start was after its end.

Both of these situations can occur over time with concurrent splits, merges, and bulk imports. For example, the following could happen:

1. Bulk import calculates the tablets that files go to.
2. A split adds more tablets.
3. Bulk import adds files to the ranges it calculated before the split happened. This could result in a tablet pointing to a file that has no data for it.
4. Tablets are merged and fence ranges are added. If the file has no data in the tablet range, then the fence range will be disjoint from the range of data in the file.
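The first three steps of the sequence can be sketched with a toy model. This is only an illustration under simplifying assumptions: rows are plain strings, a tablet covers `(prevEndRow, endRow]`, and `StaleBulkImportSketch`, `Tablet`, `run`, and the file name `I0001.rf` are hypothetical names, not Accumulo code. The merge and fencing in the last step are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model; not Accumulo's bulk import code. */
final class StaleBulkImportSketch {

  /** A tablet covering rows in (prevEndRow, endRow]; a null bound means unbounded. */
  static final class Tablet {
    final String prevEndRow;
    final String endRow;
    final List<String> files = new ArrayList<>();

    Tablet(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }

    /** True if this tablet lies entirely inside the range (prev, end]. */
    boolean isWithin(String prev, String end) {
      boolean startsInside =
          prev == null || (prevEndRow != null && prevEndRow.compareTo(prev) >= 0);
      boolean endsInside = end == null || (endRow != null && endRow.compareTo(end) <= 0);
      return startsInside && endsInside;
    }

    /** True if any row in the inclusive range [first, last] falls in this tablet. */
    boolean overlaps(String first, String last) {
      return (prevEndRow == null || last.compareTo(prevEndRow) > 0)
          && (endRow == null || first.compareTo(endRow) <= 0);
    }
  }

  /** Returns the tablets that end up referencing the file without holding any of its rows. */
  static List<Tablet> run() {
    // The imported file only has data for rows "d" through "f".
    String fileFirst = "d";
    String fileLast = "f";

    // Step 1: bulk import sees a single tablet (-inf, "m"] and plans to load into that range.
    String plannedPrev = null;
    String plannedEnd = "m";

    // Step 2: a concurrent split at "h" replaces that tablet with two tablets.
    List<Tablet> tablets =
        List.of(new Tablet(null, "h"), new Tablet("h", "m"), new Tablet("m", null));

    // Step 3: bulk import adds the file to every tablet inside the stale planned range.
    List<Tablet> withoutData = new ArrayList<>();
    for (Tablet t : tablets) {
      if (t.isWithin(plannedPrev, plannedEnd)) {
        t.files.add("I0001.rf");
        if (!t.overlaps(fileFirst, fileLast)) {
          withoutData.add(t);
        }
      }
    }
    return withoutData;
  }

  public static void main(String[] args) {
    for (Tablet t : run()) {
      System.out.println(
          "(" + t.prevEndRow + ", " + t.endRow + "] references " + t.files + " but has none of its rows");
    }
  }
}
```

In this model the tablet `("h", "m"]` ends up pointing at a file whose rows all sort before `"h"`, which is the state that later trips up a split or a fenced read.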

To fix this, a new FileRange class was added that represents a tablet range or an empty range. It replaces two methods for getting a file's first and last row, which returned null when the file was empty. The null was really confusing; explicitly representing empty in the class makes the code easier to understand.
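As a rough illustration of the idea, a range type with an explicit empty state might look like the sketch below. This is not the FileRange class from this PR: `FileRangeSketch`, its methods, and the use of `String` rows with inclusive bounds are assumptions made to keep the example self-contained.

```java
/** Hypothetical sketch only; the real FileRange in Accumulo may look quite different. */
final class FileRangeSketch {
  private final String firstRow; // inclusive; null only when empty
  private final String lastRow; // inclusive; null only when empty
  private final boolean empty;

  private FileRangeSketch(String firstRow, String lastRow, boolean empty) {
    this.firstRow = firstRow;
    this.lastRow = lastRow;
    this.empty = empty;
  }

  /** A file (or fenced view of a file) that contains no rows. */
  static FileRangeSketch empty() {
    return new FileRangeSketch(null, null, true);
  }

  /** A file whose data spans the inclusive row range [firstRow, lastRow]. */
  static FileRangeSketch of(String firstRow, String lastRow) {
    if (firstRow.compareTo(lastRow) > 0) {
      throw new IllegalArgumentException("first row sorts after last row");
    }
    return new FileRangeSketch(firstRow, lastRow, false);
  }

  boolean isEmpty() {
    return empty;
  }

  /** True if any row falls in the tablet range (prevEndRow, endRow]; null bounds are unbounded. */
  boolean overlapsTablet(String prevEndRow, String endRow) {
    if (empty) {
      return false; // empty is handled explicitly instead of being guessed from null
    }
    return (prevEndRow == null || lastRow.compareTo(prevEndRow) > 0)
        && (endRow == null || firstRow.compareTo(endRow) <= 0);
  }

  public static void main(String[] args) {
    System.out.println(of("d", "f").overlapsTablet(null, "h"));
    System.out.println(of("d", "f").overlapsTablet("h", "m"));
    System.out.println(empty().overlapsTablet(null, null));
  }
}
```

With this shape, callers ask `isEmpty()` or `overlapsTablet(...)` rather than interpreting a null first or last row, so an empty file cannot be mistaken for an unbounded one.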

Using this new FileRange class, the split code and fenced rfile code were fixed.

These problems were found when running the bulk randomwalk test.


keith-turner · 2025-08-28T15:32:02Z

core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java

    }

    @Override
-    public Text getFirstRow() throws IOException {


This is where one of the two bugs was. This method and getLastRow could compute a last row that was before the start row when the fence did not overlap the first and last row in the file. Now the new code should return an empty FileRange for this case.
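The failure mode described in this comment can be shown with a small sketch. It assumes inclusive `String` row bounds and uses hypothetical names (`FenceClipSketch`, `Rows`, `clipNaive`, `clip`); the actual fenced reader works on Accumulo ranges and returns a FileRange, with `Optional.empty()` standing in for the empty case here.

```java
import java.util.Optional;

/** Hypothetical sketch of clipping a file's data range to a fence; not the RFile code. */
final class FenceClipSketch {

  /** An inclusive row range [first, last]. */
  static final class Rows {
    final String first;
    final String last;

    Rows(String first, String last) {
      this.first = first;
      this.last = last;
    }
  }

  static String max(String a, String b) {
    return a.compareTo(b) >= 0 ? a : b;
  }

  static String min(String a, String b) {
    return a.compareTo(b) <= 0 ? a : b;
  }

  /** Naive clipping: when data and fence are disjoint the result has first after last. */
  static Rows clipNaive(Rows data, Rows fence) {
    return new Rows(max(data.first, fence.first), min(data.last, fence.last));
  }

  /** Clipping that reports a disjoint fence as empty instead of an inverted range. */
  static Optional<Rows> clip(Rows data, Rows fence) {
    Rows clipped = clipNaive(data, fence);
    if (clipped.first.compareTo(clipped.last) > 0) {
      return Optional.empty();
    }
    return Optional.of(clipped);
  }

  public static void main(String[] args) {
    Rows data = new Rows("d", "f");
    Rows disjointFence = new Rows("p", "t");
    Rows naive = clipNaive(data, disjointFence);
    System.out.println("naive: [" + naive.first + ", " + naive.last + "]");
    System.out.println("checked is empty: " + clip(data, disjointFence).isEmpty());
  }
}
```

For data in `[d, f]` and a fence of `[p, t]`, the naive version yields a first row of `p` and a last row of `f`, which is the inverted range the comment describes.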

keith-turner · 2025-08-28T15:34:07Z

core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java

    @Override
-    public Text getFirstRow() throws IOException {
-      if (currentReaders.length == 0) {
-        return null;


This old code would return null when the file was empty. Now it returns something that explicitly means empty. Other code seemed confused about what null meant and would expand it to an infinite range. That was harmless, but it resulted in files with no data going to all child tablets when a tablet split. The new code returns something more specific than null to denote an empty file.

keith-turner · 2025-08-28T15:37:49Z

server/manager/src/main/java/org/apache/accumulo/manager/tableOps/split/UpdateTablets.java

-      double numOverlapping = newTablets.keySet().stream().map(KeyExtent::toDataRange)
-          .filter(range -> range.clip(fileRange, true) != null).count();
-
-      Preconditions.checkState(numOverlapping > 0);


This was one of the two bugs fixed. This code would throw an exception when a file overlapped zero child tablets during a tablet split. However, there are legitimate reasons this can happen. The new code logs a debug message that the file does not overlap any tablets and drops the file.

I looked back at what the 2.1 code did for this case. The 2.1 code only splits a tablet into two tablets, whereas main splits a tablet into two or more tablets in a single operation. The 2.1 code always assigned a file to one or both tablets when splitting; it never dropped a file.
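The change in behavior can be sketched as follows. This is a simplified stand-in for the logic in UpdateTablets, not the real code: `SplitFileAssignmentSketch`, `Tablet`, and `assign` are hypothetical names, rows are plain strings, and `System.out.println` stands in for the debug log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of assigning a parent tablet's files to child tablets on split. */
final class SplitFileAssignmentSketch {

  /** A child tablet covering rows in (prevEndRow, endRow]; a null bound means unbounded. */
  static final class Tablet {
    final String prevEndRow;
    final String endRow;

    Tablet(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }

    /** True if any row in the inclusive range [first, last] falls in this tablet. */
    boolean overlaps(String first, String last) {
      return (prevEndRow == null || last.compareTo(prevEndRow) > 0)
          && (endRow == null || first.compareTo(endRow) <= 0);
    }
  }

  /**
   * Maps each file name to the children it overlaps. fileRows maps a file name to its
   * {firstRow, lastRow}. Files that overlap no child are left out of the result.
   */
  static Map<String, List<Tablet>> assign(Map<String, String[]> fileRows, List<Tablet> children) {
    Map<String, List<Tablet>> result = new LinkedHashMap<>();
    for (Map.Entry<String, String[]> entry : fileRows.entrySet()) {
      String first = entry.getValue()[0];
      String last = entry.getValue()[1];
      List<Tablet> overlapping = new ArrayList<>();
      for (Tablet child : children) {
        if (child.overlaps(first, last)) {
          overlapping.add(child);
        }
      }
      if (overlapping.isEmpty()) {
        // The old code treated this as an illegal state and threw; here the file is dropped.
        System.out.println("File " + entry.getKey() + " overlaps no child tablet, dropping it");
        continue;
      }
      result.put(entry.getKey(), overlapping);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Tablet> children = List.of(new Tablet("h", "j"), new Tablet("j", "m"));
    Map<String, String[]> files = new LinkedHashMap<>();
    files.put("A.rf", new String[] {"d", "f"}); // no data in the parent tablet's range
    files.put("B.rf", new String[] {"i", "k"}); // spans both children
    System.out.println(assign(files, children).keySet());
  }
}
```

Here `A.rf` has no rows in either child of the parent `("h", "m"]`, so it is dropped, while `B.rf` is assigned to both children.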

keith-turner · 2025-08-28T15:44:29Z

When running the bulk random walk test yesterday, the test failed with data corruption. I'm not sure if these changes are the cause, because without them the test usually fails by hitting the bugs this PR fixes, so it could be an existing problem. I plan to keep running the test. I also plan to merge #5786 into this branch before running more tests; I saw the failure without #5786 merged.

dlmarion · 2025-08-28T16:12:34Z

core/src/main/java/org/apache/accumulo/core/file/FileSKVIterator.java

+      this.empty = false;
+    }
+
+    public FileRange expand(FileRange other) {


A comment about what this method does would be useful. I'm assuming that this method expands this FileRange to cover another?

core/src/main/java/org/apache/accumulo/core/iteratorsImpl/system/SequenceFileIterator.java


keith-turner added 2 commits August 27, 2025 18:54

Merge remote-tracking branch 'upstream/main' into fix-split-bug

8976da5

keith-turner mentioned this pull request Aug 27, 2025

adds ranges to table locks #5786

Merged

keith-turner commented Aug 28, 2025


keith-turner added 2 commits August 28, 2025 15:51

Merge branch 'main' into fix-split-bug

ac5fe26

format pom

977753b

dlmarion approved these changes Aug 28, 2025


keith-turner and others added 3 commits August 28, 2025 17:05

fix bug w/ apache#5786 changes found w/ rwalk

20540f8

Update core/src/main/java/org/apache/accumulo/core/iteratorsImpl/syst…

c7764df

…em/SequenceFileIterator.java Co-authored-by: Dave Marion <dlmarion@apache.org>

update docs and rename methods

d3329a7

keith-turner merged commit 72584da into apache:main Aug 28, 2025
8 checks passed

keith-turner deleted the fix-split-bug branch August 28, 2025 17:47

ctubbsii added this to the 4.0.0 milestone Aug 29, 2025
