Fixes reindexing bug with filter on long column#13386
Conversation
clintropolis
left a comment
There was a problem hiding this comment.
good find 👍
i believe this can also be a query time issue in certain cases since NotFilter and a few other things also use the method which is known to double close the time column
| currBufferNum = -1; | ||
| holder.close(); | ||
| holder = null; | ||
| buffer = null; |
There was a problem hiding this comment.
nit: I guess currBufferNum is the only thing that truly matters here, but just to be consistent maybe we should also null out longBuffer since we are doing that for everything else.
Additionally, while I am not aware of anywhere which is double closing any float or double columns, they do suffer from the same flaw of not resetting currBufferNum when closed, and so could also read from a buffer that has been returned to the pool if that happened,
- https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/data/BlockLayoutColumnarFloatsSupplier.java#L183
- https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/segment/data/BlockLayoutColumnarDoublesSupplier.java#L183
There was a problem hiding this comment.
+1 for fixing the other classes too.
There was a problem hiding this comment.
Thanks for pointing that out, fixed other supplier classes too.
| segmentDirectory = temporaryFolder.newFolder(); | ||
| dimensionsSpec = new DimensionsSpec( | ||
| ImmutableList.of( | ||
| StringDimensionSchema.create("s"), |
There was a problem hiding this comment.
This is a bit of a nit, but "s" and "str" and "stringCol" are all not too very long, but the longer ones actually convey more information on their own. I'd suggest either using "str" and "dbl" or "strCol" and "dblCol" or "stringCol" and "doubleCol", just to be more nice to the next person who reads the code.
There was a problem hiding this comment.
Just focused on refactoring the test to be used by custom dimensions and metrics spec. Updated in the latest commit.
| currBufferNum = -1; | ||
| holder.close(); | ||
| holder = null; | ||
| buffer = null; |
There was a problem hiding this comment.
+1 for fixing the other classes too.
clintropolis
left a comment
There was a problem hiding this comment.
👍 after CI (looks like spotbugs check is complaining about missing @Nullable on things https://app.travis-ci.com/github/apache/druid/jobs/589008836#L873)
|
Fixed spot bugs. Thanks, @clintropolis, for the review. |
* fixes BlockLayoutColumnarLongs close method to nullify internal buffer. * fixes other BlockLayoutColumnar supplier close methods to nullify internal buffers. * fix spotbugs (cherry picked from commit b091b32)
Description
Filtering on a long column while reindexing results in
__timecolumn values to be parsed the same as this long column.BlockLayoutColumnarLongsclose method is not nullifying the internal buffer. Hence the__timeLongBufferis overridden by this filtering column.DruidSegment input rows:
Reindex spec used:
Re-indexing (before changes):
The 2 final rows are thrownAway since internally, while loading
__timeLongBuffer, it's being overridden by longCol values while creatingFilteredOffsetbyQueryableIndexCursorSequenceBuilder.Re-indexing (after changes):
Key changed/added classes in this PR
BlockLayoutColumnarLongsSupplierThis PR has: