Fix num_rows in sys.segments #6888
* Fix segmentMetadataInfo update in DruidSchema
* Add numRows to SegmentMetadataHolder builder's constructor, so it's not overwritten
* Rename setSegmentSignature to setSegmentMetadataHolder and fix it so the nested map is appended instead of recreated
* Replace Map<String, Set<String>> segmentServerMap with Set<String> for num_replica
@surekhasaharan would you please fix the conflicts?
| // DataSource -> Segment -> SegmentMetadataHolder(contains RowSignature) for that segment. | ||
| // Use TreeMap for segments so they are merged in deterministic order, from older to newer. | ||
| // This data structure need to be accessed in a thread-safe way since SystemSchema accesses it | ||
| // This data structure need to be accessed in a thread-safe way via lock Object |
I think it's still better to say why the concurrency issue happens too.
| { | ||
| Map<DataSegment, SegmentMetadataHolder> segmentsMetadata = schema.getSegmentMetadata(); | ||
| Set<DataSegment> segments = segmentsMetadata.keySet(); | ||
| Assert.assertEquals(segments.size(), 3); |
The arguments have an order. The expected value should be the first argument.
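The argument order matters mainly for the failure message. A minimal sketch, assuming a hypothetical stand-in `assertEquals` helper (not JUnit itself) that formats its message the way JUnit 4 does, shows how swapped arguments would produce a misleading report when the schema actually holds 2 segments but 3 were expected:

```java
final class AssertOrderDemo
{
  // Hypothetical stand-in for JUnit's Assert.assertEquals(long expected, long actual).
  static void assertEquals(long expected, long actual)
  {
    if (expected != actual) {
      throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
    }
  }

  // Returns the failure message produced by the call, or null if the assertion passed.
  static String failureMessage(long expected, long actual)
  {
    try {
      assertEquals(expected, actual);
      return null;
    }
    catch (AssertionError e) {
      return e.getMessage();
    }
  }
}
```

With the expected value first, `failureMessage(3, 2)` reads "expected:<3> but was:<2>". With the arguments swapped, the same failure reads "expected:<2> but was:<3>", which suggests the code under test returned 3 when it did not.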
| segmentsMetadata = schema.getSegmentMetadata(); | ||
| existingSegment = segments.stream().findFirst().orElse(null); | ||
| final SegmentMetadataHolder currentHolder = segmentsMetadata.get(existingSegment); | ||
| Assert.assertEquals(updatedHolder.getNumRows(), currentHolder.getNumRows()); |
Please check other fields as well.
this test was for numRows, but sure can add other fields as well.
I think this kind of unit test is meant to check the entire state of a snapshot rather than a particular value of the snapshot. There's no reason to check individual values in different tests under the same situation.
| } | ||
|
|
||
| @Test | ||
| public void testSegmentMetadataHolderNumRows() |
Would you please add a comment about what this method is testing? It's not intuitive from the code.
| Set<DataSegment> segments = segmentsMetadata.keySet(); | ||
| Assert.assertEquals(segments.size(), 3); | ||
| DataSegment existingSegment = segments.stream().findFirst().orElse(null); | ||
| Assert.assertFalse(existingSegment == null); |
Assert.assertNotNull(existingSegment)
| ImmutableDruidServer server = null; | ||
| for (ImmutableDruidServer druidServer : druidServers) { | ||
| for (DataSegment segment : druidServer.getSegments()) { | ||
| if (segment == existingSegment) { |
It looks like this is to find a druidServer holding existingSegment. Why not
druidServers.stream()
.flatMap(druidServer -> druidServer.getSegments().stream())
.filter(segment -> segment.equals(existingSegment))
.findAny()
.orElse(null)? You also may want to make existingSegment final.
Also, I don't think you're comparing these segments with == for performance reasons. Please use equals() instead.
changed to use streams api instead of nested loops
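For reference, a sketch of the streams-based lookup, assuming the goal is the server rather than the segment. The flatMap pipeline quoted above yields the matching segment; to get the server that holds it, one option is to filter on the servers themselves. `Segment` and `Server` are hypothetical stand-ins for DataSegment and ImmutableDruidServer, not the Druid classes:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class FindServerDemo
{
  // Hypothetical stand-in for DataSegment; equality is based on the identifier only.
  static final class Segment
  {
    final String id;

    Segment(String id)
    {
      this.id = id;
    }

    @Override
    public boolean equals(Object o)
    {
      return o instanceof Segment && ((Segment) o).id.equals(id);
    }

    @Override
    public int hashCode()
    {
      return Objects.hash(id);
    }
  }

  // Hypothetical stand-in for ImmutableDruidServer.
  static final class Server
  {
    final String name;
    final List<Segment> segments;

    Server(String name, Segment... segments)
    {
      this.name = name;
      this.segments = Arrays.asList(segments);
    }
  }

  // Returns a server holding a segment equal to the target, or null if none does.
  static Server findServerHolding(List<Server> servers, Segment target)
  {
    return servers.stream()
                  .filter(server -> server.segments.stream().anyMatch(segment -> segment.equals(target)))
                  .findAny()
                  .orElse(null);
  }
}
```

Because the match uses equals(), a lookup with a separately constructed but equal segment still finds the server, which a == comparison would miss.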
| 2L, //partition_num | ||
| 1L, //num_replicas | ||
| 3L, //numRows | ||
| 2L, //numRows |
Would you please explain what this change is for?
this change is because I created another ROWS3 and index3 above, such that now (segment3, index3) is added to walker, to make the test code more intuitive.
|
|
||
| private void addSegment(final DruidServerMetadata server, final DataSegment segment) | ||
| @VisibleForTesting | ||
| protected void addSegment(final DruidServerMetadata server, final DataSegment segment) |
Would this be better as package private instead of protected to minimize scope? Since there's no subclasses in other packages, is there a need to be protected?
ah yeah, no need for this and others to be protected, can be changed to default package access.
|
|
||
| private void setSegmentSignature(final DataSegment segment, final SegmentMetadataHolder segmentMetadataHolder) | ||
| @VisibleForTesting | ||
| protected void setSegmentMetadataHolder(final DataSegment segment, final SegmentMetadataHolder segmentMetadataHolder) |
Could this also be package-private?
| public long getNumReplicas() | ||
| { | ||
| return segmentServerMap.get(segmentId).size(); | ||
| return segmentServers.size(); |
Why do we return long in this method when Set#size() returns an int? Might not be worth changing because it might blow up other stuff.
It used to be int, and then got changed to long at some point. The reason is because we support nulls for long here, but not for ints ? Not very sure...but will likely leave it to be long here.
jihoonson
left a comment
@surekhasaharan thanks for the quick fix. I left some trivial comments.
| // Use TreeMap for segments so they are merged in deterministic order, from older to newer. | ||
| // This data structure need to be accessed in a thread-safe way since SystemSchema accesses it | ||
| // This data structure need to be accessed in a thread-safe way via lock Object since segments can be added, | ||
| //removed, refreshed or accessed asynchronously |
nit: please add a space after //.
ok, yeah will add at all the places
| .orElse(null); | ||
| Assert.assertNotNull(existingSegment); | ||
| final SegmentMetadataHolder existingHolder = segmentsMetadata.get(existingSegment); | ||
| //update SegmentMetadataHolder of existingSegment with numRows=5 |
nit: please make the style of comments consistent. I think it's better to add a space after // because most of comments do.
| private static final int MAX_SEGMENTS_PER_QUERY = 15000; | ||
| private static final long IS_PUBLISHED = 0; | ||
| private static final long IS_AVAILABLE = 1; | ||
| private static final long NUM_ROWS = 0; |
These three should really be called DEFAULT_IS_PUBLISHED, DEFAULT_IS_AVAILABLE, and DEFAULT_NUM_ROWS.
| // Use TreeMap for segments so they are merged in deterministic order, from older to newer. | ||
| // This data structure need to be accessed in a thread-safe way since SystemSchema accesses it | ||
| // This data structure need to be accessed in a thread-safe way via lock Object | ||
| private final Map<String, TreeMap<DataSegment, SegmentMetadataHolder>> segmentMetadataInfo = new HashMap<>(); |
Instead of this comment, you could add @GuardedBy("lock"). It sends the same message, and could also help with automated bug detection.
good suggestion, thanks.
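A minimal sketch of what the annotation-based version could look like. The nested `GuardedBy` annotation here is a local stand-in declared only so the example compiles without dependencies; real code would import an existing annotation such as javax.annotation.concurrent.GuardedBy. The field name and the simplified Map<String, Long> shape are assumptions for illustration, not the DruidSchema code:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.HashMap;
import java.util.Map;

final class GuardedByDemo
{
  // Local stand-in annotation (assumption); it documents which lock protects a member.
  @Retention(RetentionPolicy.CLASS)
  @Target({ElementType.FIELD, ElementType.METHOD})
  @interface GuardedBy
  {
    String value();
  }

  private final Object lock = new Object();

  // The annotation replaces a prose comment such as "access via lock Object".
  @GuardedBy("lock")
  private final Map<String, Long> numRowsBySegment = new HashMap<>();

  void setNumRows(String segmentId, long numRows)
  {
    synchronized (lock) {
      numRowsBySegment.put(segmentId, numRows);
    }
  }

  long getNumRows(String segmentId)
  {
    synchronized (lock) {
      return numRowsBySegment.getOrDefault(segmentId, 0L);
    }
  }
}
```

The stand-in annotation only documents intent. Static analysis tools that understand a real @GuardedBy annotation can additionally flag accesses made without holding the named lock.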
jihoonson
left a comment
LGTM after CI. @surekhasaharan thanks!
* Fix the bug with num_rows in sys.segments
* Fix segmentMetadataInfo update in DruidSchema
* Add numRows to SegmentMetadataHolder builder's constructor, so it's not overwritten
* Rename setSegmentSignature to setSegmentMetadataHolder and fix it so the nested map is appended instead of recreated
* Replace Map<String, Set<String>> segmentServerMap with Set<String> for num_replica
* Remove unnecessary code and update test
* Add unit test for num_rows
* PR comments
* change access modifier to default package level
* minor changes to comments
* PR comments
The num_rows column of sys.segments was incorrectly reported in some cases. It happened because the segmentMetadataInfo in DruidSchema was overwritten in addSegment when the segment had more than one replica. This PR:
* Fixes the segmentMetadataInfo update in DruidSchema
* Adds numRows to SegmentMetadataHolder builder's constructor, so it's not overwritten
* Renames setSegmentSignature to setSegmentMetadataHolder and fixes it so the nested map is appended instead of recreated
* Replaces Map<String, Set<String>> segmentServerMap with Set<String> for num_replica
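The overwrite described above can be illustrated with a simplified sketch. This is not the DruidSchema code: the class, the String keys, and the Long row counts are assumptions standing in for DataSegment and SegmentMetadataHolder. It contrasts an add that resets the entry on every replica announcement with one that only creates a default entry when the segment is not yet known:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class NestedMapUpdateDemo
{
  // dataSource -> segmentId -> numRows; a simplified, hypothetical stand-in for segmentMetadataInfo.
  final Map<String, TreeMap<String, Long>> info = new HashMap<>();

  // Problematic pattern: every replica announcement resets numRows to the default of 0.
  void addSegmentOverwriting(String dataSource, String segmentId)
  {
    info.computeIfAbsent(dataSource, ds -> new TreeMap<>()).put(segmentId, 0L);
  }

  // Safer pattern: a default entry is created only if the segment is not yet known.
  void addSegment(String dataSource, String segmentId)
  {
    info.computeIfAbsent(dataSource, ds -> new TreeMap<>()).putIfAbsent(segmentId, 0L);
  }

  // Appends to the existing inner map instead of replacing it with a fresh one.
  void setNumRows(String dataSource, String segmentId, long numRows)
  {
    info.computeIfAbsent(dataSource, ds -> new TreeMap<>()).put(segmentId, numRows);
  }

  long getNumRows(String dataSource, String segmentId)
  {
    TreeMap<String, Long> segments = info.get(dataSource);
    return segments == null ? 0L : segments.getOrDefault(segmentId, 0L);
  }
}
```

In this sketch, announcing a second replica through `addSegment` after `setNumRows` leaves the stored row count intact, while `addSegmentOverwriting` drops it back to 0. That mirrors the symptom reported for segments with more than one replica.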