Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -232,7 +232,10 @@ private Set<SegmentIdWithShardSpec> getPendingSegmentsForIntervalWithHandle(
final ResultIterator<byte[]> dbSegments =
handle.createQuery(
StringUtils.format(
"SELECT payload FROM %1$s WHERE dataSource = :dataSource AND start <= :end and %2$send%2$s >= :start",
// This query might fail if the year has a different number of digits
// See https://github.com/apache/druid/pull/11582 for a similar issue
// Using long for these timestamps instead of varchar would give correct time comparisons
"SELECT payload FROM %1$s WHERE dataSource = :dataSource AND start < :end and %2$send%2$s > :start",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the only material change in the patch: what's the idea behind changing <= to < and what's the expected benefit?

Copy link
Copy Markdown
Contributor Author

@kfaraz kfaraz Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are trying to avoid selecting segments with intervals that only abut the search interval and don't have an actual overlap. Such intervals eventually get filtered out via the check on interval.overlaps(identifier.getInterval()) but only after the costly deserialization.

I have updated the description, which gives an example of such an interval.
This is still not a complete fix as the query can return incorrect results in cases where a string comparison of the two intervals is not feasible.

dbTables.getPendingSegmentsTable(), connector.getQuoteString()
)
)
Expand Down Expand Up @@ -584,7 +587,7 @@ private SegmentIdWithShardSpec allocatePendingSegmentWithSegmentLineageCheck(
.asBytes()
);

insertToMetastore(
insertPendingSegmentIntoMetastore(
handle,
newIdentifier,
dataSource,
Expand Down Expand Up @@ -662,7 +665,7 @@ private SegmentIdWithShardSpec allocatePendingSegment(
);

// always insert empty previous sequence id
insertToMetastore(handle, newIdentifier, dataSource, interval, "", sequenceName, sequenceNamePrevIdSha1);
insertPendingSegmentIntoMetastore(handle, newIdentifier, dataSource, interval, "", sequenceName, sequenceNamePrevIdSha1);

log.info("Allocated pending segment [%s] for sequence[%s] in DB", newIdentifier, sequenceName);

Expand Down Expand Up @@ -742,7 +745,7 @@ private static class CheckExistingSegmentIdResult
}
}

private void insertToMetastore(
private void insertPendingSegmentIntoMetastore(
Handle handle,
SegmentIdWithShardSpec newIdentifier,
String dataSource,
Expand Down Expand Up @@ -943,7 +946,7 @@ private SegmentIdWithShardSpec createNewSegment(

return new SegmentIdWithShardSpec(
dataSource,
overallMaxId.getInterval(),
interval,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to emphasize that the interval used in the SegmentIdWithShardSpec is always the same as the one passed to the method createNewSegment.

Preconditions.checkNotNull(newSegmentVersion, "newSegmentVersion"),
partialShardSpec.complete(
jsonMapper,
Expand Down Expand Up @@ -1468,4 +1471,5 @@ public int removeDataSourceMetadataOlderThan(long timestamp, @NotNull Set<String
}
);
}

}