Skip tombstone segment refresh in metadata cache #17025
cryptoe merged 11 commits into apache:master
| return null; | ||
| } | ||
| return segmentMetadataInfo.get(datasource).get(segmentId); | ||
| return segmentMetadataInfo.getOrDefault(datasource, new ConcurrentSkipListMap<>()).get(segmentId); |
Why would you create a new object if it's not used? Avoiding that means less GC.
Isn't the older code more performant?
Reverted this change.
My original branch was outdated. There was a race in the original implementation, which was fixed in #16981.
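To illustrate the allocation concern raised above, here is a minimal sketch, not the actual Druid code. The class `SegmentLookupSketch`, its `String`-typed keys and values, and both method names are hypothetical stand-ins. Java evaluates method arguments before the call, so the `getOrDefault` variant constructs an empty map on every lookup, even when the datasource is present. The null-check variant reads the outer map once and allocates nothing.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-in for the cache: datasource -> (segmentId -> metadata).
class SegmentLookupSketch {
  private final ConcurrentMap<String, ConcurrentSkipListMap<String, String>> segmentMetadataInfo =
      new ConcurrentHashMap<>();

  void put(String datasource, String segmentId, String metadata) {
    segmentMetadataInfo
        .computeIfAbsent(datasource, ds -> new ConcurrentSkipListMap<>())
        .put(segmentId, metadata);
  }

  // The default argument is constructed before getOrDefault runs,
  // so an empty map is allocated on every call, hit or miss.
  String lookupWithDefault(String datasource, String segmentId) {
    return segmentMetadataInfo
        .getOrDefault(datasource, new ConcurrentSkipListMap<>())
        .get(segmentId);
  }

  // Reads the outer map once and allocates nothing.
  String lookupWithNullCheck(String datasource, String segmentId) {
    ConcurrentSkipListMap<String, String> segments = segmentMetadataInfo.get(datasource);
    return segments == null ? null : segments.get(segmentId);
  }
}
```

Both variants return `null` for an unknown datasource or segment; they differ only in the per-call allocation.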
| // Additionally, segment metadata queries, which are not yet implemented for tombstone segments | ||
| // (see: https://github.com/apache/druid/pull/12137) do not provide metadata for tombstones, | ||
| // leading to indefinite refresh attempts for these segments. | ||
| Set<SegmentId> segmentsWithoutTombstone = |
Why do we need to materialize this?
Why can't we return an iterable that just skips the tombstone segments?
We can always increment the counters there, no?
I have refactored the code so that the stream is not materialized.
For the count, I have added a terminal operation, but that does not materialize the entire stream.
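One way to read the reviewer's suggestion is a lazy filter that counts what it skips. This is a sketch under assumptions, not the code from this PR: `LazyTombstoneFilterSketch`, its nested `Segment` type, and `withoutTombstones` are hypothetical names. The filter runs only when the caller consumes the stream, so no intermediate collection is built, and the counter reflects only the elements actually traversed.

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

class LazyTombstoneFilterSketch {
  // Hypothetical minimal segment type, for illustration only.
  static final class Segment {
    private final String id;
    private final boolean tombstone;

    Segment(String id, boolean tombstone) {
      this.id = id;
      this.tombstone = tombstone;
    }

    String getId() {
      return id;
    }

    boolean isTombstone() {
      return tombstone;
    }
  }

  // Returns a lazy view that skips tombstones. Nothing is filtered or
  // counted until a terminal operation consumes the returned stream.
  static Stream<Segment> withoutTombstones(Collection<Segment> segments, AtomicInteger skipped) {
    return segments.stream().filter(segment -> {
      if (segment.isTombstone()) {
        skipped.incrementAndGet();
        return false;
      }
      return true;
    });
  }
}
```

The side effect inside `filter` is safe here because the stream is sequential and the counter is atomic; the count is only meaningful after the stream has been fully consumed.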
I changed the approach yesterday. Instead of filtering out the tombstone segments at the end, just before refresh, I now ensure they are never marked for refresh.
A segment is marked for refresh in the following scenarios:
- Segment is added.
- Datasource signature is built and schema for segment is missing.
- Metadata for the segment is fetched and schema for the segment is missing.
I have ensured that a tombstone segment never gets marked for refresh in any of these cases.
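The approach described above can be sketched as a single guard that every marking path goes through. This is an illustrative sketch, not the code merged in this PR: `RefreshMarkerSketch`, its nested `Segment` type, and the method names are hypothetical. The tombstone stays wherever it is stored; it simply never enters the refresh set.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RefreshMarkerSketch {
  // Hypothetical minimal segment type, for illustration only.
  static final class Segment {
    private final String id;
    private final boolean tombstone;

    Segment(String id, boolean tombstone) {
      this.id = id;
      this.tombstone = tombstone;
    }

    String getId() {
      return id;
    }

    boolean isTombstone() {
      return tombstone;
    }
  }

  // Ids of segments whose metadata still needs to be fetched.
  private final Set<String> segmentsNeedingRefresh = ConcurrentHashMap.newKeySet();

  // Single entry point for all marking paths (segment added, schema missing
  // while building the datasource signature, schema missing after a fetch).
  // Returns true if the segment was marked, false if it was skipped.
  boolean markForRefreshIfNotTombstone(Segment segment) {
    if (segment.isTombstone()) {
      return false;
    }
    segmentsNeedingRefresh.add(segment.getId());
    return true;
  }

  Set<String> getSegmentsNeedingRefresh() {
    return Collections.unmodifiableSet(segmentsNeedingRefresh);
  }
}
```

Routing every scenario through one method keeps the tombstone check in one place, so a new marking path cannot forget it.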
| markSegmentAsNeedRefresh(segmentId); | ||
| log.debug("SchemaMetadata for segmentId [%s] is absent.", segmentId); | ||
|
|
||
| if (entry.getValue().getSegment().isTombstone()) { |
This block of code is repeated multiple times.
Could it be extracted into a method such as markForRefreshIfNotTombstone?
Refactored it and also added a check for unused segments.
We do not want to mark an unused segment encountered while fetching metadata for refresh.
Parent issue: #14989
PR #16890 introduced a change to skip adding tombstone segments to the cache.
As a side effect, tombstone segments appear unavailable in the console. This happens because the availability of a segment in the Broker is determined from the metadata cache.
The fix is to keep tombstone segments in the metadata cache but skip them during refresh.
Skipping refresh doesn't affect any functionality: a segment metadata query for a tombstone returns an empty result, which would otherwise cause those segments to be refreshed continuously.