Skip to content

Do not allow double-wrap of segment reference in ReferenceCountingSegment#17943

Merged
clintropolis merged 10 commits intoapache:masterfrom
cecemei:test
May 7, 2025
Merged

Do not allow double-wrap of segment reference in ReferenceCountingSegment#17943
clintropolis merged 10 commits intoapache:masterfrom
cecemei:test

Conversation

@cecemei
Copy link
Copy Markdown
Contributor

@cecemei cecemei commented Apr 22, 2025

This PR is a follow-up of discussion.

We've decided to not allow ReferenceCountingSegment wrap another SegmentReference. Places that double-wraps:

  • DartDataSegmentProvider, which created a double-wrapped segment via using ReferenceCountingSegment in CompleteSegment. Using ReferenceCountingSegment in CompleteSegment should also be forbidden.
  • Tests in SegmentManagerTest, which is an easy fix.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Apr 23, 2025
@cecemei cecemei marked this pull request as ready for review April 24, 2025 00:35
@cecemei cecemei changed the title test segment Do not allow double-wrap of segment reference in ReferenceCountingSegment Apr 24, 2025
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree that ReferenceCountingSegment shouldn't be able to wrap another SegmentReference since that would make it easy to make mistakes of not being correctly be tied to the reference counting of the wrapped segment.

Comment on lines +99 to +102
return new ReferenceCountingResourceHolder<>(new CompleteSegment(
null,
Objects.requireNonNull(segment.getBaseSegment())
), closer);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is worth a comment about what is going and why we are stripping the reference counting from the segment, however i'm not sure it is the ideal way to handle this, see other comment on CompleteSegment

Copy link
Copy Markdown
Contributor

@gianm gianm Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a clearer comment would be:

// It isn't necessary to pass down the SegmentReference to CompleteSegment, because we've
// already called segment.acquireReferences() and have attached the releaser to "closer".

Btw, a side note I have realized as part of thinking about this: in native, we apply the segment map fn first and then acquire a segment reference; whereas with Dart we acquire the reference first and then apply the segment map fn. IMO, the way we do it in Dart makes more sense. If native worked the same way, then segment map fns could use Segment rather than SegmentReference, and the call to ReferenceCountingSegment.wrapRootGenerationSegment in BaseLeafFrameProcessor wouldn't necessary. (There is no need to change this right now, it's fine, I'm just making an observation.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts: it might be nice to introduce a BaseSegment and make LookupSegment, QueryableIndexSegment extends that, also restricts ReferenceCountingSegment and CompleteSegment wrap with that. And it might be nice for ReferenceCountingSegment.acquireReferences to return a closable segment that would close up the reference, so the code doesn't have to deal with the closer logic.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, a side note I have realized as part of thinking about this: in native, we apply the segment map fn first and then acquire a segment reference; whereas with Dart we acquire the reference first and then apply the segment map fn. IMO, the way we do it in Dart makes more sense. If native worked the same way, then segment map fns could use Segment rather than SegmentReference, and the call to ReferenceCountingSegment.wrapRootGenerationSegment in BaseLeafFrameProcessor wouldn't necessary. (There is no need to change this right now, it's fine, I'm just making an observation.)

I think if we did this it is maybe confusing for SegmentReference to exist at all (which is fair, it is kind of confusing), and instead should just be a ReferenceCountedObject that doesn't implement Segment and we use to get a Segment. This sounds like a pretty big change, though I guess would make things less ambiguous of whether or not we are dealing with the "real" Segment or the reference counting wrapper.

I do think that one nice thing of the way things are right now is that any bugs caused by having a stale reference to a reference counting segment that has actually been closed are maybe a bit clearer what is going on by using the SegmentReference as a proxy for operating on Segment because it returns null for all of the things if the backing segment has been closed/unmapped/dropped instead of what i imagine to be a variety of different errors if you had a stale reference to the real Segment object and tried to do things with it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re the stale reference, I think it might make more sense if we add isClosed to Segment interface. I feel the Segment is defined too broadly, and we should narrow the usage into:

  • BaseSegment, the real segment, close this usually means unlink the mmap.
  • ReferenceCountingSegment, this should only live in data nodes, places we actually care about the # of registered parties. It expose acquireSegmentReference, which registers a reference and returns a CloseableSegmentReference, close this means the base segment would be unlinked after all parties deregister, also acquireSegmentReference would return empty.
  • ClosableSegmentReference, this should replace the current SegmentReference, close this means deregister itself, but should have no direct impact to the base segment. segmentMapFn should operate on this.

Comment thread processing/src/main/java/org/apache/druid/segment/ReferenceCountingSegment.java Outdated
Comment thread processing/src/main/java/org/apache/druid/segment/CompleteSegment.java Outdated
Comment on lines +99 to +102
return new ReferenceCountingResourceHolder<>(new CompleteSegment(
null,
Objects.requireNonNull(segment.getBaseSegment())
), closer);
Copy link
Copy Markdown
Contributor

@gianm gianm Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO a clearer comment would be:

// It isn't necessary to pass down the SegmentReference to CompleteSegment, because we've
// already called segment.acquireReferences() and have attached the releaser to "closer".

Btw, a side note I have realized as part of thinking about this: in native, we apply the segment map fn first and then acquire a segment reference; whereas with Dart we acquire the reference first and then apply the segment map fn. IMO, the way we do it in Dart makes more sense. If native worked the same way, then segment map fns could use Segment rather than SegmentReference, and the call to ReferenceCountingSegment.wrapRootGenerationSegment in BaseLeafFrameProcessor wouldn't necessary. (There is no need to change this right now, it's fine, I'm just making an observation.)

Comment thread processing/src/main/java/org/apache/druid/segment/ReferenceCountingSegment.java Outdated
@cecemei cecemei requested review from clintropolis and gianm April 29, 2025 17:46
@clintropolis clintropolis merged commit c8b43ac into apache:master May 7, 2025
74 checks passed
@capistrant capistrant added this to the 34.0.0 milestone Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Segment Format and Ser/De GHA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants