Skip to content

Non-table segment should return null for getId#17960

Merged
kfaraz merged 15 commits intoapache:masterfrom
cecemei:segment
May 3, 2025
Merged

Non-table segment should return null for getId#17960
kfaraz merged 15 commits intoapache:masterfrom
cecemei:segment

Conversation

@cecemei
Copy link
Copy Markdown
Contributor

@cecemei cecemei commented Apr 29, 2025

Update Segment.getId() with @Nullable annotation. RowBasedSegment (used by external/lookup/inline segment) and FrameSegment returns null for getId(). Only segment backed by a table should have a SegmentId.

Besides that,

  • added a InMemoryFakeSegment class in TestSegmentUtils for easier testing.

This PR is a follow-up to #17774 (review).

Key changed/added classes in this PR
  • Segment
  • ArrayListSegment
  • FrameSegment

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@github-actions github-actions Bot added Area - Batch Ingestion Area - Segment Format and Ser/De Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Apr 29, 2025
@cecemei cecemei marked this pull request as ready for review April 30, 2025 22:51
@cecemei cecemei changed the title Null SegmentId for non-table segment Non-table segment should return null for getId Apr 30, 2025
Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new approach is certainly much cleaner than passing around a bunch of SegmentId.dummy() objects.

I just wonder if there was any advantage to keeping a segment ID for these cases.
But from the patch, it looks like the dummy segment ID didn't serve any purpose anyway. So I think we should be good.

Left some suggestions on the changes.

((ExternalSegment) segment).externalInputSource().toString()
);
} else if (segment instanceof LookupSegment) {
return StringUtils.format("lookup input source: %s", segment.getId().getDataSource());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we still return this string without the segment id?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated this

.build();

public LookupSegment(final String lookupName, final LookupExtractorFactory lookupExtractorFactory)
public LookupSegment(final LookupExtractorFactory lookupExtractorFactory)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably keep the lookup name argument and use it in asString method of this class.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated to keep lookupName and return it in asString

Comment thread processing/src/main/java/org/apache/druid/segment/ReferenceCountingSegment.java Outdated
Comment thread processing/src/main/java/org/apache/druid/segment/Segment.java Outdated
Comment thread processing/src/test/java/org/apache/druid/segment/TestSegmentUtils.java Outdated
Comment thread processing/src/test/java/org/apache/druid/segment/TestSegmentUtils.java Outdated
/**
* A test segment that is backed by a {@link RowBasedSegment}. This is used to test the {@link QueryableIndexSegment}.
*/
public static class InMemoryFakeSegment extends QueryableIndexSegment
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static class InMemoryFakeSegment extends QueryableIndexSegment
public static class InMemoryTestSegment extends QueryableIndexSegment

@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented May 1, 2025

The main purpose is to be able to identify whether a segment is backed by a table or not, which seems like might be better if we just return null for getId(), non-table segment should not have a valid SegmentId. I opened a new PR to implement it: #17960. This PR is therefore deprecated.

@cecemei , another more obvious way to achieve this would be to simply add a new boolean method isBackedByTable() or similar to the Segment interface.
I am not sure if you have already evaluated that option.

@cecemei
Copy link
Copy Markdown
Contributor Author

cecemei commented May 1, 2025

The main purpose is to be able to identify whether a segment is backed by a table or not, which seems like might be better if we just return null for getId(), non-table segment should not have a valid SegmentId. I opened a new PR to implement it: #17960. This PR is therefore deprecated.

@cecemei , another more obvious way to achieve this would be to simply add a new boolean method isBackedByTable() or similar to the Segment interface. I am not sure if you have already evaluated that option.

We discussed that briefly, it would work but felt a bit unnecessary to just add a method for that. We've also considered using as(PhysicalSegmentInspector.class) to return non-null for the purpose, but it felt a bit mis-use. SegmentId feels like the right place to change that.

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

Comment thread processing/src/main/java/org/apache/druid/query/lookup/LookupSegment.java Outdated
…egment.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
Comment thread processing/src/test/java/org/apache/druid/query/lookup/LookupSegmentTest.java Outdated
@kfaraz kfaraz merged commit 26b568d into apache:master May 3, 2025
74 checks passed
@capistrant capistrant added this to the 34.0.0 milestone Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Area - Segment Format and Ser/De

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants