Skip to content

feat: Bounded Stream Supervisor#19372

Open
aho135 wants to merge 9 commits intoapache:masterfrom
aho135:bounded-stream-supervisor
Open

feat: Bounded Stream Supervisor#19372
aho135 wants to merge 9 commits intoapache:masterfrom
aho135:bounded-stream-supervisor

Conversation

@aho135
Copy link
Copy Markdown
Contributor

@aho135 aho135 commented Apr 24, 2026

Description

Introduces a new property to the Stream Supervisor spec IOConfig called boundedStreamConfig which allows operators to specify start and end offset ranges for short-lived supervised ingestion. This property modifies the main Supervisor run loop to only ingest from and monitor partitions specified in the boundedStreamConfig. After the offset range has been consumed the Supervisor will transition into a terminal state (COMPLETED). The motivation for this PR came out of #19191 which submits backfill tasks that are unsupervised. One this change is merged, 19191 can be enhanced to use the boundedStreamConfig so that the backfill tasks are supervised.

Release note

Adds a property called boundedStreamConfig to the SeekableStreamSupervisorIOConfig which allows operators to spin up a Supervisor that consumes only a specified offset range.


Key changed/added classes in this PR
  • SeekableStreamSupervisorIOConfig
  • BoundedStreamConfig
  • SeekableStreamSupervisor

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@aho135 aho135 changed the title Bounded stream supervisor feat: Bounded stream supervisor Apr 24, 2026
@aho135 aho135 changed the title feat: Bounded stream supervisor feat: Bounded Stream Supervisor Apr 24, 2026
@aho135 aho135 requested a review from abhishekrb19 April 24, 2026 17:48
exclusiveStartSequenceNumberPartitions,
generateSequenceName(
unfilteredStartingSequencesForSequenceName == null
? startingSequences

@Override
protected String createPartitionIdFromString(String partitionIdString)
{

@Override
protected String createPartitionIdFromString(String partitionIdString)
{
return partitionIdString;
}

@Override
return partitionIdString;
}

@Override
@Override
protected boolean isOffsetAtOrBeyond(String current, String target)
{
return Long.parseLong(current) >= Long.parseLong(target);
@Override
protected boolean isOffsetAtOrBeyond(String current, String target)
{
return Long.parseLong(current) >= Long.parseLong(target);
Copy link
Copy Markdown
Contributor

@abhishekrb19 abhishekrb19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aho135 - took a quick glance and the approach looks good to me overall!
I’m still going through some of the main files and just checkpointing my review. Do you think it would be possible to add a simple embedded test with the new config for some end-to-end coverage?

this.startSequenceNumbers = Preconditions.checkNotNull(startSequenceNumbers, "startSequenceNumbers");
this.endSequenceNumbers = Preconditions.checkNotNull(endSequenceNumbers, "endSequenceNumbers");

// Validation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a guard rail, I think it'll be good to have stricter checks for each partition so there's no unintended behavior with incorrect ranges specified:
startSequenceNumbers < endSequenceNumbers

import java.util.Map;

/**
* Configuration for bounded (one-time) stream processing with explicit start/end offsets.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you document that the start & end offsets are inclusive?

)
{
this.startSequenceNumbers = Preconditions.checkNotNull(startSequenceNumbers, "startSequenceNumbers");
this.endSequenceNumbers = Preconditions.checkNotNull(endSequenceNumbers, "endSequenceNumbers");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we could throw DruidExceptions here with the admin persona for these validation errors here

SUSPENDED(true, false),
STOPPING(true, false);
STOPPING(true, false),
COMPLETED(true, false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, if it's possible to have a completed but unhealthy state? Or would that be UNHEALTHY_SUPERVISOR?

@Nullable @JsonProperty("emitTimeLagMetrics") Boolean emitTimeLagMetrics,
@Nullable @JsonProperty("serverPriorityToReplicas") Map<Integer, Integer> serverPriorityToReplicas
@Nullable @JsonProperty("serverPriorityToReplicas") Map<Integer, Integer> serverPriorityToReplicas,
@Nullable @JsonProperty("boundedStreamConfig") BoundedStreamConfig boundedStreamConfig
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do the bounds interplay with the supervisor’s behavior with other properties like taskDuration, completionTimeout, etc.?

I suppose if the bounds are too wide, tasks would still roll over and all existing properties would be honored as-is. The only additional behavioral change is when this optional config is specified and the end sequence numbers are reached, the supervisor eventually moves to a terminal COMPLETED state?

Copy link
Copy Markdown
Contributor

@abhishekrb19 abhishekrb19 Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would also be good to update some user-facing documentation in supervisor.md for this functionality

// Kinesis uses String as partition ID, so just return the string as-is
return partitionIdString;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Kinesis bounded completion compares offsets lexicographically

isOffsetAtOrBeyond uses current.compareTo(target), which compares Kinesis sequence numbers lexicographically. Kinesis sequence numbers are decimal strings whose numeric order is not generally the same as lexicographic order, especially across different digit lengths. A bounded supervisor can mark a shard complete before the numeric end offset is reached, or keep recreating tasks after it passed the end. Compare Kinesis offsets using their numeric value or the existing Kinesis sequence-number ordering semantics.

@@ -4255,6 +4418,23 @@ private OrderedSequenceNumber<SequenceOffsetType> getOffsetFromStorageForPartiti
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Bounded starts can be ignored when metadata exists

getOffsetFromStorageForPartition only falls back to boundedStreamConfig.startSequenceNumbers when no metadata/checkpoint offset exists. If a supervisor is reset or reconfigured with a requested bounded start while metadata storage still has an older offset, the stored metadata wins and the task starts from the stale position instead of the user-supplied bounded start. That can skip the requested backfill range or process a different interval than configured. Bounded mode should either clear/namespace old metadata for the run or explicitly prefer the configured start when initializing the bounded task group.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants