Conversation
Thanks @aho135 for the feature! I’ve left some high-level comments. I’m slightly wary of having the supervisor manage unsupervised backfill tasks in the background (similar to the limitations like task failures/resiliency/reporting you already called out in the PR summary).
As an alternative design, I wonder if it would be cleaner to leverage multi-supervisor support and spin up a new backfill supervisor via an API. For example, the high-level API could compute the offset ranges (similar to what's done in this PR), reset the running supervisor, and then submit a new backfill supervisor based on the active supervisor spec with a constructed transformSpec (similar to #18757 for reading topic partition-offset ranges). Another option could be introducing a separate ioConfig for a backfill supervisor, although that might be more involved and potentially overkill.
This would keep everything “supervised” and naturally address some of the limitations in the current design around task supervision. It would also provide a clear separation between the running and backfill supervisors from an operational standpoint.
What are your thoughts?
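To make the "compute the offset ranges" step concrete, here is a hypothetical sketch (class and method names are my own, not from this PR or the Druid codebase) of dividing the skipped per-partition offset ranges among a fixed number of backfill tasks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: partition the skipped [start, end) offset
// ranges among numTasks backfill tasks, round-robin by partition.
public class OffsetRangeSplit {
    static List<Map<Integer, long[]>> split(Map<Integer, long[]> skipped, int numTasks) {
        List<Map<Integer, long[]>> groups = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            groups.add(new TreeMap<>());
        }
        int i = 0;
        for (Map.Entry<Integer, long[]> e : skipped.entrySet()) {
            groups.get(i++ % numTasks).put(e.getKey(), e.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        // each value is the skipped {startOffset, endOffset} for that partition
        Map<Integer, long[]> skipped = new TreeMap<>();
        skipped.put(0, new long[]{100, 250});
        skipped.put(1, new long[]{90, 300});
        skipped.put(2, new long[]{40, 120});
        List<Map<Integer, long[]>> groups = split(skipped, 2);
        System.out.println(groups.get(0).keySet()); // partitions assigned to task 0
        System.out.println(groups.get(1).keySet()); // partitions assigned to task 1
    }
}
```

Each resulting group could then seed one backfill task's (or backfill supervisor's) start/end sequence numbers.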
```java
// Get the backfillTaskCount from config
int backfillTaskCount = spec.getSpec().getIOConfig().getBackfillTaskCount();
```
Is there a reason to include this in the supervisor's ioConfig? It feels like a property that should be supplied by the operator when invoking the API, rather than being defined in the supervisor's ioConfig. It could instead default to taskCount / 2 (or a similar value) as part of the API invocation.
```java
// Before
public Response reset(@PathParam("id") final String id)

// After
public Response reset(
    @PathParam("id") final String id,
    @QueryParam("backfill") Boolean backfill
```
IMO it would be cleaner to introduce a new endpoint, /{id}/resetAndBackfill, especially if we're adding more backfill-specific properties.
Thanks for the review @abhishekrb19!
I was debating the same, but felt that this doesn't really fit into the existing StreamSupervisor implementation for these reasons:
I found the
These are good points. I wonder if there's any reason not to expose start and end sequence numbers in the supervisor config. Doing so would let the supervisor transition to a terminal state once those values are explicitly set and the specified offset range has been successfully read and published. Also tagging @gianm @jtuglu1 @kfaraz - curious if you have any thoughts here.
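A rough sketch of how that terminal-state check might look (purely illustrative; none of these names come from the actual Druid codebase): if the spec carried explicit end sequence numbers, the supervisor could compare published offsets against them and shut itself down once every partition is done.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: decide whether a backfill supervisor has read and
// published the whole requested range and may transition to a terminal state.
public class BackfillCompletion {
    static boolean backfillComplete(Map<Integer, Long> publishedOffsets, Map<Integer, Long> endOffsets) {
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            Long published = publishedOffsets.get(e.getKey());
            // a partition is done once its published offset reaches the end offset
            if (published == null || published < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, Long> end = new TreeMap<>(Map.of(0, 250L, 1, 300L));
        System.out.println(backfillComplete(Map.of(0, 250L, 1, 299L), end)); // false: partition 1 not done
        System.out.println(backfillComplete(Map.of(0, 250L, 1, 300L), end)); // true: all partitions done
    }
}
```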
```java
// If backfillTaskCount is not provided, default to taskCount / 2
int taskCount = spec.getSpec().getIOConfig().getTaskCount();
int numBackfillTasks = backfillTaskCount != null ? backfillTaskCount : Math.max(1, taskCount / 2);
```
[P1] Reject non-positive backfill task counts before resetting offsets
backfillTaskCount comes directly from the new query parameter and can be 0 or negative. In that case, numTasks becomes non-positive; the code later divides by numTasks, catches the resulting exception, and silently skips backfill submission. Because resetSupervisorAndBackfill has already reset the supervisor metadata to latest by that point, the API can acknowledge a successful reset while leaving the skipped range un-backfilled. Validate backfillTaskCount > 0 before performing the reset.
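A minimal sketch of the fail-fast validation suggested above (the method and message strings are hypothetical, for illustration only):

```java
// Illustrative only: reject a non-positive backfillTaskCount *before*
// resetting supervisor offsets, so a bad parameter cannot leave the
// skipped range un-backfilled after the metadata has been reset.
public class BackfillValidation {
    static String resetAndBackfill(Integer backfillTaskCount, int taskCount) {
        int numTasks = backfillTaskCount != null ? backfillTaskCount : Math.max(1, taskCount / 2);
        if (numTasks <= 0) {
            // fail fast: no supervisor metadata has been touched yet
            return "400 Bad Request: backfillTaskCount must be positive";
        }
        // ... reset offsets and submit numTasks backfill tasks here ...
        return "200 OK: submitted " + numTasks + " backfill tasks";
    }

    public static void main(String[] args) {
        System.out.println(resetAndBackfill(0, 4));    // rejected before any reset
        System.out.println(resetAndBackfill(null, 4)); // defaults to taskCount / 2
    }
}
```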
For the initial implementation I've only done Kafka. I plan to implement Kinesis and Rabbit in subsequent PRs, followed by documentation for the new feature.
This change adds an endpoint called resetOffsetsAndBackfill to SupervisorResource. This is a useful feature for operating Druid clusters where the most recent data is the most important (such as alerting use cases).
Description
Adds an endpoint called resetOffsetsAndBackfill to automatically ingest skipped data in the case where the offset is reset to latest. This requires useEarliestOffset=false; useConcurrentLocks=true, because there can be conflicting time intervals between the backfill task and the main supervisor tasks; and useTransaction=false, in order to disable metadata updates. The supervisor also needs to be in a running state in order to call updatePartitionLagFromStream() to get the latest offsets.

In addition, this change supports unsupervised SeekableStreamIndexTasks by setting useTransaction: false in the SeekableStreamIndexTaskIOConfig. The backfill tasks are one-off task submissions, so the useTransaction flag was extended to disable any checkpointing (refer to changes in SeekableStreamIndexTaskRunner).

Limitations
Since these tasks are unsupervised, in the event of task failures it would be tricky for operators to determine the offset up to which a failed backfill task had consumed, in order to submit a new backfill task. Some kind of configurable retry policy for these tasks could be a useful follow-up.
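For reference, the flag combination described in the Description above might look roughly like the following ioConfig fragment (illustrative only: the topic name is made up, other required fields are elided, and the exact placement of each flag in the spec may differ):

```json
{
  "type": "kafka",
  "topic": "example-topic",
  "useEarliestOffset": false,
  "useConcurrentLocks": true,
  "useTransaction": false
}
```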
Release note
Adds an optional parameter to the Supervisor reset endpoint to backfill the skipped data when the stream is reset to latest.
Key changed/added classes in this PR
SupervisorResource
SupervisorManager
SeekableStreamSupervisor
KafkaSupervisor