Skip to content

Do not clear pendingCompletionTaskGroups in clearAllocationInfo#18715

Merged
kfaraz merged 6 commits intoapache:masterfrom
amaechler:fix-autoscaler-duplicate-history
Nov 8, 2025
Merged

Do not clear pendingCompletionTaskGroups in clearAllocationInfo#18715
kfaraz merged 6 commits intoapache:masterfrom
amaechler:fix-autoscaler-duplicate-history

Conversation

@amaechler
Copy link
Copy Markdown
Contributor

Description

Fixes a bug where the SeekableStream supervisor autoscaler creates duplicate history entries every minTriggerScaleActionFrequencyMillis (default 10min) during scale-down operations, causing database pollution and preventing scale-down from completing.

Lots of help from Claude.

Problem

When the autoscaler scales down tasks, clearAllocationInfo() prematurely clears pendingCompletionTaskGroups, causing the supervisor to "forget" about tasks transitioning from READING to PUBLISHING state. On the next supervisor cycle, these tasks are rediscovered and re-added to activelyReadingTaskGroups, triggering another scale-down attempt and creating a duplicate history entry. This repeats every minTriggerScaleActionFrequencyMillis (default: 10 minutes). I saw hundreds of duplicate history entries, with entries created at exact 10-minute intervals.

The root cause is that the autoscaler has a built-in safeguard (line 480-496) to skip scale actions when pendingCompletionTaskGroups is non-empty, but this check is ineffective because clearAllocationInfo() clears the map immediately after tasks were moved there.

Solution

Preserve pendingCompletionTaskGroups in clearAllocationInfo(). This allows the autoscaler's existing skip logic to function correctly, preventing duplicate scale attempts until tasks naturally complete (removed by checkPendingCompletionTasks() every supervisor cycle).

Release note

Fixed a bug in the SeekableStream supervisor autoscaler where scale-down operations would create duplicate supervisor history entries. The autoscaler now correctly waits for tasks to complete before attempting subsequent scale operations.


Key changed/added classes in this PR
  • SeekableStreamSupervisor - Modified clearAllocationInfo() to preserve pendingCompletionTaskGroups

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @amaechler !

I have left a couple of minor suggestions.
We might want to add an embedded test for this race condition. But that need not block this bugfix.

@amaechler
Copy link
Copy Markdown
Contributor Author

Thanks @kfaraz for taking the time to review! I updated the wording a bit based on your suggestions. I'm not sure about how I could rewrite the test to be more high-level, so I kept the actual test for now.

Copy link
Copy Markdown
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix, @amaechler !

partitionOffsets.clear();

pendingCompletionTaskGroups.clear();
// Note: We intentionally do NOT clear pendingCompletionTaskGroups here.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the original line of code has already been removed, this comment seems out of place here. It has already been called out in the javadoc anyway.

@kfaraz
Copy link
Copy Markdown
Contributor

kfaraz commented Nov 7, 2025

I'm not sure about how I could rewrite the test to be more high-level, so I kept the actual test for now.

Fair enough, we can address that and add an embedded test in follow up PRs.

@kfaraz kfaraz merged commit aec5c4a into apache:master Nov 8, 2025
127 of 130 checks passed
santosh-d3vpl3x pushed a commit to santosh-d3vpl3x/druid that referenced this pull request Dec 13, 2025
Changes:
- Do not clear `pendingCompletionTaskGroups` in `clearAllocationInfo`
- Add unit test
@kgyrtkirk kgyrtkirk added this to the 36.0.0 milestone Jan 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants