Fix flaky MergeRollupMinionClusterIntegrationTest gauge assertions #18260
Merged
xiangfu0 merged 1 commit into apache:master on Apr 20, 2026
Extends the polling pattern introduced in apache#18253 (for mergeRollupTaskNumBucketsToProcess) to the remaining five mergeRollupTaskDelayInNumBuckets.* gaugeExists checks in the same test class.

The gauge is registered by MergeRollupTaskGenerator.createOrUpdateDelayMetrics and removed by resetDelayMetrics when a scheduleTasks call observes no eligible segments. The per-iteration body's assertNull(scheduleTasks(context).get(RealtimeToOfflineSegmentsTask)) probe triggers an extra synchronized scheduleTasks that can race with the previous merge task's segment-lineage commit, transiently resetting the gauge and causing the post-loop assertTrue(gaugeExists(...)) to flake on the same window that apache#18253 addressed.

A new waitForGaugesToExist(String...) helper polls via TestUtils.waitForCondition with the existing TIMEOUT_IN_MS, and is used in testOfflineTableSingleLevelConcat, testOfflineTableSingleLevelConcatWithMetadataPush, testOfflineTableSingleLevelRollup, testOfflineTableMultiLevelConcat (both 45days + 90days atomically), and testRealtimeTableSingleLevelConcat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
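The register/reset lifecycle described in this commit message can be modelled in a few lines. The sketch below is a toy model, not Pinot code: the class name DelayGaugeLifecycle, the map-backed registry, the table name inside the gauge key, and the use of the segment count as the gauge value are all assumptions made for illustration. It also assumes that a later scheduling pass re-registers the gauge once it sees eligible segments again, which is what makes polling a workable fix.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model of the delay-gauge lifecycle. Hypothetical; not the Pinot implementation. */
public class DelayGaugeLifecycle {
  // Hypothetical stand-in for the controller's metrics registry.
  private final Map<String, Long> gauges = new ConcurrentHashMap<>();

  /** Models a single scheduling pass over one table. */
  public synchronized void scheduleTasks(String gaugeName, List<String> eligibleSegments) {
    if (eligibleSegments.isEmpty()) {
      // Models the resetDelayMetrics branch: no eligible segments, gauge removed.
      gauges.remove(gaugeName);
    } else {
      // Models createOrUpdateDelayMetrics: gauge registered or refreshed.
      // The stored value is a placeholder, not the real delay computation.
      gauges.put(gaugeName, (long) eligibleSegments.size());
    }
  }

  /** One-shot existence check, analogous to a single gaugeExists call. */
  public boolean gaugeExists(String gaugeName) {
    return gauges.containsKey(gaugeName);
  }

  public static void main(String[] args) {
    DelayGaugeLifecycle model = new DelayGaugeLifecycle();
    // Assumed gauge key; "myTable_OFFLINE" is a made-up table name.
    String gauge = "mergeRollupTaskDelayInNumBuckets.myTable_OFFLINE.45days";

    // A pass that sees segments registers the gauge.
    model.scheduleTasks(gauge, List.of("segment_0", "segment_1"));
    System.out.println("after merge pass: " + model.gaugeExists(gauge)); // true

    // An extra pass that observes no eligible segments removes it.
    model.scheduleTasks(gauge, List.of());
    System.out.println("after probe pass: " + model.gaugeExists(gauge)); // false

    // A later pass that sees segments again brings it back.
    model.scheduleTasks(gauge, List.of("segment_2"));
    System.out.println("after next pass: " + model.gaugeExists(gauge)); // true
  }
}
```

A one-shot assertion placed right after the second pass would fail even though the gauge returns on the third, which is the window a polling check is meant to ride out.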
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff:

| | master | #18260 | +/- |
|---|---|---|---|
| Coverage | 63.48% | 63.51% | +0.02% |
| Complexity | 1627 | 1627 | |
| Files | 3244 | 3244 | |
| Lines | 197365 | 197365 | |
| Branches | 30540 | 30540 | |
| Hits | 125306 | 125351 | +45 |
| Misses | 62019 | 61975 | -44 |
| Partials | 10040 | 10039 | -1 |
Pull request overview
This PR reduces flakiness in MergeRollupMinionClusterIntegrationTest by replacing one-shot gauge existence assertions with a polling helper, aligning the remaining mergeRollupTaskDelayInNumBuckets.* checks with the polling approach previously introduced for mergeRollupTaskNumBucketsToProcess.*.
Changes:
- Replaced direct assertTrue(MetricValueUtils.gaugeExists(...)) checks with a new polling helper in 5 test cases.
- Added waitForGaugesToExist(String... metricNames) that uses TestUtils.waitForCondition and the existing TIMEOUT_IN_MS.
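The change from a one-shot assertion to polling can be sketched as follows. This is a minimal, self-contained approximation and not the code from the PR: the class GaugePolling, the in-memory REGISTERED_GAUGES set, the 10-second TIMEOUT_IN_MS value, the 50 ms check interval, and the local waitForCondition stand-in (which does not reproduce the signature of Pinot's TestUtils.waitForCondition) are all assumptions. Only the helper name waitForGaugesToExist(String...) and the idea of checking every gauge inside a single polled condition come from the PR description.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Sketch of a polling gauge check. Hypothetical; not the helper from the PR. */
public class GaugePolling {
  // Hypothetical stand-in for the metrics registry that gaugeExists queries.
  public static final Set<String> REGISTERED_GAUGES = ConcurrentHashMap.newKeySet();

  // Assumed value; the real test class defines its own TIMEOUT_IN_MS.
  public static final long TIMEOUT_IN_MS = 10_000L;

  public static boolean gaugeExists(String name) {
    return REGISTERED_GAUGES.contains(name);
  }

  /** Simplified stand-in for a wait-for-condition utility: poll until true or timed out. */
  public static void waitForCondition(BooleanSupplier condition, long checkIntervalMs,
      long timeoutMs, String errorMessage) {
    long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (true) {
      if (condition.getAsBoolean()) {
        return;
      }
      if (System.nanoTime() - deadlineNanos >= 0) {
        throw new AssertionError(errorMessage);
      }
      try {
        Thread.sleep(checkIntervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new AssertionError(errorMessage, e);
      }
    }
  }

  /** Polls until every named gauge is present in the same check. */
  public static void waitForGaugesToExist(String... metricNames) {
    waitForCondition(() -> {
      for (String name : metricNames) {
        if (!gaugeExists(name)) {
          return false;
        }
      }
      return true;
    }, 50L, TIMEOUT_IN_MS, "Timed out waiting for gauges: " + String.join(", ", metricNames));
  }

  public static void main(String[] args) throws InterruptedException {
    // Assumed gauge keys; "myTable_OFFLINE" is a made-up table name.
    String gauge45 = "mergeRollupTaskDelayInNumBuckets.myTable_OFFLINE.45days";
    String gauge90 = "mergeRollupTaskDelayInNumBuckets.myTable_OFFLINE.90days";

    // Simulate gauges that only become visible shortly after the check starts.
    Thread registrar = new Thread(() -> {
      try {
        Thread.sleep(200L);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      REGISTERED_GAUGES.add(gauge45);
      REGISTERED_GAUGES.add(gauge90);
    });
    registrar.start();

    waitForGaugesToExist(gauge45, gauge90);
    registrar.join();
    System.out.println("both gauges present");
  }
}
```

Checking all names inside one condition means the wait only succeeds when every gauge is visible in the same poll, rather than each one having been seen at some different moment.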
Jackie-Jiang approved these changes on Apr 20, 2026
Summary
Extends the polling pattern introduced in #18253 (for mergeRollupTaskNumBucketsToProcess) to the remaining five mergeRollupTaskDelayInNumBuckets.* gaugeExists checks in the same test class.

Root cause
The gauge is registered by MergeRollupTaskGenerator.createOrUpdateDelayMetrics and removed by resetDelayMetrics when a scheduleTasks call observes no eligible segments for the table. Each per-iteration body's assertNull(scheduleTasks(context).get(RealtimeToOfflineSegmentsTask)) probe triggers an extra synchronized scheduleTasks that can race with the previous merge task's segment-lineage commit, transiently reaching the resetDelayMetrics branch and removing the gauge. The post-loop assertTrue(MetricValueUtils.gaugeExists(...)) then flakes on the same window that #18253 addressed for the processAll-mode test.

Fix
Added a waitForGaugesToExist(String...) helper that polls via TestUtils.waitForCondition with the existing TIMEOUT_IN_MS, mirroring waitForExpectedNumBucketsToProcess introduced in #18253. Replaced the assertTrue(gaugeExists(...)) calls in:
- testOfflineTableSingleLevelConcat
- testOfflineTableSingleLevelConcatWithMetadataPush
- testOfflineTableSingleLevelRollup
- testOfflineTableMultiLevelConcat (polls 45days + 90days atomically)
- testRealtimeTableSingleLevelConcat

Test plan
- ./mvnw test-compile -pl pinot-integration-tests -am passes.
- ./mvnw spotless:apply checkstyle:check license:format license:check -pl pinot-integration-tests clean.
- MergeRollupMinionClusterIntegrationTest successfully across multiple runs.

🤖 Generated with Claude Code