pkg/util/workloadrepo: stabilize flaky TestCreatePartition#67793
Conversation
|
@flaky-claw I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details. ⏳ This process typically takes 10-30 minutes depending on the complexity of the changes. ℹ️ Learn more details on Pantheon AI. |
|
Hi @flaky-claw. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughTestCreatePartition is stabilized by normalizing the partition anchor time to a deterministic noon timestamp before creating date-based partitions, and removing a redundant subsequent time reassignment. This ensures consistent partition behavior regardless of when the test runs near day boundaries. ChangesTest Timestamp Stabilization
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #67793 +/- ##
================================================
- Coverage 77.6079% 76.3120% -1.2959%
================================================
Files 1982 1995 +13
Lines 548895 575588 +26693
================================================
+ Hits 425986 439243 +13257
- Misses 122099 134741 +12642
- Partials 810 1604 +794
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/test mysql-test |
|
@yinsustart: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/check-issue-triage-complete |
|
/retest |
|
@wuhuizuo: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[LGTM Timeline notifier]Timeline:
|
|
@flaky-claw: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
I think this is a bit of AI slop, I am currently trying to find a better fix. |
mjonss
left a comment
There was a problem hiding this comment.
This is too much changes for the root cause.
Root cause
TestCreatePartition (pre-PR) does roughly this:
now := time.Now() // T1
// ... 6 validatePartitionCreation + 1 createTableWithParts using T1 ...
now = time.Now() // T2 (re-captured!)
wrk.setRepositoryDest(ctx, "table")
waitForTables(ctx, t, wrk, now) // checks partitions vs T2Partition names/values are derived as now.AddDate(0,0,1) formatted as a date string, and checkTableExistsByIS returns lastPartition.After(now.AddDate(0,0,1)) with day-granularity. If T1 and T2 land on different calendar days (test starts near 23:59:5x), partitions created in earlier test cases use T1's day while waitForTables validates against T2's day, producing an off-by-one and a flake. This is the wall-clock day-boundary issue called out in the PR description.
What this PR actually changes
The real fix is two lines:
- Anchor
nowonce at noon soAddDate(±N)is stable across the test's wall-clock duration. - Drop the
now = time.Now()re-capture beforewaitForTablesso the same anchorednowflows through the whole test.
Everything else is unrelated churn:
- New
//pkg/session+//pkg/session/sessionapiBUILD deps. testkit.TestKit->sessionapi.Sessionswap inTestCreatePartitiononly (sibling testsTestDropOldPartitionsetc. keep usingtk, so it is not even a consistent convention change).createTableWithPartssplit intocreateTableWithPartsByExec+createTableWithParts+createTableWithPartsForSession(three functions where one existed).validatePartitionCreationsignature changed (tk->se).
None of that interacts with time.Now() or partition arithmetic. The earlier review already raised this ("I don't understand why is this necessary?"), got an explanation only for the noon anchor, and never got a justification for the testkit -> session swap.
Suggested simpler fix
Just the two real lines, no helper changes, no BUILD edits, no testkit swap:
now := time.Now()
// AddDate / TO_DAYS arithmetic is day-granular; anchor at noon so the
// test's wall-clock duration cannot cross a day boundary mid-run.
now = time.Date(now.Year(), now.Month(), now.Day(), 12, 0, 0, 0, now.Location())
// ... unchanged test cases ...
// (delete the `now = time.Now()` line that precedes waitForTables)
wrk.setRepositoryDest(ctx, "table")
waitForTables(ctx, t, wrk, now)That is a ~4-line diff vs. the current +38/-16, fixes the actual root cause, and avoids touching helpers and BUILD deps that are not on the path to the flake.
|
@mjonss Thanks for the detailed root-cause analysis. I minimized the PR in 581fa1e: restored the original TestKit/helper path, removed the SessionAPI/Bazel dependency churn from the final PR diff, and kept only the wall-clock day-boundary fix: anchor Local validation passed:
I could not deterministically reproduce the pre-fix midnight flake locally, but the remaining diff now matches the root cause you described. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bb7133, mjonss, tiancaiamao The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
@mjonss: PRs from untrusted users cannot be marked as trusted with DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What problem does this PR solve?
Issue Number: close #67461
Problem Summary:
Flaky test
TestCreatePartitioninpkg/util/workloadrepointermittently fails, so this PR stabilizes that path.What changed and how does it work?
Root Cause
TEST_ISSUE: the original flake is
TestCreatePartitionusing unstable wall-clock day boundaries, and this address round fixes only the missing Bazel strict deps introduced by the prior plain-session test fix.Fix
Adding
//pkg/sessionand//pkg/session/sessionapitoworkloadrepo_testis necessary so the existingworker_test.goimports build under TiDB's Bazel strict-deps validation.Verification
Spec:
pkg/util/workloadrepo :: TestCreatePartitiontidb.go_flaky.defaultBASELINE_ONLYObserved result:
Required flaky gate passed.
Build safety gate passed.
Intent guard gate passed.
Gate checklist:
Commands:
go test -json ./pkg/util/workloadrepo -run '^TestCreatePartition$' -count=1go test -json ./pkg/util/workloadrepo -count=1make buildCheck List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Fixes #67461
Summary by CodeRabbit