pkg/util/workloadrepo: stabilize flaky TestCreatePartition #67793

Merged
ti-chi-bot[bot] merged 3 commits into pingcap:master from flaky-claw:flakyfixer/case_8982396958ea-a1 on May 17, 2026

@flaky-claw
Contributor

@flaky-claw flaky-claw commented Apr 15, 2026

What problem does this PR solve?

Issue Number: close #67461

Problem Summary:
Flaky test TestCreatePartition in pkg/util/workloadrepo intermittently fails, so this PR stabilizes that path.

What changed and how does it work?

Root Cause

TEST_ISSUE: the original flake comes from TestCreatePartition depending on unstable wall-clock day boundaries. This round of addressing review feedback fixes only the missing Bazel strict deps introduced by the prior plain-session test fix.

Fix

Adding //pkg/session and //pkg/session/sessionapi to workloadrepo_test is necessary so the existing worker_test.go imports build under TiDB's Bazel strict-deps validation.

Verification

Spec:

  • target: pkg/util/workloadrepo :: TestCreatePartition
  • strategy: tidb.go_flaky.default
  • plan mode: BASELINE_ONLY
  • requirements: required case must execute; no skip; repeat count = 1
  • baseline gates: required_flaky_gate, build_safety_gate, intent_guard_gate

Observed result:

  • status: passed
  • required case executed: yes
  • submission decision: ALLOWED
  • scope debt present: yes
    Required flaky gate passed.
    Build safety gate passed.
    Intent guard gate passed.

Gate checklist:

  • Required flaky gate: PASS
  • Build safety gate: PASS
  • Intent guard gate: PASS
  • Repo-wide advisory gate: SKIPPED
  • Feedback specific gate: SKIPPED

Commands:

  • go test -json ./pkg/util/workloadrepo -run '^TestCreatePartition$' -count=1
  • go test -json ./pkg/util/workloadrepo -count=1
  • make build

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Fixes #67461

Summary by CodeRabbit

  • Tests
    • Improved stability of partition behavior tests by normalizing timestamps near day boundaries, reducing flaky test results.

@ti-chi-bot ti-chi-bot Bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed labels Apr 15, 2026
@pantheon-ai

pantheon-ai Bot commented Apr 15, 2026

@flaky-claw I've received your pull request and will start the review. I'll conduct a thorough review covering code quality, potential issues, and implementation details.

⏳ This process typically takes 10-30 minutes depending on the complexity of the changes.

ℹ️ Learn more details on Pantheon AI.

@ti-chi-bot ti-chi-bot Bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 15, 2026
@tiprow

tiprow Bot commented Apr 15, 2026

Hi @flaky-claw. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4e4fd547-5139-4628-8e7d-1414eef2a67c

📥 Commits

Reviewing files that changed from the base of the PR and between 4310e3e and 581fa1e.

📒 Files selected for processing (1)
  • pkg/util/workloadrepo/worker_test.go

Walkthrough

TestCreatePartition is stabilized by normalizing the partition anchor time to a deterministic noon timestamp before creating date-based partitions, and removing a redundant subsequent time reassignment. This ensures consistent partition behavior regardless of when the test runs near day boundaries.

Changes

Test Timestamp Stabilization

| Layer / File(s) | Summary |
| --- | --- |
| Normalize partition anchor time to noon (pkg/util/workloadrepo/worker_test.go) | now is reassigned to time.Date(..., 12, 0, 0, 0, ...) before partition creation to stabilize date-based partition behavior. A redundant now = time.Now() reassignment is removed so the normalized timestamp persists through subsequent assertions. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Suggested labels

size/S

Suggested reviewers

  • mjonss
  • hawkingrei

Poem

🐰 A timestamp at noon keeps flakiness at bay,
No more dancing 'round midnight to ruin the day!
The partition now stable, the test runs so clean,
The steadiest anchor that ever has been. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically identifies the problem (flaky test) and the package being modified, accurately reflecting the main change in the pull request. |
| Description check | ✅ Passed | The description includes the required issue reference, problem summary, explanation of changes and verification, and appropriate checklist selections. All critical sections are present and substantive. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@codecov

codecov Bot commented Apr 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.3120%. Comparing base (324c404) to head (581fa1e).
⚠️ Report is 146 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #67793        +/-   ##
================================================
- Coverage   77.6079%   76.3120%   -1.2959%     
================================================
  Files          1982       1995        +13     
  Lines        548895     575588     +26693     
================================================
+ Hits         425986     439243     +13257     
- Misses       122099     134741     +12642     
- Partials        810       1604       +794     
| Flag | Coverage Δ |
| --- | --- |
| integration | 41.9429% <ø> (+7.6032%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
| --- | --- |
| dumpling | 61.5065% <ø> (ø) |
| parser | ∅ <ø> (∅) |
| br | 48.3433% <ø> (-12.0927%) ⬇️ |

@yinsustart

/test mysql-test

@tiprow

tiprow Bot commented Apr 16, 2026

@yinsustart: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

Comment thread pkg/util/workloadrepo/worker_test.go Outdated
Comment thread pkg/util/workloadrepo/worker_test.go Outdated
@yinsustart

/check-issue-triage-complete

@wuhuizuo
Contributor

wuhuizuo commented May 3, 2026

/retest

@tiprow

tiprow Bot commented May 3, 2026

@wuhuizuo: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

@flaky-claw flaky-claw requested a review from henrybw May 9, 2026 04:37
@yinsustart yinsustart requested a review from bb7133 May 9, 2026 08:53
Member

@bb7133 bb7133 left a comment

LGTM

@ti-chi-bot ti-chi-bot Bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 14, 2026
@bb7133 bb7133 requested a review from mjonss May 14, 2026 06:03
@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels May 14, 2026
@ti-chi-bot

ti-chi-bot Bot commented May 14, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-14 06:03:47.428017108 +0000 UTC m=+331995.960796437: ☑️ agreed by bb7133.
  • 2026-05-14 08:58:52.293768002 +0000 UTC m=+342500.826547321: ☑️ agreed by tiancaiamao.

@tiprow

tiprow Bot commented May 14, 2026

@flaky-claw: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| fast_test_tiprow | 4310e3e | link | true | /test fast_test_tiprow |
| tidb_parser_test | 4310e3e | link | true | /test tidb_parser_test |

@mjonss mjonss added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 14, 2026
@mjonss
Contributor

mjonss commented May 14, 2026

I think this is a bit of AI slop, I am currently trying to find a better fix.

Contributor

@mjonss mjonss left a comment

This is too many changes for the root cause.

Root cause

TestCreatePartition (pre-PR) does roughly this:

now := time.Now()                                  // T1
// ... 6 validatePartitionCreation + 1 createTableWithParts using T1 ...
now = time.Now()                                   // T2 (re-captured!)
wrk.setRepositoryDest(ctx, "table")
waitForTables(ctx, t, wrk, now)                    // checks partitions vs T2

Partition names/values are derived as now.AddDate(0,0,1) formatted as a date string, and checkTableExistsByIS returns lastPartition.After(now.AddDate(0,0,1)) with day-granularity. If T1 and T2 land on different calendar days (test starts near 23:59:5x), partitions created in earlier test cases use T1's day while waitForTables validates against T2's day, producing an off-by-one and a flake. This is the wall-clock day-boundary issue called out in the PR description.

What this PR actually changes

The real fix is two lines:

  1. Anchor now once at noon so AddDate(±N) is stable across the test's wall-clock duration.
  2. Drop the now = time.Now() re-capture before waitForTables so the same anchored now flows through the whole test.

Everything else is unrelated churn:

  • New //pkg/session + //pkg/session/sessionapi BUILD deps.
  • testkit.TestKit -> sessionapi.Session swap in TestCreatePartition only (sibling tests TestDropOldPartitions etc. keep using tk, so it is not even a consistent convention change).
  • createTableWithParts split into createTableWithPartsByExec + createTableWithParts + createTableWithPartsForSession (three functions where one existed).
  • validatePartitionCreation signature changed (tk -> se).

None of that interacts with time.Now() or partition arithmetic. The earlier review already raised this ("I don't understand why is this necessary?"), got an explanation only for the noon anchor, and never got a justification for the testkit -> session swap.

Suggested simpler fix

Just the two real lines, no helper changes, no BUILD edits, no testkit swap:

now := time.Now()
// AddDate / TO_DAYS arithmetic is day-granular; anchor at noon so the
// test's wall-clock duration cannot cross a day boundary mid-run.
now = time.Date(now.Year(), now.Month(), now.Day(), 12, 0, 0, 0, now.Location())
// ... unchanged test cases ...
// (delete the `now = time.Now()` line that precedes waitForTables)
wrk.setRepositoryDest(ctx, "table")
waitForTables(ctx, t, wrk, now)

That is a ~4-line diff vs. the current +38/-16, fixes the actual root cause, and avoids touching helpers and BUILD deps that are not on the path to the flake.

@ti-chi-bot ti-chi-bot Bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 15, 2026
@flaky-claw
Contributor Author

@mjonss Thanks for the detailed root-cause analysis. I minimized the PR in 581fa1e: restored the original TestKit/helper path, removed the SessionAPI/Bazel dependency churn from the final PR diff, and kept only the wall-clock day-boundary fix: anchor now at noon and avoid re-capturing time.Now() before waitForTables.

Local validation passed:

  • ./tools/check/failpoint-go-test.sh pkg/util/workloadrepo -run '^TestCreatePartition$' -count=1
  • make lint

I could not deterministically reproduce the pre-fix midnight flake locally, but the remaining diff now matches the root cause you described.

@yinsustart yinsustart requested review from bb7133 and mjonss May 15, 2026 02:29
Contributor

@mjonss mjonss left a comment

LGTM

@ti-chi-bot

ti-chi-bot Bot commented May 17, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bb7133, mjonss, tiancaiamao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [bb7133,mjonss,tiancaiamao]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mjonss
Contributor

mjonss commented May 17, 2026

/retest

@tiprow

tiprow Bot commented May 17, 2026

@mjonss: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

@mjonss mjonss removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 17, 2026
@ti-chi-bot ti-chi-bot Bot merged commit 70ff16c into pingcap:master May 17, 2026
33 of 35 checks passed
Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Flaky test: TestCreatePartition in pkg/util/workloadrepo

7 participants