ddl: optimize temp index worker in highly conflicting case #61445

Merged
ti-chi-bot[bot] merged 7 commits into pingcap:master from tangenta:add-index-opt-merge
Jul 18, 2025

@tangenta
Contributor

@tangenta tangenta commented May 31, 2025

What problem does this PR solve?

Issue Number: close #61433

Problem Summary:

When DDL internal transactions conflict heavily with DML transactions, the temp index worker can hardly make any progress.

What changed and how does it work?

  • Remove the LockKeys call from the DDL internal txn in the merge index worker.
    • If both the DDL txn and a DML txn modify the same index key, they conflict and the DDL txn retries (after #62387).
    • If the DDL txn modifies the index key and the DML txn modifies only the row key, there is no longer a conflict (this is the key improvement).
  • Delete temp index records after applying them to the original index, to eliminate the cost of processing them again (reentrant cost).
  • Reduce the batch count of the DDL txn when it suffers frequent conflicts.
  • Add metrics to monitor the progress of merging temp index:
    • tidb_ddl_temp_index_write: the counter of writing to temp index (single write).
    • tidb_ddl_temp_index_double_write: the counter of writing to temp index (double write).
    • tidb_ddl_temp_index_scan: the counter of scanned temp index value records.
    • tidb_ddl_temp_index_merge: the counter of merged temp index value records.
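
For readers who want a concrete picture of the "apply, then delete" step, here is a minimal sketch in plain Go. It is not the code from this PR: the `tempRecord` type, the `mergeBatch` function, and the use of in-memory maps in place of TiKV key ranges are all assumptions made for illustration, and the sketch ignores ordering and versioning of temp index values.

```go
package main

import "fmt"

// tempRecord is a hypothetical stand-in for one temp index entry.
type tempRecord struct {
	handle  int64 // row handle the index entry points at
	deleted bool  // true if DML removed the index entry
}

// mergeBatch applies up to batchCnt temp index records to the original
// index and deletes each applied record from the temp index, so a later
// pass (or a retry) never has to scan it again.
func mergeBatch(temp map[string]tempRecord, origin map[string]int64, batchCnt int) int {
	merged := 0
	for key, rec := range temp {
		if merged >= batchCnt {
			break
		}
		if rec.deleted {
			delete(origin, key)
		} else {
			origin[key] = rec.handle
		}
		// Deleting the current key while ranging over a map is safe in Go.
		delete(temp, key)
		merged++
	}
	return merged
}

func main() {
	temp := map[string]tempRecord{
		"a": {handle: 1},
		"b": {handle: 2},
		"c": {deleted: true},
	}
	origin := map[string]int64{"c": 3}
	for len(temp) > 0 {
		n := mergeBatch(temp, origin, 2)
		fmt.Println("merged", n, "remaining", len(temp))
	}
	fmt.Println(origin)
}
```

Because every applied record is removed, the amount of work left shrinks with each pass, which is the property the second bullet above relies on.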

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot Bot added do-not-merge/needs-triage-completed release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels May 31, 2025
@tiprow

tiprow Bot commented May 31, 2025

Hi @tangenta. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@codecov

codecov Bot commented May 31, 2025

Codecov Report

Attention: Patch coverage is 65.84507% with 97 lines in your changes missing coverage. Please review.

Project coverage is 74.9535%. Comparing base (65c3cf2) to head (f5af1ce).
Report is 23 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #61445        +/-   ##
================================================
+ Coverage   72.8617%   74.9535%   +2.0918%     
================================================
  Files          1755       1806        +51     
  Lines        485565     498079     +12514     
================================================
+ Hits         353791     373328     +19537     
+ Misses       110115     101555      -8560     
- Partials      21659      23196      +1537     
| Flag | Coverage Δ |
|---|---|
| integration | 48.9491% <26.4084%> (?) |
| unit | 72.2380% <65.8450%> (+0.1262%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| dumpling | 52.7804% <ø> (+0.1131%) ⬆️ |
| parser | ∅ <ø> (∅) |
| br | 63.1240% <ø> (+16.7814%) ⬆️ |

@ti-chi-bot ti-chi-bot Bot added needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. labels Jul 7, 2025
Collaborator

@Benjamin2037 Benjamin2037 left a comment

LGTM

@ti-chi-bot ti-chi-bot Bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jul 15, 2025
@ti-chi-bot ti-chi-bot Bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 15, 2025
@ti-chi-bot

ti-chi-bot Bot commented Jul 15, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-07-15 07:07:55.600622079 +0000 UTC m=+2588328.323801046: ☑️ agreed by Benjamin2037.
  • 2025-07-15 09:01:21.775627137 +0000 UTC m=+2595134.498806119: ☑️ agreed by D3Hunter.

@Benjamin2037
Collaborator

/retest

@tiprow

tiprow Bot commented Jul 15, 2025

@tangenta: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| fast_test_tiprow | e0bd538 | link | true | /test fast_test_tiprow |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

tangenta added 2 commits July 15, 2025 20:37
Signed-off-by: tangenta <tangenta@126.com>
Signed-off-by: tangenta <tangenta@126.com>
// Lock the corresponding row keys so that it doesn't modify the index KVs
// that are changing by a pessimistic transaction.
rowKey := tablecodec.EncodeRecordKey(w.table.RecordPrefix(), idxRecord.handle)
err := txn.LockKeys(context.Background(), new(kv.LockCtx), rowKey)
Contributor

Is it risky to remove the row-key locking for all backfill cases?

Contributor

@zimulala zimulala Jul 16, 2025

PR #39936 added LockKeys. Is it being removed here because the related issues have been resolved?
Also, the PR description doesn't seem to mention removing LockKeys.

Contributor

@zimulala zimulala Jul 17, 2025

I see the reason has been updated in the PR description. If you can, please briefly describe the manual testing.

@tangenta
Contributor Author

/retest

@tiprow

tiprow Bot commented Jul 16, 2025

@tangenta: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Comment thread pkg/ddl/index_merge_tmp.go Outdated
return nil
})
if attempts <= 1 {
w.batchCnt = min(int(vardef.GetDDLReorgBatchSize()), w.batchCnt*2)
Contributor

If attempts <= 1, it follows the original logic. So can this assignment be moved before the for loop, and the condition check removed?

Contributor Author

I prefer to put it here so that it stays close to the w.batchCnt /= 2 logic. It indicates that we double the batch count in each iteration if we encountered a lot of conflicts previously.

Contributor

There is no decrement during this attempt, so attempts <= 1 means it is the first attempt, and there is no need to restore the original value on the first attempt.
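
To make the thread above easier to follow, here is a minimal, self-contained sketch of the halve-on-conflict, double-on-clean-run pattern being discussed. It is an illustration under assumptions, not the code in pkg/ddl/index_merge_tmp.go: `worker`, `runBatch`, `errConflict`, and `maxBatchCnt` (standing in for `vardef.GetDDLReorgBatchSize()`) are hypothetical names, and the real retry loop may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical stand-in for a retryable write-conflict error.
var errConflict = errors.New("write conflict")

// worker is a simplified, hypothetical merge worker.
type worker struct {
	batchCnt    int // records handled per DDL txn
	maxBatchCnt int // upper bound; stands in for vardef.GetDDLReorgBatchSize()
}

// runBatch calls fn until it succeeds, halving batchCnt after every
// conflict. If the first attempt succeeds, batchCnt is doubled back
// toward maxBatchCnt.
func (w *worker) runBatch(maxAttempts int, fn func(batchCnt int) error) error {
	attempts := 0
	for {
		attempts++
		err := fn(w.batchCnt)
		if err == nil {
			break
		}
		if !errors.Is(err, errConflict) || attempts >= maxAttempts {
			return err
		}
		if w.batchCnt > 1 {
			w.batchCnt /= 2
		}
	}
	if attempts <= 1 {
		w.batchCnt *= 2
		if w.batchCnt > w.maxBatchCnt {
			w.batchCnt = w.maxBatchCnt
		}
	}
	return nil
}

func main() {
	w := &worker{batchCnt: 256, maxBatchCnt: 256}
	conflicts := 2 // the first two attempts hit a conflict
	txn := func(int) error {
		if conflicts > 0 {
			conflicts--
			return errConflict
		}
		return nil
	}
	for i := 0; i < 4; i++ {
		if err := w.runBatch(5, txn); err != nil {
			fmt.Println("giving up:", err)
			return
		}
		fmt.Println("batchCnt =", w.batchCnt)
	}
}
```

In this sketch the batch count drops from 256 to 64 during the conflicting call and then recovers to 128 and 256 over the following clean calls, where it stays capped. Keeping the doubling next to the halving, as the author prefers, makes that recovery path visible in one place.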

tangenta added 2 commits July 17, 2025 18:14
Signed-off-by: tangenta <tangenta@126.com>
Signed-off-by: tangenta <tangenta@126.com>
@ti-chi-bot

ti-chi-bot Bot commented Jul 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Benjamin2037, D3Hunter, zimulala

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label Jul 18, 2025
@ti-chi-bot ti-chi-bot Bot merged commit 35d9646 into pingcap:master Jul 18, 2025
29 checks passed
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jul 18, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.5: #62491.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jul 18, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #62492.
But this PR has conflicts, please resolve them!

@ti-chi-bot ti-chi-bot Bot added the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label Jul 18, 2025
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jul 18, 2025
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #62495.
But this PR has conflicts, please resolve them!

Labels

approved lgtm needs-cherry-pick-release-7.5 Should cherry pick this PR to release-7.5 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fast reorg create index is stuck at the stage of merging temp index, and the ddl job fails after retries

6 participants