
test: add integration test about concurrent complex transactions #2472

Merged
ti-chi-bot[bot] merged 7 commits into pingcap:master from hongyunyan:complex_transaction on Oct 3, 2025

@hongyunyan hongyunyan commented Oct 3, 2025

What problem does this PR solve?

Issue Number: ref #442

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

@ti-chi-bot ti-chi-bot Bot added the do-not-merge/needs-linked-issue, release-note (denotes a PR that will be considered when it comes time to generate release notes), and size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) labels Oct 3, 2025
@gemini-code-assist

Summary of Changes

Hello @hongyunyan, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust integration test designed to validate TiCDC's ability to handle concurrent and complex transactional workloads. The test simulates a diverse set of operations across e-commerce, banking, and social networking scenarios, ensuring data consistency and replication integrity under high-stress conditions. This enhancement is crucial for verifying the stability and correctness of TiCDC when dealing with real-world, multi-table, and multi-operation transactions.

Highlights

  • New Integration Test: Introduced a comprehensive integration test named complex_transaction to simulate concurrent, complex transactional workloads.
  • Diverse Workload Simulation: The test simulates various real-world scenarios including e-commerce (order creation, cancellation, inventory adjustments), banking (single and multi-account transfers), and social networking (post creation, likes, comments, follows, user activity updates).
  • Robust Transaction Logic: Each simulated transaction type involves multiple DML operations across several tables, with built-in retry mechanisms for transient errors like deadlocks or lock wait timeouts.
  • Automated Data Consistency Verification: The test leverages sync_diff_inspector to automatically verify data consistency between the upstream and downstream databases after the workload execution.
  • CI Integration: The new integration test has been added to the CI pipeline's heavy MySQL test groups, ensuring continuous validation of TiCDC's behavior under complex transaction scenarios.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@hongyunyan
Collaborator Author

/pull-cdc-mysql-integration-heavy

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a new integration test for concurrent complex transactions, which is a great addition for ensuring system stability under heavy, mixed workloads. The implementation is comprehensive, covering various transaction types across e-commerce, banking, and social networking domains. My review focuses on improving correctness, maintainability, and addressing a few potential race conditions and bugs in the test logic. Key feedback includes fixing a race condition in the workload generation loop, correcting transaction handling for insufficient stock scenarios, and improving the logic for complex mixed transactions. I've also included some suggestions for code simplification and style.

Comment on lines +159 to +163
affected, _ := result.RowsAffected()
if affected == 0 {
	// Insufficient stock, rollback
	return nil
}

critical

When there is insufficient stock, the function returns nil. The deferred tx.Rollback() will execute, so the transaction is aborted. However, the caller runWorkload will interpret the nil return as a success and incorrectly increment the success and transaction counters. This is a correctness bug. The function should return an error to indicate failure.

Suggested change
- affected, _ := result.RowsAffected()
- if affected == 0 {
- 	// Insufficient stock, rollback
- 	return nil
- }
+ affected, _ := result.RowsAffected()
+ if affected == 0 {
+ 	// Insufficient stock, rollback and report error.
+ 	return errors.New("insufficient stock")
+ }

Comment on lines +114 to +145
for {
	current := atomic.LoadInt64(&txnCounter)
	if current >= *totalTxns {
		break
	}

	// Select random transaction type
	txType := selectTransactionType()

	err := executor.ExecuteTransaction(ctx, txType)
	if err != nil {
		atomic.AddInt64(&failCount, 1)
		log.Warn("Transaction failed",
			zap.Int("worker", workerID),
			zap.String("type", txType),
			zap.Error(err))
		// Continue on error, don't fail the whole test
		continue
	}

	atomic.AddInt64(&successCount, 1)
	count := atomic.AddInt64(&txnCounter, 1)

	// Periodic logging
	if count%1000 == 0 {
		log.Info("Progress",
			zap.Int64("completed", count),
			zap.Int64("total", *totalTxns),
			zap.Int64("success", atomic.LoadInt64(&successCount)),
			zap.Int64("failed", atomic.LoadInt64(&failCount)))
	}
}

high

The loop termination logic has a race condition. Multiple workers can read txnCounter when its value is close to *totalTxns and all proceed, causing the total number of transactions to exceed the limit by up to concurrency - 1.

Additionally, txnCounter is only incremented on success. The flag description "total number of transactions to execute" is ambiguous, but typically implies the number of attempts. The current logic ties it to successful transactions.

A better approach is to atomically increment a counter at the start of each loop iteration to claim a "ticket". This resolves the race and clarifies that total-txns refers to attempted transactions.

			for {
				count := atomic.AddInt64(&txnCounter, 1)
				if count > *totalTxns {
					// We've gone over the limit, so we shouldn't execute.
					// Decrement to correct the final count.
					atomic.AddInt64(&txnCounter, -1)
					break
				}

				// Select random transaction type
				txType := selectTransactionType()

				err := executor.ExecuteTransaction(ctx, txType)
				if err != nil {
					atomic.AddInt64(&failCount, 1)
					log.Warn("Transaction failed",
						zap.Int("worker", workerID),
						zap.String("type", txType),
						zap.Error(err))
					// Continue on error, don't fail the whole test
					continue
				}

				atomic.AddInt64(&successCount, 1)

				// Periodic logging
				if count%1000 == 0 {
					log.Info("Progress",
						zap.Int64("completed", count),
						zap.Int64("total", *totalTxns),
						zap.Int64("success", atomic.LoadInt64(&successCount)),
						zap.Int64("failed", atomic.LoadInt64(&failCount)))
				}
			}
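
The ticket idea can be checked in isolation. The following is a minimal, self-contained sketch rather than the PR's code: `runTickets` is a hypothetical helper, and incrementing `executed` stands in for the call to `executor.ExecuteTransaction`. Because `atomic.AddInt64` returns a distinct value to every caller, exactly `total` tickets fall within the limit regardless of how many workers race.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runTickets starts the given number of workers. Each worker claims a ticket
// before doing any work and stops once its ticket exceeds total. It returns
// the number of attempts that were actually executed.
func runTickets(workers int, total int64) int64 {
	var ticket, executed int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// Claim first: no two goroutines can receive the same value.
				if atomic.AddInt64(&ticket, 1) > total {
					return
				}
				// Stand-in for executing one transaction attempt.
				atomic.AddInt64(&executed, 1)
			}
		}()
	}
	wg.Wait()
	return executed
}

func main() {
	fmt.Println(runTickets(8, 1000)) // prints 1000
}
```

This sketch skips the decrement on overshoot because it never reads the ticket counter afterwards. The reviewer's version keeps the decrement so that `txnCounter` ends at the limit.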

Comment on lines +684 to +691
case 1: // Update account
	accountID := randomUserID()
	delta := float64(rand.Intn(200)-100) * 10.0
	if delta > 0 {
		_, err = tx.ExecContext(ctx,
			`UPDATE accounts SET balance = balance + ?, total_in = total_in + ?, version = version + 1 WHERE account_id = ?`,
			delta, delta, accountID)
	}

high

There are two issues in this part of the complexMixedTxn logic:

  1. When delta is negative (a withdrawal), the operation is skipped. This makes the "complex mixed" transaction less realistic as it only handles deposits.
  2. The UPDATE statement for a positive delta is missing the updated_at = ? field, which is inconsistent with other update operations in this test suite.

This should be fixed to correctly handle withdrawals and maintain data consistency.

Suggested change
- case 1: // Update account
- 	accountID := randomUserID()
- 	delta := float64(rand.Intn(200)-100) * 10.0
- 	if delta > 0 {
- 		_, err = tx.ExecContext(ctx,
- 			`UPDATE accounts SET balance = balance + ?, total_in = total_in + ?, version = version + 1 WHERE account_id = ?`,
- 			delta, delta, accountID)
- 	}
+ case 1: // Update account
+ 	accountID := randomUserID()
+ 	delta := float64(rand.Intn(200)-100) * 10.0
+ 	if delta > 0 {
+ 		_, err = tx.ExecContext(ctx,
+ 			`UPDATE accounts SET balance = balance + ?, total_in = total_in + ?, version = version + 1, updated_at = ? WHERE account_id = ?`,
+ 			delta, delta, now, accountID)
+ 	} else if delta < 0 {
+ 		_, err = tx.ExecContext(ctx,
+ 			`UPDATE accounts SET balance = balance + ?, total_out = total_out - ?, version = version + 1, updated_at = ? WHERE account_id = ? AND balance >= ?`,
+ 			delta, delta, now, accountID, -delta)
+ 	}

log.Fatal("DSN must be provided")
}

rand.Seed(time.Now().UnixNano())

medium

rand.Seed has been deprecated since Go 1.20. The global random number generator is now automatically seeded with a high-quality seed. You can safely remove this line if the project's Go version is 1.20 or higher.
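
If a failing run ever needs to be replayed, one option beyond simply deleting the call is a private generator created with `rand.New(rand.NewSource(seed))`. This is a sketch under the assumption that a seed would be logged or passed in; the PR defines no such flag, and `newRNG` and `sequence` are hypothetical names. A `*rand.Rand` is not safe for concurrent use, so each worker would need its own instance.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newRNG returns a private generator. The same seed always yields the same
// sequence, unlike the automatically seeded global generator.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// sequence draws n values in [0, limit) from a generator seeded with seed.
func sequence(seed int64, n, limit int) []int {
	rng := newRNG(seed)
	out := make([]int, n)
	for i := range out {
		out[i] = rng.Intn(limit)
	}
	return out
}

func main() {
	a := sequence(42, 5, 100)
	b := sequence(42, 5, 100)
	fmt.Println(a)
	fmt.Println(b) // same values as the line above
}
```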

Comment on lines +89 to +93
ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()

if err = db.PingContext(ctx); err != nil {
	log.Fatal("Failed to ping database", zap.String("dsn", dsn), zap.Error(err))

medium

The variables ctx and cancel are shadowed here. While this is a common pattern, it can make code harder to read and debug, especially when dealing with multiple nested contexts. Consider using different names for the inner context and cancel function to avoid shadowing.

pingCtx, pingCancel := context.WithTimeout(ctx, 5*time.Second)
defer pingCancel()

if err = db.PingContext(pingCtx); err != nil {
	log.Fatal("Failed to ping database", zap.String("dsn", dsn), zap.Error(err))

}
items = append(items, item)
}
rows.Close()

medium

rows.Close() is already deferred on line 221. This explicit call is redundant and can be removed for cleaner code.

Comment on lines +753 to +755
return contains(errMsg, "Deadlock") ||
	contains(errMsg, "Lock wait timeout") ||
	contains(errMsg, "try again later")

medium

This can be simplified by using strings.Contains from the standard library. After this change, the now-unused contains and findSubstring functions (lines 758-771) can be removed.

Suggested change
- return contains(errMsg, "Deadlock") ||
- 	contains(errMsg, "Lock wait timeout") ||
- 	contains(errMsg, "try again later")
+ return strings.Contains(errMsg, "Deadlock") ||
+ 	strings.Contains(errMsg, "Lock wait timeout") ||
+ 	strings.Contains(errMsg, "try again later")

@hongyunyan
Collaborator Author

/pull-cdc-mysql-integration-heavy

@hongyunyan
Collaborator Author

/test pull-cdc-mysql-integration-heavy

@ti-chi-bot ti-chi-bot Bot added the lgtm label Oct 3, 2025
@ti-chi-bot

ti-chi-bot Bot commented Oct 3, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-10-03 08:00:57.173630371 +0000 UTC m=+417237.430361761: ☑️ agreed by lidezhu.

@ti-chi-bot

ti-chi-bot Bot commented Oct 3, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flowbehappy, lidezhu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [flowbehappy,lidezhu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@hongyunyan
Collaborator Author

/retest

@ti-chi-bot ti-chi-bot Bot merged commit 0ebea70 into pingcap:master Oct 3, 2025
16 checks passed

Labels

approved, lgtm, release-note (denotes a PR that will be considered when it comes time to generate release notes), size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files)

3 participants