test: add integration test about concurrent complex transactions by hongyunyan · Pull Request #2472 · pingcap/ticdc

hongyunyan · 2025-10-03T04:02:24Z

What problem does this PR solve?

Issue Number: ref #442

What is changed and how it works?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Questions

Will it cause performance regression or break compatibility?

Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

gemini-code-assist · 2025-10-03T04:02:42Z

Summary of Changes

Hello @hongyunyan, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust integration test designed to validate TiCDC's ability to handle concurrent and complex transactional workloads. The test simulates a diverse set of operations across e-commerce, banking, and social networking scenarios, ensuring data consistency and replication integrity under high-stress conditions. This enhancement is crucial for verifying the stability and correctness of TiCDC when dealing with real-world, multi-table, and multi-operation transactions.

Highlights

New Integration Test: Introduced a comprehensive integration test named complex_transaction to simulate concurrent, complex transactional workloads.
Diverse Workload Simulation: The test simulates various real-world scenarios including e-commerce (order creation, cancellation, inventory adjustments), banking (single and multi-account transfers), and social networking (post creation, likes, comments, follows, user activity updates).
Robust Transaction Logic: Each simulated transaction type involves multiple DML operations across several tables, with built-in retry mechanisms for transient errors like deadlocks or lock wait timeouts.
Automated Data Consistency Verification: The test leverages sync_diff_inspector to automatically verify data consistency between the upstream and downstream databases after the workload execution.
CI Integration: The new integration test has been added to the CI pipeline's heavy MySQL test groups, ensuring continuous validation of TiCDC's behavior under complex transaction scenarios.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

hongyunyan · 2025-10-03T04:03:05Z

/pull-cdc-mysql-integration-heavy

gemini-code-assist

Code Review

This pull request introduces a new integration test for concurrent complex transactions, which is a great addition for ensuring system stability under heavy, mixed workloads. The implementation is comprehensive, covering various transaction types across e-commerce, banking, and social networking domains. My review focuses on improving correctness, maintainability, and addressing a few potential race conditions and bugs in the test logic. Key feedback includes fixing a race condition in the workload generation loop, correcting transaction handling for insufficient stock scenarios, and improving the logic for complex mixed transactions. I've also included some suggestions for code simplification and style.

gemini-code-assist · 2025-10-03T04:05:09Z

+		affected, _ := result.RowsAffected()
+		if affected == 0 {
+			// Insufficient stock, rollback
+			return nil
+		}


When there is insufficient stock, the function returns nil. The deferred tx.Rollback() will execute, so the transaction is aborted. However, the caller runWorkload will interpret the nil return as a success and incorrectly increment the success and transaction counters. This is a correctness bug. The function should return an error to indicate failure.

Suggested change

-		affected, _ := result.RowsAffected()
-		if affected == 0 {
-			// Insufficient stock, rollback
-			return nil
-		}
+		affected, _ := result.RowsAffected()
+		if affected == 0 {
+			// Insufficient stock, rollback and report error.
+			return errors.New("insufficient stock")
+		}
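A minimal sketch of why the return value matters to the caller, assuming the workload loop wants to tell an expected business-rule abort apart from a real failure. The names `errInsufficientStock`, `reserveStock`, `counters`, and `record` are hypothetical and do not come from the PR; the rows-affected value is passed in directly in place of a real `sql.Result`.

```go
package main

import (
	"errors"
	"fmt"
)

// errInsufficientStock is a hypothetical sentinel error; the PR may name it differently.
var errInsufficientStock = errors.New("insufficient stock")

// reserveStock stands in for the transaction body. rowsAffected plays the role
// of result.RowsAffected() after the conditional UPDATE on the inventory row.
func reserveStock(rowsAffected int64) error {
	if rowsAffected == 0 {
		// No row matched the stock condition: report it so the caller
		// does not count the rolled-back transaction as a success.
		return errInsufficientStock
	}
	return nil
}

// counters mirrors the kind of tallies a workload loop might keep.
type counters struct {
	success, failed, skipped int
}

// record classifies the outcome of one transaction attempt.
func (c *counters) record(err error) {
	switch {
	case err == nil:
		c.success++
	case errors.Is(err, errInsufficientStock):
		c.skipped++ // expected business outcome, not a replication problem
	default:
		c.failed++
	}
}

func main() {
	var c counters
	c.record(reserveStock(1)) // one row updated: committed
	c.record(reserveStock(0)) // no row updated: rolled back
	fmt.Printf("%+v\n", c)
}
```

A sentinel checked with `errors.Is` is one option; returning a plain `errors.New("insufficient stock")` as suggested above also fixes the counting bug, but the caller then cannot distinguish it from other failures without string matching.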

gemini-code-assist · 2025-10-03T04:05:10Z

+			for {
+				current := atomic.LoadInt64(&txnCounter)
+				if current >= *totalTxns {
+					break
+				}
+
+				// Select random transaction type
+				txType := selectTransactionType()
+
+				err := executor.ExecuteTransaction(ctx, txType)
+				if err != nil {
+					atomic.AddInt64(&failCount, 1)
+					log.Warn("Transaction failed",
+						zap.Int("worker", workerID),
+						zap.String("type", txType),
+						zap.Error(err))
+					// Continue on error, don't fail the whole test
+					continue
+				}
+
+				atomic.AddInt64(&successCount, 1)
+				count := atomic.AddInt64(&txnCounter, 1)
+
+				// Periodic logging
+				if count%1000 == 0 {
+					log.Info("Progress",
+						zap.Int64("completed", count),
+						zap.Int64("total", *totalTxns),
+						zap.Int64("success", atomic.LoadInt64(&successCount)),
+						zap.Int64("failed", atomic.LoadInt64(&failCount)))
+				}
+			}


The loop termination logic has a race condition. Multiple workers can read txnCounter when its value is close to *totalTxns and all proceed, causing the total number of transactions to exceed the limit by up to concurrency - 1.

Additionally, txnCounter is only incremented on success. The flag description "total number of transactions to execute" is ambiguous, but typically implies the number of attempts. The current logic ties it to successful transactions.

A better approach is to atomically increment a counter at the start of each loop iteration to claim a "ticket". This resolves the race and clarifies that total-txns refers to attempted transactions.

			for {
				count := atomic.AddInt64(&txnCounter, 1)
				if count > *totalTxns {
					// We've gone over the limit, so we shouldn't execute.
					// Decrement to correct the final count.
					atomic.AddInt64(&txnCounter, -1)
					break
				}

				// Select random transaction type
				txType := selectTransactionType()

				err := executor.ExecuteTransaction(ctx, txType)
				if err != nil {
					atomic.AddInt64(&failCount, 1)
					log.Warn("Transaction failed",
						zap.Int("worker", workerID),
						zap.String("type", txType),
						zap.Error(err))
					// Continue on error, don't fail the whole test
					continue
				}

				atomic.AddInt64(&successCount, 1)

				// Periodic logging
				if count%1000 == 0 {
					log.Info("Progress",
						zap.Int64("completed", count),
						zap.Int64("total", *totalTxns),
						zap.Int64("success", atomic.LoadInt64(&successCount)),
						zap.Int64("failed", atomic.LoadInt64(&failCount)))
				}
			}
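The ticket-claiming idea can be checked in isolation. The following is a self-contained sketch, not the PR's code: `runWorkers` and its parameters are hypothetical names, and the transaction is replaced by an arbitrary callback. It keeps the claimed-ticket counter separate from the executed counter, which avoids the corrective decrement at the cost of one extra variable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts `workers` goroutines that together call `attempt`
// exactly `total` times, and returns how many attempts were made.
func runWorkers(workers int, total int64, attempt func()) int64 {
	var claimed int64  // tickets handed out so far (may overshoot total)
	var executed int64 // attempts actually run
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				// Claim a ticket before doing any work. AddInt64 is atomic,
				// so each ticket number goes to exactly one worker.
				if atomic.AddInt64(&claimed, 1) > total {
					return
				}
				attempt()
				atomic.AddInt64(&executed, 1)
			}
		}()
	}
	wg.Wait()
	return executed
}

func main() {
	var calls int64
	n := runWorkers(8, 1000, func() { atomic.AddInt64(&calls, 1) })
	fmt.Println(n, atomic.LoadInt64(&calls))
}
```

Because the check and the increment are a single atomic operation, no two workers can both observe a value below the limit and proceed past it, which is the race in the load-then-increment version.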

gemini-code-assist · 2025-10-03T04:05:10Z

+		case 1: // Update account
+			accountID := randomUserID()
+			delta := float64(rand.Intn(200)-100) * 10.0
+			if delta > 0 {
+				_, err = tx.ExecContext(ctx,
+					`UPDATE accounts SET balance = balance + ?, total_in = total_in + ?, version = version + 1 WHERE account_id = ?`,
+					delta, delta, accountID)
+			}


There are two issues in this part of the complexMixedTxn logic:

When delta is negative (a withdrawal), the operation is skipped. This makes the "complex mixed" transaction less realistic as it only handles deposits.

The UPDATE statement for a positive delta is missing the updated_at = ? field, which is inconsistent with other update operations in this test suite.

This should be fixed to correctly handle withdrawals and maintain data consistency.

Suggested change

-		case 1: // Update account
-			accountID := randomUserID()
-			delta := float64(rand.Intn(200)-100) * 10.0
-			if delta > 0 {
-				_, err = tx.ExecContext(ctx,
-					`UPDATE accounts SET balance = balance + ?, total_in = total_in + ?, version = version + 1 WHERE account_id = ?`,
-					delta, delta, accountID)
-			}
+		case 1: // Update account
+			accountID := randomUserID()
+			delta := float64(rand.Intn(200)-100) * 10.0
+			if delta > 0 {
+				_, err = tx.ExecContext(ctx,
+					`UPDATE accounts SET balance = balance + ?, total_in = total_in + ?, version = version + 1, updated_at = ? WHERE account_id = ?`,
+					delta, delta, now, accountID)
+			} else if delta < 0 {
+				_, err = tx.ExecContext(ctx,
+					`UPDATE accounts SET balance = balance + ?, total_out = total_out - ?, version = version + 1, updated_at = ? WHERE account_id = ? AND balance >= ?`,
+					delta, delta, now, accountID, -delta)
+			}
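The sign handling in the withdrawal branch is easy to misread, since `total_out - ?` is bound to a negative `delta`. A hedged alternative sketch: compute a positive amount first and build the statement in a small pure function. `buildAccountUpdate` is a hypothetical helper, not part of the PR; the column names are taken from the suggestion above, and the assumption that a withdrawal should increase `total_out` is my reading of it.

```go
package main

import (
	"fmt"
	"time"
)

// buildAccountUpdate returns the statement and bind arguments for a balance
// change of delta. ok is false when delta is zero and there is nothing to do.
// now is the value bound to updated_at (typically a time.Time).
func buildAccountUpdate(delta float64, accountID int64, now any) (query string, args []any, ok bool) {
	switch {
	case delta > 0:
		// Deposit: balance and total_in both grow by delta.
		query = `UPDATE accounts SET balance = balance + ?, total_in = total_in + ?, version = version + 1, updated_at = ? WHERE account_id = ?`
		return query, []any{delta, delta, now, accountID}, true
	case delta < 0:
		// Withdrawal: work with a positive amount so every bound value
		// reads naturally, and guard against overdrawing the account.
		amount := -delta
		query = `UPDATE accounts SET balance = balance - ?, total_out = total_out + ?, version = version + 1, updated_at = ? WHERE account_id = ? AND balance >= ?`
		return query, []any{amount, amount, now, accountID, amount}, true
	default:
		return "", nil, false
	}
}

func main() {
	for _, delta := range []float64{120, -50, 0} {
		query, args, ok := buildAccountUpdate(delta, 7, time.Now())
		if !ok {
			fmt.Println("delta is zero: skip the update")
			continue
		}
		fmt.Println(len(args), "args:", query)
	}
}
```

The returned query and arguments would then be passed to `tx.ExecContext(ctx, query, args...)`. Keeping the builder free of database handles makes the branch logic testable without a running TiDB or MySQL instance.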

gemini-code-assist · 2025-10-03T04:05:10Z

+		log.Fatal("DSN must be provided")
+	}
+
+	rand.Seed(time.Now().UnixNano())


rand.Seed has been deprecated since Go 1.20. The global random number generator is now automatically seeded with a high-quality seed. You can safely remove this line if the project's Go version is 1.20 or higher.
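If the goal of the seed call was reproducibility rather than randomness, one option (a sketch under that assumption, not something the PR does) is a generator local to each worker, created with `rand.New(rand.NewSource(seed))`. The helper names `newWorkerRand` and `sample` are hypothetical. A `*rand.Rand` built this way is not safe for concurrent use, hence one generator per worker.

```go
package main

import (
	"fmt"
	"math/rand"
)

// newWorkerRand returns a generator private to one worker. Deriving the seed
// from a base seed plus the worker ID keeps runs reproducible while giving
// each worker its own sequence.
func newWorkerRand(baseSeed int64, workerID int) *rand.Rand {
	return rand.New(rand.NewSource(baseSeed + int64(workerID)))
}

// sample draws n values in [0, limit) from r.
func sample(r *rand.Rand, n, limit int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, r.Intn(limit))
	}
	return out
}

func main() {
	first := sample(newWorkerRand(42, 1), 5, 100)
	second := sample(newWorkerRand(42, 1), 5, 100)
	fmt.Println(first)
	fmt.Println(second) // same base seed and worker ID, so the same sequence
}
```

Logging the base seed at startup would let a failing run be replayed with the same transaction mix.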

gemini-code-assist · 2025-10-03T04:05:10Z

+	ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
+	defer cancel()
+
+	if err = db.PingContext(ctx); err != nil {
+		log.Fatal("Failed to ping database", zap.String("dsn", dsn), zap.Error(err))


The variables ctx and cancel are shadowed here. While this is a common pattern, it can make code harder to read and debug, especially when dealing with multiple nested contexts. Consider using different names for the inner context and cancel function to avoid shadowing.

	pingCtx, pingCancel := context.WithTimeout(ctx, 5*time.Second)
	defer pingCancel()

	if err = db.PingContext(pingCtx); err != nil {
		log.Fatal("Failed to ping database", zap.String("dsn", dsn), zap.Error(err))

gemini-code-assist · 2025-10-03T04:05:10Z

+		}
+		items = append(items, item)
+	}
+	rows.Close()


rows.Close() is already deferred on line 221. This explicit call is redundant and can be removed for cleaner code.

gemini-code-assist · 2025-10-03T04:05:10Z

+	return contains(errMsg, "Deadlock") ||
+		contains(errMsg, "Lock wait timeout") ||
+		contains(errMsg, "try again later")


This can be simplified by using strings.Contains from the standard library. After this change, the now-unused contains and findSubstring functions (lines 758-771) can be removed.

Suggested change

-	return contains(errMsg, "Deadlock") ||
-		contains(errMsg, "Lock wait timeout") ||
-		contains(errMsg, "try again later")
+	return strings.Contains(errMsg, "Deadlock") ||
+		strings.Contains(errMsg, "Lock wait timeout") ||
+		strings.Contains(errMsg, "try again later")
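For context, here is a small self-contained sketch of how such a check might sit inside a retry loop. Only the three substrings come from the diff above; `isRetryable`, `withRetry`, and the sample error values are hypothetical names chosen for illustration, and the error messages are illustrative rather than copied from a real server.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Illustrative errors; real driver messages may be worded differently.
var (
	errDeadlock = errors.New("Error 1213: Deadlock found when trying to get lock")
	errSyntax   = errors.New("Error 1064: You have an error in your SQL syntax")
)

// isRetryable reports whether err looks like a transient lock conflict.
func isRetryable(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "Deadlock") ||
		strings.Contains(msg, "Lock wait timeout") ||
		strings.Contains(msg, "try again later")
}

// withRetry calls fn up to maxAttempts times, retrying only retryable errors.
// It returns the number of calls made and the last error.
func withRetry(maxAttempts int, fn func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = fn()
		if err == nil || !isRetryable(err) {
			return attempt, err
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := withRetry(3, func() error {
		calls++
		if calls < 3 {
			return errDeadlock // transient: retried
		}
		return nil
	})
	fmt.Println(attempts, err)

	attempts, err = withRetry(3, func() error { return errSyntax })
	fmt.Println(attempts, err) // permanent error: no retry
}
```

Matching on message text is fragile across driver and server versions; checking the numeric MySQL error code would be more robust, but that goes beyond what the review comment proposes.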

hongyunyan · 2025-10-03T06:30:47Z

/pull-cdc-mysql-integration-heavy

hongyunyan · 2025-10-03T06:31:48Z

/test pull-cdc-mysql-integration-heavy

ti-chi-bot · 2025-10-03T08:00:57Z

[LGTM Timeline notifier]

Timeline:

2025-10-03 08:00:57.173630371 +0000 UTC m=+417237.430361761: ☑️ agreed by lidezhu.

ti-chi-bot · 2025-10-03T08:57:07Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: flowbehappy, lidezhu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [flowbehappy,lidezhu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

hongyunyan · 2025-10-03T09:21:28Z

/retest

hongyunyan added 6 commits September 25, 2025 17:48

update

16225aa

Merge branch 'master' of https://github.com/pingcap/ticdc

3addb73

Merge branch 'master' of https://github.com/pingcap/ticdc

76a6855

Merge branch 'master' of https://github.com/pingcap/ticdc

bd6ad27

update

2d79177

update

211042f

ti-chi-bot Bot added the do-not-merge/needs-linked-issue, release-note, and size/XXL labels Oct 3, 2025

gemini-code-assist Bot reviewed Oct 3, 2025


update

23ade55

lidezhu approved these changes Oct 3, 2025


ti-chi-bot Bot added the lgtm label Oct 3, 2025

ti-chi-bot Bot added approved and removed do-not-merge/needs-linked-issue labels Oct 3, 2025

flowbehappy approved these changes Oct 3, 2025


ti-chi-bot Bot merged commit 0ebea70 into pingcap:master Oct 3, 2025
16 checks passed
