@JNSimba
Member

@JNSimba JNSimba commented Dec 10, 2025

What problem does this PR solve?

Issue Number: close #58896

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 10, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@JNSimba JNSimba changed the title [Proposal] Extend streaming job to support MySQL synchronization [Feature] Extend streaming job to support MySQL synchronization Dec 10, 2025
@JNSimba JNSimba changed the title [Feature] Extend streaming job to support MySQL synchronization [Feature](Streaming Job) Extend streaming job to support MySQL synchronization Dec 10, 2025
Copilot AI left a comment

Pull request overview

This PR extends streaming jobs to support MySQL synchronization via CDC (Change Data Capture), enabling users to sync data from MySQL databases to Doris in real-time. The implementation includes a new CDC client service and modifications to the streaming job framework.

Key Changes:

  • Introduces a CDC client Spring Boot application that interfaces with MySQL using Flink CDC connectors
  • Adds support for FROM MySQL TO Database syntax in job creation
  • Implements split-based data reading for both snapshot and binlog phases
  • Adds RPC endpoints for BE-FE communication to handle CDC operations
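
The split-based reading listed above can be pictured with a minimal sketch. It assumes a table with a numeric primary key and is not the Flink CDC or Doris implementation: `SnapshotSplit`, `BinlogSplit`, `plan_splits`, and the chunk size are hypothetical names chosen for illustration. The snapshot phase is cut into primary-key ranges that can be read as independent tasks, followed by a single binlog split.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class SnapshotSplit:
    """One primary-key range [start, end) of the snapshot phase (hypothetical type)."""
    table: str
    start: int
    end: int


@dataclass(frozen=True)
class BinlogSplit:
    """The single unbounded split that follows the snapshot (hypothetical type)."""
    start_offset: Optional[str] = None  # None: the start position is not decided yet


def plan_splits(table: str, min_pk: int, max_pk: int,
                chunk_size: int) -> List[Union[SnapshotSplit, BinlogSplit]]:
    """Cut [min_pk, max_pk] into chunks, then append one binlog split."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    splits: List[Union[SnapshotSplit, BinlogSplit]] = []
    start = min_pk
    while start <= max_pk:
        # The last chunk is clipped so it never reaches past max_pk.
        end = min(start + chunk_size, max_pk + 1)
        splits.append(SnapshotSplit(table, start, end))
        start = end
    # An empty key range (min_pk > max_pk) yields no snapshot splits at all.
    splits.append(BinlogSplit())
    return splits
```

In this sketch an empty key range produces only the binlog split, so the reader goes straight to the incremental phase.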

Reviewed changes

Copilot reviewed 85 out of 85 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job.groovy | Regression test for MySQL streaming job with CDC |
| gensrc/proto/internal_service.proto | Adds RPC interface for CDC client communication |
| fs_brokers/cdc_client/** | Complete CDC client implementation using Spring Boot |
| fe/fe-core/.../streaming/** | Extends streaming job framework with multi-table task support |
| fe/fe-core/.../offset/jdbc/** | JDBC offset provider for tracking MySQL binlog positions |
| fe/fe-core/.../util/StreamingJobUtils.java | Utility functions for streaming job management |
| docker/thirdparties/docker-compose/mysql/my.cnf | Enables MySQL binlog for CDC |
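
As a rough illustration of what tracking binlog positions involves, the sketch below orders two file-based offsets. It is not the offset provider from this PR: `BinlogOffset` and `is_after` are hypothetical names, the sketch assumes binlog file names ending in a numeric sequence suffix (such as `mysql-bin.000003`), and it ignores GTID-based positioning entirely.

```python
from typing import NamedTuple, Tuple


class BinlogOffset(NamedTuple):
    """A file-based binlog position (hypothetical type for illustration)."""
    file: str      # e.g. "mysql-bin.000003"
    position: int  # byte position inside that file

    def sort_key(self) -> Tuple[int, int]:
        # Compare by the numeric file suffix first, then by position in the file.
        sequence = int(self.file.rsplit(".", 1)[1])
        return (sequence, self.position)


def is_after(a: BinlogOffset, b: BinlogOffset) -> bool:
    """Return True when offset a is strictly later in the binlog than offset b."""
    return a.sort_key() > b.sort_key()
```

A later file always wins regardless of position, which is why the file sequence is compared before the byte position.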

JNSimba added a commit that referenced this pull request Dec 23, 2025
…onization (#58898)

### What problem does this PR solve?

Issue Number: close #58896
yiguolei pushed a commit that referenced this pull request Dec 26, 2025
…MySQL synchronization #58898 (#59228)

Cherry-picked from #58898

Co-authored-by: wudi <wudi@selectdb.com>
JNSimba added a commit that referenced this pull request Jan 9, 2026
…59705)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #58898
github-actions bot pushed a commit that referenced this pull request Jan 9, 2026
…59705)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #58898
JNSimba added a commit that referenced this pull request Jan 12, 2026
…pty tables (#59735)

### What problem does this PR solve?

Fix synchronization failure for empty tables.
Related PR: #58898
github-actions bot pushed a commit that referenced this pull request Jan 12, 2026
…pty tables (#59735)

### What problem does this PR solve?

Fix synchronization failure for empty tables.
Related PR: #58898
JNSimba added a commit that referenced this pull request Jan 13, 2026
### What problem does this PR solve?

Fix the error info shown for a task when the task times out.

Related PR: #58898
JNSimba added a commit that referenced this pull request Jan 13, 2026
…59760)

### What problem does this PR solve?

fix get remote meta failed to pause streaming job

Related PR: #58898
github-actions bot pushed a commit that referenced this pull request Jan 13, 2026
### What problem does this PR solve?

Fix the error info shown for a task when the task times out.

Related PR: #58898
github-actions bot pushed a commit that referenced this pull request Jan 13, 2026
…59760)

### What problem does this PR solve?

fix get remote meta failed to pause streaming job

Related PR: #58898
zzzxl1993 pushed a commit to zzzxl1993/doris that referenced this pull request Jan 13, 2026
…pache#59705)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: apache#58898
zzzxl1993 pushed a commit to zzzxl1993/doris that referenced this pull request Jan 13, 2026
…pty tables (apache#59735)

### What problem does this PR solve?

Fix synchronization failure for empty tables.
Related PR: apache#58898
zzzxl1993 pushed a commit to zzzxl1993/doris that referenced this pull request Jan 13, 2026
…e#59784)

### What problem does this PR solve?

Fix the error info shown for a task when the task times out.

Related PR: apache#58898
zzzxl1993 pushed a commit to zzzxl1993/doris that referenced this pull request Jan 13, 2026
…pache#59760)

### What problem does this PR solve?

fix get remote meta failed to pause streaming job

Related PR: apache#58898
JNSimba added a commit that referenced this pull request Jan 14, 2026
…#59828)

### What problem does this PR solve?

Related PR: #58898

1. The length of varchar needs to be multiplied by 3 when creating the
table.
2. Columns are ordered according to the primary key.
3. Unsupported column types will result in an error.
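
The three rules above can be sketched as a small column-mapping function. This is an illustration under stated assumptions, not the code from the PR: `MySQLColumn`, `SIMPLE_TYPES`, and `to_doris_columns` are hypothetical names, the type map is deliberately tiny, and "ordered according to the primary key" is interpreted here as placing primary-key columns first. The factor of 3 comes from the commit message; it presumably leaves room for multi-byte characters.

```python
from typing import List, NamedTuple, Optional


class MySQLColumn(NamedTuple):
    """Minimal description of a source column (hypothetical type)."""
    name: str
    type_name: str          # lower-case MySQL type name, e.g. "varchar"
    length: Optional[int]   # declared length for varchar, otherwise None
    is_primary_key: bool


# Hypothetical, deliberately small mapping; the real mapping is not shown in the PR text.
SIMPLE_TYPES = {"int": "INT", "bigint": "BIGINT", "datetime": "DATETIME"}


def to_doris_columns(columns: List[MySQLColumn]) -> List[str]:
    """Build target column definitions following the three rules in the commit message."""
    definitions = []
    # Rule 2 (as interpreted here): primary-key columns come first. sorted() is
    # stable, so the source order is preserved within each group.
    for col in sorted(columns, key=lambda c: not c.is_primary_key):
        if col.type_name == "varchar":
            if col.length is None:
                raise ValueError(f"varchar column {col.name} has no length")
            # Rule 1: multiply the declared varchar length by 3.
            target_type = f"VARCHAR({col.length * 3})"
        elif col.type_name in SIMPLE_TYPES:
            target_type = SIMPLE_TYPES[col.type_name]
        else:
            # Rule 3: fail loudly instead of silently skipping the column.
            raise ValueError(f"unsupported column type: {col.type_name} (column {col.name})")
        definitions.append(f"`{col.name}` {target_type}")
    return definitions
```

Raising on an unknown type keeps a schema mismatch visible at table-creation time instead of surfacing later as missing data.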
github-actions bot pushed a commit that referenced this pull request Jan 14, 2026
…#59828)

### What problem does this PR solve?

Related PR: #58898

1. The length of varchar needs to be multiplied by 3 when creating the
table.
2. Columns are ordered according to the primary key.
3. Unsupported column types will result in an error.
JNSimba added a commit that referenced this pull request Jan 15, 2026
… remainsplit relay problem (#59883)

### What problem does this PR solve?
 
Related PR: #58898
After the job is created for the first time, the task for the first split
is scheduled starting from the initial offset. If FE restarts while the
task status is running or failed, the split needs to be restored from the
meta again.
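
The restart behavior described in this commit message can be pictured with a toy model. It is only a sketch under assumptions: `StreamingJobSketch`, its methods, and the dictionary standing in for persisted FE metadata are all hypothetical and do not mirror the Doris classes.

```python
from typing import Dict, Optional


class StreamingJobSketch:
    """Toy model of a job whose in-flight split survives a restart (hypothetical)."""

    def __init__(self, meta: Dict[str, str]) -> None:
        self.meta = meta  # stands in for persisted FE metadata
        self.current_split: Optional[str] = None

    def schedule(self, split: str) -> None:
        # Persist the split and its status before the task starts running.
        self.meta["split"] = split
        self.meta["task_status"] = "running"
        self.current_split = split

    def finish(self, status: str) -> None:
        # status is either "success" or "failed" in this sketch.
        self.meta["task_status"] = status
        if status == "success":
            self.current_split = None

    @classmethod
    def recover(cls, meta: Dict[str, str]) -> "StreamingJobSketch":
        """Rebuild the job after a restart from the persisted metadata."""
        job = cls(meta)
        # A task that was running or failed never completed its split,
        # so the split is restored instead of being planned again.
        if meta.get("task_status") in ("running", "failed"):
            job.current_split = meta["split"]
        return job
```

Restoring the split only for the running and failed states means a completed split is not replayed after a restart.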
github-actions bot pushed a commit that referenced this pull request Jan 15, 2026
… remainsplit relay problem (#59883)

### What problem does this PR solve?
 
Related PR: #58898
After the job is created for the first time, the task for the first split
is scheduled starting from the initial offset. If FE restarts while the
task status is running or failed, the split needs to be restored from the
meta again.
Labels

approved (indicates a PR has been approved by one committer), dev/4.0.3-merged, reviewed

Development

Successfully merging this pull request may close these issues.

[Proposal] Extend streaming job to support MySQL synchronization

9 participants