
[Proposal] Extend streaming job to support MySQL synchronization #58896

@JNSimba


Background

Streaming jobs were introduced in #56191, and continuous synchronization from S3 is now supported.

This proposal extends streaming jobs to support MySQL as a data source.
Unlike S3, a single MySQL binlog entry may involve multiple tables, so the source cannot be represented with a table-valued function (TVF).
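
To make the multi-table point concrete, here is a minimal Python sketch. The event shape (`table`, `op`, `row`) and the function name `fan_out` are hypothetical and are not part of Doris or of the MySQL binlog format. It only illustrates that one committed transaction can carry rows with different column sets, which a single TVF result schema cannot describe.

```python
from collections import defaultdict

# Hypothetical change events from one MySQL transaction.
# The field names are illustrative, not a real binlog format.
transaction = [
    {"table": "user_info", "op": "insert", "row": {"id": 1, "name": "alice"}},
    {"table": "student", "op": "insert", "row": {"id": 7, "grade": 3}},
    {"table": "user_info", "op": "update", "row": {"id": 1, "name": "alice2"}},
]


def fan_out(events):
    """Group change events by source table, preserving per-table order."""
    per_table = defaultdict(list)
    for event in events:
        per_table[event["table"]].append(event)
    return dict(per_table)


batches = fan_out(transaction)
# Each table has its own column set, so each batch needs its own target table.
schemas = {
    table: sorted({column for event in events for column in event["row"]})
    for table, events in batches.items()
}
```

Here `schemas` ends up with one column list per source table, which is why the proposed syntax targets a whole database rather than a single table.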

Grammar

Create a MySQL sync job

CREATE JOB mysql_db_sync
ON STREAMING
FROM MYSQL (
    "jdbc_url" = "jdbc:mysql://127.0.0.1:3306",
    "driver_url" = "mysql-connector-j-8.0.31.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver",
    "user" = "root",
    "password" = "",
    "database" = "mysqldb",
    "include_tables" = "user_info,student", 
    "offset" = "initial"
)
TO DATABASE target_test_db (
)
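
As a rough illustration of how the properties in the statement above might be interpreted, the Python sketch below splits `include_tables` and pairs each source table with a target. The helper `parse_include_tables` is hypothetical, and mapping each source table to a same-named table in `target_test_db` is an assumption; the proposal does not spell out either detail.

```python
def parse_include_tables(value):
    """Split a comma-separated table list, dropping blanks and surrounding spaces."""
    return [name.strip() for name in value.split(",") if name.strip()]


# Property values copied from the CREATE JOB example.
properties = {
    "database": "mysqldb",
    "include_tables": "user_info,student",
    "offset": "initial",
}
target_database = "target_test_db"

tables = parse_include_tables(properties["include_tables"])

# Assumption: each source table is written to a same-named target table.
mapping = {
    f'{properties["database"]}.{table}': f"{target_database}.{table}"
    for table in tables
}
```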

View the job in the same way as other streaming jobs

select * from jobs("type"="insert") where ExecuteType = "STREAMING"

Design


The design introduces a CdcClient role, which:

  1. Is responsible for consuming both full and incremental data from MySQL.
  2. Starts when the job is scheduled and terminates together with the BE process.
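
The first responsibility above can be sketched as a two-phase reader. This is a simplified Python model, not the CdcClient implementation: the function `consume`, the integer positions, and the event shape are all hypothetical. It also assumes that `"offset" = "initial"` means a full snapshot followed by incremental changes, which the proposal does not state explicitly.

```python
def consume(snapshot_rows, binlog_events, snapshot_position):
    """Yield snapshot rows first, then binlog events after the snapshot position."""
    # Phase 1: full data. Every existing row is emitted as an upsert.
    for row in snapshot_rows:
        yield {"op": "upsert", "row": row}
    # Phase 2: incremental data. Skip events already covered by the snapshot.
    for position, event in binlog_events:
        if position > snapshot_position:
            yield event


snapshot = [{"id": 1, "name": "alice"}]
binlog = [
    (10, {"op": "upsert", "row": {"id": 1, "name": "alice"}}),  # covered by snapshot
    (11, {"op": "upsert", "row": {"id": 2, "name": "bob"}}),
    (12, {"op": "delete", "row": {"id": 1}}),
]

events = list(consume(snapshot, binlog, snapshot_position=10))
operations = [event["op"] for event in events]
```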

Limitations

  1. CdcClient guarantees only at-least-once delivery, so an event may be written more than once. Writes can still be made idempotent by using a primary key table as the target.
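
The limitation above can be illustrated with a small Python model in which a dict keyed by primary key stands in for the target table. The function `apply` and the event shape are hypothetical. The sketch shows that replaying events from an earlier offset, as happens after a restart under at-least-once delivery, converges to the same final state as delivering each event exactly once.

```python
def apply(table, event):
    """Apply one change event to a table keyed by primary key."""
    key = event["row"]["id"]
    if event["op"] == "delete":
        table.pop(key, None)       # deleting a missing key is a no-op
    else:
        table[key] = event["row"]  # upsert overwrites the previous version


events = [
    {"op": "upsert", "row": {"id": 1, "name": "alice"}},
    {"op": "upsert", "row": {"id": 2, "name": "bob"}},
    {"op": "delete", "row": {"id": 1}},
]

# Reference run: every event is delivered exactly once.
exactly_once = {}
for event in events:
    apply(exactly_once, event)

# At-least-once run: after a failure the client resumes from an earlier
# offset, so the last two events are delivered a second time.
at_least_once = {}
for event in events + events[1:]:
    apply(at_least_once, event)
```

Both tables end up holding only the row with `id` 2. A table without a primary key would instead keep the duplicated rows.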
