split-exec: a sleek and efficient online DML change tool for databases, with zero online impact, visible progress, and controllable replication latency.

Current status: in development. The first release may be available around April 15, 2024.
When operating a database (such as MySQL or PostgreSQL), we often need to perform DML operations that affect a large number of rows. This type of SQL is referred to as "Big-SQL" in the following text.
For example, when data expires, we may want to:

- Mark old data as deleted, using an `update` statement:

  ```sql
  update online_task set status='deleted' where create_time<'2023-02-01'
  ```

- Delete old data to reclaim space, using a `delete` statement:

  ```sql
  delete from online_task where create_time<'2023-02-01'
  ```

- Migrate data from the live table to an archive table, using an `insert ... select` statement:

  ```sql
  insert into archive_task select * from online_task where create_time<'2023-02-01'
  ```
However, executing Big-SQL directly on a table can cause many problems, such as:

- Coarse-grained locks held by Big-SQL during execution block live queries.
- Big-SQL touching large amounts of data causes heavy CPU and I/O load.
- Replaying Big-SQL on replicas also takes a long time, so replicas lag behind the master while it replays.

Risk summary: connection-limit exhaustion and denied connections, accumulation of slow queries, and replication latency.
Moreover, executing Big-SQL directly is like a black box: we cannot estimate how much more time is required.
To address these issues, we split a Big-SQL statement into smaller sub-queries based on the primary key or a unique key. While executing these sub-queries, we monitor the replication latency between the master and its replicas and throttle execution when the latency grows too large.
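The splitting and throttling ideas above can be sketched roughly as follows. This is a minimal illustration, not split-exec's actual implementation: the `id between {lo} and {hi}` chunking predicate, the batch size, and the latency threshold are all assumptions, and in practice the lag value would come from the replica (e.g. `SHOW REPLICA STATUS` on MySQL).

```python
def split_by_pk(sql_template, min_id, max_id, batch_size):
    """Yield sub-query SQL strings that cover [min_id, max_id] in
    primary-key ranges of at most batch_size ids each, so each
    sub-query holds locks briefly and replays quickly on replicas."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size - 1, max_id)
        yield sql_template.format(lo=lo, hi=hi)
        lo = hi + 1

def should_throttle(seconds_behind, max_latency=5):
    """Decide whether to pause between sub-queries.

    seconds_behind is the observed replica lag (None means the lag is
    unknown, e.g. replication is broken, so we also pause)."""
    return seconds_behind is None or seconds_behind > max_latency

# Example: turn one Big-SQL delete into bounded sub-queries.
template = ("delete from online_task "
            "where create_time<'2023-02-01' and id between {lo} and {hi}")
subqueries = list(split_by_pk(template, min_id=1, max_id=2500, batch_size=1000))
# Produces 3 sub-queries covering ids 1-1000, 1001-2000, 2001-2500.
```

Because the total key range is known up front, the number of finished sub-queries also gives a natural progress estimate, which addresses the black-box problem above.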
A sample design of the flow:

TODO: add
TODO