
@sollhui
Contributor

@sollhui sollhui commented Sep 18, 2025

What problem does this PR solve?

Closes #56191.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

sollhui and others added 17 commits September 9, 2025 16:07
…5790)

### What problem does this PR solve?

Introduce a scheduling task for streaming jobs.
### What problem does this PR solve?

1. Add Create StreamingJob and Alter Job
2. Add job and task TVF schemas
3. Add offset API
### What problem does this PR solve?

Introduce a streaming task scheduler that schedules all streaming tasks.
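A minimal sketch of the per-tick loop such a scheduler might run, assuming each streaming job executes at most one task at a time. All names here (`StreamingTaskSchedulerSketch`, `scheduleOnce`, `onTaskFinished`) are hypothetical and are not the Doris FE API; a real scheduler would run this on a timer thread and hand the tasks to an executor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: on each tick, create at most one new task for every
 * RUNNING streaming job that has no task in flight.
 */
final class StreamingTaskSchedulerSketch {
    enum JobStatus { RUNNING, PAUSED }

    static final class Job {
        final long id;
        JobStatus status = JobStatus.RUNNING;
        boolean taskInFlight = false;
        long nextTaskSeq = 0;

        Job(long id) { this.id = id; }
    }

    // Insertion-ordered so scheduling order is deterministic.
    private final Map<Long, Job> jobs = new LinkedHashMap<>();

    void register(Job job) { jobs.put(job.id, job); }

    /** One scheduling round; returns the names of the tasks dispatched in this round. */
    List<String> scheduleOnce() {
        List<String> dispatched = new ArrayList<>();
        for (Job job : jobs.values()) {
            // Skip paused jobs and jobs whose previous task has not finished,
            // so the tasks of one job run strictly one after another.
            if (job.status != JobStatus.RUNNING || job.taskInFlight) {
                continue;
            }
            job.taskInFlight = true;
            dispatched.add("job" + job.id + "-task" + job.nextTaskSeq++);
        }
        return dispatched;
    }

    /** Called when a task finishes (committed or failed); frees the job for the next round. */
    void onTaskFinished(long jobId) {
        Job job = jobs.get(jobId);
        if (job != null) {
            job.taskInFlight = false;
        }
    }
}
```

Serializing tasks per job keeps offset bookkeeping simple: each task starts from the offset its predecessor committed.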
…pache#55862)

### What problem does this PR solve?

1. Add StreamingInsertTask for StreamJob
2. Improve StreamInsertJob
3. Rewrite TVF params in InsertCommand
…#55918)

### What problem does this PR solve?

Implement offset persistence and replay logic (shared-nothing mode).
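A minimal sketch of the persist-then-apply and replay pattern, assuming (in shared-nothing mode) that each committed offset is written to a node-local metadata journal and that an offset is an opaque string per job. `OffsetJournalSketch` and `OffsetRecord` are hypothetical stand-ins, not Doris's edit-log classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: every committed offset is appended to a journal, and a
 * restarted (or follower) node rebuilds each job's latest offset by replaying
 * the journal in order.
 */
final class OffsetJournalSketch {
    static final class OffsetRecord {
        final long jobId;
        final String offset;

        OffsetRecord(long jobId, String offset) {
            this.jobId = jobId;
            this.offset = offset;
        }
    }

    private final List<OffsetRecord> journal = new ArrayList<>();
    private final Map<Long, String> committed = new HashMap<>();

    /** Persist first, then apply in memory, so memory never runs ahead of the log. */
    void commitOffset(long jobId, String offset) {
        OffsetRecord rec = new OffsetRecord(jobId, offset);
        journal.add(rec);
        apply(rec);
    }

    private void apply(OffsetRecord rec) {
        committed.put(rec.jobId, rec.offset);
    }

    /** Latest committed offset for a job, or null if the job has never committed. */
    String committedOffset(long jobId) {
        return committed.get(jobId);
    }

    List<OffsetRecord> journal() {
        return new ArrayList<>(journal);
    }

    /** Rebuild state from a journal; later records for a job overwrite earlier ones. */
    static OffsetJournalSketch replay(List<OffsetRecord> journal) {
        OffsetJournalSketch restored = new OffsetJournalSketch();
        for (OffsetRecord rec : journal) {
            restored.journal.add(rec);
            restored.apply(rec);
        }
        return restored;
    }
}
```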
### What problem does this PR solve?

1. Add split offset support for S3 stream jobs
2. Fix a stream job creation bug
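A minimal sketch of how a split offset could work for S3, assuming the offset is the last object key already loaded and each task takes at most `maxFiles` keys that sort after it. S3 `ListObjectsV2` returns keys in ascending UTF-8 binary order and accepts a `StartAfter` key, which is what makes a key-based offset workable; here an in-memory list stands in for the listing. `S3SplitOffsetSketch`, `nextSplit`, and `maxFiles` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/**
 * Hypothetical sketch: the offset of an S3-backed streaming job is the last
 * object key already loaded; each task lists keys strictly after it and takes
 * at most maxFiles of them.
 */
final class S3SplitOffsetSketch {
    static final class Split {
        final List<String> keys;
        final String endOffset; // last key of this split; becomes the next task's start offset

        Split(List<String> keys, String endOffset) {
            this.keys = keys;
            this.endOffset = endOffset;
        }
    }

    /**
     * @param allKeys    stand-in for the bucket listing under the job's prefix
     * @param lastOffset last key already loaded, or null for a fresh job
     * @param maxFiles   upper bound on files per task (hypothetical parameter)
     * @return the next split, or null if no new files are available
     */
    static Split nextSplit(List<String> allKeys, String lastOffset, int maxFiles) {
        if (maxFiles <= 0) {
            throw new IllegalArgumentException("maxFiles must be positive");
        }
        // String natural order; matches S3's UTF-8 binary key order for ASCII keys.
        TreeSet<String> sorted = new TreeSet<>(allKeys);
        Iterable<String> candidates =
                lastOffset == null ? sorted : sorted.tailSet(lastOffset, false);
        List<String> batch = new ArrayList<>();
        for (String key : candidates) {
            if (batch.size() == maxFiles) {
                break;
            }
            batch.add(key);
        }
        if (batch.isEmpty()) {
            return null;
        }
        return new Split(batch, batch.get(batch.size() - 1));
    }
}
```

This design assumes new files arrive with keys that sort after the ones already loaded; a file written later under an earlier-sorting key would be skipped.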
### What problem does this PR solve?

Fix streaming job problems.
### What problem does this PR solve?

Add meta fetching and fix the TVF rewrite problem.
…d mode (apache#55975)

### What problem does this PR solve?

Implement offset persistence and replay in cloud mode.
…che#56056)

### What problem does this PR solve?

Register the listener ID when beginning a transaction to ensure that the
before/after commit logic is executed.
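A minimal sketch of why registration timing matters, assuming a transaction manager that resolves the listener from the ID stored at begin time: if the ID were attached only later, a commit could arrive first and skip the hooks. `TxnCallbackSketch`, `TxnListener`, `beforeCommitted`, and `afterCommitted` are hypothetical names, not Doris's transaction API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: the listener ID is attached to the transaction when it
 * begins, so whoever later commits it can look the listener up and run its
 * before/after commit hooks.
 */
final class TxnCallbackSketch {
    interface TxnListener {
        void beforeCommitted(long txnId);
        void afterCommitted(long txnId);
    }

    private final Map<Long, TxnListener> listeners = new HashMap<>();
    private final Map<Long, Long> txnToListener = new HashMap<>();
    private long nextTxnId = 1;

    void registerListener(long listenerId, TxnListener listener) {
        listeners.put(listenerId, listener);
    }

    /** The listener ID is recorded at begin time, not at commit time. */
    long beginTransaction(long listenerId) {
        long txnId = nextTxnId++;
        txnToListener.put(txnId, listenerId);
        return txnId;
    }

    void commitTransaction(long txnId) {
        Long listenerId = txnToListener.remove(txnId);
        TxnListener listener = listenerId == null ? null : listeners.get(listenerId);
        if (listener != null) {
            listener.beforeCommitted(txnId);
        }
        // ... the loaded data would be made visible here ...
        if (listener != null) {
            listener.afterCommitted(txnId);
        }
    }
}
```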
### What problem does this PR solve?

Add a create-job test case and fix job bugs.
…re exactly-once semantics (apache#56135)

### What problem does this PR solve?

Add task commit check and job event lock to ensure exactly-once
semantics.
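A minimal sketch of such a commit check, assuming each task records the offset it started from and the job keeps a single committed offset. Under a job-level lock, a task commits only if its start offset still equals the committed offset, so a retried or duplicate task for the same range is rejected rather than loaded twice. `ExactlyOnceCommitSketch` and `tryCommit` are hypothetical names; in a real system this check would presumably run inside the transaction commit path so that the data and the offset advance together.

```java
import java.util.Objects;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch: a task may commit only if the offset it started from is
 * still the job's committed offset. Check and update happen under one lock.
 */
final class ExactlyOnceCommitSketch {
    private final ReentrantLock jobLock = new ReentrantLock();
    private String committedOffset; // null until the first commit

    ExactlyOnceCommitSketch(String initialOffset) {
        this.committedOffset = initialOffset;
    }

    /** @return true if this task's range was committed, false if it was rejected */
    boolean tryCommit(String taskStartOffset, String taskEndOffset) {
        jobLock.lock();
        try {
            if (!Objects.equals(committedOffset, taskStartOffset)) {
                return false; // stale or duplicate task: its range was already committed
            }
            committedOffset = taskEndOffset;
            return true;
        } finally {
            jobLock.unlock();
        }
    }

    String committedOffset() {
        jobLock.lock();
        try {
            return committedOffset;
        } finally {
            jobLock.unlock();
        }
    }
}
```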
### What problem does this PR solve?

Fix compile error
### What problem does this PR solve?

Fix an invalid registered callback ID.
…ay in cloud mode" (apache#56149)

### What problem does this PR solve?

Revert "implement offset persistence and replay in cloud mode"
### What problem does this PR solve?

Fix Alter Job, scheduling, and other bugs.
@Thearas
Contributor

Thearas commented Sep 18, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message) and how it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added, and why?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Sep 18, 2025

run buildall

@sollhui
Contributor Author

sollhui commented Sep 18, 2025

run buildall

@sollhui sollhui force-pushed the introduce_streaming_job branch from 975d1da to bed4094 Compare September 18, 2025 03:45
@sollhui
Contributor Author

sollhui commented Sep 18, 2025

run buildall

@sollhui sollhui force-pushed the introduce_streaming_job branch from bed4094 to 7377e32 Compare September 18, 2025 03:52
@sollhui
Contributor Author

sollhui commented Sep 18, 2025

run buildall

@JNSimba JNSimba requested a review from Copilot September 18, 2025 04:10

Copilot AI left a comment


Pull Request Overview

This PR introduces streaming job functionality for incremental load operations in Apache Doris. It implements a new type of job that can continuously consume data from external sources (like S3) and incrementally load it into Doris tables.

Key changes include:

  • Added streaming job execution type and associated infrastructure
  • Implemented S3-based offset provider for tracking incremental data consumption
  • Created new protobuf definitions for streaming job metadata and transaction attachments
  • Added ALTER JOB command for modifying streaming job properties

Reviewed Changes

Copilot reviewed 57 out of 57 changed files in this pull request and generated 6 comments.

File	Description
gensrc/proto/cloud.proto	Added protobuf definitions for streaming job metadata and transaction attachments
fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/	Core streaming job implementation, including job, task, and properties classes
fe/fe-core/src/main/java/org/apache/doris/job/offset/	Offset provider framework for tracking data consumption progress
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/	Command classes for creating and altering streaming jobs
fe/fe-core/src/main/java/org/apache/doris/fs/	File system extensions for batch listing with limits
cloud/src/meta-service/	Cloud mode metadata service extensions for streaming job progress
regression-test/suites/job_p0/streaming_job/	Integration test for streaming insert job functionality


JNSimba and others added 2 commits September 23, 2025 16:44
### What problem does this PR solve?

Improve the job API.
@sollhui sollhui force-pushed the introduce_streaming_job branch from 35398e1 to 698fd51 Compare September 23, 2025 08:49
@sollhui
Copy link
Contributor Author

sollhui commented Sep 23, 2025

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 68.57% (96/140) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.84% (1546/1844)
Line Coverage 67.86% (27727/40857)
Region Coverage 68.23% (13646/20000)
Branch Coverage 58.51% (7282/12446)

@JNSimba
Member

JNSimba commented Sep 23, 2025

run performance

@doris-robot

TPC-DS: Total hot run time: 2766 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 698fd519e5df4146e61acdd03a8097e959f812e3, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1034	18	12	12
query2	7124	20	17	17
query3	7630	13	12	12
query4	26876	12	11	11
query5	4459	14	11	11
query6	391	12	11	11
query7	5420	11	11	11
query8	359	20	19	19
query9	9194	12	10	10
query10	731	12	11	11
query11	16088	11	11	11
query12	175	11	9	9
query13	1738	11	10	10
query14	10886	16	15	15
query15	398	10	9	9
query16	7193	11	10	10
query17	1536	11	10	10
query18	3062	11	10	10
query19	225	10	8	8
query20	132	12	10	10
query21	216	10	9	9
query22	4058	9	9	9
query23	33714	32	13	13
query24	9455	12	11	11
query25	722	10	8	8
query26	1064	11	9	9
query27	3481	10	9	9
query28	6195	11	9	9
query29	1225	11	9	9
query30	678	11	10	10
query31	1739	10	10	10
query32	119	10	9	9
query33	1042	10	8	8
query34	1585	830	518	518
query35	1007	10	9	9
query36	1020	10	9	9
query37	119	10	9	9
query38	3594	11	9	9
query39	1478	727	735	727
query40	228	9	8	8
query41	82	12	38	12
query42	160	10	9	9
query43	489	10	9	9
query44	1372	9	9	9
query45	387	10	9	9
query46	1216	10	9	9
query47	1812	10	9	9
query48	402	10	9	9
query49	1141	10	9	9
query50	783	9	9	9
query51	3928	10	8	8
query52	121	9	10	9
query53	245	10	10	10
query54	712	10	10	10
query55	93	11	9	9
query56	356	11	11	11
query57	1229	9	9	9
query58	371	10	9	9
query59	2689	9	9	9
query60	383	10	9	9
query61	180	10	9	9
query62	818	10	9	9
query63	264	10	10	10
query64	4156	11	10	10
query65	4033	10	9	9
query66	1091	11	9	9
query67	16555	16	11	11
query68	3586	11	10	10
query69	614	12	10	10
query70	1400	10	10	10
query71	457	353	358	353
query72	7392	10	10	10
query73	536	11	11	11
query74	9530	10	9	9
query75	3451	11	10	10
query76	2574	10	9	9
query77	953	12	13	12
query78	9950	29	12	12
query79	1130	10	11	10
query80	1121	11	11	11
query81	683	11	12	11
query82	1411	9	10	9
query83	437	12	11	11
query84	334	10	9	9
query85	1660	10	10	10
query86	580	10	9	9
query87	3870	9	9	9
query88	2860	16	11	11
query89	414	10	10	10
query90	2123	10	8	8
query91	175	9	9	9
query92	80	11	10	10
query93	1145	9	9	9
query94	1177	11	11	11
query95	574	11	11	11
query96	446	10	9	9
query97	3132	9	9	9
query98	249	235	223	223
query99	1471	10	9	9
Total cold run time: 287973 ms
Total hot run time: 2766 ms

@doris-robot

ClickBench: Total hot run time: 0.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 698fd519e5df4146e61acdd03a8097e959f812e3, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.06	0.02	0.01
query2	0.11	0.01	0.00
query3	0.28	0.01	0.01
query4	1.75	0.00	0.01
query5	0.29	0.01	0.00
query6	1.65	0.00	0.00
query7	0.05	0.00	0.01
query8	0.07	0.00	0.01
query9	0.63	0.00	0.00
query10	0.60	0.00	0.00
query11	0.17	0.00	0.01
query12	0.16	0.01	0.01
query13	0.66	0.00	0.00
query14	1.06	0.00	0.00
query15	0.88	0.00	0.00
query16	0.39	0.00	0.00
query17	1.09	0.00	0.00
query18	0.22	0.00	0.00
query19	2.26	0.01	0.00
query20	0.02	0.01	0.01
query21	15.94	0.00	0.00
query22	6.27	0.00	0.00
query23	16.24	0.00	0.00
query24	1.46	0.00	0.00
query25	0.20	0.00	0.00
query26	0.17	0.01	0.00
query27	0.14	0.01	0.01
query28	1.30	0.01	0.01
query29	13.13	0.00	0.01
query30	0.31	0.01	0.01
query31	2.21	0.00	0.00
query32	5.83	0.00	0.00
query33	4.34	0.01	0.00
query34	7.63	0.00	0.00
query35	6.47	0.01	0.00
query36	0.69	0.00	0.00
query37	0.11	0.00	0.00
query38	0.07	0.00	0.00
query39	0.05	0.00	0.00
query40	0.18	0.00	0.00
query41	0.10	0.00	0.00
query42	0.07	0.00	0.00
query43	0.06	0.00	0.00
Total cold run time: 95.37 s
Total hot run time: 0.07 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.40% (17626/33637)
Line Coverage 37.62% (160017/425323)
Region Coverage 32.13% (121814/379127)
Branch Coverage 33.49% (53414/159502)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.15% (23391/32876)
Line Coverage 57.55% (244244/424434)
Region Coverage 52.96% (203537/384301)
Branch Coverage 54.58% (87453/160239)

@JNSimba
Member

JNSimba commented Sep 23, 2025

run cloud p0

@JNSimba
Member

JNSimba commented Sep 23, 2025

run external

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 62.30% (613/984) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Member

JNSimba commented Sep 23, 2025

run p0

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100% (0/0) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.16% (23394/32876)
Line Coverage 57.55% (244248/424434)
Region Coverage 52.97% (203563/384301)
Branch Coverage 54.58% (87458/160239)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 62.80% (618/984) 🎉
Increment coverage report
Complete coverage report

Contributor

@dataroaring dataroaring left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 24, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@JNSimba JNSimba merged commit c73d225 into apache:master Sep 24, 2025
31 of 36 checks passed

Labels

approved Indicates a PR has been approved by one committer. reviewed


Development

Successfully merging this pull request may close these issues.

[Proposal] Introduce streaming job for incremental load

10 participants