-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[feat](streaming job) Introduce streaming job for incremental load #56175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…5790) ### What problem does this PR solve? introduce streaming job schedule task
### What problem does this PR solve? 1. Add Create StreamingJob and Alter Job 2. add job and task tvf schema 3. add offset api
### What problem does this PR solve? Introduce streaming task scheduler to schedule all streaming tasks.
…pache#55862) ### What problem does this PR solve? 1. add StreamingInsertTask For StreamJob 2. Improve StreamInsertJob 3. add insertcommand rewrite tvf params
…#55918) ### What problem does this PR solve? Implement offset persistence and replay logic(shared noting mode).
### What problem does this PR solve? 1. add S3 Stream job split offset 2. fix stream job create bug
### What problem does this PR solve? Fix streaming job problem
### What problem does this PR solve? Add fetch meta and fix rewrite tvf problem
…d mode (apache#55975) ### What problem does this PR solve? Implement offset persistence and replay in cloud mode.
…che#56056) ### What problem does this PR solve? Register listener id when begin transaction to ensure before/after commit logic would be executed.
### What problem does this PR solve? Add create job case and fix job bug
…re exactly-once semantics (apache#56135) ### What problem does this PR solve? Add task commit check and job event lock to ensure exactly-once semantics.
### What problem does this PR solve? Fix compile error
### What problem does this PR solve? Fix register callback id invalid.
…ay in cloud mode" (apache#56149) ### What problem does this PR solve? Revert "implement offset persistence and replay in cloud mode"
…and replay in cloud mode"" (apache#56156) Reverts apache#56149
### What problem does this PR solve? Fix Alter Job and schedule bug etc
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
1 similar comment
|
run buildall |
975d1da to
bed4094
Compare
|
run buildall |
bed4094 to
7377e32
Compare
|
run buildall |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces streaming job functionality for incremental load operations in Apache Doris. It implements a new type of job that can continuously consume data from external sources (like S3) and incrementally load it into Doris tables.
Key changes include:
- Added streaming job execution type and associated infrastructure
- Implemented S3-based offset provider for tracking incremental data consumption
- Created new protobuf definitions for streaming job metadata and transaction attachments
- Added ALTER JOB command for modifying streaming job properties
Reviewed Changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| gensrc/proto/cloud.proto | Added protobuf definitions for streaming job metadata and transaction attachments |
| fe/fe-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/ | Core streaming job implementation including job, task, and properties classes |
| fe/fe-core/src/main/java/org/apache/doris/job/offset/ | Offset provider framework for tracking data consumption progress |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/ | Command classes for creating and altering streaming jobs |
| fe/fe-core/src/main/java/org/apache/doris/fs/ | File system extensions for batch listing with limits |
| cloud/src/meta-service/ | Cloud mode metadata service extensions for streaming job progress |
| regression-test/suites/job_p0/streaming_job/ | Integration test for streaming insert job functionality |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterJobCommand.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/plans/commands/AlterJobCommand.java
Show resolved
Hide resolved
...-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java
Outdated
Show resolved
Hide resolved
...-core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertJob.java
Outdated
Show resolved
Hide resolved
...core/src/main/java/org/apache/doris/job/extensions/insert/streaming/StreamingInsertTask.java
Outdated
Show resolved
Hide resolved
fe/fe-core/src/main/java/org/apache/doris/fs/obj/S3ObjStorage.java
Outdated
Show resolved
Hide resolved
### What problem does this PR solve? improve job api
35398e1 to
698fd51
Compare
|
run buildall |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run performance |
TPC-DS: Total hot run time: 2766 ms |
ClickBench: Total hot run time: 0.07 s |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
run cloud p0 |
|
run external |
FE Regression Coverage ReportIncrement line coverage |
|
run p0 |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
dataroaring
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
What problem does this PR solve?
closed #56191
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)