Description
What is the problem with the old load framework
Different load types use different load frameworks.
- The most direct problem is that the same load feature has been implemented many times, once per load type. The duplicated code is too complex to maintain and enhance.
- The plans of broker load and mini load cost more disk I/O and network transfer. Stream load performs better than broker load and mini load.
- Because the broker load plan is non-streaming, it cannot load large files within limited memory.
What is the solution
Add a new load framework that uses a unified load plan (the streaming load plan) in BE and a fast scheduling logic in FE.
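To illustrate why a streaming plan keeps memory bounded, here is a minimal Python sketch. It is not Doris code: `load_whole_file`, `load_streaming`, and `batch_size` are hypothetical names, and a plain list stands in for the sink. Both functions return the peak number of rows held in memory at once.

```python
import io


def parse(line):
    # Hypothetical CSV-like row parser used by both variants.
    return line.rstrip("\n").split(",")


def load_whole_file(source, sink):
    """Non-streaming: materialize every row before writing any of them."""
    rows = [parse(line) for line in source]
    sink.extend(rows)
    return len(rows)  # peak rows in memory grows with the file size


def load_streaming(source, sink, batch_size=2):
    """Streaming: scan and write in fixed-size batches."""
    batch = []
    peak = 0
    for line in source:
        batch.append(parse(line))
        peak = max(peak, len(batch))
        if len(batch) == batch_size:
            sink.extend(batch)  # hand the batch to the sink, then release it
            batch = []
    sink.extend(batch)  # flush the final partial batch
    return peak  # never exceeds batch_size
```

Both variants write the same rows; only the streaming one has a peak that is independent of the input size.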
The design of the solution
- The new load framework in FE:
The new load stages are PENDING -> LOADING -> FINISHED; the ETL stage is removed. To trigger the next step of a load, the new framework uses TaskCallback and TxnStateChangeCallback instead of the LoadChecker.
In the new framework:
Step1: A new scheduler named LoadScheduler, which currently supports only broker load, picks the PENDING jobs and calls each job's execute function, which begins the txn and submits tasks. After this step, the job state changes to LOADING.
Step2.0~2.N: OnTaskFinished is invoked after a task finishes. It updates the job progress or triggers the next step of the load. When all tasks have finished, the txn is committed.
Step3: AfterVisible is invoked after the txn becomes visible. It changes the job state to FINISHED.
- Broker load:
Step1: The execute function of the broker load job submits the pending task. The pending task performs the preparations needed for the load.
Step2.1: OnPendingTaskFinished, a sub-function of OnTaskFinished, is invoked after the pending task finishes. It submits the loading tasks, which are created from the attachment of the pending task.
Step2.2: OnLoadingTaskFinished, also a sub-function of OnTaskFinished, is invoked after a loading task finishes. It records the commit info and commits the txn once all tasks have finished.
Step3: AfterVisible is invoked after the txn becomes visible. It changes the job state to FINISHED.
- The new load plan:
Broker load, mini load, and multi mini load will use BrokerScanNode and OlapTableSink instead of CSVScanNode and DataSplitSink. The new load plan is streaming, so it can load very large files without exceeding the memory limit.
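The job lifecycle described for the FE framework and broker load above can be sketched roughly as follows. This is a toy Python model, not the actual FE implementation: the class and method names only mirror the names used in this issue (in snake_case), and the task kinds, `submitted` list, and `num_loading_tasks` parameter are assumptions made for illustration.

```python
from enum import Enum


class JobState(Enum):
    PENDING = "PENDING"
    LOADING = "LOADING"
    FINISHED = "FINISHED"


class BrokerLoadJob:
    def __init__(self, num_loading_tasks):
        self.state = JobState.PENDING
        self.num_loading_tasks = num_loading_tasks  # assumed known up front
        self.submitted = []      # stand-in for tasks handed to an executor
        self.commit_infos = []
        self.txn_committed = False

    def execute(self):
        # Step1: begin the txn (omitted here) and submit the pending task.
        self.submitted.append("pending")
        self.state = JobState.LOADING

    def on_task_finished(self, kind, attachment=None):
        # Step2.x: dispatch to the sub-function for the finished task kind.
        if kind == "pending":
            self._on_pending_task_finished(attachment)
        else:
            self._on_loading_task_finished(attachment)

    def _on_pending_task_finished(self, attachment):
        # Step2.1: create and submit the loading tasks.
        for i in range(self.num_loading_tasks):
            self.submitted.append(f"loading-{i}")

    def _on_loading_task_finished(self, commit_info):
        # Step2.2: record commit info; commit once every task has finished.
        self.commit_infos.append(commit_info)
        if len(self.commit_infos) == self.num_loading_tasks:
            self.txn_committed = True

    def after_visible(self):
        # Step3: the txn is visible, so the job is done.
        if not self.txn_committed:
            raise RuntimeError("txn must be committed before it is visible")
        self.state = JobState.FINISHED


class LoadScheduler:
    def __init__(self):
        self.jobs = []

    def run_once(self):
        # Pick PENDING jobs and start them.
        for job in self.jobs:
            if job.state is JobState.PENDING:
                job.execute()
```

In this model no periodic checker polls the job between steps: each transition after Step1 is driven by a callback, which is the role the issue assigns to TaskCallback and TxnStateChangeCallback.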