Description
Multiload is used to achieve atomic import of multi-table data. Although its interface is basically the same as stream load, the functionality is not fully supported. Importing data from traditional databases into Doris requires multi-table transaction support, so the original multiLoad implementation needs to be reworked and the actual import plan changed to use streaming.
The design keeps the original API interface: the data is still downloaded and temporarily stored on the BE, and the FE still stores the import meta information. The change is in the commit phase: instead of executing the plan directly through the streaming import ETL process, we refer to broker load and generate an execution plan similar to broker load's. Data reading is changed from broker load's HTTP reads to reading local files; the rest is basically the same as broker load.
The basic process is as follows:
- `_multi_start`: starts the import transaction and creates the txn.
- `_load`: the FE records the import meta information, generating and saving data similar to `BrokerFileGroup`; the BE downloads and temporarily stores the data.
- `_multi_commit`: the FE generates an execution plan (the generation process refers to broker load and uses streaming import) and sends it to the BE for execution, waiting for execution to complete before the API returns.
- `_multi_abort` and `_multi_desc` remain the same as before.
The parameters used in `_load` are the same as before, and the header parameters are the same as stream load.
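For illustration, a minimal client-side sketch of this flow. The endpoint paths follow the API names listed above; the FE address, credentials, label, `sub_label` values, and the `column_separator` header (borrowed from stream load) are assumptions, not part of this proposal:

```python
import requests

FE = "http://fe_host:8030"  # assumed FE address
AUTH = ("user", "passwd")   # assumed credentials
DB = "example_db"
LABEL = "multi_load_demo_1"

def multi_load(files_by_table):
    """Atomically import one file per table inside a single txn."""
    # _multi_start: open the multi-table import transaction
    requests.post(f"{FE}/api/{DB}/_multi_start",
                  params={"label": LABEL}, auth=AUTH).raise_for_status()
    try:
        # _load: upload each table's data; the FE records the meta
        # information and the BE temporarily stores the file
        for i, (table, path) in enumerate(files_by_table.items()):
            with open(path, "rb") as f:
                requests.put(
                    f"{FE}/api/{DB}/{table}/_load",
                    params={"label": LABEL, "sub_label": f"sub_{i}"},
                    headers={"column_separator": ","},  # stream-load-style header
                    data=f, auth=AUTH,
                ).raise_for_status()
        # _multi_commit: the FE builds the broker-load-style plan, sends it
        # to the BE, and waits for execution to finish; all tables then
        # become visible atomically
        requests.post(f"{FE}/api/{DB}/_multi_commit",
                      params={"label": LABEL}, auth=AUTH).raise_for_status()
    except Exception:
        # _multi_abort: roll the whole transaction back on any failure
        requests.post(f"{FE}/api/{DB}/_multi_abort",
                      params={"label": LABEL}, auth=AUTH)
        raise

multi_load({"tbl1": "tbl1.csv", "tbl2": "tbl2.csv"})
```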
Note that unlike broker load, the final plan generated by multiload can only be executed sequentially on a single node; this is mainly to ensure the order of the imported files.
Importing data from a DBMS may also require Batch Delete and sequence column support.
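A sketch of what that could look like on the `_load` call, assuming the same header names stream load uses today (`merge_type`, `delete`, `function_column.sequence_col`) would carry over; the column names are hypothetical:

```python
import requests

FE, AUTH = "http://fe_host:8030", ("user", "passwd")  # assumed, as above
DB, LABEL = "example_db", "multi_load_demo_2"

# Headers borrowed from stream load: merge_type selects APPEND / DELETE /
# MERGE behavior, delete gives the condition marking rows to delete, and
# function_column.sequence_col names the column used to order replacements.
headers = {
    "merge_type": "MERGE",
    "delete": 'op_type="delete"',                   # hypothetical marker column
    "function_column.sequence_col": "update_time",  # hypothetical column
}
with open("tbl1.csv", "rb") as f:
    requests.put(f"{FE}/api/{DB}/tbl1/_load",
                 params={"label": LABEL, "sub_label": "sub_del"},
                 headers=headers, data=f, auth=AUTH).raise_for_status()
```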