Support full StreamLoad feature in multiload #4715

@yangzhg

Description

Multiload provides atomic import of data into multiple tables. Its interface is basically the same as stream load, but it does not support the full stream load feature set. Importing data from traditional databases into Doris requires multi-table transaction support, so the original multiload implementation needs to be reworked so that the actual import plan uses streaming import.

The design keeps the original API. The data is still downloaded to and temporarily stored on the BE, and the FE still stores the import meta-information. The change is in the commit phase: instead of going through the ETL process, the plan is executed with streaming import. The execution plan is generated with reference to broker load and is similar to a broker load plan. Data reading changes from HTTP, as in broker load, to reading local files; the rest is basically the same as broker load.

The basic process is as follows:

  • _multi_start: starts the import transaction and creates the txn.
  • _load: FE records the import meta-information, then generates and saves data similar to BrokerFileGroup; BE downloads and temporarily stores the data.
  • _multi_commit: FE generates an execution plan. The generation process follows broker load but uses streaming import. The plan is sent to BE for execution, and the API returns once execution completes.
  • _multi_abort and _multi_desc: remain the same as before.
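
The call order above can be sketched from the client side. This is only an illustration: it builds the request list and sends nothing. The URL layout (`/api/{db}/_multi_start`, `/api/{db}/{table}/_load`, the `label` and `sub_label` query parameters) and the HTTP methods are assumptions based on how the existing multiload endpoints are usually called, and `multiload_requests` is a hypothetical helper, not part of Doris.

```python
from urllib.parse import urlencode


def multiload_requests(base_url, db, label, sub_loads):
    """Return the ordered (method, url) pairs for one multiload transaction.

    sub_loads is a list of (table, sub_label) pairs, one per file to import.
    The URL layout and HTTP methods are assumptions made for illustration.
    """
    def url(path, **params):
        return f"{base_url}/api/{db}/{path}?{urlencode(params)}"

    # 1. _multi_start opens the transaction identified by `label`.
    requests = [("POST", url("_multi_start", label=label))]

    # 2. One _load per file; each file body would be uploaded with this request.
    for table, sub_label in sub_loads:
        requests.append(
            ("PUT", url(f"{table}/_load", label=label, sub_label=sub_label))
        )

    # 3. _multi_commit triggers plan generation on FE and execution on BE.
    requests.append(("POST", url("_multi_commit", label=label)))
    return requests
```

For example, `multiload_requests("http://fe:8030", "db1", "txn1", [("t1", "a"), ("t2", "b")])` yields four requests: one start, two loads, and one commit, all sharing the label `txn1`.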

The parameters used in _load are the same as before, and the header parameters are the same as stream load.
Note that, unlike broker load, the final plan generated by multiload can only be executed sequentially on the same node. This is mainly to preserve the order of the imported files.
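
A small sketch of why file order can matter. It assumes, purely for illustration, a table where a later row with the same key replaces an earlier one; `apply_files_in_order` is a hypothetical stand-in and does not reflect how BE executes the plan.

```python
def apply_files_in_order(files):
    """Apply each file's rows in sequence.

    Assumption for illustration: a later row replaces an earlier row
    that has the same key.
    """
    table = {}
    for rows in files:
        for key, value in rows:
            table[key] = value
    return table


file_a = [(1, "old")]
file_b = [(1, "new")]

# The final value for key 1 depends on which file is applied last.
in_order = apply_files_in_order([file_a, file_b])
reversed_order = apply_files_in_order([file_b, file_a])
```

Under that assumption the two orders leave different values for the same key, which is the kind of divergence that sequential execution on one node avoids.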

Importing data from a DBMS may also need Batch Delete and sequence column support.
