

@dujl dujl commented May 24, 2022

Proposed changes

Issue Number: #9557

Support querying Hudi external tables in Doris.
This PR supports querying both copy-on-write (COW) and merge-on-read (MOR) Hudi tables.
For MOR tables, only the read-optimized query mode is supported.
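
In Hudi, a read-optimized query over a MOR table exposes only the base (columnar) files of the latest file slices and ignores the log files. The sketch below illustrates that selection on plain file names. It is not Doris or Hudi code (in this PR the split listing comes from HoodieParquetInputFormat); the class name is hypothetical, and it assumes base file names follow the `<fileId>_<writeToken>_<instantTime>.parquet` pattern visible in the `_hoodie_file_name` column of the test output, with equal-width instant times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch, not Doris or Hudi code: keeps only the newest base file of each
// file group, i.e. the files a read-optimized query over a MOR table would expose.
// Assumes "<fileId>_<writeToken>_<instantTime>.parquet" base file names.
class ReadOptimizedSelector {

    private static final String SUFFIX = ".parquet";

    /** Returns the newest base file of each file group, ordered by file id. */
    static List<String> latestBaseFiles(List<String> baseFileNames) {
        Map<String, String> latestByFileId = new TreeMap<>();
        Map<String, String> instantByFileId = new TreeMap<>();
        for (String name : baseFileNames) {
            if (!name.endsWith(SUFFIX)) {
                throw new IllegalArgumentException("not a parquet base file: " + name);
            }
            String[] parts = name.substring(0, name.length() - SUFFIX.length()).split("_");
            if (parts.length != 3) {
                throw new IllegalArgumentException("unexpected base file name: " + name);
            }
            String fileId = parts[0];
            String instant = parts[2];
            String seen = instantByFileId.get(fileId);
            // Assumes equal-width instant strings, so lexicographic order is commit order.
            if (seen == null || seen.compareTo(instant) < 0) {
                instantByFileId.put(fileId, instant);
                latestByFileId.put(fileId, name);
            }
        }
        return new ArrayList<>(latestByFileId.values());
    }

    public static void main(String[] args) {
        List<String> files = List.of(
                "e9f6a051-2cfd-4cdd-94de-9d2d917dffb8-0_0-29-2007_20220522100249363.parquet",
                "e9f6a051-2cfd-4cdd-94de-9d2d917dffb8-0_0-173-8070_20220522100339091.parquet",
                "018a955f-f1aa-4404-b997-e3abe72623fb-0_0-29-2009_20220522101838691.parquet");
        // One file per file group: the 018a955f... file, then the newer e9f6a051... file.
        latestBaseFiles(files).forEach(System.out::println);
    }
}
```

This is a simplification: a real file slice may have log files but no base file yet, which this name-only view cannot represent.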

This is the second PR in the series adding Hudi external table support.

Problem Summary:


The purpose of this PR is to support querying COW and MOR Hudi tables.

Design

  1. Generate scan ranges in the FE.
    A new HudiScanNode generates the scan range parameters.
    HudiScanNode uses HoodieParquetInputFormat to get all input splits and assembles them into brokerRangeDesc entries.
  2. Scan Hudi data in the BE.
    The BE uses broker_scan_node to scan the Parquet files sent by the FE.
  3. Test cases
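
Step 1 turns input splits into range descriptors that the BE's broker_scan_node can read. The sketch below shows that shape only. `SplitInfo` and `RangeDesc` are hypothetical stand-ins (not the Hadoop split class or the Doris brokerRangeDesc struct), and spreading ranges round-robin across backends is an arbitrary choice made for illustration; the real HudiScanNode may assign ranges differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the FE-side assembly step. SplitInfo mirrors what a file
// split carries (path, start offset, length); RangeDesc is an illustrative stand-in
// for a broker range descriptor. Neither is a real Doris or Hadoop class.
class ScanRangeAssembler {

    record SplitInfo(String path, long start, long length, long fileSize) {}

    record RangeDesc(String path, long startOffset, long size, long fileSize, String format) {}

    /** Groups splits into numBackends scan ranges, assigning them round-robin. */
    static List<List<RangeDesc>> assemble(List<SplitInfo> splits, int numBackends) {
        if (numBackends <= 0) {
            throw new IllegalArgumentException("numBackends must be positive");
        }
        List<List<RangeDesc>> ranges = new ArrayList<>();
        for (int i = 0; i < numBackends; i++) {
            ranges.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            SplitInfo s = splits.get(i);
            // Each split becomes one Parquet range: same path, offset and length.
            ranges.get(i % numBackends).add(
                    new RangeDesc(s.path(), s.start(), s.length(), s.fileSize(), "parquet"));
        }
        return ranges;
    }
}
```

A size-aware assignment (for example, always giving the next split to the backend with the fewest bytes) would balance uneven files better; round-robin just keeps the sketch short.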

query a COW table

mysql> select * from t_hudi_cow;
+---------------------+-----------------------+--------------------+------------------------+--------------------------------------------------------------------------+------+------+-------+
| _hoodie_commit_time | _hoodie_commit_seqno  | _hoodie_record_key | _hoodie_partition_path | _hoodie_file_name                                                        | uuid | name | price |
+---------------------+-----------------------+--------------------+------------------------+--------------------------------------------------------------------------+------+------+-------+
| 20220520205304451   | 20220520205304451_0_1 | uuid:1             |                        | 6683497c-5a6c-4ccb-81f9-90673da5ab6a-0_0-42-42_20220520205304451.parquet |    1 | a1   |    20 |
+---------------------+-----------------------+--------------------+------------------------+--------------------------------------------------------------------------+------+------+-------+
1 row in set (0.45 sec)

query a MOR table

mysql> select * from t_hudi_mor;
+---------------------+-----------------------+--------------------+------------------------+-----------------------------------------------------------------------------+------+---------+-------+------+
| _hoodie_commit_time | _hoodie_commit_seqno  | _hoodie_record_key | _hoodie_partition_path | _hoodie_file_name                                                           | id   | name    | price | ts   |
+---------------------+-----------------------+--------------------+------------------------+-----------------------------------------------------------------------------+------+---------+-------+------+
| 20220520205326437   | 20220520205326437_0_2 | id:1               |                        | bbee0b74-9a04-45ae-b95d-12b448d82813-0_0-98-2079_20220520205326437.parquet  |    1 | a1      |    20 | 1000 |
| 20220522100249363   | 20220522100249363_0_1 | id:2               |                        | e9f6a051-2cfd-4cdd-94de-9d2d917dffb8-0_0-29-2007_20220522100249363.parquet  |    2 | b1      |    20 | 1000 |
| 20220522100310109   | 20220522100310109_0_2 | id:3               |                        | e9f6a051-2cfd-4cdd-94de-9d2d917dffb8-0_0-77-4028_20220522100310109.parquet  |    3 | b1      |    20 | 1000 |
| 20220522100324753   | 20220522100324753_0_3 | id:4               |                        | e9f6a051-2cfd-4cdd-94de-9d2d917dffb8-0_0-125-6049_20220522100324753.parquet |    4 | b1      |    20 | 1000 |
| 20220522100339091   | 20220522100339091_0_4 | id:5               |                        | e9f6a051-2cfd-4cdd-94de-9d2d917dffb8-0_0-173-8070_20220522100339091.parquet |    5 | b1      |    20 | 1000 |
| 20220522101838691   | 20220522101838691_0_1 | id:7               |                        | 018a955f-f1aa-4404-b997-e3abe72623fb-0_0-29-2009_20220522101838691.parquet  |    7 | insert1 |    20 | 1000 |
| 20220522101838691   | 20220522101838691_0_2 | id:6               |                        | 018a955f-f1aa-4404-b997-e3abe72623fb-0_0-29-2009_20220522101838691.parquet  |    6 | insert2 |    20 | 1000 |
+---------------------+-----------------------+--------------------+------------------------+-----------------------------------------------------------------------------+------+---------+-------+------+
7 rows in set (0.60 sec)

query a partitioned COW table

mysql> select * from t_hudi_cow_partition;
mysql> select * from t_hudi_cow_partition where id >3;
+---------------------+-----------------------+--------------------+------------------------+-----------------------------------------------------------------------------+------+--------+-------+------+
| _hoodie_commit_time | _hoodie_commit_seqno  | _hoodie_record_key | _hoodie_partition_path | _hoodie_file_name                                                           | id   | name   | price | ts   |
+---------------------+-----------------------+--------------------+------------------------+-----------------------------------------------------------------------------+------+--------+-------+------+
| 20220522192429630   | 20220522192429630_0_3 | id:31              | dt=2022-5-21           | 27e3edf3-900f-45e9-adbc-b09d6b065aaf-0_0-137-6071_20220522192429630.parquet |   31 | name31 |   300 | NULL |
| 20220522192557945   | 20220522192557945_0_4 | id:41              | dt=2022-5-22           | 60c2d68c-3318-4896-9af1-54f63324bd48-0_0-193-8109_20220522192557945.parquet |   41 | name41 |   400 | NULL |
| 20220522192557945   | 20220522192557945_1_5 | id:51              | dt=2022-5-23           | 913fe002-b5ba-4106-8894-34a25fe2dbb3-0_1-199-8110_20220522192557945.parquet |   51 | name51 |   500 | NULL |
+---------------------+-----------------------+--------------------+------------------------+-----------------------------------------------------------------------------+------+--------+-------+------+
3 rows in set (0.61 sec)
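
The `_hoodie_partition_path` values above (for example `dt=2022-5-21`) use Hive-style `key=value` path segments. As a hedged illustration only, the sketch below turns such a path into partition column values; this PR does not say whether or how Doris derives partition values from the path, and the class name is hypothetical. It does not handle escaped characters in values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: parses a Hive-style partition path such as "dt=2022-5-21"
// or "year=2022/month=5" into an ordered column -> value map. Escaped characters
// in partition values are not handled.
class PartitionPathParser {

    static Map<String, String> parse(String partitionPath) {
        Map<String, String> values = new LinkedHashMap<>();
        // A non-partitioned table has an empty partition path.
        if (partitionPath == null || partitionPath.isEmpty()) {
            return values;
        }
        for (String segment : partitionPath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("not a key=value segment: " + segment);
            }
            values.put(segment.substring(0, eq), segment.substring(eq + 1));
        }
        return values;
    }
}
```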

query a hudi table whose external table definition declares only some of the columns

mysql> select * from t_hudi_mor_with_part_schema;
+------+---------+-------+------+
| id   | name    | price | ts   |
+------+---------+-------+------+
|    1 | a1      |    20 | 1000 |
|    2 | b1      |    20 | 1000 |
|    3 | b1      |    20 | 1000 |
|    4 | b1      |    20 | 1000 |
|    5 | b1      |    20 | 1000 |
|    7 | insert1 |    20 | 1000 |
|    6 | insert2 |    20 | 1000 |
+------+---------+-------+------+
7 rows in set (0.42 sec)
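
Here the external table declares only `id`, `name`, `price` and `ts`, while the data files also carry the `_hoodie_*` metadata columns. A minimal sketch of resolving declared columns against a file schema by name follows; the class is hypothetical, and both the case-insensitive match and failing on a missing column are assumptions, not behavior described in this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: for each requested column, find its position in the file
// schema by case-insensitive name match. Extra file columns (e.g. _hoodie_*) are
// simply never selected; a requested column absent from the file is an error.
class ColumnResolver {

    static List<Integer> resolve(List<String> fileColumns, List<String> requested) {
        List<Integer> indices = new ArrayList<>();
        for (String name : requested) {
            int idx = -1;
            for (int i = 0; i < fileColumns.size(); i++) {
                if (fileColumns.get(i).equalsIgnoreCase(name)) {
                    idx = i;
                    break;
                }
            }
            if (idx < 0) {
                throw new IllegalArgumentException("column not found in file: " + name);
            }
            indices.add(idx);
        }
        return indices;
    }
}
```

With the file columns of `t_hudi_mor` (five `_hoodie_*` columns followed by `id`, `name`, `price`, `ts`), requesting the four declared columns would resolve to positions 5 through 8.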

query two columns from a hudi table

mysql> select id, name  from t_hudi_cow_partition;
+------+--------+
| id   | name   |
+------+--------+
|    2 | name2  |
|    3 | name3  |
|   31 | name31 |
|   51 | name51 |
|    1 | a1     |
|   41 | name41 |
+------+--------+
6 rows in set (0.85 sec)

Checklist (Required)

  1. Does it affect the original behavior: (Yes/No/I Don't know) No
  2. Have unit tests been added: (Yes/No/No Need) Yes
  3. Has documentation been added or modified: (Yes/No/No Need) No Need
  4. Does it need to update dependencies: (Yes/No) Yes
  5. Are there any changes that cannot be rolled back: (Yes/No) No


@github-actions github-actions bot added area/load Issues or PRs related to all kinds of load area/planner Issues or PRs related to the query planner labels May 24, 2022
@dujl dujl changed the title support query hudi external table [feature-wip](hudi) Step2: Support query hudi external table(include cow and mor table) #9559 May 24, 2022
@morningman morningman added the dev/backlog waiting to be merged in future dev branch label May 24, 2022
@dujl dujl changed the title [feature-wip](hudi) Step2: Support query hudi external table(include cow and mor table) #9559 [feature](hudi) Step2: Support query hudi external table(include cow and mor table) #9559 May 25, 2022
@morningman morningman added this to the v1.2 milestone May 26, 2022
morningman previously approved these changes May 26, 2022

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 26, 2022
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label May 27, 2022

dujl commented May 27, 2022

@morningman @Jibing-Li please help review

@dujl dujl requested a review from morningman May 28, 2022 12:43
@dujl dujl changed the title [feature](hudi) Step2: Support query hudi external table(include cow and mor table) #9559 [feature](hudi) Step2: Support query hudi external table(include cow and mor table) May 29, 2022

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 29, 2022
@github-actions

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 8092439 into apache:master May 30, 2022

Labels

approved Indicates a PR has been approved by one committer. area/load Issues or PRs related to all kinds of load area/planner Issues or PRs related to the query planner dev/backlog waiting to be merged in future dev branch reviewed
