[feature](hive)append support for struct and map column type on textfile format of hive table #22347

hubgeter · 2023-07-28T09:09:00Z

Proposed changes

Issue Number: close #xxx

Follow-up to PR #21514.

1. Add support for struct and map column types on Hive tables stored in the textfile format.
2. Optimize the code that handles the array column type.

+------+------------------------------------+
| id   | perf                               |
+------+------------------------------------+
| 1    | {"key1":"value1", "key2":"value2"} |
| 1    | {"key1":"value1", "key2":"value2"} |
| 2    | {"name":"John", "age":"30"}        |
+------+------------------------------------+

+---------+------------------+
| column1 | column2          |
+---------+------------------+
|       1 | {10, "data1", 1} |
|       2 | {20, "data2", 0} |
|       3 | {30, "data3", 1} |
+---------+------------------+

Summary of supported complex types (custom delimiters can be assigned):

1. array< primitive_type > and array< array< ... > >
2. map< primitive_type , primitive_type >
3. struct< primitive_type , primitive_type ... >
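To make the delimiter handling concrete, here is a minimal sketch of how one map field and one struct field of a textfile row could be split. It is not the code from this PR: `split`, `parse_map_field`, and `parse_struct_field` are hypothetical names, values are kept as strings, and escaping and NULL handling are left out. Hive's defaults are `\002` between collection items and `\003` between a map key and its value; the sketch takes both as parameters because the table can assign other delimiters.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Split `text` on a single-character delimiter. An empty input yields no parts.
std::vector<std::string_view> split(std::string_view text, char delim) {
    std::vector<std::string_view> parts;
    if (text.empty()) {
        return parts;
    }
    std::size_t start = 0;
    while (true) {
        std::size_t pos = text.find(delim, start);
        if (pos == std::string_view::npos) {
            parts.push_back(text.substr(start));
            break;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Hypothetical helper: parse one map field. Entries are separated by
// `collection_delim`; key and value are separated by `map_key_delim`.
// Simplification: an entry without a key delimiter gets an empty value.
std::map<std::string, std::string> parse_map_field(std::string_view field,
                                                   char collection_delim,
                                                   char map_key_delim) {
    std::map<std::string, std::string> result;
    for (std::string_view entry : split(field, collection_delim)) {
        std::size_t pos = entry.find(map_key_delim);
        if (pos == std::string_view::npos) {
            result.emplace(std::string(entry), std::string());
        } else {
            result.emplace(std::string(entry.substr(0, pos)),
                           std::string(entry.substr(pos + 1)));
        }
    }
    return result;
}

// Hypothetical helper: parse one struct field. Members are positional and
// separated by `collection_delim`.
std::vector<std::string> parse_struct_field(std::string_view field,
                                            char collection_delim) {
    std::vector<std::string> members;
    for (std::string_view part : split(field, collection_delim)) {
        members.emplace_back(part);
    }
    return members;
}
```

With `,` and `:` assigned as delimiters, `parse_map_field("key1:value1,key2:value2", ',', ':')` yields two entries and `parse_struct_field("10,data1,1", ',')` yields three members, which a real reader would then convert to the declared member types.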


github-actions · 2023-07-28T09:15:52Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-07-29T16:16:45Z

clang-tidy review says "All clean, LGTM! 👍"

morningman · 2023-07-30T14:19:45Z

run buildall

hello-stephen · 2023-07-30T15:23:45Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 48.07 seconds
stream load tsv: 507 seconds loaded 74807831229 Bytes, about 140 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17163567337 Bytes

morningman · 2023-08-03T08:42:04Z

fe/fe-core/src/main/java/org/apache/doris/planner/external/HiveScanNode.java

+                }
+
+                //support map<primitive_type,primitive_type>
+                if (slot.getType().isMapType()) {


Suggested change

-                if (slot.getType().isMapType()) {
+                else if (slot.getType().isMapType()) {

morningman · 2023-08-03T08:42:16Z

fe/fe-core/src/main/java/org/apache/doris/planner/external/HiveScanNode.java

+                }
+
+                //support Struct< primitive_type,primitive_type ... >
+                if (slot.getType().isStructType()) {


Suggested change

-                if (slot.getType().isStructType()) {
+                else if (slot.getType().isStructType()) {

morningman · 2023-08-03T08:44:12Z

be/src/vec/exec/format/csv/csv_reader.h

    std::string _value_separator;
    std::string _line_delimiter;
-    std::string _array_delimiter;
+    //    std::string _array_delimiter;


morningman · 2023-08-03T08:46:25Z

be/src/exec/text_converter.cpp

-bool TextConverter::write_vec_column(const SlotDescriptor* slot_desc,
-                                     vectorized::IColumn* nullable_col_ptr, const char* data,
-                                     size_t len, bool copy_string, bool need_escape, size_t rows) {
+bool TextConverter::write_date(const TypeDescriptor& type_desc,


Suggested change

-bool TextConverter::write_date(const TypeDescriptor& type_desc,
+bool TextConverter::write_data(const TypeDescriptor& type_desc,

morningman · 2023-08-03T08:47:53Z

be/src/exec/text_converter.cpp

    // \N means it's NULL
-    if (slot_desc->is_nullable()) {
+    std::string col_type_name = col_ptr->get_name();
+    if (col_type_name.substr(0, 8) == "Nullable") {


Why change to this?

A sub-SlotDescriptor cannot be obtained from a SlotDescriptor, so the code calls get_name() on the vectorized::IColumn* to determine whether the column type is nullable.
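A minimal sketch of the name-prefix check described in this reply, using hypothetical stand-in classes rather than the Doris column types. It assumes, as the diff implies, that a nullable wrapper reports a name starting with `Nullable`; `Column`, `Int32Column`, `NullableColumn`, and `is_nullable_by_name` are illustrative names only.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a column interface that exposes a type name.
struct Column {
    virtual ~Column() = default;
    virtual std::string get_name() const = 0;
};

// Hypothetical non-nullable column.
struct Int32Column : Column {
    std::string get_name() const override { return "Int32"; }
};

// Hypothetical nullable wrapper: its name wraps the nested column's name.
struct NullableColumn : Column {
    explicit NullableColumn(std::unique_ptr<Column> nested)
            : nested_(std::move(nested)) {}
    std::string get_name() const override {
        return "Nullable(" + nested_->get_name() + ")";
    }
    std::unique_ptr<Column> nested_;
};

// Decide nullability from the name prefix alone, so it also works for nested
// columns that have no slot descriptor of their own.
bool is_nullable_by_name(const Column& col) {
    static const std::string prefix = "Nullable";
    return col.get_name().compare(0, prefix.size(), prefix) == 0;
}
```

The trade-off is that the check depends on a naming convention rather than on type information, so it only holds while every nullable column keeps that prefix.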

github-actions · 2023-08-03T09:53:19Z

clang-tidy review says "All clean, LGTM! 👍"

hubgeter · 2023-08-03T09:59:01Z

run buildall

hubgeter · 2023-08-03T10:17:03Z

run buildall

github-actions · 2023-08-03T10:27:35Z

clang-tidy review says "All clean, LGTM! 👍"

hello-stephen · 2023-08-03T11:13:55Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.35 seconds
stream load tsv: 518 seconds loaded 74807831229 Bytes, about 137 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 30 seconds loaded 861443392 Bytes, about 27 MB/s
insert into select: 29.9 seconds inserted 10000000 Rows, about 334K ops/s
storage size: 17162494781 Bytes

hubgeter · 2023-08-04T02:57:29Z

run buildall

github-actions · 2023-08-04T02:57:56Z

clang-tidy review says "All clean, LGTM! 👍"

github-actions · 2023-08-04T03:00:13Z

clang-tidy review says "All clean, LGTM! 👍"

hello-stephen · 2023-08-04T03:45:49Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.54 seconds
stream load tsv: 514 seconds loaded 74807831229 Bytes, about 138 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.6 seconds inserted 10000000 Rows, about 337K ops/s
storage size: 17161995543 Bytes

zhangguoqiang666 · 2023-08-07T12:04:29Z

run external

github-actions · 2023-08-08T05:36:17Z

clang-tidy review says "All clean, LGTM! 👍"

kaka11chen

LGTM

github-actions · 2023-08-08T11:12:45Z

PR approved by anyone and no changes requested.

morningman · 2023-08-08T12:34:41Z

run buildall

hello-stephen · 2023-08-08T13:33:10Z

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.84 seconds
stream load tsv: 513 seconds loaded 74807831229 Bytes, about 139 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17162055537 Bytes

morningman

LGTM

github-actions · 2023-08-09T15:49:36Z

PR approved by at least one committer and no changes requested.

hubgeter changed the title ~~Hive map struct~~ [feature](hive)append support for struct and map column type on textfile format of hive table Jul 28, 2023

morningman added the dev/2.0.1 label Jul 30, 2023

morningman reviewed Aug 3, 2023

hubgeter force-pushed the hive_map_struct branch from 18175d6 to 19af504 Compare August 3, 2023 09:46

hubgeter force-pushed the hive_map_struct branch from b3c18cd to 6ce9c9d Compare August 4, 2023 02:49

hubgeter added 7 commits August 8, 2023 13:28

All seven commits carry the same truncated title:

[feature](hive)append support for struct and map column type on textf… …ile format of hive table

9ee774e
cbb4be3
2290e86
d47e708
0cd70fb
fa41081
a9138aa

hubgeter force-pushed the hive_map_struct branch from 2191046 to a9138aa Compare August 8, 2023 05:28

kaka11chen approved these changes Aug 8, 2023

github-actions bot added the reviewed label Aug 8, 2023

morningman approved these changes Aug 9, 2023

github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 9, 2023

morningman merged commit f1db6bd into apache:master Aug 10, 2023

xiaokang added dev/2.0.1-merged and removed dev/2.0.1 labels Aug 11, 2023

xiaokang mentioned this pull request Aug 30, 2023

Release Note 2.0.1 #23640

Closed
