@suxiaogang223 (Contributor) commented Mar 19, 2024

Proposed changes

Followup: #31128
This change lets Doris correctly read struct-typed columns of Hive tables after the struct schema has been changed in Hive.

Changing the struct schema in Hive:

hive> create table struct_test(id int,sf struct<f1: int, f2: string>) stored as parquet;

hive> insert into struct_test values
    >           (1, named_struct('f1', 1, 'f2', 's1')),
    >           (2, named_struct('f1', 2, 'f2', 's2')),
    >           (3, named_struct('f1', 3, 'f2', 's3'));

hive> alter table struct_test change sf sf struct<f1:int, f3:string>;

hive> select * from struct_test;
OK
1	{"f1":1,"f3":null}
2	{"f1":2,"f3":null}
3	{"f1":3,"f3":null}
Time taken: 5.298 seconds, Fetched: 3 row(s)

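Previously, Doris returned the values of the old field f2 under f3, i.e. the struct fields were effectively matched by position rather than by name: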

mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": "s1"} |
|    2 | {"f1": 2, "f3": "s2"} |
|    3 | {"f1": 3, "f3": "s3"} |
+------+-----------------------+

With this change, the result is the same as Hive's: f3 does not exist in the existing data files, so it is read as NULL:

mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": null} |
|    2 | {"f1": 2, "f3": null} |
|    3 | {"f1": 3, "f3": null} |
+------+-----------------------+
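The two results above differ only in how the struct fields stored in the old Parquet files are matched to the fields of the altered table schema. The following minimal Python sketch models that difference under stated assumptions: it is not Doris's reader (which is C++), rows are modelled as plain dicts keyed by the field names written in the file, and the names FILE_FIELDS, TABLE_FIELDS, read_by_position and read_by_name are hypothetical.

```python
from typing import Any, Optional

# Hypothetical model of the example: files written before the ALTER store the
# struct fields as (f1, f2); the table schema now declares them as (f1, f3).
FILE_FIELDS = ["f1", "f2"]
TABLE_FIELDS = ["f1", "f3"]
ROWS = [
    {"f1": 1, "f2": "s1"},
    {"f1": 2, "f2": "s2"},
    {"f1": 3, "f2": "s3"},
]


def read_by_position(row: dict[str, Any]) -> dict[str, Any]:
    """Old result: the i-th table field takes the value of the i-th file field."""
    # zip() stops at the shorter list, so extra fields on either side are dropped.
    return {t: row[f] for t, f in zip(TABLE_FIELDS, FILE_FIELDS)}


def read_by_name(row: dict[str, Any]) -> dict[str, Optional[Any]]:
    """New result: match fields by name; a field absent from the file is NULL."""
    return {t: (row[t] if t in FILE_FIELDS else None) for t in TABLE_FIELDS}


if __name__ == "__main__":
    for r in ROWS:
        print(read_by_position(r), read_by_name(r))
    # {'f1': 1, 'f3': 's1'} {'f1': 1, 'f3': None}
    # {'f1': 2, 'f3': 's2'} {'f1': 2, 'f3': None}
    # {'f1': 3, 'f3': 's3'} {'f1': 3, 'f3': None}
```

In this sketch, matching by name is what makes a renamed or newly added field read as NULL, as Hive does, while a field that keeps its name keeps its data regardless of its position.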

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@suxiaogang223 marked this pull request as draft March 19, 2024 02:58
@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

@suxiaogang223 force-pushed the hive_complex_type_change branch from 1e51e30 to a5cdbc5 March 20, 2024 03:08
@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

@suxiaogang223 force-pushed the hive_complex_type_change branch from a5cdbc5 to ad3d61b March 20, 2024 07:31
@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

@suxiaogang223 marked this pull request as ready for review March 20, 2024 09:14
@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

@AshinGau (Member)

LGTM

@github-actions bot added the approved label ("Indicates a PR has been approved by one committer.") Mar 21, 2024
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@github-actions (Contributor)

PR approved by anyone and no changes requested.

@zy-kkk (Member) commented Mar 21, 2024

run buildall

@suxiaogang223 force-pushed the hive_complex_type_change branch from 239de18 to 0f8ee1b March 21, 2024 03:08
@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.26% (8728/24751)
Line Coverage: 27.07% (71437/263874)
Region Coverage: 26.31% (37068/140880)
Branch Coverage: 23.23% (18958/81624)
Coverage Report: http://coverage.selectdb-in.cc/coverage/239de186759e80a4fab6e31c1d5172dba18ad47a_239de186759e80a4fab6e31c1d5172dba18ad47a/report/index.html

@zy-kkk (Member) commented Mar 21, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.26% (8728/24751)
Line Coverage: 27.07% (71425/263874)
Region Coverage: 26.31% (37068/140880)
Branch Coverage: 23.22% (18954/81624)
Coverage Report: http://coverage.selectdb-in.cc/coverage/0f8ee1b761998602b2326ebab4a9ba47725f3cc1_0f8ee1b761998602b2326ebab4a9ba47725f3cc1/report/index.html

impl insert null when struct's field is missing

check field of struct must be nullable

fix get_rep_level and get_rep_level of StructColumnReader
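The first two commit titles above describe a rule that can be stated compactly: a table field with no counterpart in the file is filled with NULL, which is only valid when that field is nullable. Below is a minimal Python sketch of that rule under stated assumptions. FieldType, fill_missing_fields, and raising ValueError for a non-nullable missing field are hypothetical illustration choices, not the behaviour of Doris's StructColumnReader, and the repetition/definition level handling from the last commit is not modelled.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldType:
    """Hypothetical stand-in for one struct field in the table schema."""
    name: str
    nullable: bool


def fill_missing_fields(
    table_fields: list[FieldType],
    file_field_names: set[str],
    row: dict[str, Any],
) -> dict[str, Optional[Any]]:
    """Build one struct value in table-schema order, matching fields by name.

    Fields present in the file are copied; fields missing from the file are
    filled with None (NULL), which is only allowed for nullable fields.
    """
    out: dict[str, Optional[Any]] = {}
    for field in table_fields:
        if field.name in file_field_names:
            out[field.name] = row[field.name]
        elif field.nullable:
            out[field.name] = None  # missing from the file: read as NULL
        else:
            # Assumption: the sketch rejects this case instead of inventing a value.
            raise ValueError(
                f"struct field {field.name!r} is missing from the file and is not nullable"
            )
    return out


if __name__ == "__main__":
    sf_fields = [FieldType("f1", nullable=True), FieldType("f3", nullable=True)]
    print(fill_missing_fields(sf_fields, {"f1", "f2"}, {"f1": 1, "f2": "s1"}))
    # {'f1': 1, 'f3': None}
```

Requiring nullability up front keeps the NULL fill well-defined: for a non-nullable field there is no value the reader could legitimately produce.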
@suxiaogang223 force-pushed the hive_complex_type_change branch from 0f8ee1b to 04e3a36 March 21, 2024 06:07
@suxiaogang223 (Contributor, Author)

run buildall

@github-actions (Contributor)

clang-tidy review says "All clean, LGTM! 👍"

@suxiaogang223 (Contributor, Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.27% (8730/24750)
Line Coverage: 27.09% (71493/263910)
Region Coverage: 26.32% (37079/140885)
Branch Coverage: 23.23% (18966/81630)
Coverage Report: http://coverage.selectdb-in.cc/coverage/04e3a3699c5468a665e1f3ead99f4659d94ab9c7_04e3a3699c5468a665e1f3ead99f4659d94ab9c7/report/index.html

@morningman (Contributor) left a comment

LGTM

@morningman merged commit d584532 into apache:master Mar 22, 2024
@suxiaogang223 deleted the hive_complex_type_change branch March 22, 2024 03:24
yiguolei pushed a commit that referenced this pull request Mar 22, 2024

Labels

approved ("Indicates a PR has been approved by one committer."), reviewed

5 participants