[opt](parquet) Support hive struct schema change #32438
impl insert null when struct's field is missing
check field of struct must be nullable
fix get_rep_level and get_rep_level of StructColumnReader
morningman
left a comment
LGTM
Followup: #31128

This optimization allows doris to correctly read struct type data after changing the schema from hive.

## Changing struct schema in hive:

```sql
hive> create table struct_test(id int,sf struct<f1: int, f2: string>) stored as parquet;

hive> insert into struct_test values
    > (1, named_struct('f1', 1, 'f2', 's1')),
    > (2, named_struct('f1', 2, 'f2', 's2')),
    > (3, named_struct('f1', 3, 'f2', 's3'));

hive> alter table struct_test change sf sf struct<f1:int, f3:string>;

hive> select * from struct_test;
OK
1	{"f1":1,"f3":null}
2	{"f1":2,"f3":null}
3	{"f1":3,"f3":null}
Time taken: 5.298 seconds, Fetched: 3 row(s)
```

The previous result of doris was:

```sql
mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": "s1"} |
|    2 | {"f1": 2, "f3": "s2"} |
|    3 | {"f1": 3, "f3": "s3"} |
+------+-----------------------+
```

Now the result is same as hive:

```sql
mysql> select * from struct_test;
+------+-----------------------+
| id   | sf                    |
+------+-----------------------+
|    1 | {"f1": 1, "f3": null} |
|    2 | {"f1": 2, "f3": null} |
|    3 | {"f1": 3, "f3": null} |
+------+-----------------------+
```
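The difference between the two results can be illustrated with a small Python sketch. This is not the Doris implementation (which lives in the C++ Parquet reader); the function names `project_by_position` and `project_by_name` are hypothetical, and the assumption that the old behavior amounts to positional matching is inferred only from the output shown in this description. The sketch shows why resolving struct fields by name, and filling a field that is missing from the file with null, reproduces the hive result.

```python
from typing import Any, Dict, List, Optional


def project_by_position(file_values: List[Any],
                        table_fields: List[str]) -> Dict[str, Any]:
    """Pair table field names with file values by ordinal position.

    Assumed model of the old behavior: the renamed field f3 silently
    picks up the data that was written for f2.
    """
    return dict(zip(table_fields, file_values))


def project_by_name(file_fields: List[str],
                    file_values: List[Any],
                    table_fields: List[str]) -> Dict[str, Optional[Any]]:
    """Resolve each table field by name in the file schema.

    A field that does not exist in the file is filled with None (null),
    which is why such fields must be nullable.
    """
    by_name = dict(zip(file_fields, file_values))
    return {name: by_name.get(name) for name in table_fields}


# Schema written to the Parquet file before the ALTER TABLE.
file_fields = ["f1", "f2"]
# Schema declared on the table after the ALTER TABLE.
table_fields = ["f1", "f3"]
# First row of the example table.
row = [1, "s1"]

old_result = project_by_position(row, table_fields)
new_result = project_by_name(file_fields, row, table_fields)
print(old_result)
print(new_result)
```

Under these assumptions, the positional projection yields `"s1"` for `f3`, while the name-based projection yields null for `f3`, matching the hive output above.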