@hubgeter hubgeter commented Aug 16, 2023

Proposed changes

one_array_json.json: this file has only 10 lines.
Before:

select * from HDFS(
    "uri" = "hdfs://127.0.0.1:8120/user/doris/preinstalled_data/json_format_test/one_array_json.json",
    "fs.defaultFS" = "hdfs://127.0.0.1:8120",
    "hadoop.username" = "doris",
    "format" = "json",
    "strip_outer_array" = "true",
    "read_json_by_line" = "false"
) limit 13;
+------+-----------+---------+
| id   | city      | code    |
+------+-----------+---------+
| 1    | beijing   | 1454547 |
| 2    | shanghai  | 1244264 |
| 3    | guangzhou | 528369  |
| 4    | shenzhen  | 594201  |
| 5    | hangzhou  | 594201  |
| 6    | nanjing   | 2345672 |
| 7    | wuhan     | 2345673 |
| 8    | chengdu   | 2345674 |
| 9    | xian      | 2345675 |
| 10   | hefei     | 2345676 |
| 1    | beijing   | 1454547 |
| 2    | shanghai  | 1244264 |
| 3    | guangzhou | 528369  |
+------+-----------+---------+
13 rows in set (0.13 sec)

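When you run select * from ... on this file, reading the JSON file does not stop at the end of the data: the query above returns 13 rows from a 10-line file, and rows 1 to 3 appear twice.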
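To illustrate this class of bug, here is a minimal Python sketch. It is not the Doris implementation and does not reflect the actual change in this PR; `ArrayReader`, `stop_at_eof`, and `select` are hypothetical names invented for the example. It assumes a reader that parses the whole file as one outer array and, without an end-of-file check, parses the same file again once its rows run out.

```python
import json

# Stand-in for a file holding one outer JSON array with 10 records.
DOC = json.dumps([{"id": i} for i in range(1, 11)])


class ArrayReader:
    """Hypothetical whole-file JSON array reader (illustration only)."""

    def __init__(self, text, stop_at_eof):
        self.text = text
        self.stop_at_eof = stop_at_eof  # False models the behaviour before the fix
        self.rows = []
        self.pos = 0
        self.consumed = False  # True once the file has been parsed once

    def next_row(self):
        if self.pos == len(self.rows):
            # All buffered rows were handed out (or nothing was parsed yet).
            if self.consumed and self.stop_at_eof:
                return None  # signal end of file to the caller
            # Without the check above, the same file is parsed again.
            self.rows = json.loads(self.text)
            self.pos = 0
            self.consumed = True
            if not self.rows:
                return None  # empty array: nothing to return
        row = self.rows[self.pos]
        self.pos += 1
        return row


def select(reader, limit):
    """Collect rows until the limit is reached or the reader reports EOF."""
    out = []
    while len(out) < limit:
        row = reader.next_row()
        if row is None:
            break
        out.append(row)
    return out


buggy = select(ArrayReader(DOC, stop_at_eof=False), limit=13)
fixed = select(ArrayReader(DOC, stop_at_eof=True), limit=21)
print(len(buggy), len(fixed))
```

With `stop_at_eof=False` the sketch keeps producing rows until the limit is hit, which mirrors the "before" output; with `stop_at_eof=True` it stops after the 10 rows regardless of the limit, as in the "after" output.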

After:

select * from HDFS(
    "uri" = "hdfs://127.0.0.1:8120/user/doris/preinstalled_data/json_format_test/one_array_json.json",
    "fs.defaultFS" = "hdfs://127.0.0.1:8120",
    "hadoop.username" = "doris",
    "format" = "json",
    "strip_outer_array" = "true",
    "read_json_by_line" = "false"
) limit 21;
+------+-----------+---------+
| id   | city      | code    |
+------+-----------+---------+
| 1    | beijing   | 1454547 |
| 2    | shanghai  | 1244264 |
| 3    | guangzhou | 528369  |
| 4    | shenzhen  | 594201  |
| 5    | hangzhou  | 594201  |
| 6    | nanjing   | 2345672 |
| 7    | wuhan     | 2345673 |
| 8    | chengdu   | 2345674 |
| 9    | xian      | 2345675 |
| 10   | hefei     | 2345676 |
+------+-----------+---------+
10 rows in set (0.03 sec)

Issue Number: close #xxx

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

morningman previously approved these changes Aug 17, 2023

LGTM

@github-actions

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 17, 2023
@github-actions

PR approved by anyone and no changes requested.

@morningman

run buildall

@hello-stephen

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.49 seconds
stream load tsv: 538 seconds loaded 74807831229 Bytes, about 132 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.3 seconds inserted 10000000 Rows, about 341K ops/s
storage size: 17162212033 Bytes

@BePPPower BePPPower left a comment

LGTM

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Aug 18, 2023
@hubgeter

run buildall

@github-actions

clang-tidy review says "All clean, LGTM! 👍"

@hello-stephen

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.71 seconds
stream load tsv: 541 seconds loaded 74807831229 Bytes, about 131 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.2 seconds inserted 10000000 Rows, about 342K ops/s
storage size: 17162453274 Bytes

@xiaokang xiaokang left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 18, 2023
@github-actions

PR approved by at least one committer and no changes requested.

@eldenmoon eldenmoon left a comment

LGTM

@eldenmoon eldenmoon merged commit 419e922 into apache:master Aug 18, 2023
xiaokang pushed a commit that referenced this pull request Aug 18, 2023
…3062)

* [fix](json)Fix the bug that does not stop when reading json files
airborne12 pushed a commit to airborne12/apache-doris that referenced this pull request Aug 21, 2023
…ache#23062)

* [fix](json)Fix the bug that does not stop when reading json files
@xiaokang xiaokang mentioned this pull request Aug 30, 2023

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.1-merged reviewed

6 participants