@zxealous zxealous commented Feb 19, 2024

Proposed changes

Issue Number: close #xxx
cherry-pick from master #25280

…r uri (apache#25280)

Before this change, a TVF query against an empty file or an invalid URI returned one of these errors:
1. get parsed schema failed, empty csv file
2. Can not get first file, please check uri.

With this change, a TVF query against an empty file or an invalid URI returns an empty set instead:
```sql
mysql> select * from s3(
"uri" = "https://error_uri/exp_1.csv",
"s3.access_key"= "xx",
"s3.secret_key" = "yy",
"format" = "csv") limit 10;

Empty set (1.29 sec)
```
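
The difference between raising an error and returning an empty set can be modeled with a small Python sketch. This is not Doris code: `read_tvf`, `TvfError`, the `strict` flag, and the in-memory `files` mapping are hypothetical names invented for illustration, and the real change lives in the TVF implementation.

```python
class TvfError(Exception):
    """Hypothetical error type standing in for the two errors quoted in this PR."""


def read_tvf(files, uri, strict):
    """Model a CSV table-valued-function read over an in-memory {uri: text} mapping.

    strict=True models raising an error for a bad URI or an empty file;
    strict=False models returning an empty result set instead.
    """
    if uri not in files:
        # The URI does not resolve to any file.
        if strict:
            raise TvfError("Can not get first file, please check uri.")
        return []
    text = files[uri]
    if not text.strip():
        # The file exists but has no content, so no schema can be inferred.
        if strict:
            raise TvfError("get parsed schema failed, empty csv file")
        return []
    # Naive CSV split, sufficient for this illustration only.
    return [line.split(",") for line in text.splitlines()]
```

Under this model, `read_tvf({}, "https://error_uri/exp_1.csv", strict=False)` yields an empty list, which corresponds to the `Empty set` result shown in the query above.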
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@zxealous

run buildall

@BePPPower BePPPower left a comment

LGTM

@github-actions

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 50199 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bdc69822b3a1f01985078b44cd0db94db812771b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17533	4434	4314	4314
q2	2025	147	139	139
q3	10280	1864	1946	1864
q4	10101	1236	1322	1236
q5	8433	3977	4003	3977
q6	231	125	123	123
q7	2042	1621	1602	1602
q8	9302	2730	2724	2724
q9	10735	11411	10780	10780
q10	8647	3494	3510	3494
q11	426	235	239	235
q12	465	293	298	293
q13	18345	4031	4037	4031
q14	345	325	324	324
q15	508	456	453	453
q16	689	597	593	593
q17	1140	948	943	943
q18	7308	6793	6879	6793
q19	1693	1595	1518	1518
q20	534	306	315	306
q21	4463	4134	4060	4060
q22	511	410	397	397
Total cold run time: 115756 ms
Total hot run time: 50199 ms
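
The two totals can be reproduced from the rows, assuming the numeric columns are the cold run, two hot runs, and the faster of the two hot runs (an interpretation inferred from the numbers, not stated by the bot). A minimal Python sketch with the Round 1 rows copied from the table; `ROUND_1` and `totals` are names chosen for this example:

```python
# Round 1 rows: (query, cold, hot run 1, hot run 2), all in milliseconds.
ROUND_1 = [
    ("q1", 17533, 4434, 4314), ("q2", 2025, 147, 139),
    ("q3", 10280, 1864, 1946), ("q4", 10101, 1236, 1322),
    ("q5", 8433, 3977, 4003), ("q6", 231, 125, 123),
    ("q7", 2042, 1621, 1602), ("q8", 9302, 2730, 2724),
    ("q9", 10735, 11411, 10780), ("q10", 8647, 3494, 3510),
    ("q11", 426, 235, 239), ("q12", 465, 293, 298),
    ("q13", 18345, 4031, 4037), ("q14", 345, 325, 324),
    ("q15", 508, 456, 453), ("q16", 689, 597, 593),
    ("q17", 1140, 948, 943), ("q18", 7308, 6793, 6879),
    ("q19", 1693, 1595, 1518), ("q20", 534, 306, 315),
    ("q21", 4463, 4134, 4060), ("q22", 511, 410, 397),
]


def totals(rows):
    """Return (total cold time, total best-hot time) in milliseconds."""
    cold = sum(row[1] for row in rows)
    # The best hot time per query is the faster of the two hot runs.
    best_hot = sum(min(row[2], row[3]) for row in rows)
    return cold, best_hot


print(totals(ROUND_1))  # (115756, 50199)
```

Summing the faster hot run per query gives the reported hot total, so a single slow hot run for one query does not inflate it.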

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4357	4295	4275	4275
q2	326	217	220	217
q3	4183	4150	4151	4150
q4	2770	2739	2741	2739
q5	7294	7238	7171	7171
q6	240	119	119	119
q7	3235	2833	2808	2808
q8	4353	4504	4513	4504
q9	17138	17056	16848	16848
q10	4218	4280	4269	4269
q11	758	689	708	689
q12	1022	855	856	855
q13	4213	3787	3795	3787
q14	457	420	425	420
q15	502	456	463	456
q16	756	714	701	701
q17	3837	3898	3925	3898
q18	8772	8882	8846	8846
q19	1710	1721	1652	1652
q20	2345	2150	2095	2095
q21	8500	8520	8545	8520
q22	1060	971	938	938
Total cold run time: 82046 ms
Total hot run time: 79957 ms

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Feb 19, 2024
@github-actions

PR approved by at least one committer and no changes requested.

@doris-robot

TPC-DS: Total hot run time: 242019 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bdc69822b3a1f01985078b44cd0db94db812771b, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	918	387	383	383
query2	6516	2335	2021	2021
query3	6915	203	214	203
query4	19922	18137	17945	17945
query5	19703	6491	6484	6484
query6	281	219	232	219
query7	4147	301	299	299
query8	272	237	223	223
query9	3176	2730	2672	2672
query10	412	286	306	286
query11	11337	10763	10715	10715
query12	121	78	68	68
query13	5703	638	629	629
query14	16909	13191	13415	13191
query15	384	231	239	231
query16	6444	266	269	266
query17	1720	1457	854	854
query18	2345	408	414	408
query19	212	149	146	146
query20	79	77	75	75
query21	186	93	98	93
query22	5336	5127	4914	4914
query23	32671	31937	32044	31937
query24	7123	6515	6476	6476
query25	530	436	423	423
query26	608	169	157	157
query27	2049	296	300	296
query28	6050	2222	2199	2199
query29	2902	2729	2747	2729
query30	243	162	163	162
query31	888	717	751	717
query32	63	62	61	61
query33	387	253	242	242
query34	863	476	487	476
query35	1125	936	927	927
query36	1602	1518	1528	1518
query37	90	61	59	59
query38	3086	2900	2949	2900
query39	1381	1329	1325	1325
query40	211	101	96	96
query41	39	36	31	31
query42	90	85	87	85
query43	591	589	540	540
query44	1138	711	723	711
query45	238	223	228	223
query46	1242	968	982	968
query47	1966	1678	1676	1676
query48	982	682	695	682
query49	609	381	366	366
query50	884	603	635	603
query51	5559	5423	5371	5371
query52	94	77	89	77
query53	454	323	315	315
query54	2657	2474	2458	2458
query55	88	73	75	73
query56	224	196	202	196
query57	1113	1060	1127	1060
query58	216	209	206	206
query59	3621	3435	3165	3165
query60	209	210	191	191
query61	88	80	81	80
query62	791	472	517	472
query63	476	340	338	338
query64	2591	1512	1400	1400
query65	3642	3567	3546	3546
query66	770	364	360	360
query67	15596	16837	16186	16186
query68	10285	687	669	669
query69	565	329	344	329
query70	2309	1608	1898	1608
query71	381	305	312	305
query72	6526	3411	3406	3406
query73	740	343	327	327
query74	6302	5858	5894	5858
query75	5395	3725	3711	3711
query76	6696	1217	1232	1217
query77	1169	257	266	257
query78	32564	50326	50343	50326
query79	15070	629	634	629
query80	4544	381	397	381
query81	567	234	234	234
query82	941	98	91	91
query83	335	132	139	132
query84	259	66	67	66
query85	1758	278	267	267
query86	441	387	362	362
query87	3250	2960	3016	2960
query88	6943	2367	2334	2334
query89	421	276	294	276
query90	2418	215	210	210
query91	153	115	116	115
query92	61	49	51	49
query93	5085	592	573	573
query94	1543	201	206	201
query95	1102	1046	1064	1046
query96	653	334	325	325
query97	6482	6295	6419	6295
query98	183	173	168	168
query99	3697	888	897	888
Total cold run time: 349716 ms
Total hot run time: 242019 ms

@doris-robot

ClickBench: Total hot run time: 30.85 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bdc69822b3a1f01985078b44cd0db94db812771b, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.02	0.02
query2	0.06	0.02	0.02
query3	0.24	0.06	0.06
query4	1.81	0.06	0.06
query5	0.54	0.53	0.51
query6	1.23	0.62	0.62
query7	0.01	0.01	0.01
query8	0.03	0.02	0.02
query9	0.51	0.49	0.49
query10	0.53	0.53	0.54
query11	0.11	0.09	0.09
query12	0.10	0.09	0.09
query13	0.61	0.61	0.61
query14	0.79	0.79	0.80
query15	0.78	0.75	0.75
query16	0.36	0.38	0.36
query17	1.04	1.02	0.99
query18	0.24	0.23	0.26
query19	1.90	1.86	1.86
query20	0.01	0.01	0.00
query21	15.51	0.56	0.55
query22	1.65	1.89	1.88
query23	17.28	1.09	0.98
query24	5.05	0.84	2.79
query25	1.47	0.12	0.11
query26	0.30	0.15	0.13
query27	0.10	0.10	0.11
query28	7.45	0.72	0.72
query29	12.62	2.32	2.12
query30	0.55	0.53	0.54
query31	2.82	0.37	0.38
query32	3.43	0.49	0.49
query33	3.09	3.05	3.05
query34	15.27	4.80	4.80
query35	4.81	4.81	4.84
query36	1.08	1.01	1.02
query37	0.06	0.05	0.04
query38	0.03	0.02	0.02
query39	0.02	0.01	0.02
query40	0.16	0.14	0.14
query41	0.07	0.01	0.01
query42	0.02	0.01	0.01
query43	0.02	0.02	0.02
Total cold run time: 103.79 s
Total hot run time: 30.85 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit bdc69822b3a1f01985078b44cd0db94db812771b with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       21.2 seconds inserted 10000000 Rows, about 471K ops/s

@yiguolei yiguolei merged commit 395e882 into apache:branch-2.0 Feb 19, 2024
@zxealous zxealous deleted the fix-tvf-read-empty-file-fail branch April 28, 2024 09:30
mongo360 pushed a commit to mongo360/doris that referenced this pull request Aug 16, 2024
…r uri (apache#25280) (apache#31110)

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>

Labels

approved (Indicates a PR has been approved by one committer.), kind/test, reviewed