@wuwenchi
Contributor

Proposed changes

Issue #31442

  1. Delete files according to the query id
  2. Delete the write path after insert

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

}

return doRecursiveDeleteFiles(directory, deleteEmptyDir);
String queryId = DebugUtil.printId(ConnectContext.get().queryId());
Contributor

I suggest saving the query id when this HiveTransaction is created.
ConnectContext.get() is a thread-local variable, so it may return null if we run this transaction in another thread.
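
To illustrate the concern, here is a minimal sketch of why a thread-local lookup can come back empty on another thread, and how capturing the id at construction time avoids that. `DemoContext` and `DemoTransaction` are hypothetical stand-ins written for this example; they are not the real `ConnectContext` or `HiveTransaction` classes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class QueryIdCaptureDemo {
    /** Returns {lazy lookup on the worker, id captured at creation}. */
    static String[] runOnWorker(String queryId) {
        DemoContext.set(new DemoContext(queryId));     // the calling thread has a context
        DemoTransaction txn = new DemoTransaction();   // the id is captured right here
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(() -> new String[] {
                DemoTransaction.lookupLazily(),        // reads the thread-local on the worker
                txn.capturedQueryId()                  // reads the saved field
            }).get(10, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
            DemoContext.clear();
        }
    }

    public static void main(String[] args) {
        String[] seen = runOnWorker("q-123");
        System.out.println("lazy lookup on worker: " + seen[0]);
        System.out.println("captured at creation:  " + seen[1]);
    }
}

// Hypothetical stand-in for a thread-local context holder.
final class DemoContext {
    private static final ThreadLocal<DemoContext> CURRENT = new ThreadLocal<>();
    final String queryId;

    DemoContext(String queryId) { this.queryId = queryId; }
    static void set(DemoContext ctx) { CURRENT.set(ctx); }
    static DemoContext get() { return CURRENT.get(); }
    static void clear() { CURRENT.remove(); }
}

// Hypothetical transaction that saves the query id when it is created.
final class DemoTransaction {
    private final String queryId;

    DemoTransaction() {
        // Runs on the thread that owns the context, so get() is non-null here.
        this.queryId = DemoContext.get().queryId;
    }

    String capturedQueryId() { return queryId; }

    // Looks the id up at call time; returns null on a thread without a context.
    static String lookupLazily() {
        DemoContext ctx = DemoContext.get();
        return ctx == null ? null : ctx.queryId;
    }
}
```

The worker thread never had a context set, so the lazy lookup yields null there, while the field saved in the constructor is still available.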

for (RemoteFile file : allFiles) {
String fileName = file.getName();
if (!deleteIfExists(new Path(fileName))) {
if (file.getName().startsWith(queryId)) {
Contributor

use contains?

Contributor Author

Since the files generated by BE all start with the queryId, startsWith is probably more appropriate than a contains check.
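
A small sketch of the difference between the two checks. The file names and the query id below are made up for illustration; the only assumption taken from the thread is that files written for a query begin with its query id.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class QueryFileFilterDemo {
    // Keeps only names that begin with the query id.
    static List<String> ownedByPrefix(List<String> fileNames, String queryId) {
        List<String> out = new ArrayList<>();
        for (String name : fileNames) {
            if (name.startsWith(queryId)) {
                out.add(name);
            }
        }
        return out;
    }

    // Keeps any name that mentions the query id anywhere.
    static List<String> ownedByContains(List<String> fileNames, String queryId) {
        List<String> out = new ArrayList<>();
        for (String name : fileNames) {
            if (name.contains(queryId)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String queryId = "a1b2c3d4-e5f60718";   // hypothetical id
        List<String> names = Arrays.asList(
                "a1b2c3d4-e5f60718_0.orc",
                "a1b2c3d4-e5f60718_1.orc",
                "ffff0000-11112222_backup_of_a1b2c3d4-e5f60718_0.orc");
        System.out.println("startsWith: " + ownedByPrefix(names, queryId));
        System.out.println("contains:   " + ownedByContains(names, queryId));
    }
}
```

With a contains check, the third file would also be selected for deletion even though it does not follow the prefix convention, so the prefix check is the narrower of the two.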

@wuwenchi
Contributor Author

run buildall

Contributor

@kaka11chen left a comment

LGTM

@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 38594 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a41a97bb522e217cb83aafb85f88ac32d4e47924, data reload: false

------ Round 1 ----------------------------------
q1	17622	4290	4226	4226
q2	2013	191	188	188
q3	11089	1251	1173	1173
q4	10816	866	747	747
q5	7816	2786	2709	2709
q6	218	136	138	136
q7	1012	608	620	608
q8	9552	2126	2082	2082
q9	7634	6727	6595	6595
q10	8527	3525	3507	3507
q11	456	224	226	224
q12	515	220	224	220
q13	17756	2936	2951	2936
q14	269	237	233	233
q15	526	483	480	480
q16	507	404	392	392
q17	968	662	641	641
q18	7371	6760	6619	6619
q19	1623	1513	1526	1513
q20	640	300	303	300
q21	3437	2948	2761	2761
q22	357	304	309	304
Total cold run time: 110724 ms
Total hot run time: 38594 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4217	4187	4190	4187
q2	358	260	268	260
q3	2980	2739	2786	2739
q4	1844	1579	1580	1579
q5	5358	5367	5318	5318
q6	208	123	125	123
q7	2269	1856	1897	1856
q8	3203	3334	3330	3330
q9	8558	8566	8539	8539
q10	3892	3696	3730	3696
q11	587	478	471	471
q12	741	604	607	604
q13	16317	2887	2952	2887
q14	301	276	268	268
q15	512	478	473	473
q16	476	417	417	417
q17	1769	1471	1443	1443
q18	7448	7407	7459	7407
q19	1675	1522	1533	1522
q20	1986	1773	1757	1757
q21	4933	4726	4793	4726
q22	540	448	440	440
Total cold run time: 70172 ms
Total hot run time: 54042 ms

@doris-robot

TPC-DS: Total hot run time: 183828 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a41a97bb522e217cb83aafb85f88ac32d4e47924, data reload: false

query1	923	371	376	371
query2	6478	2524	2434	2434
query3	6665	199	201	199
query4	24881	21253	21370	21253
query5	4175	410	409	409
query6	268	173	173	173
query7	4584	285	286	285
query8	228	163	172	163
query9	8497	2340	2375	2340
query10	575	248	270	248
query11	14788	14282	14123	14123
query12	139	90	86	86
query13	1634	373	373	373
query14	10066	7905	7734	7734
query15	262	187	191	187
query16	8235	286	264	264
query17	1942	588	593	588
query18	2113	283	282	282
query19	324	151	154	151
query20	91	84	86	84
query21	199	124	127	124
query22	4999	4825	4773	4773
query23	33584	32741	32852	32741
query24	11834	2908	2911	2908
query25	644	400	364	364
query26	1742	148	148	148
query27	2974	311	307	307
query28	7544	2011	1992	1992
query29	1010	616	593	593
query30	306	170	168	168
query31	944	723	710	710
query32	91	52	52	52
query33	737	250	237	237
query34	1112	462	476	462
query35	830	711	699	699
query36	1040	883	875	875
query37	275	72	69	69
query38	3352	3207	3178	3178
query39	1550	1508	1540	1508
query40	267	126	121	121
query41	44	42	42	42
query42	102	92	99	92
query43	577	547	533	533
query44	1198	761	722	722
query45	281	264	267	264
query46	1098	721	727	721
query47	1924	1840	1850	1840
query48	359	292	296	292
query49	1133	364	362	362
query50	749	375	395	375
query51	6622	6535	6492	6492
query52	100	91	90	90
query53	355	272	273	272
query54	300	222	249	222
query55	74	68	75	68
query56	252	233	217	217
query57	1228	1120	1128	1120
query58	222	194	194	194
query59	3488	3372	3178	3178
query60	246	219	226	219
query61	86	85	87	85
query62	645	433	441	433
query63	304	280	274	274
query64	6173	4071	3750	3750
query65	3057	2998	3030	2998
query66	1389	343	325	325
query67	15475	14984	15018	14984
query68	5280	538	551	538
query69	473	292	305	292
query70	1226	1195	1136	1136
query71	1403	1260	1258	1258
query72	6348	2587	2415	2415
query73	710	318	318	318
query74	6874	6480	6452	6452
query75	3469	2678	2599	2599
query76	3375	971	948	948
query77	542	256	258	256
query78	10799	10085	10201	10085
query79	2874	521	518	518
query80	1939	423	421	421
query81	532	242	241	241
query82	770	92	92	92
query83	320	162	169	162
query84	259	86	83	83
query85	1994	271	262	262
query86	474	286	314	286
query87	3419	3231	3295	3231
query88	4610	2396	2388	2388
query89	485	381	376	376
query90	2095	177	175	175
query91	123	94	109	94
query92	59	45	45	45
query93	4003	512	500	500
query94	1339	176	178	176
query95	395	300	289	289
query96	591	263	265	263
query97	3151	2903	2936	2903
query98	233	217	218	217
query99	1277	890	869	869
Total cold run time: 290824 ms
Total hot run time: 183828 ms

@doris-robot

ClickBench: Total hot run time: 30.7 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a41a97bb522e217cb83aafb85f88ac32d4e47924, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.07
query5	0.50	0.50	0.50
query6	1.47	0.72	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.54	0.50	0.50
query10	0.56	0.55	0.57
query11	0.15	0.12	0.12
query12	0.14	0.11	0.12
query13	0.60	0.58	0.58
query14	0.76	0.78	0.77
query15	0.84	0.80	0.81
query16	0.36	0.38	0.37
query17	1.03	1.00	1.00
query18	0.22	0.23	0.24
query19	1.74	1.70	1.80
query20	0.02	0.01	0.02
query21	15.45	0.67	0.65
query22	4.20	7.13	2.11
query23	18.29	1.37	1.34
query24	1.71	0.28	0.22
query25	0.15	0.08	0.08
query26	0.27	0.17	0.18
query27	0.08	0.08	0.08
query28	13.33	0.99	0.97
query29	12.62	3.29	3.26
query30	0.26	0.06	0.07
query31	2.86	0.38	0.38
query32	3.27	0.46	0.46
query33	2.82	2.84	2.84
query34	17.13	4.43	4.40
query35	4.49	4.47	4.47
query36	0.64	0.45	0.48
query37	0.18	0.16	0.15
query38	0.15	0.14	0.14
query39	0.05	0.03	0.04
query40	0.17	0.13	0.15
query41	0.09	0.04	0.05
query42	0.05	0.04	0.04
query43	0.04	0.04	0.03
Total cold run time: 109.33 s
Total hot run time: 30.7 s

morningman previously approved these changes Apr 17, 2024
Contributor

@morningman left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 17, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@wuwenchi
Contributor Author

run p0

@wuwenchi
Contributor Author

run external

@github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Apr 18, 2024
@wuwenchi
Contributor Author

run buildall

@wuwenchi
Contributor Author

run compile

@doris-robot

TPC-H: Total hot run time: 38445 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ff8e43b76598f94d1383c36df4fb9737f48fbdba, data reload: false

------ Round 1 ----------------------------------
q1	17632	4243	4199	4199
q2	2010	188	183	183
q3	10446	1207	1146	1146
q4	10184	802	794	794
q5	7494	2680	2668	2668
q6	225	132	134	132
q7	999	596	589	589
q8	9224	2066	2028	2028
q9	7313	6542	6507	6507
q10	8548	3514	3512	3512
q11	459	227	230	227
q12	419	220	213	213
q13	17754	2905	2937	2905
q14	269	221	231	221
q15	510	487	482	482
q16	521	384	374	374
q17	947	668	700	668
q18	7310	6813	6728	6728
q19	5307	1520	1536	1520
q20	696	310	309	309
q21	3473	2848	2744	2744
q22	357	303	296	296
Total cold run time: 112097 ms
Total hot run time: 38445 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4332	4235	4219	4219
q2	371	269	275	269
q3	3011	2769	2748	2748
q4	1859	1533	1612	1533
q5	5383	5352	5282	5282
q6	209	123	121	121
q7	2241	1784	1836	1784
q8	3193	3328	3303	3303
q9	8579	8538	8576	8538
q10	4049	3817	3898	3817
q11	615	524	491	491
q12	813	630	616	616
q13	17403	3220	3182	3182
q14	332	285	287	285
q15	523	493	478	478
q16	532	423	457	423
q17	1833	1509	1516	1509
q18	7927	8116	7840	7840
q19	1726	1579	1567	1567
q20	1965	1799	1883	1799
q21	8069	4968	4962	4962
q22	539	471	498	471
Total cold run time: 75504 ms
Total hot run time: 55237 ms

@doris-robot

TPC-DS: Total hot run time: 185365 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ff8e43b76598f94d1383c36df4fb9737f48fbdba, data reload: false

query1	880	360	357	357
query2	6188	2534	2511	2511
query3	6670	201	202	201
query4	22740	21374	21345	21345
query5	4134	395	400	395
query6	265	177	175	175
query7	4599	305	288	288
query8	233	172	181	172
query9	8557	2375	2370	2370
query10	421	242	258	242
query11	14701	14226	14216	14216
query12	136	90	89	89
query13	1635	368	363	363
query14	9836	7444	7869	7444
query15	248	181	183	181
query16	8218	270	261	261
query17	1980	585	571	571
query18	2118	284	285	284
query19	331	151	161	151
query20	91	88	88	88
query21	199	133	133	133
query22	5140	4901	4812	4812
query23	33786	33077	33244	33077
query24	11086	2986	3026	2986
query25	582	366	377	366
query26	667	156	164	156
query27	2264	353	379	353
query28	6053	2087	2054	2054
query29	871	619	616	616
query30	305	180	180	180
query31	985	770	762	762
query32	101	54	51	51
query33	665	234	243	234
query34	1121	475	494	475
query35	844	703	699	699
query36	1070	923	903	903
query37	114	88	68	68
query38	3503	3354	3399	3354
query39	1634	1575	1599	1575
query40	176	133	131	131
query41	48	42	43	42
query42	104	99	98	98
query43	607	553	553	553
query44	1122	738	733	733
query45	278	300	275	275
query46	1131	771	708	708
query47	2041	1966	1917	1917
query48	367	295	303	295
query49	814	358	378	358
query50	792	386	388	386
query51	6863	6767	6730	6730
query52	99	91	88	88
query53	341	282	279	279
query54	301	234	234	234
query55	88	75	73	73
query56	261	231	227	227
query57	1296	1219	1186	1186
query58	221	208	219	208
query59	3563	3532	3276	3276
query60	238	235	227	227
query61	89	87	83	83
query62	606	442	441	441
query63	296	272	271	271
query64	4815	3715	3982	3715
query65	3051	3051	3019	3019
query66	744	328	327	327
query67	15513	14942	14931	14931
query68	5175	524	526	524
query69	462	288	316	288
query70	1172	1157	1195	1157
query71	1383	1262	1270	1262
query72	6366	2788	2442	2442
query73	707	316	318	316
query74	6971	6463	6408	6408
query75	3332	2664	2656	2656
query76	2717	1046	978	978
query77	376	269	251	251
query78	10978	10364	10097	10097
query79	9280	519	518	518
query80	2113	444	436	436
query81	508	240	236	236
query82	1460	91	92	91
query83	271	162	160	160
query84	263	83	81	81
query85	1897	330	265	265
query86	462	290	274	274
query87	3539	3300	3276	3276
query88	5114	2399	2387	2387
query89	563	372	363	363
query90	1899	179	179	179
query91	142	96	96	96
query92	63	45	52	45
query93	6621	502	494	494
query94	1094	184	176	176
query95	385	289	292	289
query96	618	263	261	261
query97	3160	2925	2980	2925
query98	229	218	215	215
query99	1267	850	855	850
Total cold run time: 291423 ms
Total hot run time: 185365 ms

@doris-robot

ClickBench: Total hot run time: 30.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ff8e43b76598f94d1383c36df4fb9737f48fbdba, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.08
query5	0.49	0.49	0.48
query6	1.47	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.05	0.04
query9	0.55	0.48	0.49
query10	0.55	0.56	0.55
query11	0.16	0.11	0.12
query12	0.15	0.12	0.11
query13	0.60	0.57	0.58
query14	0.75	0.76	0.77
query15	0.83	0.80	0.80
query16	0.35	0.36	0.37
query17	0.95	0.99	1.02
query18	0.22	0.26	0.22
query19	1.85	1.68	1.79
query20	0.02	0.01	0.01
query21	15.40	0.65	0.65
query22	4.17	7.17	2.01
query23	18.33	1.36	1.31
query24	1.87	0.27	0.20
query25	0.15	0.10	0.08
query26	0.26	0.17	0.17
query27	0.08	0.07	0.08
query28	13.30	1.00	0.98
query29	12.64	3.30	3.26
query30	0.27	0.06	0.05
query31	2.87	0.38	0.38
query32	3.29	0.46	0.46
query33	2.77	2.84	2.82
query34	17.21	4.39	4.38
query35	4.44	4.43	4.55
query36	0.65	0.46	0.46
query37	0.18	0.16	0.15
query38	0.15	0.14	0.14
query39	0.04	0.04	0.04
query40	0.16	0.16	0.14
query41	0.10	0.05	0.05
query42	0.05	0.04	0.05
query43	0.04	0.03	0.05
Total cold run time: 109.45 s
Total hot run time: 30.37 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit ff8e43b76598f94d1383c36df4fb9737f48fbdba with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
Insert into select:       13.5 seconds inserted 10000000 Rows, about 740K ops/s

@wuwenchi
Contributor Author

run feut

@wuwenchi
Contributor Author

run p0

@wuwenchi
Contributor Author

run feut

@wuwenchi
Contributor Author

run p0

@wuwenchi
Contributor Author

run buildall

Contributor

@morningman left a comment

LGTM

@github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Apr 19, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Contributor

@kaka11chen left a comment

LGTM

@morningman merged commit bbdc699 into apache:master Apr 19, 2024
morningman pushed a commit that referenced this pull request Apr 19, 2024
morningman pushed a commit to morningman/doris that referenced this pull request Apr 30, 2024
dataroaring pushed a commit that referenced this pull request May 1, 2024
…4.0 (#34371)

* [feature](insert)use optional location and add hive regression test (#33153)

* [feature](iceberg)The new DDL syntax is added to create iceberg partitioned tables (#33338)

support partition by :

```
create table tb1 (c1 string, ts datetime) engine = iceberg partition by (c1, day(ts)) () properties ("a"="b")
```

* [Enhancement](hive-writer) Adjust table sink exchange rebalancer params. (#33397)

Issue Number:  #31442

Change table sink exchange rebalancer params to node level and adjust these params to improve write performance by better balance.

rebalancer params:
```
DEFINE_mInt64(table_sink_partition_write_min_data_processed_rebalance_threshold,
              "26214400"); // 25MB
// Minimum partition data processed to rebalance writers in exchange when partition writing
DEFINE_mInt64(table_sink_partition_write_min_partition_data_processed_rebalance_threshold,
              "15728640"); // 15MB
```

* [feature](profile) add transaction statistics for profile (#33488)

1. commit total time
2. fs operator total time
     rename file count
     rename dir count
     delete dir count
3. add partition total time
    add partition count
4. update partition total time
    update partition count
like:
```
      -  Transaction  Commit  Time:  906ms
          -  FileSystem  Operator  Time:  833ms
              -  Rename  File  Count:  4
              -  Rename  Dir  Count:  0
              -  Delete  Dir  Count:  0
          -  HMS  Add  Partition  Time:  0ms
              -  HMS  Add  Partition  Count:  0
          -  HMS  Update  Partition  Time:  68ms
              -  HMS  Update  Partition  Count:  4
```

* [feature](iceberg) add iceberg transaction implement (#33629)

Issue #31442

add iceberg transaction

* [feature](insert)support default value when create hive table (#33666)

Issue Number: #31442

Hive 3 supports creating tables with column default values.
When Hive 3 is used, default values can be written to the table.

* [refactor](filesystem)refactor `filesystem` interface (#33361)

1. Rename `list` to `globList`. The path passed to `list` must contain a wildcard character, and the corresponding HDFS interface is `globStatus`, so the new name is `globList`.
2. If you only need to view files based on paths, you can use the `listFiles` operation.
3. Merge `listLocatedFiles` function into `listFiles` function.
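
The distinction between a wildcard listing and a plain path-based listing can be sketched with the JDK's `java.nio.file` API. This is only an analogy on the local file system; the method names `globList` and `listFiles` are reused here for readability and are not the Doris `FileSystem` signatures.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ListingDemo {
    // Wildcard listing: only entries whose file name matches the glob are returned.
    static List<String> globList(Path dir, String glob) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    // Plain listing: every regular file directly under the directory.
    static List<String> listFiles(Path dir) {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(names);
        return names;
    }

    // Creates a temporary directory with three empty sample files.
    static Path createSampleDir() {
        try {
            Path dir = Files.createTempDirectory("listing-demo");
            for (String name : new String[] {"a.orc", "b.orc", "c.txt"}) {
                Files.createFile(dir.resolve(name));
            }
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = createSampleDir();
        System.out.println("globList(*.orc): " + globList(dir, "*.orc"));
        System.out.println("listFiles:       " + listFiles(dir));
    }
}
```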

* [opt](meta-cache) refine the meta cache (#33449)

1. Use `caffeine` instead of `guava cache` to get better performance
2. Add a new class `CacheFactory`

    All (Async)LoadingCache should be built from `CacheFactory`

3. Use separator executor for different caches

    1. rowCountRefreshExecutor
      For row count cache.
      Row count cache is an async loading cache, and we can ignore the result
      if cache missing or thread pool is full.
      So use a separate executor for this cache.

    2.  commonRefreshExecutor
      For other caches. Other caches are sync loading caches,
      but commonRefreshExecutor is used for async refresh.
      That is, if a cache entry is missing, the value is loaded synchronously in the caller thread;
      if a cache entry needs a refresh, it is reloaded in commonRefreshExecutor.

    3. fileListingExecutor
      File listing is a heavy operation, so use a separate executor for it.
      For fileCache, the refresh operation will still use commonRefreshExecutor to trigger refresh.
      And fileListingExecutor will be used to list file.

4. Change the refresh and expire logic of caches

    For most of caches, set `refreshAfterWrite` strategy, so that
    even if the cache entry is expired, the old entry can still be
    used while new entry is being loaded.

5. Add new global variable `enable_get_row_count_from_file_list`

    Default is true, if false, will disable getting row count from file list
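
The load and refresh behaviour described in items 3 and 4 can be sketched with plain JDK classes. This is a simplified illustration, not Caffeine and not the Doris `CacheFactory`: `RefreshingCache` is a hypothetical class, the clock is injected so the example is deterministic, and a queue stands in for the refresh executor.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;
import java.util.function.LongSupplier;

final class RefreshingCacheDemo {
    public static void main(String[] args) {
        AtomicLong clock = new AtomicLong();
        AtomicInteger loads = new AtomicInteger();
        Queue<Runnable> pending = new ArrayDeque<>();   // stand-in for a refresh executor
        RefreshingCache cache = new RefreshingCache(
                key -> key + "-v" + loads.incrementAndGet(), pending::add, 10, clock::get);

        System.out.println(cache.get("t"));   // miss: loaded in the caller thread (t-v1)
        clock.set(10);
        System.out.println(cache.get("t"));   // stale: old value served, reload queued (t-v1)
        pending.remove().run();               // the "executor" performs the reload
        System.out.println(cache.get("t"));   // refreshed value (t-v2)
    }
}

// Hypothetical cache: sync load on miss, async reload once an entry is old enough.
final class RefreshingCache {
    private static final class Entry {
        final String value;
        final long loadedAt;
        final AtomicBoolean refreshing = new AtomicBoolean();

        Entry(String value, long loadedAt) {
            this.value = value;
            this.loadedAt = loadedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, String> loader;
    private final Executor refreshExecutor;
    private final long refreshAfter;
    private final LongSupplier clock;

    RefreshingCache(Function<String, String> loader, Executor refreshExecutor,
                    long refreshAfter, LongSupplier clock) {
        this.loader = loader;
        this.refreshExecutor = refreshExecutor;
        this.refreshAfter = refreshAfter;
        this.clock = clock;
    }

    String get(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            // Missing entry: load synchronously in the caller thread.
            Entry loaded = new Entry(loader.apply(key), clock.getAsLong());
            entries.put(key, loaded);
            return loaded.value;
        }
        boolean stale = clock.getAsLong() - e.loadedAt >= refreshAfter;
        if (stale && e.refreshing.compareAndSet(false, true)) {
            // Stale entry: schedule exactly one reload on the refresh executor.
            refreshExecutor.execute(() ->
                    entries.put(key, new Entry(loader.apply(key), clock.getAsLong())));
        }
        return e.value;   // the old value stays usable while the reload is pending
    }
}
```

The point of the sketch is the last line of `get`: a caller never blocks on a refresh, it only blocks on the very first load of a key.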

* [bugfix](hive)delete write path after hive insert (#33798)

Issue #31442

1. delete file according query id
2. delete write path after insert

* [Enhancement](multi-catalog) Rewrite `S3URI` to remove tricky virtual bucket mechanism and support different uri styles by flags. (#33858)

Many domestic cloud vendors are compatible with the s3 protocol. However, early versions of s3 client will only generate path style http requests (aws/aws-sdk-java-v2#763) when encountering endpoints that do not start with s3, while some cloud vendors only support virtual host style http request.

Therefore, Doris used `forceVirtualHosted` in `S3URI` to convert it into a virtual hosted path and implemented it through path style.
For example:
For s3 uri `s3://my-bucket/data/file.txt`, It will eventually be parsed into:
- virtualBucket: my-bucket
- Bucket: data (bucket must be set, otherwise the s3 client will report an error) Especially this step is particularly tricky because of the limitations of the s3 client.
- Key: file.txt

 The path style mode is used to generate an http request similar to the virtual host by setting the endpoint to virtualBucket + original endpoint, setting the bucket and key.
**However, the bucket and key here are inconsistent with the original concepts of s3, but the aws client happens to be able to generate an http request similar to the virtual host through the path style mode.**

However, #30799 upgraded the AWS SDK from 2.17.257 to 2.20.131. The current AWS S3 client already generates virtual host style HTTP requests by default for third-party endpoints, so #31111 had to set the path style option to keep Doris' virtual bucket mechanism working.

**Finally, the virtual bucket mechanism is too confusing and tricky, and we no longer need it with the new version of s3 client.**

### Resolution:

Rewrite `S3URI` to remove tricky virtual bucket mechanism and support different uri styles by flags.

This class represents a fully qualified location in S3 for input/output operations expressed as a URI.
 #### For AWS S3, URI common styles:
  - AWS Client Style(Hadoop S3 Style): `s3://my-bucket/path/to/file?versionId=abc123&partNumber=77&partNumber=88`
  - Virtual Host Style: `https://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
  - Path Style: `https://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
 
  Regarding the above-mentioned common styles, we can use <code>isPathStyle</code> to control whether to use path style
  or virtual host style.
  "Virtual host style" is the currently mainstream and recommended approach to use, so the default value of
  <code>isPathStyle</code> is false.
 
  #### Other Styles:
  - Virtual Host AWS Client (Hadoop S3) Mixed Style:
    `s3://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
  - Path AWS Client (Hadoop S3) Mixed Style:
     `s3://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
 
  For these two styles, we can use <code>isPathStyle</code> and <code>forceParsingByStandardUri</code>
  to control whether to use.
  Virtual Host AWS Client (Hadoop S3) Mixed Style: <code>isPathStyle = false && forceParsingByStandardUri = true</code>
  Path AWS Client (Hadoop S3) Mixed Style: <code>isPathStyle = true && forceParsingByStandardUri = true</code>
 
  When the incoming location is URL encoded, the encoded string is returned:
  <code>getKey()</code> and <code>getQueryParams()</code> return the encoded string.
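
How the two flags could select between the URI styles listed above is sketched below with `java.net.URI`. `S3Location` is a hypothetical class written for this example and is not the rewritten `S3URI`; it only extracts bucket and key, ignores query parameters, and keeps the key in its encoded form via `getRawPath()`.

```java
import java.net.URI;

final class S3LocationDemo {
    public static void main(String[] args) {
        String[][] cases = {
            {"s3://my-bucket/path/to/file?versionId=abc123", "false", "false"},
            {"https://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt", "false", "false"},
            {"https://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt", "true", "false"},
            {"s3://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt", "true", "true"},
        };
        for (String[] c : cases) {
            S3Location loc = S3Location.parse(
                    c[0], Boolean.parseBoolean(c[1]), Boolean.parseBoolean(c[2]));
            System.out.println(loc.bucket + " | " + loc.key);
        }
    }
}

// Hypothetical parser; only bucket and key are extracted.
final class S3Location {
    final String bucket;
    final String key;

    private S3Location(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    static S3Location parse(String location, boolean isPathStyle,
                            boolean forceParsingByStandardUri) {
        URI uri = URI.create(location);
        String host = uri.getHost();
        String rawPath = uri.getRawPath();   // raw path keeps URL encoding intact
        String path = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;

        // AWS client (Hadoop S3) style: s3://bucket/key
        boolean clientStyle =
                "s3".equalsIgnoreCase(uri.getScheme()) && !forceParsingByStandardUri;
        if (clientStyle) {
            return new S3Location(host, path);
        }
        if (isPathStyle) {
            // Path style: the bucket is the first path segment.
            int slash = path.indexOf('/');
            if (slash < 0) {
                throw new IllegalArgumentException("no key in " + location);
            }
            return new S3Location(path.substring(0, slash), path.substring(slash + 1));
        }
        // Virtual host style: the bucket is the first label of the host name.
        int dot = host.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("no bucket in host of " + location);
        }
        return new S3Location(host.substring(0, dot), path);
    }
}
```

All four sample locations resolve to the bucket `my-bucket`; the two mixed styles only differ from the plain `s3://` form in that `forceParsingByStandardUri` makes the parser treat the authority as an endpoint rather than as the bucket.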

* [improvement](hive)add the `queryid` to the temporary file path (#34278)

Change the temporary path from `_temp_<table_name>` to `_temp_<queryid>_<table_name>`,
so that it cannot collide with a user table named `_temp_<table_name>`.

So as to partition temp dir

* [feature](Cloud) Load index data into index cache when writing data (#34046)

* [Feature](hive-writer) Implements s3 file committer. (#33937)

Issue Number: #31442

[Feature] (hive-writer) Implements s3 file committer. 

The S3 committer starts a multipart upload for every file on the BE side and then completes these uploads on the FE side. A file is not visible until its multipart upload is completed, so the atomicity of a single file can be guaranteed, but the atomicity of multiple files still cannot. Because Hive committers have best-effort semantics, this shortens the inconsistent time window.

## ChangeList:
- Add `used_by_s3_committer` in `FileWriterOptions` on BE side to start multi-part uploading files, then complete multi-part uploading files on FE side.
- `cosn://` uses the S3 client on the FE side, because it needs to complete the multipart uploads on the FE side.
-  Add `Status directoryExists(String dir)` and `Status deleteDirectory` in `FileSystem`.

---------

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.3-merged, reviewed
