@kaka11chen
Contributor

@kaka11chen kaka11chen commented Apr 21, 2024

Proposed changes

Issue Number: #31442

[Feature] (hive-writer) Implements s3 file committer.

The S3 committer starts a multipart upload for every file on the BE side and completes those multipart uploads on the FE side. A file whose multipart upload has not been completed is not visible, so this guarantees atomicity for each individual file. It still cannot guarantee atomicity across multiple files, but since Hive committers only have best-effort semantics, it shortens the window during which the written data is inconsistent.
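A minimal sketch of why this gives per-file (but not multi-file) atomicity, assuming a simplified object store in which the parts of a multipart upload stay invisible until the upload is completed. `MultipartStore` and its methods are hypothetical names for illustration, not the AWS SDK or Doris API; only the BE/FE split of the calls follows the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory model of multipart-upload visibility (not the AWS SDK or Doris API).
class MultipartStore {
    private final Map<String, String> visible = new HashMap<>();      // key -> committed content
    private final Map<String, List<String>> parts = new HashMap<>();  // uploadId -> uploaded parts
    private final Map<String, String> targetKey = new HashMap<>();    // uploadId -> destination key

    // BE side: start a multipart upload; nothing is visible to readers yet.
    String startUpload(String key) {
        String uploadId = UUID.randomUUID().toString();
        parts.put(uploadId, new ArrayList<>());
        targetKey.put(uploadId, key);
        return uploadId;
    }

    // BE side: upload one part of the file.
    void uploadPart(String uploadId, String data) {
        parts.get(uploadId).add(data);
    }

    // FE side: completing the upload publishes the whole file in a single step.
    void completeUpload(String uploadId) {
        String content = String.join("", parts.remove(uploadId));
        visible.put(targetKey.remove(uploadId), content);
    }

    // Abort path: the parts are discarded and the file never becomes visible.
    void abortUpload(String uploadId) {
        parts.remove(uploadId);
        targetKey.remove(uploadId);
    }

    Optional<String> read(String key) {
        return Optional.ofNullable(visible.get(key));
    }

    public static void main(String[] args) {
        MultipartStore store = new MultipartStore();
        String a = store.startUpload("t/p=1/a.orc");
        String b = store.startUpload("t/p=1/b.orc");
        store.uploadPart(a, "part1-");
        store.uploadPart(a, "part2");
        store.uploadPart(b, "part1");
        System.out.println(store.read("t/p=1/a.orc").isPresent()); // false
        store.completeUpload(a);
        System.out.println(store.read("t/p=1/a.orc").get());       // part1-part2
        // a.orc is visible while b.orc is not yet: each file is atomic, the set of files is not.
        System.out.println(store.read("t/p=1/b.orc").isPresent()); // false
        store.completeUpload(b);
    }
}
```

The gap between the first and the last completion on the FE side is the remaining inconsistency window that the description refers to.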

ChangeList:

  • Add used_by_s3_committer to FileWriterOptions on the BE side, so that BE starts multipart uploads for the files it writes and FE later completes those multipart uploads.
  • cosn:// now uses the S3 client on the FE side, because FE needs to complete the multipart uploads.
  • Add Status directoryExists(String dir) and Status deleteDirectory to FileSystem.
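A rough sketch of what the two new FileSystem methods could look like on an object store, where a "directory" is only a key prefix. `Status`, `DirectoryOps`, `InMemoryObjectStore` and the `String dir` parameter of `deleteDirectory` are assumptions for illustration; the actual Doris classes and signatures may differ.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical Status type; the real Doris FE Status class may have a different shape.
final class Status {
    enum Code { OK, NOT_FOUND }

    static final Status OK = new Status(Code.OK, "");

    final Code code;
    final String message;

    Status(Code code, String message) {
        this.code = code;
        this.message = message;
    }

    boolean ok() {
        return code == Code.OK;
    }
}

// Hypothetical excerpt of the interface; the deleteDirectory parameter is assumed.
interface DirectoryOps {
    Status directoryExists(String dir);

    Status deleteDirectory(String dir);
}

// In-memory implementation over a flat set of object keys, as in an object store.
class InMemoryObjectStore implements DirectoryOps {
    final Set<String> keys = new TreeSet<>();

    private static String prefix(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    @Override
    public Status directoryExists(String dir) {
        String p = prefix(dir);
        // No real directories: a "directory" exists if at least one key lies under the prefix.
        boolean exists = keys.stream().anyMatch(k -> k.startsWith(p));
        return exists ? Status.OK : new Status(Status.Code.NOT_FOUND, "directory not found: " + dir);
    }

    @Override
    public Status deleteDirectory(String dir) {
        String p = prefix(dir);
        // Deleting a "directory" means deleting every key under the prefix.
        keys.removeIf(k -> k.startsWith(p));
        return Status.OK;
    }
}
```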

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.37% (8916/25210)
Line Coverage: 27.08% (73280/270570)
Region Coverage: 26.22% (37856/144372)
Branch Coverage: 23.03% (19275/83690)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a93a483147791500f08fb5186c4769d91dbbe9d0_a93a483147791500f08fb5186c4769d91dbbe9d0/report/index.html

@kaka11chen kaka11chen marked this pull request as ready for review April 23, 2024 06:32
@kaka11chen kaka11chen force-pushed the s3_file_committer branch 2 times, most recently from 2f31d72 to 70ad11b April 23, 2024 09:31
@kaka11chen
Contributor Author

run buidall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 28, 2024
@morningman
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 41613 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b3c8c5034c054e4715e178c0994dfa2dfbd3217f, data reload: false

------ Round 1 ----------------------------------
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	17609	4329	4292	4292
q2	2015	192	193	192
q3	10477	1246	1252	1246
q4	10205	795	803	795
q5	7523	2749	2810	2749
q6	234	138	133	133
q7	963	560	558	558
q8	9225	2199	2140	2140
q9	9393	6835	6851	6835
q10	9104	4043	4015	4015
q11	430	240	245	240
q12	507	224	222	222
q13	17384	3134	3226	3134
q14	291	225	238	225
q15	514	475	465	465
q16	492	405	427	405
q17	979	705	704	704
q18	8471	7752	7712	7712
q19	5558	1609	1566	1566
q20	643	324	314	314
q21	5379	4185	3401	3401
q22	345	270	275	270
Total cold run time: 117741 ms
Total hot run time: 41613 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run	hot run 1	hot run 2	best hot run (ms)
q1	4618	4438	4401	4401
q2	380	268	264	264
q3	3160	2945	2903	2903
q4	1960	1573	1618	1573
q5	5446	5557	5563	5557
q6	210	126	133	126
q7	1849	1477	1488	1477
q8	3290	3461	3444	3444
q9	8792	8937	8863	8863
q10	4110	3900	3744	3744
q11	580	478	480	478
q12	791	660	624	624
q13	17086	3177	3152	3152
q14	318	279	279	279
q15	519	476	488	476
q16	498	457	460	457
q17	1846	1539	1533	1533
q18	8051	7605	7428	7428
q19	1655	1536	1590	1536
q20	1999	1796	1792	1792
q21	11688	4746	4847	4746
q22	570	498	501	498
Total cold run time: 79416 ms
Total hot run time: 55351 ms
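The round totals can be reproduced from the rows if the columns are read as cold run, hot run 1, hot run 2 and best hot run: the cold total sums the first timing column, and the hot total sums the per-query minimum of the two hot runs. This reading of the columns is inferred from the numbers rather than stated in the report; the sketch checks it against Round 1.

```java
// Round 1 timings copied from the TPC-H report: {cold run, hot run 1, hot run 2} in ms, q1..q22.
class TpchTotals {
    static final long[][] ROUND_1 = {
        {17609, 4329, 4292}, {2015, 192, 193}, {10477, 1246, 1252}, {10205, 795, 803},
        {7523, 2749, 2810}, {234, 138, 133}, {963, 560, 558}, {9225, 2199, 2140},
        {9393, 6835, 6851}, {9104, 4043, 4015}, {430, 240, 245}, {507, 224, 222},
        {17384, 3134, 3226}, {291, 225, 238}, {514, 475, 465}, {492, 405, 427},
        {979, 705, 704}, {8471, 7752, 7712}, {5558, 1609, 1566}, {643, 324, 314},
        {5379, 4185, 3401}, {345, 270, 275},
    };

    // Total cold run time: sum of the first (cold) column.
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    // Total hot run time: sum of the best (minimum) of the two hot runs per query.
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("Total cold run time: " + totalCold(ROUND_1) + " ms"); // 117741 ms
        System.out.println("Total hot run time: " + totalHot(ROUND_1) + " ms");   // 41613 ms
    }
}
```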

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.50% (8929/25152)
Line Coverage: 27.20% (73637/270760)
Region Coverage: 26.37% (38025/144173)
Branch Coverage: 23.17% (19380/83626)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b3c8c5034c054e4715e178c0994dfa2dfbd3217f_b3c8c5034c054e4715e178c0994dfa2dfbd3217f/report/index.html

@doris-robot

TPC-DS: Total hot run time: 186438 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b3c8c5034c054e4715e178c0994dfa2dfbd3217f, data reload: false

query	cold run	hot run 1	hot run 2	best hot run (ms)
query1	915	374	355	355
query2	6421	2439	2402	2402
query3	6641	219	224	219
query4	26262	21365	21341	21341
query5	4125	413	419	413
query6	292	189	191	189
query7	4588	284	278	278
query8	239	190	184	184
query9	8692	2312	2283	2283
query10	429	248	237	237
query11	14721	14139	14191	14139
query12	139	90	85	85
query13	1641	365	383	365
query14	10456	7467	8343	7467
query15	240	176	170	170
query16	8072	268	273	268
query17	1701	572	558	558
query18	2097	286	290	286
query19	241	154	152	152
query20	94	90	88	88
query21	195	128	129	128
query22	5077	4905	4808	4808
query23	34016	33213	33123	33123
query24	10623	3045	3022	3022
query25	591	403	389	389
query26	703	154	149	149
query27	2197	320	335	320
query28	5913	1996	1995	1995
query29	884	658	588	588
query30	243	151	153	151
query31	978	741	710	710
query32	87	51	53	51
query33	633	247	241	241
query34	890	478	480	478
query35	812	694	679	679
query36	1087	921	871	871
query37	107	65	65	65
query38	3176	3011	3033	3011
query39	1566	1555	1676	1555
query40	215	123	122	122
query41	39	38	37	37
query42	105	96	94	94
query43	559	552	544	544
query44	1059	724	744	724
query45	279	267	261	261
query46	1070	718	757	718
query47	1926	1853	1818	1818
query48	377	296	294	294
query49	838	394	411	394
query50	774	389	380	380
query51	6820	6770	6796	6770
query52	103	90	91	90
query53	352	287	272	272
query54	304	231	228	228
query55	78	71	73	71
query56	238	221	223	221
query57	1222	1152	1115	1115
query58	213	197	202	197
query59	3317	3182	3133	3133
query60	252	234	236	234
query61	92	86	88	86
query62	635	455	444	444
query63	305	277	281	277
query64	8202	7161	7172	7161
query65	3096	3000	3045	3000
query66	783	337	338	337
query67	15530	15333	15086	15086
query68	7100	537	545	537
query69	568	313	308	308
query70	1170	1052	1095	1052
query71	475	274	272	272
query72	8202	2597	2409	2409
query73	723	325	327	325
query74	6524	6158	6073	6073
query75	4058	2666	2666	2666
query76	4350	972	974	972
query77	668	272	268	268
query78	11022	10277	10216	10216
query79	8784	522	532	522
query80	1830	438	424	424
query81	515	227	217	217
query82	1529	98	89	89
query83	252	166	164	164
query84	268	84	92	84
query85	1414	273	256	256
query86	470	293	296	293
query87	3283	3082	3131	3082
query88	5256	2399	2401	2399
query89	573	372	362	362
query90	1995	181	176	176
query91	123	98	96	96
query92	58	47	46	46
query93	7123	501	484	484
query94	1184	186	177	177
query95	384	301	301	301
query96	606	275	265	265
query97	3147	2908	2988	2908
query98	240	220	216	216
query99	1216	889	858	858
Total cold run time: 301677 ms
Total hot run time: 186438 ms

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 29, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit 2a05a52 into apache:master Apr 29, 2024
morningman pushed a commit that referenced this pull request Apr 29, 2024
morningman pushed a commit to morningman/doris that referenced this pull request Apr 30, 2024
dataroaring pushed a commit that referenced this pull request May 1, 2024
…4.0 (#34371)

* [feature](insert)use optional location and add hive regression test (#33153)

* [feature](iceberg)The new DDL syntax is added to create iceberg partitioned tables (#33338)

Support `PARTITION BY`:

```
create table tb1 (c1 string, ts datetime) engine = iceberg partition by (c1, day(ts)) () properties ("a"="b")
```

* [Enhancement](hive-writer) Adjust table sink exchange rebalancer params. (#33397)

Issue Number:  #31442

Change the table sink exchange rebalancer parameters to node level and tune them to improve write performance through better balancing.

Rebalancer parameters:
```cpp
// Minimum data processed to rebalance writers in exchange when partition writing
DEFINE_mInt64(table_sink_partition_write_min_data_processed_rebalance_threshold,
              "26214400"); // 25MB
// Minimum partition data processed to rebalance writers in exchange when partition writing
DEFINE_mInt64(table_sink_partition_write_min_partition_data_processed_rebalance_threshold,
              "15728640"); // 15MB
```
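The two defaults are 25 MiB and 15 MiB expressed in bytes. The sketch below shows one hypothetical way such thresholds could gate rebalancing (both must be reached before writers are rebalanced); `RebalanceGate` and `mayRebalance` are invented names, and the actual exchange logic in Doris may combine the thresholds differently.

```java
// Hypothetical gate built from the two config names; not the Doris rebalancer implementation.
class RebalanceGate {
    static final long MIB = 1024L * 1024L;
    // table_sink_partition_write_min_data_processed_rebalance_threshold
    static final long MIN_DATA_PROCESSED = 25 * MIB;            // 26214400
    // table_sink_partition_write_min_partition_data_processed_rebalance_threshold
    static final long MIN_PARTITION_DATA_PROCESSED = 15 * MIB;  // 15728640

    // Assumed rule: only consider rebalancing once both thresholds have been reached.
    static boolean mayRebalance(long totalBytesProcessed, long partitionBytesProcessed) {
        return totalBytesProcessed >= MIN_DATA_PROCESSED
                && partitionBytesProcessed >= MIN_PARTITION_DATA_PROCESSED;
    }

    public static void main(String[] args) {
        System.out.println(MIN_DATA_PROCESSED);                // 26214400
        System.out.println(MIN_PARTITION_DATA_PROCESSED);      // 15728640
        System.out.println(mayRebalance(30 * MIB, 10 * MIB));  // false: partition below 15MB
        System.out.println(mayRebalance(30 * MIB, 20 * MIB));  // true
    }
}
```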

* [feature](profile) add transaction statistics for profile (#33488)

1. commit total time
2. fs operator total time
   - rename file count
   - rename dir count
   - delete dir count
3. add partition total time
   - add partition count
4. update partition total time
   - update partition count

For example:
```
      -  Transaction  Commit  Time:  906ms
          -  FileSystem  Operator  Time:  833ms
              -  Rename  File  Count:  4
              -  Rename  Dir  Count:  0
              -  Delete  Dir  Count:  0
          -  HMS  Add  Partition  Time:  0ms
              -  HMS  Add  Partition  Count:  0
          -  HMS  Update  Partition  Time:  68ms
              -  HMS  Update  Partition  Count:  4
```
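A sketch of how the file-system part of these statistics might be collected: each operation is timed into one shared total and counted by kind. `TransactionStats` and its methods are hypothetical, not Doris classes; the HMS add/update partition counters would follow the same pattern.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical collector for the transaction counters listed above; not a Doris class.
class TransactionStats {
    long fsOperatorNanos;
    int renameFileCount;
    int renameDirCount;
    int deleteDirCount;

    // Runs one file-system operation and adds its duration to the "FileSystem Operator Time" total.
    private void timed(Runnable op) {
        long start = System.nanoTime();
        try {
            op.run();
        } finally {
            fsOperatorNanos += System.nanoTime() - start;
        }
    }

    void renameFile(Runnable op) { timed(op); renameFileCount++; }
    void renameDir(Runnable op)  { timed(op); renameDirCount++; }
    void deleteDir(Runnable op)  { timed(op); deleteDirCount++; }

    String render() {
        return String.format("- FileSystem Operator Time: %dms%n"
                + "    - Rename File Count: %d%n"
                + "    - Rename Dir Count: %d%n"
                + "    - Delete Dir Count: %d",
                TimeUnit.NANOSECONDS.toMillis(fsOperatorNanos),
                renameFileCount, renameDirCount, deleteDirCount);
    }

    public static void main(String[] args) {
        TransactionStats stats = new TransactionStats();
        for (int i = 0; i < 4; i++) {
            stats.renameFile(() -> { /* e.g. move a staged file into its partition directory */ });
        }
        System.out.println(stats.render());
    }
}
```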

* [feature](iceberg) add iceberg transaction implement (#33629)

Issue #31442

add iceberg transaction

* [feature](insert)support default value when create hive table (#33666)

Issue Number: #31442

Hive 3 supports creating tables with column default values, so when Hive 3 is used,
default values can be written to the table.

* [refactor](filesystem)refactor `filesystem` interface (#33361)

1. Rename `list` to `globList`. The path passed to `list` must contain a wildcard character, and the corresponding HDFS interface is `globStatus`, so the method is renamed to `globList`.
2. If you only need to list files under a path, use the `listFiles` operation.
3. Merge the `listLocatedFiles` function into the `listFiles` function.
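To illustrate the `globList` / `listFiles` split, here is a sketch over an in-memory list of paths, using `java.nio.file.PathMatcher` globbing as a stand-in for HDFS `globStatus`. The method names mirror the commit message, but the signatures are assumptions rather than the Doris `FileSystem` interface, and the sketch assumes Unix-style paths.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: the two listing styles over an in-memory list of Unix-style paths.
class ListingDemo {
    // globList: the pattern itself contains wildcards, like HDFS globStatus.
    static List<String> globList(List<String> allPaths, String globPattern) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + globPattern);
        return allPaths.stream()
                .filter(p -> matcher.matches(Paths.get(p)))
                .collect(Collectors.toList());
    }

    // listFiles: a plain directory path; returns direct children, or all descendants if recursive.
    static List<String> listFiles(List<String> allPaths, String dir, boolean recursive) {
        Path base = Paths.get(dir);
        return allPaths.stream()
                .filter(p -> {
                    Path path = Paths.get(p);
                    return path.startsWith(base) && !path.equals(base)
                            && (recursive || base.equals(path.getParent()));
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> paths = List.of("/wh/t/p=1/a.orc", "/wh/t/p=1/b.txt", "/wh/t/p=2/c.orc", "/wh/t/_SUCCESS");
        System.out.println(globList(paths, "/wh/t/p=*/*.orc")); // [/wh/t/p=1/a.orc, /wh/t/p=2/c.orc]
        System.out.println(listFiles(paths, "/wh/t", false));   // [/wh/t/_SUCCESS]
    }
}
```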

* [opt](meta-cache) refine the meta cache (#33449)

1. Use `caffeine` instead of `guava cache` for better performance.
2. Add a new class, `CacheFactory`.

    All (Async)LoadingCache instances should be built through `CacheFactory`.

3. Use separate executors for different caches.

    1. rowCountRefreshExecutor
      For the row count cache.
      The row count cache is an async loading cache, and its result can be ignored
      if the cache misses or the thread pool is full,
      so this cache gets its own executor.

    2. commonRefreshExecutor
      For the other caches, which are sync loading caches.
      commonRefreshExecutor is still used for their async refresh:
      if a cache entry is missing, the value is loaded synchronously in the caller thread;
      if a cache entry needs a refresh, it is reloaded in commonRefreshExecutor.

    3. fileListingExecutor
      File listing is a heavy operation, so it gets its own executor.
      For fileCache, the refresh is still triggered through commonRefreshExecutor,
      while fileListingExecutor performs the actual file listing.

4. Change the refresh and expire logic of caches.

    Most caches use the `refreshAfterWrite` strategy, so that
    even if a cache entry is stale, the old entry can still be
    used while the new entry is being loaded.

5. Add a new global variable, `enable_get_row_count_from_file_list`.

    Defaults to true; if set to false, getting the row count from the file list is disabled.
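The `refreshAfterWrite` behavior described in items 3 and 4 (load synchronously in the caller thread on a miss, serve the stale value while reloading on a refresh executor) can be sketched with the standard library. `RefreshingCache` is a hypothetical simplification, not Caffeine or Doris code; a manual clock and a queue-backed executor make it deterministic, and it does not de-duplicate concurrent refreshes.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Minimal sketch of refreshAfterWrite semantics; not Caffeine's implementation.
// Simplification: every get() on a stale entry schedules another reload.
class RefreshingCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long writtenAt;

        Entry(T value, long writtenAt) {
            this.value = value;
            this.writtenAt = writtenAt;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    private final long refreshAfter;
    private final Executor refreshExecutor;
    private final LongSupplier clock;

    RefreshingCache(Function<K, V> loader, long refreshAfter, Executor refreshExecutor, LongSupplier clock) {
        this.loader = loader;
        this.refreshAfter = refreshAfter;
        this.refreshExecutor = refreshExecutor;
        this.clock = clock;
    }

    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            // Miss: load synchronously in the caller thread.
            V v = loader.apply(key);
            entries.put(key, new Entry<>(v, clock.getAsLong()));
            return v;
        }
        if (clock.getAsLong() - e.writtenAt >= refreshAfter) {
            // Stale: reload on the refresh executor and keep serving the old value meanwhile.
            refreshExecutor.execute(() ->
                    entries.put(key, new Entry<>(loader.apply(key), clock.getAsLong())));
        }
        return e.value;
    }

    public static void main(String[] args) {
        long[] now = {0};
        int[] loads = {0};
        ArrayDeque<Runnable> pending = new ArrayDeque<>();  // stands in for commonRefreshExecutor
        RefreshingCache<String, Integer> cache =
                new RefreshingCache<>(k -> ++loads[0], 10, pending::add, () -> now[0]);
        System.out.println(cache.get("tbl"));  // 1 (loaded in the caller thread)
        now[0] = 20;
        System.out.println(cache.get("tbl"));  // 1 (stale value served, reload queued)
        pending.poll().run();
        System.out.println(cache.get("tbl"));  // 2
    }
}
```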

* [bugfix](hive)delete write path after hive insert (#33798)

Issue #31442

1. Delete files according to the query id.
2. Delete the write path after the insert.

* [Enhancement](multi-catalog) Rewrite `S3URI` to remove tricky virtual bucket mechanism and support different uri styles by flags. (#33858)

Many domestic cloud vendors are compatible with the S3 protocol. However, early versions of the S3 client only generate path-style HTTP requests (aws/aws-sdk-java-v2#763) for endpoints that do not start with `s3`, while some cloud vendors only support virtual-host-style HTTP requests.

Therefore, Doris used `forceVirtualHosted` in `S3URI` to turn such a location into a virtual-hosted request, implemented on top of path style.
For example, the S3 URI `s3://my-bucket/data/file.txt` was eventually parsed into:
- virtualBucket: my-bucket
- Bucket: data (a bucket must be set, otherwise the S3 client reports an error; this step is particularly tricky because of the limitations of the S3 client)
- Key: file.txt

Path-style mode then generated an HTTP request resembling a virtual-host-style one, by setting the endpoint to virtualBucket + the original endpoint and setting the bucket and key as above.
**However, the bucket and key here do not match their original meaning in S3; the AWS client merely happens to produce a virtual-host-like HTTP request in path-style mode.**

However, #30799 upgraded the AWS SDK from 2.17.257 to 2.20.131, and the current AWS S3 client already generates virtual-host-style HTTP requests for third-party endpoints by default. So #31111 had to set the path-style option to keep Doris's virtual bucket mechanism working.

**Finally, the virtual bucket mechanism is too confusing and tricky, and it is no longer needed with the new version of the S3 client.**

### Resolution:

Rewrite `S3URI` to remove tricky virtual bucket mechanism and support different uri styles by flags.

This class represents a fully qualified location in S3 for input/output operations, expressed as a URI.

#### For AWS S3, URI common styles:
- AWS Client Style (Hadoop S3 Style): `s3://my-bucket/path/to/file?versionId=abc123&partNumber=77&partNumber=88`
- Virtual Host Style: `https://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
- Path Style: `https://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`

For the common styles above, `isPathStyle` controls whether path style or virtual host style is used.
Virtual host style is the current mainstream and recommended approach, so `isPathStyle` defaults to false.

#### Other Styles:
- Virtual Host AWS Client (Hadoop S3) Mixed Style:
  `s3://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
- Path AWS Client (Hadoop S3) Mixed Style:
  `s3://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`

These two styles are selected with `isPathStyle` and `forceParsingByStandardUri`:
- Virtual Host AWS Client (Hadoop S3) Mixed Style: `isPathStyle = false && forceParsingByStandardUri = true`
- Path AWS Client (Hadoop S3) Mixed Style: `isPathStyle = true && forceParsingByStandardUri = true`

When the incoming location is URL-encoded, the encoded strings are returned: `getKey()` and `getQueryParams()` return the encoded values.
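A simplified sketch of how the two flags select the bucket/key split for the styles above. `S3Location.parse` is hypothetical, not Doris's `S3URI`: it ignores query parameters, assumes bucket names in virtual-host form contain no dots, and keeps the raw (still URL-encoded) path as the key.

```java
import java.net.URI;

// Hypothetical, simplified parser for the URI styles listed above; not Doris's S3URI.
final class S3Location {
    final String bucket;
    final String key;

    private S3Location(String bucket, String key) {
        this.bucket = bucket;
        this.key = key;
    }

    static S3Location parse(String location, boolean isPathStyle, boolean forceParsingByStandardUri) {
        URI uri = URI.create(location);  // throws IllegalArgumentException on malformed input
        String scheme = uri.getScheme();
        String authority = uri.getRawAuthority();
        String rawPath = uri.getRawPath();
        // Keep the raw (still URL-encoded) path, minus the leading '/'.
        String path = rawPath.startsWith("/") ? rawPath.substring(1) : rawPath;
        boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);

        if (!http && !forceParsingByStandardUri) {
            // AWS client (Hadoop S3) style: s3://bucket/key
            return new S3Location(authority, path);
        }
        if (isPathStyle) {
            // Path style (https or s3 mixed): <endpoint>/bucket/key
            int slash = path.indexOf('/');
            return slash < 0
                    ? new S3Location(path, "")
                    : new S3Location(path.substring(0, slash), path.substring(slash + 1));
        }
        // Virtual host style (https or s3 mixed): bucket.<endpoint>/key
        int dot = authority.indexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("no bucket in host: " + location);
        }
        return new S3Location(authority.substring(0, dot), path);
    }
}
```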

* [improvement](hive)add the `queryid` to the temporary file path (#34278)

Change the temporary directory name from `_temp_<table_name>` to `_temp_<queryid>_<table_name>`.
This prevents a clash with a user table that is actually named `_temp_<table_name>`.

It also separates the temporary directories of different queries.

* [feature](Cloud) Load index data into index cache when writing data (#34046)

* [Feature](hive-writer) Implements s3 file committer. (#33937)

---------

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
AshinGau added a commit that referenced this pull request Jun 21, 2024
…ls (#36432)

## Proposed changes

## Fixed Bugs introduced from #33937
1. `FileSystemCacheKey.equals()` compares properties with `==`, so a new
file system is created for each partition.
2. `dfsFileSystem` is not synchronized, so more file systems are created
than needed.
3. `jobConf.iterator()` will produce more than 2000 pairs of key-value
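A sketch of the first two fixes, assuming a cache keyed by a file-system identifier plus a properties map (the class and field names are hypothetical). `equals` has to compare the maps by value, and `ConcurrentHashMap.computeIfAbsent` is used here as one way to create at most one file system per key under concurrency; the actual fix may use `synchronized` instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical key: a file-system identifier plus the properties used to build the file system.
final class FileSystemCacheKey {
    private final String fsIdent;
    private final Map<String, String> properties;

    FileSystemCacheKey(String fsIdent, Map<String, String> properties) {
        this.fsIdent = fsIdent;
        this.properties = new HashMap<>(properties);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FileSystemCacheKey)) {
            return false;
        }
        FileSystemCacheKey that = (FileSystemCacheKey) o;
        // Compare the maps by value. `properties == that.properties` only matches the same
        // instance, so every partition's freshly built map would miss the cache.
        return fsIdent.equals(that.fsIdent) && properties.equals(that.properties);
    }

    @Override
    public int hashCode() {
        return Objects.hash(fsIdent, properties);
    }
}

// Hypothetical cache: computeIfAbsent runs the factory at most once per key,
// even when several threads request the same key concurrently.
class FileSystemCache {
    final Map<FileSystemCacheKey, Object> cache = new ConcurrentHashMap<>();
    final AtomicInteger created = new AtomicInteger();

    Object get(FileSystemCacheKey key) {
        return cache.computeIfAbsent(key, k -> {
            created.incrementAndGet();
            return new Object();  // stands in for an expensive file-system instance
        });
    }

    public static void main(String[] args) {
        FileSystemCache fsCache = new FileSystemCache();
        Map<String, String> props = new HashMap<>();
        props.put("fs.defaultFS", "hdfs://nameservice1");
        for (int partition = 0; partition < 3; partition++) {
            // Each partition builds its own, equal-by-value properties map.
            fsCache.get(new FileSystemCacheKey("hdfs://nameservice1", new HashMap<>(props)));
        }
        System.out.println(fsCache.created.get());  // 1
    }
}
```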
iszhangpch pushed a commit to iszhangpch/doris-p that referenced this pull request Jun 21, 2024
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.3-merged meta-change reviewed

4 participants