@wuwenchi
Contributor

Proposed changes

Issue #31442

add iceberg transaction

@wuwenchi
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 38616 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cd1c657f17f91c9cde15531e3d3798d6144cef97, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17604	4383	4250	4250
q2	2008	179	178	178
q3	10473	1223	1166	1166
q4	10199	881	896	881
q5	7555	2711	2612	2612
q6	220	132	131	131
q7	1006	609	588	588
q8	9205	2073	2047	2047
q9	8042	6634	6577	6577
q10	8592	3514	3520	3514
q11	465	235	236	235
q12	433	223	214	214
q13	17774	2917	2931	2917
q14	264	225	239	225
q15	525	469	474	469
q16	529	375	375	375
q17	982	754	733	733
q18	7447	6857	6693	6693
q19	6500	1557	1506	1506
q20	680	321	294	294
q21	3586	2711	2843	2711
q22	364	302	300	300
Total cold run time: 114453 ms
Total hot run time: 38616 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4351	4222	4224	4222
q2	366	270	266	266
q3	2983	2751	2775	2751
q4	1861	1569	1576	1569
q5	5333	5339	5276	5276
q6	207	126	124	124
q7	2257	1847	1899	1847
q8	3224	3344	3316	3316
q9	8558	8566	8673	8566
q10	4078	3965	3997	3965
q11	620	516	492	492
q12	808	635	640	635
q13	17485	3184	3048	3048
q14	351	280	278	278
q15	517	472	497	472
q16	486	450	460	450
q17	1806	1509	1496	1496
q18	8230	7981	7835	7835
q19	1696	1607	1625	1607
q20	2052	1856	1824	1824
q21	5184	4888	5059	4888
q22	534	483	460	460
Total cold run time: 72987 ms
Total hot run time: 55387 ms

@doris-robot

TPC-DS: Total hot run time: 183511 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cd1c657f17f91c9cde15531e3d3798d6144cef97, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	878	1141	344	344
query2	7646	2675	2607	2607
query3	6654	205	206	205
query4	37059	21608	21182	21182
query5	4132	400	393	393
query6	225	189	172	172
query7	4037	283	289	283
query8	220	172	169	169
query9	5763	2295	2291	2291
query10	372	239	232	232
query11	14502	14142	14197	14142
query12	141	91	85	85
query13	1015	374	348	348
query14	9667	6920	6727	6727
query15	200	173	175	173
query16	7073	249	257	249
query17	1705	580	540	540
query18	1596	275	264	264
query19	196	151	153	151
query20	91	85	89	85
query21	203	127	124	124
query22	4987	4885	4827	4827
query23	33647	33112	33196	33112
query24	11359	3029	3051	3029
query25	538	400	381	381
query26	892	158	151	151
query27	3069	371	371	371
query28	6566	2171	2108	2108
query29	893	643	639	639
query30	253	176	172	172
query31	979	741	758	741
query32	64	54	54	54
query33	546	239	245	239
query34	1009	485	515	485
query35	888	737	725	725
query36	1084	953	933	933
query37	107	71	70	70
query38	3687	3635	3611	3611
query39	1623	1565	1582	1565
query40	165	130	127	127
query41	46	45	50	45
query42	100	96	108	96
query43	597	572	552	552
query44	1329	731	746	731
query45	292	281	280	280
query46	1104	729	738	729
query47	2088	1944	1983	1944
query48	403	303	303	303
query49	899	382	393	382
query50	763	395	403	395
query51	6913	6916	6870	6870
query52	107	97	85	85
query53	342	284	280	280
query54	248	223	221	221
query55	73	72	71	71
query56	234	216	219	216
query57	1191	1114	1132	1114
query58	223	228	209	209
query59	3353	3444	3436	3436
query60	258	247	262	247
query61	93	92	90	90
query62	582	457	436	436
query63	299	283	276	276
query64	4319	4085	3633	3633
query65	3100	3021	3035	3021
query66	737	313	315	313
query67	15326	15001	15015	15001
query68	5132	543	532	532
query69	519	299	294	294
query70	1274	1194	1204	1194
query71	456	275	269	269
query72	6401	2606	2472	2472
query73	728	314	319	314
query74	6869	6379	6371	6371
query75	3128	2348	2312	2312
query76	3383	1095	1075	1075
query77	622	242	252	242
query78	10779	10057	10197	10057
query79	3494	518	513	513
query80	2221	419	425	419
query81	524	224	229	224
query82	1531	102	100	100
query83	352	184	179	179
query84	258	93	83	83
query85	1421	275	261	261
query86	458	317	309	309
query87	3789	3540	3591	3540
query88	5380	2268	2281	2268
query89	480	368	359	359
query90	1814	173	232	173
query91	121	102	95	95
query92	57	47	50	47
query93	4986	500	499	499
query94	1120	176	177	176
query95	370	283	284	283
query96	593	265	262	262
query97	2665	2474	2450	2450
query98	231	214	209	209
query99	1249	853	848	848
Total cold run time: 293155 ms
Total hot run time: 183511 ms

@doris-robot

ClickBench: Total hot run time: 30.32 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cd1c657f17f91c9cde15531e3d3798d6144cef97, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.08	0.03	0.03
query3	0.23	0.05	0.05
query4	1.68	0.07	0.08
query5	0.49	0.50	0.49
query6	1.53	0.65	0.66
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.55	0.49	0.49
query10	0.54	0.56	0.55
query11	0.16	0.11	0.11
query12	0.14	0.11	0.12
query13	0.61	0.59	0.59
query14	0.74	0.78	0.79
query15	0.82	0.82	0.81
query16	0.37	0.37	0.36
query17	0.96	0.96	0.99
query18	0.22	0.23	0.25
query19	1.87	1.66	1.78
query20	0.02	0.01	0.01
query21	15.42	0.66	0.65
query22	4.17	7.71	1.87
query23	18.28	1.32	1.30
query24	1.77	0.31	0.21
query25	0.15	0.08	0.08
query26	0.27	0.16	0.16
query27	0.08	0.08	0.08
query28	13.28	0.99	0.99
query29	12.58	3.38	3.35
query30	0.26	0.07	0.06
query31	2.84	0.36	0.38
query32	3.30	0.46	0.47
query33	2.81	2.87	2.87
query34	17.11	4.42	4.38
query35	4.49	4.44	4.49
query36	0.65	0.46	0.46
query37	0.18	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.03	0.04
query40	0.17	0.15	0.14
query41	0.09	0.04	0.05
query42	0.05	0.05	0.04
query43	0.04	0.04	0.03
Total cold run time: 109.3 s
Total hot run time: 30.32 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit cd1c657f17f91c9cde15531e3d3798d6144cef97 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.5 seconds inserted 10000000 Rows, about 740K ops/s

@wuwenchi
Contributor Author

run p0

@wuwenchi
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@wuwenchi
Contributor Author

run p0

Contributor

@morningman morningman left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the `approved` label (indicates a PR has been approved by one committer) on Apr 20, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@kaka11chen kaka11chen left a comment

LGTM

@morningman morningman merged commit 68b33c8 into apache:master Apr 20, 2024
morningman pushed a commit to morningman/doris that referenced this pull request Apr 30, 2024
dataroaring pushed a commit that referenced this pull request May 1, 2024
…4.0 (#34371)

* [feature](insert)use optional location and add hive regression test (#33153)

* [feature](iceberg)The new DDL syntax is added to create iceberg partitioned tables (#33338)

Support `partition by`:

```
create table tb1 (c1 string, ts datetime) engine = iceberg partition by (c1, day(ts)) () properties ("a"="b")
```

* [Enhancement](hive-writer) Adjust table sink exchange rebalancer params. (#33397)

Issue Number:  #31442

Change the table sink exchange rebalancer params to node level and tune them so that writes are better balanced, which improves write performance.

Rebalancer params:
```
DEFINE_mInt64(table_sink_partition_write_min_data_processed_rebalance_threshold,
              "26214400"); // 25MB
// Minimum partition data processed to rebalance writers in exchange when partition writing
DEFINE_mInt64(table_sink_partition_write_min_partition_data_processed_rebalance_threshold,
              "15728640"); // 15MB
```
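
As a quick check of the size comments, the two defaults can be converted from bytes to MiB. This is only an arithmetic sketch; the Python variable names are illustrative and are not Doris identifiers.

```python
MIB = 1024 * 1024  # bytes per mebibyte

# Values copied from the two DEFINE_mInt64 defaults above.
min_data_processed_threshold = 26214400
min_partition_data_processed_threshold = 15728640


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


print(to_mib(min_data_processed_threshold))            # 25.0
print(to_mib(min_partition_data_processed_threshold))  # 15.0
```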

* [feature](profile) add transaction statistics for profile (#33488)

1. commit total time
2. fs operator total time
   - rename file count
   - rename dir count
   - delete dir count
3. add partition total time
   - add partition count
4. update partition total time
   - update partition count

For example:
```
      -  Transaction  Commit  Time:  906ms
          -  FileSystem  Operator  Time:  833ms
              -  Rename  File  Count:  4
              -  Rename  Dir  Count:  0
              -  Delete  Dir  Count:  0
          -  HMS  Add  Partition  Time:  0ms
              -  HMS  Add  Partition  Count:  0
          -  HMS  Update  Partition  Time:  68ms
              -  HMS  Update  Partition  Count:  4
```
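
A profile fragment like the one above can be turned into a flat mapping for comparison across runs. The sketch below assumes the `-  Name:  value` layout shown in the example; `parse_profile` is a hypothetical helper, not part of Doris.

```python
PROFILE = """\
      -  Transaction  Commit  Time:  906ms
          -  FileSystem  Operator  Time:  833ms
              -  Rename  File  Count:  4
              -  Rename  Dir  Count:  0
              -  Delete  Dir  Count:  0
          -  HMS  Add  Partition  Time:  0ms
              -  HMS  Add  Partition  Count:  0
          -  HMS  Update  Partition  Time:  68ms
              -  HMS  Update  Partition  Count:  4
"""


def parse_profile(text: str) -> dict:
    """Map each counter name (whitespace-normalized) to its raw value string."""
    counters = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("-"):
            continue  # skip anything that is not a counter line
        name, _, value = line[1:].partition(":")
        counters[" ".join(name.split())] = value.strip()
    return counters


counters = parse_profile(PROFILE)
print(counters["Transaction Commit Time"])  # 906ms
print(counters["Rename File Count"])        # 4
```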

* [feature](iceberg) add iceberg transaction implement (#33629)

Issue #31442

add iceberg transaction

* [feature](insert)support default value when create hive table (#33666)

Issue Number: #31442

Hive 3 supports creating tables with column default values.
When Hive 3 is used, we can write default values to the table.

* [refactor](filesystem)refactor `filesystem` interface (#33361)

1. Rename `list` to `globList`. The path passed to this `list` must contain a wildcard character, and the corresponding HDFS interface is `globStatus`, so the new name is `globList`.
2. If you only need to view files based on paths, you can use the `listFiles` operation.
3. Merge `listLocatedFiles` function into `listFiles` function.
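
The difference between the two operations can be sketched over an in-memory file list. The functions `glob_list` and `list_files` below are hypothetical Python analogues of `globList` and `listFiles`, not the Doris `FileSystem` interface. Note that `fnmatch` lets `*` match across `/`, which a real filesystem glob may not do.

```python
import fnmatch
import posixpath

# Hypothetical file listing used only for this illustration.
FILES = [
    "/warehouse/t1/part-0.parquet",
    "/warehouse/t1/part-1.parquet",
    "/warehouse/t1/_SUCCESS",
    "/warehouse/t2/part-0.orc",
]


def list_files(directory):
    """Return the files directly under `directory`; no wildcard handling."""
    directory = directory.rstrip("/")
    return sorted(p for p in FILES if posixpath.dirname(p) == directory)


def glob_list(pattern):
    """Return the files whose full path matches a wildcard pattern."""
    return sorted(p for p in FILES if fnmatch.fnmatchcase(p, pattern))


print(list_files("/warehouse/t1"))
print(glob_list("/warehouse/t1/*.parquet"))
```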

* [opt](meta-cache) refine the meta cache (#33449)

1. Use `caffeine` instead of `guava cache` to get better performance
2. Add a new class `CacheFactory`

    All (Async)LoadingCache should be built from `CacheFactory`

3. Use separate executors for different caches

    1. rowCountRefreshExecutor
      For the row count cache.
      The row count cache is an async loading cache, and we can ignore the result
      if the cache entry is missing or the thread pool is full.
      So use a separate executor for this cache.

    2. commonRefreshExecutor
      For other caches. Other caches are sync loading caches.
      But commonRefreshExecutor will be used for async refresh.
      That is, if a cache entry is missing, the cache value will be loaded in the caller thread, synchronously.
      If a cache entry needs a refresh, it will be reloaded in commonRefreshExecutor.

    3. fileListingExecutor
      File listing is a heavy operation, so use a separate executor for it.
      For fileCache, the refresh operation will still use commonRefreshExecutor to trigger the refresh,
      and fileListingExecutor will be used to list files.

4. Change the refresh and expire logic of caches

    For most caches, set the `refreshAfterWrite` strategy, so that
    even if the cache entry is expired, the old entry can still be
    used while the new entry is being loaded.

5. Add new global variable `enable_get_row_count_from_file_list`

    Defaults to true. If false, getting the row count from the file list is disabled.
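
The sync-load and async-refresh behaviour described in items 3 and 4 can be sketched as follows. This is a minimal Python model, not Caffeine's API: the class `RefreshAfterWriteCache`, the manual clock, and the `submit` callback (standing in for an executor such as commonRefreshExecutor) are all assumptions made for illustration.

```python
class RefreshAfterWriteCache:
    """Serve the old value while a stale entry is reloaded in the background."""

    def __init__(self, loader, refresh_after, clock, submit):
        self._loader = loader
        self._refresh_after = refresh_after
        self._clock = clock          # callable returning the current time
        self._submit = submit        # schedules a task; stands in for an executor
        self._entries = {}           # key -> (value, write_time)
        self._refreshing = set()     # keys with a reload already scheduled

    def get(self, key):
        now = self._clock()
        if key not in self._entries:
            # Missing entry: load synchronously in the caller thread.
            value = self._loader(key)
            self._entries[key] = (value, now)
            return value
        value, written = self._entries[key]
        if now - written >= self._refresh_after and key not in self._refreshing:
            # Stale entry: schedule a reload but keep serving the old value.
            self._refreshing.add(key)
            self._submit(lambda: self._reload(key))
        return value

    def _reload(self, key):
        self._entries[key] = (self._loader(key), self._clock())
        self._refreshing.discard(key)


now = [0]        # manual clock so the example is deterministic
pending = []     # tasks "submitted" to the stand-in executor
loads = {"n": 0}


def loader(key):
    loads["n"] += 1
    return f"{key}-v{loads['n']}"


cache = RefreshAfterWriteCache(loader, refresh_after=10,
                               clock=lambda: now[0], submit=pending.append)
first = cache.get("tbl")   # loaded synchronously
now[0] = 15
stale = cache.get("tbl")   # old value returned, reload queued
pending.pop()()            # the executor runs the reload
fresh = cache.get("tbl")   # new value is now served
```

Injecting the clock and the executor keeps the example single-threaded; a real cache would run the reload on a thread pool.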

* [bugfix](hive)delete write path after hive insert (#33798)

Issue #31442

1. Delete files according to the query ID
2. Delete the write path after insert

* [Enhancement](multi-catalog) Rewrite `S3URI` to remove tricky virtual bucket mechanism and support different uri styles by flags. (#33858)

Many domestic cloud vendors are compatible with the S3 protocol. However, early versions of the S3 client only generate path-style HTTP requests (aws/aws-sdk-java-v2#763) when they encounter endpoints that do not start with s3, while some cloud vendors only support virtual-host-style HTTP requests.

Therefore, Doris used `forceVirtualHosted` in `S3URI` to convert the location into a virtual-hosted path and implemented it through path style.
For example, the S3 URI `s3://my-bucket/data/file.txt` will eventually be parsed into:
- virtualBucket: my-bucket
- Bucket: data (the bucket must be set, otherwise the S3 client will report an error). This step is particularly tricky because of the limitations of the S3 client.
- Key: file.txt

Path-style mode is then used to generate an HTTP request that resembles a virtual-host-style request, by setting the endpoint to virtualBucket + original endpoint and setting the bucket and key as above.
**The bucket and key here are inconsistent with the original concepts of S3; the AWS client just happens to be able to generate an HTTP request similar to the virtual host style through the path style mode.**
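
The legacy trick can be sketched as below. This is an illustration of the description above, not the removed Doris code: `legacy_parse` and `path_style_url` are hypothetical helpers, the endpoint `oss.example.com` is made up, and joining virtualBucket and endpoint with a dot is an assumption.

```python
from urllib.parse import urlparse


def legacy_parse(uri, endpoint):
    """Split an s3:// URI the way the old virtual bucket mechanism is described."""
    parsed = urlparse(uri)
    virtual_bucket = parsed.netloc
    # The first path segment is (mis)used as the bucket, the rest as the key.
    bucket, _, key = parsed.path.lstrip("/").partition("/")
    return {
        "virtual_bucket": virtual_bucket,
        "bucket": bucket,
        "key": key,
        # Assumption: virtualBucket + original endpoint are joined with a dot.
        "endpoint": f"{virtual_bucket}.{endpoint}",
    }


def path_style_url(parts):
    """Build the path-style request URL: endpoint/bucket/key."""
    return f"https://{parts['endpoint']}/{parts['bucket']}/{parts['key']}"


parts = legacy_parse("s3://my-bucket/data/file.txt", "oss.example.com")
url = path_style_url(parts)
print(url)  # https://my-bucket.oss.example.com/data/file.txt
```

The resulting URL looks like a virtual-host-style request for bucket `my-bucket` and key `data/file.txt`, even though the client was told the bucket is `data`.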

However, #30799 upgraded the AWS SDK version from 2.17.257 to 2.20.131. The current AWS S3 client can already generate virtual-host-style HTTP requests for third-party endpoints by default, so #31111 had to set the path style option to let the S3 client keep working with Doris' virtual bucket mechanism.

**Finally, the virtual bucket mechanism is too confusing and tricky, and we no longer need it with the new version of s3 client.**

### Resolution:

Rewrite `S3URI` to remove tricky virtual bucket mechanism and support different uri styles by flags.

This class represents a fully qualified location in S3 for input/output operations, expressed as a URI.
 #### Common URI styles for AWS S3:
  - AWS Client Style(Hadoop S3 Style): `s3://my-bucket/path/to/file?versionId=abc123&partNumber=77&partNumber=88`
  - Virtual Host Style: `https://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
  - Path Style: `https://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
 
  For the common styles above, we can use `isPathStyle` to control whether to use path style
  or virtual host style.
  "Virtual host style" is the currently mainstream and recommended approach, so the default value of
  `isPathStyle` is false.
 
  #### Other Styles:
  - Virtual Host AWS Client (Hadoop S3) Mixed Style:
    `s3://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
  - Path AWS Client (Hadoop S3) Mixed Style:
     `s3://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt?versionId=abc123&partNumber=77&partNumber=88`
 
  For these two styles, we can use `isPathStyle` and `forceParsingByStandardUri`
  to control which one is used:
  - Virtual Host AWS Client (Hadoop S3) Mixed Style: `isPathStyle = false && forceParsingByStandardUri = true`
  - Path AWS Client (Hadoop S3) Mixed Style: `isPathStyle = true && forceParsingByStandardUri = true`
 
  When the incoming location is URL-encoded, the encoded string is returned:
  `getKey()` and `getQueryParams()` return the encoded strings.
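
How the two flags select a parsing style can be sketched as follows. This is a simplified Python model of the rules listed above, not the Doris `S3URI` class: `parse_s3_uri` and `S3Location` are hypothetical names, and treating every non-HTTP scheme as AWS client style unless `force_parsing_by_standard_uri` is set is an assumption.

```python
from typing import Dict, List, NamedTuple
from urllib.parse import urlsplit


class S3Location(NamedTuple):
    bucket: str
    key: str
    query_params: Dict[str, List[str]]


def _query_params(query):
    """Split the query string without decoding, keeping repeated names."""
    params = {}
    for pair in filter(None, query.split("&")):
        name, _, value = pair.partition("=")
        params.setdefault(name, []).append(value)
    return params


def parse_s3_uri(location, is_path_style=False, force_parsing_by_standard_uri=False):
    parts = urlsplit(location)
    path = parts.path.lstrip("/")
    if parts.scheme not in ("http", "https") and not force_parsing_by_standard_uri:
        # AWS client (Hadoop S3) style: s3://bucket/key
        bucket, key = parts.netloc, path
    elif is_path_style:
        # Path style: host is the endpoint, first path segment is the bucket.
        bucket, _, key = path.partition("/")
    else:
        # Virtual host style: the bucket is the first label of the host.
        bucket, key = parts.netloc.split(".", 1)[0], path
    return S3Location(bucket, key, _query_params(parts.query))


query = "?versionId=abc123&partNumber=77&partNumber=88"
aws_style = parse_s3_uri("s3://my-bucket/path/to/file" + query)
virtual_host = parse_s3_uri(
    "https://my-bucket.s3.us-west-1.amazonaws.com/resources/doc.txt" + query)
path_style = parse_s3_uri(
    "https://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt" + query,
    is_path_style=True)
mixed_path = parse_s3_uri(
    "s3://s3.us-west-1.amazonaws.com/my-bucket/resources/doc.txt" + query,
    is_path_style=True, force_parsing_by_standard_uri=True)
```

All four calls yield the bucket `my-bucket`; only the way it is located in the URI differs.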

* [improvement](hive)add the `queryid` to the temporary file path (#34278)

Change the temporary path from `_temp_<table_name>` to `_temp_<queryid>_<table_name>`.
This prevents a conflict when a user already has a table named `_temp_<table_name>`.

It also partitions the temp dir by query ID.
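
The naming scheme and a per-query cleanup can be sketched like this. Both helpers and the query IDs `q1` and `q2` are hypothetical; real Doris query IDs have a different format.

```python
def temp_dir_name(query_id: str, table_name: str) -> str:
    """Build the per-query temporary directory name `_temp_<queryid>_<table_name>`."""
    return f"_temp_{query_id}_{table_name}"


def temp_dirs_of_query(dir_names, query_id):
    """Select the temporary directories that were created for one query."""
    prefix = f"_temp_{query_id}_"
    return [name for name in dir_names if name.startswith(prefix)]


dirs = [
    temp_dir_name("q1", "orders"),
    temp_dir_name("q2", "orders"),
    "_temp_orders",  # a user table that merely looks like the old temp name
]
to_delete = temp_dirs_of_query(dirs, "q1")
print(to_delete)  # ['_temp_q1_orders']
```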

* [feature](Cloud) Load index data into index cache when writing data (#34046)

* [Feature](hive-writer) Implements s3 file committer. (#33937)

Issue Number: #31442

[Feature] (hive-writer) Implements s3 file committer. 

The S3 committer starts the multipart upload of every file on the BE side and then completes those multipart uploads on the FE side. Until the multipart upload of a file is completed, the file is not visible, so the atomicity of a single file is guaranteed. The atomicity of multiple files still cannot be guaranteed. Because Hive committers have best-effort semantics, this shortens the inconsistent time window.

## ChangeList:
- Add `used_by_s3_committer` to `FileWriterOptions` on the BE side to start the multipart upload of files; the multipart uploads are then completed on the FE side.
- `cosn://` uses the S3 client on the FE side, because it needs to complete the multipart uploads on the FE side.
-  Add `Status directoryExists(String dir)` and `Status deleteDirectory` in `FileSystem`.
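
The visibility rule behind the committer can be sketched with an in-memory store. `FakeObjectStore`, `be_write`, and `fe_commit` are hypothetical stand-ins; the method names only mirror the S3 operations CreateMultipartUpload, UploadPart, and CompleteMultipartUpload and do not call any real SDK.

```python
class FakeObjectStore:
    """In-memory stand-in for an S3-compatible store (illustration only)."""

    def __init__(self):
        self._pending = {}   # upload_id -> (key, list of uploaded parts)
        self._objects = {}   # key -> bytes, i.e. what readers can see
        self._next_id = 0

    def create_multipart_upload(self, key):
        self._next_id += 1
        upload_id = f"upload-{self._next_id}"
        self._pending[upload_id] = (key, [])
        return upload_id

    def upload_part(self, upload_id, data):
        self._pending[upload_id][1].append(data)

    def complete_multipart_upload(self, upload_id):
        key, parts = self._pending.pop(upload_id)
        self._objects[key] = b"".join(parts)  # the object appears in one step

    def visible_keys(self):
        return sorted(self._objects)


def be_write(store, files):
    """BE side: start an upload per file and send the parts, but do not complete."""
    pending = []
    for key, data in files.items():
        upload_id = store.create_multipart_upload(key)
        store.upload_part(upload_id, data)
        pending.append(upload_id)
    return pending


def fe_commit(store, pending):
    """FE side: complete the uploads one by one."""
    for upload_id in pending:
        store.complete_multipart_upload(upload_id)


store = FakeObjectStore()
pending = be_write(store, {"t/part-0": b"aaa", "t/part-1": b"bbb"})
visible_before = store.visible_keys()        # nothing is visible yet
store.complete_multipart_upload(pending[0])
visible_midway = store.visible_keys()        # one file only: not atomic across files
fe_commit(store, pending[1:])
visible_after = store.visible_keys()         # both files are visible
```

The midway state is the inconsistent window mentioned above: each file appears atomically, but the set of files does not.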

---------

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
wuwenchi added a commit to wuwenchi/doris that referenced this pull request May 13, 2024
Issue apache#31442

add iceberg transaction

(cherry picked from commit 68b33c8)
wuwenchi added a commit to wuwenchi/doris that referenced this pull request May 21, 2024
Issue apache#31442

add iceberg transaction

(cherry picked from commit 68b33c8)
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.4-merged meta-change reviewed
