@eldenmoon
Member

…(#101)

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

@eldenmoon
Member Author

run buildall

@github-actions
Contributor

github-actions bot commented Feb 6, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 37405 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9a00591de62dd5da14ae8f4e1da090135ccf8cfb, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17650	4672	4572	4572
q2	2342	146	139	139
q3	11246	945	942	942
q4	4865	773	722	722
q5	7989	2962	2968	2962
q6	194	124	123	123
q7	1179	790	773	773
q8	9368	2063	2058	2058
q9	7654	6419	6357	6357
q10	8125	2436	2451	2436
q11	415	216	203	203
q12	753	276	285	276
q13	18030	3280	3269	3269
q14	277	269	256	256
q15	539	503	495	495
q16	488	408	420	408
q17	954	593	502	502
q18	6919	5881	6004	5881
q19	1556	1372	1350	1350
q20	643	341	353	341
q21	7188	3035	3152	3035
q22	819	322	305	305
Total cold run time: 109193 ms
Total hot run time: 37405 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4487	4399	4401	4399
q2	331	246	224	224
q3	3041	2910	2891	2891
q4	1833	1611	1615	1611
q5	5185	5228	5289	5228
q6	193	113	113	113
q7	2220	1820	1745	1745
q8	3123	3251	3264	3251
q9	8392	8343	8281	8281
q10	5855	3612	3634	3612
q11	541	448	462	448
q12	747	578	585	578
q13	9955	3088	3070	3070
q14	266	242	256	242
q15	530	495	495	495
q16	533	478	468	468
q17	1859	1669	1677	1669
q18	8129	7878	7583	7583
q19	10001	1593	1571	1571
q20	2138	1911	1897	1897
q21	4705	4623	4686	4623
q22	573	484	444	444
Total cold run time: 74637 ms
Total hot run time: 54443 ms

@doris-robot

TPC-DS: Total hot run time: 174824 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9a00591de62dd5da14ae8f4e1da090135ccf8cfb, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	926	341	335	335
query2	6538	2150	1912	1912
query3	6711	215	202	202
query4	31678	21969	21934	21934
query5	4319	356	349	349
query6	255	164	163	163
query7	4596	292	277	277
query8	253	174	180	174
query9	8971	2242	2232	2232
query10	418	221	231	221
query11	18861	15300	15349	15300
query12	132	75	79	75
query13	1620	418	415	415
query14	9393	6913	6861	6861
query15	239	182	183	182
query16	8075	261	256	256
query17	1867	526	501	501
query18	2102	267	262	262
query19	196	144	139	139
query20	83	78	77	77
query21	191	127	119	119
query22	5222	5018	4867	4867
query23	30813	30023	30060	30023
query24	10655	2778	2794	2778
query25	577	345	337	337
query26	1372	143	147	143
query27	3117	305	305	305
query28	7695	1840	1831	1831
query29	867	627	608	608
query30	278	134	138	134
query31	887	742	735	735
query32	87	50	48	48
query33	727	229	222	222
query34	1172	469	468	468
query35	858	761	747	747
query36	1057	889	933	889
query37	115	60	60	60
query38	3314	3081	3107	3081
query39	1304	1260	1246	1246
query40	184	101	90	90
query41	39	34	34	34
query42	98	93	97	93
query43	477	480	497	480
query44	1123	677	692	677
query45	194	181	172	172
query46	1053	645	663	645
query47	1572	1416	1481	1416
query48	443	338	354	338
query49	1102	297	294	294
query50	770	375	385	375
query51	5180	5248	5186	5186
query52	100	87	83	83
query53	329	274	265	265
query54	269	219	217	217
query55	83	76	75	75
query56	210	201	199	199
query57	990	888	899	888
query58	200	172	181	172
query59	2352	2409	2415	2409
query60	242	214	208	208
query61	82	88	82	82
query62	679	343	368	343
query63	297	265	260	260
query64	5352	3666	3439	3439
query65	3271	3212	3243	3212
query66	939	304	311	304
query67	14526	14365	14233	14233
query68	4311	557	540	540
query69	471	330	320	320
query70	1268	1150	1238	1150
query71	324	254	247	247
query72	6003	2845	2648	2648
query73	709	331	336	331
query74	6637	6262	6347	6262
query75	3024	2352	2362	2352
query76	2531	867	900	867
query77	348	226	231	226
query78	9127	8940	8789	8789
query79	2220	501	498	498
query80	1308	356	347	347
query81	533	207	195	195
query82	692	82	76	76
query83	253	122	122	122
query84	227	88	79	79
query85	2130	332	327	327
query86	480	320	292	292
query87	3392	3192	3182	3182
query88	3831	2345	2360	2345
query89	450	357	353	353
query90	1998	164	159	159
query91	152	136	132	132
query92	56	43	41	41
query93	2231	488	473	473
query94	1309	177	177	177
query95	8077	7797	353	353
query96	595	288	276	276
query97	4216	4105	4125	4105
query98	214	196	193	193
query99	1138	670	700	670
Total cold run time: 289952 ms
Total hot run time: 174824 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.20% (8630/23840)
Line Coverage: 28.21% (70570/250133)
Region Coverage: 27.22% (36422/133790)
Branch Coverage: 24.00% (18652/77720)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9a00591de62dd5da14ae8f4e1da090135ccf8cfb_9a00591de62dd5da14ae8f4e1da090135ccf8cfb/report/index.html

@doris-robot

ClickBench: Total hot run time: 30.8 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9a00591de62dd5da14ae8f4e1da090135ccf8cfb, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.06	0.02	0.03
query3	0.23	0.06	0.05
query4	1.68	0.10	0.10
query5	0.52	0.52	0.52
query6	1.17	0.64	0.64
query7	0.01	0.01	0.01
query8	0.03	0.02	0.03
query9	0.54	0.50	0.49
query10	0.55	0.57	0.56
query11	0.11	0.08	0.09
query12	0.11	0.09	0.09
query13	0.60	0.61	0.59
query14	0.79	0.81	0.79
query15	0.79	0.79	0.76
query16	0.39	0.37	0.39
query17	1.01	1.02	0.99
query18	0.22	0.27	0.24
query19	1.83	1.82	1.79
query20	0.01	0.01	0.01
query21	15.40	0.57	0.55
query22	2.44	2.73	1.46
query23	17.36	0.81	0.78
query24	2.48	1.12	1.25
query25	0.32	0.17	0.21
query26	0.67	0.16	0.14
query27	0.05	0.05	0.06
query28	12.13	0.84	0.83
query29	12.54	3.12	3.12
query30	0.66	0.54	0.51
query31	2.79	0.35	0.36
query32	3.35	0.48	0.48
query33	3.21	3.21	3.23
query34	15.73	4.32	4.34
query35	4.30	4.27	4.26
query36	1.10	1.05	1.06
query37	0.07	0.05	0.05
query38	0.04	0.03	0.03
query39	0.02	0.02	0.01
query40	0.16	0.13	0.12
query41	0.08	0.01	0.01
query42	0.03	0.02	0.01
query43	0.03	0.02	0.02
Total cold run time: 105.65 s
Total hot run time: 30.8 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 9a00591de62dd5da14ae8f4e1da090135ccf8cfb with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      31 seconds loaded 861443392 Bytes, about 26 MB/s
Insert into select:       13.7 seconds inserted 10000000 Rows, about 729K ops/s

std::string output;
google::protobuf::io::StringOutputStream string_output_stream(&output);
google::protobuf::io::CodedOutputStream output_stream(&string_output_stream);
output_stream.SetSerializationDeterministic(true);
Contributor

Are you sure deterministic serialization will take effect on map values?

void CodedOutputStream::SetSerializationDeterministic(bool value)
Indicate to the serializer whether the user wants deterministic serialization.

The default when this is not called comes from the global default, controlled by SetDefaultSerializationDeterministic.

What deterministic serialization means is entirely up to the driver of the serialization process (i.e. the caller of methods like WriteVarint32). In the case of serializing a proto buffer message using one of the methods of MessageLite, this means that for a given binary equal messages will always be serialized to the same bytes. This implies:

Repeated serialization of a message will return the same bytes.
Different processes running the same binary (including on different machines) will serialize equal messages to the same bytes.

Note that this is not canonical across languages. It is also unstable across different builds with intervening message definition changes, due to unknown fields. Users who need canonical serialization (e.g. persistent storage in a canonical form, fingerprinting) should define their own canonicalization specification and implement the serializer using reflection APIs rather than relying on this API.

Member Author

Yes, map serialization is non-deterministic by default, and setting SetSerializationDeterministic to true makes it deterministic. From the documentation quoted above: for a given binary, equal messages will always be serialized to the same bytes.

I've also tested it, and the output is deterministic as expected.
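
The property discussed in this thread can be illustrated without protobuf. The following is a minimal sketch, not the protobuf implementation: `encode_map` is a hypothetical helper, and the `key=value;` text encoding is an assumption made only for illustration. It writes the entries of a `std::unordered_map` either in hash-table iteration order or sorted by key. Sorting by key makes the bytes independent of insertion order and hash layout, which is the kind of guarantee deterministic serialization is expected to give for map fields.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper for illustration only; this is not a protobuf API.
// Encodes a map as "key=value;" pairs.
std::string encode_map(const std::unordered_map<std::string, int>& m, bool deterministic) {
    // Copy the entries out in whatever order the hash table yields them.
    std::vector<std::pair<std::string, int>> entries(m.begin(), m.end());
    if (deterministic) {
        // Sorting by key removes the dependence on hash-table iteration order.
        std::sort(entries.begin(), entries.end());
    }
    std::string out;
    for (const auto& entry : entries) {
        out += entry.first + "=" + std::to_string(entry.second) + ";";
    }
    return out;
}
```

With `deterministic` set to true, two maps holding the same entries produce identical strings regardless of insertion order. With it set to false, the order is whatever the standard library's hash table happens to yield, so equal maps may encode differently.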

 string dump_structure() const {
     string str = "[";
-    for (auto p : _field_name_to_index) {
+    for (auto p : _cols) {
Contributor

Why change this code?

Member Author

_field_name_to_index is an unordered_map, so its iteration order is unspecified. I changed the loop to iterate over _cols so that the output is ordered.
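
The ordering issue behind this change can be sketched as follows. This is a simplified stand-in, not the Doris class: the `Schema` struct, the `add_column` method, and the use of plain strings as columns are assumptions. Only the names `_cols`, `_field_name_to_index`, and `dump_structure` come from the snippet under review. Iterating the vector visits columns in insertion order on every run, whereas iterating the `unordered_map` would visit them in an unspecified order.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified, hypothetical stand-in for the class under review.
struct Schema {
    std::vector<std::string> _cols;  // columns in insertion order
    std::unordered_map<std::string, std::size_t> _field_name_to_index;  // name lookup only

    // Hypothetical helper: registers a column in both containers.
    void add_column(const std::string& name) {
        _field_name_to_index[name] = _cols.size();
        _cols.push_back(name);
    }

    // Iterates the vector, so the output order is stable across runs.
    std::string dump_structure() const {
        std::string str = "[";
        for (const auto& col : _cols) {
            if (str.size() > 1) {
                str += ", ";
            }
            str += col;
        }
        str += "]";
        return str;
    }
};
```

The map is still useful for lookups by name. It is only unsuitable as the source of iteration order when the dumped string needs to be stable.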

Contributor

@xiaokang xiaokang left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 18, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@eldenmoon eldenmoon merged commit ee6d24a into apache:master Feb 18, 2024
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Mar 7, 2024
Labels

approved Indicates a PR has been approved by one committer. dev/2.0.6-merged dev/2.1.0 reviewed
