Skip to content

Conversation

@Gabriel39
Copy link
Contributor

@Gabriel39 Gabriel39 commented Apr 2, 2024

Proposed changes

Implement another count distinct function.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions


#pragma once

#include <stddef.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: inclusion of deprecated C++ header 'stddef.h'; consider using 'cstddef' instead [modernize-deprecated-headers]

Suggested change
#include <stddef.h>
#include <cstddef>

#include <stddef.h>

#include <algorithm>
#include <boost/iterator/iterator_facade.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'boost/iterator/iterator_facade.hpp' file not found [clang-diagnostic-error]

#include <boost/iterator/iterator_facade.hpp>
         ^

Comment on lines 44 to 45
namespace doris {
namespace vectorized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]

Suggested change
namespace doris {
namespace vectorized {
namespace doris::vectorized {

be/src/vec/aggregate_functions/aggregate_function_uniq_without_key.h:48:

- } // namespace vectorized
- } // namespace doris
+ } // namespace doris

auto data = reinterpret_cast<const UInt64*>(
assert_cast<const ColumnFixedLengthObject&>(column).get_data().data());
for (size_t i = 0; i != num_rows; ++i) {
auto rhs_place = places + sizeof(Data) * i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'auto rhs_place' can be declared as 'auto *rhs_place' [readability-qualified-auto]

Suggested change
auto rhs_place = places + sizeof(Data) * i;
auto *rhs_place = places + sizeof(Data) * i;

@Gabriel39 Gabriel39 closed this Apr 2, 2024
@Gabriel39 Gabriel39 reopened this Apr 2, 2024
@Gabriel39 Gabriel39 marked this pull request as draft April 2, 2024 12:14
@Gabriel39
Copy link
Contributor Author

run buildall

@Gabriel39 Gabriel39 marked this pull request as ready for review April 23, 2024 07:56
@Gabriel39
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.20% (8917/25329)
Line Coverage: 26.98% (73336/271825)
Region Coverage: 26.16% (37884/144835)
Branch Coverage: 22.97% (19291/83968)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8ede6819495382035ce76b234a7eb7d59e6e157e_8ede6819495382035ce76b234a7eb7d59e6e157e/report/index.html

@doris-robot
Copy link

ClickBench: Total hot run time: 31.27 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 8ede6819495382035ce76b234a7eb7d59e6e157e, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.67	0.09	0.08
query5	0.50	0.49	0.51
query6	1.14	0.86	0.82
query7	0.02	0.01	0.01
query8	0.05	0.03	0.04
query9	0.50	0.45	0.45
query10	0.50	0.52	0.51
query11	0.15	0.11	0.11
query12	0.14	0.12	0.11
query13	0.65	0.65	0.63
query14	0.90	0.99	1.08
query15	0.85	0.85	0.84
query16	0.36	0.36	0.37
query17	1.05	1.01	1.03
query18	0.19	0.24	0.22
query19	1.91	1.80	1.81
query20	0.01	0.02	0.01
query21	15.42	0.66	0.66
query22	4.46	7.34	1.55
query23	18.35	1.40	1.33
query24	1.40	0.44	0.26
query25	0.15	0.10	0.09
query26	0.27	0.17	0.18
query27	0.08	0.08	0.09
query28	13.32	1.02	1.01
query29	12.63	3.40	3.40
query30	0.26	0.08	0.06
query31	2.84	0.39	0.40
query32	3.23	0.48	0.48
query33	2.70	2.91	2.89
query34	17.46	4.62	4.52
query35	4.54	4.65	4.63
query36	0.65	0.46	0.46
query37	0.22	0.18	0.18
query38	0.19	0.18	0.19
query39	0.05	0.04	0.04
query40	0.19	0.15	0.14
query41	0.11	0.06	0.06
query42	0.07	0.06	0.05
query43	0.05	0.05	0.05
Total cold run time: 109.58 s
Total hot run time: 31.27 s

BiteTheDDDDt
BiteTheDDDDt previously approved these changes Apr 23, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 23, 2024
@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 26, 2024
@Gabriel39
Copy link
Contributor Author

run buildall

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions


#pragma once

#include <stddef.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: inclusion of deprecated C++ header 'stddef.h'; consider using 'cstddef' instead [modernize-deprecated-headers]

Suggested change
#include <stddef.h>
#include <cstddef>

#include <stddef.h>

#include <algorithm>
#include <boost/iterator/iterator_facade.hpp>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'boost/iterator/iterator_facade.hpp' file not found [clang-diagnostic-error]

#include <boost/iterator/iterator_facade.hpp>
         ^

Comment on lines +44 to +45
namespace doris {
namespace vectorized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]

Suggested change
namespace doris {
namespace vectorized {
namespace doris::vectorized {

be/src/vec/aggregate_functions/aggregate_function_uniq_distribute_key.h:48:

- } // namespace vectorized
- } // namespace doris
+ } // namespace doris

auto data = reinterpret_cast<const UInt64*>(
assert_cast<const ColumnFixedLengthObject&>(column).get_data().data());
for (size_t i = 0; i != num_rows; ++i) {
auto rhs_place = places + sizeof(Data) * i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: 'auto rhs_place' can be declared as 'auto *rhs_place' [readability-qualified-auto]

Suggested change
auto rhs_place = places + sizeof(Data) * i;
auto *rhs_place = places + sizeof(Data) * i;

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 26, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.16% (8923/25375)
Line Coverage: 26.97% (73391/272146)
Region Coverage: 26.14% (37909/145027)
Branch Coverage: 22.96% (19302/84050)
Coverage Report: http://coverage.selectdb-in.cc/coverage/e1e7197e5090241e3d88e378809f8aa7ff6ee28f_e1e7197e5090241e3d88e378809f8aa7ff6ee28f/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 40813 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e1e7197e5090241e3d88e378809f8aa7ff6ee28f, data reload: false

------ Round 1 ----------------------------------
q1	17614	4398	4264	4264
q2	2011	186	188	186
q3	10496	1242	1230	1230
q4	10453	855	736	736
q5	7842	2717	2724	2717
q6	218	133	136	133
q7	1050	626	600	600
q8	9414	2144	2102	2102
q9	9034	6794	6662	6662
q10	8805	3760	3700	3700
q11	498	235	246	235
q12	393	225	213	213
q13	17752	2988	2953	2953
q14	266	230	225	225
q15	502	477	468	468
q16	489	377	386	377
q17	949	669	707	669
q18	8029	7473	7315	7315
q19	1635	1524	1508	1508
q20	686	297	301	297
q21	5121	3984	3959	3959
q22	331	264	273	264
Total cold run time: 113588 ms
Total hot run time: 40813 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4255	4177	4201	4177
q2	376	263	268	263
q3	2960	2756	2717	2717
q4	1813	1552	1579	1552
q5	5286	5307	5274	5274
q6	210	123	124	123
q7	2288	1843	1826	1826
q8	3192	3337	3322	3322
q9	8596	8505	8564	8505
q10	3897	3737	3695	3695
q11	575	480	474	474
q12	744	597	595	595
q13	17135	2921	2961	2921
q14	322	272	279	272
q15	518	471	472	471
q16	481	417	412	412
q17	1767	1472	1459	1459
q18	7662	7471	7409	7409
q19	5549	1567	1553	1553
q20	1989	1725	1754	1725
q21	4761	4732	4758	4732
q22	559	477	500	477
Total cold run time: 74935 ms
Total hot run time: 53954 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 186747 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e1e7197e5090241e3d88e378809f8aa7ff6ee28f, data reload: false

query1	896	361	351	351
query2	6453	2384	2348	2348
query3	6662	203	211	203
query4	22965	21125	21030	21030
query5	4181	434	422	422
query6	277	196	179	179
query7	4590	291	284	284
query8	248	204	185	185
query9	8424	2297	2276	2276
query10	424	236	256	236
query11	14725	14087	14165	14087
query12	130	88	87	87
query13	1648	359	361	359
query14	10505	7097	8106	7097
query15	254	165	168	165
query16	8156	253	251	251
query17	1867	562	533	533
query18	2105	271	265	265
query19	199	147	150	147
query20	90	86	84	84
query21	197	126	134	126
query22	5063	4848	4849	4848
query23	33771	33157	33104	33104
query24	11910	2954	2925	2925
query25	650	372	366	366
query26	1755	148	159	148
query27	3019	313	305	305
query28	7664	1980	1980	1980
query29	1010	594	575	575
query30	295	150	149	149
query31	957	732	703	703
query32	92	50	53	50
query33	749	242	238	238
query34	1059	471	474	471
query35	830	669	668	668
query36	1059	911	899	899
query37	285	66	70	66
query38	3169	3048	2996	2996
query39	1586	1550	1520	1520
query40	279	123	124	123
query41	42	40	43	40
query42	105	91	96	91
query43	581	532	523	523
query44	1178	715	737	715
query45	305	257	270	257
query46	1079	712	722	712
query47	1925	1852	1852	1852
query48	361	284	290	284
query49	1188	395	400	395
query50	765	367	385	367
query51	6770	6703	6617	6617
query52	103	86	87	86
query53	347	280	275	275
query54	310	238	226	226
query55	75	70	70	70
query56	247	220	218	218
query57	1243	1106	1149	1106
query58	220	195	198	195
query59	3477	3324	3364	3324
query60	264	239	232	232
query61	92	111	98	98
query62	649	436	439	436
query63	301	272	277	272
query64	9487	7144	7245	7144
query65	3123	3063	3054	3054
query66	1404	348	347	347
query67	15564	15434	15561	15434
query68	5198	533	525	525
query69	464	303	313	303
query70	1208	1165	1162	1162
query71	393	277	270	270
query72	7403	2811	2570	2570
query73	699	318	321	318
query74	6590	6224	6249	6224
query75	3449	2774	2759	2759
query76	3191	976	998	976
query77	476	264	265	264
query78	11433	10778	10658	10658
query79	9328	515	521	515
query80	2081	437	446	437
query81	602	245	234	234
query82	2150	106	104	104
query83	285	174	172	172
query84	280	86	86	86
query85	2643	347	267	267
query86	706	311	293	293
query87	3328	3114	3135	3114
query88	5476	2360	2349	2349
query89	509	386	362	362
query90	2010	177	183	177
query91	126	96	99	96
query92	64	49	46	46
query93	6431	522	490	490
query94	2371	191	179	179
query95	647	298	298	298
query96	611	273	270	270
query97	3793	2929	2974	2929
query98	238	228	213	213
query99	1271	868	877	868
Total cold run time: 306857 ms
Total hot run time: 186747 ms

@Gabriel39 Gabriel39 merged commit ee4196d into apache:master Apr 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants