@wyxxxcat
Contributor

@wyxxxcat wyxxxcat commented Aug 10, 2024

Proposed changes

Issue Number: #38977

doc : apache/doris-website#1014

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions github-actions bot added the doing label Aug 10, 2024
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@wyxxxcat wyxxxcat force-pushed the REGR_agg_func branch 2 times, most recently from 969b94a to 3bc2fcd Compare August 11, 2024 07:07
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@wyxxxcat wyxxxcat force-pushed the REGR_agg_func branch 2 times, most recently from 68a44ac to a441b02 Compare August 13, 2024 09:06
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@wyxxxcat
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 39846 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a441b026bf5eac2d6dd220a9d4e55310ec0567d9, data reload: false

------ Round 1 ----------------------------------
q1	17637	4498	4300	4300
q2	2009	183	182	182
q3	11857	1012	1101	1012
q4	10548	768	822	768
q5	7764	2758	2783	2758
q6	219	138	138	138
q7	935	589	591	589
q8	9570	2021	2031	2021
q9	10150	6526	6552	6526
q10	7041	2210	2254	2210
q11	446	246	246	246
q12	398	230	223	223
q13	17762	2993	2976	2976
q14	272	228	240	228
q15	522	484	490	484
q16	496	414	378	378
q17	976	729	800	729
q18	8037	7429	7393	7393
q19	6689	1068	983	983
q20	686	337	342	337
q21	5269	4618	4360	4360
q22	1093	1005	1028	1005
Total cold run time: 120376 ms
Total hot run time: 39846 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4423	4314	4299	4299
q2	379	264	262	262
q3	2975	2819	2773	2773
q4	2012	1796	1772	1772
q5	5595	5530	5641	5530
q6	234	134	138	134
q7	2131	1748	1769	1748
q8	3251	3440	3437	3437
q9	8732	8796	8712	8712
q10	3422	3268	3293	3268
q11	612	520	498	498
q12	833	633	656	633
q13	16012	3155	3198	3155
q14	310	299	312	299
q15	542	484	492	484
q16	496	434	441	434
q17	1803	1555	1492	1492
q18	8146	8238	7876	7876
q19	2364	1563	1497	1497
q20	2164	1869	1872	1869
q21	5549	5031	5200	5031
q22	1145	1044	1012	1012
Total cold run time: 73130 ms
Total hot run time: 56215 ms

@zclllyybb
Contributor

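Tests need to cover all of the following points:

  1. Operating on an empty table.
  2. Operating on literals, as well as cases where one argument is a column and the other is a literal.
  3. Both nullable and non-nullable arguments, including mixed cases. This is easy to set up by wrapping an argument in the nullable() function.
  4. Whether the function has special values, e.g. inputs for which the result is undefined.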
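A rough way to exercise the checklist above outside of SQL is sketched below. It is a minimal model, not the Doris implementation. `RegrSxxSketch`, `Row` and `regr_sxx` are hypothetical names. The sketch assumes the SQL-standard convention that `regr_sxx(y, x)` ignores any row in which either argument is NULL and returns NULL when no row remains. A constant `x` stands in for the column-plus-literal case.

```cpp
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical accumulator for regr_sxx(y, x); a test model, not Doris code.
struct RegrSxxSketch {
    std::int64_t count = 0;
    double sum_x = 0.0;
    double sum_of_x_squared = 0.0;

    // Assumed SQL-standard semantics: a row contributes only when both
    // y and x are non-NULL (y is unused by sxx but still filters rows).
    void add(std::optional<double> y, std::optional<double> x) {
        if (!y.has_value() || !x.has_value()) return;
        ++count;
        sum_x += *x;
        sum_of_x_squared += *x * *x;
    }

    // Empty input (count == 0) produces NULL instead of dividing by zero.
    std::optional<double> result() const {
        if (count == 0) return std::nullopt;
        return sum_of_x_squared - sum_x * sum_x / static_cast<double>(count);
    }
};

using Row = std::pair<std::optional<double>, std::optional<double>>;  // (y, x)

// Runs the accumulator over all rows, as a single-instance aggregate would.
std::optional<double> regr_sxx(const std::vector<Row>& rows) {
    RegrSxxSketch state;
    for (const auto& [y, x] : rows) state.add(y, x);
    return state.result();
}
```

Under this model, empty input and all-NULL input both give `std::nullopt`, and a constant `x` gives 0. A row with a NULL `y` is skipped rather than poisoning the result.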

@wyxxxcat wyxxxcat marked this pull request as ready for review August 14, 2024 04:29
@wyxxxcat
Contributor Author

run buildall

@wyxxxcat
Contributor Author

run buildall

Contributor

@morrySnow morrySnow left a comment

@doris-robot

TPC-H: Total hot run time: 37911 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 551d03892bdc03c374999c9c5da8fe7979561c55, data reload: false

------ Round 1 ----------------------------------
q1	18328	4726	4362	4362
q2	2037	174	177	174
q3	10491	1201	1052	1052
q4	10145	777	700	700
q5	7770	2810	2801	2801
q6	226	139	140	139
q7	981	586	586	586
q8	9326	2044	2075	2044
q9	7312	6534	6537	6534
q10	7009	2208	2204	2204
q11	463	245	243	243
q12	392	222	216	216
q13	18859	2995	3013	2995
q14	286	243	230	230
q15	530	490	485	485
q16	501	391	387	387
q17	979	728	688	688
q18	7429	6923	6929	6923
q19	6333	1055	945	945
q20	671	325	346	325
q21	4098	2884	3036	2884
q22	1079	1020	994	994
Total cold run time: 115245 ms
Total hot run time: 37911 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4549	4268	4286	4268
q2	366	260	264	260
q3	2865	2572	2612	2572
q4	1875	1634	1591	1591
q5	5336	5393	5366	5366
q6	216	128	128	128
q7	2022	1684	1678	1678
q8	3184	3302	3324	3302
q9	8406	8347	8357	8347
q10	3394	3151	3162	3151
q11	616	492	495	492
q12	769	600	622	600
q13	17280	3014	2980	2980
q14	301	285	270	270
q15	527	491	474	474
q16	463	420	412	412
q17	1765	1492	1487	1487
q18	7735	7575	7322	7322
q19	1699	1646	1538	1538
q20	2007	1804	1754	1754
q21	5167	5138	5126	5126
q22	1101	1032	1013	1013
Total cold run time: 71643 ms
Total hot run time: 54131 ms

@wyxxxcat
Contributor Author

Result type should be AlwaysNullable

Done

@wyxxxcat
Contributor Author

wyxxxcat commented Oct 8, 2024

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.23% (9632/25872)
Line Coverage: 28.65% (79856/278765)
Region Coverage: 28.08% (41285/147022)
Branch Coverage: 24.71% (21032/85108)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ea509164088be5160eab9a3ff97ac9daffc4e8e5_ea509164088be5160eab9a3ff97ac9daffc4e8e5/report/index.html

@doris-robot

TPC-H: Total hot run time: 40569 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ea509164088be5160eab9a3ff97ac9daffc4e8e5, data reload: false

------ Round 1 ----------------------------------
q1	17629	7402	7237	7237
q2	2007	280	276	276
q3	12117	1072	1166	1072
q4	10567	732	760	732
q5	7754	2851	2842	2842
q6	240	151	146	146
q7	1027	667	618	618
q8	9367	1942	1942	1942
q9	6915	6381	6367	6367
q10	7017	2294	2295	2294
q11	439	256	253	253
q12	407	222	223	222
q13	17761	2963	2975	2963
q14	248	218	213	213
q15	578	518	521	518
q16	640	589	597	589
q17	967	577	473	473
q18	7119	6662	6727	6662
q19	1346	988	890	890
q20	498	198	203	198
q21	3943	3211	3079	3079
q22	1097	991	983	983
Total cold run time: 109683 ms
Total hot run time: 40569 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7227	7177	7285	7177
q2	318	229	232	229
q3	2976	2995	2924	2924
q4	2068	1926	1839	1839
q5	5737	5769	5726	5726
q6	224	140	149	140
q7	2231	1871	1799	1799
q8	3361	3554	3515	3515
q9	8943	8869	8891	8869
q10	3588	3573	3523	3523
q11	572	505	502	502
q12	833	638	644	638
q13	9984	3194	3183	3183
q14	304	272	280	272
q15	595	540	511	511
q16	687	657	665	657
q17	1847	1648	1602	1602
q18	8146	7860	7398	7398
q19	1688	1415	1503	1415
q20	2058	1873	1908	1873
q21	5579	5409	5418	5409
q22	1134	1072	1080	1072
Total cold run time: 70100 ms
Total hot run time: 60273 ms

@doris-robot

TPC-DS: Total hot run time: 191577 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ea509164088be5160eab9a3ff97ac9daffc4e8e5, data reload: false

query1	912	392	396	392
query2	6284	2043	2034	2034
query3	8678	199	202	199
query4	34075	23541	23490	23490
query5	3440	465	433	433
query6	269	165	163	163
query7	4201	309	313	309
query8	278	239	217	217
query9	9436	2705	2694	2694
query10	454	284	286	284
query11	17697	15110	15184	15110
query12	156	100	102	100
query13	1591	463	446	446
query14	9496	7155	7322	7155
query15	241	170	180	170
query16	7921	479	443	443
query17	1638	608	594	594
query18	1994	331	315	315
query19	242	156	174	156
query20	121	114	112	112
query21	205	112	108	108
query22	4662	4455	4397	4397
query23	35177	34033	33898	33898
query24	11090	2872	2854	2854
query25	601	440	417	417
query26	745	164	168	164
query27	2140	308	299	299
query28	6919	2415	2399	2399
query29	762	442	435	435
query30	255	154	155	154
query31	1057	814	803	803
query32	101	54	56	54
query33	768	302	303	302
query34	915	512	512	512
query35	860	731	738	731
query36	1104	952	959	952
query37	146	93	87	87
query38	4034	3937	3904	3904
query39	1472	1477	1449	1449
query40	207	95	96	95
query41	47	44	46	44
query42	113	96	96	96
query43	533	489	498	489
query44	1232	801	805	801
query45	194	166	165	165
query46	1135	722	703	703
query47	1976	1832	1827	1827
query48	439	362	358	358
query49	897	418	407	407
query50	826	431	434	431
query51	7014	7034	6780	6780
query52	99	90	90	90
query53	253	179	187	179
query54	1174	479	498	479
query55	81	77	74	74
query56	291	273	264	264
query57	1280	1171	1153	1153
query58	229	231	234	231
query59	3187	2972	2823	2823
query60	302	266	270	266
query61	107	99	106	99
query62	856	672	668	668
query63	214	182	177	177
query64	4107	661	606	606
query65	3243	3203	3178	3178
query66	820	302	329	302
query67	15953	15475	15490	15475
query68	4452	553	576	553
query69	532	295	308	295
query70	1194	1113	1159	1113
query71	368	274	275	274
query72	7349	4001	3962	3962
query73	776	355	353	353
query74	9447	8957	9151	8957
query75	3416	2671	2697	2671
query76	2642	875	965	875
query77	525	304	291	291
query78	10690	9621	9570	9570
query79	2113	577	594	577
query80	1131	454	452	452
query81	590	243	246	243
query82	716	143	140	140
query83	259	144	136	136
query84	257	76	75	75
query85	1326	309	287	287
query86	465	303	306	303
query87	4419	4345	4359	4345
query88	3606	2484	2455	2455
query89	405	287	290	287
query90	1920	185	186	185
query91	156	106	116	106
query92	63	48	50	48
query93	2503	567	540	540
query94	1030	311	298	298
query95	356	265	258	258
query96	628	289	283	283
query97	3300	3206	3161	3161
query98	218	202	198	198
query99	1531	1293	1291	1291
Total cold run time: 297329 ms
Total hot run time: 191577 ms

@doris-robot

ClickBench: Total hot run time: 32.86 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ea509164088be5160eab9a3ff97ac9daffc4e8e5, data reload: false

query1	0.05	0.04	0.05
query2	0.06	0.02	0.03
query3	0.23	0.06	0.06
query4	1.66	0.10	0.10
query5	0.52	0.50	0.51
query6	1.14	0.75	0.73
query7	0.03	0.01	0.02
query8	0.04	0.03	0.03
query9	0.56	0.50	0.49
query10	0.56	0.56	0.55
query11	0.14	0.11	0.10
query12	0.13	0.11	0.12
query13	0.61	0.60	0.58
query14	2.73	2.71	2.76
query15	0.90	0.83	0.82
query16	0.37	0.39	0.38
query17	1.02	1.06	0.98
query18	0.20	0.19	0.21
query19	1.87	1.86	1.97
query20	0.01	0.01	0.02
query21	15.36	0.58	0.58
query22	2.46	1.76	2.52
query23	16.75	0.99	0.73
query24	3.10	1.59	1.29
query25	0.32	0.18	0.06
query26	0.42	0.15	0.14
query27	0.04	0.05	0.05
query28	10.25	1.11	1.07
query29	12.55	3.31	3.26
query30	0.25	0.06	0.06
query31	2.88	0.38	0.38
query32	3.29	0.47	0.47
query33	2.96	3.02	3.01
query34	17.31	4.51	4.45
query35	4.51	4.50	4.51
query36	0.65	0.48	0.48
query37	0.08	0.06	0.06
query38	0.05	0.03	0.04
query39	0.03	0.02	0.02
query40	0.16	0.12	0.13
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.38 s
Total hot run time: 32.86 s

@wyxxxcat wyxxxcat force-pushed the REGR_agg_func branch 2 times, most recently from ccc16f4 to f5a0bf2 Compare October 11, 2024 08:59
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@wyxxxcat
Contributor Author

@zhiqiang-hhhh plz review

```cpp
Float64 get_regr_sxx_result() const {
    Float64 result = sum_of_x_squared - (sum_x * sum_x) / count;
```
Contributor

What if count is zero?
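For reference, the three statistics reduce to running sums, which is why the code divides by `count`. Below, $n$ is the number of contributing rows. When $n = 0$ the correction term is undefined, so empty input has to be special-cased:

```latex
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \\
S_{yy} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} \\
S_{xy} &= \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}
\end{aligned}
```

In the code, `sum_of_x_squared`, `sum_of_y_squared`, `sum_of_x_mul_y`, `sum_x`, `sum_y` and `count` correspond to the sums and $n$ on the right-hand side.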

Contributor

@zhiqiang-hhhh zhiqiang-hhhh left a comment

We should use a template to avoid redundant fields in the aggregate state.

```cpp
Float64 sum_y {};
Float64 sum_of_x_mul_y {};
Float64 sum_of_x_squared {};
Float64 sum_of_y_squared {};
```
Contributor

sum_of_y_squared is only needed for regr_syy; in the other cases this field is unnecessary.
So we should use a template to make the implementation more efficient.
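One way to apply this suggestion is sketched below. The `sum_of_y_squared` field lives in a base class that is empty by default, so only the regr_syy variant stores it. The names `SumOfYSquared`, `RegrState`, `RegrSxxState` and `RegrSyyState` are hypothetical. Plain `double` and `std::int64_t` stand in for Doris's `Float64`. The sketch illustrates the technique (a template parameter plus the empty base optimization); it is not the code that was eventually merged.

```cpp
#include <cstdint>

// Primary template: empty, so it adds no storage when sum(y^2) is not needed.
template <bool kNeedSumYY>
struct SumOfYSquared {
    void add(double /*y*/) {}  // no-op
};

// Specialization that actually stores sum(y^2); used only by regr_syy.
template <>
struct SumOfYSquared<true> {
    double sum_of_y_squared = 0.0;
    void add(double y) { sum_of_y_squared += y * y; }
};

template <bool kNeedSumYY>
struct RegrState : SumOfYSquared<kNeedSumYY> {
    std::int64_t count = 0;
    double sum_x = 0.0;
    double sum_y = 0.0;
    double sum_of_x_mul_y = 0.0;
    double sum_of_x_squared = 0.0;

    void add(double y, double x) {
        ++count;
        sum_x += x;
        sum_y += y;
        sum_of_x_mul_y += x * y;
        sum_of_x_squared += x * x;
        SumOfYSquared<kNeedSumYY>::add(y);  // compiles to nothing unless kNeedSumYY
    }

    // Only instantiable for the regr_syy variant; caller must ensure count > 0.
    double syy() const {
        static_assert(kNeedSumYY, "sum_of_y_squared is only tracked for regr_syy");
        return this->sum_of_y_squared - sum_y * sum_y / static_cast<double>(count);
    }
};

using RegrSxxState = RegrState<false>;  // also usable for regr_sxy
using RegrSyyState = RegrState<true>;
```

A similar layout can be had with a `std::conditional_t` member marked `[[no_unique_address]]` (C++20). The base-class form works in C++17.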

Contributor Author

done

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@zhiqiang-hhhh zhiqiang-hhhh left a comment

  1. Needs refactoring to simplify the code.
  2. Needs more tests.

```cpp
Float64 get_regr_sxx_result() const {
    // count == 0
    // The result of a query for an empty table is a null value
    Float64 result = sum_of_x_squared - (sum_x * sum_x / count);
```
Contributor

```cpp
if (count == 0) {
    return std::numeric_limits<Float64>::quiet_NaN();
}
```

```cpp
Float64 get_regr_sxy_result() const {
    // count == 0
    // The result of a query for an empty table is a null value
    Float64 result = sum_of_x_mul_y - (sum_x * sum_y / count);
```
Contributor

```cpp
if (count == 0) {
    return std::numeric_limits<Float64>::quiet_NaN();
}
```

```cpp
Float64 get_regr_syy_result() const {
    // count == 0
    // The result of a query for an empty table is a null value
    Float64 result = sum_of_y_squared - (sum_y * sum_y / count);
```
Contributor

```cpp
if (count == 0) {
    return std::numeric_limits<Float64>::quiet_NaN();
}
```

@zhiqiang-hhhh
Contributor

@wyxxxcat Could you please contact me on WeChat? 839616693

@github-actions
Contributor

We're closing this PR because it hasn't been updated in a while.
This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and feel free to ask a maintainer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 14, 2025
@wyxxxcat wyxxxcat closed this Apr 14, 2025
@zclllyybb zclllyybb reopened this Jun 30, 2025
@zclllyybb zclllyybb self-assigned this Jun 30, 2025
@github-actions github-actions bot closed this Jul 1, 2025
JoverZhang pushed a commit to JoverZhang/doris that referenced this pull request Dec 22, 2025
zclllyybb pushed a commit that referenced this pull request Dec 29, 2025
…onRegrData (#59224)

### What problem does this PR solve?

Issue Number: close #38977

Problem Summary:

This PR migrates regr_sxx/syy/sxy onto the shared
Moment(AggregateFunctionRegrData) introduced in #55940.

The original implementation and tests were done in #39187 by @wyxxxcat.
This PR builds on top of that work, refactoring it to reuse the same
state and merge logic.
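A shared moment state lets partial aggregates from different instances be combined by adding their fields. All three statistics are then derived from the same merged sums. Below is a minimal sketch with hypothetical names; `MomentSketch` is not the actual `AggregateFunctionRegrData`. It returns `std::nullopt` for an empty state to mirror the NULL result discussed earlier in this PR.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical moment state; illustrates the idea, not the Doris struct.
struct MomentSketch {
    std::int64_t count = 0;
    double sum_x = 0.0, sum_y = 0.0;
    double sum_xy = 0.0, sum_xx = 0.0, sum_yy = 0.0;

    // Accumulate one (y, x) pair.
    void add(double y, double x) {
        ++count;
        sum_x += x;
        sum_y += y;
        sum_xy += x * y;
        sum_xx += x * x;
        sum_yy += y * y;
    }

    // Merging partial states is field-wise addition, so the result does not
    // depend on how rows were split across instances (up to FP rounding).
    void merge(const MomentSketch& rhs) {
        count += rhs.count;
        sum_x += rhs.sum_x;
        sum_y += rhs.sum_y;
        sum_xy += rhs.sum_xy;
        sum_xx += rhs.sum_xx;
        sum_yy += rhs.sum_yy;
    }

    std::optional<double> sxx() const { return centered(sum_xx, sum_x, sum_x); }
    std::optional<double> syy() const { return centered(sum_yy, sum_y, sum_y); }
    std::optional<double> sxy() const { return centered(sum_xy, sum_x, sum_y); }

private:
    // sum(a*b) - sum(a) * sum(b) / n; an empty state yields no value (NULL).
    std::optional<double> centered(double sum_ab, double sum_a, double sum_b) const {
        if (count == 0) return std::nullopt;
        return sum_ab - sum_a * sum_b / static_cast<double>(count);
    }
};
```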

---------

Co-authored-by: wyxxxcat <1520358997@qq.com>

6 participants