@superdiaodiao
Contributor

@superdiaodiao superdiaodiao commented Apr 5, 2024

Proposed changes

Issue Number: #31737

Consider this table named group_array_intersect_test, which has id and c_array_int columns:

+------+-----------------+
| id   | c_array_int     |
+------+-----------------+
|    0 | [0]             |
|    6 | [null]          |
|   12 | [12, null, 13]  |
|   14 | [12, 13]        |
+------+-----------------+

We can use group_array_intersect to get the elements common to all of the given arrays, as in these queries:

mysql> select group_array_intersect(c_array_int) from group_array_intersect_test where id in (6, 12);
+------------------------------------+
| group_array_intersect(c_array_int) |
+------------------------------------+
| [null]                             |
+------------------------------------+
1 row in set (0.02 sec)

mysql> select group_array_intersect(c_array_int) from group_array_intersect_test where id in (14, 12);
+------------------------------------+
| group_array_intersect(c_array_int) |
+------------------------------------+
| [13, 12]                           |
+------------------------------------+
1 row in set (0.01 sec)

mysql> select group_array_intersect(c_array_int) from group_array_intersect_test where id in (0, 6);
+------------------------------------+
| group_array_intersect(c_array_int) |
+------------------------------------+
| []                                 |
+------------------------------------+
1 row in set (0.00 sec)

Furthermore, this function supports all kinds of array types: not only array(int) as shown above, but also array(varchar()), array(date), and so on.
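
The semantics shown by the queries above can be modelled in a few lines of Python. This is only an illustrative sketch, not the Doris implementation: the function name group_array_intersect_model and the use of None for SQL NULL are assumptions made for the example. It treats NULL as an ordinary element, which matches the [null] result for ids 6 and 12, and it returns a set because the [13, 12] output above suggests the element order is not guaranteed.

```python
from typing import Hashable, Iterable, Optional, Set


def group_array_intersect_model(
    arrays: Iterable[Iterable[Optional[Hashable]]],
) -> Set[Optional[Hashable]]:
    """Return the elements present in every input array (None stands in for NULL)."""
    result = None
    for array in arrays:
        elements = set(array)
        # The first array seeds the result; each later array narrows it.
        result = elements if result is None else result & elements
    # Assumption of this model only: an empty group yields an empty set.
    return result if result is not None else set()


# Rows of group_array_intersect_test from the description, keyed by id.
rows = {
    0: [0],
    6: [None],
    12: [12, None, 13],
    14: [12, 13],
}

print(group_array_intersect_model([rows[6], rows[12]]))   # {None}
print(group_array_intersect_model([rows[14], rows[12]]))  # same elements as [13, 12] above
print(group_array_intersect_model([rows[0], rows[6]]))    # set()

# The same model works for other hashable element types, e.g. strings.
print(group_array_intersect_model([["a", "b"], ["b", "c"]]))  # {'b'}
```

Because the model only relies on set intersection, it applies unchanged to any element type that can be hashed, which mirrors the claim that the function is not limited to array(int).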

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@superdiaodiao superdiaodiao force-pushed the group_array_intersect branch from eb3889e to 96f8c80 Compare April 5, 2024 03:37
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@superdiaodiao superdiaodiao force-pushed the group_array_intersect branch 2 times, most recently from f918d19 to 0784ce8 Compare April 5, 2024 04:11
@github-actions
Contributor

github-actions bot commented Apr 5, 2024

clang-tidy review says "All clean, LGTM! 👍"

@superdiaodiao superdiaodiao force-pushed the group_array_intersect branch from 0784ce8 to 1b3ea07 Compare April 5, 2024 04:15
@superdiaodiao superdiaodiao force-pushed the group_array_intersect branch from 1b3ea07 to 6aa4266 Compare April 5, 2024 04:17
Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@superdiaodiao
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38737 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6aa4266164ab4c81109e43d77e2bf2504a377ca8, data reload: false

------ Round 1 ----------------------------------
q1	17647	4215	4088	4088
q2	2011	183	179	179
q3	10810	1260	1473	1260
q4	10516	829	1004	829
q5	8903	3074	3016	3016
q6	220	136	137	136
q7	1127	641	627	627
q8	9631	1973	2064	1973
q9	6820	6191	6136	6136
q10	8444	3528	3527	3527
q11	423	242	230	230
q12	377	208	207	207
q13	17788	2901	2930	2901
q14	261	232	237	232
q15	535	489	478	478
q16	505	396	364	364
q17	955	913	887	887
q18	7324	6490	6429	6429
q19	1597	1544	1536	1536
q20	571	326	294	294
q21	3547	3102	3132	3102
q22	366	306	331	306
Total cold run time: 110378 ms
Total hot run time: 38737 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4064	4060	4052	4052
q2	332	217	226	217
q3	2969	2956	2946	2946
q4	1861	1875	1868	1868
q5	5221	5211	5207	5207
q6	209	125	125	125
q7	2244	1801	1781	1781
q8	3222	3288	3284	3284
q9	8481	8474	8536	8474
q10	3766	3819	3846	3819
q11	556	456	444	444
q12	712	514	557	514
q13	10697	2868	2934	2868
q14	285	261	271	261
q15	510	471	470	470
q16	447	399	403	399
q17	1719	1657	1661	1657
q18	7601	7100	7150	7100
q19	1634	1646	1638	1638
q20	1946	1744	1732	1732
q21	5015	4704	4787	4704
q22	504	424	448	424
Total cold run time: 63995 ms
Total hot run time: 53984 ms
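
The report above does not label its columns. The totals are consistent with reading each row as: cold run, first hot run, second hot run, and the faster of the two hot runs. The following sketch checks that reading against the Round 1 figures; the helper names parse_rows and totals are made up for this example and are not part of the Doris tooling.

```python
# Round 1 rows copied from the report above: query name, then four timings in ms.
ROUND_1 = """\
q1 17647 4215 4088 4088
q2 2011 183 179 179
q3 10810 1260 1473 1260
q4 10516 829 1004 829
q5 8903 3074 3016 3016
q6 220 136 137 136
q7 1127 641 627 627
q8 9631 1973 2064 1973
q9 6820 6191 6136 6136
q10 8444 3528 3527 3527
q11 423 242 230 230
q12 377 208 207 207
q13 17788 2901 2930 2901
q14 261 232 237 232
q15 535 489 478 478
q16 505 396 364 364
q17 955 913 887 887
q18 7324 6490 6429 6429
q19 1597 1544 1536 1536
q20 571 326 294 294
q21 3547 3102 3132 3102
q22 366 306 331 306
"""


def parse_rows(text):
    """Split each line into (query name, list of integer timings)."""
    rows = []
    for line in text.splitlines():
        name, *timings = line.split()
        rows.append((name, [int(t) for t in timings]))
    return rows


def totals(rows):
    """Sum the cold column and the faster of the two hot runs per query."""
    cold = sum(t[0] for _, t in rows)
    hot = sum(min(t[1], t[2]) for _, t in rows)
    return cold, hot


round_1_rows = parse_rows(ROUND_1)
# In every row the fourth column equals the faster of the two hot runs.
assert all(t[3] == min(t[1], t[2]) for _, t in round_1_rows)
print(totals(round_1_rows))  # (110378, 38737)
```

The printed pair matches the "Total cold run time" and "Total hot run time" lines of Round 1.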

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.55% (8906/25053)
Line Coverage: 27.27% (73124/268132)
Region Coverage: 26.40% (37810/143203)
Branch Coverage: 23.16% (19272/83210)
Coverage Report: http://coverage.selectdb-in.cc/coverage/78e6ecd485ea94025feb7c77edb97dfc80a883e8_78e6ecd485ea94025feb7c77edb97dfc80a883e8/report/index.html

@superdiaodiao
Contributor Author

run p0

@zclllyybb
Contributor

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.54% (8906/25056)
Line Coverage: 27.27% (73140/268215)
Region Coverage: 26.41% (37829/143251)
Branch Coverage: 23.16% (19271/83222)
Coverage Report: http://coverage.selectdb-in.cc/coverage/78e6ecd485ea94025feb7c77edb97dfc80a883e8_78e6ecd485ea94025feb7c77edb97dfc80a883e8/report/index.html

@doris-robot

TPC-H: Total hot run time: 38777 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 78e6ecd485ea94025feb7c77edb97dfc80a883e8, data reload: false

------ Round 1 ----------------------------------
q1	17662	4961	4370	4370
q2	2647	195	200	195
q3	11085	1234	1228	1228
q4	10615	833	811	811
q5	7612	2787	2691	2691
q6	225	133	129	129
q7	1008	611	597	597
q8	9478	2061	2052	2052
q9	7945	6633	6548	6548
q10	8472	3487	3567	3487
q11	466	227	228	227
q12	464	220	206	206
q13	19106	2938	2945	2938
q14	275	229	235	229
q15	513	479	477	477
q16	503	394	373	373
q17	1029	677	740	677
q18	7369	6770	6683	6683
q19	1595	1534	1519	1519
q20	703	303	296	296
q21	3570	2867	2754	2754
q22	359	298	290	290
Total cold run time: 112701 ms
Total hot run time: 38777 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4241	4175	4164	4164
q2	369	252	268	252
q3	3020	2723	2757	2723
q4	1892	1591	1614	1591
q5	5304	5278	5310	5278
q6	205	126	120	120
q7	2278	1844	1848	1844
q8	3152	3347	3337	3337
q9	8514	8504	8553	8504
q10	3868	3735	3703	3703
q11	558	470	463	463
q12	759	587	594	587
q13	17798	2925	2948	2925
q14	302	275	275	275
q15	511	467	458	458
q16	465	405	410	405
q17	1737	1481	1436	1436
q18	7488	7611	7294	7294
q19	1670	1543	1570	1543
q20	1964	1733	1729	1729
q21	4799	4715	4654	4654
q22	540	455	452	452
Total cold run time: 71434 ms
Total hot run time: 53737 ms
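
For a rough sense of how the two TPC-H reports in this thread differ, the hot-run totals can be compared directly. This is a sketch only: both reports are runs of this PR's branch at different commits, so the difference reflects run-to-run variation plus whatever changed between the commits, not a comparison against a baseline. The helper name percent_change is made up for the example.

```python
# Hot-run totals (ms) copied from the two TPC-H reports above, keyed by commit prefix.
totals_ms = {
    "6aa4266": {"round_1": 38737, "round_2": 53984},
    "78e6ecd": {"round_1": 38777, "round_2": 53737},
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


for round_name in ("round_1", "round_2"):
    before = totals_ms["6aa4266"][round_name]
    after = totals_ms["78e6ecd"][round_name]
    print(f"{round_name}: {percent_change(before, after):+.2f}%")
# round_1: +0.10%
# round_2: -0.46%
```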

@doris-robot

TPC-DS: Total hot run time: 182549 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 78e6ecd485ea94025feb7c77edb97dfc80a883e8, data reload: false

query1	890	1126	1120	1120
query2	6505	2590	2455	2455
query3	6657	206	206	206
query4	37771	21570	21353	21353
query5	4159	384	400	384
query6	225	177	175	175
query7	4059	291	280	280
query8	224	175	175	175
query9	5770	2300	2306	2300
query10	540	248	242	242
query11	14662	14224	14172	14172
query12	140	91	88	88
query13	995	370	349	349
query14	8939	6651	6656	6651
query15	197	176	175	175
query16	7172	265	254	254
query17	1483	568	552	552
query18	1494	274	269	269
query19	194	151	151	151
query20	92	87	87	87
query21	206	125	125	125
query22	4963	4882	4850	4850
query23	33559	33048	32773	32773
query24	12890	2930	2809	2809
query25	554	363	369	363
query26	1864	152	151	151
query27	3147	305	308	305
query28	7818	2037	2023	2023
query29	863	601	590	590
query30	303	156	160	156
query31	886	698	720	698
query32	60	52	53	52
query33	589	253	257	253
query34	889	483	485	483
query35	840	693	698	693
query36	1067	929	944	929
query37	285	69	69	69
query38	3517	3461	3440	3440
query39	1572	1532	1523	1523
query40	272	128	126	126
query41	46	48	43	43
query42	106	96	91	91
query43	572	561	559	559
query44	1420	698	698	698
query45	285	272	258	258
query46	1050	712	729	712
query47	1970	1850	1845	1845
query48	349	293	286	286
query49	1139	369	360	360
query50	750	375	370	370
query51	6656	6560	6510	6510
query52	104	86	93	86
query53	353	274	274	274
query54	254	227	218	218
query55	76	72	71	71
query56	243	222	238	222
query57	1198	1128	1131	1128
query58	229	203	196	196
query59	3507	3400	3230	3230
query60	254	232	235	232
query61	91	91	94	91
query62	640	428	440	428
query63	296	275	270	270
query64	4964	3673	4089	3673
query65	3091	3010	2999	2999
query66	1321	342	317	317
query67	15704	14981	14940	14940
query68	4777	528	536	528
query69	516	291	300	291
query70	1272	1149	1207	1149
query71	424	272	265	265
query72	6498	2610	2415	2415
query73	724	315	310	310
query74	6836	6384	6478	6384
query75	3117	2370	2350	2350
query76	3152	1101	1142	1101
query77	656	246	245	245
query78	10910	10234	10170	10170
query79	2964	515	510	510
query80	2106	429	421	421
query81	535	234	236	234
query82	1108	96	99	96
query83	343	181	180	180
query84	269	86	93	86
query85	1731	312	308	308
query86	476	297	294	294
query87	3828	3511	3509	3509
query88	5379	2342	2268	2268
query89	486	371	386	371
query90	1977	177	174	174
query91	130	111	108	108
query92	62	49	48	48
query93	4884	514	491	491
query94	1285	182	180	180
query95	389	300	284	284
query96	593	269	260	260
query97	2671	2505	2506	2505
query98	241	231	212	212
query99	1245	846	869	846
Total cold run time: 296564 ms
Total hot run time: 182549 ms

@doris-robot

ClickBench: Total hot run time: 30.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 78e6ecd485ea94025feb7c77edb97dfc80a883e8, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.03	0.04
query3	0.22	0.05	0.05
query4	1.68	0.06	0.08
query5	0.49	0.50	0.48
query6	1.45	0.66	0.66
query7	0.02	0.01	0.02
query8	0.05	0.04	0.04
query9	0.56	0.50	0.50
query10	0.55	0.56	0.55
query11	0.14	0.11	0.11
query12	0.14	0.12	0.12
query13	0.59	0.58	0.57
query14	0.77	0.76	0.78
query15	0.82	0.80	0.81
query16	0.37	0.36	0.37
query17	0.93	1.00	1.01
query18	0.22	0.23	0.24
query19	1.87	1.65	1.75
query20	0.02	0.01	0.01
query21	15.40	0.66	0.65
query22	4.26	7.40	1.95
query23	18.31	1.30	1.25
query24	1.77	0.22	0.23
query25	0.15	0.08	0.08
query26	0.26	0.16	0.15
query27	0.08	0.07	0.08
query28	13.46	0.99	0.98
query29	12.60	3.29	3.29
query30	0.26	0.06	0.06
query31	2.86	0.37	0.38
query32	3.28	0.48	0.45
query33	2.78	2.81	2.81
query34	17.17	4.37	4.40
query35	4.47	4.45	4.45
query36	0.65	0.46	0.47
query37	0.19	0.16	0.15
query38	0.16	0.14	0.15
query39	0.04	0.04	0.03
query40	0.18	0.14	0.15
query41	0.10	0.05	0.05
query42	0.05	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 109.53 s
Total hot run time: 30.22 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 78e6ecd485ea94025feb7c77edb97dfc80a883e8 with default session variables
Stream load json:         18 seconds loaded 2358488459 Bytes, about 124 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       13.2 seconds inserted 10000000 Rows, about 757K ops/s
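
The throughput figures above can be reproduced from the byte counts and durations. The reported rates are consistent with binary megabytes (MiB) rounded down; with decimal megabytes the JSON load would come out at about 131 MB/s instead of 124. The sketch below recomputes them; the helper name mib_per_second is made up for the example.

```python
MIB = 1024 * 1024

# (format, seconds, bytes loaded) copied from the load test report above.
stream_loads = [
    ("json", 18, 2358488459),
    ("orc", 58, 1101869774),
    ("parquet", 32, 861443392),
]


def mib_per_second(num_bytes, seconds):
    """Throughput in whole MiB per second, rounded down."""
    return num_bytes // seconds // MIB


for fmt, seconds, num_bytes in stream_loads:
    print(f"Stream load {fmt}: about {mib_per_second(num_bytes, seconds)} MB/s")
# Stream load json: about 124 MB/s
# Stream load orc: about 18 MB/s
# Stream load parquet: about 25 MB/s

# Insert into select: 10,000,000 rows in 13.2 seconds, in thousands of rows per second.
insert_kops = int(10_000_000 / 13.2) // 1000
print(f"Insert into select: about {insert_kops}K ops/s")  # about 757K ops/s
```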

Contributor

@HappenLee HappenLee left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 14, 2024
@HappenLee HappenLee merged commit 8d773a7 into apache:master Apr 16, 2024
morningman pushed a commit to apache/doris-website that referenced this pull request Apr 17, 2024
…ntersect functions (#560)

- hll_from_base64, hll_to_base64:
master pr: apache/doris#32089

- group_array_intersect:
master pr: apache/doris#33265
morrySnow added a commit to morrySnow/incubator-doris that referenced this pull request Apr 18, 2024
1. macOS uses libhdfs3, so a different function must be called.
    This compile error was introduced by PR apache#33680.
2. size_t is not the same type as UInt64 on macOS.
    This compile error was introduced by PR apache#33265.
morrySnow added a commit that referenced this pull request Apr 18, 2024
1. macOS uses libhdfs3, so a different function must be called.
    This compile error was introduced by PR #33680.
2. size_t is not the same type as UInt64 on macOS.
    This compile error was introduced by PR #33265.
dataroaring pushed a commit to dataroaring/incubator-doris that referenced this pull request Apr 20, 2024
dataroaring pushed a commit to dataroaring/incubator-doris that referenced this pull request Apr 20, 2024
1. macOS uses libhdfs3, so a different function must be called.
    This compile error was introduced by PR apache#33680.
2. size_t is not the same type as UInt64 on macOS.
    This compile error was introduced by PR apache#33265.
cambyzju added a commit to cambyzju/incubator-doris that referenced this pull request Apr 26, 2024
1. macOS uses libhdfs3, so a different function must be called.
    This compile error was introduced by PR apache#33680.
2. size_t is not the same type as UInt64 on macOS.
    This compile error was introduced by PR apache#33265.

Labels

approved, dev/3.0.0-merged, reviewed

8 participants