@github-actions
Contributor

Cherry-picked from #47846

…7846)

This pull request improves the handling of null values in the inverted
index writer and simplifies the code by removing redundant null map
checks. The main changes remove unnecessary null map handling from
several methods and make sure the null bitmap is updated when a null
document is added.

Improvements to null value handling and code simplification:

* [`be/src/olap/rowset/segment_v2/column_writer.cpp`](diffhunk://#diff-db6023c6e1df0c3616055f02e769cc20fcef7ee083cb3755cec1b661bb7b42ffL952-L958):
Removed redundant null map handling in the
`ArrayColumnWriter::append_nullable` method.
* [`be/src/olap/rowset/segment_v2/inverted_index_writer.cpp`](diffhunk://#diff-97781916b276f771710ab520c79ca29d5e4e331296fad7573fc9933a376dc165L328-R328):
Simplified the `add_array_nulls` method to always return `Status::OK()`.
* [`be/src/olap/rowset/segment_v2/inverted_index_writer.cpp`](diffhunk://#diff-97781916b276f771710ab520c79ca29d5e4e331296fad7573fc9933a376dc165L429-R426):
Added a null map check before accessing elements in the loop to prevent
a potential null pointer dereference.
[[1]](diffhunk://#diff-97781916b276f771710ab520c79ca29d5e4e331296fad7573fc9933a376dc165L429-R426)
[[2]](diffhunk://#diff-97781916b276f771710ab520c79ca29d5e4e331296fad7573fc9933a376dc165L525-R531)
* [`be/src/olap/rowset/segment_v2/inverted_index_writer.cpp`](diffhunk://#diff-97781916b276f771710ab520c79ca29d5e4e331296fad7573fc9933a376dc165R513):
Updated `_null_bitmap` in the `add_null_document` method so that null
documents are recorded in the null bitmap.
* [`be/src/olap/task/index_builder.cpp`](diffhunk://#diff-df38b3b177cd231676ce7a405526b3419c543e29171143ddec02960a84a930c6L645-R645):
Removed redundant null map handling in the
`IndexBuilder::_add_nullable` method.
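
The null handling listed above can be illustrated with a minimal sketch. This is not the Doris implementation: `SketchIndexWriter`, `add_values`, and the use of `std::set` in place of a real bitmap type are hypothetical, and only the names `add_null_document` and `_null_bitmap` are borrowed from the description. It shows two ideas: recording the row id in the null bitmap when a null document is added, and checking that the null map pointer is non-null before reading it inside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an inverted index writer; not Doris code.
class SketchIndexWriter {
public:
    // Record the current row as null in the bitmap, then advance the row id.
    void add_null_document() {
        _null_bitmap.insert(_rid);
        ++_rid;
    }

    // Add `count` values. `null_map` may be nullptr when the column has no
    // nulls, so the pointer is checked before each element is read.
    void add_values(const std::string* values, const uint8_t* null_map, size_t count) {
        assert(values != nullptr || count == 0);
        for (size_t i = 0; i < count; ++i) {
            if (null_map != nullptr && null_map[i] != 0) {
                add_null_document();
                continue;
            }
            _docs.push_back(values[i]);
            ++_rid;
        }
    }

    const std::set<uint32_t>& null_rows() const { return _null_bitmap; }
    size_t indexed_count() const { return _docs.size(); }
    uint32_t next_row_id() const { return _rid; }

private:
    uint32_t _rid = 0;                 // next row id to assign
    std::set<uint32_t> _null_bitmap;   // row ids of null documents
    std::vector<std::string> _docs;    // non-null values, in insertion order
};
```

In this sketch a caller can pass `nullptr` as the null map for a column without nulls, and every row, null or not, still consumes exactly one row id.
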
@Thearas
Contributor

Thearas commented Feb 19, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring reopened this Feb 19, 2025
@Thearas
Contributor

Thearas commented Feb 19, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 40931 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d9858284d3e37349e08f6386f11e0e116e31ef08, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17599	7509	7293	7293
q2	2067	172	172	172
q3	10582	1082	1224	1082
q4	10572	745	715	715
q5	7740	2923	2891	2891
q6	240	153	151	151
q7	982	615	597	597
q8	9363	1959	2030	1959
q9	6656	6418	6406	6406
q10	7036	2331	2292	2292
q11	472	262	267	262
q12	403	214	212	212
q13	17793	3014	3024	3014
q14	249	222	216	216
q15	575	509	525	509
q16	665	603	574	574
q17	988	546	548	546
q18	7328	6778	6724	6724
q19	1425	1104	1015	1015
q20	497	207	201	201
q21	4083	3207	3121	3121
q22	1124	979	1009	979
Total cold run time: 108439 ms
Total hot run time: 40931 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7338	7213	7246	7213
q2	334	230	242	230
q3	2971	2959	2951	2951
q4	2048	1842	1824	1824
q5	5672	5738	5747	5738
q6	219	140	140	140
q7	2226	1812	1850	1812
q8	3379	3574	3569	3569
q9	8879	8947	8913	8913
q10	3658	3620	3581	3581
q11	594	507	495	495
q12	799	595	625	595
q13	9637	3251	3164	3164
q14	297	298	270	270
q15	582	547	506	506
q16	716	641	658	641
q17	1854	1642	1625	1625
q18	8279	7885	7625	7625
q19	1654	1510	1654	1510
q20	2121	1834	1861	1834
q21	5666	5635	5309	5309
q22	1140	1055	1053	1053
Total cold run time: 70063 ms
Total hot run time: 60598 ms
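
As a check on how the totals relate to the per-query rows, here is a small sketch. The column meaning (one cold run, two hot runs, then the faster of the two hot runs) is inferred from the numbers rather than stated by the bot, and `Row`, `total_cold`, `total_hot`, and `sample_rows` are hypothetical names. Summing the first column and the per-query faster hot run over all 22 Round 1 rows reproduces the reported 108439 ms and 40931 ms; the sample below uses only the first three Round 1 rows for brevity.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One benchmark row: a cold run followed by two hot runs, in milliseconds.
struct Row {
    int64_t cold;
    int64_t hot1;
    int64_t hot2;
};

// Total cold time: sum of the cold column.
int64_t total_cold(const std::vector<Row>& rows) {
    int64_t sum = 0;
    for (const Row& r : rows) sum += r.cold;
    return sum;
}

// Total hot time: sum of the faster of the two hot runs for each query.
int64_t total_hot(const std::vector<Row>& rows) {
    int64_t sum = 0;
    for (const Row& r : rows) sum += std::min(r.hot1, r.hot2);
    return sum;
}

// First three rows (q1 to q3) of TPC-H Round 1 from the table above.
std::vector<Row> sample_rows() {
    return {{17599, 7509, 7293}, {2067, 172, 172}, {10582, 1082, 1224}};
}
```
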

@doris-robot

TPC-DS: Total hot run time: 196661 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d9858284d3e37349e08f6386f11e0e116e31ef08, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1301	943	936	936
query2	6473	2109	2087	2087
query3	10910	4440	4369	4369
query4	67305	29308	23347	23347
query5	4958	457	439	439
query6	424	191	188	188
query7	5661	315	308	308
query8	307	229	221	221
query9	9196	2702	2698	2698
query10	484	271	266	266
query11	17811	15345	15824	15345
query12	163	103	105	103
query13	1567	453	438	438
query14	10025	7295	7560	7295
query15	198	184	184	184
query16	7093	467	535	467
query17	1092	604	607	604
query18	1810	341	331	331
query19	249	157	171	157
query20	119	107	114	107
query21	212	105	105	105
query22	4791	4359	4490	4359
query23	34661	34058	34210	34058
query24	6145	2919	2894	2894
query25	535	423	418	418
query26	658	176	182	176
query27	2066	374	362	362
query28	4386	2529	2466	2466
query29	687	437	419	419
query30	246	161	160	160
query31	1007	837	839	837
query32	72	57	58	57
query33	424	288	287	287
query34	926	502	514	502
query35	841	721	726	721
query36	1101	963	984	963
query37	124	76	71	71
query38	4194	4075	3972	3972
query39	1509	1496	1459	1459
query40	210	100	98	98
query41	50	47	47	47
query42	111	103	106	103
query43	545	517	502	502
query44	1221	866	849	849
query45	188	168	165	165
query46	1159	741	727	727
query47	2038	1997	1986	1986
query48	473	386	392	386
query49	747	394	396	394
query50	840	437	429	429
query51	7327	7243	7104	7104
query52	100	86	89	86
query53	261	184	181	181
query54	558	451	456	451
query55	80	80	73	73
query56	249	244	244	244
query57	1262	1127	1127	1127
query58	229	212	201	201
query59	3138	2929	2964	2929
query60	279	248	259	248
query61	108	106	108	106
query62	834	727	701	701
query63	220	185	187	185
query64	1376	692	647	647
query65	3312	3184	3212	3184
query66	654	295	323	295
query67	16117	15861	15629	15629
query68	4234	596	574	574
query69	408	265	262	262
query70	1207	1080	1081	1080
query71	345	259	258	258
query72	6474	2511	3786	2511
query73	754	358	351	351
query74	9904	9172	9241	9172
query75	3332	2649	2643	2643
query76	1846	1083	1142	1083
query77	497	296	282	282
query78	10564	9550	9634	9550
query79	1858	606	606	606
query80	1432	429	417	417
query81	533	248	237	237
query82	1253	117	118	117
query83	175	155	146	146
query84	276	83	80	80
query85	1006	304	298	298
query86	427	303	294	294
query87	4525	4209	4232	4209
query88	3928	2418	2415	2415
query89	428	295	288	288
query90	1838	188	187	187
query91	181	149	171	149
query92	66	51	52	51
query93	1986	558	555	555
query94	773	307	291	291
query95	351	257	260	257
query96	623	284	284	284
query97	3328	3168	3260	3168
query98	212	208	195	195
query99	1704	1403	1398	1398
Total cold run time: 321885 ms
Total hot run time: 196661 ms

@doris-robot

ClickBench: Total hot run time: 33.43 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d9858284d3e37349e08f6386f11e0e116e31ef08, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.07	0.04	0.03
query3	0.23	0.07	0.07
query4	1.62	0.10	0.10
query5	0.54	0.51	0.51
query6	1.14	0.73	0.72
query7	0.02	0.02	0.02
query8	0.04	0.03	0.04
query9	0.57	0.52	0.49
query10	0.56	0.55	0.56
query11	0.14	0.10	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.59
query14	2.72	2.82	2.72
query15	0.91	0.83	0.84
query16	0.38	0.38	0.37
query17	0.97	1.06	1.01
query18	0.23	0.22	0.22
query19	1.96	1.87	2.05
query20	0.02	0.01	0.01
query21	15.37	0.59	0.59
query22	2.90	3.06	1.84
query23	17.14	0.84	0.90
query24	3.34	1.46	1.74
query25	0.20	0.42	0.08
query26	0.43	0.14	0.14
query27	0.04	0.03	0.04
query28	9.60	1.11	1.07
query29	12.59	3.22	3.24
query30	0.24	0.05	0.05
query31	2.87	0.40	0.39
query32	3.25	0.47	0.46
query33	3.03	2.99	3.00
query34	17.08	4.52	4.60
query35	4.54	4.56	4.58
query36	0.68	0.48	0.51
query37	0.09	0.06	0.07
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.15	0.13	0.12
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 106.7 s
Total hot run time: 33.43 s

Contributor

@dataroaring left a comment

LGTM

@dataroaring merged commit 232f310 into branch-3.0 Feb 24, 2025
21 of 24 checks passed
@github-actions bot deleted the auto-pick-47846-branch-3.0 branch February 24, 2025 03:30
@gavinchou mentioned this pull request Apr 23, 2025