@AshinGau AshinGau commented May 14, 2024

Proposed changes

If an Iceberg table has equality delete files, the reader must read every delete column from the data file, because rows are deleted by comparing the values in those columns. After the data block has been filtered, the delete columns that were added for this purpose must be removed from the output block so that it matches the expected output slots.
This fixes errors like:

mysql> select count(*) from customer_flink_three_orc;
ERROR 1105 (HY000): errCode = 2, detailMessage = (xxx)[CANCELLED][INTERNAL_ERROR] cur path:
hdfs://xxx:4007/usr/hive/warehouse/hadoop_catalog/multi_catalog/customer_flink_three_orc/data/xxx.orc. 
Can't find the delete column 'c_name' in data file
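
The read, filter, and project sequence described in the proposed changes can be sketched as follows. This is a minimal illustration in Python, not the Doris BE implementation: plain dicts stand in for blocks, and every function name (`columns_to_read`, `apply_equality_deletes`, `project`, `scan`) is hypothetical. It assumes the equality delete file has already been loaded into a set of key tuples.

```python
# Illustrative sketch only: Python dicts stand in for Doris blocks.
# All function names below are hypothetical, not Doris APIs.

def columns_to_read(output_columns, delete_columns):
    """Return the columns to read: output columns first, then any delete
    columns the query did not ask for. Also return those extra columns."""
    extra = [c for c in delete_columns if c not in output_columns]
    return list(output_columns) + extra, extra


def apply_equality_deletes(rows, delete_columns, deleted_keys):
    """Drop every row whose delete-column values match a deleted key."""
    return [
        row for row in rows
        if tuple(row[c] for c in delete_columns) not in deleted_keys
    ]


def project(rows, output_columns):
    """Remove the temporarily added delete columns so that each row
    matches the output slots again."""
    return [{c: row[c] for c in output_columns} for row in rows]


def scan(data_file_rows, output_columns, delete_columns, deleted_keys):
    """Read output + delete columns, filter deleted rows, then project."""
    read_cols, _ = columns_to_read(output_columns, delete_columns)
    block = [{c: row[c] for c in read_cols} for row in data_file_rows]
    block = apply_equality_deletes(block, delete_columns, deleted_keys)
    return project(block, output_columns)
```

For a query such as `select count(*)`, the output column list is empty, so `c_name` would not be read unless it is added explicitly as a delete column. In this sketch, `scan(rows, [], ["c_name"], {("Bob",)})` still filters on `c_name` and returns rows with no columns, which is the shape the output slots expect.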

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@AshinGau
Member Author

run buildall

@AshinGau AshinGau changed the title [fix](iceberg) read the primary key columns if hasing equality delete [fix](iceberg) read all delete columns if hasing equality delete May 14, 2024
@AshinGau AshinGau changed the title [fix](iceberg) read all delete columns if hasing equality delete [fix](iceberg) read all delete columns if having equality delete May 14, 2024
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.66% (8984/25192)
Line Coverage: 27.32% (74273/271818)
Region Coverage: 26.56% (38376/144507)
Branch Coverage: 23.38% (19575/83732)
Coverage Report: http://coverage.selectdb-in.cc/coverage/544a3ddb479ed3af6a0950f56991f5876ecce7cd_544a3ddb479ed3af6a0950f56991f5876ecce7cd/report/index.html

@AshinGau AshinGau force-pushed the fix_iceberg_equality branch from 544a3dd to c1b23ba May 15, 2024 00:17
@AshinGau
Member Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.66% (8984/25194)
Line Coverage: 27.33% (74276/271824)
Region Coverage: 26.56% (38386/144522)
Branch Coverage: 23.38% (19577/83746)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c1b23bae73a8ce8eb45b6381cbc1d78539fca6fb_c1b23bae73a8ce8eb45b6381cbc1d78539fca6fb/report/index.html

@doris-robot

TPC-H: Total hot run time: 41963 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c1b23bae73a8ce8eb45b6381cbc1d78539fca6fb, data reload: false

------ Round 1 ----------------------------------
q1	17608	4373	4219	4219
q2	2014	185	190	185
q3	10464	1289	1170	1170
q4	10199	834	829	829
q5	7482	2768	2707	2707
q6	230	133	133	133
q7	1016	602	589	589
q8	9233	2154	2097	2097
q9	9309	6709	6710	6709
q10	9549	3899	3946	3899
q11	434	239	253	239
q12	480	217	223	217
q13	18294	3216	3222	3216
q14	253	214	221	214
q15	511	463	470	463
q16	506	402	407	402
q17	982	741	746	741
q18	8374	7821	7796	7796
q19	4256	1584	1527	1527
q20	654	318	339	318
q21	5281	4269	4011	4011
q22	367	296	282	282
Total cold run time: 117496 ms
Total hot run time: 41963 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4523	4422	4393	4393
q2	369	267	263	263
q3	3111	2960	2717	2717
q4	1879	1636	1643	1636
q5	5519	5527	5517	5517
q6	213	126	124	124
q7	2348	1984	2002	1984
q8	3237	3416	3385	3385
q9	8691	8648	8591	8591
q10	3949	3889	3858	3858
q11	604	498	519	498
q12	792	612	633	612
q13	15803	3203	3217	3203
q14	303	284	284	284
q15	520	469	471	469
q16	486	431	403	403
q17	1763	1515	1469	1469
q18	7645	7652	7558	7558
q19	1665	1486	1586	1486
q20	2015	1796	1742	1742
q21	10906	4890	4850	4850
q22	606	486	512	486
Total cold run time: 76947 ms
Total hot run time: 55528 ms

@doris-robot

TPC-DS: Total hot run time: 186965 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c1b23bae73a8ce8eb45b6381cbc1d78539fca6fb, data reload: false

query1	901	387	363	363
query2	6447	2451	2295	2295
query3	6664	204	209	204
query4	24443	21237	21267	21237
query5	4167	417	423	417
query6	261	177	186	177
query7	4594	300	296	296
query8	240	190	195	190
query9	8598	2485	2462	2462
query10	437	251	255	251
query11	14700	14173	14151	14151
query12	137	96	92	92
query13	1643	382	383	382
query14	10422	7706	7803	7706
query15	224	171	178	171
query16	7688	263	260	260
query17	1778	542	539	539
query18	1942	272	267	267
query19	197	148	148	148
query20	99	91	99	91
query21	197	132	131	131
query22	5054	4777	4817	4777
query23	34443	33385	33418	33385
query24	8385	2860	2945	2860
query25	605	391	403	391
query26	713	160	157	157
query27	2193	321	337	321
query28	5609	2083	2064	2064
query29	856	622	596	596
query30	230	179	179	179
query31	964	766	760	760
query32	93	61	55	55
query33	588	275	249	249
query34	896	485	475	475
query35	776	688	675	675
query36	1091	942	899	899
query37	110	73	69	69
query38	2925	2757	2704	2704
query39	1602	1574	1557	1557
query40	208	129	123	123
query41	44	42	43	42
query42	102	96	96	96
query43	564	568	530	530
query44	1095	734	749	734
query45	269	251	256	251
query46	1068	739	715	715
query47	1939	1844	1850	1844
query48	374	312	302	302
query49	847	395	392	392
query50	780	383	389	383
query51	6793	6619	6604	6604
query52	104	91	91	91
query53	352	293	291	291
query54	620	428	445	428
query55	76	71	71	71
query56	239	225	214	214
query57	1202	1144	1183	1144
query58	211	200	227	200
query59	3534	3140	3092	3092
query60	248	236	247	236
query61	97	97	83	83
query62	623	474	474	474
query63	318	285	289	285
query64	8488	7419	7396	7396
query65	3176	3132	3107	3107
query66	773	356	339	339
query67	15416	15763	14911	14911
query68	4522	538	535	535
query69	470	300	309	300
query70	1214	1180	1129	1129
query71	371	274	263	263
query72	7182	2522	2375	2375
query73	702	329	323	323
query74	6615	6046	6086	6046
query75	3322	2654	2575	2575
query76	2238	981	1022	981
query77	398	260	266	260
query78	10577	10132	10069	10069
query79	2604	517	518	517
query80	1076	454	440	440
query81	544	241	245	241
query82	812	103	99	99
query83	248	165	163	163
query84	248	88	84	84
query85	1392	273	339	273
query86	454	313	316	313
query87	3292	3118	3081	3081
query88	3990	2448	2456	2448
query89	488	390	397	390
query90	2036	184	186	184
query91	122	94	95	94
query92	57	48	48	48
query93	1881	527	497	497
query94	1249	185	183	183
query95	395	301	296	296
query96	603	277	273	273
query97	3136	2991	3003	2991
query98	237	216	218	216
query99	1164	911	897	897
Total cold run time: 276111 ms
Total hot run time: 186965 ms

@doris-robot

ClickBench: Total hot run time: 30.33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c1b23bae73a8ce8eb45b6381cbc1d78539fca6fb, data reload: false

query1	0.04	0.03	0.04
query2	0.09	0.03	0.04
query3	0.22	0.06	0.05
query4	1.67	0.09	0.09
query5	0.48	0.47	0.50
query6	1.12	0.73	0.72
query7	0.01	0.01	0.01
query8	0.04	0.04	0.05
query9	0.54	0.50	0.48
query10	0.54	0.55	0.54
query11	0.15	0.11	0.11
query12	0.15	0.12	0.12
query13	0.60	0.59	0.60
query14	0.77	0.78	0.78
query15	0.82	0.80	0.80
query16	0.36	0.36	0.36
query17	0.96	0.94	1.00
query18	0.22	0.26	0.24
query19	1.79	1.72	1.73
query20	0.01	0.01	0.01
query21	15.49	0.67	0.64
query22	3.98	7.84	1.75
query23	18.30	1.47	1.38
query24	1.35	0.27	0.26
query25	0.16	0.08	0.08
query26	0.26	0.16	0.16
query27	0.07	0.08	0.07
query28	13.56	1.03	0.99
query29	13.66	3.34	3.34
query30	0.24	0.05	0.07
query31	2.85	0.38	0.38
query32	3.30	0.47	0.46
query33	2.81	2.79	2.86
query34	17.14	4.37	4.41
query35	4.53	4.42	4.50
query36	0.68	0.47	0.50
query37	0.18	0.15	0.15
query38	0.15	0.14	0.14
query39	0.04	0.04	0.04
query40	0.16	0.14	0.15
query41	0.09	0.04	0.05
query42	0.05	0.05	0.04
query43	0.05	0.04	0.04
Total cold run time: 109.68 s
Total hot run time: 30.33 s

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) May 15, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@hubgeter hubgeter left a comment

LGTM

Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.3-merged, dev/3.0.0-merged, reviewed

7 participants