
Conversation

@kaka11chen
Contributor

@kaka11chen kaka11chen commented Apr 18, 2025

What problem does this PR solve?

Related PR: apache/doris-thirdparty#306
Followup #49835

Problem Summary:
When all row groups in a stripe are filtered out by row group stats, even though the stripe itself was not filtered out by stripe stats, the streams map is not cleared, which causes the reader to read invalid data and fail with an error such as:

ERROR 1105 (HY000): errCode = 2, detailMessage = (172.20.32.136)[INTERNAL_ERROR]Orc row reader nextBatch failed. reason = Buffer error in ZlibDecompressionStream::NextDecompress. 
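
The failure mode described above can be illustrated with a minimal sketch. It is not the actual Doris or ORC reader code: `CachedStream`, `ReaderState`, and `begin_stripe` are hypothetical names, and the order of the prefetch and filter steps is an assumption made for illustration. The point is only that a stripe can pass stripe-level stats while every one of its row groups is rejected, and in that case any cached streams must be dropped before moving on.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a prefetched (merged-IO) stream; not a real ORC type.
struct CachedStream {
    uint64_t stripe_index;
    std::string bytes;
};

// Hypothetical reader state: stream id -> cached stream for the current stripe.
struct ReaderState {
    std::map<uint64_t, CachedStream> streams;
};

// Returns true if at least one row group survives row group stats filtering.
bool any_row_group_selected(const std::vector<bool>& selected) {
    for (bool s : selected) {
        if (s) return true;
    }
    return false;
}

// Prepares a stripe for reading. Returns true if the stripe should be read.
// Assumption: streams are cached once the stripe passes stripe-level stats.
bool begin_stripe(ReaderState& state, uint64_t stripe_index,
                  bool stripe_passes_stats,
                  const std::vector<bool>& row_group_selected) {
    if (!stripe_passes_stats) {
        state.streams.clear();
        return false;
    }
    // Simulate caching one stream for this stripe.
    state.streams[0] = CachedStream{stripe_index, "data"};
    if (!any_row_group_selected(row_group_selected)) {
        // Every row group was filtered out although the stripe was not.
        // Without this clear(), the next stripe would see stale streams.
        state.streams.clear();
        return false;
    }
    return true;
}
```

Calling `begin_stripe` with a stripe that passes stripe stats but has no selected row groups should leave `state.streams` empty, so the next stripe starts from a clean map.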

Release note

[Fix] (orc-reader-merge-io) Clear the streams map when all row groups are filtered out by row group stats, even though the stripe was not filtered out by stripe stats.


Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…filtered by row group stats, despite stripe stats remaining unfiltered.
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.16% (14426/27137)
Line Coverage 42.02% (124996/297469)
Region Coverage 40.83% (63851/156377)
Branch Coverage 35.46% (32103/90532)

@doris-robot

TPC-H: Total hot run time: 35112 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 798d24f0213d2373a106e9f15f3188fa0dedad46, data reload: false

------ Round 1 ----------------------------------
q1	26319	5061	5062	5061
q2	2068	292	185	185
q3	10377	1262	725	725
q4	10226	1041	575	575
q5	7569	2409	2407	2407
q6	196	165	134	134
q7	930	766	619	619
q8	9322	1347	1116	1116
q9	6830	5121	5142	5121
q10	6833	2335	1915	1915
q11	479	285	265	265
q12	356	369	226	226
q13	17779	3673	3098	3098
q14	243	242	215	215
q15	543	491	483	483
q16	453	447	397	397
q17	605	870	354	354
q18	7636	7207	7124	7124
q19	1221	958	566	566
q20	338	330	228	228
q21	4689	3719	3315	3315
q22	1075	1033	983	983
Total cold run time: 116087 ms
Total hot run time: 35112 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5143	5119	5158	5119
q2	242	330	232	232
q3	2185	2668	2290	2290
q4	1459	1931	1489	1489
q5	4460	4467	4422	4422
q6	216	175	129	129
q7	1992	1902	1791	1791
q8	2664	2685	2590	2590
q9	7350	7047	7241	7047
q10	3008	3159	2789	2789
q11	578	533	494	494
q12	699	785	633	633
q13	3511	4011	3368	3368
q14	282	312	268	268
q15	531	475	486	475
q16	480	510	472	472
q17	1200	1519	1409	1409
q18	7807	7683	7522	7522
q19	878	836	962	836
q20	1949	1992	1872	1872
q21	5527	4928	4874	4874
q22	1106	1098	1024	1024
Total cold run time: 53267 ms
Total hot run time: 51145 ms

@doris-robot

TPC-DS: Total hot run time: 191596 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 798d24f0213d2373a106e9f15f3188fa0dedad46, data reload: false

query1	1410	1079	1053	1053
query2	6476	1855	1803	1803
query3	11029	4382	4530	4382
query4	55623	24677	22905	22905
query5	5029	513	449	449
query6	325	192	188	188
query7	4868	493	276	276
query8	303	245	242	242
query9	5649	2534	2511	2511
query10	413	307	260	260
query11	15087	15071	14833	14833
query12	157	113	112	112
query13	1032	496	372	372
query14	10021	6297	6300	6297
query15	191	182	170	170
query16	7158	673	445	445
query17	1073	725	589	589
query18	1618	411	308	308
query19	213	208	164	164
query20	123	121	126	121
query21	208	129	113	113
query22	4460	4433	4328	4328
query23	34309	33393	33297	33297
query24	7109	2413	2442	2413
query25	495	499	460	460
query26	707	284	155	155
query27	2328	511	338	338
query28	3096	2119	2116	2116
query29	597	593	457	457
query30	280	217	194	194
query31	887	880	810	810
query32	78	67	70	67
query33	457	373	320	320
query34	827	883	506	506
query35	798	843	788	788
query36	954	994	910	910
query37	114	105	77	77
query38	4154	4276	4249	4249
query39	1524	1445	1465	1445
query40	212	122	110	110
query41	51	52	52	52
query42	124	106	112	106
query43	494	510	490	490
query44	1403	819	816	816
query45	186	178	166	166
query46	840	1026	636	636
query47	1858	1889	1814	1814
query48	402	422	305	305
query49	677	530	422	422
query50	676	706	405	405
query51	4280	4344	4229	4229
query52	108	103	95	95
query53	229	271	192	192
query54	587	572	516	516
query55	87	82	82	82
query56	305	290	287	287
query57	1201	1188	1106	1106
query58	267	277	253	253
query59	2693	2805	2723	2723
query60	334	319	302	302
query61	136	130	152	130
query62	736	726	701	701
query63	224	188	185	185
query64	1571	1049	755	755
query65	4480	4341	4206	4206
query66	748	403	299	299
query67	15856	15577	15339	15339
query68	7195	896	505	505
query69	535	295	266	266
query70	1212	1086	1113	1086
query71	505	316	318	316
query72	5970	4787	4872	4787
query73	1496	640	338	338
query74	8894	8900	8682	8682
query75	3785	3195	2688	2688
query76	4182	1190	746	746
query77	629	368	286	286
query78	9962	9960	9287	9287
query79	3358	822	554	554
query80	636	530	430	430
query81	474	260	216	216
query82	467	127	94	94
query83	345	246	281	246
query84	288	101	83	83
query85	802	352	314	314
query86	363	324	281	281
query87	4464	4380	4247	4247
query88	3462	2200	2188	2188
query89	398	311	273	273
query90	1951	210	215	210
query91	143	139	115	115
query92	72	64	54	54
query93	2261	934	565	565
query94	679	416	304	304
query95	370	303	282	282
query96	472	558	271	271
query97	3184	3245	3133	3133
query98	223	210	207	207
query99	1472	1432	1297	1297
Total cold run time: 302194 ms
Total hot run time: 191596 ms

@doris-robot

ClickBench: Total hot run time: 29.57 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 798d24f0213d2373a106e9f15f3188fa0dedad46, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.11	0.11
query3	0.24	0.19	0.19
query4	1.60	0.18	0.11
query5	0.58	0.55	0.56
query6	1.18	0.72	0.72
query7	0.02	0.01	0.02
query8	0.04	0.03	0.04
query9	0.60	0.52	0.51
query10	0.58	0.56	0.57
query11	0.16	0.11	0.10
query12	0.15	0.12	0.12
query13	0.61	0.59	0.59
query14	1.15	1.16	1.16
query15	0.87	0.84	0.85
query16	0.38	0.39	0.36
query17	1.06	1.03	1.03
query18	0.21	0.19	0.20
query19	1.93	1.80	1.83
query20	0.02	0.01	0.01
query21	15.39	0.93	0.57
query22	0.76	1.08	0.80
query23	14.93	1.35	0.61
query24	7.49	0.75	1.50
query25	0.49	0.25	0.06
query26	0.55	0.16	0.14
query27	0.05	0.05	0.05
query28	9.91	0.86	0.43
query29	12.54	4.02	3.40
query30	0.25	0.09	0.06
query31	2.84	0.59	0.38
query32	3.23	0.54	0.47
query33	3.01	3.06	3.13
query34	15.80	5.12	4.48
query35	4.53	4.50	4.54
query36	0.66	0.50	0.49
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.02	0.02
Total cold run time: 104.46 s
Total hot run time: 29.57 s

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.65% (14564/26651)
Line Coverage 44.09% (130933/296996)
Region Coverage 41.24% (75375/182788)
Branch Coverage 35.36% (36455/103086)

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 21, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 47f49ea into apache:master Apr 23, 2025
28 of 30 checks passed
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…filtered by row group stats, despite stripe stats remaining unfiltered. (apache#50185)

### What problem does this PR solve?

Related PR: apache/doris-thirdparty#306

Problem Summary:
When all row groups in a stripe are filtered out by row group stats,
even though the stripe itself was not filtered out by stripe stats, the
streams map is not cleared, which causes the reader to read invalid data
and fail with an error such as:

```
ERROR 1105 (HY000): errCode = 2, detailMessage = (172.20.32.136)[INTERNAL_ERROR]Orc row reader nextBatch failed. reason = Buffer error in ZlibDecompressionStream::NextDecompress. 
```
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 24, 2025
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Jun 24, 2025
morrySnow pushed a commit that referenced this pull request Jun 25, 2025

Labels

approved Indicates a PR has been approved by one committer. dev/3.1.0-merged reviewed


7 participants