@zzzxl1993
Contributor

@zzzxl1993 zzzxl1993 commented Mar 12, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 12, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zzzxl1993 zzzxl1993 marked this pull request as draft April 14, 2025 02:33
@zzzxl1993 zzzxl1993 force-pushed the 202503121322 branch 2 times, most recently from 75a9b79 to 2dc3954 May 11, 2025 09:08
@zzzxl1993 zzzxl1993 marked this pull request as ready for review May 11, 2025 09:22
gavinchou previously approved these changes May 11, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 11, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label May 23, 2025
Member

@airborne12 airborne12 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 26, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

Copilot AI left a comment

Pull Request Overview

This PR adds IOContext propagation to various inverted index reader and query components and extends file cache statistics and profiling to separately track inverted-index-specific I/O metrics.

  • Pass io::IOContext through inverted index readers, visitors, and query classes for context-aware I/O.
  • Introduce new fields in FileCacheStatistics and FileCacheProfileReporter for inverted-index local/remote I/O counts, bytes, and timers.
  • Update CachedRemoteFileReader to increment the new inverted-index counters when reading data.
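
The flow described above can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the struct layouts, the `is_inverted_index` flag living on `IOContext`, the `inverted_index_bytes_read_from_remote` field, the `FakeCachedReader` class, and the free function `update_stats` are assumptions made for the example. Only the names `FileCacheStatistics`, `IOContext`, `inverted_index_num_local_io_total`, `inverted_index_num_remote_io_total`, and `inverted_index_bytes_read_from_local` appear in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the statistics struct; the real one has more fields.
struct FileCacheStatistics {
    int64_t num_local_io_total = 0;   // assumed general counters
    int64_t num_remote_io_total = 0;
    int64_t bytes_read_from_local = 0;
    int64_t bytes_read_from_remote = 0;
    // Inverted-index-specific counters (the remote byte field is assumed).
    int64_t inverted_index_num_local_io_total = 0;
    int64_t inverted_index_num_remote_io_total = 0;
    int64_t inverted_index_bytes_read_from_local = 0;
    int64_t inverted_index_bytes_read_from_remote = 0;
};

// Simplified stand-in for io::IOContext; placing the flag here is an assumption.
struct IOContext {
    bool is_inverted_index = false;
    FileCacheStatistics* file_cache_stats = nullptr;
};

// Hypothetical helper playing the role of _update_stats: general counters are
// always updated, inverted-index counters only when the context says so.
inline void update_stats(const IOContext* io_ctx, bool cache_hit, int64_t bytes) {
    if (io_ctx == nullptr || io_ctx->file_cache_stats == nullptr) {
        return;
    }
    FileCacheStatistics& s = *io_ctx->file_cache_stats;
    if (cache_hit) {
        ++s.num_local_io_total;
        s.bytes_read_from_local += bytes;
    } else {
        ++s.num_remote_io_total;
        s.bytes_read_from_remote += bytes;
    }
    if (io_ctx->is_inverted_index) {
        if (cache_hit) {
            ++s.inverted_index_num_local_io_total;
            s.inverted_index_bytes_read_from_local += bytes;
        } else {
            ++s.inverted_index_num_remote_io_total;
            s.inverted_index_bytes_read_from_remote += bytes;
        }
    }
}

// Hypothetical reader: the caller passes the context down, so the lowest
// layer can attribute the read to the right set of counters.
struct FakeCachedReader {
    void read_at(int64_t bytes, bool cache_hit, const IOContext* io_ctx) const {
        update_stats(io_ctx, cache_hit, bytes);
    }
};
```

Because the context travels with each call, one shared `FileCacheStatistics` can serve both index reads and ordinary data reads, and only reads tagged as inverted-index touch the new fields.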

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| be/src/olap/rowset/segment_v2/inverted_index_reader.{h,cpp} | Add io_ctx parameter paths and store in InvertedIndexVisitor. |
| be/src/olap/rowset/segment_v2/inverted_index/util/*.h | Add io_ctx to ensure_term_* helper signatures. |
| be/src/olap/rowset/segment_v2/inverted_index/query/*.{h,cpp} | Store and use _io_ctx when calling Lucene readers. |
| be/src/io/io_common.h | Define new inverted-index-specific statistics fields. |
| be/src/io/cache/cached_remote_file_reader.cpp | Update inverted-index I/O stats in _update_stats. |
| be/src/io/cache/block_file_cache_profile.h | Add and update profiling counters for inverted-index I/O metrics. |

Comments suppressed due to low confidence (3)

be/src/io/cache/block_file_cache_profile.h:129

  • [nitpick] The counter name 'inverted_index_bytes_scanned_from_cache' is inconsistent with the statistic field 'inverted_index_bytes_read_from_local'. Consider renaming one to match the other (e.g., use 'bytes_read_from_cache' or rename the struct field to 'bytes_scanned_from_cache').
inverted_index_bytes_scanned_from_cache = ADD_CHILD_COUNTER_WITH_LEVEL(

be/src/io/cache/block_file_cache_profile.h:159

  • [nitpick] Updating 'inverted_index_bytes_scanned_from_cache' from 'statistics->inverted_index_bytes_read_from_local' highlights the naming mismatch. Aligning these names will reduce confusion when analyzing profiling output.
COUNTER_UPDATE(inverted_index_bytes_scanned_from_cache,

be/src/io/cache/cached_remote_file_reader.cpp:357

  • New inverted-index-specific counters are updated here. Consider adding or updating unit tests to verify that 'inverted_index_num_local_io_total', 'inverted_index_num_remote_io_total', and related byte/timer counters are correctly incremented under both cache hits and misses.
if (is_inverted_index) {
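
A test along the lines suggested above could take roughly this shape. It is a sketch under stated assumptions rather than a Doris unit test: `InvertedIndexIoStats`, `record_read`, `Scenario`, and `run_scenarios` are hypothetical names, and the mapping of a cache hit to the "local" counter and a miss to the "remote" counter is assumed, not confirmed by this thread.

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-in holding only the two I/O counters named in the comment.
struct InvertedIndexIoStats {
    int64_t inverted_index_num_local_io_total = 0;
    int64_t inverted_index_num_remote_io_total = 0;
};

// Hypothetical model of the branch guarded by `if (is_inverted_index) {`.
// Assumption: a cache hit counts as local I/O, a miss as remote I/O.
inline void record_read(InvertedIndexIoStats& stats, bool is_inverted_index, bool cache_hit) {
    if (!is_inverted_index) {
        return;
    }
    if (cache_hit) {
        ++stats.inverted_index_num_local_io_total;
    } else {
        ++stats.inverted_index_num_remote_io_total;
    }
}

// One scenario: the read performed and the counter values expected afterwards.
struct Scenario {
    bool is_inverted_index;
    bool cache_hit;
    int64_t expected_local;
    int64_t expected_remote;
};

// Returns true when every scenario leaves the counters at the expected values.
inline bool run_scenarios() {
    const Scenario scenarios[] = {
            {true, true, 1, 0},   // index read, cache hit
            {true, false, 0, 1},  // index read, cache miss
            {false, true, 0, 0},  // non-index read must not touch the counters
            {false, false, 0, 0},
    };
    for (const Scenario& sc : scenarios) {
        InvertedIndexIoStats stats;  // fresh counters per scenario
        record_read(stats, sc.is_inverted_index, sc.cache_hit);
        if (stats.inverted_index_num_local_io_total != sc.expected_local) {
            return false;
        }
        if (stats.inverted_index_num_remote_io_total != sc.expected_remote) {
            return false;
        }
    }
    return true;
}
```

Listing the non-index rows explicitly guards against the new counters being incremented for ordinary data reads, which a hits-and-misses-only test would not catch.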

@airborne12
Member

run buildall

@doris-robot

TPC-H: Total hot run time: 33874 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit eda04a6bcab9d03c95382ed54996412d0fc2be2c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	26192	5078	5032	5032
q2	1956	285	183	183
q3	10402	1246	725	725
q4	10230	973	499	499
q5	7661	2403	2312	2312
q6	192	160	131	131
q7	897	754	601	601
q8	9311	1228	1106	1106
q9	6814	5122	5116	5116
q10	6863	2288	1906	1906
q11	482	293	288	288
q12	347	351	217	217
q13	17793	3714	3116	3116
q14	239	239	213	213
q15	574	483	478	478
q16	432	426	390	390
q17	586	861	359	359
q18	7768	7214	7104	7104
q19	1684	955	558	558
q20	331	341	219	219
q21	3707	2558	2335	2335
q22	1019	986	989	986
Total cold run time: 115480 ms
Total hot run time: 33874 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	5173	5092	5082	5082
q2	239	318	215	215
q3	2163	2626	2278	2278
q4	1333	1760	1301	1301
q5	4489	4368	4393	4368
q6	221	185	128	128
q7	2072	1915	1789	1789
q8	2579	2682	2558	2558
q9	7207	7230	6898	6898
q10	3080	3231	2770	2770
q11	577	509	498	498
q12	670	757	612	612
q13	3595	3946	3290	3290
q14	299	319	269	269
q15	534	480	487	480
q16	467	482	428	428
q17	1137	1532	1412	1412
q18	7760	7692	7414	7414
q19	835	769	858	769
q20	2008	2062	1850	1850
q21	4860	4493	4440	4440
q22	1110	1062	1005	1005
Total cold run time: 52408 ms
Total hot run time: 49854 ms

@doris-robot

TPC-DS: Total hot run time: 192977 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit eda04a6bcab9d03c95382ed54996412d0fc2be2c, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1383	1114	1070	1070
query2	6339	1844	1851	1844
query3	11212	4549	4552	4549
query4	52126	25413	23064	23064
query5	5143	589	472	472
query6	365	218	201	201
query7	4885	519	316	316
query8	285	225	217	217
query9	5337	2647	2653	2647
query10	428	320	270	270
query11	15057	14985	14907	14907
query12	157	109	109	109
query13	1028	540	413	413
query14	10154	6599	6505	6505
query15	214	209	189	189
query16	7132	664	499	499
query17	1087	756	646	646
query18	1563	419	327	327
query19	212	212	182	182
query20	137	132	131	131
query21	211	127	115	115
query22	4185	4341	4239	4239
query23	34244	33681	33669	33669
query24	6525	2478	2447	2447
query25	457	468	396	396
query26	717	278	158	158
query27	2309	507	369	369
query28	3196	2216	2200	2200
query29	587	566	461	461
query30	283	237	194	194
query31	848	860	795	795
query32	80	64	64	64
query33	487	386	333	333
query34	828	891	546	546
query35	808	826	753	753
query36	939	994	912	912
query37	116	111	81	81
query38	4232	4235	4235	4235
query39	1506	1471	1455	1455
query40	214	130	116	116
query41	67	65	58	58
query42	131	112	116	112
query43	511	523	477	477
query44	1393	855	855	855
query45	185	183	176	176
query46	888	1062	657	657
query47	1848	1888	1817	1817
query48	417	440	319	319
query49	671	499	409	409
query50	684	716	433	433
query51	4289	4256	4188	4188
query52	112	118	107	107
query53	247	268	195	195
query54	607	599	531	531
query55	95	95	88	88
query56	311	327	300	300
query57	1218	1229	1113	1113
query58	275	285	272	272
query59	2660	2868	2682	2682
query60	332	330	313	313
query61	157	127	129	127
query62	730	757	698	698
query63	243	203	188	188
query64	1893	1063	712	712
query65	4355	4200	4174	4174
query66	732	416	359	359
query67	15990	15445	15350	15350
query68	6831	923	527	527
query69	537	319	281	281
query70	1123	1148	1087	1087
query71	500	330	309	309
query72	5938	4772	4639	4639
query73	1497	588	356	356
query74	8937	9100	8643	8643
query75	4011	3209	2733	2733
query76	4204	1220	771	771
query77	694	396	312	312
query78	10165	10271	9290	9290
query79	2277	833	575	575
query80	614	530	480	480
query81	469	257	221	221
query82	439	129	97	97
query83	260	296	238	238
query84	285	109	85	85
query85	779	361	319	319
query86	372	301	283	283
query87	4333	4382	4254	4254
query88	3771	2341	2319	2319
query89	405	314	283	283
query90	1828	220	214	214
query91	146	143	121	121
query92	79	58	60	58
query93	1759	987	578	578
query94	661	407	315	315
query95	442	299	295	295
query96	512	578	287	287
query97	2741	2752	2717	2717
query98	240	207	199	199
query99	1386	1438	1285	1285
Total cold run time: 296292 ms
Total hot run time: 192977 ms

@doris-robot

ClickBench: Total hot run time: 29.41 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit eda04a6bcab9d03c95382ed54996412d0fc2be2c, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.04	0.03
query2	0.12	0.10	0.12
query3	0.26	0.19	0.19
query4	1.60	0.19	0.20
query5	0.48	0.46	0.44
query6	1.20	0.67	0.65
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.59	0.51	0.54
query10	0.56	0.59	0.57
query11	0.15	0.12	0.11
query12	0.15	0.12	0.12
query13	0.60	0.60	0.60
query14	0.80	0.81	0.81
query15	0.89	0.85	0.87
query16	0.39	0.37	0.37
query17	1.03	1.04	1.05
query18	0.21	0.21	0.20
query19	1.92	1.82	1.79
query20	0.01	0.01	0.01
query21	15.40	0.90	0.55
query22	0.76	1.17	0.71
query23	14.90	1.40	0.65
query24	6.87	1.32	0.92
query25	0.49	0.14	0.25
query26	0.50	0.16	0.15
query27	0.05	0.05	0.06
query28	9.99	0.93	0.44
query29	12.54	4.10	3.33
query30	0.26	0.10	0.08
query31	2.81	0.62	0.39
query32	3.25	0.57	0.47
query33	3.13	3.20	3.04
query34	15.77	5.13	4.56
query35	4.57	4.63	4.51
query36	0.71	0.50	0.49
query37	0.08	0.06	0.07
query38	0.05	0.03	0.05
query39	0.03	0.02	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 103.55 s
Total hot run time: 29.41 s

@airborne12 airborne12 merged commit 8e975f0 into apache:master Jun 3, 2025
26 of 29 checks passed
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…ted index in file cache scenarios (apache#48950)

Problem Summary:
add io statistics in file cache stats for inverted index
koarz pushed a commit to koarz/doris that referenced this pull request Jul 3, 2025
Labels

approved (Indicates a PR has been approved by one committer.), cloud, dev/3.0.7-merged, dev/3.1.0-merged, reviewed

7 participants