ct/l1: add shard-local footer cache #30024
Conversation
Pull request overview
This PR introduces a shard-local LRU cache for parsed L1 object footers (keyed by l1::object_id) to avoid repeated footer IO/parsing across read_some() calls, wiring it into both the standard cloud-topics read path and the read-replica path (while leaving compaction readers to bypass it).
Changes:
- Add cloud_topics::l1_footer_cache (LRU of parsed L1 footers) and Bazel targets for it.
- Thread the footer cache through level_one_log_reader_impl and the cloud-topics frontend/read-replica construction paths.
- Extend frontend reader tests/fixtures to include the footer cache and add an eviction test.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/v/kafka/data/cloud_topic_read_replica.h | Adds footer-cache pointer to read-replica partition proxy. |
| src/v/kafka/data/cloud_topic_read_replica.cc | Passes shard-local footer cache into L1 reader creation for read replicas. |
| src/v/cloud_topics/state_accessors.h | Plumbs shard-local footer cache through state accessors. |
| src/v/cloud_topics/level_one/frontend_reader/tests/reader_test.cc | Adds footer-cache eviction test. |
| src/v/cloud_topics/level_one/frontend_reader/tests/l1_reader_fixture.h | Updates test fixture to own/stop a footer cache and pass it to readers. |
| src/v/cloud_topics/level_one/frontend_reader/tests/BUILD | Adds footer-cache dependency for tests. |
| src/v/cloud_topics/level_one/frontend_reader/level_one_reader.h | Extends level_one_log_reader_impl ctor to accept an optional footer cache. |
| src/v/cloud_topics/level_one/frontend_reader/level_one_reader.cc | Uses footer cache in read_footer() (read-through, then populate on miss). |
| src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.h | New LRU cache API/type for footers. |
| src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.cc | New LRU cache implementation (map + intrusive LRU list). |
| src/v/cloud_topics/level_one/frontend_reader/BUILD | Adds new l1_footer_cache library and wires it into reader deps. |
| src/v/cloud_topics/frontend/frontend.cc | Passes shard-local footer cache into standard L1 reader creation. |
| src/v/cloud_topics/BUILD | Adds footer-cache lib dependency to cloud-topics target. |
| src/v/cloud_topics/app.h | Adds sharded l1_footer_cache service member. |
| src/v/cloud_topics/app.cc | Constructs footer-cache shard service and passes it into state_accessors. |
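The new l1_footer_cache is described above as a map plus an intrusive LRU list. A minimal, standalone sketch of that shape (std::unordered_map index over a std::list in recency order) is shown below; the names object_id, footer, and l1_footer_cache mirror the PR, but the types here are simplified stand-ins, not the real Redpanda definitions, and the real cache is a seastar shard-local service rather than a plain class:

```cpp
#include <cstdint>
#include <list>
#include <memory>
#include <unordered_map>
#include <utility>

using object_id = std::uint64_t; // stand-in for l1::object_id
struct footer {                  // stand-in for the parsed l1::footer
    std::uint64_t byte_size = 0;
};

class l1_footer_cache {
public:
    explicit l1_footer_cache(std::size_t max_size)
      : _max_size(max_size) {}

    // Returns the cached footer and marks it most-recently-used,
    // or nullptr on a miss.
    std::shared_ptr<const footer> get(object_id id) {
        auto it = _index.find(id);
        if (it == _index.end()) {
            return nullptr;
        }
        // Move the entry to the front of the recency list.
        _lru.splice(_lru.begin(), _lru, it->second);
        return it->second->second;
    }

    // Inserts (or refreshes) a footer, evicting the least-recently-used
    // entry when the cache is full.
    void put(object_id id, std::shared_ptr<const footer> f) {
        auto it = _index.find(id);
        if (it != _index.end()) {
            it->second->second = std::move(f);
            _lru.splice(_lru.begin(), _lru, it->second);
            return;
        }
        if (_lru.size() >= _max_size) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
        _lru.emplace_front(id, std::move(f));
        _index[id] = _lru.begin();
    }

    std::size_t size() const { return _lru.size(); }

private:
    std::size_t _max_size;
    std::list<std::pair<object_id, std::shared_ptr<const footer>>> _lru;
    std::unordered_map<object_id, decltype(_lru)::iterator> _index;
};
```

Storing shared_ptr<const footer> means a hit hands out shared ownership of the already-parsed footer, so concurrent readers of the same object share one copy.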
CI test results:
- test results on build#82587
- test results on build#83822
- test results on build#83856
```
@@ -398,6 +398,7 @@ std::unique_ptr<model::record_batch_reader::impl> frontend::make_l1_reader(
    auto l1_io = ct_state->local().get_l1_io();
```
Do you have empirical numbers that show how effective the footer cache is? I'd like to understand the ballpark improvement before merging
I don't really expect a noticeable improvement from this. It's useful when reading the same object from many partitions on the same shard, which happens but is marginal. However, it is very simple to add, so I thought it was worth it.
I guess it may also be helpful if there are multiple/repeated readers of the same partition, like maybe iceberg, shadow clusters, compaction, and fetches running concurrently. If that's the case, it might be interesting to see whether there's a meaningful tail latency improvement for fetches.
Force-pushed from d406c10 to 5eb1ec0
Each read_some() call that opens a new L1 object must parse the object footer via IO. Since footers are immutable once written, cache them shard-locally keyed on object_id. Footers are stored as shared_ptr<const l1::footer> so cache hits are zero-copy. The cache is backed by s3-fifo (via chunked_kv_cache) which avoids scan pollution: one-time footer reads are evicted without displacing frequently-accessed entries. Wired into the standard read path and read replicas, but not compaction readers (less latency-sensitive; bypassing saves slots for others). Capacity is tunable via cloud_topics_l1_footer_cache_max_size (default 512 per shard).
Force-pushed from 5eb1ec0 to 918e642
Each read_some() call that opens a new L1 object must parse the footer via IO. Since footers are immutable once written, cache them in a shard-local LRU cache keyed on object_id to avoid repeated reads. This is especially useful because L1 objects should contain partition data from the same topic inserted around the same time.
The footer cache is used by the standard read path and by read replicas. It's not used by compaction readers: they are not as latency sensitive, and bypassing the cache saves slots for others.
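The read-through behavior described in the commit message (consult the cache, do footer IO only on a miss, then populate) can be sketched as below. This is a hedged, synchronous illustration: Footer, FooterCache, and the load callback are hypothetical stand-ins, and the real read_footer() in level_one_reader.cc works with seastar futures against the L1 IO layer.

```cpp
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

struct Footer { int size = 0; }; // stand-in for the parsed l1::footer
using Id = int;                  // stand-in for l1::object_id

// Minimal stand-in cache (no eviction) just to demonstrate the flow.
struct FooterCache {
    std::unordered_map<Id, std::shared_ptr<const Footer>> entries;
    std::shared_ptr<const Footer> get(Id id) {
        auto it = entries.find(id);
        return it == entries.end() ? nullptr : it->second;
    }
    void put(Id id, std::shared_ptr<const Footer> f) {
        entries[id] = std::move(f);
    }
};

// Read-through lookup: cache hit returns the shared parsed footer;
// a miss triggers IO + parse (the `load` callback) and populates the
// cache. A null cache pointer (e.g. compaction readers) bypasses it.
std::shared_ptr<const Footer> read_footer(
  FooterCache* cache,
  Id id,
  const std::function<std::shared_ptr<const Footer>(Id)>& load) {
    if (cache != nullptr) {
        if (auto hit = cache->get(id)) {
            return hit; // zero-copy: shared ownership of the parsed footer
        }
    }
    auto parsed = load(id); // footer IO + parse happens only on a miss
    if (cache != nullptr) {
        cache->put(id, parsed);
    }
    return parsed;
}
```

Passing a null cache for compaction readers keeps them out of the hot set, matching the rationale above: they are less latency-sensitive, and bypassing saves slots for fetch-path readers.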
Backports Required
Release Notes