
ct/l1: add shard-local footer cache#30024

Open
wdberkeley wants to merge 1 commit into dev from l1-footer-cache

Conversation

@wdberkeley
Contributor

Each read_some() call that opens a new L1 object must parse the footer via IO. Since footers are immutable once written, cache them in a shard-local LRU cache keyed on object_id to avoid repeated reads. This is especially useful because L1 objects should contain partition data from the same topic inserted around the same time.

The footer cache is used by the standard read path and by read replicas. It's not used by compaction readers: they are not as latency sensitive, and bypassing the cache saves slots for others.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x

Release Notes

  • none

Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a shard-local LRU cache for parsed L1 object footers (keyed by l1::object_id) to avoid repeated footer IO/parsing across read_some() calls, wiring it into both the standard cloud-topics read path and the read-replica path (while leaving compaction readers to bypass it).

Changes:

  • Add cloud_topics::l1_footer_cache (LRU of parsed L1 footers) and Bazel targets for it.
  • Thread the footer cache through level_one_log_reader_impl and the cloud-topics frontend/read-replica construction paths.
  • Extend frontend reader tests/fixtures to include the footer cache and add an eviction test.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/v/kafka/data/cloud_topic_read_replica.h Adds footer-cache pointer to read-replica partition proxy.
src/v/kafka/data/cloud_topic_read_replica.cc Passes shard-local footer cache into L1 reader creation for read replicas.
src/v/cloud_topics/state_accessors.h Plumbs shard-local footer cache through state accessors.
src/v/cloud_topics/level_one/frontend_reader/tests/reader_test.cc Adds footer-cache eviction test.
src/v/cloud_topics/level_one/frontend_reader/tests/l1_reader_fixture.h Updates test fixture to own/stop a footer cache and pass it to readers.
src/v/cloud_topics/level_one/frontend_reader/tests/BUILD Adds footer-cache dependency for tests.
src/v/cloud_topics/level_one/frontend_reader/level_one_reader.h Extends level_one_log_reader_impl ctor to accept an optional footer cache.
src/v/cloud_topics/level_one/frontend_reader/level_one_reader.cc Uses footer cache in read_footer() (read-through, then populate on miss).
src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.h New LRU cache API/type for footers.
src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.cc New LRU cache implementation (map + intrusive LRU list).
src/v/cloud_topics/level_one/frontend_reader/BUILD Adds new l1_footer_cache library and wires it into reader deps.
src/v/cloud_topics/frontend/frontend.cc Passes shard-local footer cache into standard L1 reader creation.
src/v/cloud_topics/BUILD Adds footer-cache lib dependency to cloud-topics target.
src/v/cloud_topics/app.h Adds sharded l1_footer_cache service member.
src/v/cloud_topics/app.cc Constructs footer-cache shard service and passes it into state_accessors.

Comment thread src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.h
Comment thread src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.cc Outdated
@vbotbuildovich
Collaborator

vbotbuildovich commented Apr 1, 2026

CI test results

test results on build#82587
test_class test_method test_arguments test_kind job_url test_status passed reason test_history
NodePostRestartProbeTest post_restart_probe_test null integration https://buildkite.com/redpanda/redpanda/builds/82587#019d467d-ba26-401a-a495-3e6ed73d614f FLAKY 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0052, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=NodePostRestartProbeTest&test_method=post_restart_probe_test
test results on build#83822
test_status test_class test_method test_arguments test_kind job_url passed reason test_history
FLAKY(PASS) AccessControlListTestUpgrade test_describe_acls {"authn_method": "none", "client_auth": false, "enable_authz": true, "use_sasl": true, "use_tls": false} integration https://buildkite.com/redpanda/redpanda/builds/83822#019dda5d-9d8a-45e6-9a9e-dcd9d55013dc 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=AccessControlListTestUpgrade&test_method=test_describe_acls
FLAKY(PASS) ShadowLinkingReplicationTests test_replication_basic {"shuffle_leadership": false, "source_cluster_spec": {"cluster_type": "kafka", "kafka_quorum": "COMBINED_KRAFT", "kafka_version": "3.8.0"}, "storage_mode": "cloud"} integration https://buildkite.com/redpanda/redpanda/builds/83822#019dda5d-9d8a-45e6-9a9e-dcd9d55013dc 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0000, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_replication_basic
FLAKY(PASS) WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/83822#019dda5d-9d8c-4590-9f48-37ee79e79671 16/21 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0971, p0=0.1229, reject_threshold=0.0100. adj_baseline=0.2640, p1=0.3598, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all
test results on build#83856
test_status test_class test_method test_arguments test_kind job_url passed reason test_history
FLAKY(PASS) ShadowLinkingReplicationTests test_auto_prefix_trimming {"source_cluster_spec": {"cluster_type": "redpanda"}, "storage_mode": "cloud", "with_failures": true} integration https://buildkite.com/redpanda/redpanda/builds/83856#019ddb8a-cc20-48db-a3f4-306930538733 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0007, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_auto_prefix_trimming
FLAKY(PASS) ShadowLinkingReplicationTests test_with_restart {"storage_mode": "tiered_cloud"} integration https://buildkite.com/redpanda/redpanda/builds/83856#019ddb8a-cc1d-49ea-9474-708f656a3ec1 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0135, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=ShadowLinkingReplicationTests&test_method=test_with_restart
FLAKY(PASS) PartitionMoveInterruption test_cancelling_partition_move {"compacted": true, "force_back": true, "replication_factor": 3, "unclean_abort": true} integration https://buildkite.com/redpanda/redpanda/builds/83856#019ddb88-328e-4ed1-97d5-609aed3eda96 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0002, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.1000, p1=0.3487, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=PartitionMoveInterruption&test_method=test_cancelling_partition_move
FLAKY(PASS) WriteCachingFailureInjectionE2ETest test_crash_all {"use_transactions": false} integration https://buildkite.com/redpanda/redpanda/builds/83856#019ddb8a-cc1f-424d-9bb5-2063556eb59f 10/11 Test PASSES after retries. No significant increase in flaky rate (baseline=0.0948, p0=1.0000, reject_threshold=0.0100. adj_baseline=0.2582, p1=0.0504, trust_threshold=0.5000) https://redpanda.metabaseapp.com/dashboard/87-tests?tab=142-dt-individual-test-history&test_class=WriteCachingFailureInjectionE2ETest&test_method=test_crash_all

Comment thread src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.h Outdated
Comment thread src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.h Outdated
Comment thread src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.h Outdated
@@ -398,6 +398,7 @@ std::unique_ptr<model::record_batch_reader::impl> frontend::make_l1_reader(
auto l1_io = ct_state->local().get_l1_io();
Contributor


Do you have empirical numbers that show how effective the footer cache is? I'd like to understand the ballpark improvement before merging

Contributor Author


I don't really expect a noticeable improvement from this. It's useful when reading the same object from many partitions on the same shard, which happens but is marginal. However, it was very simple to add, so I thought it was worth it.

Contributor


I guess it may also be helpful if there are multiple/repeated readers of the same partition. Like maybe iceberg, shadow clusters, compaction, and fetches running concurrently? If that's the case, it might be interesting to see if there's a meaningful tail latency improvement for fetches.

@andrwng andrwng requested a review from Lazin April 1, 2026 17:45
@wdberkeley wdberkeley requested a review from a team as a code owner April 29, 2026 17:19
@wdberkeley wdberkeley requested review from andrwng and ballard26 April 29, 2026 17:20
Comment thread src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.h Outdated
Comment thread src/v/cloud_topics/level_one/frontend_reader/l1_footer_cache.cc Outdated
andrwng
andrwng previously approved these changes Apr 29, 2026
Each read_some() call that opens a new L1 object must parse the object
footer via IO. Since footers are immutable once written, cache them
shard-locally keyed on object_id. Footers are stored as
shared_ptr<const l1::footer> so cache hits are zero-copy.

The cache is backed by s3-fifo (via chunked_kv_cache) which avoids
scan pollution: one-time footer reads are evicted without displacing
frequently-accessed entries.

Wired into the standard read path and read replicas, but not compaction
readers (less latency-sensitive; bypassing saves slots for others).
Capacity is tunable via cloud_topics_l1_footer_cache_max_size
(default 512 per shard).

5 participants