feat(bedrock): add automatic prompt caching support #1438
base: main
Conversation
src/strands/models/bedrock.py
Outdated
This enables prompt caching for multi-turn conversations by placing a single
cache point that covers system prompt, tools, and conversation history.
Is this true? Tools and system prompts both have their own cache points defined in the converse stream model. I thought this was just to add automatic caching for the messages array
You're right that tools and system prompts have their own cache point options. The key insight is that Anthropic sends prompts in this order: tools → system → messages. (Link) When a cachePoint is placed at the end of the last assistant message, the cached prefix automatically includes everything before it (system prompt + tools + prior conversation). So a single cachePoint in messages effectively caches the entire context without needing separate cache points for system/tools.
That said, you can still place explicit cache points on system/tools as a fallback for cases like sliding window truncation, where message history changes might cause cache misses.
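To make the placement concrete, here is a minimal sketch of a Converse-style request with that single cache point; the model ID, system prompt, and message contents are placeholders for illustration, not values from the PR.
# Illustrative only: a Converse-style request where the one cachePoint sits at the
# end of the last assistant message. Since Anthropic orders the prompt as
# tools -> system -> messages, the cached prefix reaches back over all of them.
request = {
    "modelId": "<claude-model-id>",  # placeholder
    "system": [{"text": "You are a helpful assistant."}],
    "toolConfig": {"tools": []},  # tool specs would normally be listed here
    "messages": [
        {"role": "user", "content": [{"text": "Hello!"}]},
        {
            "role": "assistant",
            "content": [
                {"text": "Hi! How can I help?"},
                {"cachePoint": {"type": "default"}},  # everything before this block is part of the cacheable prefix
            ],
        },
        {"role": "user", "content": [{"text": "Next question"}]},
    ],
}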
Got it, your explanation and the FAQ on this page helped me understand this: https://platform.claude.com/docs/en/build-with-claude/prompt-caching#faq
That being said, this comment is a bit misleading, and I would instead give a more general comment here.
Additionally, we will need to update our documentation to reflect the new caching behavior here. Would you be interested in making that update too? https://github.com/strands-agents/docs/blob/main/docs/user-guide/concepts/model-providers/amazon-bedrock.md?plain=1#L418
For the docs update, I'll create a separate PR after this one is merged.
src/strands/models/bedrock.py
Outdated
This enables prompt caching for multi-turn conversations by placing a single
cache point that covers system prompt, tools, and conversation history.
The cache point is automatically moved to the latest assistant message on each
Mentioned in a previous comment, but how does this compare to the simplified caching mentioned in the docs here: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-simplified
This implementation actually relies on that simplified caching. With simplified caching, we only need to move the cachePoint to the end of the last assistant message on each turn. Anthropic automatically matches the overlapping prefix between the previous cachePoint position and the new one. That's also why we're limiting this strategy to Claude models until other providers support it.
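For illustration, here is roughly how that per-turn move could look; the helper below is a sketch in the spirit of the PR's _inject_cache_point, not its actual code.
from typing import Any

def inject_cache_point(messages: list[dict[str, Any]]) -> None:
    # Sketch: keep exactly one cachePoint, at the end of the latest assistant message.
    # First drop any previously injected cachePoint blocks (reversed for safe in-place deletion).
    for message in messages:
        content = message.get("content", [])
        for i in reversed(range(len(content))):
            if "cachePoint" in content[i]:
                del content[i]
    # Then append a fresh cachePoint to the last assistant message; Bedrock's simplified
    # caching matches the overlap with the previous position automatically.
    for message in reversed(messages):
        if message.get("role") == "assistant":
            message.setdefault("content", []).append({"cachePoint": {"type": "default"}})
            break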
Awesome, thanks for the explanation! I would love for this cache_config to be applicable to both Anthropic and Nova models, and we can change the strategy based on the model ID. That's out of scope for this PR, but just something to keep in mind!
Agreed! Nova models could also be supported, though the cache checkpoint implementation would be more complex than Claude's.
00a7eca to 92e2a59
This is an important change, thanks for this.
🎯 Review - Automatic Prompt Caching

Excellent implementation of automatic prompt caching for Bedrock! This addresses #1432 nicely and will provide significant performance and cost benefits for multi-turn conversations.

What I Really Like ✅

Minor Suggestions 💡

1. Cache Point Detection Could Be More Explicit
2. Model Support Detection
3. Integration Test Clarity

The integration tests are great! One suggestion - add a comment explaining the cache hit verification:

# After second call, verify cache hit (cache_read_input_tokens > 0)
# This confirms the cache point strategy is working
assert result.metadata["converse_metrics"]["cache_read_input_tokens"] > 0

Questions for Discussion 🤔

CI Status

I see CI is still pending. Once it passes, this looks ready for maintainer review!

Overall Assessment

This is a high-quality PR that will be valuable for the community. The automatic cache management removes complexity from users while providing real performance benefits. Great work, @kevmyung! 🎉

🦆 🤖 This is an experimental AI agent response from the Strands team, powered by Strands Agents. We're exploring how AI agents can help with community support and development. Your feedback helps us improve! If you'd prefer human assistance, please let us know.
self.config.get("model_id"),
)

cleaned_messages: list[dict[str, Any]] = []
I just ran a test on this code, and it looks like cachePoint is being injected into the agent messages array, which is not what we want.
I have a Python script:
from strands import Agent
from strands.models import BedrockModel
from strands.models.model import CacheConfig
agent = Agent(model=BedrockModel(cache_config=CacheConfig(strategy="auto")))
agent("Hello!")
agent("Hello2")
print("\n")
print(agent.messages)
And when I run it I get this output:
(strands-agents) ➜ sdk-python git:(feat/prompt-caching) ✗ python test.py
Hello! It's nice to meet you. How are you doing today? Is there anything I can help you with?Hello again! I see you're saying hello for a second time - Hello2! 😊 How can I assist you today?
[{'role': 'user', 'content': [{'text': 'Hello!'}]}, {'role': 'assistant', 'content': [{'text': "Hello! It's nice to meet you. How are you doing today? Is there anything I can help you with?"},
{'cachePoint': {'type': 'default'}}]}, {'role': 'user', 'content': [{'text': 'Hello2'}]}, {'role': 'assistant', 'content': [{'text': "Hello again! I see you're saying hello for a second time - Hello2! 😊 How can I assist you today?"}]}]
You can see above that when I print out agent.messages, the cache point does show up in the agent's original messages.
It might make sense to move your _inject_cache_point function logic to below this cleaned_messages, as this acts as a "clean" copy of the messages that we can edit in the Bedrock model provider so we don't inject information into the agent messages.
Fixed. Now injecting into cleaned_messages
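In other words, the injection now happens on the copied list that the Bedrock provider builds, along these lines (the function name and signature below are illustrative, not the PR's exact code):
import copy
from typing import Any

def format_bedrock_messages(messages: list[dict[str, Any]], cache_enabled: bool) -> list[dict[str, Any]]:
    # Sketch of the fix: work on a copy so agent.messages is never mutated.
    cleaned_messages: list[dict[str, Any]] = copy.deepcopy(messages)
    if cache_enabled:
        inject_cache_point(cleaned_messages)  # e.g. the sketch helper shown earlier
    return cleaned_messages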
tests/strands/agent/test_agent.py
Outdated
assert len(agent.messages) == original_length + 2


def test_cache_config_does_not_mutate_original_messages(mock_model, agenerator):
Given my above comment, it looks like this test is not working as intended. I would update it so that it first fails with the current code, then update your code so you don't inject cachePoint into the messages array so that this test starts passing.
One more test in this file might be useful: verify that the messages array passed into the stream method does not have a cachePoint content block added.
Added test_format_bedrock_messages_does_not_mutate_original.
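For reference, a hedged sketch of what such a regression test could look like; the entry point (format_request) and its signature are assumptions, and the real test presumably uses the repo's mocks and fixtures.
import copy

from strands.models import BedrockModel
from strands.models.model import CacheConfig

def test_format_bedrock_messages_does_not_mutate_original():
    model = BedrockModel(cache_config=CacheConfig(strategy="auto"))
    messages = [
        {"role": "user", "content": [{"text": "Hello!"}]},
        {"role": "assistant", "content": [{"text": "Hi!"}]},
    ]
    original = copy.deepcopy(messages)

    # format_request is assumed here as the step that builds the Bedrock payload.
    model.format_request(messages=messages, tool_specs=None, system_prompt=None)

    assert messages == original  # no cachePoint leaked into the caller's list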
Unshure left a comment
This looks good to me, just need to rebase and address the nit.
Add CacheConfig with strategy="auto" for BedrockModel to automatically inject cache points at the end of assistant messages in multi-turn conversations.

- Add CacheConfig dataclass in model.py with strategy field
- Add supports_caching property to check Claude model compatibility
- Implement _inject_cache_point() for automatic cache point management
- Export CacheConfig from models/__init__.py

Closes strands-agents#1432
- Add warning when cache_config enabled but model doesn't support caching
- Make supports_caching private (_supports_caching)
- Fix log formatting to follow style guide
- Clean up tests and imports
- Inject into cleaned_messages instead of original messages to avoid mutating agent.messages
- Use reversed iteration for safe in-place deletion
- Consolidate redundant cache point tests
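Based on the commit messages above, the new pieces could look roughly like this; the field defaults and the model-ID check are assumptions, not the PR's code.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheConfig:
    # Only the strategy field is described in the commits; "auto" enables automatic cache points.
    strategy: Optional[str] = None

def _supports_caching(model_id: str) -> bool:
    # Sketch: the PR limits the auto strategy to Claude models on Bedrock.
    return "claude" in (model_id or "").lower()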
9e23a68 to 2a498bc
Summary
CacheConfig with strategy="auto" for automatic prompt caching in BedrockModel

Usage
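The Usage section above appears truncated; the snippet below mirrors the reviewer's test script from earlier in the thread.
from strands import Agent
from strands.models import BedrockModel
from strands.models.model import CacheConfig

# Enable automatic prompt caching; a cache point is placed at the end of the
# latest assistant message on each turn (Claude models on Bedrock only).
agent = Agent(model=BedrockModel(cache_config=CacheConfig(strategy="auto")))
agent("Hello!")
agent("Hello again!")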
Test plan
Closes #1432