feat: allow hooks to retry model invocations on exceptions #1405

zastrowm · 2025-12-31T15:48:40Z

Description

Users need to retry model calls on arbitrary exceptions, not just ModelThrottledException, and also to retry based on response validation. This PR adds a low-level mechanism that covers both cases by letting hooks implement custom retry logic for exceptions and for successful responses.

Public API Changes

New Field: `AfterModelCallEvent.retry`

Hook providers can now set `retry = True` on the event to re-run the model invocation, after either an exception or a successful call:

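import asyncio
import logging

from strands.hooks import AfterModelCallEvent, BeforeInvocationEvent, HookProvider

logger = logging.getLogger(__name__)

# Example 1: Retry on exceptions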
class RetryOnServiceUnavailable(HookProvider):
    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.retry_count = 0

    def register_hooks(self, registry, **kwargs):
        registry.add_callback(BeforeInvocationEvent, self.reset_counts)
        registry.add_callback(AfterModelCallEvent, self.handle_retry)

    def reset_counts(self, event=None):
        self.retry_count = 0

    async def handle_retry(self, event):
        if event.exception:
            if "ServiceUnavailable" in str(event.exception):
                logger.info("ServiceUnavailable encountered")
                count = self.retry_count
                if count < self.max_retries:
                    logger.info("Retrying model call")
                    self.retry_count = count + 1
                    event.retry = True
                    await asyncio.sleep(2 ** count)  # Exponential backoff
        else:
            # Reset the counter after a successful call
            self.reset_counts(None)

# Example 2: Retry on successful calls based on response validation
class MinimumResponseLengthHook(HookProvider):
    def __init__(self, min_length=50):
        self.min_length = min_length
        self.max_retries = 2
        self.retry_count = 0

    def register_hooks(self, registry, **kwargs):
        registry.add_callback(BeforeInvocationEvent, self.reset_counts)
        registry.add_callback(AfterModelCallEvent, self.handle_after_model_call)

    def reset_counts(self, event=None):
        self.retry_count = 0

    async def handle_after_model_call(self, event):
        if event.stop_response:
            text = "".join(b.get("text", "") for b in event.stop_response.message.get("content", []))
            if len(text) < self.min_length and self.retry_count < self.max_retries:
                logger.info("Retrying model call due to length limitation")
                self.retry_count += 1
                event.retry = True

The retry field is writable within hook callbacks and defaults to False. It can be set for both successful calls (to validate response content) and failed calls (to retry exceptions).
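To make the semantics of the flag concrete, here is a minimal, self-contained simulation of a loop driven by `retry`. It is a sketch only: `FakeAfterModelCallEvent`, `call_with_hook_retries`, `make_flaky_model`, and `retry_on_unavailable` are hypothetical names invented for illustration, and the loop is not the SDK's actual event loop code.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class FakeAfterModelCallEvent:
    """Hypothetical stand-in for AfterModelCallEvent (not the SDK class)."""

    exception: Optional[Exception] = None
    response: Optional[str] = None
    retry: bool = False  # mirrors the documented default


async def call_with_hook_retries(
    model: Callable[[], Awaitable[str]],
    after_model_call: Callable[[FakeAfterModelCallEvent], Awaitable[None]],
) -> Optional[str]:
    """Invoke `model` repeatedly until the callback leaves `event.retry` False."""
    while True:
        event = FakeAfterModelCallEvent()
        try:
            event.response = await model()
        except Exception as exc:  # the callback decides whether this is retryable
            event.exception = exc
        await after_model_call(event)
        if event.retry:
            continue  # the callback asked for another attempt
        if event.exception is not None:
            raise event.exception
        return event.response


def make_flaky_model(failures: int) -> Callable[[], Awaitable[str]]:
    """Build a fake model that raises `failures` times, then succeeds."""
    remaining = failures

    async def model() -> str:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise RuntimeError("ServiceUnavailable")
        return "ok"

    return model


async def retry_on_unavailable(event: FakeAfterModelCallEvent) -> None:
    """Request a retry only for the error this sketch treats as transient."""
    if event.exception is not None and "ServiceUnavailable" in str(event.exception):
        event.retry = True


def run_demo(failures: int = 2) -> Optional[str]:
    """Run the simulated loop against a model that fails `failures` times."""
    model = make_flaky_model(failures)
    return asyncio.run(call_with_hook_retries(model, retry_on_unavailable))
```

Because the simulated loop has no attempt limit of its own, a callback that always sets `retry` would never return, which is why both examples above track `retry_count` themselves.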

Use Cases

- Custom Exception Retry: Retry on ServiceUnavailableException (503), rate limit errors, or any application-specific exceptions
- Response Validation: Retry if the response doesn't meet quality criteria (length, format, content)
- Flexible Retry Logic: Hooks control retry count, delay strategy (exponential backoff, jitter), and conditions
- Request-Specific Retry: Different retry strategies based on request context, exception type, or response content
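The delay strategy mentioned above is left entirely to the hook. As one possible approach, the sketch below computes an exponential backoff delay with "full jitter". The helper name `backoff_delay` and its `base`, `cap`, and `rng` parameters are assumptions made for this illustration and are not part of the SDK.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a jittered delay in seconds for a zero-based attempt number.

    The upper bound doubles with each attempt (base * 2**attempt) but never
    exceeds `cap`; the delay is drawn uniformly between 0 and that bound.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    generator = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return generator.uniform(0.0, ceiling)
```

Inside a hook callback this could replace the fixed `2 ** count` delay from Example 1, for instance `await asyncio.sleep(backoff_delay(count))`. Randomizing the delay spreads out retries from many concurrent callers.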

Implementation Notes

Hook Retry Integration

For now, hook-initiated retries work alongside the existing ModelThrottledException retry mechanism. If we implement #527 or #283, we should check whether the existing retry logic can move into a hook instead of staying hard-coded.

No Framework-Enforced Limits

The framework doesn't enforce retry count limits or delays for hook-initiated retries - hooks manage their own state and logic. This provides maximum flexibility while keeping the framework simple.
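Since nothing in the framework stops a hook from retrying forever, a hook may want to keep its limits in one small object. The following is a hedged sketch of such a helper; `RetryBudget`, `try_acquire`, and the injectable `clock` parameter are hypothetical names for illustration and do not exist in the SDK.

```python
import time
from typing import Callable


class RetryBudget:
    """Hypothetical helper that caps retries by attempt count and elapsed time."""

    def __init__(
        self,
        max_retries: int = 3,
        max_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_retries = max_retries
        self.max_seconds = max_seconds
        self._clock = clock
        self.reset()

    def reset(self) -> None:
        """Start a fresh budget, for example at the beginning of each invocation."""
        self.used = 0
        self._started = self._clock()

    def try_acquire(self) -> bool:
        """Consume one retry if any remain; return False once the budget is spent."""
        if self.used >= self.max_retries:
            return False
        if self._clock() - self._started >= self.max_seconds:
            return False
        self.used += 1
        return True
```

A hook could call `budget.reset()` from its BeforeInvocationEvent callback and set `event.retry = True` only when `budget.try_acquire()` returns True, which keeps the stop condition in one place.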

Related Issues

#370, #1386

Documentation PR

A documentation PR will follow once this PR is approved.

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

I ran `hatch run prepare`

Checklist

I have read the CONTRIBUTING document
I have added any necessary tests that prove my fix is effective or my feature works
I have updated the documentation accordingly
I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
My changes generate no new warnings
Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Add retry_model field to AfterModelCallEvent that enables hook providers to retry model invocations when exceptions occur. This provides flexibility for users to implement custom retry logic for any exception type.

- Add retry_model: bool field to AfterModelCallEvent (default False)
- Implement _can_write() to allow hooks to modify retry_model
- Update _handle_model_execution() to check retry_model and retry when set
- Hook retries integrate with existing ModelThrottledException retry logic
- Hook retries respect MAX_ATTEMPTS limit
- Add comprehensive test coverage for retry scenarios

Resolves #9

codecov · 2025-12-31T15:50:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


src/strands/hooks/events.py

src/strands/event_loop/event_loop.py

Now that strands-agents/sdk-python/pull/1405 is implemented, go ahead and add docs about how to use it. Expand the section to detail all writable properties.

Co-authored-by: Mackenzie Zastrow <zastrowm@users.noreply.github.com>

strands-agent and others added 8 commits December 31, 2025 10:47

fix: remove MAX_ATTEMPTS limit on hook-initiated retries

9c2c490

Hooks should be able to retry without framework-enforced limits. Removed the attempt count check that artificially limited hook retries. Hooks now control their own retry logic within the loop iterations.

Additional changes from write operations

d5711d8

Move tests to proper file

8b4712d

Simplify tests

8480f8b

Enable retrying model calls even when no exception is triggered

d74b9dc

Reasoning: This will unlock use cases where folks want to re-trigger the model due to other reasons (like steering).

Update event documentation

016beab

Reformat files

2217b45

github-actions bot added the size/m label Dec 31, 2025

zastrowm had a problem deploying to auto-approve December 31, 2025 15:48 — with GitHub Actions Failure

zastrowm had a problem deploying to auto-approve December 31, 2025 16:14 — with GitHub Actions Failure

zastrowm marked this pull request as ready for review December 31, 2025 16:43

dbschmigelski reviewed Dec 31, 2025

src/strands/hooks/events.py

Rename retry_model to retry

c8841b6

github-actions bot added size/m and removed size/m labels Dec 31, 2025

zastrowm had a problem deploying to auto-approve December 31, 2025 17:46 — with GitHub Actions Failure

Unshure previously approved these changes Dec 31, 2025

src/strands/event_loop/event_loop.py

Mark retry as writable

3b593a5

dbschmigelski previously approved these changes Dec 31, 2025

zastrowm dismissed stale reviews from dbschmigelski and Unshure via 3b593a5 December 31, 2025 18:22

github-actions bot added size/m and removed size/m labels Dec 31, 2025

zastrowm had a problem deploying to auto-approve December 31, 2025 18:22 — with GitHub Actions Failure

Unshure approved these changes Dec 31, 2025

dbschmigelski approved these changes Dec 31, 2025

zastrowm merged commit db01eee into strands-agents:main Jan 2, 2026
14 of 15 checks passed

zastrowm mentioned this pull request Jan 2, 2026

[FEATURE] Retries for ServiceUnavailableException (503 errors) #370

Open

This was referenced Jan 2, 2026

Add Model Call Retry documentation with example hook strands-agents/docs#407

Merged

Implement a retry_strategy object for retrying operations on the agent zastrowm/sdk-python#15

Open

github-actions bot mentioned this pull request Jan 5, 2026

Release notes for v1.21.0 of the sdk-python zastrowm/sdk-python#19

Open

dbschmigelski mentioned this pull request Jan 23, 2026

feat(hooks): add retry mechanism for tool calls #1556

Open

7 tasks
