@zastrowm zastrowm commented Dec 31, 2025

Description

Users need to retry model calls on arbitrary exceptions, not just ModelThrottledException, and to retry based on response validation. This PR adds a low-level mechanism that covers both cases by letting hooks implement custom retry logic for exceptions and for successful responses.

Public API Changes

New Field: AfterModelCallEvent.retry

Hook providers can now set retry=True to retry model invocations on both exceptions and successful calls:

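# Example 1: Retry on exceptions
import asyncio
import logging

# Import path assumed to be strands.hooks; adjust if your SDK version differs
from strands.hooks import AfterModelCallEvent, BeforeInvocationEvent, HookProvider

logger = logging.getLogger(__name__)
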
class RetryOnServiceUnavailable(HookProvider):
    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.retry_count = 0

    def register_hooks(self, registry, **kwargs):
        registry.add_callback(BeforeInvocationEvent, self.reset_counts)
        registry.add_callback(AfterModelCallEvent, self.handle_retry)

    def reset_counts(self, event=None):
        self.retry_count = 0

    async def handle_retry(self, event):
        if event.exception:
            if "ServiceUnavailable" in str(event.exception):
                logger.info("ServiceUnavailable encountered")
                count = self.retry_count
                if count < self.max_retries:
                    logger.info("Retrying model call")
                    self.retry_count = count + 1
                    event.retry = True
                    await asyncio.sleep(2 ** count)  # Exponential backoff
        else:
            # Reset counts in the successful case
            self.reset_counts(None)

# Example 2: Retry on successful calls based on response validation
class MinimumResponseLengthHook(HookProvider):
    def __init__(self, min_length=50):
        self.min_length = min_length
        self.max_retries = 2
        self.retry_count = 0

    def register_hooks(self, registry, **kwargs):
        registry.add_callback(BeforeInvocationEvent, self.reset_counts)
        registry.add_callback(AfterModelCallEvent, self.handle_after_model_call)

    def reset_counts(self, event=None):
        self.retry_count = 0

    async def handle_after_model_call(self, event):
        if event.stop_response:
            text = "".join(b.get("text", "") for b in event.stop_response.message.get("content", []))
            if len(text) < self.min_length and self.retry_count < self.max_retries:
                logger.info("Retrying model call due to length limitation")
                self.retry_count += 1
                event.retry = True

The retry field is writable within hook callbacks and defaults to False. It can be set for both successful calls (to validate response content) and failed calls (to retry exceptions).
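
The retry semantics can be modeled with a simplified stand-in loop. This is only a sketch, not the SDK's implementation: FakeAfterModelCallEvent, call_with_hook_retry, make_flaky_model, and retry_on_exception are hypothetical names introduced for illustration. It shows a flag that defaults to False, is set by a hook, and causes the model call to be repeated for either an exception or a successful response.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeAfterModelCallEvent:
    """Hypothetical stand-in for AfterModelCallEvent; not the SDK class."""

    stop_response: Optional[Any] = None
    exception: Optional[Exception] = None
    retry: bool = False  # Defaults to False; a hook may set it to True


async def call_with_hook_retry(call_model, hook):
    """Call the model repeatedly until the hook stops requesting a retry."""
    while True:
        try:
            event = FakeAfterModelCallEvent(stop_response=await call_model())
        except Exception as exc:
            event = FakeAfterModelCallEvent(exception=exc)

        await hook(event)

        if event.retry:
            # No limit is enforced here; the hook owns that decision
            continue
        if event.exception is not None:
            raise event.exception
        return event.stop_response


def make_flaky_model(failures):
    """Build a fake model call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    async def call_model():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RuntimeError("ServiceUnavailable")
        return "ok"

    return call_model, state


async def retry_on_exception(event):
    """Request a retry whenever the model call raised."""
    if event.exception is not None:
        event.retry = True
```

Because the loop itself never counts attempts, a hook that always sets the flag would spin forever, which is why the examples above keep their own retry_count.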

Use Cases

  • Custom Exception Retry: Retry on ServiceUnavailableException (503), rate limit errors, or any application-specific exceptions
  • Response Validation: Retry if response doesn't meet quality criteria (length, format, content)
  • Flexible Retry Logic: Hooks control retry count, delay strategy (exponential backoff, jitter), and conditions
  • Request-Specific Retry: Different retry strategies based on request context, exception type, or response content

Implementation Notes

Hook Retry Integration

For now, hook-initiated retries work alongside the existing ModelThrottledException retry mechanism. If we implement #527 or #283, we should check whether the existing retry logic can be migrated to a hook instead of remaining hard-coded.

No Framework-Enforced Limits

The framework doesn't enforce retry count limits or delays for hook-initiated retries - hooks manage their own state and logic. This provides maximum flexibility while keeping the framework simple.
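
Since limits are left to the hook, one option is to keep the bookkeeping in a small reusable object. The following is a minimal sketch under that assumption; RetryBudget is a hypothetical helper, not something this PR adds.

```python
class RetryBudget:
    """Track how many retries a hook has used in the current invocation."""

    def __init__(self, max_retries):
        self.max_retries = max_retries
        self.used = 0

    def try_consume(self):
        """Return True and count one retry if the budget allows it."""
        if self.used >= self.max_retries:
            return False
        self.used += 1
        return True

    def reset(self):
        """Restore the full budget, e.g. at the start of a new invocation."""
        self.used = 0
```

A hook could call reset() from its BeforeInvocationEvent callback and set `event.retry = True` only when try_consume() returns True, which mirrors the retry_count handling in the two examples.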

Related Issues

#370, #1386

Documentation PR

I will open the documentation PR after this PR is approved.

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

strands-agent and others added 8 commits December 31, 2025 10:47
Add retry_model field to AfterModelCallEvent that enables hook providers
to retry model invocations when exceptions occur. This provides flexibility
for users to implement custom retry logic for any exception type.

- Add retry_model: bool field to AfterModelCallEvent (default False)
- Implement _can_write() to allow hooks to modify retry_model
- Update _handle_model_execution() to check retry_model and retry when set
- Hook retries integrate with existing ModelThrottledException retry logic
- Hook retries respect MAX_ATTEMPTS limit
- Add comprehensive test coverage for retry scenarios

Resolves #9

Hooks should be able to retry without framework-enforced limits.
Removed the attempt count check that artificially limited hook retries.
Hooks now control their own retry logic within the loop iterations.
Reasoning: This will unlock use cases where folks want to re-trigger the model due to other reasons (like steering).

codecov bot commented Dec 31, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.


@zastrowm zastrowm marked this pull request as ready for review December 31, 2025 16:43
Unshure previously approved these changes Dec 31, 2025
dbschmigelski previously approved these changes Dec 31, 2025
@zastrowm zastrowm merged commit db01eee into strands-agents:main Jan 2, 2026
14 of 15 checks passed
zastrowm added a commit to zastrowm/docs that referenced this pull request Jan 2, 2026
Now that strands-agents/sdk-python/pull/1405 is implemented, go ahead and add docs about how to use it. Expand the section to detail all writable properties.
zastrowm added a commit to strands-agents/docs that referenced this pull request Jan 2, 2026
Now that strands-agents/sdk-python/pull/1405 is implemented, go ahead and add docs about how to use it. Expand the section to detail all writable properties.

---------

Co-authored-by: Mackenzie Zastrow <zastrowm@users.noreply.github.com>