feat: configurable RetryConfig for transient-failure retries #20
Merged
jackparnell merged 1 commit into main on Apr 9, 2026
Adds a frozen RetryConfig dataclass that callers pass via:
ColonyClient(api_key, retry=RetryConfig(...))
AsyncColonyClient(api_key, retry=RetryConfig(...))
Fields:
max_retries — number of retries after the initial attempt (default 2)
base_delay — base backoff in seconds (default 1.0)
max_delay — cap on per-retry delay (default 10.0)
retry_on — frozenset of statuses that trigger retry
(default {429, 502, 503, 504})
The Nth retry waits min(base_delay * 2**(N-1), max_delay), unless the
server provides a Retry-After header which always overrides.
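
The backoff rule above can be sketched as follows. This is a minimal illustration, not the SDK's source: the stand-in `RetryConfig` only mirrors the fields listed above, `compute_retry_delay` is a hypothetical name, and both the 1-based `attempt` and the seconds-only parsing of `Retry-After` are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryConfig:
    """Stand-in mirroring the documented fields and defaults."""
    max_retries: int = 2
    base_delay: float = 1.0
    max_delay: float = 10.0
    retry_on: frozenset = frozenset({429, 502, 503, 504})


def compute_retry_delay(attempt: int, config: RetryConfig,
                        retry_after_header: Optional[str] = None) -> float:
    """Seconds to wait before retry number `attempt` (assumed 1-based)."""
    if retry_after_header is not None:
        try:
            # Assumption: Retry-After carries a number of seconds.
            return max(0.0, float(retry_after_header))
        except ValueError:
            # Assumption: unparseable values fall back to computed backoff.
            pass
    return min(config.base_delay * 2 ** (attempt - 1), config.max_delay)


# With the defaults the delay doubles until it reaches max_delay.
default_schedule = [compute_retry_delay(n, RetryConfig()) for n in range(1, 6)]
print(default_schedule)
```

With the default `max_retries=2` only the first two delays are ever used; the longer schedule is printed just to show where the cap takes effect.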
Why: downstream packages (langchain-colony, crewai-colony) currently
re-implement retry logic on top of the SDK with their own RetryConfig
clones. That logic belongs in one place — here. With this change they
can delete their wrappers and just pass `retry=` through.
Behavior change: 5xx gateway errors (502/503/504) are now retried by
default. They almost always represent transient infra issues that
clear on retry. 500 is intentionally NOT retried by default — it
usually signals a bug in the request, not a transient failure, so
retrying just amplifies the problem. Opt back into the old behaviour
with `RetryConfig(retry_on=frozenset({429}))`.
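
A rough sketch of the retry decision described above, assuming the check is simply "status is in `retry_on` and budget remains". The stand-in dataclass and the meaning of `attempt` (retries already made, starting at 0) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    """Stand-in with only the fields this decision needs."""
    max_retries: int = 2
    retry_on: frozenset = frozenset({429, 502, 503, 504})


def should_retry(status: int, attempt: int, config: RetryConfig) -> bool:
    """True if `status` is retryable and the retry budget is not exhausted.

    Assumption: `attempt` is the number of retries already made.
    """
    return attempt < config.max_retries and status in config.retry_on


default = RetryConfig()
legacy = RetryConfig(retry_on=frozenset({429}))  # the opt-out quoted above

print(should_retry(503, 0, default))  # gateway error, budget left
print(should_retry(500, 0, default))  # 500 is not in the default set
print(should_retry(503, 0, legacy))   # old behaviour: only 429 retries
```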
Internals:
- _should_retry(status, attempt, config) and
  _compute_retry_delay(attempt, config, retry_after_header) helpers
  shared by sync + async _raw_request paths
- _raw_request signature gains a separate `_token_refreshed` flag so
  the 401-refresh path doesn't consume the configurable retry budget
- ColonyClient/AsyncColonyClient gain a `retry` attribute (the
  RetryConfig instance, defaults to RetryConfig() if None passed)
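
To illustrate why the token-refresh flag is kept separate from the retry budget, here is a hedged sketch of a synchronous request loop. The names `request_with_retry`, `send`, and `refresh_token` are hypothetical, the transport is faked, and the real `_raw_request` may be structured quite differently.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Response = Tuple[int, Optional[str]]  # (status, Retry-After header or None)


@dataclass(frozen=True)
class RetryConfig:
    """Stand-in mirroring the documented fields and defaults."""
    max_retries: int = 2
    base_delay: float = 1.0
    max_delay: float = 10.0
    retry_on: frozenset = frozenset({429, 502, 503, 504})


def request_with_retry(send: Callable[[], Response],
                       refresh_token: Callable[[], None],
                       config: RetryConfig,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Return the final status after token refresh and retries."""
    retries_used = 0
    token_refreshed = False  # tracked separately from the retry budget
    while True:
        status, retry_after = send()
        if status == 401 and not token_refreshed:
            refresh_token()
            token_refreshed = True
            continue  # one free re-send; retries_used is untouched
        if status in config.retry_on and retries_used < config.max_retries:
            retries_used += 1
            if retry_after is not None:
                delay = float(retry_after)  # assumption: value is in seconds
            else:
                delay = min(config.base_delay * 2 ** (retries_used - 1),
                            config.max_delay)
            sleep(delay)
            continue
        return status


# A 401 followed by two 429s still gets the full budget of two retries.
responses = iter([(401, None), (429, None), (429, "3"), (200, None)])
slept = []
final_status = request_with_retry(lambda: next(responses), lambda: None,
                                  RetryConfig(), sleep=slept.append)
print(final_status, slept)
```

Injecting `sleep` keeps the sketch testable without real waiting; whether the SDK's own tests do the same is not stated in the source.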
Tests: 14 new sync + 7 new async tests covering defaults, max_retries=0,
custom retry_on, exponential backoff, max_delay capping, Retry-After
override, mixed 429/503 retry, and the token-refresh isolation. Coverage
stays at 100% (463/463 statements).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
ColonistOne added a commit that referenced this pull request on Apr 9, 2026
Three changes that ship together so v1.5.0 can be the first release cut
via the new automation:
1. Release workflow at .github/workflows/release.yml, triggered on
   `v*` tag push. Stages:
   - test: runs ruff, mypy, pytest before anything else
   - build: builds wheel + sdist, refuses to proceed if the tag
     version doesn't match pyproject.toml
   - publish: uploads to PyPI via OIDC trusted publishing (no API
     token stored anywhere; a short-lived token is minted by PyPI
     from the GitHub Actions OIDC identity at publish time)
   - github-release: extracts the matching CHANGELOG section and
     creates a GitHub Release with the wheel + sdist attached
2. Version bump 1.4.0 → 1.5.0 in pyproject.toml and __init__.py.
3. CHANGELOG: consolidated the 1.5.0 section into a clean, ordered
   summary covering everything that's landed since 1.4.0:
   - AsyncColonyClient (PR #18)
   - Typed error hierarchy (PR #19)
   - RetryConfig + 5xx default retry (PR #20)
   - py.typed + verify_webhook + Dependabot (PR #21)
   - Pagination iterators (PR #23)
   - Coverage + Codecov (PR #17)
   - This release automation
Coverage at 100% (514/514 statements). 215 tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
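Summary
Adds a frozen `RetryConfig` dataclass that callers pass via `ColonyClient(api_key, retry=RetryConfig(...))` or `AsyncColonyClient(api_key, retry=RetryConfig(...))`.

| Field | Default | Notes |
| --- | --- | --- |
| `max_retries` | `2` | `0` disables retries. |
| `base_delay` | `1.0` | The Nth retry waits `base_delay * 2**(N-1)`. |
| `max_delay` | `10.0` | Cap on per-retry delay. |
| `retry_on` | `frozenset({429, 502, 503, 504})` | Statuses that trigger a retry. |

The server's `Retry-After` header always overrides the computed delay.

Why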
Downstream packages (`langchain-colony`, `crewai-colony`) currently re-implement retry logic on top of the SDK with their own `RetryConfig` clones. That logic belongs in one place: here. With this change they can delete their wrappers and just pass `retry=` through.

Previously the SDK only retried 429s. It now also retries `502 Bad Gateway`, `503 Service Unavailable`, and `504 Gateway Timeout`, which almost always represent transient infra issues that clear on retry. `500 Internal Server Error` is intentionally not retried by default. It usually signals a bug in the request, not a transient failure, so retrying just amplifies the problem.

To restore the old 1.4.x behaviour: `RetryConfig(retry_on=frozenset({429}))`

Internals
- `_should_retry(status, attempt, config)` and `_compute_retry_delay(attempt, config, retry_after_header)` helpers shared by sync + async `_raw_request` paths.
- `_raw_request` signature gains a separate `_token_refreshed` flag so the 401-refresh path doesn't consume the configurable retry budget. (Otherwise a 401 followed by a 429 storm would only get `max_retries - 1` retries, which is surprising.)
- `ColonyClient` / `AsyncColonyClient` gain a `retry` attribute: the `RetryConfig` instance, defaulting to `RetryConfig()` when `None` is passed.

Examples
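
A few configurations a caller might build, sketched against a stand-in dataclass so the snippet runs without the SDK installed. The stand-in mirrors the documented fields only; the variable names are illustrative, and the commented-out call shows the `retry=` keyword described in the summary.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class RetryConfig:
    """Stand-in mirroring the documented fields and defaults."""
    max_retries: int = 2
    base_delay: float = 1.0
    max_delay: float = 10.0
    retry_on: frozenset = frozenset({429, 502, 503, 504})


no_retries = RetryConfig(max_retries=0)              # fail fast
legacy_1_4 = RetryConfig(retry_on=frozenset({429}))  # 1.4.x behaviour
patient = RetryConfig(max_retries=5, base_delay=0.5, max_delay=30.0)

# Frozen instances cannot be mutated; derive a variant with replace().
also_500 = replace(RetryConfig(), retry_on=RetryConfig().retry_on | {500})

try:
    no_retries.max_retries = 3
    mutated = True
except FrozenInstanceError:
    mutated = False

# client = ColonyClient(api_key, retry=patient)
print(also_500.retry_on, mutated)
```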
Test plan
- 14 new sync tests (`TestRetryConfig`) covering: defaults, frozen-ness, custom config wiring, `max_retries=0`, custom `max_retries`, default 503 retry, default 500 no-retry, custom `retry_on`, exponential backoff math, `max_delay` capping, `Retry-After` override, mixed 429/503 retry-then-success, token refresh not consuming retry budget
- 7 new async tests (`TestAsyncRetryConfig`) mirroring the same scenarios
- `ruff check` / `ruff format --check` / `mypy src/` all clean

Follow-up (next PRs, not this one)
After release, `crewai-colony` and `langchain-colony` can delete their custom `RetryConfig` dataclasses and pass the SDK's `RetryConfig` straight through to the underlying client.