fix: Slack client cache — LRU bound, empty-token, revocation eviction#8
Conversation
…eviction Three bugs in _client_cache: 1. Unbounded dict replaced with OrderedDict LRU (max 100 entries) that evicts the oldest client and closes its aiohttp session. 2. _get_client used truthiness (`token or`) which treated "" the same as None; OAuth callback passes "" intentionally. Now uses `is None`. 3. No cache invalidation on token revocation — added _invalidate_client() and wired it into _handle_slack_error for invalid_auth/token_revoked. Also adds pytest-cov to dev deps and enforces 50% coverage floor in CI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request introduces an LRU-bounded cache for Slack client instances to manage resource usage and adds logic to invalidate cached clients upon authentication failures. It also includes pytest-cov in the development dependencies. Key feedback points include a concurrency risk when closing sessions of evicted clients, a bug in the rate limit detection logic where type checking prevents the correct exception from being raised, and a recommendation to make the hardcoded cache size configurable for better scalability.
| # Close the evicted client's session if possible | ||
| try: | ||
| if hasattr(evicted_client, "session") and evicted_client.session: | ||
| asyncio.get_running_loop().create_task(evicted_client.session.close()) | ||
| except RuntimeError: | ||
| pass |
There was a problem hiding this comment.
Closing the session of an evicted client using create_task is risky in a concurrent environment. Since AsyncWebClient instances are cached and shared, an eviction triggered by one request (e.g., when a new workspace token is added) will terminate the underlying connection pool for any other concurrent requests that are still using the evicted client instance. This will lead to RuntimeError or connection failures in those requests.
A safer approach would be to share a single aiohttp.ClientSession across all AsyncWebClient instances (by passing it to the constructor), or to rely on garbage collection for the sessions of evicted clients to ensure they are only closed when no longer in use.
There was a problem hiding this comment.
Fixed in PR #13 -- removed the create_task(evicted_client.session.close()) call. The evicted client's aiohttp session will now be cleaned up by the garbage collector once all references are released, avoiding the race condition with concurrent requests.
| if isinstance(resp, dict) and error_code == "ratelimited": | ||
| raise AdapterRateLimitError("slack") from error |
There was a problem hiding this comment.
The rate limit check is currently broken for standard Slack API errors. When using slack_sdk, the resp object is typically a SlackResponse instance, which is not a dict, so the isinstance(resp, dict) check will fail and the specialized AdapterRateLimitError will not be raised. Since error_code has already been safely extracted above (handling both dicts and SlackResponse objects), you should rely on error_code directly for this check.
| if isinstance(resp, dict) and error_code == "ratelimited": | |
| raise AdapterRateLimitError("slack") from error | |
| # Check for rate limiting | |
| if error_code == "ratelimited": | |
| raise AdapterRateLimitError("slack") from error |
There was a problem hiding this comment.
Fixed in PR #13 -- _handle_slack_error now correctly handles SlackResponse objects by checking resp.data (dict) first, falling back to isinstance(resp, dict). The rate limit check no longer requires the response to be a dict. Also now extracts the Retry-After header from resp.headers and passes it to AdapterRateLimitError.
| self._client_cache: dict[str, Any] = {} | ||
| # Cache of AsyncWebClient instances keyed by bot token (LRU-bounded) | ||
| self._client_cache: OrderedDict[str, Any] = OrderedDict() | ||
| self._client_cache_max = 100 # max cached clients |
There was a problem hiding this comment.
The maximum cache size for Slack clients is hardcoded to 100. In large-scale multi-workspace deployments, this limit might be reached frequently, leading to cache churn and unnecessary session recreations. Consider making this value configurable via SlackAdapterConfig to allow users to tune it based on their needs.
There was a problem hiding this comment.
Fixed in PR #13 -- added client_cache_max: int | None to SlackAdapterConfig (defaults to 100). Users can now tune the LRU cache size for their deployment scale.
- Fix Slack rate limit detection: _handle_slack_error now correctly handles SlackResponse objects (not just dicts) by checking resp.data, and extracts the Retry-After header for rate limit errors - Fix Teams JWT verification blocking: wrap synchronous PyJWKClient.get_signing_key_from_jwt in asyncio.to_thread to avoid blocking the event loop during JWKS fetch - Make Slack client cache max configurable: add client_cache_max field to SlackAdapterConfig (defaults to 100) - Fix unsafe session cleanup on LRU eviction: remove create_task session close that could break concurrent requests using evicted clients; rely on garbage collection instead - Add tests for SlackResponse handling, rate limiting, and configurable cache max Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
dictwithOrderedDict-based LRU cache (max 100 entries). Evicted clients get their aiohttp sessions closed._get_clientto useis Noneinstead of truthiness, so_get_client("")no longer falls through to the context token._invalidate_client()and wired it into_handle_slack_errorforinvalid_auth,token_revoked, andaccount_inactiveerror codes.pytest-covto dev deps and--cov=chat_sdk --cov-fail-under=50to the CI test step.Test plan
🤖 Generated with Claude Code