feat(slack): Socket Mode transport (vercel/chat#162)#86
Conversation
Add an opt-in Socket Mode transport to ``SlackAdapter`` so bots can consume Slack events over a persistent WebSocket instead of webhooks. Mirrors the upstream JS port at vercel/chat#162. Highlights: * New ``SlackAdapterConfig`` fields: ``mode`` (``"webhook"`` default or ``"socket"``), ``app_token`` (xapp-* with prefix validation), and ``socket_forwarding_secret`` for the forwarded-event HTTP path. * ``signing_secret`` is now optional in socket mode (Slack does not sign socket events). ``_verify_signature`` refuses when no secret is configured so a stray webhook in socket mode can't HMAC against ``""``. * ``initialize()`` opens the WebSocket when ``mode == "socket"``. ``disconnect()`` cancels a tracked reconnect loop and closes the client; both ``start_socket_mode()`` and ``stop_socket_mode()`` are idempotent. * ``_route_socket_event`` dispatches ``events_api`` / ``slash_commands`` / ``interactive`` payloads into the same handlers the webhook path uses (no fork). Skips Slack retries (``retry_attempt > 0``) but still acks. Interactive view-submission errors round-trip through the ack payload. * ``handle_webhook`` now accepts forwarded socket events at any time via ``x-slack-socket-token`` (constant-time compared against the configured secret); refuses direct webhook POSTs in socket mode. * New ``ModalResponse(action="clear")`` produces ``response_action: clear`` for closing the entire modal view stack. * ``slack-socket`` extra in ``pyproject.toml`` (``slack-sdk`` + ``aiohttp``); the import inside ``start_socket_mode`` stays lazy. Hazard coverage: explicit ``asyncio.Task`` tracking and shutdown signaling for the WebSocket loop (#5); ``contextvars.copy_context()`` preserved through the socket dispatch path so per-event token resolution still inherits into spawned handlers (#6); single-process SocketModeClient lifecycle (#11); ``app_token`` format validated, never logged (#12). Tests: 24 new unit tests covering config validation, modal-clear emission, forwarded-event auth (accept / reject / no-secret-configured / invalid-token), the full ``_route_socket_event`` matrix (events_api / slash / interactive / retry / unknown), lifecycle (start, stop, idempotency, first-connect failure surfaced, transient-disconnect reconnect), and the ContextVar boundary on multi-workspace token resolution. The new file stubs ``slack_sdk.socket_mode.*`` in ``sys.modules`` to match the existing ``test_slack_client_cache.py`` pattern. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
📝 WalkthroughWalkthroughThis PR adds Socket Mode transport to the Slack adapter alongside the existing webhook mode. Socket Mode enables receiving Slack events via persistent WebSocket connections with an outer reconnect loop, exponential backoff, and context-aware token resolution for multi-workspace deployments. Events may arrive over the WebSocket or be forwarded via HTTP with cryptographic validation. ChangesSlack Socket Mode Transport
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request implements Slack Socket Mode support, enabling the adapter to handle events via a long-lived WebSocket. It includes a robust background reconnect loop with exponential backoff, logic for handling forwarded socket events in serverless setups, and updated configuration validation. Review feedback focuses on preventing a potential task leak during the connection phase and optimizing the backoff sleep mechanism by replacing the current polling approach with a more efficient asyncio.Event-based wait.
| first_done, _ = await asyncio.wait( | ||
| {asyncio.create_task(connected.wait()), self._socket_task}, | ||
| return_when=asyncio.FIRST_COMPLETED, | ||
| ) |
There was a problem hiding this comment.
The task created for connected.wait() remains running if the background loop task (self._socket_task) finishes first (e.g., due to a connection error or crash during the initial attempt). It is better to explicitly track and cancel this temporary wait task to avoid a potential task leak.
| first_done, _ = await asyncio.wait( | |
| {asyncio.create_task(connected.wait()), self._socket_task}, | |
| return_when=asyncio.FIRST_COMPLETED, | |
| ) | |
| wait_task = asyncio.create_task(connected.wait()) | |
| first_done, pending = await asyncio.wait( | |
| {wait_task, self._socket_task}, | |
| return_when=asyncio.FIRST_COMPLETED, | |
| ) | |
| if wait_task in pending: | |
| wait_task.cancel() |
There was a problem hiding this comment.
Addressed in b8a6e07 ("fix(slack): asyncio.Event for socket shutdown + cancel orphan wait_task"). The orphan wait_task is now explicitly cancelled when the loop task finishes first, and additionally drained on the new timeout-cleanup path added in e1c8047. Marking stale.
Generated by Claude Code
| async def _socket_sleep_with_backoff(self, seconds: float) -> None: | ||
| """Sleep for ``seconds`` but wake immediately on shutdown.""" | ||
| # Poll every 0.25s so a stop_socket_mode call doesn't have to wait | ||
| # the full backoff window. asyncio.Event would be slightly cleaner | ||
| # but we don't want to add an Event per loop iteration. | ||
| deadline = asyncio.get_event_loop().time() + seconds | ||
| while not self._socket_shutdown: | ||
| remaining = deadline - asyncio.get_event_loop().time() | ||
| if remaining <= 0: | ||
| return | ||
| await asyncio.sleep(min(0.25, remaining)) |
There was a problem hiding this comment.
The _socket_sleep_with_backoff method uses a polling loop to check for shutdown every 0.25s. This is less efficient than using an asyncio.Event. By using a single asyncio.Event on the adapter instance (e.g., self._socket_shutdown_event), you can wait for either the timeout or the shutdown signal immediately using asyncio.wait_for(self._socket_shutdown_event.wait(), timeout=seconds). This avoids periodic wake-ups and is more idiomatic in async Python. Contrary to the inline comment, you only need one event instance per adapter, not one per loop iteration.
There was a problem hiding this comment.
Addressed in b8a6e07. _socket_sleep_with_backoff now uses a single self._socket_shutdown_event: asyncio.Event on the adapter and await asyncio.wait_for(self._socket_shutdown_event.wait(), timeout=seconds) — no more 250ms polling. Marking stale.
Generated by Claude Code
Address gemini-code-assist review on PR #86: 1. Line 1375: ``connected.wait()`` was wrapped in an untracked task that stayed pending if the loop task finished first (e.g., first connect failed). Track it explicitly and cancel when it lands in ``pending``. 2. Line 1504: replace the ``not self._socket_shutdown`` polling loop with a per-adapter ``asyncio.Event`` waited via ``asyncio.wait_for``. One event per adapter (not per backoff iteration); wakeup latency is now bounded by the event-loop scheduler rather than the previous 250ms poll cadence. Updates ``start_socket_mode`` (clear), ``stop_socket_mode`` (set), and the four loop checks. 24 socket-mode tests still pass. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
patrick-chinchill
left a comment
There was a problem hiding this comment.
Code review — Slack Socket Mode port
Reviewed against upstream vercel/chat#162 (7e90d9c) on f55378a, applying docs/UPSTREAM_SYNC.md hazards #5/#6/#11/#12 and docs/SELF_REVIEW.md. Reviewed feat(slack): Socket Mode transport + the asyncio.Event follow-up. Net assessment: solid port with thoughtful Python-specific hardening. Two real issues worth tightening; the rest are nits.
🟡 Medium
-
No connect timeout on
SocketModeClient.connect()(hazard #11).start_socket_modedoesawait asyncio.wait({wait_task, self._socket_task}, return_when=FIRST_COMPLETED)with notimeout=. If the slack_sdk handshake hangs (DNS black-hole, TLS stall, mis-routed firewall),initialize()blocks forever and the caller has no observable signal. Upstream inherits the same problem from@slack/socket-mode, so this is parity, butdocs/UPSTREAM_SYNC.mdHazard #11 explicitly flags "shared sessions plus explicit cleanup". Suggest wrapping the wait withasyncio.wait_for(..., timeout=connect_timeout_s)(configurable, default ~30s) and on timeout callingawait self.stop_socket_mode()before raising. Add a regression test that injects aconnect()thatawait asyncio.sleep(60)s. -
Forwarded events have no freshness check, even though the envelope ships a
timestamp(hazard #12)._handle_forwarded_socket_eventaccepts any payload whosex-slack-socket-tokenmatches. The forwarded JSON includestimestamp: Date.now()from the listener but neither side validates it. A leaked forwarding secret + replayed body would re-execute handlers indefinitely (and dedupe only catches identical message IDs, not re-injected modal submits). Upstream has the same gap; worth either (a) adding a 5-minute window check mirroring_verify_signature, or (b) documenting it in the non-parity table so future readers don't assume the timestamp is load-bearing. Same goes for the missing per-request nonce.
🔵 Nit
-
stop_socket_modesuppressesExceptionafterawait task(adapter.py:1413).with contextlib.suppress(asyncio.CancelledError, Exception):is broader than necessary —Exceptioncovers everything that isn't a system-exiting BaseException, so a real bug surfaced post-cancellation is lost. The loop'sexceptalready logs; suggest narrowing to(asyncio.CancelledError,)and letting anything else log to stderr. -
asyncio.get_event_loop().create_task(...)in_route_socket_event.wrap_async(adapter.py:1568). Deprecated since 3.10 when no loop is set. Inside a running coroutineget_running_loop()is correct and matchesstart_socket_mode(which already uses it). -
_route_socket_eventininteractiveswallows dispatch errors then ack's empty (adapter.py:1672–1681). Upstream lets the error propagate to theslack_eventlistener; Python catches and ack's with no body. Net effect: aview_submissionwhose handler raises silently closes the modal instead of showing errors. Consider logging aterror(already done) plus ack'ing with{"response_action": "errors", "errors": {"_": "internal error"}}so the user sees the modal didn't take. -
Cross-contamination test gap.
TestSocketContextVarcovers single-event isolation, but there's no test that fires two concurrent_route_socket_event(events_api)calls for different teams viaasyncio.gatherand asserts each handler sees the right token. Given thecontextvars.copy_context()reliance (hazard #6) this is the exact regression class the doc warns about. -
Extra
is_ext_shared_channelfield added to the synthesizedevent_callbackpayload (adapter.py:1597) that upstream'srouteSocketEventdoesn't include. Harmless additive, but if you want strict parity-on-the-wire, drop it (and the existing webhook path doesn't add it either, so this introduces a socket-vs-webhook asymmetry).
✅ Looks good
app_tokenvalidated forxapp-prefix at construction; never logged. (Hazard #12)_verify_signaturecorrectly refuses when_signing_secret is Noneso socket-mode adapters can't silently HMAC against"". (Hazard #12)hmac.compare_digestused forx-slack-socket-token. (Hazard #12)socket_forwarding_secretdefaults toapp_tokenmatching upstream + lets ops override via env — the docstring even calls out the recommendation. (Hazard #12)_socket_tasktracked, cancelled, awaited on shutdown. Thewait_taskorphan is now explicitly cancelled per the follow-up. (Hazard #5)_socket_shutdown_event(asyncio.Event) replaces the 250ms poll —_socket_sleep_with_backoffcorrectly useswait_for;seconds=0is well-defined (immediateTimeoutError) and unreachable here since_socket_initial_backoff_s = 1.0. (Hazard #5)- Reconnect resets
backoff = self._socket_initial_backoff_safter a successful connect; oldSocketModeClientisdisconnect()'d before the next iteration'sSocketModeClient(...). No instance leak. (Hazard #11) - Lazy import of
slack_sdk.socket_mode.aiohttpinsidestart_socket_mode/_socket_mode_loop. (Hazard #10) mode="webhook"default preserves all existing webhook behavior; the dispatch path branches only on thex-slack-socket-tokenheader andself._mode == "socket". Backwards-compatible.create_slack_adapterrejectsmode="socket"+ multi-workspace upfront — a real footgun that upstream silently allows.- Outer reconnect layer + slack_sdk's
auto_reconnect_enabled=Trueare documented divergence in the non-parity table; they don't fight because the outer loop only re-enters after the inner client isdisconnect()'d andis_connected()returns False.
Posted by an automated reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code
Hardens the socket-mode port against the seven review findings on PR #86: * Add ``connect_timeout_s`` config (default 30s) and wrap the initial socket-mode handshake in ``asyncio.wait_for`` so a hung ``SocketModeClient.connect()`` can't make ``initialize()`` block forever (hazard #11). On timeout we tear the loop down before raising. * Reject forwarded socket events whose ``timestamp`` field is outside the same 5-minute window ``_verify_signature`` enforces, so a captured forwarded payload can't be replayed indefinitely (hazard #12). * Narrow the ``contextlib.suppress`` in ``stop_socket_mode`` to ``CancelledError`` only — surprising loop crashes are no longer silently swallowed during shutdown. * Replace ``asyncio.get_event_loop().create_task`` in the socket-mode ``wrap_async`` helper with ``get_running_loop`` (Python 3.12+ compatibility, hazard #5). * Have the socket-mode interactive branch ack with ``response_action: errors`` instead of an empty ack when dispatch raises — an empty ack on ``view_submission`` silently closes the modal so the user sees no signal anything went wrong. * Drop ``is_ext_shared_channel`` from the synthesized ``event_callback`` payload in the socket-mode events_api branch so socket and webhook paths feed identical shapes into ``_process_event_payload`` (hazard #7). * Add a regression test firing two concurrent ``_route_socket_event`` events_api dispatches for different teams via ``asyncio.gather`` to pin down the existing ``copy_context()`` isolation against future drift (hazard #6). Also fixes a pre-existing B010 lint warning in the test stub. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
patrick-chinchill
left a comment
There was a problem hiding this comment.
Re-review of e1c8047 ("address PR #86 review findings")
Verified fixes (7/7)
connect_timeout_s(default 30s) +wait_for+ cleanup on timeout — Verified atsrc/chat_sdk/adapters/slack/adapter.py:256-258, 1399-1415. Regression testtest_hung_connect_raises_timeout_and_cleans_upinjects a 60s sleep and asserts_socket_task/_socket_clientare nulled.test_connect_timeout_default_is_30spins the default. PASS.- 5-min freshness on forwarded events + replay test + missing-timestamp test — Implemented at
adapter.py:817-827; tests attests/test_slack_socket_mode.py:782-820. PASS for the check shape, but see medium #M1 below — the units are wrong. stop_socket_modecontextlib.suppressnarrowed to(asyncio.CancelledError,)—adapter.py:1451-1452. PASS.asyncio.get_running_loop()everywhere —grepconfirms noget_event_loop()remains in the file (all four call sites useget_running_loop()). PASS.- Interactive dispatch error →
response_action: errorsack —adapter.py:1721; regression test assertsbody_arg.get("response_action") == "errors". PASS. - Concurrent cross-team
asyncio.gathertest —tests/test_slack_socket_mode.py:648-723; the slow first-team resolver (asyncio.sleep(0.02)) interleaves the second dispatch, so it would actually fail withoutcopy_context()isolation. Real test, not phantom. PASS. is_ext_shared_channeldropped from synthesized event_callback —adapter.py:1635-1641; test at:867-895. Matches upstreamrouteSocketEventshape. PASS for parity, but see nit #N1.
uv run pytest tests/test_slack_socket_mode.py -q → 31 passed.
New findings
M1 — Forwarded-event timestamp wire format is incompatible with the upstream emitter (medium)
adapter.py:822 does abs(int(time.time()) - ts_int) > 300, treating ts_int as seconds. Upstream's forwardSocketEvent always serializes timestamp: Date.now() (packages/adapter-slack/src/index.ts:1837, 6188, 6217, 6244, 6285, 6314, 6353, 6393, 6537, 6584 — every test fixture confirms this) — i.e. milliseconds. A real JS-listener-emitted forwarded payload arrives with ts ≈ 1.78e12; the Python receiver computes abs(1.78e9 - 1.78e12) ≈ 1.78e12 > 300 and rejects with 401 every time, ~56,000 years of skew. The replay test only passes because it manually emits seconds (int(_time.time()) at tests/test_slack_socket_mode.py:296, 791). Since start_socket_mode_listener was intentionally not ported (docs/UPSTREAM_SYNC.md:498), the only expected producer is a JS process — which this rejects 100% of the time. Two viable fixes:
- Detect ms vs s by magnitude (
if ts_int > 10**12: ts_int //= 1000) and update tests to also exercise the JS-shaped wire format, or - Multiply the budget:
> 300_000and treat the field as ms.
Either way, add a fidelity test that round-trips a literalDate.now()-shaped int (e.g.int(time.time() * 1000)).
M2 — Socket-mode loses external/shared-channel tracking (medium, latent upstream parity)
_process_event_payload at adapter.py:959 reads payload.get("is_ext_shared_channel") to populate _external_channels. The webhook path forwards the whole raw payload (so this works); the socket path now synthesizes a payload that omits the field (adapter.py:1635-1641). Downstream consumers of _external_channels (used for posting policies — _externalChannels at upstream 4359, 4449, 4787) silently never see socket-mode shared channels. Upstream has the same bug (its routeSocketEvent also drops it), so dropping the field matches upstream parity — but the previous reviewer's framing of this as a "fix" is misleading, and it is a real loss of functionality compared to the webhook path. Recommend either:
- Carrying
is_ext_shared_channelthrough on the socket path as a Python-only divergence (tracked indocs/UPSTREAM_SYNC.md), or - At minimum documenting the limitation in the socket-mode docs and the hazard table so it isn't quietly inherited at the next sync.
Nits
- N1 — The "fix" framing in commit
e1c8047for finding #7 reads as defense-in-depth, but the actual behavior change drops a field downstream code consumes. A one-line comment in_process_event_payloadnear theis_ext_shared_channelbranch noting "socket-mode payloads never carry this; mirrors upstreamrouteSocketEvent" would prevent re-introducing it on a future port. - N2 —
_route_socket_event's 3 branches (events_api / slash / interactive) all add multi-workspace token resolution that upstream'srouteSocketEventdoesn't have. This is reachable only by bypassing the factory (which rejects the combo atcreate_slack_adapterline 3729). Either remove the dead resolver paths in_route_socket_event(keep the factory guard as the single point of truth), or move the multi-workspaceValidationErrorintoSlackAdapter.__init__so the dead branches can be removed entirely. - N3 —
connect_timeout_sfield default is30.0inSlackAdapterConfig(types.py:63), making theif config.connect_timeout_s is not None else 30.0ternary atadapter.py:256-258dead unless a caller explicitly passesNone. Either drop the ternary or default the field toNone. - N4 —
asyncio.wait_forwrapping a shieldedasyncio.wait(..., FIRST_COMPLETED)(adapter.py:1399-1407) means the innerasyncio.waitkeeps running after timeout. It currently completes naturally becausestop_socket_mode()cancels_socket_taskand thewait_task.cancel()resolves thewait_task— so it's safe, but a comment would help future readers (or replace the shield with explicit cancellation of_socket_taskin the timeout branch). - N5 — The
slash_commandssocket branch ignores the_handle_slash_commandreturn value (matches upstream — slash responses go viaresponse_url), but unlike upstream there's nowrapAsync(...).then(forwardToResponseUrl)wiring. This is consistent with how the webhook slash path also expects handlers to useresponse_urlthemselves, but worth a# slash responses are out-of-band via response_url, not the ackcomment in_route_socket_event.
Upstream parity coverage hunted
- Lifecycle:
disconnect()→stop_socket_mode()matches upstreamdisconnect(). PASS. - Reconnect: Python adds an outer reconnect loop on top of
slack_sdk's built-inauto_reconnect_enabled; upstream relies entirely on@slack/socket-mode's internal reconnect. Documented as belt-and-suspenders inUPSTREAM_SYNC.md:497. PASS. start_socket_mode_listener(transient serverless variant): intentionally not ported —UPSTREAM_SYNC.md:498. The forwarded-event receiver is ported, which is why M1 matters.- Ack protocol: upstream-parity for events_api (empty), slash (empty), interactive (parsed JSON body or empty). PASS.
- Test fidelity vs upstream
index.test.ts:5840-6600: all 5 factory-validation tests, all 4 socket-mode handleWebhook tests, all 6 routeSocketEvent dispatch tests, all 7 socket-mode-forwarding tests, the 3startSocketModeListenertests (intentionally skipped per #498), and 3 routeSocketEvent-with-options tests are covered, plus 7 review-finding regressions. PASS.
Re-review verdict: FOLLOW-UP NEEDED — fixes 1, 3, 4, 5, 6, 7 are solid; fix 2 (forwarded-event freshness) is wired correctly but uses the wrong time unit and would reject every real upstream-emitted forwarded payload. M1 is a one-liner; M2 is a doc/decision item.
Posted by an automated re-reviewer agent. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code
…stamps Address PR #86 re-review M1 (critical interop break). Upstream's forwardSocketEvent always emits ``timestamp: Date.now()`` — milliseconds since epoch. Python's ``time.time()`` returns seconds. The previous freshness check compared the two directly, treating a real JS-emitted ms timestamp (~1.78e12) as 56,000 years skewed and rejecting every real forwarded event with 401. Since startSocketModeListener was intentionally not ported (UPSTREAM_SYNC.md non-parity), the only expected producer is a JS process — making this a 100% interop break. Auto-detect the unit by magnitude: anything > 10**11 is ms (that magnitude crossed in 2001), normalize to seconds before comparing to ``time.time()``. Accepts both wire formats so a future Python-emitted listener also works. Adds two tests: - test_js_emitted_milliseconds_timestamp_accepted: a real Date.now()-shaped ms timestamp now passes freshness. - test_js_emitted_milliseconds_replay_rejected: a 6-minute-old ms timestamp is still rejected. All 33 socket-mode tests pass. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Final upstream-coverage audit before merging the 7 sync PRs (#84-#90) identified one undocumented N/A item: vercel/chat#415 (Teams SDK 2.0.8 + User-Agent) is a JS-only botbuilder dependency bump. The Python Teams adapter uses raw aiohttp (no botbuilder), so there is no equivalent dependency to bump. The optional User-Agent: Vercel.ChatSDK header on the ~9 outbound aiohttp call sites is a defense-in-depth nice-to-have; deferred as a follow-up rather than landed in this sync. Updates: - CHANGELOG.md: tick all completed items and link them to their PRs (#84, #85, #86, #87, #88, #89, #90, plus already-merged PR #74). Document #415 inline as N/A. - docs/UPSTREAM_SYNC.md non-parity table: add row for Teams User-Agent header divergence so future syncers don't try to "port" the JS bump. Item #6 (concurrency.maxConcurrent) is already implementation-covered in the Python port (existing divergence row at L492). The 4 new TS concurrency tests in chat.test.ts have Python-specific equivalents at test_chat_faithful.py L2969-3055 that don't name-match — leaving as deferred fidelity-baseline polish since the behavior is verified. Verdict from the coverage audit: all 18 substantive ports across PRs #84-#90 are upstream-verified. No commits in chat@4.26.0..f55378a were missed. Ready to start merging. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
* N1: comment near ``_process_event_payload`` is_ext_shared_channel branch noting socket-mode payloads never carry this field — mirrors upstream's ``routeSocketEvent`` shape, documented as a known divergence so it isn't quietly re-introduced on a future port. * N3: drop the dead ``is not None else 30.0`` ternary; the config field is typed ``float`` with a 30s default so the fallback is unreachable. * N4: comment on ``asyncio.wait_for(asyncio.shield(asyncio.wait(...)))`` explaining why the shielded inner wait doesn't leak — both inner tasks get cancelled in the timeout branch. * N5: comment near the ``slash_commands`` ack noting slash responses are out-of-band via ``response_url``, not the WebSocket ack. Comment-only / dead-code-only changes; no behavior change. All 33 socket-mode tests pass; full suite 3701 passed (1 pre-existing failure). https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
``github-code-quality`` and other static analyzers periodically flag the ``await wait_task`` / ``await task`` inside ``with contextlib.suppress(asyncio.CancelledError):`` blocks in ``_socket_connect_with_timeout`` (~L1432) and ``stop_socket_mode`` (~L1471) as "Statement has no effect" — they model ``await`` syntactically rather than as a side-effecting suspension that drains a cancelled task. The pattern is intentional and load-bearing: - After ``task.cancel()`` we MUST ``await`` the task to deterministically drain it. Without the await, the cancellation request is queued but the function returns before the task actually tears down. - For ``stop_socket_mode`` this breaks ``test_stop_idempotent`` (a second call can race the first's cleanup) and risks "Task was destroyed but it is pending!" warnings from the asyncio loop. - ``contextlib.suppress(asyncio.CancelledError)`` absorbs the expected ``CancelledError`` raised by the awaited task on cancel. The same false-positive pattern was flagged and explained on earlier commits — the inline comments now make the intent visible at the source so future readers (human or bot) don't have to re-discover it via review history. Tests: 33 socket-mode tests pass, ruff clean.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d4a0cdd5a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| except ImportError as exc: # pragma: no cover - import-time failure | ||
| raise ValidationError( | ||
| "slack", | ||
| "slack_sdk is not installed. Install with `pip install chat-sdk[slack]`.", |
There was a problem hiding this comment.
Fix socket-mode dependency guidance in import error
When start_socket_mode() cannot import slack_sdk.socket_mode.aiohttp (for example because aiohttp is missing), this error tells users to install chat-sdk[slack], but this commit adds aiohttp only under the slack-socket extra in pyproject.toml. Following the current message can leave users in a broken loop where socket mode still fails after reinstall, so the runtime guidance should point to the extra that actually satisfies the dependency.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in ab8cd26.
The error message now points at chat-sdk[slack-socket] (the extra that actually pulls in aiohttp per pyproject.toml:51), so users following the hint will land on a working install. Added a regression test (TestSocketModeImportError) that asserts the message references chat-sdk[slack-socket] and explicitly guards against drifting back to chat-sdk[slack].
Generated by Claude Code
…extra The aiohttp transport lives behind chat-sdk[slack-socket] (which adds aiohttp on top of slack-sdk), not chat-sdk[slack]. Following the old hint left users stuck because aiohttp was still missing after reinstall. Adds a regression test covering the import-error pathway so the hint can't drift back to the wrong extras name. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
|
@codex review Latest commits since the prior review:
Worth a fresh pass on the reconnect / shutdown lifecycle (hazard #5), forwarded-event freshness (hazard #12), and ContextVar propagation (hazard #6). The Generated by Claude Code |
… path PR #86 commit ab8cd26 added ``test_missing_aiohttp_transport_message_points_at_slack_socket_extra`` which exercises this branch directly, so the ``# pragma: no cover - import-time failure`` marker is stale. Removing it keeps coverage honest and prevents future regressions in the hint-message wording from silently slipping past coverage gating. Self-review (PR #86): no behavior change, just the inline marker. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab8cd26d5b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| app's events for one workspace) and silently mixing the two would mask | ||
| a config mistake. | ||
| """ | ||
| if config is not None and (config.mode or "webhook") == "socket" and (config.client_id or config.client_secret): |
There was a problem hiding this comment.
Reject env-sourced OAuth creds in socket mode factory
Extend this guard to include credentials loaded from environment variables, not just config.client_id / config.client_secret. Right now create_slack_adapter(SlackAdapterConfig(mode="socket")) succeeds when SLACK_CLIENT_ID and SLACK_CLIENT_SECRET are set, even though the factory/docstring says multi-workspace OAuth is unsupported in socket mode; that silently admits an unsupported configuration and defers failure to runtime behavior instead of failing fast at construction.
Useful? React with 👍 / 👎.
|
@codex review Re-requesting on the latest state (commit Fresh focus: confirm the import-error hint now points at Generated by Claude Code |
|
Codex Review: Didn't find any major issues. Can't wait for the next one! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
| # model ``await`` syntactically rather than as a | ||
| # side-effecting suspension — that's a false positive; | ||
| # do not remove this line. | ||
| await wait_task |
| # this as "statement has no effect" because they model | ||
| # ``await`` syntactically — that's a false positive; do | ||
| # not remove this line. | ||
| await task |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/chat_sdk/adapters/slack/adapter.py`:
- Line 195: Replace the truthy fallback using "mode = config.mode or 'webhook'"
with an explicit None check so falsy-but-valid values aren't overridden; change
to conditional assignment using the pattern "config.mode if config.mode is not
None else 'webhook'". Update the same pattern wherever similar fallbacks are
used in this module (the other occurrence flagged in adapter.py) to follow the
"x if x is not None else default" rule, referencing the variable name "mode" and
the "config.mode" attribute to locate the spots to change.
In `@tests/test_slack_socket_mode.py`:
- Around line 309-310: Replace bare truthiness checks on the mock method
chat.process_message.called with explicit call-count assertions: use
chat.process_message.assert_called_once() (or assert_called_once_with(...) if
specific args are expected) to ensure the handler is invoked exactly once;
update the same pattern at the other occurrences referenced (the assertions
around lines 366-367 and 397-400) so all tests assert exact call counts rather
than truthiness.
- Line 493: The negative assertion in the import-hint regression check
mistakenly includes a trailing backtick ("chat-sdk[slack]`") so it may miss the
actual bad hint; update the assertion to check for the correct string
"chat-sdk[slack]" (i.e., change the assert that currently reads assert
"chat-sdk[slack]`" not in msg, msg to assert "chat-sdk[slack]" not in msg, msg)
so the test properly guards against the wrong import hint (referencing the
failing assert and the msg variable).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2ce4e35a-65da-4ca8-a5e1-2a3a5e08464b
📒 Files selected for processing (6)
docs/UPSTREAM_SYNC.mdpyproject.tomlsrc/chat_sdk/adapters/slack/adapter.pysrc/chat_sdk/adapters/slack/types.pysrc/chat_sdk/types.pytests/test_slack_socket_mode.py
| if config is None: | ||
| config = SlackAdapterConfig() | ||
|
|
||
| mode = config.mode or "webhook" |
There was a problem hiding this comment.
Replace truthy fallback on mode with explicit None checks.
Using or here violates the repo rule and can silently coerce unexpected falsy values.
Suggested patch
- mode = config.mode or "webhook"
+ mode = config.mode if config.mode is not None else "webhook"
...
- if config is not None and (config.mode or "webhook") == "socket" and (config.client_id or config.client_secret):
+ if (
+ config is not None
+ and (config.mode if config.mode is not None else "webhook") == "socket"
+ and (config.client_id or config.client_secret)
+ ):Also applies to: 3825-3825
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/chat_sdk/adapters/slack/adapter.py` at line 195, Replace the truthy
fallback using "mode = config.mode or 'webhook'" with an explicit None check so
falsy-but-valid values aren't overridden; change to conditional assignment using
the pattern "config.mode if config.mode is not None else 'webhook'". Update the
same pattern wherever similar fallbacks are used in this module (the other
occurrence flagged in adapter.py) to follow the "x if x is not None else
default" rule, referencing the variable name "mode" and the "config.mode"
attribute to locate the spots to change.
| assert chat.process_message.called | ||
|
|
There was a problem hiding this comment.
Use explicit call-count assertions instead of .called truthiness.
These assertions can pass even if handlers are invoked multiple times. Prefer assert_called_once() (or assert_called_once_with(...)) to make regressions fail deterministically.
Suggested tightening
- assert chat.process_message.called
+ chat.process_message.assert_called_once()
...
- assert chat.process_message.called
+ chat.process_message.assert_called_once()
...
- assert chat.process_slash_command.called
+ chat.process_slash_command.assert_called_once()As per coding guidelines, “Every test must fail when the code is wrong… no bare truthiness checks when specific values are available.”
Also applies to: 366-367, 397-400
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_slack_socket_mode.py` around lines 309 - 310, Replace bare
truthiness checks on the mock method chat.process_message.called with explicit
call-count assertions: use chat.process_message.assert_called_once() (or
assert_called_once_with(...) if specific args are expected) to ensure the
handler is invoked exactly once; update the same pattern at the other
occurrences referenced (the assertions around lines 366-367 and 397-400) so all
tests assert exact call counts rather than truthiness.
| msg = str(exc_info.value) | ||
| assert "chat-sdk[slack-socket]" in msg, msg | ||
| # Guard against regressing back to the wrong extra name. | ||
| assert "chat-sdk[slack]`" not in msg, msg |
There was a problem hiding this comment.
Fix the negative assertion typo in the import-hint regression check.
The current string includes a trailing backtick, so it may miss the actual wrong hint text and weaken the guard.
Proposed fix
- assert "chat-sdk[slack]`" not in msg, msg
+ assert "chat-sdk[slack]" not in msg, msg📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| assert "chat-sdk[slack]`" not in msg, msg | |
| assert "chat-sdk[slack]" not in msg, msg |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_slack_socket_mode.py` at line 493, The negative assertion in the
import-hint regression check mistakenly includes a trailing backtick
("chat-sdk[slack]`") so it may miss the actual bad hint; update the assertion to
check for the correct string "chat-sdk[slack]" (i.e., change the assert that
currently reads assert "chat-sdk[slack]`" not in msg, msg to assert
"chat-sdk[slack]" not in msg, msg) so the test properly guards against the wrong
import hint (referencing the failing assert and the msg variable).
Summary
Port of upstream
vercel/chat#162— adds an opt-in Slack Socket Mode transport so bots can consume Slack events over a persistent WebSocket instead of (or alongside) signed HTTP webhooks.The webhook code path is unchanged. Socket Mode is enabled per-adapter via
SlackAdapterConfig(mode="socket", app_token="xapp-...").Design notes
Lifecycle (hazard #5: async task lifecycle)
start_socket_mode()spawns a trackedasyncio.Taskthat runs the connect / run / reconnect loop. Returns once the first connect succeeds (or raises if the first connect fails — bad app token, offline, etc.). Idempotent.stop_socket_mode()flips a shutdown flag, disconnects the liveSocketModeClient, cancels the loop task, and awaits it. Idempotent.initialize()callsstart_socket_mode()whenmode == "socket".disconnect()always callsstop_socket_mode()(no-op in webhook mode).Reconnect strategy
We layer an outer reconnect loop on top of
slack_sdk.socket_mode.aiohttp.SocketModeClient(which already does its ownauto_reconnect_enabled=True). Exponential backoff 1s → 30s with the loop polling for shutdown every 250ms sostop_socket_mode()never has to wait the full backoff window. Rationale: the inner SDK retries inside an established session, but doesn't retry ifconnect()itself raises (network down at startup, transient DNS failure). The outer loop is belt-and-suspenders, not a behavior change. Documented indocs/UPSTREAM_SYNC.md.Acknowledgment protocol
Each socket request hits
_on_socket_requestwhich constructs anack(payload=None)callback that wrapsclient.send_socket_mode_response(SocketModeResponse(...)). Slack retries (retry_attempt > 0) are still acked but not dispatched, so we don't double-process. Interactiveview_submissionerrors round-trip through the ack payload.Forwarded events (serverless variant)
handle_webhooknow acceptsx-slack-socket-token-bearing JSON POSTs at any time (constant-time compared viahmac.compare_digest). This lets a separate long-lived process run the WebSocket and POST events to a stateless webhook handler — the upstreamstartSocketModeListenerpattern. The full Vercel cron glue isn't ported (Vercel-specific), but the receiving end is.Hazard coverage
Tasktracking + shutdown flag; no bareensure_future.contextvars.copy_context()so the per-eventRequestContextContextVar persists into spawned handler tasks.slack_sdk.socket_mode.*imports stay insidestart_socket_mode/ handler body.SocketModeClientper session; tear down before reconnect.app_tokenvalidated to start withxapp-at construction; never logged. Webhook signing now refuses when no secret is set (was implicitly safe before, explicitly safe now).Other in-scope changes
ModalResponse(action="clear")— closes the entire modal view stack; emitsresponse_action: clear. Matches upstream'sModalClearResponsetype.slack-socket = ["slack-sdk>=3.27.0", "aiohttp>=3.9"].docs/UPSTREAM_SYNC.md: documents the outer reconnect loop and the not-ported serverlessstartSocketModeListenerglue.Divergences from upstream
@slack/socket-mode's internal reconnect only)connect()can raise before the inner loop owns the lifecycle; we don't want adapter setup to give up on first DNS blip. Inner auto-reconnect still runs.startSocketModeListener/runSocketModeListener(serverless cron variant)waitUntilsemantics. The receiving side (x-slack-socket-tokenhandling inhandle_webhook) is ported so a separate Python process can run the listener.Both rows added to the by-design non-parity table in
docs/UPSTREAM_SYNC.md.Test plan
uv run ruff check src/ tests/ scripts/uv run ruff format --check src/ tests/ scripts/uv run python scripts/audit_test_quality.py— 0 hard failuresuv run pyrefly check src/chat_sdk/adapters/slack/— 0 errorsuv run pytest tests/ --tb=short -q— 3692 passed, 1 pre-existing failure (tests/test_github_webhook.py::TestGitHubAdapterConstructor::test_throws_when_no_auth, unrelated)tests/test_slack_socket_mode.py:app_tokenrequired,xapp-prefix enforced, signing optional in socket mode, env var pickup, multi-workspace rejected by factory, forwarding-secret fallbackclearaction emitsresponse_action: clear_route_socket_event: events_api dispatch, missing-event-field handling, slash command ack-then-dispatch, interactive ack with errors body, retry skipped but acked, unknown event type ackedUpstream PR: vercel/chat#162
https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Generated by Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests