feat(attachments): port rehydrateAttachment hook (#52) #67
Walkthrough
Adds attachment rehydration: attachments gain a serializable …
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant Queue
participant Serializer
participant Chat
participant Adapter
Client->>Queue: enqueue message (Attachment with fetch_data)
Queue->>Serializer: persist (serialize to JSON) -- callables removed
Serializer->>Chat: dequeue (JSON/dict)
Chat->>Chat: _rehydrate_message(json, adapter)
Chat->>Chat: _coerce_attachments() -> Attachment objects with fetch_metadata
Chat->>Adapter: rehydrate_attachment(attachment) [if fetch_data missing]
Adapter-->>Chat: attachment (fetch_data restored or unchanged)
Chat->>Client: deliver rehydrated message
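The roundtrip in the diagram can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SDK's code: the `Attachment` dataclass, `to_json`, and `rehydrate` below are simplified stand-ins, and only the field names `fetch_data` and `fetch_metadata` come from the PR.

```python
import asyncio
import json
from dataclasses import dataclass, replace
from typing import Awaitable, Callable, Optional


@dataclass
class Attachment:
    """Simplified stand-in for the SDK's Attachment (assumption)."""

    type: str
    fetch_metadata: Optional[dict] = None
    fetch_data: Optional[Callable[[], Awaitable[bytes]]] = None


def to_json(att: Attachment) -> dict:
    # Callables cannot be serialized, so fetch_data is dropped here.
    return {"type": att.type, "fetchMetadata": att.fetch_metadata}


def rehydrate(att: Attachment) -> Attachment:
    """Hypothetical adapter hook: rebuild fetch_data from fetch_metadata."""
    meta = att.fetch_metadata if att.fetch_metadata is not None else {}
    file_id = meta.get("fileId")
    if file_id is None:
        return att  # nothing to rebuild from

    async def fetch_data() -> bytes:
        # A real adapter would download the file; this just echoes the id.
        return f"bytes-for-{file_id}".encode()

    return replace(att, fetch_data=fetch_data)


async def _original() -> bytes:
    return b"original"


original = Attachment("image", {"fileId": "abc"}, _original)
wire = json.loads(json.dumps(to_json(original)))  # queue persistence
restored = Attachment(type=wire["type"], fetch_metadata=wire["fetchMetadata"])
closure_lost = restored.fetch_data is None  # the closure did not survive JSON
restored = rehydrate(restored)
payload = asyncio.run(restored.fetch_data())
```

The point of the sketch is that `fetch_metadata` is plain data and survives `json.dumps`, while the closure has to be rebuilt on the consumer side.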
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a4e9235ebb

    if isinstance(raw, Message):
        return raw

Apply attachment rehydration to Message instances
_rehydrate_message returns immediately for Message inputs, which skips the new rehydrate_attachment hook entirely. This breaks queue/debounce rehydration for real persisted backends: both RedisStateAdapter.dequeue and PostgresStateAdapter.dequeue deserialize queued JSON into Message.from_json(...), so attachments arrive as Message objects with fetch_data already stripped. In those environments, attachments will remain non-downloadable after dequeue because the adapter hook is never reached.
Addressed in 26a08bd — the P1 issue was a real Python-specific divergence: our Redis/Postgres dequeue returns Message instances (upstream returns raw JSON dicts), so upstream's early-return on Message silently skipped rehydration for every persistent-backend dequeue. Fix: removed the early-return, fall through to rehydrate pass. Documented in docs/UPSTREAM_SYNC.md.
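A rough sketch of the fall-through described in this reply, assuming simplified `Message` and `Attachment` stand-ins and a hypothetical `RecordingAdapter`; the real `Chat._rehydrate_message` is not shown on this page and may differ in detail.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, Union


@dataclass
class Attachment:
    type: str
    fetch_data: Optional[Callable[[], Any]] = None


@dataclass
class Message:
    text: str
    attachments: list = field(default_factory=list)


class RecordingAdapter:
    """Hypothetical adapter that counts how often the hook runs."""

    def __init__(self) -> None:
        self.calls = 0

    def rehydrate_attachment(self, att: Attachment) -> Attachment:
        self.calls += 1
        return Attachment(att.type, fetch_data=lambda: b"restored")


def rehydrate_message(raw: Union[Message, dict], adapter: RecordingAdapter) -> Message:
    # Dicts are upgraded to Message; Message inputs are NOT returned early.
    if isinstance(raw, Message):
        msg = raw
    else:
        msg = Message(
            text=raw.get("text", ""),
            attachments=[Attachment(a["type"]) for a in raw.get("attachments", [])],
        )
    # Only attachments that lost their closure go through the hook.
    msg.attachments = [
        att if att.fetch_data is not None else adapter.rehydrate_attachment(att)
        for att in msg.attachments
    ]
    return msg


adapter = RecordingAdapter()
already_live = Attachment("file", fetch_data=lambda: b"live")
msg = rehydrate_message(Message("hi", [Attachment("image"), already_live]), adapter)
```

With the early return in place, `adapter.calls` would stay at zero for a `Message` input; here the stripped attachment is rehydrated and the live one is left untouched.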
Code Review
This pull request implements an attachment rehydration mechanism to restore download capabilities after JSON serialization. It introduces the Adapter.rehydrate_attachment hook and Attachment.fetch_metadata field, with specific implementations for the Slack, Teams, Google Chat, Telegram, and WhatsApp adapters. The core Chat class has been updated to utilize these hooks during message rehydration. Review feedback points out that the Teams and Slack adapters inefficiently create new HTTP clients for every download and suggests using pooled sessions instead.

    def _build_teams_fetch_data(self, url: str) -> Callable[[], Awaitable[bytes]]:
        """Build a lazy ``fetch_data`` closure for a Teams file URL."""

        async def fetch_data() -> bytes:
            import httpx

            async with httpx.AsyncClient() as http:
                resp = await http.get(url)
                resp.raise_for_status()
                return resp.content

        return fetch_data

The current implementation of _build_teams_fetch_data uses httpx.AsyncClient() as a context manager inside the closure, which creates and destroys a new connection pool for every file download. This is inefficient and inconsistent with the rest of the adapter, which uses a pooled aiohttp.ClientSession via the _get_http_session() helper.

    -def _build_teams_fetch_data(self, url: str) -> Callable[[], Awaitable[bytes]]:
    -    """Build a lazy ``fetch_data`` closure for a Teams file URL."""
    -    async def fetch_data() -> bytes:
    -        import httpx
    -        async with httpx.AsyncClient() as http:
    -            resp = await http.get(url)
    -            resp.raise_for_status()
    -            return resp.content
    -    return fetch_data
    +def _build_teams_fetch_data(self, url: str) -> Callable[[], Awaitable[bytes]]:
    +    """Build a lazy ``fetch_data`` closure for a Teams file URL."""
    +    async def fetch_data() -> bytes:
    +        session = await self._get_http_session()
    +        async with session.get(url) as resp:
    +            resp.raise_for_status()
    +            return await resp.read()
    +    return fetch_data

Addressed in 26a08bd — Teams _build_teams_fetch_data now uses the pooled _get_http_session() helper (matches the pattern at teams/adapter.py:1268).

    import httpx

    async with httpx.AsyncClient() as http:
        resp = await http.get(url, headers={"Authorization": f"Bearer {token}"})
        resp.raise_for_status()
        content_type = resp.headers.get("content-type", "")
        if "text/html" in content_type:
            raise RuntimeError(
                "Failed to download file from Slack: received HTML login page. "
                'Ensure your Slack app has the "files:read" OAuth scope.'
            )
        return resp.content

Deferred — Slack adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient for Slack API calls). Adding one is out of scope for this PR; tracked as a follow-up. Teams version is fixed in this PR.
Actionable comments posted: 9
🧹 Nitpick comments (4)
tests/test_whatsapp_webhook.py (1)
700-710: Strengthen the rehydration test to validate callback wiring.
This currently only checks existence (`fetch_data is not None`). It should also verify the async callback calls `download_media` with the expected `mediaId`.
💡 Suggested test hardening

    -from unittest.mock import MagicMock
    +from unittest.mock import AsyncMock, MagicMock
     ...
    -    def test_rehydrates_fetch_data_from_media_id(self):
    +    @pytest.mark.asyncio
    +    async def test_rehydrates_fetch_data_from_media_id(self):
             from chat_sdk.types import Attachment

             adapter = _make_adapter()
    +        adapter.download_media = AsyncMock(return_value=b"ok")
             attachment = Attachment(
                 type="image",
                 fetch_metadata={"mediaId": "media-42"},
             )
             rehydrated = adapter.rehydrate_attachment(attachment)
             assert rehydrated.fetch_data is not None
    +        data = await rehydrated.fetch_data()
    +        assert data == b"ok"
    +        adapter.download_media.assert_awaited_once_with("media-42")
             assert rehydrated.fetch_metadata == {"mediaId": "media-42"}

As per coding guidelines, `tests/**/*.py`: Every test must fail when the code is wrong, and use AsyncMock where async behavior is involved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_whatsapp_webhook.py` around lines 700 - 710, Update the test_rehydrates_fetch_data_from_media_id to assert the async callback returned in adapter.rehydrate_attachment actually calls the adapter.download_media with the expected mediaId: replace the loose existence check of rehydrated.fetch_data with an AsyncMock for adapter.download_media, obtain the async callback from rehydrated.fetch_data, await/call it, and assert adapter.download_media was awaited/called once with "media-42"; keep references to Attachment, adapter.rehydrate_attachment, and adapter.download_media so the test fails if the wiring is broken.
tests/test_telegram_adapter.py (1)
381-393: Tighten rehydration assertion by executing the async callback.
Right now it only proves callback presence. Execute `fetch_data()` with an `AsyncMock` downloader to prove `fileId` wiring is correct.
💡 Suggested test hardening

    +from unittest.mock import AsyncMock
     ...
    -    def test_rehydrate_attachment_uses_file_id_from_fetch_metadata(self):
    +    @pytest.mark.asyncio
    +    async def test_rehydrate_attachment_uses_file_id_from_fetch_metadata(self):
             from chat_sdk.types import Attachment

             adapter = _make_adapter()
    +        adapter.download_file = AsyncMock(return_value=b"ok")
             attachment = Attachment(
                 type="image",
                 fetch_metadata={"fileId": "AgACAgIAAxkB"},
             )
             rehydrated = adapter.rehydrate_attachment(attachment)
             assert rehydrated.fetch_data is not None
    +        data = await rehydrated.fetch_data()
    +        assert data == b"ok"
    +        adapter.download_file.assert_awaited_once_with("AgACAgIAAxkB")
             # fetch_metadata is preserved so the attachment stays serializable/rehydratable again.
             assert rehydrated.fetch_metadata == {"fileId": "AgACAgIAAxkB"}

As per coding guidelines, `tests/**/*.py`: Every test must fail when the code is wrong, and use AsyncMock where async behavior is involved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_telegram_adapter.py` around lines 381 - 393, Update the test_rehydrate_attachment_uses_file_id_from_fetch_metadata to execute the async fetch callback: create an AsyncMock downloader that returns a sentinel result and inject or patch it into the adapter instance used by adapter.rehydrate_attachment; call rehydrated.fetch_data() (await it) and assert the AsyncMock was awaited with the fileId "AgACAgIAAxkB" and that the awaited call returns the sentinel result, while still asserting rehydrated.fetch_metadata == {"fileId": "AgACAgIAAxkB"} to ensure serializability.
src/chat_sdk/chat.py (1)
2172-2172: Avoid truthiness fallbacks in attachment key coercion.
On Line 2172 and Line 2176, `or` can incorrectly ignore valid falsy values from the camelCase field. Use explicit `is not None` fallback here.
As per coding guidelines: "Use `x if x is not None else default` instead of `x or default` to avoid truthiness traps when porting from TypeScript."
💡 Proposed fix

    -            mime_type=att.get("mimeType") or att.get("mime_type"),
    +            mime_type=(
    +                att.get("mimeType")
    +                if att.get("mimeType") is not None
    +                else att.get("mime_type")
    +            ),
    @@
    -            fetch_metadata=att.get("fetchMetadata") or att.get("fetch_metadata"),
    +            fetch_metadata=(
    +                att.get("fetchMetadata")
    +                if att.get("fetchMetadata") is not None
    +                else att.get("fetch_metadata")
    +            ),

Also applies to: 2176-2176
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/chat_sdk/chat.py` at line 2172, The attachment key coercion uses truthy fallback (e.g., att.get("mimeType") or att.get("mime_type")) which can drop valid falsy values; update the logic in the attachment construction (around the code that references att.get("mimeType") and the occurrence near Line 2176) to use explicit None-check fallbacks like "value_if_camelcase if value_if_camelcase is not None else value_if_snakecase" so the camelCase field is used when present even if falsy.
src/chat_sdk/adapters/google_chat/adapter.py (1)
2685-2687: Use explicit `None` fallback and preserve the resolved URL.
`meta.get("url") or attachment.url` has truthiness pitfalls, and reconstructing with `url=attachment.url` drops the resolved fallback URL.
As per coding guidelines: "Use `x if x is not None else default` instead of `x or default` to avoid truthiness traps when porting from TypeScript."
💡 Proposed fix

    -        meta = attachment.fetch_metadata or {}
    +        meta = attachment.fetch_metadata if attachment.fetch_metadata is not None else {}
             resource_name = meta.get("resourceName")
    -        url = meta.get("url") or attachment.url
    +        meta_url = meta.get("url")
    +        url = meta_url if meta_url is not None else attachment.url
    @@
    -            url=attachment.url,
    +            url=url,

Also applies to: 2692-2692
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/chat_sdk/adapters/google_chat/adapter.py` around lines 2685 - 2687, The code uses truthy fallback which can drop valid falsy URLs and later reconstructs the URL from attachment.url; change the URL resolution to use explicit None-checking: retrieve meta via attachment.fetch_metadata or {}, keep resource_name from meta.get("resourceName") as-is, and set url = meta.get("url") if meta.get("url") is not None else attachment.url so the resolved URL is preserved wherever `url` is used (also apply the same explicit None fallback at the other occurrence mentioned around line 2692).
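The truthiness trap these nitpicks describe is easy to reproduce in isolation. The snippet below is a standalone illustration; `pick_truthy` and `pick_not_none` are hypothetical helper names, and the dictionary keys mirror the ones quoted in the review.

```python
from typing import Optional


def pick_truthy(att: dict) -> Optional[dict]:
    # `or` treats an empty dict as "missing" and falls through to the snake_case key.
    return att.get("fetchMetadata") or att.get("fetch_metadata")


def pick_not_none(att: dict) -> Optional[dict]:
    # Only a genuinely absent (None) camelCase value triggers the fallback.
    camel = att.get("fetchMetadata")
    return camel if camel is not None else att.get("fetch_metadata")


att = {"fetchMetadata": {}, "fetch_metadata": {"fileId": "stale"}}
truthy_result = pick_truthy(att)
explicit_result = pick_not_none(att)
```

Here `truthy_result` ends up as the stale snake_case value, while `explicit_result` keeps the empty dict that was actually stored under the camelCase key.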
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/chat_sdk/adapters/google_chat/adapter.py`:
- Around line 2660-2666: The code performs an authenticated GET to a
metadata-provided url variable using adapter._get_access_token() and
adapter._get_http_session() without validating the URL, which can leak tokens
via SSRF; before calling session.get(...) validate the url by parsing it
(scheme/netloc), enforce allowed schemes (https only), verify the
hostname/netloc against an allowlist or known metadata hostnames and reject or
normalize IPs (disallow private/reserved addresses), and refuse requests with
redirects to untrusted hosts; only after these checks call
adapter._get_http_session().get(...) with the bearer token and handle rejection
by logging and skipping the fetch.
In `@src/chat_sdk/adapters/slack/adapter.py`:
- Around line 1776-1825: The rehydrate_attachment closure rebuilds fetch_data
using an unvalidated URL and then calls _fetch_slack_file which blindly sends
the bot token — this allows a tampered attachment to exfiltrate tokens. Before
using url in rehydrate_attachment/fetch_data, parse and validate it (ensure
scheme is https and host is a trusted Slack host such as files.slack.com,
slack.com, or other allowed Slack workspace domains), and if it fails validation
raise an error or return the attachment unchanged; only call _fetch_slack_file
when the URL passes validation. Update rehydrate_attachment, the inner
fetch_data closure, and any call sites that accept fetch_metadata["url"] to
perform this whitelist validation and explicitly reference functions
rehydrate_attachment, fetch_data, and _fetch_slack_file when making the change.
In `@src/chat_sdk/adapters/teams/adapter.py`:
- Around line 594-629: rehydrate_attachment currently trusts
fetch_metadata["url"]/attachment.url and passes it to _build_teams_fetch_data,
enabling SSRF; validate the URL before rebuilding the closure by parsing it
(e.g., urllib.parse), ensuring scheme is http or https, enforcing allowed host
patterns (e.g., Microsoft Graph/expected domains) or performing a DNS
resolution/IP check to reject private/reserved addresses, and if validation
fails return the original attachment unchanged; implement the validation logic
in rehydrate_attachment (or a helper called from it) and only call
_build_teams_fetch_data(url) when the URL passes these checks.
In `@src/chat_sdk/adapters/telegram/adapter.py`:
- Line 1379: The current assignment uses truthiness fallback for
attachment.fetch_metadata which can mis-handle falsy but valid values; change
the meta assignment to use an explicit None check: set meta =
attachment.fetch_metadata if attachment.fetch_metadata is not None else {} so
you reference attachment.fetch_metadata and assign to meta without using `or` to
avoid truthiness traps.
In `@src/chat_sdk/adapters/whatsapp/adapter.py`:
- Line 646: The current assignment uses truthiness fallback which can mis-handle
falsy but valid metadata; change the expression that sets meta to use an
explicit None check instead (i.e., assign meta = attachment.fetch_metadata if
attachment.fetch_metadata is not None else {}) so that attachment.fetch_metadata
is used when present even if falsy; update the code around the variable meta in
the adapter.py function/method where meta is defined to reference
attachment.fetch_metadata explicitly rather than relying on `or` fallback.
In `@src/chat_sdk/chat.py`:
- Around line 2105-2107: The list comprehension loses type-safety because
rehydrate is untyped from getattr; declare rehydrate with an explicit type
(e.g., rehydrate: Optional[Callable[[Attachment], Attachment]] =
getattr(adapter, "rehydrate_attachment", None)) and update the comprehension to
guard by isinstance and fetch_data so only Attachment objects are passed to
rehydrate (for example: msg.attachments = [att if not isinstance(att,
Attachment) or att.fetch_data is not None else rehydrate(att) for att in
msg.attachments] while keeping the existing callable check), ensuring
msg.attachments remains a list[Attachment] and non-Attachment objects cannot
leak in.
In `@src/chat_sdk/types.py`:
- Line 521: The dual-key lookup using "or" collapses valid empty values (e.g.,
{}) to the alternate key; update occurrences where fetch_metadata is assigned
from att (the expression at the fetch_metadata assignment in types.py and the
similar occurrence later) to use a None-aware conditional: check the primary
key's value and if it is not None use it, otherwise use the secondary key —
i.e., replace the "att.get('fetchMetadata') or att.get('fetch_metadata')"
pattern with a conditional that returns att.get('fetchMetadata') if it is not
None else att.get('fetch_metadata'), and do the same for the other occurrence
mentioned.
In `@tests/test_teams_adapter.py`:
- Around line 355-375: The tests currently only check that
adapter.rehydrate_attachment(...) returns a non-None callable; update both tests
to stub the fetch path with an AsyncMock, call
adapter.rehydrate_attachment(attachment), await rehydrated.fetch_data(), and
assert the awaited result (e.g., returned bytes or the URL) matches the expected
value so the test fails if rehydration wiring is wrong; reference the
adapter.rehydrate_attachment function, the Attachment instances, and the
rehydrated.fetch_data async callable when making these changes.
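The URL validation requested in the Slack, Teams, and Google Chat comments above could look roughly like this. It is a sketch only: `is_trusted_url` is a hypothetical helper, and the allowlist is an illustrative assumption built from the Slack hosts named in the comment, not the adapters' actual list.

```python
from urllib.parse import urlparse

# Illustrative allowlist (assumption), based on the hosts named in the Slack comment.
ALLOWED_HOSTS = {"files.slack.com", "slack.com"}
ALLOWED_SUFFIXES = (".slack.com",)


def is_trusted_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    try:
        parsed = urlparse(url)
        host = parsed.hostname
    except ValueError:
        return False  # malformed URL, e.g. a broken IPv6 literal
    if parsed.scheme != "https" or host is None:
        return False
    host = host.lower()
    # The leading dot in each suffix prevents "evilslack.com" from matching.
    return host in ALLOWED_HOSTS or host.endswith(ALLOWED_SUFFIXES)
```

A rehydration hook would call such a check before building the closure and return the attachment unchanged when it fails, so the bearer token is never attached to a request for an untrusted host. Redirect handling and private-IP checks, which the Google Chat and Teams comments also mention, are not covered by this sketch.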
📒 Files selected for processing (14)
- CHANGELOG.md
- src/chat_sdk/adapters/google_chat/adapter.py
- src/chat_sdk/adapters/slack/adapter.py
- src/chat_sdk/adapters/teams/adapter.py
- src/chat_sdk/adapters/telegram/adapter.py
- src/chat_sdk/adapters/whatsapp/adapter.py
- src/chat_sdk/chat.py
- src/chat_sdk/types.py
- tests/test_chat_faithful.py
- tests/test_google_chat_adapter.py
- tests/test_slack_webhook.py
- tests/test_teams_adapter.py
- tests/test_telegram_adapter.py
- tests/test_whatsapp_webhook.py
…ness fallbacks

Second review pass on PR #67 (rehydrate_attachment). The previous fixup addressed pyrefly only; this commit resolves the remaining review feedback.

SSRF guards (3 adapters)
- Slack, Teams, Google Chat all rebuild fetch_data closures from serialized fetch_metadata["url"] in rehydrate_attachment. A tampered URL in persisted queue state could exfiltrate the workspace bot/OAuth token to an attacker-controlled host. Each adapter now validates the URL's scheme (https only) and host against a platform-specific allowlist before forwarding the auth header. Upstream TS does not validate; this is a Python-first divergence documented in docs/UPSTREAM_SYNC.md.
- Slack: files.slack.com, slack.com, *.slack.com, *.slack-edge.com
- Teams: Microsoft-owned hosts (graph.microsoft.com, smba.trafficmanager.net, *.sharepoint.com, *.botframework.com, *.office.com, attachments.office.net, …)
- Google Chat: chat.googleapis.com, *.googleapis.com, *.googleusercontent.com, *.google.com

Message-instance rehydration (P1)
- Chat._rehydrate_message used to early-return on Message inputs, matching upstream TS's `raw instanceof Message` shortcut. That shortcut is safe in upstream because its state adapters return raw JSON dicts from dequeue. Our RedisStateAdapter / PostgresStateAdapter both upgrade the dequeued dict to `Message.from_json(...)` before returning, so the early return would skip rehydrate_attachment for every persistent-backend dequeue and leave fetch_data stripped. We now fall through and apply the rehydrate pass on Message inputs too (already-hydrated attachments with fetch_data are filtered out).

Truthiness fallbacks (Port Rule #1)
- telegram, whatsapp rehydrate_attachment and types.py dual-key fetch_metadata lookup now use explicit `is not None` instead of `or`, so an empty-dict fetch_metadata is preserved.

Teams connection pooling
- _build_teams_fetch_data used httpx.AsyncClient as a throwaway context manager per download. Refactored to use the shared aiohttp session (_get_http_session) that the rest of the adapter already goes through.

Test hardening
- test_slack_webhook.py and test_teams_adapter.py now stub the fetch path with AsyncMock, await rehydrated.fetch_data(), and assert the URL + token that were forwarded. Previously the tests only checked that `fetch_data is not None`, so they would have passed even if rehydration returned a dummy closure.
- New tests per adapter verify the SSRF guard rejects untrusted hosts and the allowlist accepts the intended Slack/Teams/GCP hosts.
- New regression test in test_chat_faithful.py drives a Message-instance dequeue through the chat queue and asserts rehydrate_attachment still fires.

Slack adapter connection pooling (deferred)
- _fetch_slack_file still uses httpx.AsyncClient per call. The Slack adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient for Slack API calls), so adding one is a larger refactor left for a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Review verdict: LGTM. Refreshed and reviewed latest head c871b2b against main. No issues found in attachment rehydration or provider download safety paths. Verification: targeted adapter/chat/type coverage passed (349 passed, 2 skipped) and ruff passed on touched files. Formal GitHub approval is blocked because the authenticated account owns this PR.
…ialization

Upstream Adapter.rehydrateAttachment rebuilds the fetch_data download closure after a JSON roundtrip through the state adapter. This is essential for queue/debounce concurrency strategies, where entries pass through JSON.stringify and lose any callable fields.

This PR ports the hook to Python: Adapter gains an optional rehydrate_attachment method (default no-op on BaseAdapter), Attachment gains a serializable fetch_metadata dict, and Chat._rehydrate_message now threads the active adapter and invokes the hook on any attachment whose fetch_data was stripped.

Per-adapter implementations land on Slack (url + teamId), Teams (url), Google Chat (resourceName + url), Telegram (fileId), and WhatsApp (mediaId); Discord, GitHub and Linear intentionally do not implement it (upstream parity: they use public URLs or have no file attachments).

Closes 3 [concurrency: queue attachment rehydration] fidelity gaps. Refs #52.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- chat.py:2107: type the rehydrate-attachment callable so the list comprehension narrows to list[Attachment]. Unblocks CI.
- _coerce_attachments: replace `or` fallbacks with `is not None` (Port Rule #1 truthiness trap)
- google_chat rehydrate_attachment: preserve resolved URL when reconstructing, drop truthiness fallback on meta["url"]
- Harden telegram and whatsapp rehydrate tests to execute the async callback and verify download-method wiring (AsyncMock).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
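The kind of test hardening this commit describes, executing the rehydrated callback rather than only checking that it exists, can be sketched as below. `FakeAdapter` is a hypothetical stand-in (it takes a bare metadata dict instead of an `Attachment`); `download_media` and the `"media-42"` id follow the WhatsApp review comment.

```python
import asyncio
from typing import Awaitable, Callable
from unittest.mock import AsyncMock


class FakeAdapter:
    """Hypothetical adapter whose download path is replaced by an AsyncMock."""

    def __init__(self) -> None:
        self.download_media = AsyncMock(return_value=b"ok")

    def rehydrate_attachment(self, meta: dict) -> Callable[[], Awaitable[bytes]]:
        media_id = meta["mediaId"]

        async def fetch_data() -> bytes:
            # The wiring under test: the closure must forward the stored id.
            return await self.download_media(media_id)

        return fetch_data


adapter = FakeAdapter()
fetch_data = adapter.rehydrate_attachment({"mediaId": "media-42"})
data = asyncio.run(fetch_data())

# Fails if the closure is a dummy or forwards the wrong identifier.
adapter.download_media.assert_awaited_once_with("media-42")
```

A check limited to `fetch_data is not None` would pass even for a closure that never touches `download_media`, which is the gap the hardened tests are meant to close.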
…hiness pass
- _coerce_attachments + Message.from_json{_compat} now preserve the
data: bytes | None field through rehydrate paths (was silently dropped)
- Close the mime_type truthiness fallback in types.py:517,591 that the
round-2 sweep missed
- Docstring note on rehydrate_attachment: must be sync
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c871b2b to 7224871
Per gemini-code-assist review on PR #83. Without the repo prefix, GitHub auto-links the upstream PR numbers to local PRs in chat-sdk-python, which collides with the local refs (#64, #66, #67, #74, #82) elsewhere in the file. Use vercel/chat#NNN so the upstream refs link correctly. https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Summary
- `Adapter.rehydrateAttachment` hook + `Thread` serialization integration (issue #52, "Port `rehydrate_attachment` to Adapter protocol + `_rehydrate_message`").
- `Attachment` gains a serializable `fetch_metadata: dict[str, str] | None` field that survives JSON roundtrips and carries adapter-specific identifiers (Slack `url` + `teamId`, Teams `url`, Google Chat `resourceName` + `url`, Telegram `fileId`, WhatsApp `mediaId`). Persisted through `Message.to_json` / `from_json` / `from_json_compat`.
- `Adapter` protocol exposes an optional `rehydrate_attachment(attachment) -> Attachment` hook (no-op default on `BaseAdapter`).
- `Chat._rehydrate_message(raw, adapter=None)` now threads the active adapter and calls the hook on every attachment that lost its `fetch_data` closure during the queue/debounce JSON roundtrip. This matches upstream `adapter?.rehydrateAttachment?.(att)` via duck-typing, so adapters without the hook (e.g. `MockAdapter`) remain no-ops.
Fidelity impact
Closes all 3 `[concurrency: queue attachment rehydration]` gaps in `chat.test.ts`:
- should call rehydrateAttachment on deserialized attachments missing fetchData
- should skip rehydration for attachments that already have fetchData
- should leave attachments unchanged when adapter has no rehydrateAttachment
`verify_test_fidelity.py`: 40 missing → 37 missing (no new gaps introduced).
Test plan
- `uv run ruff check src/ tests/ scripts/`
- `uv run ruff format --check src/ tests/ scripts/`
- `uv run python scripts/audit_test_quality.py` (0 hard failures)
- `uv run python scripts/verify_test_fidelity.py` (3 `[concurrency: queue attachment rehydration]` gaps closed; no new gaps)
- `uv run pytest tests/ --tb=short -q`: 3564 passed, 2 skipped (was 3545 passed)
- `get_installation` `state.enqueue` with `json.loads(json.dumps(entry.message.to_json()))` (mirrors upstream's `vi.mocked(state.enqueue).mockImplementation`).
Notes
- `BaseAdapter.rehydrate_attachment` is a no-op (returns the attachment unchanged); adapters that want platform-specific rehydration override it. `MockAdapter` does not inherit `BaseAdapter` and therefore lacks the attribute by default, which preserves the TS behavior that tests rely on ("should leave attachments unchanged when adapter has no rehydrateAttachment").
- `## Unreleased` in `CHANGELOG.md` for bundled `0.4.26.2`.
- `src/chat_sdk/types.py` (+37 lines, mostly near `Attachment` dataclass L249 and `SerializedAttachment` L303) and `src/chat_sdk/chat.py` (+95 lines, `_rehydrate_message` at L2041 plus `_coerce_attachments` helper). Kept edits surgical to minimize rebase pain with other in-flight PRs (Port `onOptionsLoad` handler for dynamic select dropdown population #50, Teams adapter: port certificate-based mTLS authentication #58, Google Chat adapter: implement file upload support #59, Global handler-dispatch bound (reactions/actions/slash/modals) #61).
🤖 Generated with Claude Code
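The first note above (no-op default versus a missing attribute) can be sketched as follows. The class bodies are simplified assumptions, `UpperAdapter` and `apply_hook` are hypothetical names, and the attachment is a plain string purely to keep the example short.

```python
from typing import Any, Optional


class BaseAdapter:
    def rehydrate_attachment(self, attachment: Any) -> Any:
        # Default hook: return the attachment unchanged.
        return attachment


class MockAdapter:
    """Deliberately does not inherit BaseAdapter, so it has no hook at all."""


class UpperAdapter(BaseAdapter):
    def rehydrate_attachment(self, attachment: Any) -> Any:
        # Hypothetical override, only here to make the hook's effect visible.
        return str(attachment).upper()


def apply_hook(adapter: Optional[Any], attachment: Any) -> Any:
    # Python analogue of upstream's `adapter?.rehydrateAttachment?.(att)`:
    # a missing adapter or a missing method both leave the attachment as-is.
    hook = getattr(adapter, "rehydrate_attachment", None)
    return hook(attachment) if callable(hook) else attachment
```

`apply_hook(None, att)`, `apply_hook(MockAdapter(), att)`, and `apply_hook(BaseAdapter(), att)` all return `att` unchanged; only an adapter that overrides the hook alters it.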