Skip to content

feat(attachments): port rehydrateAttachment hook (#52)#67

Merged
patrick-chinchill merged 4 commits into
mainfrom
feat/rehydrate-attachment-52
Apr 24, 2026
Merged

feat(attachments): port rehydrateAttachment hook (#52)#67
patrick-chinchill merged 4 commits into
mainfrom
feat/rehydrate-attachment-52

Conversation

@patrick-chinchill
Copy link
Copy Markdown
Collaborator

@patrick-chinchill patrick-chinchill commented Apr 24, 2026

Summary

  • Port upstream Adapter.rehydrateAttachment hook + Thread serialization integration (issue Port rehydrate_attachment to Adapter protocol + _rehydrate_message #52).
  • Attachment gains a serializable fetch_metadata: dict[str, str] | None field that survives JSON roundtrips and carries adapter-specific identifiers (Slack url+teamId, Teams url, Google Chat resourceName+url, Telegram fileId, WhatsApp mediaId). Persisted through Message.to_json / from_json / from_json_compat.
  • Adapter protocol exposes an optional rehydrate_attachment(attachment) -> Attachment hook (no-op default on BaseAdapter). Chat._rehydrate_message(raw, adapter=None) now threads the active adapter and calls the hook on every attachment that lost its fetch_data closure during the queue/debounce JSON roundtrip — matches upstream adapter?.rehydrateAttachment?.(att) via duck-typing so adapters without the hook (e.g. MockAdapter) remain no-ops.
  • Per-adapter implementations land on Slack, Teams, Google Chat, Telegram, WhatsApp. Discord / GitHub / Linear intentionally do not implement it (upstream parity — those platforms either use public URLs or have no file attachments).

Fidelity impact

Closes all 3 [concurrency: queue attachment rehydration] gaps in chat.test.ts:

  • should call rehydrateAttachment on deserialized attachments missing fetchData
  • should skip rehydration for attachments that already have fetchData
  • should leave attachments unchanged when adapter has no rehydrateAttachment

verify_test_fidelity.py: 40 missing → 37 missing (no new gaps introduced).

Test plan

  • uv run ruff check src/ tests/ scripts/
  • uv run ruff format --check src/ tests/ scripts/
  • uv run python scripts/audit_test_quality.py (0 hard failures)
  • uv run python scripts/verify_test_fidelity.py (3 [concurrency: queue attachment rehydration] gaps closed; no new gaps)
  • uv run pytest tests/ --tb=short -q — 3564 passed, 2 skipped (was 3545 passed)
  • Slack multi-workspace rehydrate resolves per-team token via get_installation
  • JSON-roundtrip fidelity simulated via a test helper that wraps state.enqueue with json.loads(json.dumps(entry.message.to_json())) (mirrors upstream's vi.mocked(state.enqueue).mockImplementation).

Notes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Attachments now persist adapter-specific metadata so downloadable media are restored after queue/state roundtrips; per-adapter URL allowlists and workspace-specific token resolution protect and enable secure downloads across Google Chat, Slack, Teams, Telegram, and WhatsApp.
  • Tests

    • Added extensive tests covering attachment rehydration, URL allowlisting, workspace-token behavior, and serialization edge cases.
  • Documentation

    • Updated docs to describe serialization/recovery behavior and host allowlists.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 37809a15-b880-42e6-86ef-89365c65696b

📥 Commits

Reviewing files that changed from the base of the PR and between 26a08bd and 7224871.

📒 Files selected for processing (16)
  • CHANGELOG.md
  • docs/UPSTREAM_SYNC.md
  • src/chat_sdk/adapters/google_chat/adapter.py
  • src/chat_sdk/adapters/slack/adapter.py
  • src/chat_sdk/adapters/teams/adapter.py
  • src/chat_sdk/adapters/telegram/adapter.py
  • src/chat_sdk/adapters/whatsapp/adapter.py
  • src/chat_sdk/chat.py
  • src/chat_sdk/types.py
  • tests/test_chat_faithful.py
  • tests/test_google_chat_adapter.py
  • tests/test_slack_webhook.py
  • tests/test_teams_adapter.py
  • tests/test_telegram_adapter.py
  • tests/test_types.py
  • tests/test_whatsapp_webhook.py

📝 Walkthrough

Walkthrough

Adds attachment rehydration: attachments gain a serializable fetch_metadata field and adapters expose a rehydrate_attachment hook. Chat message rehydration calls adapter hooks to rebuild lazy download closures after JSON/queue serialization. Platform-specific implementations and tests added for Slack, Teams, Google Chat, Telegram, and WhatsApp.

Changes

Cohort / File(s) Summary
Documentation
CHANGELOG.md, docs/UPSTREAM_SYNC.md
Document new attachment rehydration behavior, SSRF/token-exfil allowlisting, and Python serialization non-parity notes.
Type System & Base Adapter
src/chat_sdk/types.py
Add Attachment.fetch_metadata; include in serialization; add BaseAdapter.rehydrate_attachment hook (no-op default).
Message Rehydration Core
src/chat_sdk/chat.py
Make _rehydrate_message adapter-aware, add _coerce_attachments, and invoke adapter rehydrate_attachment for attachments missing fetch_data.
Google Chat Adapter
src/chat_sdk/adapters/google_chat/adapter.py
Persist resourceName/url into fetch_metadata; extract download logic to _build_gchat_fetch_data; add _is_trusted_gchat_download_url; implement rehydrate_attachment with allowlist and auth handling.
Slack Adapter
src/chat_sdk/adapters/slack/adapter.py
Thread team_id into attachment creation; persist teamId+url in fetch_metadata; add _fetch_slack_file, _is_trusted_slack_download_url; implement rehydrate_attachment resolving workspace token via get_installation.
Teams Adapter
src/chat_sdk/adapters/teams/adapter.py
Persist contentUrl in fetch_metadata; create lazy fetch_data with _is_trusted_teams_download_url validation; add rehydrate_attachment.
Telegram Adapter
src/chat_sdk/adapters/telegram/adapter.py
Persist fileId in fetch_metadata; keep lazy fetch_data; add rehydrate_attachment to rebuild from fileId.
WhatsApp Adapter
src/chat_sdk/adapters/whatsapp/adapter.py
Persist mediaId in fetch_metadata; add rehydrate_attachment to rebuild lazy download closure.
Attachment & Rehydration Tests
tests/test_chat_faithful.py
Add queue/serialization integration tests that simulate JSON round-trips and assert adapter rehydration behavior across scenarios.
Adapter-Specific Tests
tests/test_google_chat_adapter.py, tests/test_slack_webhook.py, tests/test_teams_adapter.py, tests/test_telegram_adapter.py, tests/test_whatsapp_webhook.py
Add unit tests per adapter verifying fetch_metadata persistence, rehydration behavior, allowlist/SSRF rejection, token resolution, and fallbacks.
Serialization Tests
tests/test_types.py
Add tests asserting byte-preservation in-memory, omission on Message.to_json, and _coerce_attachments behavior.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Queue
    participant Serializer
    participant Chat
    participant Adapter

    Client->>Queue: enqueue message (Attachment with fetch_data)
    Queue->>Serializer: persist (serialize to JSON) -- callables removed
    Serializer->>Chat: dequeue (JSON/dict)
    Chat->>Chat: _rehydrate_message(json, adapter)
    Chat->>Chat: _coerce_attachments() -> Attachment objects with fetch_metadata
    Chat->>Adapter: rehydrate_attachment(attachment) [if fetch_data missing]
    Adapter-->>Chat: attachment (fetch_data restored or unchanged)
    Chat->>Client: deliver rehydrated message
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hopped through JSON, metadata in paw,
fetch_metadata whispers where downloads draw,
Adapters mend bytes where callables fled,
Queues no longer leave attachments dead,
A merry rabbit bounces—downloads fed! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 41.11% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(attachments): port rehydrateAttachment hook (#52)' clearly and specifically identifies the main change: porting the rehydrateAttachment hook feature for attachments. It uses a conventional commit format, specifies the affected component (attachments), and concisely describes the action (port rehydrateAttachment hook).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/rehydrate-attachment-52

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4e9235ebb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/chat_sdk/chat.py Outdated
Comment on lines 2054 to 2055
if isinstance(raw, Message):
return raw
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Apply attachment rehydration to Message instances

_rehydrate_message returns immediately for Message inputs, which skips the new rehydrate_attachment hook entirely. This breaks queue/debounce rehydration for real persisted backends: both RedisStateAdapter.dequeue and PostgresStateAdapter.dequeue deserialize queued JSON into Message.from_json(...), so attachments arrive as Message objects with fetch_data already stripped. In those environments, attachments will remain non-downloadable after dequeue because the adapter hook is never reached.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 26a08bd — the P1 issue was a real Python-specific divergence: our Redis/Postgres dequeue returns Message instances (upstream returns raw JSON dicts), so upstream's early-return on Message silently skipped rehydration for every persistent-backend dequeue. Fix: removed the early-return, fall through to rehydrate pass. Documented in docs/UPSTREAM_SYNC.md.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements an attachment rehydration mechanism to restore download capabilities after JSON serialization. It introduces the Adapter.rehydrate_attachment hook and Attachment.fetch_metadata field, with specific implementations for the Slack, Teams, Google Chat, Telegram, and WhatsApp adapters. The core Chat class has been updated to utilize these hooks during message rehydration. Review feedback points out that the Teams and Slack adapters inefficiently create new HTTP clients for every download and suggests using pooled sessions instead.

Comment on lines +594 to +605
def _build_teams_fetch_data(self, url: str) -> Callable[[], Awaitable[bytes]]:
"""Build a lazy ``fetch_data`` closure for a Teams file URL."""

async def fetch_data() -> bytes:
import httpx

async with httpx.AsyncClient() as http:
resp = await http.get(url)
resp.raise_for_status()
return resp.content

return fetch_data
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of _build_teams_fetch_data uses httpx.AsyncClient() as a context manager inside the closure, which creates and destroys a new connection pool for every file download. This is inefficient and inconsistent with the rest of the adapter, which uses a pooled aiohttp.ClientSession via the _get_http_session() helper.

Suggested change
def _build_teams_fetch_data(self, url: str) -> Callable[[], Awaitable[bytes]]:
"""Build a lazy ``fetch_data`` closure for a Teams file URL."""
async def fetch_data() -> bytes:
import httpx
async with httpx.AsyncClient() as http:
resp = await http.get(url)
resp.raise_for_status()
return resp.content
return fetch_data
def _build_teams_fetch_data(self, url: str) -> Callable[[], Awaitable[bytes]]:
"""Build a lazy ``fetch_data`` closure for a Teams file URL."""
async def fetch_data() -> bytes:
session = await self._get_http_session()
async with session.get(url) as resp:
resp.raise_for_status()
return await resp.read()
return fetch_data

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 26a08bd — Teams _build_teams_fetch_data now uses the pooled _get_http_session() helper (matches the pattern at teams/adapter.py:1268).

Comment on lines +1783 to +1794
import httpx

async with httpx.AsyncClient() as http:
resp = await http.get(url, headers={"Authorization": f"Bearer {token}"})
resp.raise_for_status()
content_type = resp.headers.get("content-type", "")
if "text/html" in content_type:
raise RuntimeError(
"Failed to download file from Slack: received HTML login page. "
'Ensure your Slack app has the "files:read" OAuth scope.'
)
return resp.content
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Creating a new httpx.AsyncClient for every file download is inefficient as it doesn't leverage connection pooling. Consider using a shared session or a global client instance for these requests, similar to how other adapters in this SDK manage HTTP sessions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deferred — Slack adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient for Slack API calls). Adding one is out of scope for this PR; tracked as a follow-up. Teams version is fixed in this PR.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (4)
tests/test_whatsapp_webhook.py (1)

700-710: Strengthen the rehydration test to validate callback wiring.

This currently only checks existence (fetch_data is not None). It should also verify the async callback calls download_media with the expected mediaId.

💡 Suggested test hardening
-from unittest.mock import MagicMock
+from unittest.mock import AsyncMock, MagicMock
...
-    def test_rehydrates_fetch_data_from_media_id(self):
+    `@pytest.mark.asyncio`
+    async def test_rehydrates_fetch_data_from_media_id(self):
         from chat_sdk.types import Attachment

         adapter = _make_adapter()
+        adapter.download_media = AsyncMock(return_value=b"ok")
         attachment = Attachment(
             type="image",
             fetch_metadata={"mediaId": "media-42"},
         )
         rehydrated = adapter.rehydrate_attachment(attachment)
         assert rehydrated.fetch_data is not None
+        data = await rehydrated.fetch_data()
+        assert data == b"ok"
+        adapter.download_media.assert_awaited_once_with("media-42")
         assert rehydrated.fetch_metadata == {"mediaId": "media-42"}

As per coding guidelines, tests/**/*.py: Every test must fail when the code is wrong, and use AsyncMock where async behavior is involved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_whatsapp_webhook.py` around lines 700 - 710, Update the
test_rehydrates_fetch_data_from_media_id to assert the async callback returned
in adapter.rehydrate_attachment actually calls the adapter.download_media with
the expected mediaId: replace the loose existence check of rehydrated.fetch_data
with an AsyncMock for adapter.download_media, obtain the async callback from
rehydrated.fetch_data, await/call it, and assert adapter.download_media was
awaited/called once with "media-42"; keep references to Attachment,
adapter.rehydrate_attachment, and adapter.download_media so the test fails if
the wiring is broken.
tests/test_telegram_adapter.py (1)

381-393: Tighten rehydration assertion by executing the async callback.

Right now it only proves callback presence. Execute fetch_data() with an AsyncMock downloader to prove fileId wiring is correct.

💡 Suggested test hardening
+from unittest.mock import AsyncMock
...
-    def test_rehydrate_attachment_uses_file_id_from_fetch_metadata(self):
+    `@pytest.mark.asyncio`
+    async def test_rehydrate_attachment_uses_file_id_from_fetch_metadata(self):
         from chat_sdk.types import Attachment

         adapter = _make_adapter()
+        adapter.download_file = AsyncMock(return_value=b"ok")
         attachment = Attachment(
             type="image",
             fetch_metadata={"fileId": "AgACAgIAAxkB"},
         )
         rehydrated = adapter.rehydrate_attachment(attachment)
         assert rehydrated.fetch_data is not None
+        data = await rehydrated.fetch_data()
+        assert data == b"ok"
+        adapter.download_file.assert_awaited_once_with("AgACAgIAAxkB")
         # fetch_metadata is preserved so the attachment stays serializable/rehydratable again.
         assert rehydrated.fetch_metadata == {"fileId": "AgACAgIAAxkB"}

As per coding guidelines, tests/**/*.py: Every test must fail when the code is wrong, and use AsyncMock where async behavior is involved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_telegram_adapter.py` around lines 381 - 393, Update the
test_rehydrate_attachment_uses_file_id_from_fetch_metadata to execute the async
fetch callback: create an AsyncMock downloader that returns a sentinel result
and inject or patch it into the adapter instance used by
adapter.rehydrate_attachment; call rehydrated.fetch_data() (await it) and assert
the AsyncMock was awaited with the fileId "AgACAgIAAxkB" and that the awaited
call returns the sentinel result, while still asserting
rehydrated.fetch_metadata == {"fileId": "AgACAgIAAxkB"} to ensure
serializability.
src/chat_sdk/chat.py (1)

2172-2172: Avoid truthiness fallbacks in attachment key coercion.

On Line 2172 and Line 2176, or can incorrectly ignore valid falsy values from the camelCase field. Use explicit is not None fallback here.

💡 Proposed fix
-                    mime_type=att.get("mimeType") or att.get("mime_type"),
+                    mime_type=(
+                        att.get("mimeType")
+                        if att.get("mimeType") is not None
+                        else att.get("mime_type")
+                    ),
@@
-                    fetch_metadata=att.get("fetchMetadata") or att.get("fetch_metadata"),
+                    fetch_metadata=(
+                        att.get("fetchMetadata")
+                        if att.get("fetchMetadata") is not None
+                        else att.get("fetch_metadata")
+                    ),
As per coding guidelines: "Use `x if x is not None else default` instead of `x or default` to avoid truthiness traps when porting from TypeScript."

Also applies to: 2176-2176

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/chat_sdk/chat.py` at line 2172, The attachment key coercion uses truthy
fallback (e.g., att.get("mimeType") or att.get("mime_type")) which can drop
valid falsy values; update the logic in the attachment construction (around the
code that references att.get("mimeType") and the occurrence near Line 2176) to
use explicit None-check fallbacks like "value_if_camelcase if value_if_camelcase
is not None else value_if_snakecase" so the camelCase field is used when present
even if falsy.
src/chat_sdk/adapters/google_chat/adapter.py (1)

2685-2687: Use explicit None fallback and preserve the resolved URL.

meta.get("url") or attachment.url has truthiness pitfalls, and reconstructing with url=attachment.url drops the resolved fallback URL.

💡 Proposed fix
-        meta = attachment.fetch_metadata or {}
+        meta = attachment.fetch_metadata if attachment.fetch_metadata is not None else {}
         resource_name = meta.get("resourceName")
-        url = meta.get("url") or attachment.url
+        meta_url = meta.get("url")
+        url = meta_url if meta_url is not None else attachment.url
@@
-            url=attachment.url,
+            url=url,
As per coding guidelines: "Use `x if x is not None else default` instead of `x or default` to avoid truthiness traps when porting from TypeScript."

Also applies to: 2692-2692

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/chat_sdk/adapters/google_chat/adapter.py` around lines 2685 - 2687, The
code uses truthy fallback which can drop valid falsy URLs and later reconstructs
the URL from attachment.url; change the URL resolution to use explicit
None-checking: retrieve meta via attachment.fetch_metadata or {}, keep
resource_name from meta.get("resourceName") as-is, and set url = meta.get("url")
if meta.get("url") is not None else attachment.url so the resolved URL is
preserved wherever `url` is used (also apply the same explicit None fallback at
the other occurrence mentioned around line 2692).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/chat_sdk/adapters/google_chat/adapter.py`:
- Around line 2660-2666: The code performs an authenticated GET to a
metadata-provided url variable using adapter._get_access_token() and
adapter._get_http_session() without validating the URL, which can leak tokens
via SSRF; before calling session.get(...) validate the url by parsing it
(scheme/netloc), enforce allowed schemes (https only), verify the
hostname/netloc against an allowlist or known metadata hostnames and reject or
normalize IPs (disallow private/reserved addresses), and refuse requests with
redirects to untrusted hosts; only after these checks call
adapter._get_http_session().get(...) with the bearer token and handle rejection
by logging and skipping the fetch.

In `@src/chat_sdk/adapters/slack/adapter.py`:
- Around line 1776-1825: The rehydrate_attachment closure rebuilds fetch_data
using an unvalidated URL and then calls _fetch_slack_file which blindly sends
the bot token — this allows a tampered attachment to exfiltrate tokens. Before
using url in rehydrate_attachment/fetch_data, parse and validate it (ensure
scheme is https and host is a trusted Slack host such as files.slack.com,
slack.com, or other allowed Slack workspace domains), and if it fails validation
raise an error or return the attachment unchanged; only call _fetch_slack_file
when the URL passes validation. Update rehydrate_attachment, the inner
fetch_data closure, and any call sites that accept fetch_metadata["url"] to
perform this whitelist validation and explicitly reference functions
rehydrate_attachment, fetch_data, and _fetch_slack_file when making the change.

In `@src/chat_sdk/adapters/teams/adapter.py`:
- Around line 594-629: rehydrate_attachment currently trusts
fetch_metadata["url"]/attachment.url and passes it to _build_teams_fetch_data,
enabling SSRF; validate the URL before rebuilding the closure by parsing it
(e.g., urllib.parse), ensuring scheme is http or https, enforcing allowed host
patterns (e.g., Microsoft Graph/expected domains) or performing a DNS
resolution/IP check to reject private/reserved addresses, and if validation
fails return the original attachment unchanged; implement the validation logic
in rehydrate_attachment (or a helper called from it) and only call
_build_teams_fetch_data(url) when the URL passes these checks.

In `@src/chat_sdk/adapters/telegram/adapter.py`:
- Line 1379: The current assignment uses truthiness fallback for
attachment.fetch_metadata which can mis-handle falsy but valid values; change
the meta assignment to use an explicit None check: set meta =
attachment.fetch_metadata if attachment.fetch_metadata is not None else {} so
you reference attachment.fetch_metadata and assign to meta without using `or` to
avoid truthiness traps.

In `@src/chat_sdk/adapters/whatsapp/adapter.py`:
- Line 646: The current assignment uses truthiness fallback which can mis-handle
falsy but valid metadata; change the expression that sets meta to use an
explicit None check instead (i.e., assign meta = attachment.fetch_metadata if
attachment.fetch_metadata is not None else {}) so that attachment.fetch_metadata
is used when present even if falsy; update the code around the variable meta in
the adapter.py function/method where meta is defined to reference
attachment.fetch_metadata explicitly rather than relying on `or` fallback.

In `@src/chat_sdk/chat.py`:
- Around line 2105-2107: The list comprehension loses type-safety because
rehydrate is untyped from getattr; declare rehydrate with an explicit type
(e.g., rehydrate: Optional[Callable[[Attachment], Attachment]] =
getattr(adapter, "rehydrate_attachment", None)) and update the comprehension to
guard by isinstance and fetch_data so only Attachment objects are passed to
rehydrate (for example: msg.attachments = [att if not isinstance(att,
Attachment) or att.fetch_data is not None else rehydrate(att) for att in
msg.attachments] while keeping the existing callable check), ensuring
msg.attachments remains a list[Attachment] and non-Attachment objects cannot
leak in.

In `@src/chat_sdk/types.py`:
- Line 521: The dual-key lookup using "or" collapses valid empty values (e.g.,
{}) to the alternate key; update occurrences where fetch_metadata is assigned
from att (the expression at the fetch_metadata assignment in types.py and the
similar occurrence later) to use a None-aware conditional: check the primary
key's value and if it is not None use it, otherwise use the secondary key —
i.e., replace the "att.get('fetchMetadata') or att.get('fetch_metadata')"
pattern with a conditional that returns att.get('fetchMetadata') if it is not
None else att.get('fetch_metadata'), and do the same for the other occurrence
mentioned.

In `@tests/test_teams_adapter.py`:
- Around line 355-375: The tests currently only check that
adapter.rehydrate_attachment(...) returns a non-None callable; update both tests
to stub the fetch path with an AsyncMock, call
adapter.rehydrate_attachment(attachment), await rehydrated.fetch_data(), and
assert the awaited result (e.g., returned bytes or the URL) matches the expected
value so the test fails if rehydration wiring is wrong; reference the
adapter.rehydrate_attachment function, the Attachment instances, and the
rehydrated.fetch_data async callable when making these changes.

---

Nitpick comments:
In `@src/chat_sdk/adapters/google_chat/adapter.py`:
- Around line 2685-2687: The code uses truthy fallback which can drop valid
falsy URLs and later reconstructs the URL from attachment.url; change the URL
resolution to use explicit None-checking: retrieve meta via
attachment.fetch_metadata or {}, keep resource_name from
meta.get("resourceName") as-is, and set url = meta.get("url") if meta.get("url")
is not None else attachment.url so the resolved URL is preserved wherever `url`
is used (also apply the same explicit None fallback at the other occurrence
mentioned around line 2692).

In `@src/chat_sdk/chat.py`:
- Line 2172: The attachment key coercion uses truthy fallback (e.g.,
att.get("mimeType") or att.get("mime_type")) which can drop valid falsy values;
update the logic in the attachment construction (around the code that references
att.get("mimeType") and the occurrence near Line 2176) to use explicit
None-check fallbacks like "value_if_camelcase if value_if_camelcase is not None
else value_if_snakecase" so the camelCase field is used when present even if
falsy.

In `@tests/test_telegram_adapter.py`:
- Around line 381-393: Update the
test_rehydrate_attachment_uses_file_id_from_fetch_metadata to execute the async
fetch callback: create an AsyncMock downloader that returns a sentinel result
and inject or patch it into the adapter instance used by
adapter.rehydrate_attachment; call rehydrated.fetch_data() (await it) and assert
the AsyncMock was awaited with the fileId "AgACAgIAAxkB" and that the awaited
call returns the sentinel result, while still asserting
rehydrated.fetch_metadata == {"fileId": "AgACAgIAAxkB"} to ensure
serializability.

In `@tests/test_whatsapp_webhook.py`:
- Around line 700-710: Update the test_rehydrates_fetch_data_from_media_id to
assert the async callback returned in adapter.rehydrate_attachment actually
calls the adapter.download_media with the expected mediaId: replace the loose
existence check of rehydrated.fetch_data with an AsyncMock for
adapter.download_media, obtain the async callback from rehydrated.fetch_data,
await/call it, and assert adapter.download_media was awaited/called once with
"media-42"; keep references to Attachment, adapter.rehydrate_attachment, and
adapter.download_media so the test fails if the wiring is broken.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c7d147da-d63d-4559-962b-b99fa823f6f6

📥 Commits

Reviewing files that changed from the base of the PR and between d23b6d9 and a4e9235.

📒 Files selected for processing (14)
  • CHANGELOG.md
  • src/chat_sdk/adapters/google_chat/adapter.py
  • src/chat_sdk/adapters/slack/adapter.py
  • src/chat_sdk/adapters/teams/adapter.py
  • src/chat_sdk/adapters/telegram/adapter.py
  • src/chat_sdk/adapters/whatsapp/adapter.py
  • src/chat_sdk/chat.py
  • src/chat_sdk/types.py
  • tests/test_chat_faithful.py
  • tests/test_google_chat_adapter.py
  • tests/test_slack_webhook.py
  • tests/test_teams_adapter.py
  • tests/test_telegram_adapter.py
  • tests/test_whatsapp_webhook.py

Comment thread src/chat_sdk/adapters/google_chat/adapter.py
Comment thread src/chat_sdk/adapters/slack/adapter.py
Comment thread src/chat_sdk/adapters/teams/adapter.py
Comment thread src/chat_sdk/adapters/telegram/adapter.py Outdated
Comment thread src/chat_sdk/adapters/whatsapp/adapter.py Outdated
Comment thread src/chat_sdk/chat.py Outdated
Comment thread src/chat_sdk/types.py Outdated
Comment thread tests/test_slack_webhook.py
Comment thread tests/test_teams_adapter.py Outdated
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
…ness fallbacks

Second review pass on PR #67 (rehydrate_attachment). The previous fixup
addressed pyrefly only — this commit resolves the remaining review
feedback.

SSRF guards (3 adapters)
- Slack, Teams, Google Chat all rebuild fetch_data closures from
  serialized fetch_metadata["url"] in rehydrate_attachment. A tampered
  URL in persisted queue state could exfiltrate the workspace
  bot/OAuth token to an attacker-controlled host. Each adapter now
  validates the URL's scheme (https only) and host against a
  platform-specific allowlist before forwarding the auth header.
  Upstream TS does not validate; this is a Python-first divergence
  documented in docs/UPSTREAM_SYNC.md.
    - Slack: files.slack.com, slack.com, *.slack.com, *.slack-edge.com
    - Teams: Microsoft-owned hosts (graph.microsoft.com,
      smba.trafficmanager.net, *.sharepoint.com, *.botframework.com,
      *.office.com, attachments.office.net, …)
    - Google Chat: chat.googleapis.com, *.googleapis.com,
      *.googleusercontent.com, *.google.com

Message-instance rehydration (P1)
- Chat._rehydrate_message used to early-return on Message inputs,
  matching upstream TS's `raw instanceof Message` shortcut. That
  shortcut is safe in upstream because its state adapters return raw
  JSON dicts from dequeue. Our RedisStateAdapter / PostgresStateAdapter
  both upgrade the dequeued dict to `Message.from_json(...)` before
  returning, so the early return would skip rehydrate_attachment for
  every persistent-backend dequeue and leave fetch_data stripped. We
  now fall through and apply the rehydrate pass on Message inputs too
  (already-hydrated attachments with fetch_data are filtered out).

Truthiness fallbacks (Port Rule #1)
- telegram, whatsapp rehydrate_attachment and types.py dual-key
  fetch_metadata lookup now use explicit `is not None` instead of
  `or`, so an empty-dict fetch_metadata is preserved.

Teams connection pooling
- _build_teams_fetch_data used httpx.AsyncClient as a throwaway
  context manager per download. Refactored to use the shared
  aiohttp session (_get_http_session) that the rest of the adapter
  already goes through.

Test hardening
- test_slack_webhook.py and test_teams_adapter.py now stub the fetch
  path with AsyncMock, await rehydrated.fetch_data(), and assert the
  URL + token that were forwarded. Previously the tests only checked
  that `fetch_data is not None` — they would have passed even if
  rehydration returned a dummy closure.
- New tests per adapter verify the SSRF guard rejects untrusted hosts
  and the allowlist accepts the intended Slack/Teams/GCP hosts.
- New regression test in test_chat_faithful.py drives a Message-
  instance dequeue through the chat queue and asserts
  rehydrate_attachment still fires.

Slack adapter connection pooling (deferred)
- _fetch_slack_file still uses httpx.AsyncClient per call. The Slack
  adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient
  for Slack API calls), so adding one is a larger refactor left for
  a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@patrick-chinchill
Copy link
Copy Markdown
Collaborator Author

Review verdict: LGTM. Refreshed and reviewed latest head c871b2b against main. No issues found in attachment rehydration or provider download safety paths. Verification: targeted adapter/chat/type coverage passed (349 passed, 2 skipped) and ruff passed on touched files. Formal GitHub approval is blocked because the authenticated account owns this PR.

patrick-chinchill and others added 4 commits April 24, 2026 03:04
…ialization

Upstream Adapter.rehydrateAttachment rebuilds the fetch_data download
closure after a JSON roundtrip through the state adapter — essential for
queue/debounce concurrency strategies, where entries pass through
JSON.stringify and lose any callable fields. This PR ports the hook to
Python: Adapter gains an optional rehydrate_attachment method (default
no-op on BaseAdapter), Attachment gains a serializable fetch_metadata
dict, and Chat._rehydrate_message now threads the active adapter and
invokes the hook on any attachment whose fetch_data was stripped.
Per-adapter implementations land on Slack (url + teamId), Teams (url),
Google Chat (resourceName + url), Telegram (fileId), and WhatsApp
(mediaId); Discord, GitHub and Linear intentionally do not implement it
(upstream parity — they use public URLs or have no file attachments).

Closes 3 [concurrency: queue attachment rehydration] fidelity gaps.
Refs #52.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- chat.py:2107 — type the rehydrate-attachment callable so the list
  comprehension narrows to list[Attachment]. Unblocks CI.
- _coerce_attachments: replace `or` fallbacks with `is not None`
  (Port Rule #1 truthiness trap)
- google_chat rehydrate_attachment: preserve resolved URL when
  reconstructing, drop truthiness fallback on meta["url"]
- Harden telegram and whatsapp rehydrate tests to execute the async
  callback and verify download-method wiring (AsyncMock).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ness fallbacks

Second review pass on PR #67 (rehydrate_attachment). The previous fixup
addressed pyrefly only — this commit resolves the remaining review
feedback.

SSRF guards (3 adapters)
- Slack, Teams, Google Chat all rebuild fetch_data closures from
  serialized fetch_metadata["url"] in rehydrate_attachment. A tampered
  URL in persisted queue state could exfiltrate the workspace
  bot/OAuth token to an attacker-controlled host. Each adapter now
  validates the URL's scheme (https only) and host against a
  platform-specific allowlist before forwarding the auth header.
  Upstream TS does not validate; this is a Python-first divergence
  documented in docs/UPSTREAM_SYNC.md.
    - Slack: files.slack.com, slack.com, *.slack.com, *.slack-edge.com
    - Teams: Microsoft-owned hosts (graph.microsoft.com,
      smba.trafficmanager.net, *.sharepoint.com, *.botframework.com,
      *.office.com, attachments.office.net, …)
    - Google Chat: chat.googleapis.com, *.googleapis.com,
      *.googleusercontent.com, *.google.com

Message-instance rehydration (P1)
- Chat._rehydrate_message used to early-return on Message inputs,
  matching upstream TS's `raw instanceof Message` shortcut. That
  shortcut is safe in upstream because its state adapters return raw
  JSON dicts from dequeue. Our RedisStateAdapter / PostgresStateAdapter
  both upgrade the dequeued dict to `Message.from_json(...)` before
  returning, so the early return would skip rehydrate_attachment for
  every persistent-backend dequeue and leave fetch_data stripped. We
  now fall through and apply the rehydrate pass on Message inputs too
  (already-hydrated attachments with fetch_data are filtered out).

Truthiness fallbacks (Port Rule #1)
- telegram, whatsapp rehydrate_attachment and types.py dual-key
  fetch_metadata lookup now use explicit `is not None` instead of
  `or`, so an empty-dict fetch_metadata is preserved.

Teams connection pooling
- _build_teams_fetch_data used httpx.AsyncClient as a throwaway
  context manager per download. Refactored to use the shared
  aiohttp session (_get_http_session) that the rest of the adapter
  already goes through.

Test hardening
- test_slack_webhook.py and test_teams_adapter.py now stub the fetch
  path with AsyncMock, await rehydrated.fetch_data(), and assert the
  URL + token that were forwarded. Previously the tests only checked
  that `fetch_data is not None` — they would have passed even if
  rehydration returned a dummy closure.
- New tests per adapter verify the SSRF guard rejects untrusted hosts
  and the allowlist accepts the intended Slack/Teams/GCP hosts.
- New regression test in test_chat_faithful.py drives a Message-
  instance dequeue through the chat queue and asserts
  rehydrate_attachment still fires.

Slack adapter connection pooling (deferred)
- _fetch_slack_file still uses httpx.AsyncClient per call. The Slack
  adapter has no pooled aiohttp helper (only slack_sdk.AsyncWebClient
  for Slack API calls), so adding one is a larger refactor left for
  a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…hiness pass

- _coerce_attachments + Message.from_json{_compat} now preserve the
  data: bytes | None field through rehydrate paths (was silently dropped)
- Close the mime_type truthiness fallback in types.py:517,591 that the
  round-2 sweep missed
- Docstring note on rehydrate_attachment: must be sync

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@patrick-chinchill patrick-chinchill force-pushed the feat/rehydrate-attachment-52 branch from c871b2b to 7224871 Compare April 24, 2026 10:05
@patrick-chinchill patrick-chinchill merged commit 57de87f into main Apr 24, 2026
9 of 10 checks passed
patrick-chinchill added a commit that referenced this pull request Apr 24, 2026
Parity catch-up release for upstream chat@4.26.0. Bundles 8 PRs (#64 #65 #66 #67 #73 #74 #75 #76) + small followup cleanups. See CHANGELOG.md for details.
patrick-chinchill pushed a commit that referenced this pull request May 7, 2026
Per gemini-code-assist review on PR #83. Without the repo prefix, GitHub
auto-links the upstream PR numbers to local PRs in chat-sdk-python, which
collides with the local refs (#64, #66, #67, #74, #82) elsewhere in the
file. Use vercel/chat#NNN so the upstream refs link correctly.

https://claude.ai/code/session_01FyMxQn2BEAzmwKS1GZczKj
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant