Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@

## Unreleased

<<<<<<< HEAD
Parity catch-up with upstream `4.26.0`. No upstream version change.

### New public APIs
Expand Down Expand Up @@ -40,6 +41,17 @@ Parity catch-up with upstream `4.26.0`. No upstream version change.
`PostableObject` wrapping an async iterable with platform-specific
streaming options (`group_tasks`, `end_with`, `update_interval_ms`).
Mirrors upstream `streaming-plan.ts`. Issue #56.
- **`Adapter.rehydrate_attachment` hook + `Attachment.fetch_metadata`**:
port of upstream's `rehydrateAttachment` hook. `Chat._rehydrate_message`
invokes the hook on every attachment that lost its `fetch_data` closure
during a JSON roundtrip (queue / debounce / persistent state). The new
serializable `fetch_metadata: dict[str, str] | None` field persists
adapter-specific identifiers (Slack `url` + `teamId`, Teams `url`,
Google Chat `resourceName` + `url`, Telegram `fileId`, WhatsApp
`mediaId`). Implementations land on Slack, Teams, Google Chat, Telegram,
and WhatsApp. Each rehydrate closure validates the target URL against a
per-adapter allowlist before forwarding the auth token (SSRF defense).
Closes #52.

### Upstream parity

Expand Down
3 changes: 3 additions & 0 deletions docs/UPSTREAM_SYNC.md
Original file line number Diff line number Diff line change
Expand Up @@ -460,6 +460,8 @@ stay explicit instead of being rediscovered in code review.
| `ConcurrencyConfig.max_concurrent` | Enforced via `asyncio.Semaphore` in the `"concurrent"` strategy path; rejects non-integer or `<= 0` values, and rejects any non-`None` `max_concurrent` paired with a non-`"concurrent"` strategy | Accepted into the config type with docstring "Default: Infinity" but never read (3 writes, 0 reads) | Silent correctness bug upstream — consumers setting `max_concurrent=N` with `strategy="concurrent"` reasonably expect an N-way bound on in-flight handlers. We honor the documented contract via a semaphore and fail-fast on misconfiguration so it's never silent. `max_concurrent=None` stays compatible with every strategy (unbounded default). |
| Redis lock token format | `{token_prefix}_{ms}_{secrets.token_hex(16)}` — always 32 hex chars, CSPRNG-sourced | `ioredis_${Date.now()}_${Math.random().toString(36).substring(2, 15)}` — base36, ≤13 chars, **not** CSPRNG | Interop via `IoRedisStateAdapter(token_prefix="ioredis")` still works for lock-release (release/extend compare by full-string equality, and each runtime only releases what it issued), but the token byte-shape diverges. Intentional — CSPRNG should not be regressed to `Math.random()` for cosmetic byte-for-byte compatibility. |
| `StreamingPlan.is_supported()` / `get_fallback_text()` | Raise `RuntimeError` to fail loudly if a generic posting path (e.g. `ChannelImpl.post`, `post_postable_object`) tries to consume a `StreamingPlan` as a normal `PostableObject` | Silently return `True` / `""` — `ChannelImpl.post` would route through `postPostableObject` and post an empty-string fallback | Prevents `StreamingPlan` being silently routed through non-stream-aware posting paths where upstream would post a blank message or attempt a wrong-shape `adapter.post_object("stream", ...)` call. Internal dispatch is guarded by the `kind == "stream"` short-circuit in `post_postable_object` / `Thread.post`; this also protects third-party code that duck-types PostableObjects. |
| `rehydrate_attachment` URL allowlist (Slack / Teams / Google Chat) | Validates the downloaded URL's scheme + host against a per-adapter allowlist inside the fetch closure; raises `ValidationError` on untrusted hosts before forwarding bearer tokens | No validation — `fetchData` blindly GETs `fetchMetadata.url` and forwards the workspace/bot token | SSRF + token-exfil risk upstream: after the 4.26 `rehydrateAttachment` hook lands, a crafted `fetchMetadata` in persisted state can redirect auth'd downloads to an arbitrary host. Python port enforces `CLAUDE.md`'s "Validate external URLs before requests (SSRF)" rule. Allowlist: Slack = `{files.slack.com, slack.com, *.slack.com, *.slack-edge.com}`; Teams = `{smba.trafficmanager.net, graph.microsoft.com, attachments.office.net, *.botframework.com, *.graph.microsoft.com, *.sharepoint.com, *.officeapps.live.com, *.office.com, *.office365.com, *.onedrive.com, *.microsoft.com}`; Google Chat = `{chat.googleapis.com, googleapis.com, *.googleapis.com, *.googleusercontent.com, *.google.com}`. |
| `_rehydrate_message` with `Message` input | Falls through to the `rehydrate_attachment` pass even when the dequeued entry is already a `Message` instance | Early-returns on `raw instanceof Message` before rehydration | The Python port's Redis + Postgres `dequeue()` upgrade raw JSON to `Message.from_json(...)` before returning (upstream's dequeue returns the raw JSON.parse'd dict). Upstream's `instanceof Message` shortcut therefore only fires for in-memory state, but ours would fire for persistent backends too, leaving `fetch_data` stripped forever. The rehydrate pass still skips any attachment that already has `fetch_data`, so in-memory callers pay no cost. |

### Platform-specific gaps

Expand All @@ -478,6 +480,7 @@ stay explicit instead of being rediscovered in code review.
| `from_json()` | Accepts both camelCase and snake_case | camelCase only |
| Slack installation keys | camelCase (matches TS, with snake_case fallback) | camelCase |
| Redis/Postgres queue entries | Different wire format (message serialized via `to_json()`) | `JSON.stringify(entry)` directly |
| `Attachment.data` (bytes) | Not serialized by `to_json()` (bytes aren't JSON-safe). Preserved through in-memory rehydrate paths (`_coerce_attachments`, `Message.from_json{_compat}`) when raw dicts carry the field. A JSON roundtrip through Redis/Postgres state drops `data`; adapters should rely on `fetch_metadata` + `rehydrate_attachment` to reconstruct the download closure instead. | Same — `data` is not part of `SerializedAttachment` |

### Coverage confidence by module

Expand Down
155 changes: 117 additions & 38 deletions src/chat_sdk/adapters/google_chat/adapter.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
from collections.abc import AsyncIterable, Awaitable, Callable
from datetime import datetime, timezone
from typing import Any, NoReturn
from urllib.parse import urlparse

from chat_sdk.adapters.google_chat.cards import card_to_google_card
from chat_sdk.adapters.google_chat.format_converter import GoogleChatFormatConverter
Expand Down Expand Up @@ -2612,53 +2613,131 @@ def _create_attachment(self, att: dict[str, Any]) -> Attachment:
elif content_type.startswith("audio/"):
att_type = "audio"

# Build fetchData closure
fetch_meta: dict[str, str] = {}
if resource_name:
fetch_meta["resourceName"] = resource_name
if url:
fetch_meta["url"] = url

fetch_data: Callable[[], Awaitable[bytes]] | None = None
if resource_name or url:
adapter = self

async def _fetch_data() -> bytes:
# Prefer media.download API
if resource_name:
token = await adapter._get_access_token()
download_url = f"https://chat.googleapis.com/v1/media/{resource_name}?alt=media"
session = await adapter._get_http_session()
async with session.get(
download_url,
headers={"Authorization": f"Bearer {token}"},
) as response:
if response.status >= 400:
raise NetworkError(
"gchat",
f"Failed to download media: {response.status}",
)
return await response.read()

# Fallback to direct URL fetch (downloadUri)
if url:
token = await adapter._get_access_token()
session = await adapter._get_http_session()
async with session.get(
url,
headers={"Authorization": f"Bearer {token}"},
) as response:
if response.status >= 400:
raise NetworkError(
"gchat",
f"Failed to fetch file: {response.status}",
)
return await response.read()

raise AuthenticationError("gchat", "Cannot fetch file: no URL or resource name")

fetch_data = _fetch_data
fetch_data = self._build_gchat_fetch_data(resource_name, url)

return Attachment(
type=att_type, # type: ignore[arg-type]
url=url,
name=att.get("contentName"),
mime_type=att.get("contentType"),
fetch_data=fetch_data,
fetch_metadata=fetch_meta or None,
)

@staticmethod
def _is_trusted_gchat_download_url(url: str) -> bool:
"""Gate Google Chat attachment downloads to Google-owned hosts.

After ``rehydrate_attachment`` reconstructs the fetch closure
from serialized ``fetch_metadata``, the URL may have been
tampered with in the state store. We refuse to forward the
OAuth access token unless the host is a known Google-owned host.

This is a Python-first divergence: upstream Google Chat adapter
does not validate the URL. See ``docs/UPSTREAM_SYNC.md`` Known
Non-Parity.
"""
try:
parsed = urlparse(url)
except (ValueError, TypeError):
return False
if parsed.scheme != "https":
return False
host = (parsed.hostname or "").lower()
if not host:
return False
allowed_suffixes = (
".googleapis.com",
".googleusercontent.com",
".google.com",
)
if host.endswith(allowed_suffixes):
return True
return host in {"chat.googleapis.com", "googleapis.com"}

def _build_gchat_fetch_data(
self,
resource_name: str | None,
url: str | None,
) -> Callable[[], Awaitable[bytes]]:
"""Build a lazy ``fetch_data`` closure for a Google Chat attachment."""
adapter = self

async def _fetch_data() -> bytes:
# Prefer media.download API
if resource_name:
token = await adapter._get_access_token()
download_url = f"https://chat.googleapis.com/v1/media/{resource_name}?alt=media"
session = await adapter._get_http_session()
async with session.get(
download_url,
headers={"Authorization": f"Bearer {token}"},
) as response:
if response.status >= 400:
raise NetworkError(
"gchat",
f"Failed to download media: {response.status}",
)
return await response.read()

# Fallback to direct URL fetch (downloadUri). Validate the
# host before forwarding the OAuth token — the URL may have
# been rebuilt from serialized metadata and tampered with.
if url:
if not adapter._is_trusted_gchat_download_url(url):
raise ValidationError(
"gchat",
f"Refusing to fetch Google Chat file from untrusted URL: {url}",
)
token = await adapter._get_access_token()
session = await adapter._get_http_session()
async with session.get(
url,
headers={"Authorization": f"Bearer {token}"},
) as response:
Comment thread
coderabbitai[bot] marked this conversation as resolved.
if response.status >= 400:
raise NetworkError(
"gchat",
f"Failed to fetch file: {response.status}",
)
return await response.read()

raise AuthenticationError("gchat", "Cannot fetch file: no URL or resource name")

return _fetch_data

def rehydrate_attachment(self, attachment: Attachment) -> Attachment:
"""Reconstruct ``fetch_data`` on a deserialized Google Chat attachment.

Pulls ``resourceName`` (preferred, used with media.download API) and
``url`` (fallback) from ``fetch_metadata``. Returns the attachment
unchanged when neither identifier is present.
"""
meta = attachment.fetch_metadata if attachment.fetch_metadata is not None else {}
resource_name = meta.get("resourceName")
meta_url = meta.get("url")
url = meta_url if meta_url is not None else attachment.url
if resource_name is None and url is None:
return attachment
return Attachment(
type=attachment.type,
url=url,
name=attachment.name,
mime_type=attachment.mime_type,
size=attachment.size,
width=attachment.width,
height=attachment.height,
data=attachment.data,
fetch_data=self._build_gchat_fetch_data(resource_name, url),
fetch_metadata=attachment.fetch_metadata,
)

# =========================================================================
Expand Down
Loading
Loading