From a94deec2f5dea309ff2dec55fdd273514983d890 Mon Sep 17 00:00:00 2001 From: patrick-chinchill Date: Mon, 6 Apr 2026 17:46:32 -0700 Subject: [PATCH] docs: Add comprehensive project documentation Add five documentation files covering architecture, upstream sync procedures, security model, testing strategy, and design decisions. Update README.md with a Documentation section linking to all docs. Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 12 ++ docs/ARCHITECTURE.md | 374 ++++++++++++++++++++++++++++++++++++++++++ docs/DECISIONS.md | 125 ++++++++++++++ docs/SECURITY.md | 208 +++++++++++++++++++++++ docs/TESTING.md | 256 +++++++++++++++++++++++++++++ docs/UPSTREAM_SYNC.md | 258 +++++++++++++++++++++++++++++ 6 files changed, 1233 insertions(+) create mode 100644 docs/ARCHITECTURE.md create mode 100644 docs/DECISIONS.md create mode 100644 docs/SECURITY.md create mode 100644 docs/TESTING.md create mode 100644 docs/UPSTREAM_SYNC.md diff --git a/README.md b/README.md index 744d2bc..5627ad9 100644 --- a/README.md +++ b/README.md @@ -72,6 +72,18 @@ async def handle_mention(thread, message): | State abstraction (mem/redis/pg) | Built-in | DIY | DIY | | Drop down to native SDK | Yes | N/A | Partially | +## Documentation + +| Document | Description | +|----------|-------------| +| [Architecture](docs/ARCHITECTURE.md) | Module dependency graph, adapter protocol, card system, concurrency strategies, state backends, markdown pipeline, streaming pipeline | +| [Upstream Sync](docs/UPSTREAM_SYNC.md) | How to keep the Python port in sync with the Vercel Chat TS SDK, translation patterns, known footguns | +| [Security](docs/SECURITY.md) | Webhook verification per platform, SSRF protections, crypto details, known limitations, production audit checklist | +| [Testing](docs/TESTING.md) | Test categories, how to run tests, how to add adapter tests, coverage gaps, dispatch key validation | +| [Design Decisions](docs/DECISIONS.md) | Rationale for key architectural choices (hand-rolled parser, PascalCase builders, global singleton, zero deps) | +| [Contributing](CONTRIBUTING.md) | Dev setup, code quality, PR expectations | +| [Changelog](CHANGELOG.md) | Release history | + ## Development ```bash diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 0000000..3ca0636 --- /dev/null +++ b/docs/ARCHITECTURE.md @@ -0,0 +1,374 @@ +# Architecture + +Internal architecture guide for maintainers and contributors of `chat-sdk-python`. + +## Module Dependency Graph + +``` +Chat (orchestrator) + | + +-- ThreadImpl (message posting, streaming, state, subscriptions) + | | + | +-- ChannelImpl (channel-level posting, thread enumeration, metadata) + | | + | +-- Adapter (platform protocol -- Slack, Discord, Teams, etc.) + | | + | +-- BaseAdapter (default implementations for optional methods) + | +-- FormatConverter (markdown <-> platform format) + | +-- Cards renderer (CardElement -> platform-specific payload) + | + +-- StateAdapter (subscriptions, locking, cache, queues) + | | + | +-- MemoryStateAdapter (dev/testing) + | +-- RedisStateAdapter (production, Lua scripts for atomicity) + | +-- PostgresStateAdapter (production, row-level locking) + | + +-- Types (Adapter protocol, Message, Author, events, config dataclasses) + +-- Errors (ChatError, LockError, ChatNotImplementedError, RateLimitError) +``` + +### Import Rules + +- `types.py` imports only from `cards.py`, `errors.py`, and `logger.py`. +- `thread.py` imports from `types.py` and `errors.py`. It defines the Chat singleton access point (`set_chat_singleton`, `get_chat_singleton`) to avoid circular imports with `chat.py`. +- `channel.py` imports from `thread.py` (for singleton access and helpers) and `types.py`. +- `chat.py` imports from `thread.py`, `channel.py`, and `types.py`. It is the only module that creates `ThreadImpl` and `ChannelImpl` instances in production. +- Adapters import from `types.py`, `shared/`, `cards.py`, and their own sub-packages. They never import `chat.py` directly; they receive a `ChatInstance` reference during `initialize()`. + +### Circular Import Avoidance + +The `Thread -> Chat` dependency is broken by the singleton pattern in `thread.py`. The `Chat` class calls `set_chat_singleton(self)` during registration, and `ThreadImpl`/`ChannelImpl` call `get_chat_singleton()` for lazy adapter resolution during deserialization. This mirrors the `chat-singleton.ts` pattern from the TS SDK. + +## How Adapters Work + +### The Adapter Protocol + +Defined in `types.py` as a `Protocol` class with `@runtime_checkable`: + +```python +@runtime_checkable +class Adapter(Protocol): + @property + def name(self) -> str: ... + @property + def user_name(self) -> str: ... + @property + def bot_user_id(self) -> str | None: ... + + async def post_message(self, thread_id: str, message: AdapterPostableMessage) -> RawMessage: ... + async def edit_message(self, thread_id: str, message_id: str, message: AdapterPostableMessage) -> RawMessage: ... + async def delete_message(self, thread_id: str, message_id: str) -> None: ... + async def fetch_messages(self, thread_id: str, options: FetchOptions | None = None) -> FetchResult: ... + async def handle_webhook(self, request: Any, options: WebhookOptions | None = None) -> Any: ... + async def initialize(self, chat: ChatInstance) -> None: ... + # ... plus ~10 more required methods +``` + +Required methods cover the complete lifecycle: webhook handling, message CRUD, reactions, typing indicators, thread ID encoding/decoding, and format rendering. + +### BaseAdapter + +`BaseAdapter` in `types.py` provides default implementations for **optional** methods that raise `ChatNotImplementedError`: + +- `stream()` -- native streaming (only Slack implements this currently) +- `open_dm()` -- DM channel creation +- `post_ephemeral()` -- ephemeral messages +- `schedule_message()` -- future delivery +- `open_modal()` -- modal dialogs +- `fetch_channel_info()` -- channel metadata +- `list_threads()` -- thread enumeration + +Concrete adapters inherit from `BaseAdapter` and override what they support. + +### Format Converters + +Each adapter has a `FormatConverter` that extends `BaseFormatConverter`: + +``` +Markdown string + | + v parse_markdown() + mdast AST (dict) <-- canonical internal representation + | + v from_ast() +Platform format string (mrkdwn, HTML, Adaptive Card text, etc.) +``` + +The `BaseFormatConverter` provides: +- `from_markdown(md) -> str` -- parse then render +- `to_markdown(platform_text) -> str` -- parse then stringify +- `render_postable(message)` -- handles the full `AdapterPostableMessage` union (str, PostableRaw, PostableMarkdown, PostableAst, PostableCard, CardElement) +- Template helpers: `_render_list()`, `_default_node_to_text()` + +Each adapter subclass implements `from_ast(ast)` and `to_ast(platform_text)` for its platform's native format: + +| Adapter | Format | Converter | +|---------|--------|-----------| +| Slack | mrkdwn (Slack markdown) | `SlackFormatConverter` | +| Discord | Discord markdown | `DiscordFormatConverter` | +| Teams | HTML subset | `TeamsFormatConverter` | +| Telegram | HTML (MarkdownV2 considered too fragile) | `TelegramFormatConverter` | +| WhatsApp | WhatsApp formatting (*bold*, _italic_) | `WhatsAppFormatConverter` | +| Google Chat | Google Chat markup | `GoogleChatFormatConverter` | +| GitHub | Standard GFM | `GitHubFormatConverter` | +| Linear | Standard markdown | `LinearFormatConverter` | + +### Webhook Flow + +``` +HTTP POST from platform + | + v +chat.webhooks["slack"](request) + | + v +Chat._handle_webhook(adapter_name, request, options) + | + v +adapter.handle_webhook(request, options) + | (adapter verifies signature, parses event, normalizes to typed event) + v +chat.process_message(adapter, thread_id, message) + or chat.process_action(event) + or chat.process_reaction(event) + or chat.process_slash_command(event) + or chat.process_modal_submit(event) + | + v +asyncio.create_task(handler coroutine) +``` + +## How the Card System Works + +Cards provide cross-platform rich messaging. The card model is defined as TypedDicts in `cards.py`: + +``` +CardElement (root) + +-- title, subtitle, image_url + +-- children: list[CardChild] + | + +-- TextElement -> Slack: section block, Teams: TextBlock + +-- ImageElement -> Slack: image block, Teams: Image + +-- DividerElement -> Slack: divider block, Teams: --- + +-- ActionsElement -> Slack: actions block, Teams: ActionSet + | +-- ButtonElement -> Slack: button, Teams: Action.Submit + | +-- LinkButtonElement -> Slack: button with url, Teams: Action.OpenUrl + +-- FieldsElement -> Slack: section with fields, Teams: FactSet + +-- TableElement -> Slack: ASCII table in code block, Teams: Table + +-- SectionElement -> Groups children + +-- LinkElement -> Inline hyperlink +``` + +### PascalCase Builders + +Builder functions use PascalCase (`Card()`, `Button()`, `Text()`) to match the TS SDK. snake_case aliases are also provided (`card()`, `button()`, `text_element()`). + +### Platform Rendering + +Each adapter has a `cards.py` module with a renderer: + +- **Slack**: `card_to_block_kit()` -- produces Block Kit JSON +- **Discord**: `card_to_discord_embed()` -- produces Discord embed dicts +- **Teams**: `card_to_adaptive_card()` -- produces Adaptive Card JSON +- **Telegram**: `card_to_telegram_inline_keyboard()` -- produces inline keyboard markup +- **WhatsApp**: `card_to_whatsapp_interactive()` -- produces WhatsApp interactive message +- **Google Chat**: `card_to_gchat_card()` -- produces Google Chat card v2 +- **GitHub**: Falls back to markdown text +- **Linear**: Falls back to markdown text + +Platforms that cannot render cards natively get `card_to_fallback_text()`, which produces a plain-text representation with `**title**`, field labels, ASCII tables, and `[alt](url)` for images. + +## How Concurrency Works + +The `Chat` class manages four concurrency strategies, configured via `ChatConfig.concurrency`: + +### Drop (default) + +``` +Message arrives -> acquire_lock(thread_id, 30s TTL) + Lock acquired? + Yes -> dispatch to handlers -> release lock + No -> raise LockError (message dropped) +``` + +The simplest strategy. If another handler is already processing the same thread, the new message is dropped. Suitable for bots where only the latest context matters. + +### Queue + +``` +Message arrives -> acquire_lock + Lock acquired? + Yes -> dispatch to handlers -> drain_queue() -> release lock + No -> enqueue(message, max_size) -> return + (overflow behavior: drop-oldest or drop-newest) + +drain_queue(): + while queue not empty: + dequeue all entries + skip expired entries + dispatch latest entry (skip intermediate messages) + extend lock +``` + +Messages that arrive while the lock is held are queued. After the current handler completes, the queue is drained. Only the latest queued message is actually processed; intermediate messages are passed as `context.skipped`. + +### Debounce + +``` +Message arrives -> acquire_lock + Lock acquired? + Yes -> enqueue message -> debounce_loop() + No -> enqueue message (max_size=1, replaces previous) + +debounce_loop() (max 20 iterations): + sleep(debounce_ms) + extend lock + dequeue entry + if queue empty -> break (no new messages arrived, process this one) + if queue has more -> entry superseded, loop again + dispatch final message +``` + +Waits for the user to stop typing. Each new message resets the debounce timer. Only the final message after a quiet period is processed. + +### Concurrent + +``` +Message arrives -> dispatch to handlers (no lock, no queue) +``` + +No locking at all. Every message is processed immediately. Use when handlers are idempotent and fast. + +### Lock Scope + +Locks can be scoped to `thread` (default) or `channel`. The scope is determined by: +1. `ChatConfig.lock_scope` (static or callable) +2. `adapter.lock_scope` property (adapter default) + +Channel-scoped locking serializes all messages in a channel, which is useful for bots that maintain channel-level state. + +## How State Backends Work + +### StateAdapter Protocol + +The `StateAdapter` protocol in `types.py` defines 18 async methods across 6 categories: + +| Category | Methods | +|----------|---------| +| Subscriptions | `subscribe`, `unsubscribe`, `is_subscribed` | +| Locking | `acquire_lock`, `release_lock`, `extend_lock`, `force_release_lock` | +| Key/Value Cache | `get`, `set`, `set_if_not_exists`, `delete` | +| Lists | `append_to_list`, `get_list` | +| Queues | `enqueue`, `dequeue`, `queue_depth` | +| Lifecycle | `connect`, `disconnect` | + +### Lock Semantics + +- `acquire_lock(thread_id, ttl_ms)` returns a `Lock` object with a unique token (CSPRNG-generated), or `None` if already held. +- `release_lock(lock)` releases only if the token matches (prevents releasing someone else's lock). +- `extend_lock(lock, ttl_ms)` extends the TTL, returning `False` if the lock was lost. +- `force_release_lock(thread_id)` unconditionally releases (admin escape hatch). + +Lock tokens are generated using `secrets.token_hex(16)` for cryptographic randomness, prefixed with the backend name for debuggability (`mem_`, `redis_`, `pg_`). + +### Backend Implementations + +| Backend | Lock mechanism | Atomicity | Production-ready | +|---------|---------------|-----------|-----------------| +| Memory | In-process dict with expiry | Single-process only | No (dev/test) | +| Redis | `SET NX PX` + Lua scripts | Atomic via Lua | Yes | +| PostgreSQL | `INSERT ... ON CONFLICT` + row locks | Atomic via transactions | Yes | + +Redis uses Lua scripts for `release_lock` and `extend_lock` to ensure token-check-and-delete is atomic. PostgreSQL uses `SELECT FOR UPDATE SKIP LOCKED` for non-blocking lock acquisition. + +## The Markdown Pipeline + +``` +Input markdown string + | + v +parse_markdown(text) -> Root (mdast-compatible dict AST) + | + v +walk_ast(root, visitor) -> Root (transform nodes) + | + v +stringify_markdown(ast) -> str (back to markdown) + or +from_ast(ast) -> str (to platform format) +``` + +### Parser Details (`shared/markdown_parser.py`) + +The parser is hand-rolled (not based on any library) and produces [mdast](https://github.com/syntax-tree/mdast)-compatible dict nodes. + +**Block-level parsing** (line-by-line): +- Fenced code blocks (``` and ~~~) +- Thematic breaks (`---`, `***`, `___`) +- Headings (`# ` through `###### `) +- GFM tables (pipe-delimited with alignment row) +- Blockquotes (`> `) +- Ordered lists (`1. `, `2) `) +- Unordered lists (`- `, `* `, `+ `) +- Paragraphs (everything else) + +**Inline parsing** (regex-based, priority-ordered): +- Images: `![alt](url "title")` +- Links: `[text](url "title")` +- Inline code: `` `code` `` +- Bold: `**text**`, `__text__` +- Strikethrough: `~~text~~` +- Emphasis: `*text*`, `_text_` + +The inline parser uses an iterative approach for suffix text (to avoid stack overflow on long strings) while recursing into match content (bounded by match length). + +### AST Utilities + +- `walk_ast(node, visitor)` -- deep-copy + transform visitor pattern +- `ast_to_plain_text(node)` -- strip all formatting +- `table_to_ascii(node)` -- render mdast table as padded ASCII +- `stringify_markdown(ast)` -- AST back to markdown string + +## The Streaming Pipeline + +``` +LLM stream (AsyncIterable) + | + v +from_full_stream() -- normalize text-delta events, inject step separators + | + v +StreamingMarkdownRenderer -- buffer incomplete constructs + .push(chunk) -- append text + .render() -- get safe-to-display markdown (for edit_message) + .get_committable_text() -- get safe-for-append text (for native streaming) + .finish() -- flush everything + | + v +adapter.stream() or fallback (post + edit loop) +``` + +### StreamingMarkdownRenderer + +The renderer solves the problem of rendering incomplete markdown during LLM streaming. Key behaviors: + +1. **Table buffering**: Lines matching `|...|` are held back until a separator row (`|---|---|`) confirms them as a table. Without this, pipe characters in regular text would be misinterpreted. + +2. **Inline marker repair** (`_remend()`): Closes unclosed `**`, `*`, `~~`, `` ` ``, and `[` constructs by appending matching closers. This prevents broken formatting during mid-token streaming. + +3. **Code fence tracking**: Uses incremental O(1) fence toggle counting. When inside a code fence, table buffering and inline repair are skipped. + +4. **Table wrapping** (`_wrap_tables_for_append()`): For append-only streaming (Slack's native streaming API), confirmed tables are wrapped in code fences so pipe characters render as literal text. + +5. **Clean prefix detection** (`_find_clean_prefix()`): Finds the longest prefix where all inline markers are balanced, used by `get_committable_text()`. + +### Fallback Streaming (post + edit) + +When an adapter does not support native streaming, `ThreadImpl._fallback_stream()` uses a post-then-edit pattern: + +1. Post an initial placeholder message (configurable, default `"..."`) +2. Start a background `_edit_loop()` that updates the message at intervals (default 500ms) +3. Accumulate text from the stream +4. After stream ends, stop the edit loop and send a final edit with the complete text + +The edit loop uses `asyncio.create_task()` for the background timer and checks a `stopped` flag to terminate cleanly. diff --git a/docs/DECISIONS.md b/docs/DECISIONS.md new file mode 100644 index 0000000..ad7265b --- /dev/null +++ b/docs/DECISIONS.md @@ -0,0 +1,125 @@ +# Design Decisions + +Key design decisions in `chat-sdk-python` and their rationale. Understanding these prevents well-intentioned refactors from breaking the architecture. + +## Why Hand-Rolled Markdown Parser + +**Decision**: Implement a custom markdown parser (`shared/markdown_parser.py`) rather than using an existing Python library. + +**Rationale**: + +1. **mdast compatibility**: The SDK requires an AST that matches the [mdast specification](https://github.com/syntax-tree/mdast) -- the same AST format used by the TS SDK's `unified`/`remark` ecosystem. No Python markdown library produces mdast-compatible output. `markdown-it-py` produces its own token format. `mistune` produces a different AST. `commonmark` produces a different tree structure. All would require a translation layer that is more code than the parser itself. + +2. **Round-trip fidelity**: The parser must support `parse -> walk/transform -> stringify` without losing information. Existing libraries optimize for HTML output, not round-trip markdown editing. + +3. **Subset is sufficient**: The SDK only needs: paragraphs, headings, code blocks, blockquotes, lists, tables (GFM), thematic breaks, bold, italic, strikethrough, inline code, links, and images. This is roughly 550 lines of well-tested code. A full CommonMark parser (with all edge cases) is ~10x more code and handles constructs the SDK does not need. + +4. **Zero dependencies**: The parser uses only `re` from the standard library. Adding a markdown library would violate the zero-runtime-dependency constraint (see below). + +5. **Streaming compatibility**: The `StreamingMarkdownRenderer` needs to understand the same constructs the parser recognizes (code fences, tables, inline markers). Using a separate library for parsing and hand-rolling the streaming logic would create divergence. + +## Why Not Microsoft Bot Framework SDK for Teams Auth + +**Decision**: Implement Teams JWT verification directly rather than using the `botbuilder-python` SDK. + +**Rationale**: + +1. **No async support**: The Microsoft `botbuilder-python` SDK is synchronous. The chat-sdk is fully async. Wrapping sync calls in `run_in_executor` adds complexity and defeats the purpose of async I/O. + +2. **Heavy dependency**: `botbuilder-python` pulls in dozens of transitive dependencies and has a large API surface. We only need JWT verification and HTTP posting, which is ~200 lines of code. + +3. **Maintenance status**: The Python Bot Framework SDK has historically lagged behind the .NET and Node.js versions. Features and fixes arrive late. + +4. **Consistency**: All other adapters implement their own auth directly. Having Teams use a separate SDK would create an inconsistent pattern. + +The tradeoff is that we must maintain the JWKS fetching and JWT validation code ourselves, including handling key rotation (see [SECURITY.md](SECURITY.md#teams-jwks-key-rotation-window)). + +## Why PascalCase for Card/Button/Modal Builders + +**Decision**: Use PascalCase function names (`Card()`, `Button()`, `Text()`, `Modal()`, `Select()`) as the primary API, with snake_case aliases. + +**Rationale**: + +1. **TS SDK compatibility**: The TS SDK uses PascalCase for these builders (`Card()`, `Button()`, `Text()`). Users porting code from TS to Python should find the API familiar. + +2. **Visual distinction**: PascalCase makes builder calls visually distinct from regular function calls. `Card(title="Hi", children=[Text("Hello")])` reads as a declarative element tree, not a sequence of operations. + +3. **SQLAlchemy precedent**: Python's own ecosystem has precedent for PascalCase functions that construct objects. SQLAlchemy's `Column()`, `String()`, `Integer()`, `ForeignKey()` follow this pattern. + +4. **snake_case aliases provided**: Every PascalCase builder has a snake_case alias (`card()`, `button()`, `text_element()`). PEP 8 purists can use those. Both are exported in `__all__`. + +5. **Not classes**: The builders are functions that return TypedDicts, not class constructors. Using PascalCase for class constructors is standard Python. Using it for factory functions is a deliberate style choice for this domain. + +## Why Global Singleton on Chat + +**Decision**: `Chat` registers itself as a global singleton via `set_chat_singleton(self)`. Thread and Channel deserialization uses `get_chat_singleton()` to resolve adapters. + +**Rationale**: + +1. **Deserialization without context**: When a `ThreadImpl` is deserialized from JSON (e.g., from Redis state, from a queued entry, from modal context), it needs to resolve its adapter by name. Without a singleton, every deserialization call would need an explicit `chat` parameter threaded through the call stack. + +2. **Matches TS SDK**: The TS SDK uses `chat-singleton.ts` with the same pattern. Keeping the pattern identical reduces the cognitive load of syncing changes. + +3. **Single Chat instance is the normal case**: In practice, applications have exactly one `Chat` instance. The singleton is registered during initialization and remains for the lifetime of the process. + +4. **Testing escape hatch**: `clear_chat_singleton()` and `has_chat_singleton()` are provided for test isolation. Each test can register its own singleton. + +## Why Zero Runtime Dependencies in Core + +**Decision**: The `chat-sdk` core package (`pip install chat-sdk`) has zero runtime dependencies. All adapter-specific libraries are optional extras. + +**Rationale**: + +1. **Install-time safety**: Users should be able to `pip install chat-sdk` without worrying about dependency conflicts. Zero deps means zero conflicts. + +2. **Deployment size**: Serverless environments (AWS Lambda, Google Cloud Functions) have package size limits. A core package with no deps keeps the base small. + +3. **Adapter isolation**: Installing `chat-sdk[slack]` adds `slack-sdk`. Installing `chat-sdk[discord]` adds `pynacl` and `aiohttp`. Users only pay for what they use. + +4. **Lazy imports**: Adapter modules import their optional dependencies inside methods, not at the top of the file. This means `import chat_sdk` never fails, even if no adapter extras are installed. See [UPSTREAM_SYNC.md](UPSTREAM_SYNC.md#9-top-level-imports-of-optional-deps-must-be-lazy). + +The optional extras are defined in `pyproject.toml`: + +```toml +[project.optional-dependencies] +slack = ["slack-sdk>=3.27.0"] +discord = ["pynacl>=1.5", "aiohttp>=3.9"] +teams = ["aiohttp>=3.9"] +telegram = ["aiohttp>=3.9"] +whatsapp = ["aiohttp>=3.9"] +google-chat = ["aiohttp>=3.9", "google-auth>=2.0", "pyjwt>=2.8"] +github = ["pyjwt[crypto]>=2.8"] +linear = ["aiohttp>=3.9"] +redis = ["redis>=5.0"] +postgres = ["asyncpg>=0.29"] +crypto = ["cryptography>=42.0"] +all = [...] # everything +``` + +## Why `from __future__ import annotations` Everywhere + +**Decision**: Every Python file in the SDK starts with `from __future__ import annotations`. + +**Rationale**: + +1. **PEP 604 syntax on Python 3.10**: The SDK uses `X | Y` union syntax (e.g., `str | None`) and `list[str]` lowercase generics. Without the future import, these require Python 3.10+. With it, they work on Python 3.7+. Since the SDK targets Python 3.11+, this is belt-and-suspenders. + +2. **Forward reference resolution**: Annotations are stored as strings and resolved lazily. This eliminates circular reference issues in type hints (e.g., `Thread` referencing `Channel` and vice versa). + +3. **Consistency**: Rather than deciding per-file whether to use the import, it is applied everywhere as a project convention. This prevents accidental `NameError` when a contributor adds a forward reference. + +## Why BaseAdapter with ChatNotImplementedError Defaults + +**Decision**: `BaseAdapter` provides default implementations for optional methods that raise `ChatNotImplementedError` rather than using `abc.abstractmethod`. + +**Rationale**: + +1. **Progressive implementation**: Adapter authors can implement the minimum required methods first and add optional features later. `abc.abstractmethod` would force them to implement everything upfront. + +2. **Runtime discovery**: Code that checks `if hasattr(adapter, "stream") and adapter.stream` can discover at runtime whether an adapter supports a feature. With abstract methods, the adapter would have a method that raises `NotImplementedError`, and the check would need to be `try/except`. + +3. **Clear error messages**: `ChatNotImplementedError("slack", "scheduling")` produces `"slack does not support scheduling"`, which is more informative than `NotImplementedError`. + +4. **Matches TS SDK**: The TS SDK uses optional interface methods (possible in TypeScript but not in Python's Protocol). BaseAdapter with defaults is the Python equivalent. + +The required methods from the `Adapter` protocol (18 methods) must still be implemented. `BaseAdapter` only provides defaults for the ~10 optional methods. diff --git a/docs/SECURITY.md b/docs/SECURITY.md new file mode 100644 index 0000000..179d2b5 --- /dev/null +++ b/docs/SECURITY.md @@ -0,0 +1,208 @@ +# Security + +Security decisions, implementation details, and known limitations for `chat-sdk-python`. + +## Webhook Verification Per Platform + +Every adapter verifies incoming webhook requests before processing. The verification method depends on the platform. + +### Slack + +- **Method**: HMAC-SHA256 signature verification +- **Header**: `X-Slack-Signature` (v0 format: `v0=`) +- **Timestamp**: `X-Slack-Request-Timestamp` (rejected if > 5 minutes old to prevent replay attacks) +- **Signing key**: `signing_secret` from the Slack app configuration +- **Base string**: `v0:{timestamp}:{body}` +- **Comparison**: Timing-safe via `hmac.compare_digest()` + +```python +# Simplified verification flow in Slack adapter +base_string = f"v0:{timestamp}:{body}" +expected = "v0=" + hmac.new( + signing_secret.encode(), base_string.encode(), hashlib.sha256 +).hexdigest() +if not hmac.compare_digest(expected, signature_header): + raise AuthenticationError("slack", "Invalid signature") +``` + +### Discord + +- **Method**: Ed25519 signature verification (using PyNaCl) +- **Headers**: `X-Signature-Ed25519`, `X-Signature-Timestamp` +- **Public key**: `public_key` from the Discord application +- **Verified message**: `{timestamp}{body}` +- **Library**: `nacl.signing.VerifyKey` + +### Teams + +- **Method**: JWT token validation (RS256) +- **Header**: `Authorization: Bearer ` +- **Issuer**: `https://api.botframework.com` or `https://sts.windows.net/{tenant-id}/` +- **Audience**: The app's `app_id` +- **JWKS endpoint**: `https://login.botframework.com/v1/.well-known/openidconfiguration` +- **Key rotation**: JWKS keys are cached with a configurable TTL. See "Known Limitations" below. + +The Teams adapter implements its own JWT validation rather than using the Microsoft Bot Framework SDK because there is no maintained async Python equivalent. The validation includes issuer checking, audience verification, expiry checking, and signature verification against the JWKS-provided RSA public keys. + +### Telegram + +- **Method**: Secret token comparison +- **Header**: `X-Telegram-Bot-Api-Secret-Token` +- **Comparison**: Timing-safe via `hmac.compare_digest()` +- **Caveat**: If `secret_token` is not configured, webhook verification is silently skipped. See "Known Limitations" below. + +### WhatsApp (Meta Cloud API) + +- **Method**: HMAC-SHA256 signature verification +- **Header**: `X-Hub-Signature-256` (format: `sha256=`) +- **Signing key**: `app_secret` from the Meta app +- **Comparison**: Timing-safe via `hmac.compare_digest()` + +### Google Chat + +- **Method**: Google-issued JWT token verification +- **Header**: `Authorization: Bearer ` +- **Verification**: `google.oauth2.id_token.verify_token()` from the `google-auth` library +- **Audience**: The Google Cloud project number +- **Fallback**: If `use_application_default_credentials` is enabled, the adapter trusts Google Cloud's internal authentication. + +### GitHub + +- **Method**: HMAC-SHA256 signature verification +- **Header**: `X-Hub-Signature-256` (format: `sha256=`) +- **Signing key**: `webhook_secret` from the GitHub app +- **Comparison**: Timing-safe via `hmac.compare_digest()` + +### Linear + +- **Method**: HMAC-SHA256 signature verification +- **Header**: `Linear-Signature` +- **Signing key**: `webhook_secret` from the Linear app +- **Comparison**: Timing-safe via `hmac.compare_digest()` + +## SSRF Protections + +### Teams `service_url` Validation + +The Teams adapter receives a `serviceUrl` in every activity payload. This URL is used to send replies back to Teams. The adapter validates that the URL matches the expected Microsoft domains: + +- `https://*.botframework.com/` +- `https://smba.trafficmanager.net/` + +Requests to arbitrary URLs (which could target internal services) are rejected. + +### WhatsApp Media URL Validation + +When downloading media attachments from WhatsApp messages, the adapter validates that media URLs point to Meta's CDN domains before fetching them: + +- `https://*.whatsapp.net/` +- `https://scontent*.xx.fbcdn.net/` + +### Slack `response_url` Validation + +Slack's `response_url` (used for responding to slash commands and interactive messages) is validated to ensure it points to Slack's domains: + +- `https://hooks.slack.com/` + +## Crypto: AES-256-GCM for Slack Token Encryption + +The Slack adapter supports multi-workspace OAuth installations. Bot tokens for each workspace can be encrypted at rest using AES-256-GCM. + +### Implementation (`adapters/slack/crypto.py`) + +```python +def encrypt_token(plaintext: str, key: bytes) -> EncryptedTokenData: + iv = os.urandom(12) # 96-bit random IV + aesgcm = AESGCM(key) + ct_with_tag = aesgcm.encrypt(iv, plaintext.encode(), None) + ciphertext = ct_with_tag[:-16] # everything except last 16 bytes + tag = ct_with_tag[-16:] # 128-bit auth tag + return EncryptedTokenData( + iv=base64.b64encode(iv), + data=base64.b64encode(ciphertext), + tag=base64.b64encode(tag), + ) +``` + +- **Algorithm**: AES-256-GCM (authenticated encryption) +- **IV**: 12 bytes from `os.urandom()` (CSPRNG) +- **Auth tag**: 16 bytes (128-bit) +- **Key format**: 32-byte key, accepted as 64-char hex or 44-char base64 +- **Library**: `cryptography` (via the `crypto` extra) +- **Lazy import**: `cryptography` is imported inside the encrypt/decrypt functions, not at module level + +### Key Management + +The encryption key is provided via the `encryption_key` config option on the Slack adapter. It is the deployer's responsibility to: + +1. Generate a strong 256-bit key (`python -c "import secrets; print(secrets.token_hex(32))"`) +2. Store it securely (environment variable, secrets manager) +3. Rotate it by re-encrypting stored tokens with the new key + +## Timing-Safe HMAC Comparisons + +All webhook signature verifications use `hmac.compare_digest()` for constant-time comparison. This prevents timing side-channel attacks where an attacker could determine the correct signature byte-by-byte by measuring response times. + +This applies to: +- Slack signing secret verification +- GitHub webhook signature verification +- WhatsApp webhook signature verification +- Linear webhook signature verification +- Telegram secret token verification +- Lock token comparison in state adapters + +## Lock Token Generation + +Lock tokens serve as proof of ownership. A holder must present the correct token to release or extend a lock. + +- **Generator**: `secrets.token_hex(16)` -- 128 bits of cryptographic randomness +- **Format**: `{backend}_{timestamp_ms}_{hex}` (e.g., `mem_1700000000000_a1b2c3d4...`) +- **Why CSPRNG**: If tokens were predictable, a malicious actor (or a bug in a concurrent process) could forge a token and release someone else's lock, causing data races. + +## Known Limitations + +### Telegram: Silent Skip When No `secret_token` + +If the `TelegramAdapterConfig` does not include a `secret_token`, the Telegram adapter processes webhooks without any authentication. This is because Telegram does not require webhook verification -- it is optional. + +**Risk**: Anyone who discovers the webhook URL can send fake updates. + +**Mitigation**: Always configure `secret_token` in production. The adapter should log a warning when operating without verification, but this is not currently implemented. + +### Teams: JWKS Key Rotation Window + +The Teams adapter caches JWKS (JSON Web Key Set) public keys to avoid fetching them on every request. When Microsoft rotates its signing keys, there is a window where: + +1. The old key is still in the cache +2. Microsoft starts signing with the new key +3. Verification fails until the cache expires and new keys are fetched + +**Risk**: Legitimate webhooks may be rejected during key rotation (~minutes). + +**Mitigation**: The cache TTL is set to 24 hours by default. If verification fails with a cached key, the adapter should (but does not currently) attempt a cache refresh before rejecting the request. This is a known improvement area. + +### No Input Sanitization on Card Content + +Card element content (titles, text, button labels) is passed through to platform APIs without HTML escaping. Each platform's API is responsible for sanitizing output. This is intentional -- double-escaping would produce visible escape sequences. + +**Risk**: If a platform API has an XSS vulnerability, malicious card content could exploit it. + +**Mitigation**: Trust platform APIs to handle their own output encoding. Do not pass user-controlled input directly into card content without application-level sanitization. + +## What to Audit Before Production Deployment + +1. **Webhook secrets are configured** for every adapter in use. Never deploy with empty or default secrets. + +2. **Encryption key is set** for Slack multi-workspace OAuth if bot tokens are persisted (Redis/Postgres state backends). + +3. **Telegram `secret_token` is set**. Without it, anyone can submit fake updates to your webhook endpoint. + +4. **State backend is production-grade**. `MemoryStateAdapter` emits a warning in production environments but does not prevent usage. Use Redis or PostgreSQL. + +5. **HTTPS is enforced** on all webhook endpoints. Webhook signatures are useless if the request body can be observed and replayed over HTTP. + +6. **Rate limiting** is configured at the HTTP layer (reverse proxy / load balancer). The SDK raises `RateLimitError` / `AdapterRateLimitError` when platform rate limits are hit, but does not implement client-side rate limiting. + +7. **Dependency audit**: Run `pip audit` or equivalent to check for known vulnerabilities in dependencies (`cryptography`, `slack-sdk`, `pynacl`, `aiohttp`, etc.). + +8. **Log level**: Set to `info` or `warn` in production. `debug` level logs message content and adapter payloads, which may contain sensitive data. diff --git a/docs/TESTING.md b/docs/TESTING.md new file mode 100644 index 0000000..75cca0d --- /dev/null +++ b/docs/TESTING.md @@ -0,0 +1,256 @@ +# Testing + +Test strategy, conventions, and instructions for `chat-sdk-python`. + +## Test Categories + +### Unit Tests (`tests/test_*.py`) + +Test individual modules in isolation with mocked dependencies. Examples: + +- `test_cards.py` -- Card builder functions and fallback text generation +- `test_markdown_parser.py` -- Markdown parsing and stringification +- `test_streaming_markdown.py` -- StreamingMarkdownRenderer buffering behavior +- `test_memory_state.py` -- MemoryStateAdapter operations +- `test_types.py` -- Message serialization/deserialization +- `test_chat.py` -- Chat orchestrator handler registration and dispatch +- `test_thread.py` -- ThreadImpl posting, state, serialization +- `test_channel.py` -- ChannelImpl operations + +### Adapter Unit Tests (`tests/test__*.py`) + +Test platform-specific adapter logic: + +- `test_slack_adapter.py`, `test_slack_webhook.py`, `test_slack_cards.py`, etc. +- `test_discord_adapter.py`, `test_discord_cards.py`, `test_discord_format.py`, etc. +- `test_teams_adapter.py`, `test_teams_cards.py`, `test_teams_format.py`, etc. +- `test_telegram_adapter.py`, `test_telegram_webhook.py`, `test_telegram_cards.py`, etc. +- `test_whatsapp_adapter.py`, `test_whatsapp_webhook.py`, `test_whatsapp_cards.py`, etc. +- `test_github_adapter.py`, `test_github_webhook.py`, `test_github_cards.py`, etc. +- `test_google_chat_adapter.py`, `test_gchat_webhook.py`, `test_gchat_cards.py`, etc. +- `test_linear_adapter.py`, `test_linear_cards.py`, `test_linear_format.py`, etc. + +### Integration Tests (`tests/integration/`) + +Test multi-component flows using the mock adapter and memory state backend: + +- `test_mention_flow.py` -- Full mention -> handler -> post response flow +- `test_dm_flow.py` -- Direct message routing +- `test_action_flow.py` -- Button click -> action handler +- `test_reaction_flow.py` -- Reaction event handling +- `test_slash_command_flow.py` -- Slash command routing +- `test_concurrency.py` -- Drop/queue/debounce strategies +- `test_dedup.py` -- Message deduplication +- `test_channel_ops.py` -- Channel-level operations +- `test_subscription_flow.py` -- Thread subscription and subscribed message routing +- `test_assistant_threads.py` -- Slack Assistant thread events + +### Replay Tests (`tests/integration/test_replay_*.py`) + +Replay recorded webhook payloads through the full adapter pipeline to verify end-to-end correctness. These tests simulate the exact JSON payloads that platforms send. + +- `test_replay_mention.py` -- Slack @mention webhook replay +- `test_replay_dm.py` -- DM webhook replay +- `test_replay_actions_reactions.py` -- Action and reaction webhooks +- `test_replay_modal.py`, `test_replay_modals_extended.py` -- Modal submit/close +- `test_replay_slash_command.py` -- Slash command webhooks +- `test_replay_streaming.py` -- Streaming response flow +- `test_replay_events.py` -- Miscellaneous event types +- `test_replay_platforms.py` -- Cross-platform webhook replay +- `test_replay_multi_workspace.py` -- Multi-workspace Slack OAuth flow +- `test_replay_fetch_messages.py` -- Message history fetch + +### Dispatch Key Validation (`tests/test_dispatch_key_validation.py`) + +A dedicated test suite that verifies every adapter dispatches events with snake_case keys. See the detailed section below. + +## How to Run Tests + +```bash +# All tests +uv run pytest tests/ + +# Stop on first failure +uv run pytest tests/ -x + +# With coverage +uv run pytest tests/ --cov=src/chat_sdk --cov-report=term-missing + +# Specific test file +uv run pytest tests/test_cards.py -v + +# Specific test class or method +uv run pytest tests/test_dispatch_key_validation.py::TestSlackDispatchKeys -v + +# Integration tests only +uv run pytest tests/integration/ -v + +# Tests matching a keyword +uv run pytest tests/ -k "streaming" -v +``` + +### Configuration + +`pyproject.toml` configures pytest: + +```toml +[tool.pytest.ini_options] +asyncio_mode = "auto" +``` + +The `asyncio_mode = "auto"` setting means all `async def test_*` functions are automatically treated as async tests without needing `@pytest.mark.asyncio`. + +## How to Add Tests for a New Adapter + +1. **Create test files** following the naming convention: + ``` + tests/test__adapter.py # Core adapter tests + tests/test__webhook.py # Webhook verification tests + tests/test__cards.py # Card rendering tests + tests/test__format.py # Format converter tests + ``` + +2. **Use the shared fixtures** from `tests/conftest.py`: + ```python + from tests.conftest import FakeRequest, make_request, compute_hmac_sha256 + ``` + +3. **Add dispatch key validation** to `tests/test_dispatch_key_validation.py`: + ```python + class TestNewPlatformDispatchKeys: + def _make_adapter(self): + from chat_sdk.adapters.new_platform.adapter import NewPlatformAdapter + adapter = NewPlatformAdapter(config) + adapter._chat = _make_mock_chat() + return adapter + + def test_message_dispatch_keys(self): + adapter = self._make_adapter() + # Simulate event, verify process_message called with correct args + adapter._handle_message_event(fake_event) + adapter._chat.process_message.assert_called_once() + # ... + + def test_action_dispatch_keys(self): + adapter = self._make_adapter() + adapter._handle_action(fake_action) + action_obj = adapter._chat.process_action.call_args[0][0] + assert_no_camel_case_keys(action_obj) + ``` + +4. **Add integration replay tests** under `tests/integration/`: + ```python + # tests/integration/test_replay_new_platform.py + async def test_new_platform_mention_replay(): + chat = Chat(ChatConfig( + user_name="bot", + adapters={"new_platform": adapter}, + state=memory_state, + )) + # Submit recorded webhook JSON + # Assert handler was called with correct arguments + ``` + +5. **Test webhook signature verification** specifically: + ```python + async def test_rejects_invalid_signature(): + result = await adapter.handle_webhook(FakeRequest( + headers={"X-Signature": "invalid"}, + body=b'{"event": "test"}', + )) + assert result.status == 401 + ``` + +## Known Coverage Gaps + +The following modules are under 60% coverage as of the initial alpha release. These are tracked for improvement: + +| Module | Approximate Coverage | Gap Description | +|--------|---------------------|-----------------| +| `adapters/whatsapp/adapter.py` | ~55% | Media download, message status webhooks, group message handling | +| `adapters/google_chat/adapter.py` | ~50% | Workspace events, user info resolution, space membership | +| `state/postgres.py` | ~45% | Connection pooling edge cases, schema migration, concurrent lock contention | +| `state/redis.py` | ~50% | Lua script error handling, connection retry, cluster mode | + +These gaps exist primarily because: +- WhatsApp and Google Chat have complex webhook payload variations that need more recorded fixtures. +- Redis and PostgreSQL state adapters require running databases for full integration testing. The unit tests mock the database clients, which limits coverage of error paths. + +## The Dispatch Key Validation Pattern + +### What It Is + +`test_dispatch_key_validation.py` contains a test class for each adapter that verifies the adapter dispatches events to `Chat.process_*` methods using snake_case keys (Python convention) rather than camelCase keys (JavaScript/TypeScript convention). + +### What It Catches + +During the original TS-to-Python port, adapters were passing camelCase keys like `threadId`, `messageId`, `actionId`, and `privateMetadata` in the event objects dispatched to `Chat.process_message()`, `Chat.process_action()`, etc. The `Chat` class expects snake_case keys (`thread_id`, `message_id`, `action_id`, `private_metadata`). CamelCase keys silently produce `None` lookups or `KeyError` downstream. + +This was a **systemic bug** across all adapters -- every adapter had at least one camelCase key in its dispatch path. + +### How It Works + +1. Each test creates a minimal adapter instance with a mock `ChatInstance` that records all `process_*` calls. +2. The test calls the adapter's internal dispatch method (e.g., `_handle_message_event()`, `_handle_block_actions()`) with a realistic platform payload. +3. The test extracts the event object passed to the mock's `process_*` method. +4. `assert_no_camel_case_keys(event_obj)` recursively walks the object (supporting dicts, dataclasses, and lists) and asserts no key matches the camelCase pattern `^[a-z]+[A-Z]`. +5. The `raw` key is intentionally excluded from checking because it contains the original platform payload with the platform's native casing. + +### Helper Function + +```python +CAMEL_RE = re.compile(r"^[a-z]+[A-Z]") + +def assert_no_camel_case_keys(d, path="", *, skip_raw=True): + """Recursively check that no dict keys use camelCase.""" + if isinstance(d, dict): + for k, v in d.items(): + if isinstance(k, str) and CAMEL_RE.match(k): + raise AssertionError(f"camelCase key '{k}' found at {path}.{k}") + if skip_raw and k == "raw": + continue + assert_no_camel_case_keys(v, f"{path}.{k}") + elif isinstance(d, (list, tuple)): + for i, item in enumerate(d): + assert_no_camel_case_keys(item, f"{path}[{i}]") +``` + +### Adapters Covered + +- Slack: message, reaction, action, slash command, modal submit +- Google Chat: action (card click) +- Discord: action (component interaction) +- Telegram: action (callback query), reaction +- Teams: action (message with value) +- Linear: reaction, comment + +### Why It Exists + +This test suite is a regression guard. Without it, any refactoring of adapter dispatch code could reintroduce camelCase keys without any test failure. The test runs on every CI build and takes < 1 second. + +## Plans: Webhook JSON Fixtures + +The TS SDK repository contains recorded webhook JSON payloads used for testing. We plan to copy these fixtures into `tests/fixtures/` to enable: + +1. **Cross-language parity testing**: Verify that the Python adapter produces the same normalized `Message` objects as the TS adapter for identical webhook payloads. +2. **Platform version regression**: When platforms change their webhook format, the fixture files make it obvious what changed. +3. **Replay tests without mocks**: Drive the full adapter pipeline with real payloads instead of hand-constructed dicts. + +The fixture format will be: +``` +tests/fixtures/ + slack/ + mention.json + dm.json + action_button_click.json + reaction_added.json + slash_command.json + modal_submit.json + discord/ + message_create.json + interaction_button.json + teams/ + message_activity.json + action_submit.json + ... +``` diff --git a/docs/UPSTREAM_SYNC.md b/docs/UPSTREAM_SYNC.md new file mode 100644 index 0000000..d80a4ad --- /dev/null +++ b/docs/UPSTREAM_SYNC.md @@ -0,0 +1,258 @@ +# Upstream Sync Guide + +How to keep `chat-sdk-python` in sync with the [Vercel Chat TS SDK](https://github.com/vercel/chat). + +## How to Diff Upstream Changes + +```bash +# Clone or update the TS repo +git clone https://github.com/vercel/chat.git /tmp/vercel-chat +cd /tmp/vercel-chat +git log --oneline -20 # see recent commits + +# Compare a specific adapter +diff -u /tmp/vercel-chat/packages/adapter-slack/src/index.ts \ + /tmp/chat-sdk-python/src/chat_sdk/adapters/slack/adapter.py + +# Compare core types +diff -u /tmp/vercel-chat/packages/core/src/types.ts \ + /tmp/chat-sdk-python/src/chat_sdk/types.py +``` + +The Python module layout mirrors the TS package layout: + +| TS Package | Python Module | +|-----------|---------------| +| `packages/core/src/chat.ts` | `src/chat_sdk/chat.py` | +| `packages/core/src/thread.ts` | `src/chat_sdk/thread.py` | +| `packages/core/src/channel.ts` | `src/chat_sdk/channel.py` | +| `packages/core/src/types.ts` | `src/chat_sdk/types.py` | +| `packages/core/src/cards.ts` | `src/chat_sdk/cards.py` | +| `packages/core/src/modals.ts` | `src/chat_sdk/modals.py` | +| `packages/core/src/from-full-stream.ts` | `src/chat_sdk/from_full_stream.py` | +| `packages/core/src/markdown.ts` | `src/chat_sdk/shared/markdown_parser.py` + `base_format_converter.py` | +| `packages/core/src/streaming-markdown.ts` | `src/chat_sdk/shared/streaming_markdown.py` | +| `packages/adapter-slack/src/index.ts` | `src/chat_sdk/adapters/slack/adapter.py` | +| `packages/state-memory/src/index.ts` | `src/chat_sdk/state/memory.py` | +| `packages/state-redis/src/index.ts` | `src/chat_sdk/state/redis.py` | + +## What to Port vs What to Adapt + +### Port 1:1 + +These must stay structurally identical to the TS SDK: + +- **Type definitions** (`types.py`): All dataclass shapes, protocol methods, and event types must match TS. This is the interop contract. +- **Concurrency strategies** (`chat.py`): The drop/queue/debounce/concurrent logic, lock TTLs, and dedup keys must produce identical behavior. +- **Card element types** (`cards.py`): The TypedDict shapes must match TS so that platform card renderers produce the same output. +- **Thread ID encoding/decoding**: Each adapter's `encode_thread_id` / `decode_thread_id` must produce the same strings as TS for cross-language state sharing. +- **State key prefixes**: `thread-state:`, `channel-state:`, `dedupe:`, `modal-context:` must match. +- **Webhook signature verification**: Must use the same algorithms and constant-time comparison as TS. + +### Adapt for Python + +These are intentionally different from TS: + +- **Async model**: TS uses Promises; Python uses `async/await` with `asyncio`. `Promise.all()` becomes `asyncio.gather()`. `setTimeout()` becomes `asyncio.sleep()`. +- **Module structure**: TS uses one file per package. Python splits into modules (`adapter.py`, `cards.py`, `format_converter.py`, `types.py`) per adapter. +- **Error hierarchy**: Python uses exception classes instead of TS error strings. +- **Type system**: TS interfaces become `Protocol` classes. TS unions become `Union` types or `|` syntax. TS generics become `TypeVar`. +- **Optional dependencies**: TS uses package dependencies. Python uses extras (`pip install chat-sdk[slack]`) with lazy imports. + +## Architecture Decisions That Must Stay 1:1 + +1. **Global singleton on Chat**: Required for Thread/Channel deserialization without passing adapter references through every call. Both SDKs use the same pattern. + +2. **Thread ID format**: `{adapter}:{platform_id}` (e.g., `slack:C123:1234567890.123456`). State keys depend on this format. Changing it would break cross-language state sharing in deployments that mix TS and Python bots. + +3. **Lock token format**: `{backend}_{timestamp}_{random}`. The token must be a cryptographically random string for security. The format itself is not critical for interop, but the lock key format (`dedupe:{adapter}:{message_id}`) must match. + +4. **Concurrency strategy semantics**: Queue drain order, debounce loop iteration limits (20), message superseding logic, and TTL handling must match. + +5. **Card element type strings**: `"card"`, `"button"`, `"text"`, `"divider"`, `"actions"`, `"fields"`, `"table"`, `"section"`, `"image"`, `"link"`, `"link-button"` must match exactly. + +## Python-Specific Hardening + +These exist only in the Python port and have no TS equivalent: + +- `shared/errors.py`: Typed adapter error hierarchy (`AdapterRateLimitError`, `AuthenticationError`, `ValidationError`, `NetworkError`, `ResourceNotFoundError`, `PermissionError`). TS throws plain `Error` objects. +- `testing/__init__.py` + `shared/mock_adapter.py`: Test utilities with `MockAdapter`, `MockStateAdapter`, `create_test_message()`. +- `from __future__ import annotations` everywhere: Enables PEP 604 union syntax (`X | Y`) on Python 3.10. +- Input validation on adapter config dataclasses (e.g., rejecting empty `signing_secret`). +- `ContextVar`-based request context in Slack adapter (instance variable, not class variable). + +## Common TS-to-Python Translation Patterns + +### Async Patterns + +```typescript +// TS +await Promise.all([taskA(), taskB()]); +const result = await new Promise(resolve => setTimeout(() => resolve(x), 1000)); +``` + +```python +# Python +await asyncio.gather(task_a(), task_b()) +await asyncio.sleep(1.0) +result = x +``` + +### Object Construction + +```typescript +// TS +adapter.handle_webhook(request, { waitUntil: fn }); +``` + +```python +# Python +adapter.handle_webhook(request, WebhookOptions(wait_until=fn)) +``` + +### Type Guards + +```typescript +// TS +if ('markdown' in message) { ... } +``` + +```python +# Python +if isinstance(message, PostableMarkdown): ... +# or +if hasattr(message, 'markdown'): ... +``` + +## Known TS-to-Python Footguns + +These are the specific translation mistakes that caused bugs during the original port (caught across 9 rounds of review). Every contributor should internalize this list. + +### 1. `fn(x, {opts})` must become keyword args, not a dict + +```typescript +// TS +adapter.postMessage(threadId, message, { metadata: true }); +``` + +```python +# WRONG: passing a dict where keyword args are expected +adapter.post_message(thread_id, message, {"metadata": True}) + +# RIGHT: use the dataclass/keyword pattern +adapter.post_message(thread_id, message, metadata=True) +``` + +### 2. `or` vs `is not None` for empty string preservation + +```python +# WRONG: empty string is falsy in Python, so this silently drops it +text = event.get("text") or default_text + +# RIGHT: preserve empty strings when they are valid values +text = event.get("text") if event.get("text") is not None else default_text + +# ALSO RIGHT: explicit None check +raw = event.get("text") +text = raw if raw is not None else default_text +``` + +This bit us in adapter dispatch where empty `text` fields (valid for action events with no text) were being replaced with defaults. + +### 3. camelCase keys in dispatch dicts + +```python +# WRONG: preserving TS naming in Python dicts +event = {"threadId": thread_id, "messageId": msg_id} + +# RIGHT: Python uses snake_case throughout +event = ActionEvent(thread_id=thread_id, message_id=msg_id, ...) +``` + +This was a systemic bug across all adapters. The `test_dispatch_key_validation.py` test suite was written specifically to catch this. See [TESTING.md](TESTING.md) for details. + +### 4. `asyncio.ensure_future` vs `asyncio.create_task` + +```python +# WRONG: deprecated since Python 3.10, and does not work outside async context +asyncio.ensure_future(coro) + +# RIGHT: explicit create_task with error handling +task = asyncio.get_running_loop().create_task(coro) +task.add_done_callback(lambda t: log_error(t.exception()) if t.exception() else None) +``` + +The SDK's `_create_task()` helper wraps this pattern and gracefully handles the case where no event loop is running. + +### 5. `datetime.utcnow()` vs `datetime.now(tz=timezone.utc)` + +```python +# WRONG: returns naive datetime, deprecated in Python 3.12 +from datetime import datetime +now = datetime.utcnow() + +# RIGHT: timezone-aware datetime +from datetime import UTC, datetime +now = datetime.now(tz=UTC) +``` + +All timestamps in the SDK use `UTC` from the `datetime` module (aliased from `timezone.utc` in Python 3.11+). + +### 6. Raw dicts for process_* events must use typed dataclasses + +```python +# WRONG: plain dict loses type safety and makes key typos silent +chat.process_action({ + "action_id": action_id, + "thred_id": thread_id, # typo goes undetected +}) + +# RIGHT: typed dataclass catches typos at construction time +chat.process_action(ActionEvent( + adapter=self, + action_id=action_id, + thread_id=thread_id, # typo would be a TypeError + ... +)) +``` + +### 7. `ContextVar` as instance variable, not class variable + +```python +# WRONG: shared across all instances, causes cross-request contamination +class SlackAdapter: + _request_context: ContextVar[RequestContext] = ContextVar("request_context") + +# RIGHT: each instance gets its own ContextVar +class SlackAdapter: + def __init__(self): + self._request_context: ContextVar[RequestContext] = ContextVar("request_context") +``` + +### 8. `random.choices` vs `secrets.token_hex` for security tokens + +```python +# WRONG: predictable PRNG, not suitable for lock tokens +import random +token = ''.join(random.choices('abcdef0123456789', k=32)) + +# RIGHT: cryptographically secure random +import secrets +token = secrets.token_hex(16) +``` + +Lock tokens must be unpredictable because they serve as proof of lock ownership. A compromised token allows unauthorized lock release. + +### 9. Top-level imports of optional deps must be lazy + +```python +# WRONG: crashes at import time if slack-sdk is not installed +from slack_sdk.web.async_client import AsyncWebClient # top of file + +# RIGHT: import inside the function/method that uses it +def _get_client(self): + from slack_sdk.web.async_client import AsyncWebClient + return AsyncWebClient(token=self._bot_token) +``` + +Optional dependencies (slack-sdk, pynacl, aiohttp, etc.) must only be imported inside methods of their respective adapter. Users who install `chat-sdk` without extras should not get import errors from adapters they are not using.