
feat: persist channel attachments to disk with recall tool #350

Merged
jamiepine merged 4 commits into main from feat/channel-attachment-persistence
Mar 7, 2026

Conversation

@jamiepine
Member

@jamiepine jamiepine commented Mar 7, 2026

Summary

  • Adds persistent file/image storage for channel attachments with a new save_attachments channel config option
  • Introduces attachment_recall tool so the channel LLM can list, locate, and re-load saved files across conversation turns
  • Saves attachment metadata into conversation message metadata for history annotation reconstruction

Problem

When a user sends an image or file in a channel, the attachment is downloaded, converted to base64/text, and injected into the current LLM turn. Once that turn completes, the file data is gone — no recall, no handoff to workers, and platform URLs (especially Slack) expire.

Design

Full design doc: docs/design-docs/channel-attachment-persistence.md

Storage Layer

  • Migration — 20260306000001_saved_attachments.sql creates the saved_attachments table with indexes on channel_id and message_id
  • Disk layout — files saved to workspace/saved/ with filename dedup (report.pdf → report_2.pdf on collision)
  • Config — save_attachments: bool on ChannelConfig (opt-in, default false), hot-reloadable
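The collision scheme above (report.pdf → report_2.pdf) can be sketched roughly as follows. This is an illustrative stand-in, not the repo's actual helper: the real logic in src/agent/channel_attachments.rs checks the database rather than a closure, and its signature will differ.

```rust
/// Hypothetical sketch of the collision-suffix scheme: given a taken-name
/// check, turn "report.pdf" into "report_2.pdf", "report_3.pdf", ...
fn deduplicate_filename(original: &str, is_taken: impl Fn(&str) -> bool) -> String {
    if !is_taken(original) {
        return original.to_string();
    }
    // Split off the extension so the counter lands before it.
    let (stem, ext) = match original.rsplit_once('.') {
        Some((stem, ext)) if !stem.is_empty() => (stem, Some(ext)),
        _ => (original, None),
    };
    for counter in 2.. {
        let candidate = match ext {
            Some(ext) => format!("{stem}_{counter}.{ext}"),
            None => format!("{stem}_{counter}"),
        };
        if !is_taken(&candidate) {
            return candidate;
        }
    }
    unreachable!()
}
```

As the review discussion below notes, a preflight check like this is only safe if the eventual disk write reserves the name atomically.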

Channel Integration

  • Both single-message and coalesced-batch paths save attachments when enabled
  • Downloaded bytes are reused for LLM content building (no double download)
  • Attachment metadata (id, filename, mime, size) injected into message metadata JSON
  • channel_recall tool transcripts now include [Attachment saved: ...] annotations from metadata
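The metadata fields listed above (id, filename, mime, size) map naturally onto a small struct; the sketch below shows one plausible way the [Attachment saved: ...] annotation could be rebuilt from it. Struct and function names are illustrative, and the exact annotation format in the repo may differ.

```rust
/// Illustrative shape of the per-attachment metadata (id, filename, mime, size).
struct SavedAttachmentMeta {
    id: String,
    filename: String,
    mime_type: String,
    size_bytes: u64,
}

/// Rebuild a history annotation from saved metadata. The id slice is guarded
/// with `get(..8)` so a short or corrupted id cannot panic.
fn annotation(meta: &SavedAttachmentMeta) -> String {
    let short_id = meta.id.get(..8).unwrap_or(&meta.id);
    format!(
        "[Attachment saved: {} ({}, {} bytes, id:{})]",
        meta.filename, meta.mime_type, meta.size_bytes, short_id
    )
}
```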

Recall Tool (attachment_recall)

Channel-level tool with three actions:

  • list — recent saved attachments for the channel
  • get_path — absolute disk path for delegation to workers
  • get_content — re-load image/text file as base64/inline for re-analysis (10 MB cap)

Only added to channels with save_attachments enabled.
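A minimal sketch of the three-action surface, assuming a plain string action argument (the real argument and output types live in src/tools/attachment_recall.rs and will differ):

```rust
/// The three recall actions from the PR description.
#[derive(Debug, PartialEq)]
enum RecallAction {
    List,
    GetPath,
    GetContent,
}

/// Inline-content cap mentioned in the description (10 MB).
const MAX_CONTENT_SIZE: u64 = 10 * 1024 * 1024;

fn parse_action(raw: &str) -> Option<RecallAction> {
    match raw {
        "list" => Some(RecallAction::List),
        "get_path" => Some(RecallAction::GetPath),
        "get_content" => Some(RecallAction::GetContent),
        // An unknown action is a recoverable, user-correctable condition,
        // not a hard tool error.
        _ => None,
    }
}
```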

Usage

[channel]
save_attachments = true

Changes

File | Change
migrations/20260306000001_saved_attachments.sql | New table + indexes
src/config/types.rs | save_attachments field, saved_dir() helper
src/config/toml_schema.rs | TOML schema for save_attachments
src/config/load.rs | Wire through defaults + per-agent resolution
src/config/runtime.rs | saved_dir() helper on RuntimeConfig
src/main.rs | Create workspace/saved/ at startup
src/agent/channel_attachments.rs | Save logic, dedup, metadata formatting, annotation helpers
src/agent/channel.rs | Wire save flow into both message paths, enrich metadata
src/tools/attachment_recall.rs | New tool implementation
src/tools.rs | Register module, add/remove tool conditionally
src/tools/channel_recall.rs | Append attachment annotations to transcripts
src/prompts/text.rs | Register tool description
prompts/en/tools/attachment_recall_description.md.j2 | LLM-facing tool description
docs/design-docs/channel-attachment-persistence.md | Full design doc

Testing

  • just gate-pr passes (fmt, clippy, 428 lib tests, integration test compile)
  • Feature is opt-in and off by default — zero impact on existing channels

Note

Summary by Tembo: Implements persistent file/image storage for channel attachments with opt-in configuration and a new recall tool. Saves attachment metadata to disk and conversation history, enables re-analysis and worker delegation without re-downloading. Full design doc in docs/design-docs/channel-attachment-persistence.md. Feature is backwards-compatible (off by default).

Generated by Tembo for commit 5a1dd7f

Save file attachments sent in channels to workspace/saved/ and track them
in a saved_attachments table so the LLM can reference files across turns.

- New save_attachments channel config option (opt-in, default false)
- saved_attachments SQLite table with dedup filename logic
- Attachment metadata injected into conversation message metadata JSON
- History annotations reconstructed in channel_recall transcripts
- attachment_recall tool (list/get_path/get_content) for channels
- Downloads reused between save and LLM processing (no double fetch)
@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Adds per-channel attachment persistence: DB schema and disk layout, saving/deduplication logic, history annotations, an attachment_recall tool (list/get_path/get_content), runtime/config wiring (saved dir, channel flag), and threading of saved-attachment metadata through message processing.

Changes

Cohort / File(s) Summary
Design & Prompts
docs/design-docs/channel-attachment-persistence.md, prompts/en/tools/attachment_recall_description.md.j2
Adds design documentation and the attachment_recall tool prompt describing actions, limits, and identification semantics.
Database Migrations
migrations/20260306000001_saved_attachments.sql, migrations/20260306000002_saved_attachments_unique_filename.sql
Creates saved_attachments table with metadata, FK to channels(id), indexes, and a unique index on saved_filename.
Configuration
src/config/toml_schema.rs, src/config/types.rs, src/config/load.rs, src/config/runtime.rs
Adds save_attachments to TOML/schema and ChannelConfig, materializes per-channel defaults, and exposes saved_dir() helper.
Agent: Attachment Core
src/agent/channel_attachments.rs
New module: SavedAttachmentMeta, save_channel_attachments() (download, dedupe, atomic write, DB persist), content-from-bytes helpers, annotation builders, and filename collision logic.
Agent: Channel Integration
src/agent/channel.rs
Threads saved-attachment metadata through inbound and batched flows, updates persist_inbound_user_message signature to accept saved attachments, and reuses saved bytes when rendering.
Tools: Recall Tool
src/tools/attachment_recall.rs, src/tools.rs, src/prompts/text.rs
Adds AttachmentRecallTool (actions: list, get_path, get_content), arg/output types, registration when channel opt-in enabled, MIME and size constraints (10 MB), and prompt registration.
History & Integration
src/tools/channel_recall.rs, src/main.rs
Reconstructs/annotates history messages with attachment summaries and ensures per-agent workspace/saved directory is created on startup.
Dev Tooling
scripts/gate-pr.sh
CI: run cargo clippy with RUSTFLAGS="-Dwarnings" in non-fast path to treat warnings as errors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main feature: persistent channel attachment storage with a recall tool for retrieving saved files.
Description check ✅ Passed The description is well-structured, directly related to the changeset, and clearly explains the problem, design, implementation changes, and testing approach.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


}
};

let disk_path = saved_dir.join(&saved_filename);

saved_filename ultimately comes from a user-controlled attachment filename. Before saved_dir.join(...), it’d be good to sanitize to a safe basename (drop any path components, and reject "."/".."), otherwise a filename like ../foo can escape the saved/ dir.

Also: tokio::fs::write will happily overwrite on a race even with the dedup check; consider using OpenOptions::create_new(true) (and retry with a new suffix on AlreadyExists) to make this atomic.
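The two fixes suggested here can be sketched as below with a std-only helper pair; names are hypothetical, and the retry-with-new-suffix loop on AlreadyExists is omitted for brevity.

```rust
use std::io::Write;
use std::path::{Component, Path};

/// Sketch of the suggested sanitization: reduce a user-supplied filename to a
/// safe basename, dropping directory components and rejecting "." / "..".
fn sanitize_basename(raw: &str) -> Option<String> {
    // Keep only the final normal path component; roots, "." and ".." are
    // filtered out by the Component match.
    let name = Path::new(raw)
        .components()
        .filter_map(|component| match component {
            Component::Normal(part) => part.to_str(),
            _ => None,
        })
        .last()?
        .to_string();
    if name.is_empty() || name == "." || name == ".." {
        return None;
    }
    Some(name)
}

/// Atomic reservation via create_new: the open fails with AlreadyExists
/// instead of silently overwriting on a race; the caller can retry with a
/// freshly deduplicated filename.
fn write_exclusive(path: &Path, bytes: &[u8]) -> std::io::Result<()> {
    let mut file = std::fs::OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(path)?;
    file.write_all(bytes)
}
```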

for attachment in &attachments {
if let Some((_, bytes)) = saved
.iter()
.find(|(meta, _)| meta.filename == attachment.filename)

Matching saved bytes back to attachments by filename feels fragile (duplicate filenames in a single message/batch will pick the first match) and it’s O(n^2). Since save_channel_attachments iterates in-order, it might be safer to preserve that ordering (e.g., return a Vec aligned to the input slice, or return (Attachment, meta, bytes) tuples) and avoid the .find(...) lookup entirely.

channel: a.channel.map(|channel_config| ChannelConfig {
listen_only_mode: channel_config.listen_only_mode.unwrap_or(false),
save_attachments: channel_config.save_attachments.unwrap_or(false),
}),

Small gotcha: if an agent sets [agent.channel] but only specifies one field, the other currently defaults to false here (instead of inheriting from defaults.channel.*). That can flip listen_only_mode off unintentionally when someone enables save_attachments.

Suggested change
}),
channel: a.channel.map(|channel_config| ChannelConfig {
listen_only_mode: channel_config
.listen_only_mode
.unwrap_or(defaults.channel.listen_only_mode),
save_attachments: channel_config
.save_attachments
.unwrap_or(defaults.channel.save_attachments),
}),

pub summary: String,
/// For get_content: re-loaded file content (if applicable).
#[serde(skip)]
pub content: Option<UserContent>,

AttachmentRecallOutput.content is #[serde(skip)], so the tool result won’t actually include the re-loaded bytes/text in the JSON returned to the model. That makes get_content effectively “summary-only” right now.

If the intent is to inline content for re-analysis, I think this needs to be part of the serialized output (e.g., content_base64/content_text + content_mime), or the tool runner needs a special-case path to inject UserContent into the next turn.

Extract SavedAttachmentWithBytes type alias to reduce inline tuple
complexity in the coalesced batch path. Use clamp() instead of
min().max() for attachment list limit.
attachment.filename,
attachment.mime_type,
size_str,
&attachment.id[..8]

attachment.id[..8] will panic if id is shorter than 8 bytes (e.g. corrupted/old metadata). This is already handled safely in AttachmentRecallTool; worth mirroring here.

Suggested change
&attachment.id[..8]
attachment.id.get(..8).unwrap_or(&attachment.id)

saved_filename: row.saved_filename,
mime_type: row.mime_type,
size_bytes: row.size_bytes,
disk_path: row.disk_path,

For the list action, do we want to return the absolute disk_path? It makes get_path redundant and exposes internal paths even when the caller only asked to list.

Suggested change
disk_path: row.disk_path,
disk_path: String::new(),

@coderabbitai bot left a comment


Actionable comments posted: 11

🧹 Nitpick comments (1)
src/tools/channel_recall.rs (1)

179-182: Expand the metadata variable names.

meta_json / meta read a bit out of step with the repo’s naming rule here; metadata_json / metadata_value would be clearer and consistent with the surrounding code. As per coding guidelines "Use non-abbreviated variable names in Rust code: queue not q, message not msg, channel not ch; common abbreviations like config are acceptable".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/channel_recall.rs` around lines 179 - 182, Rename the abbreviated
variables in the message metadata handling to clearer names: change meta_json to
metadata_json and meta to metadata_value in the block that reads
message.metadata and parses it with serde_json::from_str, and update any
subsequent uses (including the call to
crate::agent::channel_attachments::annotation_from_metadata) and the surrounding
let content assignment so all references match the new names; ensure variable
names are not abbreviated and compile after the rename.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design-docs/channel-attachment-persistence.md`:
- Line 51: Multiple fenced code blocks in channel-attachment-persistence.md are
missing language identifiers (tripping markdownlint MD040) — update each
triple-backtick fence (```) at the noted locations (including the occurrences
around lines 51, 147, 212, 236, 249, 261, 275, 284, 354) to include the
appropriate language tag (e.g., ```json, ```bash, ```yaml, ```ts, etc.) so
syntax highlighting and linter checks pass; locate the fences by searching for
the bare ``` markers and add the correct language identifier for the content
inside each fenced block.

In `@prompts/en/tools/attachment_recall_description.md.j2`:
- Around line 3-8: Revise the attachment actions text to discourage
inline/base64 recall and ambiguous filename lookup: update the description of
get_content to explicitly limit it to very small previews (e.g., "tiny
text/images only, ≤100 KB") and warn that larger files or non-image/non-text
must use get_path; emphasize that agents should prefer attachment IDs (e.g.,
id:a1b2c3d4) over original filenames because filenames like "report.pdf" are
ambiguous; keep list and get_path descriptions but add a short note to include
the absolute path in worker task descriptions when using get_path and to avoid
embedding multi-megabyte content inline.

In `@src/agent/channel_attachments.rs`:
- Around line 538-551: attachment.filename is user-controlled and must be
sanitized before joining into saved_dir; update the code around
deduplicate_filename and disk_path to first extract the basename of
attachment.filename (strip any directory components and leading path
separators), reject names that are empty or equal to "." or "..", then pass the
sanitized basename into deduplicate_filename (instead of the raw
attachment.filename) and use the returned saved_filename to construct disk_path;
apply the same basename+validation change to the other occurrence referenced
(lines ~671-689) to prevent writing outside the workspace.
- Around line 553-579: The current read-before-write flow around
filename_taken() is racy: two tasks can both see a free name, both call
tokio::fs::write(), and then both insert the same saved_filename/disk_path,
causing overwrites; fix by atomically reserving the filename and enforcing
uniqueness at DB level: add a UNIQUE constraint on
saved_attachments.saved_filename (or saved_filename + channel_id if needed),
change the write flow to create the file with create_new (open in
exclusive/create_new mode) so the filesystem fails if the path already exists,
and move to an insert-with-retry strategy in the code that performs the SQL
insert (the block using sqlx::query(...) .bind(...).execute(pool).await) so that
on a UNIQUE constraint violation you generate a new saved_filename and retry the
create_new + insert cycle (retry loop around saved_filename creation, disk write
with create_new, and sqlx insert) instead of relying on filename_taken(); update
both places (the snippet here and the other block covering lines 671-724) to use
this atomic create_new + DB-unique + retry pattern.

In `@src/agent/channel.rs`:
- Around line 1268-1285: The current lookup in the attachment handling block
uses meta.filename to match saved_data to attachments, causing duplicates (e.g.,
multiple "report.pdf") to reuse the first bytes; update
save_channel_attachments() and the saved_data structure to include a stable key
(original attachment index or platform attachment ID) and carry that through as
meta (e.g., meta.attachment_id or meta.index), then change the find() in the
block that builds attachment_content to match on that stable key (compare
meta.attachment_id or meta.index to attachment.id or attachment_index) before
calling channel_attachments::content_from_bytes so each attachment gets its
correct saved bytes; apply the same change to the analogous code at the other
location mentioned.

In `@src/config/load.rs`:
- Around line 1632-1635: The per-agent ChannelConfig mapping currently uses
hardcoded false fallbacks which overwrite defaults; change the map closure for
a.channel so ChannelConfig.listen_only_mode and ChannelConfig.save_attachments
use the corresponding values from defaults.channel (e.g.,
defaults.channel.listen_only_mode and defaults.channel.save_attachments) when
the agent's channel field is None (use unwrap_or or unwrap_or_else against those
defaults) instead of unwrap_or(false), ensuring inherited defaults for both
listen_only_mode and save_attachments.

In `@src/tools.rs`:
- Around line 403-408: The removal calls (handle.remove_tool for CronTool::NAME,
SendMessageTool::NAME, SendAgentMessageTool::NAME, and
AttachmentRecallTool::NAME) currently swallow all errors via `let _ = ...`;
change each to explicitly handle the Result: if removal returns an error that
indicates "tool not registered" treat it as a no-op, otherwise log the error
(e.g., via the existing logger) and continue. Locate the calls to
handle.remove_tool(...) and replace the silent discards with a match/if let that
checks for the not-found case and logs any other Err returned by remove_tool.

In `@src/tools/attachment_recall.rs`:
- Around line 138-151: The tool currently returns Err for recoverable,
user-correctable outcomes; update call() and the referenced return sites
(including the match arms in call, and uses in list_attachments,
get_attachment_path, get_attachment_content) to return a structured
AttachmentRecallOutput with an added status/error field for cases like unknown
action, missing selectors, not-found, missing-file, and oversized-file, and only
use Err(AttachmentRecallError(...)) for genuine DB/I/O faults; add/extend the
AttachmentRecallOutput type to include an enum/string status or error_message
and replace all non-fatal Err returns (including the current
AttachmentRecallError usages for user-facing conditions) with appropriate
AttachmentRecallOutput instances describing the recoverable failure so the
caller/LLM can recover.
- Around line 78-97: The AttachmentInfo struct currently always includes
disk_path which causes attachments (serialized by AttachmentRecallOutput) to
leak host filesystem paths for actions other than get_path; update the
serialization/struct shape so disk_path is omitted by default and only populated
when handling the get_path action. Concretely: remove or mark disk_path to be
skipped/optional in AttachmentInfo (or introduce a separate AttachmentPathInfo
used only for get_path), ensure attachment list/ get_content branches populate
attachments without disk_path, and populate disk_path only in the get_path
handling code path (and remove any "Path: ..." text from get_content summaries).
- Around line 354-363: The query treats the provided attachment id as a LIKE
pattern allowing %/_ to match unexpectedly; change the predicate to a literal
prefix comparison such as USING substr(id, 1, length(?)) = ? instead of "id LIKE
? || '%'", and update binds accordingly (bind self.channel_id.as_ref(), then
bind id twice for the length/equality check) so AttachmentRow lookup in the
block that calls fetch_optional(&self.pool) matches only true prefixes without
interpreting wildcard characters.
- Around line 261-269: The current size check uses attachment.size_bytes from
the DB row; instead fetch the live file size via filesystem metadata (e.g.
std::fs::metadata or tokio::fs::metadata) for the stored file path (use the
attachment.path or whatever field holds the on-disk path) and use metadata.len()
as the size to compare against MAX_CONTENT_SIZE; if metadata fails, return an
AttachmentRecallError with context, and if metadata.len() > MAX_CONTENT_SIZE
return the same "too large" error using format_size and attachment.filename as
before (preserve AttachmentRecallError, MAX_CONTENT_SIZE, format_size, and
attachment.filename references).

---

Nitpick comments:
In `@src/tools/channel_recall.rs`:
- Around line 179-182: Rename the abbreviated variables in the message metadata
handling to clearer names: change meta_json to metadata_json and meta to
metadata_value in the block that reads message.metadata and parses it with
serde_json::from_str, and update any subsequent uses (including the call to
crate::agent::channel_attachments::annotation_from_metadata) and the surrounding
let content assignment so all references match the new names; ensure variable
names are not abbreviated and compile after the rename.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 05f28898-bb61-4217-bf7b-7b0973d5dc83

📥 Commits

Reviewing files that changed from the base of the PR and between bba4b59 and 5a1dd7f.

📒 Files selected for processing (14)
  • docs/design-docs/channel-attachment-persistence.md
  • migrations/20260306000001_saved_attachments.sql
  • prompts/en/tools/attachment_recall_description.md.j2
  • src/agent/channel.rs
  • src/agent/channel_attachments.rs
  • src/config/load.rs
  • src/config/runtime.rs
  • src/config/toml_schema.rs
  • src/config/types.rs
  • src/main.rs
  • src/prompts/text.rs
  • src/tools.rs
  • src/tools/attachment_recall.rs
  • src/tools/channel_recall.rs

Comment on lines +3 to +8
Three actions:
- **list**: Show recent saved attachments (filename, type, size, ID).
- **get_path**: Get the absolute disk path of a saved file. Use this when delegating work to a worker — include the path in the task description so the worker can access the file directly.
- **get_content**: Re-load a saved image or text file for analysis. Use this when you need to see a file that was sent earlier in the conversation. Only works for images and text files under 10 MB — for other file types or larger files, use get_path and delegate to a worker.

You can identify attachments by their ID (shown in history annotations like `id:a1b2c3d4`) or by their original filename.

⚠️ Potential issue | 🟠 Major

Don’t nudge the model toward multi-megabyte inline recall or ambiguous filename lookup.

This wording makes get_content sound safe for anything under 10 MB and treats original filenames as first-class identifiers. Both are risky: multi-megabyte inline/base64 payloads can swamp context, and repeated uploads of report.pdf are inherently ambiguous. The prompt should steer agents to prefer IDs and reserve get_content for genuinely small previews/snippets.

Suggested wording
 Three actions:
 - **list**: Show recent saved attachments (filename, type, size, ID).
 - **get_path**: Get the absolute disk path of a saved file. Use this when delegating work to a worker — include the path in the task description so the worker can access the file directly.
-- **get_content**: Re-load a saved image or text file for analysis. Use this when you need to see a file that was sent earlier in the conversation. Only works for images and text files under 10 MB — for other file types or larger files, use get_path and delegate to a worker.
+- **get_content**: Re-load a small saved image or text file for analysis. Use this for small previews/snippets only. For larger files, repeated filenames, or non-text/binary formats, use **get_path** and delegate to a worker.
 
-You can identify attachments by their ID (shown in history annotations like `id:a1b2c3d4`) or by their original filename.
+Prefer attachment IDs (shown in history annotations like `id:a1b2c3d4`). Original filenames may be ambiguous if the same filename was uploaded more than once.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prompts/en/tools/attachment_recall_description.md.j2` around lines 3 - 8,
Revise the attachment actions text to discourage inline/base64 recall and
ambiguous filename lookup: update the description of get_content to explicitly
limit it to very small previews (e.g., "tiny text/images only, ≤100 KB") and
warn that larger files or non-image/non-text must use get_path; emphasize that
agents should prefer attachment IDs (e.g., id:a1b2c3d4) over original filenames
because filenames like "report.pdf" are ambiguous; keep list and get_path
descriptions but add a short note to include the absolute path in worker task
descriptions when using get_path and to avoid embedding multi-megabyte content
inline.

Comment on lines +553 to +579
if let Err(error) = tokio::fs::write(&disk_path, &bytes).await {
tracing::warn!(
%error,
path = %disk_path.display(),
"failed to write attachment to disk"
);
continue;
}

let id = uuid::Uuid::new_v4().to_string();
let size_bytes = bytes.len() as u64;
let disk_path_str = disk_path.to_string_lossy().to_string();

let insert_result = sqlx::query(
"INSERT INTO saved_attachments \
(id, channel_id, original_filename, saved_filename, mime_type, size_bytes, disk_path) \
VALUES (?, ?, ?, ?, ?, ?, ?)",
)
.bind(&id)
.bind(channel_id)
.bind(&attachment.filename)
.bind(&saved_filename)
.bind(&attachment.mime_type)
.bind(size_bytes as i64)
.bind(&disk_path_str)
.execute(pool)
.await;

⚠️ Potential issue | 🟠 Major

Filename dedup is still racy across concurrent channels.

filename_taken() is only a preflight check. Two channel tasks can both observe report.pdf as free, then both call tokio::fs::write() and insert rows for the same disk_path, so one attachment overwrites the other. Reserve the path atomically (create_new, a UNIQUE constraint on saved_filename, and retry on conflict) instead of relying on the read-before-write sequence.

Also applies to: 671-724

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_attachments.rs` around lines 553 - 579, The current
read-before-write flow around filename_taken() is racy: two tasks can both see a
free name, both call tokio::fs::write(), and then both insert the same
saved_filename/disk_path, causing overwrites; fix by atomically reserving the
filename and enforcing uniqueness at DB level: add a UNIQUE constraint on
saved_attachments.saved_filename (or saved_filename + channel_id if needed),
change the write flow to create the file with create_new (open in
exclusive/create_new mode) so the filesystem fails if the path already exists,
and move to an insert-with-retry strategy in the code that performs the SQL
insert (the block using sqlx::query(...) .bind(...).execute(pool).await) so that
on a UNIQUE constraint violation you generate a new saved_filename and retry the
create_new + insert cycle (retry loop around saved_filename creation, disk write
with create_new, and sqlx insert) instead of relying on filename_taken(); update
both places (the snippet here and the other block covering lines 671-724) to use
this atomic create_new + DB-unique + retry pattern.

Comment on lines +403 to +408
// Cron, send_message, send_agent_message, and attachment_recall removal is
// best-effort since not all channels have them
let _ = handle.remove_tool(CronTool::NAME).await;
let _ = handle.remove_tool(SendMessageTool::NAME).await;
let _ = handle.remove_tool(SendAgentMessageTool::NAME).await;
let _ = handle.remove_tool(AttachmentRecallTool::NAME).await;

⚠️ Potential issue | 🟡 Minor

Don't swallow optional-tool cleanup failures.

This new attachment_recall cleanup now follows the same silent let _ = ... pattern as the other optional tools. If remove_tool() fails for any reason other than “tool wasn't registered”, stale per-turn state becomes invisible in logs. Log and ignore only the not-found case instead of discarding every error.

As per coding guidelines "Never silently discard errors; no let _ = on Results; handle, log, or propagate errors, with the exception of .ok() on channel sends where the receiver may be dropped".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools.rs` around lines 403 - 408, The removal calls (handle.remove_tool
for CronTool::NAME, SendMessageTool::NAME, SendAgentMessageTool::NAME, and
AttachmentRecallTool::NAME) currently swallow all errors via `let _ = ...`;
change each to explicitly handle the Result: if removal returns an error that
indicates "tool not registered" treat it as a no-op, otherwise log the error
(e.g., via the existing logger) and continue. Locate the calls to
handle.remove_tool(...) and replace the silent discards with a match/if let that
checks for the not-found case and logs any other Err returned by remove_tool.

Set RUSTFLAGS="-Dwarnings" for the clippy step in gate-pr.sh so
local runs catch the same warnings-as-errors that CI enforces.
@coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (8)
src/agent/channel.rs (2)

1268-1286: ⚠️ Potential issue | 🟠 Major

Matching saved attachment bytes by filename is fragile.

When multiple attachments in a batch share the same original filename (e.g., two report.pdf files), .find(|(meta, _)| meta.filename == attachment.filename) always returns the first match. Later attachments with the same name will be analyzed using the wrong file's bytes.

Since save_channel_attachments processes attachments in order, consider preserving index alignment or using a stable key (like the original attachment's platform ID if available) instead of filename.

💡 Suggested approach

One option: zip attachments with saved_data by index since they're processed in order:

-                for attachment in &attachments {
-                    if let Some((_, bytes)) = saved
-                        .iter()
-                        .find(|(meta, _)| meta.filename == attachment.filename)
-                    {
+                for (index, attachment) in attachments.iter().enumerate() {
+                    if let Some((_, bytes)) = saved.get(index) {

Or track by platform attachment ID if one exists in crate::Attachment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel.rs` around lines 1268 - 1286, The current matching of saved
bytes to incoming attachments using .find(|(meta, _)| meta.filename ==
attachment.filename) is fragile when filenames repeat; update the logic in the
block that builds attachment_content (using saved_data and attachments) to
preserve ordering or use a stable identifier instead of filename: either zip
saved_data and attachments by index (since save_channel_attachments processes
them in order) to pick the corresponding bytes for each attachment, or match by
a unique platform attachment ID on crate::Attachment/metadata if available;
ensure you still skip audio/* into unsaved and call
channel_attachments::content_from_bytes with the correctly-aligned bytes for
each attachment.
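The index-alignment idea can be checked in isolation. A minimal sketch, with `Attachment` and `SavedMeta` as illustrative stand-ins for the real `crate::Attachment` and saved-metadata types (not the PR's actual definitions):

```rust
// Stand-in types; the real crate::Attachment carries more fields.
struct Attachment {
    filename: String,
}

struct SavedMeta {
    filename: String, // deduplicated on disk, e.g. report_2.pdf
}

fn main() {
    let attachments = vec![
        Attachment { filename: "report.pdf".into() },
        Attachment { filename: "report.pdf".into() },
    ];
    // Bytes saved in the same order the attachments were processed.
    let saved: Vec<(SavedMeta, Vec<u8>)> = vec![
        (SavedMeta { filename: "report.pdf".into() }, b"first".to_vec()),
        (SavedMeta { filename: "report_2.pdf".into() }, b"second".to_vec()),
    ];

    // Fragile: find-by-filename returns the first entry for both attachments.
    let by_name: Vec<&[u8]> = attachments
        .iter()
        .filter_map(|a| {
            saved
                .iter()
                .find(|(meta, _)| meta.filename == a.filename)
                .map(|(_, bytes)| bytes.as_slice())
        })
        .collect();
    assert_eq!(by_name, vec![b"first".as_slice(), b"first".as_slice()]);

    // Robust: index alignment pairs each attachment with its own bytes.
    let by_index: Vec<&[u8]> = attachments
        .iter()
        .enumerate()
        .filter_map(|(i, _)| saved.get(i).map(|(_, bytes)| bytes.as_slice()))
        .collect();
    assert_eq!(by_index, vec![b"first".as_slice(), b"second".as_slice()]);
}
```

The first assertion shows the misattribution: both `report.pdf` attachments resolve to the first file's bytes, because dedup renamed the second copy before the lookup ran.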

1600-1615: ⚠️ Potential issue | 🟠 Major

Same filename-matching fragility in single-message path.

This block has the same issue as the batched path: matching by meta.filename will misattribute bytes when multiple attachments share the same original filename.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel.rs` around lines 1600 - 1615, The code is matching saved
bytes by meta.filename which can misattribute when multiple attachments share a
filename; change the find predicate in this single-message path to match by a
stable content identifier (e.g., meta.content_id, meta.blob_id, or a
checksum/hash) that exists on both saved_data entries and Attachment (fall back
to filename only if that identifier is absent). Update the predicate used in the
closure passed to saved_data.iter().find(...) (the tuple pattern currently used)
to compare the unique id/hash fields (and compute/propagate a hash if
necessary), keep the audio check using attachment.mime_type as-is, and leave
channel_attachments::content_from_bytes(bytes, attachment) behavior unchanged
when a proper match is found.
src/tools/attachment_recall.rs (3)

88-98: ⚠️ Potential issue | 🟡 Minor

disk_path is exposed in all responses, leaking internal filesystem layout.

AttachmentInfo.disk_path is populated for all actions including list and get_content, exposing absolute host paths even when the caller only wanted a summary. This makes get_path redundant and reveals internal structure unnecessarily.

💡 Suggested fix: Make disk_path optional
 #[derive(Debug, Clone, Serialize)]
 pub struct AttachmentInfo {
     pub id: String,
     pub filename: String,
     pub saved_filename: String,
     pub mime_type: String,
     pub size_bytes: i64,
-    pub disk_path: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub disk_path: Option<String>,
     pub created_at: String,
 }

Then only populate disk_path in the get_path action.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/attachment_recall.rs` around lines 88 - 98, AttachmentInfo
currently exposes disk_path for all responses; change the type of
AttachmentInfo::disk_path from String to Option<String> and update serialization
to omit None (e.g., via serde's default behavior or skip_serializing_if) so
callers don't see internal paths by default. Then modify all places that
construct AttachmentInfo (e.g., any constructors, list_attachments, get_content,
or AttachmentInfo::new) to leave disk_path as None, and only populate Some(path)
when handling the explicit get_path action (e.g., in the handler/function that
returns the path). Ensure callers that expect disk_path handle Option<String>
accordingly.

261-269: ⚠️ Potential issue | 🟠 Major

Size check trusts DB metadata, not live file size.

attachment.size_bytes comes from the DB row written at save time. If the file is replaced or grows after saving, the live file could exceed the 10 MB limit while the DB value passes the check. The code then reads the entire oversized file into memory.

🔒 Proposed fix: Check live file size
-        let size = attachment.size_bytes as u64;
+        let size = tokio::fs::metadata(&path)
+            .await
+            .map_err(|error| {
+                AttachmentRecallError(format!(
+                    "Failed to stat file '{}': {error}",
+                    attachment.filename
+                ))
+            })?
+            .len();
+
         if size > MAX_CONTENT_SIZE {
-            let size_str = format_size(attachment.size_bytes);
+            let size_str = format_size(size as i64);
             return Err(AttachmentRecallError(format!(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/attachment_recall.rs` around lines 261 - 269, The current size
check uses the DB field attachment.size_bytes which can be stale; before reading
the file into memory in the attachment recall flow (the block that compares
attachment.size_bytes to MAX_CONTENT_SIZE and returns AttachmentRecallError),
stat the live file on disk and use the live size for the enforcement instead of
the DB value (use the attachment's stored file path/get_path result and call a
filesystem metadata stat to get actual size, format with format_size as before
and keep the same error message and suggestion to use get_path/delegate to a
worker). Ensure all branches that read the file use this live-size check to
avoid loading oversized files into memory.
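The live-size check can be demonstrated standalone. A sketch using synchronous `std::fs` for brevity (the PR uses the tokio equivalents); `MAX_CONTENT_SIZE` and the demo path are illustrative:

```rust
// Illustrative cap matching the review's 10 MB figure.
const MAX_CONTENT_SIZE: u64 = 10 * 1024 * 1024;

// Enforce the limit from live filesystem metadata, not a stored DB value,
// so a file that grew after save time is rejected before being read.
fn within_limit(path: &std::path::Path) -> std::io::Result<bool> {
    let size = std::fs::metadata(path)?.len(); // live size on disk
    Ok(size <= MAX_CONTENT_SIZE)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join(format!("size_demo_{}.bin", std::process::id()));
    std::fs::write(&path, vec![0u8; 1024])?;
    assert!(within_limit(&path)?);
    std::fs::remove_file(&path)?;
    Ok(())
}
```

Statting before reading also surfaces a deleted-on-disk file as an I/O error at the check, rather than later at read time.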

354-367: ⚠️ Potential issue | 🟠 Major

LIKE pattern allows SQL wildcards in attachment ID.

The query id LIKE ? || '%' interprets % and _ in the bound value as SQL wildcards. An attachment_id of % matches any attachment in the channel, and an empty string matches everything. A malformed ID silently resolves to the newest row instead of failing with "not found."

🔒 Proposed fix: Use literal prefix comparison
             let row = sqlx::query_as::<_, AttachmentRow>(
                 "SELECT id, original_filename, saved_filename, mime_type, size_bytes, disk_path, created_at \
                  FROM saved_attachments \
-                 WHERE channel_id = ? AND id LIKE ? || '%' \
+                 WHERE channel_id = ? AND substr(id, 1, length(?)) = ? \
                  ORDER BY created_at DESC \
                  LIMIT 1",
             )
             .bind(self.channel_id.as_ref())
             .bind(id)
+            .bind(id)
             .fetch_optional(&self.pool)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/attachment_recall.rs` around lines 354 - 367, The SQL uses id LIKE
? || '%' which treats % and _ as wildcards and lets malformed/empty IDs match;
fix by treating the provided id as a literal prefix: escape any '%' '_' and '\'
characters in the id before binding (replace '\' -> '\\', '%' -> '\%', '_' ->
'\_') and change the query to include an ESCAPE clause (e.g. "WHERE channel_id =
? AND id LIKE ? || '%' ESCAPE '\\'"). Update the binding to use the escaped id
(the same id variable passed to .bind(id) should be the escaped version) and
keep the existing error mapping (AttachmentRow, AttachmentRecallError,
self.channel_id, self.pool) otherwise unchanged.
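The AI-agent prompt above suggests an alternative to the substr() rewrite: escape the wildcards and add an ESCAPE clause. A sketch of that escaping helper; `escape_like` is a hypothetical name, not a function in the PR:

```rust
// Escape SQL LIKE wildcards so a user-supplied prefix matches literally.
// Pair the escaped value with a query like:
//   ... AND id LIKE ? || '%' ESCAPE '\'
fn escape_like(prefix: &str) -> String {
    let mut out = String::with_capacity(prefix.len());
    for ch in prefix.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            out.push('\\'); // prefix each wildcard/escape char with backslash
        }
        out.push(ch);
    }
    out
}

fn main() {
    assert_eq!(escape_like("a1b2"), "a1b2");
    assert_eq!(escape_like("%"), "\\%");
    assert_eq!(escape_like("id_1"), "id\\_1");
    assert_eq!(escape_like("a\\b"), "a\\\\b");
}
```

Note that SQLite's LIKE has no escape character at all unless the query names one via ESCAPE, so the helper only works together with that clause; the substr() form in the proposed fix avoids the issue entirely.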
src/agent/channel_attachments.rs (3)

541-554: ⚠️ Potential issue | 🔴 Critical

Path traversal vulnerability: attachment.filename is not sanitized.

The filename from user-controlled input (attachment.filename) flows directly into deduplicate_filename and then into saved_dir.join(&saved_filename) without stripping directory components. A malicious filename like ../../../etc/passwd or /tmp/evil.txt can escape the saved/ directory.

🔒 Proposed fix: Add sanitization before deduplication
+fn sanitize_filename(filename: &str) -> Option<String> {
+    std::path::Path::new(filename)
+        .file_name()
+        .and_then(|name| name.to_str())
+        .filter(|name| !name.is_empty() && *name != "." && *name != "..")
+        .map(|name| name.to_string())
+}
+
 pub(crate) async fn save_channel_attachments(
     ...
 ) -> Vec<(SavedAttachmentMeta, Vec<u8>)> {
     ...
     for attachment in attachments {
         ...
+        let safe_filename = match sanitize_filename(&attachment.filename) {
+            Some(name) => name,
+            None => {
+                tracing::warn!(filename = %attachment.filename, "invalid attachment filename, skipping");
+                continue;
+            }
+        };
+
-        let saved_filename = match deduplicate_filename(pool, saved_dir, &attachment.filename).await
+        let saved_filename = match deduplicate_filename(pool, saved_dir, &safe_filename).await
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_attachments.rs` around lines 541 - 554, The
attachment.filename is user-controlled and can include path components, so
before calling deduplicate_filename and using saved_dir.join you must sanitize
it: in the code paths around deduplicate_filename, replace or validate
attachment.filename by extracting only the base name (e.g.,
Path::new(&attachment.filename).file_name()) and reject or replace empty/invalid
results (fall back to a generated safe name), strip any leading path separators
or drive letters, and remove/nullify control characters; then pass that
sanitized base name into deduplicate_filename and use the returned
saved_filename with saved_dir.join to ensure files cannot escape the saved
directory (update the code around deduplicate_filename, saved_filename,
attachment.filename, and saved_dir.join accordingly).
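The suggested sanitizer is small enough to exercise standalone. Reproduced here as a sketch so its rejection behavior is visible; note it relies on `Path::file_name`, which on Unix will not split Windows-style `C:\...` paths, a caveat the prompt's drive-letter remark points at:

```rust
use std::path::Path;

// Keep only the final path component; reject empty or dot-only results.
fn sanitize_filename(filename: &str) -> Option<String> {
    Path::new(filename)
        .file_name()
        .and_then(|name| name.to_str())
        .filter(|name| !name.is_empty() && *name != "." && *name != "..")
        .map(|name| name.to_string())
}

fn main() {
    assert_eq!(sanitize_filename("report.pdf"), Some("report.pdf".to_string()));
    // Traversal components are stripped down to the base name.
    assert_eq!(sanitize_filename("../../../etc/passwd"), Some("passwd".to_string()));
    assert_eq!(sanitize_filename("/tmp/evil.txt"), Some("evil.txt".to_string()));
    // Dot-only and empty inputs are rejected rather than saved.
    assert_eq!(sanitize_filename(".."), None);
    assert_eq!(sanitize_filename(""), None);
}
```

Rejected names fall through to the caller, which can skip the attachment or substitute a generated safe name.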

553-582: ⚠️ Potential issue | 🟠 Major

Filename deduplication is racy across concurrent saves.

filename_taken() is only a preflight check. Two concurrent channel tasks can both observe report.pdf as available, then both call tokio::fs::write() and insert rows for the same disk_path, causing one attachment to overwrite the other silently.

🔒 Proposed fix: Use atomic file creation with retry
-        if let Err(error) = tokio::fs::write(&disk_path, &bytes).await {
+        // Use create_new for atomic file creation; retry with new suffix on collision
+        let file = tokio::fs::OpenOptions::new()
+            .write(true)
+            .create_new(true)
+            .open(&disk_path)
+            .await;
+
+        let mut file = match file {
+            Ok(f) => f,
+            Err(error) if error.kind() == std::io::ErrorKind::AlreadyExists => {
+                tracing::debug!(path = %disk_path.display(), "file collision, will retry with new name");
+                continue; // Loop in an outer retry mechanism
+            }
+            Err(error) => {
+                tracing::warn!(%error, path = %disk_path.display(), "failed to create attachment file");
+                continue;
+            }
+        };
+
+        if let Err(error) = tokio::io::AsyncWriteExt::write_all(&mut file, &bytes).await {
             tracing::warn!(
                 %error,
                 path = %disk_path.display(),
                 "failed to write attachment to disk"
             );
             continue;
         }

Consider also adding a UNIQUE constraint on saved_filename in the database and retrying on constraint violation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_attachments.rs` around lines 553 - 582, The current save
flow is racy: filename_taken() is a preflight only, so two concurrent tasks can
both pick the same saved_filename and call tokio::fs::write(&disk_path, ...)
then INSERT into saved_attachments, allowing silent overwrites; fix by
performing atomic file creation and DB-backed uniqueness with retries: attempt
to create the file atomically (e.g., open with create_new semantics) for the
chosen saved_filename and on failure pick a new name and retry, and also add a
UNIQUE constraint on saved_filename in the saved_attachments table and catch
constraint violations from the sqlx::query(...).execute(pool).await to retry
generating a new id/saved_filename (or abort after N attempts); update the code
paths around filename_taken(), tokio::fs::write, disk_path/disk_path_str, and
the INSERT into saved_attachments to implement this retry loop.

633-639: ⚠️ Potential issue | 🟡 Minor

Potential panic if attachment.id is shorter than 8 characters.

&attachment.id[..8] will panic if the ID is shorter than 8 bytes (e.g., corrupted or old metadata). This is handled safely in AttachmentRecallTool on line 203 using 8.min(attachment.id.len()).

🐛 Proposed fix
             format!(
                 "{} ({}, {}, id:{})",
                 attachment.filename,
                 attachment.mime_type,
                 size_str,
-                &attachment.id[..8]
+                attachment.id.get(..8).unwrap_or(&attachment.id)
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/channel_attachments.rs` around lines 633 - 639, The format string
that uses &attachment.id[..8] can panic if attachment.id is shorter than 8
bytes; change the slice to guard with the length (e.g., compute end =
8.min(attachment.id.len()) and use &attachment.id[..end]) and update the format
call that references attachment.filename, attachment.mime_type, size_str, and
attachment.id so it safely truncates the id like AttachmentRecallTool does.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools/attachment_recall.rs`:
- Around line 84-86: The content field is being skipped during serialization
(#[serde(skip)]) so UserContent populated by get_attachment_content() (via
image_base64() or text()) is never sent to the LLM—remove or change the serde
attribute so content is serialized (e.g., replace #[serde(skip)] with
#[serde(skip_serializing_if = "Option::is_none")] or similar) and ensure the
tool result struct that holds summary and content includes content in the JSON
payload; alternatively, if you prefer injecting into the conversation, update
the tool runner code path that serializes the tool result to call a function
that inserts the populated UserContent into the conversation history before
sending to the LLM.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5dfbad3d-29f7-45f8-a4fe-c1e3a5a22d05

📥 Commits

Reviewing files that changed from the base of the PR and between 5a1dd7f and 6c2553c.

📒 Files selected for processing (3)
  • src/agent/channel.rs
  • src/agent/channel_attachments.rs
  • src/tools/attachment_recall.rs

- Sanitize filenames to prevent path traversal (sanitize_filename())
- Replace #[serde(skip)] content field with inline text in summary
- Use atomic file creation (create_new) to prevent dedup races
- Add UNIQUE constraint on saved_filename via new migration
- Match saved bytes by positional index, not filename
- Inherit defaults.channel config in per-agent resolution
- Make disk_path Optional to avoid leaking paths in list responses
- Return recoverable errors as structured Ok output, not Err
- Check live file size from disk metadata, not stale DB value
- Use literal prefix match for ID lookup (no SQL wildcards)
- Fix potential panic on short IDs with .get(..8)
- Rename meta_json/meta variables for clarity
- Add language tags to design doc code blocks
@jamiepine jamiepine merged commit 7e78eee into main Mar 7, 2026
3 of 4 checks passed
