Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
373 changes: 373 additions & 0 deletions docs/design-docs/channel-attachment-persistence.md

Large diffs are not rendered by default.

15 changes: 15 additions & 0 deletions migrations/20260306000001_saved_attachments.sql
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
CREATE TABLE IF NOT EXISTS saved_attachments (
id TEXT PRIMARY KEY,
channel_id TEXT NOT NULL,
message_id TEXT,
original_filename TEXT NOT NULL,
saved_filename TEXT NOT NULL,
mime_type TEXT NOT NULL,
size_bytes INTEGER NOT NULL,
disk_path TEXT NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (channel_id) REFERENCES channels(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_saved_attachments_channel ON saved_attachments(channel_id, created_at);
CREATE INDEX IF NOT EXISTS idx_saved_attachments_message ON saved_attachments(message_id);
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
CREATE UNIQUE INDEX IF NOT EXISTS idx_saved_attachments_saved_filename ON saved_attachments(saved_filename);
8 changes: 8 additions & 0 deletions prompts/en/tools/attachment_recall_description.md.j2
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
Recall saved file attachments from this channel's history. Files sent by users are automatically saved to disk when attachment persistence is enabled.

Three actions:
- **list**: Show recent saved attachments (filename, type, size, ID).
- **get_path**: Get the absolute disk path of a saved file. Use this when delegating work to a worker — include the path in the task description so the worker can access the file directly.
- **get_content**: Re-load a saved image or text file for analysis. Use this when you need to see a file that was sent earlier in the conversation. Only works for images and text files under 10 MB — for other file types or larger files, use get_path and delegate to a worker.

You can identify attachments by their ID (shown in history annotations like `id:a1b2c3d4`) or by their original filename.
Comment on lines +3 to +8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Don’t nudge the model toward multi-megabyte inline recall or ambiguous filename lookup.

This wording makes get_content sound safe for anything under 10 MB and treats original filenames as first-class identifiers. Both are risky: multi-megabyte inline/base64 payloads can swamp context, and repeated uploads of report.pdf are inherently ambiguous. The prompt should steer agents to prefer IDs and reserve get_content for genuinely small previews/snippets.

Suggested wording
 Three actions:
 - **list**: Show recent saved attachments (filename, type, size, ID).
 - **get_path**: Get the absolute disk path of a saved file. Use this when delegating work to a worker — include the path in the task description so the worker can access the file directly.
-- **get_content**: Re-load a saved image or text file for analysis. Use this when you need to see a file that was sent earlier in the conversation. Only works for images and text files under 10 MB — for other file types or larger files, use get_path and delegate to a worker.
+- **get_content**: Re-load a small saved image or text file for analysis. Use this for small previews/snippets only. For larger files, repeated filenames, or non-text/binary formats, use **get_path** and delegate to a worker.
 
-You can identify attachments by their ID (shown in history annotations like `id:a1b2c3d4`) or by their original filename.
+Prefer attachment IDs (shown in history annotations like `id:a1b2c3d4`). Original filenames may be ambiguous if the same filename was uploaded more than once.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@prompts/en/tools/attachment_recall_description.md.j2` around lines 3 - 8,
Revise the attachment actions text to discourage inline/base64 recall and
ambiguous filename lookup: update the description of get_content to explicitly
limit it to very small previews (e.g., "tiny text/images only, ≤100 KB") and
warn that larger files or non-image/non-text must use get_path; emphasize that
agents should prefer attachment IDs (e.g., id:a1b2c3d4) over original filenames
because filenames like "report.pdf" are ambiguous; keep list and get_path
descriptions but add a short note to include the absolute path in worker task
descriptions when using get_path and to avoid embedding multi-megabyte content
inline.

2 changes: 1 addition & 1 deletion scripts/gate-pr.sh
Original file line number Diff line number Diff line change
Expand Up @@ -177,7 +177,7 @@ run_step "cargo check --all-targets" cargo check --all-targets
if $fast_mode; then
log "fast mode enabled: skipping clippy and integration test compile"
else
run_step "cargo clippy --all-targets" cargo clippy --all-targets
run_step "RUSTFLAGS=\"-Dwarnings\" cargo clippy --all-targets" env RUSTFLAGS="-Dwarnings" cargo clippy --all-targets
fi

run_step "cargo test --lib" cargo test --lib
Expand Down
158 changes: 147 additions & 11 deletions src/agent/channel.rs
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
//! Channel: User-facing conversation process.

use crate::agent::channel_attachments;
use crate::agent::channel_attachments::download_attachments;
use crate::agent::channel_dispatch::spawn_memory_persistence_branch;
use crate::agent::channel_history::{
Expand Down Expand Up @@ -562,7 +563,12 @@ impl Channel {
persisted
}

fn persist_inbound_user_message(&self, message: &InboundMessage, raw_text: &str) {
fn persist_inbound_user_message(
&self,
message: &InboundMessage,
raw_text: &str,
saved_attachments: Option<&[channel_attachments::SavedAttachmentMeta]>,
) {
if message.source == "system" {
return;
}
Expand All @@ -571,16 +577,28 @@ impl Channel {
.get("sender_display_name")
.and_then(|v| v.as_str())
.unwrap_or(&message.sender_id);

// If attachments were saved, enrich the metadata with their info
let metadata = if let Some(saved) = saved_attachments {
let mut enriched = message.metadata.clone();
if let Ok(attachments_json) = serde_json::to_value(saved) {
enriched.insert("attachments".to_string(), attachments_json);
}
enriched
} else {
message.metadata.clone()
};

self.state.conversation_logger.log_user_message(
&self.state.channel_id,
sender_name,
&message.sender_id,
raw_text,
&message.metadata,
&metadata,
);
self.state
.channel_store
.upsert(&message.conversation_id, &message.metadata);
.upsert(&message.conversation_id, &metadata);
}

fn suppress_plaintext_fallback(&self) -> bool {
Expand Down Expand Up @@ -1120,7 +1138,20 @@ impl Channel {
}

// Persist each message to conversation log (individual audit trail)
let mut pending_batch_entries: Vec<(String, Vec<_>)> = Vec::new();
let save_attachments_enabled = self
.deps
.runtime_config
.channel_config
.load()
.save_attachments;
let saved_dir = self.deps.runtime_config.saved_dir();

// Entries: (formatted_text, attachments, optional saved bytes per attachment)
let mut pending_batch_entries: Vec<(
String,
Vec<crate::Attachment>,
Option<Vec<channel_attachments::SavedAttachmentWithBytes>>,
)> = Vec::new();
let mut conversation_id = String::new();
let temporal_context = TemporalContext::from_runtime(self.deps.runtime_config.as_ref());
let mut batch_has_invoke = false;
Expand Down Expand Up @@ -1151,16 +1182,44 @@ impl Channel {
invoked_by_command || invoked_by_mention || invoked_by_reply;
}

// Save attachments to disk when enabled
let saved_data = if save_attachments_enabled && !attachments.is_empty() {
Some(
channel_attachments::save_channel_attachments(
&self.deps.sqlite_pool,
self.deps.llm_manager.http_client(),
self.state.channel_id.as_ref(),
&saved_dir,
&attachments,
)
.await,
)
} else {
None
};

// Enrich metadata with saved attachment info
let metadata = if let Some(ref data) = saved_data {
let metas: Vec<_> = data.iter().map(|(meta, _)| meta.clone()).collect();
let mut enriched = message.metadata.clone();
if let Ok(json) = serde_json::to_value(&metas) {
enriched.insert("attachments".to_string(), json);
}
enriched
} else {
message.metadata.clone()
};

self.state.conversation_logger.log_user_message(
&self.state.channel_id,
sender_name,
&message.sender_id,
&raw_text,
&message.metadata,
&metadata,
);
self.state
.channel_store
.upsert(&message.conversation_id, &message.metadata);
.upsert(&message.conversation_id, &metadata);

conversation_id = message.conversation_id.clone();

Expand All @@ -1187,7 +1246,7 @@ impl Channel {
&raw_text,
);

pending_batch_entries.push((formatted_text, attachments));
pending_batch_entries.push((formatted_text, attachments, saved_data));
}
}

Expand All @@ -1204,9 +1263,31 @@ impl Channel {
}

let mut user_contents: Vec<UserContent> = Vec::new();
for (formatted_text, attachments) in pending_batch_entries {
for (formatted_text, attachments, saved_data) in pending_batch_entries {
if !attachments.is_empty() {
let attachment_content = download_attachments(&self.deps, &attachments).await;
let attachment_content = if let Some(ref saved) = saved_data {
let mut content = Vec::new();
let mut unsaved = Vec::new();
for (index, attachment) in attachments.iter().enumerate() {
if let Some((_, bytes)) = saved.get(index) {
if attachment.mime_type.starts_with("audio/") {
unsaved.push(attachment.clone());
} else {
content.push(channel_attachments::content_from_bytes(
bytes, attachment,
));
}
} else {
unsaved.push(attachment.clone());
}
}
if !unsaved.is_empty() {
content.extend(download_attachments(&self.deps, &unsaved).await);
}
content
} else {
download_attachments(&self.deps, &attachments).await
};
for content in attachment_content {
user_contents.push(content);
}
Expand Down Expand Up @@ -1362,7 +1443,34 @@ impl Channel {
crate::MessageContent::Interaction { .. } => (message.content.to_string(), Vec::new()),
};

self.persist_inbound_user_message(&message, &raw_text);
// Save attachments to disk when enabled, capturing bytes for LLM reuse
let save_attachments_enabled = self
.deps
.runtime_config
.channel_config
.load()
.save_attachments;
let saved_attachment_data = if save_attachments_enabled && !attachments.is_empty() {
let saved_dir = self.deps.runtime_config.saved_dir();
Some(
channel_attachments::save_channel_attachments(
&self.deps.sqlite_pool,
self.deps.llm_manager.http_client(),
self.state.channel_id.as_ref(),
&saved_dir,
&attachments,
)
.await,
)
} else {
None
};

let saved_metas: Option<Vec<_>> = saved_attachment_data
.as_ref()
.map(|data| data.iter().map(|(meta, _)| meta.clone()).collect());

self.persist_inbound_user_message(&message, &raw_text, saved_metas.as_deref());

// Deterministic built-in command: bypass model output drift for agent identity checks.
if message.source != "system" && raw_text.trim() == "/agent-id" {
Expand Down Expand Up @@ -1480,7 +1588,35 @@ impl Channel {

let is_retrigger = message.source == "system";
let attachment_content = if !attachments.is_empty() {
download_attachments(&self.deps, &attachments).await
if let Some(ref saved_data) = saved_attachment_data {
// Reuse already-downloaded bytes for images/text; audio still
// needs transcription via the normal path so we fall through.
let mut content = Vec::new();
let mut unsaved_attachments = Vec::new();

for (index, attachment) in attachments.iter().enumerate() {
if let Some((_, bytes)) = saved_data.get(index) {
// Audio attachments need transcription, not just bytes
if attachment.mime_type.starts_with("audio/") {
unsaved_attachments.push(attachment.clone());
} else {
content
.push(channel_attachments::content_from_bytes(bytes, attachment));
}
} else {
unsaved_attachments.push(attachment.clone());
}
}

// Process any attachments that weren't saved (or need transcription)
if !unsaved_attachments.is_empty() {
let extra = download_attachments(&self.deps, &unsaved_attachments).await;
content.extend(extra);
}
content
} else {
download_attachments(&self.deps, &attachments).await
}
} else {
Vec::new()
};
Expand Down
Loading