Store subagent transcripts as separate prompt records #562

Open
jwiegley wants to merge 9 commits into main from johnw/fix-371

@jwiegley
Collaborator

@jwiegley jwiegley commented Feb 19, 2026

Summary

  • Parse Claude Code subagent transcripts from <session>/subagents/agent-<id>.jsonl into separate SubagentInfo structs instead of merging them into the parent transcript
  • Add parent_id field to PromptRecord and PromptDbRecord to link subagent prompt records back to their parent
  • Expand subagent metadata into separate prompt records at post-commit time and in virtual attributions, preserving thread hierarchy
  • Make the v3→v4 DB migration idempotent: apply_migration() now catches "duplicate column name" errors from ALTER TABLE ADD COLUMN so a crash between the DDL and the version-bump doesn't brick the database on restart

Closes #371

Design

Subagent transcripts stored in Claude Code's subagents/ directory are now collected as SubagentInfo structs during JSONL parsing rather than being flattened into the parent transcript. These are propagated through the checkpoint pipeline via PromptUpdateResult::Updated and stored as JSON metadata (__subagents key) on the parent checkpoint.
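
As a rough illustration of the collection step described above, here is a minimal sketch that walks `<session>/subagents/` and gathers one entry per `agent-<id>.jsonl` file. The `SubagentInfo` fields, the `collect_subagents` and `agent_id_from_file_name` names, and the sort-by-path ordering are assumptions made for this example; the PR's actual struct and parser are not shown in this description.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the PR's `SubagentInfo`; the real fields are not
/// listed in the PR description.
#[derive(Debug, Clone, PartialEq)]
pub struct SubagentInfo {
    /// The `<id>` portion of `agent-<id>.jsonl`.
    pub agent_id: String,
    /// Raw JSONL content, left unparsed in this sketch.
    pub raw_jsonl: String,
}

/// Extracts `<id>` from a file name of the form `agent-<id>.jsonl`.
pub fn agent_id_from_file_name(name: &str) -> Option<&str> {
    let id = name.strip_prefix("agent-")?.strip_suffix(".jsonl")?;
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Collects subagent transcripts from `<session_dir>/subagents/`, sorted by
/// path so the result is deterministic. A missing directory yields an empty
/// vector rather than an error.
pub fn collect_subagents(session_dir: &Path) -> io::Result<Vec<SubagentInfo>> {
    let dir = session_dir.join("subagents");
    if !dir.is_dir() {
        return Ok(Vec::new());
    }

    let mut paths: Vec<PathBuf> = fs::read_dir(&dir)?
        .filter_map(|entry| entry.ok().map(|e| e.path()))
        .collect();
    paths.sort();

    let mut subagents = Vec::new();
    for path in paths {
        // Skip anything that is not named `agent-<id>.jsonl`.
        let Some(name) = path.file_name().and_then(|n| n.to_str()) else {
            continue;
        };
        let Some(id) = agent_id_from_file_name(name) else {
            continue;
        };
        subagents.push(SubagentInfo {
            agent_id: id.to_string(),
            raw_jsonl: fs::read_to_string(&path)?,
        });
    }
    Ok(subagents)
}
```

Keeping the subagents in their own vector, instead of appending their messages to the parent transcript, is what lets later stages emit them as separate records.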

At post-commit time (and in virtual attributions), the subagent metadata is expanded into separate PromptDbRecord entries, each with a unique hash ID and a parent_id linking back to the parent prompt. This preserves the natural thread structure of agentic coding sessions.
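
The expansion step could look roughly like the sketch below. It is illustrative only: the `PromptDbRecord` here is trimmed to three fields, `subagent_record_id` and `expand_subagents` are hypothetical names, and the ID scheme (hashing the parent ID together with the subagent ID) is an assumption, since the description only says each record gets a unique hash ID.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Simplified stand-in for `PromptDbRecord`; only `parent_id` is named in the
/// PR description, the other fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct PromptDbRecord {
    pub id: String,
    pub parent_id: Option<String>,
    pub transcript: String,
}

/// Hypothetical ID scheme: hash the parent ID together with the subagent ID.
/// `DefaultHasher` is used only to keep the sketch dependency-free; its output
/// is not guaranteed stable across Rust releases, so persisted IDs would need
/// a stable hash instead.
pub fn subagent_record_id(parent_id: &str, agent_id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    parent_id.hash(&mut hasher);
    agent_id.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Expands `(agent_id, transcript)` pairs into child records that point back
/// at `parent` through `parent_id`.
pub fn expand_subagents(
    parent: &PromptDbRecord,
    subagents: &[(String, String)],
) -> Vec<PromptDbRecord> {
    subagents
        .iter()
        .map(|(agent_id, transcript)| PromptDbRecord {
            id: subagent_record_id(&parent.id, agent_id),
            parent_id: Some(parent.id.clone()),
            transcript: transcript.clone(),
        })
        .collect()
}
```

Because the parent record itself keeps `parent_id: None`, a reader can rebuild the thread tree by grouping records on `parent_id`.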

The DB schema is migrated from version 3 to 4 to add the parent_id TEXT column to the prompts table. The migration is idempotent: if the column already exists (e.g. from a previous partial migration), the "duplicate column name" error is caught and the version bump still runs.
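
A minimal sketch of that idempotency pattern follows. It does not reproduce the real `apply_migration()`: the signature is invented for the example, and the two closures are stand-ins for the actual database calls so the snippet stays dependency-free. The only detail taken from SQLite itself is the error text, which reads `duplicate column name: <column>`.

```rust
/// Returns true when an SQLite error message reports that the column being
/// added already exists.
pub fn is_duplicate_column_error(message: &str) -> bool {
    message.to_ascii_lowercase().contains("duplicate column name")
}

/// Runs one migration statement, then records the new schema version.
///
/// `execute` and `set_version` are hypothetical stand-ins for the database
/// calls. A "duplicate column name" failure is treated as "already applied",
/// so a crash between the DDL and the version bump can be retried safely.
pub fn apply_migration<E, V>(
    sql: &str,
    new_version: u32,
    mut execute: E,
    mut set_version: V,
) -> Result<(), String>
where
    E: FnMut(&str) -> Result<(), String>,
    V: FnMut(u32) -> Result<(), String>,
{
    match execute(sql) {
        Ok(()) => {}
        // The column is already there: fall through to the version bump.
        Err(msg) if is_duplicate_column_error(&msg) => {}
        // Any other failure is a real error and must not bump the version.
        Err(msg) => return Err(msg),
    }
    set_version(new_version)
}
```

Matching on message text is fragile if the wording ever changes, which is why the review checklist calls it out; an alternative would be to check for the column up front (for example with `PRAGMA table_info`) before issuing the `ALTER TABLE`.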

Updates since last revision

  • Merged latest main into the branch and resolved merge conflicts in tests/claude_code.rs (added new subagent test names to the reuse_tests_in_worktree! macro)
  • Made apply_migration() idempotent by catching "duplicate column name" SQLite errors instead of crashing
  • Added test_initialize_schema_handles_preexisting_parent_id_column to verify the idempotency path
  • Fixed missing parent_id: None in tests/worktrees.rs PromptRecord initializer (compilation error)
  • Fixed formatting (extra blank line in tests/claude_code.rs)

Test plan

  • cargo clippy — no warnings
  • cargo test --test claude_code — all 24 tests pass
  • cargo test --test agent_presets_comprehensive — all 58 tests pass
  • New test: test_parse_claude_code_jsonl_with_subagents verifies subagents are returned separately
  • Updated test: test_parse_claude_code_jsonl_without_subagents_dir verifies empty vec for no subagents
  • New test: test_initialize_schema_handles_preexisting_parent_id_column verifies idempotent migration
  • New test fixtures: claude-code-with-subagents.jsonl and subagents/agent-test-sub-1.jsonl
  • CI: Format ✅, Lint (all platforms) ✅, Ubuntu tests (hooks/wrapper/both) ✅

Review & Testing Checklist for Human

There are 4 items to verify given the scope of DB schema + query changes:

  • Idempotency approach: Verify that catching "duplicate column name" via string matching is acceptable for SQLite migrations (this is a common pattern but relies on error message text)
  • SQL query coverage: Spot-check that all SELECT, INSERT, and UPDATE queries in internal_db.rs include the new parent_id column in the correct position (column order matters for positional binding)
  • Test coverage: Confirm all PromptRecord initializers in test files include parent_id: None (compilation would fail if any were missed, but worth double-checking)
  • Subagent parsing correctness: Review test_parse_claude_code_jsonl_with_subagents to ensure subagent messages are correctly separated from the parent transcript and not duplicated
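
On the positional-binding item, one way to make column order harder to get wrong is to derive both the column list and the placeholders from a single source. The sketch below is only a suggestion and not what internal_db.rs does: `PROMPT_COLUMNS`, `insert_sql`, and `select_sql` are hypothetical, and apart from parent_id the column names are made up.

```rust
/// Hypothetical column list; only `parent_id` is confirmed by the PR.
pub const PROMPT_COLUMNS: [&str; 3] = ["id", "transcript", "parent_id"];

/// Builds an INSERT whose numbered placeholders (`?1`, `?2`, ...) line up
/// with `columns` by construction.
pub fn insert_sql(table: &str, columns: &[&str]) -> String {
    let placeholders: Vec<String> = (1..=columns.len()).map(|i| format!("?{i}")).collect();
    format!(
        "INSERT INTO {table} ({}) VALUES ({})",
        columns.join(", "),
        placeholders.join(", ")
    )
}

/// Builds a SELECT that returns the columns in the same order, so row index
/// `n` always corresponds to `columns[n]`.
pub fn select_sql(table: &str, columns: &[&str]) -> String {
    format!("SELECT {} FROM {table}", columns.join(", "))
}
```

With this shape, adding a column such as parent_id means touching one array, and the INSERT and SELECT statements cannot drift apart.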

Notes

  • The parent_id field is Option<String> with #[serde(default)], so existing serialized data without this field will deserialize correctly
  • The migration is safe to run multiple times — if the column already exists, the error is caught and the version is still updated
  • All Ubuntu CI checks are passing; macOS and Windows checks are still pending but not blocking per user request

🤖 Generated with Devin
Requested by: @svarlamov

@git-ai-cloud-dev

git-ai-cloud-dev bot commented Feb 19, 2026

Stats powered by Git AI

🧠 you    ██████████████████░░  91%
🤖 ai     ░░░░░░░░░░░░░░░░░░██  9%
More stats
  • 0.5 lines generated for every 1 accepted
  • 1 minute waiting for AI
  • Top model: claude::claude-opus-4-6 (288 accepted lines, 5 generated lines)

AI code tracked with git-ai

@git-ai-cloud

git-ai-cloud bot commented Feb 19, 2026

Stats powered by Git AI

🧠 you    ██████████████████░░  91%
🤖 ai     ░░░░░░░░░░░░░░░░░░██  9%
More stats
  • 0.5 lines generated for every 1 accepted
  • 1 minute waiting for AI
  • Top model: claude::claude-opus-4-6 (288 accepted lines, 5 generated lines)

AI code tracked with git-ai

@jwiegley jwiegley marked this pull request as ready for review February 19, 2026 03:27
devin-ai-integration[bot]

This comment was marked as resolved.

Base automatically changed from johnw/fix-370 to main February 19, 2026 04:14
@svarlamov
Member

svarlamov commented Feb 19, 2026

Append subagent messages to the main transcript in sorted filename order for deterministic results

I'm not a huge fan of this Claude... May have mentioned it elsewhere, but can we add them as normal prompts into the authorship log but just include a parentId that will point to the parent prompt?

devin-ai-integration[bot]

This comment was marked as resolved.

@jwiegley jwiegley changed the title Include subagent transcripts in Claude Code JSONL parsing Store subagent transcripts as separate prompt records Feb 19, 2026
@jwiegley jwiegley requested a review from svarlamov February 20, 2026 03:04
jwiegley and others added 5 commits February 26, 2026 17:55
Claude Code stores subagent (Task tool) transcripts in separate JSONL
files at <session-uuid>/subagents/agent-<id>.jsonl, but the transcript
parser only read the main session file. This meant all subagent
conversation content was silently dropped from git-ai authorship records.

Extract the JSONL line parsing into a reusable parse_claude_jsonl_content
helper, then after parsing the main transcript, discover and parse any
subagent JSONL files from the sibling subagents directory. Subagent
messages are appended to the main transcript in sorted filename order
for deterministic results.

Fixes #371

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Adds an optional parent_id field to both PromptRecord (git notes) and
PromptDbRecord (SQLite). This links subagent prompt records back to
their parent prompt, enabling hierarchical transcript storage.

Includes DB migration 3→4 (ALTER TABLE prompts ADD COLUMN parent_id)
and updates all construction sites with parent_id: None.

Refs: #371

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Instead of merging Claude Code subagent messages into the parent
transcript, each subagent now produces a separate PromptRecord with
parent_id linking it to the parent prompt.

- Add SubagentInfo struct; parser returns subagents separately
- Propagate subagents through PromptUpdateResult pipeline
- Serialize subagent info into checkpoint agent_metadata
- Expand into separate PromptDbRecords at post-commit DB upsert
- Expand into separate PromptRecords in VirtualAttributions

Fixes: #371

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update all test files for the new 3-tuple return type from
transcript_and_model_from_claude_code_jsonl and the parent_id
field on PromptRecord/PromptDbRecord.

The subagent test now verifies that subagents are returned
separately (not merged) and that the main transcript contains
only main-session messages.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Run cargo fmt on all changed files. Fix
test_claude_transcript_parsing_malformed_json which incorrectly
expected Err — the parser skips unparseable lines by design,
returning Ok with an empty transcript.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
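
The skip-on-error behaviour described in the last commit message above can be sketched as follows. `parse_lenient` is a hypothetical helper, not the parser in this PR, and the per-line decoder is passed in as a closure so the example needs no JSON crate.

```rust
/// Parses `content` line by line, keeping every line that `parse_line`
/// accepts and silently dropping blank or unparseable ones. Fully malformed
/// input therefore yields an empty vector instead of an error.
pub fn parse_lenient<T, E>(content: &str, parse_line: impl Fn(&str) -> Result<T, E>) -> Vec<T> {
    content
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .filter_map(|line| parse_line(line).ok())
        .collect()
}
```

Calling it with an integer parser in place of a JSON decoder, `parse_lenient("1\nnot-a-number\n3\n", |l| l.parse::<i32>())` keeps the two valid lines and drops the middle one, which is the same shape of result the fixed test now expects.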
devin-ai-integration[bot]

This comment was marked as resolved.

jwiegley and others added 2 commits February 26, 2026 18:31
Update test_initialize_schema to expect version "4" instead of "3"
since SCHEMA_VERSION was bumped when adding the parent_id column.

Fix test_initialize_schema_handles_preexisting_cas_cache_table to
include the prompts and cas_sync_queue tables in its setup, since
the test simulates being at schema version 2 (meaning migrations
0->1 and 1->2 have already run). Without these tables, migration
3->4 (ALTER TABLE prompts ADD COLUMN parent_id) fails.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ jwiegley
❌ devin-ai-integration[bot]

devin-ai-integration bot and others added 2 commits February 28, 2026 02:08
Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
Development

Successfully merging this pull request may close these issues.

Claude Code Subagent transcripts not saved in the transcript

3 participants