feat: add Pi Coding Agent rollout seed source#514
Conversation
Add support for ingesting Pi Coding Agent session artifacts as an agent rollout seed source. Pi sessions are tree-structured JSONL files; the handler resolves the active conversation path by walking from the last entry back to the root via id/parentId links. Key points: - Tree-structured sessions with automatic active-path resolution - Entry-level types: model_change, compaction, branch_summary, custom_message, thinking_level_change - Message roles: user, assistant (inline ToolCall/ThinkingContent/ TextContent blocks), toolResult, bashExecution (synthesized as tool-call pairs), custom, compactionSummary, branchSummary - Extract shared normalize_message_content to utils.py (was duplicated in Hermes handler)
|
Docs preview: https://b77fb86c.dd-docs-preview.pages.dev
|
PR Review: #514 — feat: add Pi Coding Agent rollout seed sourceSummaryThis PR adds support for ingesting Pi Coding Agent session artifacts as an agent rollout seed source. Pi sessions are tree-structured JSONL files; the handler resolves the active conversation path by walking from the last entry back to the root via Scope: 7 files changed (+1345 / -42), 2 new files (handler + tests), 5 modified files (config, registry, utils, hermes handler, docs). CI Status: Lint, license headers, all E2E tests, and most unit test suites pass. A few CI jobs are still pending at review time (coverage check, some test matrix entries). FindingsPositive Observations
Potential Issues
Nits (Non-blocking)
VerdictApprove. This is a clean, well-structured addition that follows established patterns in the codebase. The handler correctly normalizes Pi's tree-structured sessions into the shared rollout schema. The refactoring of The few observations noted above (timestamp scope, branch detection edge cases) are informational and don't require changes. |
Greptile SummaryThis PR adds a The implementation is clean and correctly follows the established handler pattern. All message types, entry-level types, branch detection, and edge cases (empty files, excluded bash executions, forked sessions) are covered by 17 targeted unit tests.
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/pi_coding_agent.py | New handler: active-path resolution via parentId walk, full message-role normalization, bashExecution synthetic tool-call pairs, branch detection — logic is correct and well-structured. |
| packages/data-designer-engine/tests/engine/resources/agent_rollout/test_pi_coding_agent.py | 17 tests covering happy path, parallel tool calls, branch resolution, excluded bash executions, all entry/message types, and error cases — comprehensive coverage. |
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/utils.py | Added shared normalize_message_content utility extracted from Hermes — identical logic, no behavioral change. |
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/hermes_agent.py | Replaced local _normalize_message_content with the shared utility from utils.py — pure refactor, identical behaviour. |
| packages/data-designer-config/src/data_designer/config/seed_source.py | Added PI_CODING_AGENT enum value, default path function, format defaults tuple, and updated field descriptions — straightforward additions consistent with existing pattern. |
| packages/data-designer-engine/src/data_designer/engine/resources/agent_rollout/registry.py | Registered PiCodingAgentRolloutFormatHandler alongside existing handlers — minimal, correct change. |
| docs/concepts/agent-rollout-ingestion.md | Added Pi Coding Agent tab, expanded the normalized-field table with a new column, and updated the Notes section — accurate and consistent with the implementation. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[JSONL file] --> B[load_jsonl_rows]
B --> C{First row type == 'session'?}
C -- No --> ERR[AgentRolloutSeedParseError]
C -- Yes --> D[Extract session header\nsession_id, cwd, version, timestamp]
D --> E[entries = rows 1..end]
E --> F[_resolve_active_path\nwalk entries-last back to root via parentId]
F --> G[active_entries in chronological order]
G --> H{entry_type?}
H -- model_change --> I[Track models_used]
H -- compaction / branch_summary --> J[Emit system message with summary]
H -- custom_message display=true --> K[Emit system message]
H -- message --> L{role?}
L -- user --> M[build_message role=user]
L -- assistant --> N[_normalize_pi_assistant_message\ntext + thinking + toolCalls]
L -- toolResult --> O[build_message role=tool]
L -- bashExecution --> P[_normalize_pi_bash_execution\nassistant tool-call + tool-result pair]
L -- custom display=true --> Q[build_message role=system]
L -- compactionSummary / branchSummary --> R[build_message role=system with summary]
I & J & K & M & N & O & P & Q & R --> S[messages list]
E --> T[_detect_branches\nparentId seen twice?]
S & T & D --> U[NormalizedAgentRolloutRecord]
Reviews (2): Last reviewed commit: "Merge branch 'main' into johnny/feature/..." | Re-trigger Greptile
eric-tramel
left a comment
There was a problem hiding this comment.
Nothing stood out for me, thanks @johnnygreco !
📋 Summary
Add support for ingesting Pi Coding Agent session artifacts as an agent rollout seed source. Pi sessions are tree-structured JSONL files stored at
~/.pi/agent/sessions/; the handler resolves the active conversation path by walking from the last entry back to the root viaid/parentIdlinks, then normalizes all message and entry types into the shared rollout schema.🔗 Related Issue
Closes #513
🔄 Changes
✨ Added
pi_coding_agent.py— Format handler with tree-structured session parsing, active-path resolution, and normalization of all Pi message roles (user,assistant,toolResult,bashExecution,custom,compactionSummary,branchSummary) and entry-level types (model_change,compaction,branch_summary,custom_message,thinking_level_change)test_pi_coding_agent.py— 17 tests covering realistic session structure, parallel tool calls, branch resolution, bashExecution normalization, entry-level types, error cases, and edge casesPI_CODING_AGENTenum value, default path (~/.pi/agent/sessions), and format defaults inseed_source.pynormalize_message_contentshared utility inutils.pyagent-rollout-ingestion.mddocs🔧 Changed
hermes_agent.py— Replaced local_normalize_message_contentwith sharednormalize_message_contentfromutils.py(identical logic was duplicated)registry.py— RegisteredPiCodingAgentRolloutFormatHandler🧪 Testing
make check-all-fixpasses (lint + format)✅ Checklist
make update-license-headers