feat: CLI-library feature parity (Phase 1 NDJSON + Phase 2) #101
clintecker merged 20 commits into main
Conversation
Design for promoting the diagnose, audit, doctor, and simulate CLI commands into the top-level tracker package as structured reports, plus lifting the private NDJSON event writer to a public API. Closes the Phase 1 NDJSON gap and the full Phase 2 set from #76. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven-task plan covering shared activity helpers, NDJSON writer promotion, Simulate / Diagnose / Audit / Doctor library APIs, and docs. TDD steps with fixture run directories; byte-for-byte CLI output preserved via smoke-diff at each task boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move ResolveRunDir, MostRecentRunID, ActivityEntry, LoadActivityLog, ParseActivityLine into the tracker package. Prerequisite for exposing Diagnose / Audit library APIs. No behavior change; CLI wired to the library helpers. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Log warning instead of silent skip on non-ENOENT checkpoint load errors in MostRecentRunID (CLAUDE.md: never silently swallow errors). - Remove misleading JSON tags from ActivityEntry — the struct is populated via ParseActivityLine, not json.Unmarshal (two timestamp formats can't be handled by time.Time's default unmarshal). - Drop gratuitous type-alias comment in cmd/tracker/audit.go. - Add direct tests for SortActivityByTime ordering and equal timestamps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NewNDJSONWriter exposes the same --json wire format the CLI uses via PipelineHandler / AgentHandler / TraceObserver. Closes the Phase 1 NDJSON gap from #76. CLI json_stream.go removed; run.go wires the library writer. Wire format unchanged byte-for-byte. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Log marshal errors to stderr (CLAUDE.md: never silently swallow). - Log first w.Write error to stderr via sync.Once; further write errors suppressed to avoid flooding. - Add test covering AgentHandler content priority (ToolOutput wins over Text when both are populated). - Extract duplicated timestamp layout to a package-level constant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Simulate(source) returns a SimulateReport with nodes, edges, execution plan, and unreachable node list. CLI simulate command becomes a printer over the report; output byte-identical. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- SimulateReport.Unreachable is now sorted alphabetically; eliminates non-deterministic output driven by Go map iteration order. - SimulateReport gains GraphAttrs; CLI reads from the report instead of re-parsing the pipeline just for graph-level attrs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Diagnose(runDir) and tracker.DiagnoseMostRecent(workdir) return a structured DiagnoseReport with failures, budget halt, and typed suggestions. CLI diagnose command shrunk to a printer over the report; output byte-identical. Fixture runs added under testdata/runs/ (ok, failed, budget_halted) for library unit tests. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- CLI: skip SuggestionBudget in the suggestions printer; printBudgetHalt already renders the budget guidance. Eliminates duplicated prose on budget-halted runs. - Library: log non-ENOENT status.json parse errors to stderr instead of swallowing silently (CLAUDE.md: never silently swallow errors). - Clarify NodeFailure.RetryCount doc comment — it counts failures, not retries beyond the first attempt. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Audit(runDir) returns a structured AuditReport (timeline, retries, errors, recommendations). tracker.ListRuns(workdir) returns []RunSummary sorted newest first. CLI audit command shrunk to a printer over the reports; output byte-identical. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Log non-ENOENT activity-log load errors in the run summary builder instead of silently discarding (CLAUDE.md: never silently swallow). - Remove redundant sort.Strings from printRecommendations; library already sorts the recommendations slice. - ListRuns now uses sort.SliceStable for deterministic ordering on timestamp ties. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Doctor(cfg) runs preflight checks and returns a DoctorReport. ProbeProviders defaults to false (no network calls); CLI sets it true to preserve today's behavior. Write side effects (gitignore fixup, workdir create) stay CLI-only. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- CLI --probe flag default flipped to true, preserving pre-refactor tracker doctor behavior (live API auth check runs by default). - TestDoctor_NoProviderKeys now unsets all five known provider env vars to avoid false negatives when the dev environment has GOOGLE_API_KEY or OPENAI_COMPAT_API_KEY set. - Artifact-dirs check now promotes Status to "error" when any detail is "error" (e.g., unwritable .ai/ directory); previously those cases were silently demoted to "warn" and the run proceeded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
> **Note:** Reviews paused. This branch is under active development; to avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. This behavior is configurable in the review settings.
Walkthrough

Library-first refactor: promotes CLI logic into a public `tracker` package.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant CLI as User/CLI
    participant Pipeline as Pipeline
    participant LLM as LLM
    participant Agent as Agent
    participant Writer as tracker.NDJSONWriter
    participant Stdout as Stdout
    CLI->>Pipeline: start run / invoke tracker.Run
    Pipeline-->>Writer: emit pipeline events (stage_started/completed/failed)
    alt LLM traces generated
        LLM-->>Writer: emit trace events (provider, model, preview)
    end
    alt Agent activities occur
        Agent-->>Writer: emit agent events (tool output or text)
    end
    Writer->>Stdout: marshal NDJSONEvent -> write newline (thread-safe)
    Stdout-->>CLI: stream of NDJSON lines (--json)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed, ❌ 1 failed (1 warning)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42579425b4
```go
	source = string(data)
}

report, err := tracker.Simulate(source)
```
Preserve requested pipeline format in simulate report parsing
runSimulateCmd already parsed the pipeline with loadPipeline(resolved, formatOverride), but this second parse calls tracker.Simulate(source) with no format context. That drops --format/extension-based resolution and can misclassify valid DOT input (for example DOT files with a leading comment/BOM), causing tracker simulate to fail after the first parse succeeded. This is a regression from the previous single-parse flow and breaks valid simulate inputs.
```go
if d.Status == "error" {
	out.Status = "error"
	break
```
Keep artifact-directory problems as doctor warnings
This promotion turns artifact-directory permission problems into Status="error", which now counts as a hard failure in cmd/tracker/doctor.go (toDoctorResult maps report errors directly to failures). Previously this check was non-required and surfaced as a warning, so tracker doctor could exit with warning-only status; after this change the same .ai issues now return exit code 1 and can fail CI/scripts unexpectedly.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cmd/tracker/diagnose.go (1)
11-192: ⚠️ Potential issue | 🟡 Minor

**Run `gofmt` on this file before merge.** CI is currently red on `cmd/tracker/diagnose.go`, so this change set does not pass the repository's format gate yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/diagnose.go` around lines 11 - 192, The file cmd/tracker/diagnose.go is not gofmt'ed; run a formatter to fix whitespace/formatting so CI passes. Fix by running `gofmt -w` (or `gofmt -w cmd/tracker/diagnose.go` / `goimports -w`) on the file containing functions like diagnoseMostRecent, printDiagnoseHeader, printNodeDiagnosis, and printDiagnoseSuggestions to normalize import formatting, spacing, and line breaks, then re-run CI.
🧹 Nitpick comments (8)
tracker_activity.go (1)
78-82: Library function writes directly to os.Stderr.

`MostRecentRunID` writes warnings to `os.Stderr`, which is unexpected for a library function. Library consumers may want to capture, suppress, or redirect these warnings. Consider returning warnings as part of the function signature or using a logger interface.

♻️ Alternative: Return warnings alongside the result
```diff
-func MostRecentRunID(workdir string) (string, error) {
+// MostRecentRunIDResult contains the result and any warnings from MostRecentRunID.
+type MostRecentRunIDResult struct {
+	RunID    string
+	Warnings []string
+}
+
+func MostRecentRunID(workdir string) (MostRecentRunIDResult, error) {
 	runsDir := filepath.Join(workdir, ".tracker", "runs")
 	entries, err := os.ReadDir(runsDir)
 	if err != nil {
 		if os.IsNotExist(err) {
-			return "", fmt.Errorf("no runs found — run a pipeline first")
+			return MostRecentRunIDResult{}, fmt.Errorf("no runs found — run a pipeline first")
 		}
-		return "", fmt.Errorf("cannot read runs directory: %w", err)
+		return MostRecentRunIDResult{}, fmt.Errorf("cannot read runs directory: %w", err)
 	}
 	var latestID string
 	var latestTime time.Time
+	var warnings []string
 	for _, e := range entries {
 		// ...
 		if err != nil {
 			if !os.IsNotExist(err) {
-				fmt.Fprintf(os.Stderr, "warning: skipping run %s: %v\n", e.Name(), err)
+				warnings = append(warnings, fmt.Sprintf("skipping run %s: %v", e.Name(), err))
 			}
 			continue
 		}
 		// ...
 	}
 	if latestID == "" {
-		return "", fmt.Errorf("no runs found with valid checkpoints")
+		return MostRecentRunIDResult{}, fmt.Errorf("no runs found with valid checkpoints")
 	}
-	return latestID, nil
+	return MostRecentRunIDResult{RunID: latestID, Warnings: warnings}, nil
 }
```

Alternatively, this is acceptable if the CLI is the primary consumer and you document the stderr behavior.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_activity.go` around lines 78 - 82, The MostRecentRunID function currently writes warnings directly to os.Stderr; change it to avoid side-effectful stderr writes by either (A) adding a logger/io.Writer parameter (e.g., logger or io.Writer warnOut) and writing warnings to that instead, or (B) accumulate warning messages in a slice and return them alongside the result (e.g., change signature to return (runID string, warnings []string, err error)); update the error handling block that currently calls fmt.Fprintf(os.Stderr, ...) to append the formatted message to the warnings slice or send it to the injected logger/writer, and update call sites accordingly.

tracker_doctor.go (1)
521-531: Use `exec.CommandContext` for consistency.

`getBinaryVersion` also uses `exec.Command` without context. Apply the same fix for consistency and to prevent potential hangs.

🔧 Proposed fix

```diff
-func getBinaryVersion(path, flag string) string {
-	out, err := exec.Command(path, flag).CombinedOutput()
+func getBinaryVersion(ctx context.Context, path, flag string) string {
+	out, err := exec.CommandContext(ctx, path, flag).CombinedOutput()
 	if err != nil {
 		return "(version unknown)"
 	}
```

Then update the caller at line 486:

```diff
-	claudeVer := getBinaryVersion(claudePath, "--version")
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	claudeVer := getBinaryVersion(ctx, claudePath, "--version")
+	cancel()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_doctor.go` around lines 521 - 531, getBinaryVersion currently calls exec.Command without a context which risks hangs; change its signature to accept a context (e.g., getBinaryVersion(ctx context.Context, path, flag string) string) and use exec.CommandContext(ctx, path, flag). Keep the same error/return behavior, and update all callers to pass the appropriate context (the calling function's ctx) so the command inherits cancellation/timeouts; reference getBinaryVersion and use exec.CommandContext for the actual replacement.

tracker_audit.go (1)
215-219: Library function writes warning to os.Stderr.

Similar to `MostRecentRunID`, `buildLibRunSummary` writes warnings directly to `os.Stderr`. For consistency with library design, consider an alternative approach (returning warnings, using a logger, or documenting the behavior).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_audit.go` around lines 215 - 219, buildLibRunSummary currently writes a warning directly to os.Stderr when LoadActivityLog fails (similar to MostRecentRunID); instead return the warning or use the library's logger so callers can handle it. Modify buildLibRunSummary to not print to os.Stderr: have LoadActivityLog return the error as before, and propagate that error or append a warning string to a returned warnings slice (or accept a logger interface) so callers receive the warning; specifically update the block handling (activity, lerr := LoadActivityLog(runDir)) to remove fmt.Fprintf(os.Stderr, ...) and either return lerr up the call chain or record the message into a []string warnings return value (or call a provided logger), ensuring callers of buildLibRunSummary are updated to handle the new returned warning or error.

cmd/tracker/run.go (1)
242-245: Cache the agent NDJSON handler once per run.

`stream.AgentHandler()` is rebuilt per event inside the closure. Cache it once to avoid per-event allocation churn.

♻️ Proposed refactor

```diff
 func buildJSONEventHandlers(
 	activityLog *pipeline.JSONLEventHandler,
 	llmClient *llm.Client,
 	logAgentEvent func(agent.Event),
 ) (agent.EventHandler, pipeline.PipelineEventHandler) {
 	stream := tracker.NewNDJSONWriter(os.Stdout)
 	if llmClient != nil {
 		llmClient.AddTraceObserver(stream.TraceObserver())
 	}
+	streamAgentHandler := stream.AgentHandler()
 	agentHandler := agent.EventHandlerFunc(func(evt agent.Event) {
 		logAgentEvent(evt)
-		stream.AgentHandler().HandleEvent(evt)
+		streamAgentHandler.HandleEvent(evt)
 	})
 	pipelineHandler := pipeline.PipelineMultiHandler(stream.PipelineHandler(), activityLog)
 	return agentHandler, pipelineHandler
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/run.go` around lines 242 - 245, The closure rebuilds the NDJSON handler on every event; cache stream.AgentHandler() once before creating agentHandler to avoid per-event allocation. Retrieve and store the handler in a local variable (e.g., cachedAgentHandler := stream.AgentHandler()) and then use cachedAgentHandler.HandleEvent(evt) inside the agent.EventHandlerFunc closure; keep existing calls to logAgentEvent(evt) and the agentHandler variable name.

tracker_doctor_test.go (1)
11-14: Prefer `t.Setenv` over manual `os.Setenv` + cleanup.

Go 1.17+ provides `t.Setenv`, which automatically handles cleanup. This is cleaner and less error-prone.

♻️ Suggested simplification

```diff
 func TestDoctor_NoProbe_KeyPresent(t *testing.T) {
 	workdir := t.TempDir()
-	must(t, os.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890"))
-	t.Cleanup(func() { os.Unsetenv("ANTHROPIC_API_KEY") })
+	t.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_doctor_test.go` around lines 11 - 14, The test TestDoctor_NoProbe_KeyPresent manually calls os.Setenv and t.Cleanup to restore ANTHROPIC_API_KEY; replace that pattern with t.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890") in the TestDoctor_NoProbe_KeyPresent function so the testing framework handles teardown automatically and remove the manual os.Setenv and t.Cleanup calls.

cmd/tracker/audit.go (1)
95-105: Stale comment describes removed behavior.

The comment on lines 97-99 mentions "Re-read timestamps from original entries" and "rebuild from the timeline slice itself," but the code simply iterates and prints each entry. The comment seems to be a leftover from refactoring.
📝 Suggested comment cleanup
```diff
 	// Track stage starts for inline duration computation during printing.
 	stageStarts := make(map[string]time.Time)
-	// Re-read timestamps from original entries; TimelineEntry carries Duration
-	// for completed stages but we need start times for the printTimelineEntry format.
-	// Rebuild from the timeline slice itself.
 	for _, entry := range timeline {
 		printTimelineEntryFromLib(entry, stageStarts)
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/audit.go` around lines 95 - 105, The existing comment incorrectly describes rebuilding timestamps from entries; update or remove it to match the actual behavior: stageStarts := make(map[string]time.Time) is created and the code simply iterates over timeline calling printTimelineEntryFromLib(entry, stageStarts) and then printTimelineTotalDurationFromLib(timeline). Replace the stale lines about "Re-read timestamps..." and "Rebuild from the timeline slice itself" with a concise comment stating that stageStarts is used for inline duration computation and each timeline entry is printed via printTimelineEntryFromLib, or remove the comment entirely.

cmd/tracker/doctor_test.go (1)
245-253: Consider using `strings.Contains` instead of a custom helper.

The custom `contains` helper reimplements `strings.Contains`. Adding `"strings"` to imports is trivial and avoids maintaining custom code.

♻️ Suggested simplification

```diff
+import (
+	...
+	"strings"
+	...
+)

-// contains is a tiny helper to avoid dragging strings.Contains into tests.
-func contains(haystack, needle string) bool {
-	for i := 0; i+len(needle) <= len(haystack); i++ {
-		if haystack[i:i+len(needle)] == needle {
-			return true
-		}
-	}
-	return false
-}
+// Then use strings.Contains directly in tests
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/doctor_test.go` around lines 245 - 253, The custom contains helper reimplements standard library functionality; remove the contains(haystack, needle string) function and replace its uses with strings.Contains by adding "strings" to the imports and updating call sites to strings.Contains(haystack, needle); ensure all tests compile after removing the helper and adjust any package-scoped references to contains accordingly.

cmd/tracker/simulate.go (1)
17-67: Dual parsing reads the pipeline source twice.

The function first parses via `loadPipeline`/`loadEmbeddedPipeline` (lines 25-32) for validation warnings, then re-reads the source and parses again via `tracker.Simulate` (lines 46-64). While the comments acknowledge this is intentional (library Simulate doesn't surface validation warnings), consider whether `pipeline.ValidateAll` could be moved to the library or if the source could be cached after the first read.

This is a minor performance consideration; acceptable for now, since pipeline files are typically small.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/tracker/simulate.go` around lines 17 - 67, runSimulateCmd currently reads/parses the pipeline twice: first via loadPipeline/loadEmbeddedPipeline so pipeline.ValidateAll can emit warnings, then it re-reads the file/string and calls tracker.Simulate; to avoid double reads either (A) reuse the already-loaded source by capturing the original file contents when calling loadPipeline/loadEmbeddedPipeline and pass that string into tracker.Simulate, or (B) move the validation call into tracker.Simulate so you only call tracker.Simulate once; update runSimulateCmd to stop re-reading the file when isEmbedded/is not embedded after loading (referencing runSimulateCmd, loadPipeline, loadEmbeddedPipeline, pipeline.ValidateAll, and tracker.Simulate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 10-23: Add a new "### Fixed" subsection under the Unreleased
heading and list the behavior/determinism and error-handling fixes introduced by
this PR (e.g., deterministic ordering and improved error handling in
event/tracking flows), referencing the affected APIs and files so readers can
find the fixes—mention tracker.NewNDJSONWriter and the factory handlers
(PipelineHandler, AgentHandler, TraceObserver) if they gained deterministic
output, and call out CLI/library behavior fixes in cmd/tracker/diagnose.go,
audit.go, doctor.go, and simulate.go (and any specific fixes to run-directory
resolution or activity parsing like ResolveRunDir, MostRecentRunID,
LoadActivityLog, ParseActivityLine) as concise bullet points under the new "###
Fixed" header.
In `@cmd/tracker/doctor.go`:
- Around line 240-247: The initial "suffix := \"\\n\"" is ineffectual because
it's immediately overwritten; remove that dead assignment and instead declare
suffix once (e.g., var suffix string) then set it inside the existing if/else
that checks content and content[len(content)-1] in the same block, then append
strings.Join(toAppend, "\n") + "\n" to suffix before calling os.WriteFile with
append(content, []byte(suffix)...). Reference variables/functions: suffix,
content, toAppend, gitignorePath, os.WriteFile.
In `@docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md`:
- Around line 62-63: Replace the hard-coded package-count expectation string
"Expected: build succeeds, all 14 packages pass." (and any similar occurrences
like the later "15" or other numeric counts) with a non-stale verification
instruction such as "Expected: build succeeds and go test ./... passes" or
update the number to match the current PR summary (e.g., 17) if you must keep a
count; ensure you change every matching phrase in this doc (and the repeated
occurrence noted) so the plan no longer hard-codes package counts.
In `@docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md`:
- Around line 149-172: Update the documented Audit API types to match the
implementation: change AuditReport.Status to the actual enum values used in code
(use the exact strings the implementation uses instead of "completed" |
"in_progress" | "failed"), update RunSummary to match the richer shape the PR
implements (replace {RunID, StartedAt, Status, NodeCount} with the actual fields
used in the branch), and ensure any other struct fields in AuditReport (e.g.,
TimelineEntry, RetryRecord, ActivityError) and their JSON tags match the
implementation signatures; locate the declarations named AuditReport,
TimelineEntry, RetryRecord, ActivityError and RunSummary in the spec and edit
their types and documented status values to mirror the shipped contract exactly.
In `@tracker_audit.go`:
- Around line 1-3: The file has gofmt formatting issues; run the formatter and
save the file to fix whitespace/indentation (e.g., run `gofmt -w
tracker_audit.go`) so the package declaration and leading comments in package
tracker conform to gofmt rules; ensure no other formatting changes are left
unstaged before committing.
- Around line 119-132: The classifyStatus function only treats
"pipeline_completed" and "pipeline_failed" as terminal events, causing
budget-halted runs with a "budget_exceeded" activity to be misclassified as
success; update the switch inside classifyStatus (iterating activity[i].Type) to
include case "budget_exceeded": return "fail" so budget-exceeded terminal events
are treated as failures (or alternatively extend the terminal-event check to
include other terminating activity types if preferred).
In `@tracker_diagnose.go`:
- Around line 100-129: collectNodeFailuresLib and loadNodeFailureLib currently
swallow I/O and JSON errors; change their signatures to return errors (e.g.,
collectNodeFailuresLib returns (map[string]*NodeFailure, error) and
loadNodeFailureLib returns (*NodeFailure, error)), treat os.ReadDir/os.ReadFile
and json.Unmarshal failures as real errors to propagate up (only treat
explicitly optional files like missing activity.jsonl as ignorable), replace
fmt.Fprintf to stderr with returned errors, and update callers to
handle/aggregate and expose these errors so a partial/incomplete diagnosis is
not silently reported (also apply the same change pattern to the related helpers
referenced around lines 156-166).
In `@tracker_doctor_test.go`:
- Around line 34-47: The test TestDoctor_NoProviderKeys unsets provider env vars
directly which doesn't restore prior values; replace the os.Unsetenv loop with
calls to t.Setenv for each variable (e.g. t.Setenv("ANTHROPIC_API_KEY", "")
etc.) so the original environment is automatically restored after the test when
invoking Doctor with DoctorConfig{WorkDir: workdir, ProbeProviders: false};
update the loop that references ANTHROPIC_API_KEY, OPENAI_API_KEY,
GEMINI_API_KEY, GOOGLE_API_KEY, OPENAI_COMPAT_API_KEY accordingly.
In `@tracker_doctor.go`:
- Around line 365-380: The getDippinVersion function uses exec.Command which can
hang; switch to exec.CommandContext with a short context timeout: create a
context.WithTimeout (e.g., 2s) and call exec.CommandContext(ctx, path,
"--version") and if that fails try exec.CommandContext with "version"; make sure
to cancel the context (defer cancel()) before returning and preserve the same
output-parsing logic and fallback "(version unknown)" behavior in
getDippinVersion.
- Around line 692-718: checkDiskSpaceLib uses syscall.Statfs which is Unix-only
and breaks Windows builds; either restrict this implementation to non-Windows
using a build constraint or provide a Windows counterpart. Fix by moving this
function into a file with a //go:build !windows (or // +build !windows) header
(keeping the same function name checkDiskSpaceLib) and implement an alternative
checkDiskSpaceLib in a separate file for Windows (or use a cross-platform
library such as golang.org/x/sys/unix for Unix and golang.org/x/sys/windows or a
portable disk-space package for Windows) so the symbol checkDiskSpaceLib is
defined on all platforms and Windows builds no longer fail.
In `@tracker_events.go`:
- Around line 108-117: The NDJSONEvent is being stamped with time.Now() which
overwrites the original agent event time; change the timestamp assignment to
prefer evt.Timestamp formatted with ndjsonTimestampLayout and only use
time.Now() when evt.Timestamp.IsZero(). Update the construction of NDJSONEvent
(the Timestamp field) to format evt.Timestamp if non-zero, otherwise fall back
to time.Now().Format(ndjsonTimestampLayout), keeping all other fields (Source,
Type, NodeID, Provider, Model, ToolName, Content) unchanged.
In `@tracker_simulate.go`:
- Around line 164-168: The orphan nodes appended in simBFSNodeOrder are
non-deterministic because iterating graph.Nodes (a map) yields arbitrary order;
collect orphans into a separate slice (e.g., orphanNodes), sort that slice by a
stable key such as node.ID (or node.Name) using the existing sort package, then
append the sorted orphanNodes to ordered instead of appending directly from the
map; update simBFSNodeOrder to perform these steps so output and tests become
deterministic.
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c37ecbbe-fa56-4fde-9b45-6d379db58ae3
📒 Files selected for processing (31)
- CHANGELOG.md
- README.md
- cmd/tracker/audit.go
- cmd/tracker/diagnose.go
- cmd/tracker/doctor.go
- cmd/tracker/doctor_test.go
- cmd/tracker/flags.go
- cmd/tracker/json_stream.go
- cmd/tracker/run.go
- cmd/tracker/simulate.go
- docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md
- docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md
- testdata/runs/budget_halted/activity.jsonl
- testdata/runs/budget_halted/checkpoint.json
- testdata/runs/failed/Build/status.json
- testdata/runs/failed/activity.jsonl
- testdata/runs/failed/checkpoint.json
- testdata/runs/ok/activity.jsonl
- testdata/runs/ok/checkpoint.json
- tracker_activity.go
- tracker_activity_test.go
- tracker_audit.go
- tracker_audit_test.go
- tracker_diagnose.go
- tracker_diagnose_test.go
- tracker_doctor.go
- tracker_doctor_test.go
- tracker_events.go
- tracker_events_test.go
- tracker_simulate.go
- tracker_simulate_test.go
💤 Files with no reviewable changes (1)
- cmd/tracker/json_stream.go
| Expected: build succeeds, all 14 packages pass. If any package is red before you start, stop — resolve that first. | ||
There was a problem hiding this comment.
Avoid hard-coding package counts in the verification steps.
This plan already disagrees with itself (14 here vs 15 later), and the PR summary for this branch reports 17 passing packages. Please switch these expectations to “go test ./... passes” or update them to the current count so the plan does not go stale immediately.
Also applies to: 2619-2619
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md` around lines
62 - 63, Replace the hard-coded package-count expectation string "Expected:
build succeeds, all 14 packages pass." (and any similar occurrences like the
later "15" or other numeric counts) with a non-stale verification instruction
such as "Expected: build succeeds and go test ./... passes" or update the number
to match the current PR summary (e.g., 17) if you must keep a count; ensure you
change every matching phrase in this doc (and the repeated occurrence noted) so
the plan no longer hard-codes package counts.
- enrichFromEntryNF now uses the dual-format timestamp parser shared with ParseActivityLine; previously dropped all timestamps from logs using the millisecond format, zeroing NodeFailure.Duration and silently disabling the suspicious-timing suggestion heuristic.
- CLI --probe help text updated: usage shows [--probe=false] since probe is now the default.
- Tests use t.Setenv for Doctor env-var mutations so pre-existing values are restored on cleanup.
- CheckResult.Status and CheckDetail.Status doc comments list the fifth value "hint" used for informational sub-items. CHANGELOG note added.
- Diagnose and Audit doc comments warn that runDir is trusted input; point callers at ResolveRunDir / DiagnoseMostRecent for untrusted paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix CI format check — realign struct field tags and trim a trailing blank line that slipped in with the Phase 2 work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 5
♻️ Duplicate comments (3)
tracker_doctor.go (2)
365-369: ⚠️ Potential issue | 🟠 Major

Use `exec.CommandContext` with timeout for external binary version calls.

Line 366/368 and Line 522 call external binaries without context; an unresponsive executable can hang Doctor indefinitely.

```bash
#!/bin/bash
set -euo pipefail
# Verify direct exec.Command usages in this file
rg -nP '\bexec\.Command\(' tracker_doctor.go
# Show the two helper functions for targeted patching
rg -n 'func getDippinVersion|func getBinaryVersion' tracker_doctor.go -A25
```

Also applies to: 521-523
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_doctor.go` around lines 365 - 369, Replace direct exec.Command calls in getDippinVersion and getBinaryVersion with exec.CommandContext using a context with a reasonable timeout (e.g., context.WithTimeout) to avoid hanging if the external binary is unresponsive; create ctx, defer cancel, call exec.CommandContext(ctx, ...).CombinedOutput(), and handle context deadline exceeded or other errors appropriately when logging/returning the version from those functions.
693-696: ⚠️ Potential issue | 🔴 Critical

`syscall.Statfs` is Unix-only; this implementation is not cross-platform.

Line 695 uses `syscall.Statfs`, which is unavailable on Windows builds. This can break compilation unless this function is split behind OS build tags.

```bash
#!/bin/bash
set -euo pipefail
echo "Statfs usage:"
rg -n 'syscall\.Statfs|Statfs_t' --type go
echo
echo "Build tags near doctor files:"
fd -e go | rg 'doctor' | while read -r f; do
  head -n 3 "$f" | sed "s|^|$f: |"
done
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_doctor.go` around lines 693 - 696, The checkDiskSpaceLib function uses syscall.Statfs (Statfs_t) which is Unix-only and will fail Windows builds; split the implementation by moving the current checkDiskSpaceLib into a file with a non-Windows build tag (e.g., //go:build !windows) and create a Windows-specific stub file (with //go:build windows) that defines checkDiskSpaceLib and returns an appropriate CheckResult (e.g., "unsupported on windows" or a safe default) so the symbol exists on all platforms and compilation succeeds; reference the symbols checkDiskSpaceLib, syscall.Statfs, and Statfs_t when locating and splitting the code.

tracker_diagnose.go (1)
106-110: ⚠️ Potential issue | 🟠 Major

Don’t silently downgrade unreadable or corrupt artifacts into a “clean” report.

Lines 110, 127, 134-135, 166, and 183-184 currently collapse artifact I/O / JSON failures into “no failures” or “no budget halt”. For a public library API, that produces an apparently valid `DiagnoseReport` with missing retry/timing data and no indication that diagnosis was incomplete. Only genuinely optional files should be ignored; the other read/parse failures need to be returned or surfaced on the report.

Also applies to: 123-135, 162-166, 183-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_diagnose.go` around lines 106 - 110, collectNodeFailuresLib (and related readers used by collectNodeFailuresLib) currently swallows filesystem and JSON parse errors and returns an empty map, which hides incomplete diagnosis; modify collectNodeFailuresLib to surface I/O and parse failures instead of treating them as “no failures” — either change its signature to return (map[string]*NodeFailure, error) or attach error details to the resulting DiagnoseReport (e.g., an Errors/Incomplete field) so callers can detect incomplete runs; update any callers (and the creation of DiagnoseReport) to propagate and/or include those errors (reference symbols: collectNodeFailuresLib, NodeFailure, DiagnoseReport) and ensure only truly optional files continue to be ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tracker_diagnose.go`:
- Around line 200-205: When parsing timestamps for stage events, the code
currently stores a zero time when parseActivityTimestampLib fails and later uses
it for duration math; modify the logic so that in the "stage_started" branch you
only set stageStarts[entry.NodeID] when parseActivityTimestampLib returned a
successful timestamp (check the boolean/err result), and in
updateFailureTimingNF (the function called for "stage_failed") guard the
duration calculation by verifying both the start from stageStarts and the end ts
are non-zero (or otherwise valid) before subtracting to compute duration; this
prevents incorrect durations derived from time.Time{} defaults.
- Around line 95-101: DiagnoseMostRecent currently calls MostRecentRunID which
prints warnings to os.Stderr, leaking CLI output from a library function; change
DiagnoseMostRecent to call a non-printing variant (e.g., MostRecentRunIDQuiet)
or adjust MostRecentRunID to accept a suppressWarnings bool or an io.Writer for
diagnostics so it doesn't write to os.Stderr by default, and propagate any
skipped-directory warnings back as structured data (e.g., return an error type
or a slice of warning strings) instead of printing; update DiagnoseMostRecent to
use the quiet variant/signature and handle returned warnings or errors
accordingly so the public API emits no unsolicited stderr output.
In `@tracker_doctor.go`:
- Around line 217-233: The provider loop correctly appends CheckDetail entries
with Status "error" for invalid/auth-failed keys (see isValidAPIKey,
probeProvider, and out.Details), but later logic force-sets the section summary
to "ok" when any provider is configured; update the summary computation (the
code around where section/status is set after the loop) to scan out.Details (or
the provider-specific slice) and set the section's Status to "error" if any
CheckDetail has Status == "error" (otherwise "ok"), so the DoctorReport counts
reflect the detail-level errors instead of always becoming "ok".
- Around line 24-27: The PinnedDippinVersion constant is out of date; update the
declaration of PinnedDippinVersion to match the go.mod requirement by changing
its value to "v0.19.1" so Doctor's version check uses the correct pinned version
(look for the const PinnedDippinVersion = ... in tracker_doctor.go).
- Around line 561-583: The section unconditionally sets out.Status = "ok" even
after adding warn CheckDetail entries (from the workdir home check and
missingGitignoreEntries), which hides warnings; update the end of the function
to compute and set out.Status and out.Message based on the details (or at
minimum set out.Status = "warn" when either the home/workdir check or
missingGitignoreEntries produced a warn), e.g. inspect the details slice for any
CheckDetail with Status == "warn" (references: missingGitignoreEntries, the
workdir home check that appends a warn detail, and the final
out.Status/out.Message assignments) and promote the section status to "warn" so
warnings are reflected in DoctorReport.
---
Duplicate comments:
In `@tracker_diagnose.go`:
- Around line 106-110: collectNodeFailuresLib (and related readers used by
collectNodeFailuresLib) currently swallows filesystem and JSON parse errors and
returns an empty map, which hides incomplete diagnosis; modify
collectNodeFailuresLib to surface I/O and parse failures instead of treating
them as “no failures” — either change its signature to return
(map[string]*NodeFailure, error) or attach error details to the resulting
DiagnoseReport (e.g., an Errors/Incomplete field) so callers can detect
incomplete runs; update any callers (and the creation of DiagnoseReport) to
propagate and/or include those errors (reference symbols:
collectNodeFailuresLib, NodeFailure, DiagnoseReport) and ensure only truly
optional files continue to be ignored.
In `@tracker_doctor.go`:
- Around line 365-369: Replace direct exec.Command calls in getDippinVersion and
getBinaryVersion with exec.CommandContext using a context with a reasonable
timeout (e.g., context.WithTimeout) to avoid hanging if the external binary is
unresponsive; create ctx, defer cancel, call exec.CommandContext(ctx,
...).CombinedOutput(), and handle context deadline exceeded or other errors
appropriately when logging/returning the version from those functions.
- Around line 693-696: The checkDiskSpaceLib function uses syscall.Statfs
(Statfs_t) which is Unix-only and will fail Windows builds; split the
implementation by moving the current checkDiskSpaceLib into a file with a
non-Windows build tag (e.g., //go:build !windows) and create a Windows-specific
stub file (with //go:build windows) that defines checkDiskSpaceLib and returns
an appropriate CheckResult (e.g., "unsupported on windows" or a safe default) so
the symbol exists on all platforms and compilation succeeds; reference the
symbols checkDiskSpaceLib, syscall.Statfs, and Statfs_t when locating and
splitting the code.
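The build-tag split for the disk-space check might look like the sketch below; the filename is hypothetical and the return shape is simplified to raw bytes (the real function builds a CheckResult):

```go
//go:build !windows

// Unix file (e.g. a hypothetical tracker_doctor_disk_unix.go). A
// sibling file guarded by //go:build windows would define the same
// symbol and return an "unsupported on windows" result, so the package
// compiles on every platform.
package main

import (
	"fmt"
	"syscall"
)

// checkDiskSpaceLib returns the free bytes available to unprivileged
// users on the filesystem containing path.
func checkDiskSpaceLib(path string) (uint64, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, fmt.Errorf("statfs %s: %w", path, err)
	}
	return st.Bavail * uint64(st.Bsize), nil
}

func main() {
	free, err := checkDiskSpaceLib("/")
	fmt.Println(free > 0, err)
}
```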
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1bf50865-4416-4313-8ec5-c8182d38442d
📒 Files selected for processing (6)
- CHANGELOG.md
- cmd/tracker/flags.go
- tracker_audit.go
- tracker_diagnose.go
- tracker_doctor.go
- tracker_doctor_test.go
✅ Files skipped from review due to trivial changes (1)
- tracker_audit.go
🚧 Files skipped from review as they are similar to previous changes (2)
- cmd/tracker/flags.go
- CHANGELOG.md
| // PinnedDippinVersion is the dippin-lang version from go.mod. | ||
| // Keep in sync with the require line in go.mod. | ||
| const PinnedDippinVersion = "v0.18.0" | ||
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
echo "Pinned constant:"
rg -n 'PinnedDippinVersion' tracker_doctor.go
echo
echo "go.mod dippin-lang requires:"
rg -n 'github.com/2389-research/dippin-lang' go.mod
```

Repository: 2389-research/tracker
Length of output: 790
Update PinnedDippinVersion constant to match go.mod requirement.
PinnedDippinVersion is hardcoded to v0.18.0 but go.mod requires v0.19.1. This causes Doctor to report false version mismatches even when dippin is correctly installed at the pinned version.
Update line 26 to const PinnedDippinVersion = "v0.19.1".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tracker_doctor.go` around lines 24 - 27, The PinnedDippinVersion constant is
out of date; update the declaration of PinnedDippinVersion to match the go.mod
requirement by changing its value to "v0.19.1" so Doctor's version check uses
the correct pinned version (look for the const PinnedDippinVersion = ... in
tracker_doctor.go).
| if !isValidAPIKey(p.name, key) { | ||
| out.Details = append(out.Details, CheckDetail{ | ||
| Status: "error", | ||
| Message: fmt.Sprintf("%-15s %s=%s (invalid format)", p.name, envName, masked), | ||
| Hint: fmt.Sprintf("%s keys should match expected format — run `tracker setup`", p.name), | ||
| }) | ||
| continue | ||
| } | ||
| if probe && p.buildAdapter != nil { | ||
| authOk, authMsg := probeProvider(p, key) | ||
| if !authOk { | ||
| out.Details = append(out.Details, CheckDetail{ | ||
| Status: "error", | ||
| Message: fmt.Sprintf("%-15s %s=%s (auth failed: %s)", p.name, envName, masked, authMsg), | ||
| Hint: fmt.Sprintf("your %s key is invalid or expired — export a fresh key or run `tracker setup`", p.name), | ||
| }) | ||
| continue |
Provider check can return overall ok while containing provider error details.
At Line 219/229, invalid/auth-failed providers are marked as error in details, but Line 275 forces section status to ok whenever at least one provider is configured. This under-reports health severity in DoctorReport summary counts.
Suggested fix
func checkProvidersLib(probe bool) CheckResult {
out := CheckResult{Name: "LLM Providers"}
var configuredNames []string
var missingNames []string
+ hasProviderErrors := false
@@
if !isValidAPIKey(p.name, key) {
out.Details = append(out.Details, CheckDetail{
Status: "error",
Message: fmt.Sprintf("%-15s %s=%s (invalid format)", p.name, envName, masked),
Hint: fmt.Sprintf("%s keys should match expected format — run `tracker setup`", p.name),
})
+ hasProviderErrors = true
continue
}
@@
if !authOk {
out.Details = append(out.Details, CheckDetail{
Status: "error",
Message: fmt.Sprintf("%-15s %s=%s (auth failed: %s)", p.name, envName, masked, authMsg),
Hint: fmt.Sprintf("your %s key is invalid or expired — export a fresh key or run `tracker setup`", p.name),
})
+ hasProviderErrors = true
continue
}
@@
- out.Status = "ok"
+ if hasProviderErrors {
+ out.Status = "warn"
+ } else {
+ out.Status = "ok"
+ }Also applies to: 275-281
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tracker_doctor.go` around lines 217 - 233, The provider loop correctly
appends CheckDetail entries with Status "error" for invalid/auth-failed keys
(see isValidAPIKey, probeProvider, and out.Details), but later logic force-sets
the section summary to "ok" when any provider is configured; update the summary
computation (the code around where section/status is set after the loop) to scan
out.Details (or the provider-specific slice) and set the section's Status to
"error" if any CheckDetail has Status == "error" (otherwise "ok"), so the
DoctorReport counts reflect the detail-level errors instead of always becoming
"ok".
Actionable comments posted: 1
♻️ Duplicate comments (1)
tracker_audit.go (1)
124-131: ⚠️ Potential issue | 🟠 Major

Handle `budget_exceeded` as a failure terminal event.

At Line 126, `classifyStatus` only recognizes `pipeline_completed` and `pipeline_failed`. Runs halted by budget limits can still be terminal failures and are currently at risk of being reported as `"success"` via the fallback path.

🔧 Proposed fix

```diff
 func classifyStatus(cp *pipeline.Checkpoint, activity []ActivityEntry) string {
 	for i := len(activity) - 1; i >= 0; i-- {
 		switch activity[i].Type {
 		case "pipeline_completed":
 			return "success"
 		case "pipeline_failed":
+			return "fail"
+		case "budget_exceeded":
 			return "fail"
 		}
 	}
 	if cp.CurrentNode != "" {
 		return "fail"
 	}
 	return "success"
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_audit.go` around lines 124 - 131, The classifyStatus function currently only treats "pipeline_completed" as success and "pipeline_failed" as a terminal failure; add handling for budget-exceeded terminal events by adding a case for activity[i].Type == "budget_exceeded" that returns "fail" so runs halted by budget limits are classified as failures (update the switch in classifyStatus that iterates ActivityEntry entries to include "budget_exceeded" alongside "pipeline_failed").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tracker_audit.go`:
- Around line 220-223: In buildLibRunSummary replace the direct write to
os.Stderr with returning structured warning information: remove the
fmt.Fprintf(os.Stderr...) call where activity, lerr := LoadActivityLog(runDir)
is handled, keep activity = nil for continuing, and add a warnings/errs return
value (e.g., []error or []string) or embed a Warnings field on the returned
summary type so callers (CLI) can decide how to present the message; ensure
LoadActivityLog error (lerr) and contextual info (name, runDir) are packaged
into that returned warning instead of printing.
---
Duplicate comments:
In `@tracker_audit.go`:
- Around line 124-131: The classifyStatus function currently only treats
"pipeline_completed" as success and "pipeline_failed" as a terminal failure; add
handling for budget-exceeded terminal events by adding a case for
activity[i].Type == "budget_exceeded" that returns "fail" so runs halted by
budget limits are classified as failures (update the switch in classifyStatus
that iterates ActivityEntry entries to include "budget_exceeded" alongside
"pipeline_failed").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6a20e61b-f0e0-4cfb-a587-65448d175159
📒 Files selected for processing (2)
- cmd/tracker/diagnose.go
- tracker_audit.go
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/tracker/diagnose.go
| activity, lerr := LoadActivityLog(runDir) | ||
| if lerr != nil { | ||
| fmt.Fprintf(os.Stderr, "warning: run %s: cannot read activity log: %v\n", name, lerr) | ||
| activity = nil // continue with nil so the summary still builds |
Avoid writing to os.Stderr from library code paths.
At Line 222, buildLibRunSummary emits a warning directly to stderr. This introduces presentation side effects in a library API and undermines the data/presentation separation goal.
A better pattern is to return structured warning/error information up to the caller (CLI) and let the CLI decide whether/how to print.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tracker_audit.go` around lines 220 - 223, In buildLibRunSummary replace the
direct write to os.Stderr with returning structured warning information: remove
the fmt.Fprintf(os.Stderr...) call where activity, lerr :=
LoadActivityLog(runDir) is handled, keep activity = nil for continuing, and add
a warnings/errs return value (e.g., []error or []string) or embed a Warnings
field on the returned summary type so callers (CLI) can decide how to present
the message; ensure LoadActivityLog error (lerr) and contextual info (name,
runDir) are packaged into that returned warning instead of printing.
Pull request overview
This PR moves key CLI-only features (diagnose/audit/doctor/simulate + NDJSON --json stream writer) into the public tracker library as JSON-serializable report APIs, and refactors the CLI commands into thin presentation layers over those library APIs to achieve Phase 1 (NDJSON) + Phase 2 parity from #76.
Changes:
- Added public library APIs: `Simulate`, `Diagnose`/`DiagnoseMostRecent`, `Audit`, `ListRuns`, `Doctor`, and shared run/activity helpers (`ResolveRunDir`, `MostRecentRunID`, `LoadActivityLog`, etc.).
- Promoted the CLI NDJSON writer into `tracker.NewNDJSONWriter` with handler factories for pipeline/agent/LLM trace events; deleted the old CLI-private implementation.
- Updated CLI commands to call the library and print the same outputs; added tests, fixtures, and docs/changelog entries.
Reviewed changes
Copilot reviewed 31 out of 31 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tracker_simulate.go | New library Simulate API and report types (graph introspection + BFS plan). |
| tracker_simulate_test.go | Unit tests for SimulateReport fields (nodes/edges/plan/unreachable/attrs). |
| tracker_events.go | New public NDJSON wire type + NDJSONWriter with handler factories. |
| tracker_events_test.go | Tests for NDJSON writer correctness, stable tags, and concurrency safety. |
| tracker_activity.go | New shared helpers for run-dir resolution and activity.jsonl parsing/sorting. |
| tracker_activity_test.go | Tests for run-dir resolution and activity parsing/timestamp formats. |
| tracker_diagnose.go | New library Diagnose / DiagnoseMostRecent and structured failure/suggestion reporting. |
| tracker_diagnose_test.go | Fixture-driven tests for clean/failing/budget-halted run diagnosis. |
| tracker_audit.go | New library Audit + ListRuns APIs and structured audit/timeline summaries. |
| tracker_audit_test.go | Tests for audit status, timeline, durations, retries, and listing order. |
| tracker_doctor.go | New library Doctor API (env/provider/binary/workdir/artifact/disk/pipeline checks). |
| tracker_doctor_test.go | Tests for Doctor behavior (probe opt-in, provider detection, pipeline validation). |
| cmd/tracker/run.go | CLI JSON event mode now uses tracker.NewNDJSONWriter handlers. |
| cmd/tracker/json_stream.go | Removed CLI-private NDJSON stream writer (superseded by library). |
| cmd/tracker/simulate.go | CLI simulate now prints from tracker.SimulateReport (plus legacy validation warnings). |
| cmd/tracker/diagnose.go | CLI diagnose now calls tracker.Diagnose* and prints the report. |
| cmd/tracker/audit.go | CLI audit/list now calls tracker.Audit / tracker.ListRuns and prints the report. |
| cmd/tracker/doctor.go | CLI doctor now calls tracker.Doctor, prints results, and applies CLI-only fixups. |
| cmd/tracker/doctor_test.go | Reduced scope to CLI shim behavior (flags/exit codes/printing/fixups). |
| cmd/tracker/flags.go | Doctor --probe default changed to true; usage updated. |
| testdata/runs/ok/checkpoint.json | New run fixture for successful run. |
| testdata/runs/ok/activity.jsonl | New run fixture activity log for successful run. |
| testdata/runs/failed/checkpoint.json | New run fixture for failed run with retries. |
| testdata/runs/failed/activity.jsonl | New run fixture activity log for failed run. |
| testdata/runs/failed/Build/status.json | New per-node status fixture for failed node. |
| testdata/runs/budget_halted/checkpoint.json | New run fixture for budget-halted run. |
| testdata/runs/budget_halted/activity.jsonl | New run fixture activity log for budget halt. |
| docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md | New design/spec doc describing the parity phases and APIs. |
| README.md | Added library usage examples for diagnosing runs and streaming NDJSON. |
| CHANGELOG.md | Documented newly added library APIs and CLI refactor. |
| if err != nil { | ||
| return "", err | ||
| } | ||
| return filepath.Join(runsDir, matched), nil |
ResolveRunDir's docstring says it returns an absolute path, but the implementation just filepath.Join's workdir/.tracker/runs/... and will return a relative path when workdir is relative (e.g. "."). Either call filepath.Abs on the result (and return that), or update the docstring to match actual behavior.
| return filepath.Join(runsDir, matched), nil | |
| runDir := filepath.Join(runsDir, matched) | |
| absRunDir, err := filepath.Abs(runDir) | |
| if err != nil { | |
| return "", fmt.Errorf("cannot resolve absolute run directory path: %w", err) | |
| } | |
| return absRunDir, nil |
| for _, node := range graph.Nodes { | ||
| if !visited[node.ID] { | ||
| ordered = append(ordered, node) | ||
| } | ||
| } |
simBFSNodeOrder appends orphan nodes by ranging over graph.Nodes (a map), which makes the resulting SimulateReport.Nodes ordering nondeterministic when there are multiple unreachable/orphan nodes. For a stable library API (and stable CLI output), consider appending orphans in a deterministic order (e.g. by graph.NodeOrder when present, or by sorting orphan IDs).
| for _, node := range graph.Nodes { | |
| if !visited[node.ID] { | |
| ordered = append(ordered, node) | |
| } | |
| } | |
| var orphanIDs []string | |
| for id := range graph.Nodes { | |
| if !visited[id] { | |
| orphanIDs = append(orphanIDs, id) | |
| } | |
| } | |
| sort.Strings(orphanIDs) | |
| for _, id := range orphanIDs { | |
| ordered = append(ordered, graph.Nodes[id]) | |
| } |
| path := filepath.Join(runDir, "activity.jsonl") | ||
| data, err := os.ReadFile(path) | ||
| if err != nil { | ||
| return nil | ||
| } |
Diagnose reads the entire activity.jsonl into memory (os.ReadFile) and then strings.Split's it, which can be expensive / high-memory for large runs. Consider streaming the file (bufio.Scanner) similar to LoadActivityLog, so library consumers can safely diagnose very large activity logs.
| if !strings.HasSuffix(pipelineFile, ".dip") { | ||
| out.Details = append(out.Details, CheckDetail{ | ||
| Status: "warn", | ||
| Message: fmt.Sprintf("%s is not a .dip file — may not be a valid pipeline", pipelineFile), | ||
| }) |
checkPipelineFileLib warns for any non-.dip file, but DoctorConfig/PipelineFile is described as supporting both .dip and .dot. As written, passing a valid .dot will always produce a warning detail ("not a .dip file"). Consider only warning when the extension is neither .dip nor .dot, or adjust the message to reflect that .dot is supported (even if deprecated).
| report, err := tracker.Simulate(source) | ||
| if err != nil { | ||
| return err |
runSimulateCmd respects formatOverride when parsing via loadPipeline, but the library report path always calls tracker.Simulate(source) which auto-detects format from content and ignores formatOverride. This can lead to mismatched validation vs printed report (and potentially different behavior) when --format is used. Consider generating the SimulateReport from the already-parsed graph, or adding a library API that accepts an explicit format.
| // Write-side-effect fix-ups must not run inside the library. The | ||
| // Working Directory check emits a gitignore hint in its details; | ||
| // patch the file here now that the user has seen the warning. | ||
| maybeFixGitignore(report, workdir) |
maybeFixGitignore/checkGitignore will now modify an existing .gitignore automatically whenever the library emits a missing-entry warning. This is a potentially surprising write side effect (and the Phase 2 design doc describes this as a prompt-driven fix-up). Consider gating this behind an explicit user confirmation / flag, or update the docs to reflect the new non-interactive behavior.
| ```go | ||
| type AuditReport struct { | ||
| RunID string `json:"run_id"` | ||
| Status string `json:"status"` // "completed" | "in_progress" | "failed" |
The design doc's AuditReport.Status comment lists values "completed" | "in_progress" | "failed", but the implementation (tracker_audit.go) uses "success" and "fail" (and ListRuns/Audit tests assert those). Please update this doc snippet to match the actual status strings so library consumers aren't misled.
```diff
-	Status string `json:"status"` // "completed" | "in_progress" | "failed"
+	Status string `json:"status"` // "success" | "in_progress" | "fail"
```
- `tracker.Diagnose(runDir)` / `tracker.DiagnoseMostRecent(workDir)` — structured `*DiagnoseReport` with node failures, budget halt, and typed suggestions (`Kind: "retry_pattern" | "escalate_limit" | "no_output" | "shell_command" | "go_test" | "suspicious_timing" | "budget"`).
- `tracker.Audit(runDir)` — structured `*AuditReport` with timeline, retries, errors, and recommendations.
- `tracker.ListRuns(workDir)` — sorted `[]RunSummary` for enumerating past runs (newest first).
- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has five values: `"ok"`, `"warn"`, `"error"`, `"skip"`, and `"hint"` (informational sub-items such as optional providers not configured).
CHANGELOG entry says CheckDetail.Status supports "skip", but the code only uses/defines detail statuses like ok/warn/error/hint (and there are no "skip" detail emitters). Either remove "skip" from the CheckDetail.Status list in the changelog or implement/define what a skipped detail line should mean.
```diff
-- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has five values: `"ok"`, `"warn"`, `"error"`, `"skip"`, and `"hint"` (informational sub-items such as optional providers not configured).
+- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has four values: `"ok"`, `"warn"`, `"error"`, and `"hint"` (informational sub-items such as optional providers not configured).
```
- go.mod: upgrade github.com/2389-research/dippin-lang v0.19.1 → v0.20.0
- CI: install dippin CLI v0.20.0 (was v0.10.0 — 10 minor versions stale)
- examples/human_gate_test_suite.dip: rename "default_choice:" to "default:" (the real IR field name on HumanConfig).
- Makefile lint: skip human_gate_test_suite.dip with a note — the file exercises "timeout:" and "timeout_action:" on human nodes, which are tracker-level attrs but are not yet in the dippin-lang HumanConfig IR. v0.10.0 accepted unknown fields silently; v0.20.0 rejects them. See follow-up upstream issue.

All local checks pass: fmt-check, vet, build, test-short, test-race, coverage, lint, doctor (ask_and_execute / build_product / build_product_with_superspec all A/100).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (1)
Makefile (1)
**136-140:** Make the lint skip explicitly temporary to avoid a long-term CI blind spot.

Line 136 introduces a hard skip in a target that CI executes. Consider adding an expiry mechanism (e.g., a version gate or tracked TODO/issue reference) so this file is automatically brought back under lint once the `dippin-lang` IR supports `timeout_action`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Makefile` around lines 136 - 140, The Makefile currently hard-skips examples/human_gate_test_suite.dip inside the case "$$f" block which creates a permanent CI blind spot; change the skip so it is explicitly temporary by gating it on a clearly named env var or feature flag (e.g., LINT_ALLOW_HUMAN_TIMEOUT) or by adding a TODO/issue reference and an expiry/version gate checked by CI. Concretely, replace the unconditional skip for examples/human_gate_test_suite.dip in the case "$$f" branch with a conditional that only skips when the temp flag is set (or until a specified version/date), and update the echo message to include the TODO/issue identifier so CI/maintainers can track and remove the skip when dippin-lang IR supports timeout_action.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 28c850cd-97f0-43b1-9ea1-4dc8a3390c99
⛔ Files ignored due to path filters (1)
- `go.sum` is excluded by `!**/*.sum`

📒 Files selected for processing (4)
- `.github/workflows/ci.yml`
- `Makefile`
- `examples/human_gate_test_suite.dip`
- `go.mod`
✅ Files skipped from review due to trivial changes (2)
- .github/workflows/ci.yml
- go.mod
Applies fixes from CodeRabbit / Copilot / Codex review comments:

- PinnedDippinVersion updated to v0.20.0 (was v0.18.0, mismatched go.mod)
- checkDiskSpaceLib split to tracker_doctor_unix.go / _windows.go — syscall.Statfs is Unix-only and breaks Windows builds
- classifyStatus now returns "fail" for budget_exceeded events (budget-halted runs were mis-classified as success)
- checkProvidersLib propagates individual provider error details to section Status (was always "ok" when any provider configured)
- checkWorkdirLib propagates warn details to section-level Status
- getDippinVersion uses exec.CommandContext with 5s timeout (noctx lint fix)
- NDJSONWriter.AgentHandler preserves agent.Event.Timestamp; falls back to time.Now() only when zero
- simBFSNodeOrder collects orphan nodes into a sorted slice before appending (deterministic output)
- ResolveRunDir calls filepath.Abs to satisfy its "returns absolute path" contract
- MostRecentRunID no longer writes to os.Stderr; invalid checkpoints silently skipped
- checkPipelineFileLib no longer warns on .dot files (both .dip and .dot are valid)
- Fixed ineffectual suffix := "\n" assignment in cmd/tracker/doctor.go maybeFixGitignore
- enrichFromEntryNF/updateFailureTimingNF guard against zero start timestamps in duration math
- CHANGELOG: added ### Fixed section; removed "skip" from CheckDetail.Status values (not a real status)

Skipped (noted in PR summary or tracked separately):

- Copilot: stream activity.jsonl via bufio.Scanner in Diagnose (non-trivial refactor, out of scope)
- Copilot: gate maybeFixGitignore behind user confirmation (behavior change, needs design decision)
- CodeRabbit: buildLibRunSummary stderr write — minor, would require API surface change
- CodeRabbit tracker_diagnose.go:135 — collectNodeFailuresLib/loadNodeFailureLib error propagation (larger refactor, separate PR)
- CodeRabbit tracker_diagnose.go:101 — DiagnoseMostRecent stderr fixed by silencing MostRecentRunID instead
- Codex cmd/tracker/simulate.go:61 — format-aware Simulate API (new feature, out of scope)
- CodeRabbit tracker_doctor.go:674 — artifact directory errors as "warn" vs "error" (intentional severity)
- Doc-only comments on spec/plan markdown files (not code, low value)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🧹 Nitpick comments (2)
tracker_doctor.go (1)
**529-539:** Use `exec.CommandContext` for consistency with other version checks.

Static analysis flagged this `exec.Command` call. While `claude --version` should be quick, using `CommandContext` with a timeout would prevent potential hangs (consistent with how `getDippinVersion` was fixed).

♻️ Proposed fix

```diff
 func getBinaryVersion(path, flag string) string {
-	out, err := exec.Command(path, flag).CombinedOutput()
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	out, err := exec.CommandContext(ctx, path, flag).CombinedOutput()
 	if err != nil {
 		return "(version unknown)"
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_doctor.go` around lines 529 - 539, The getBinaryVersion function currently uses exec.Command which can hang; change it to use exec.CommandContext with a short context timeout (as done in getDippinVersion): create a context with context.WithTimeout, defer cancel, build the command via exec.CommandContext(ctx, path, flag), run CombinedOutput on the cmd, and preserve the existing error handling and trimming logic (return "(version unknown)" on error or empty output). Ensure you reference the function name getBinaryVersion and reuse the same timeout duration pattern used by getDippinVersion.

tracker_diagnose.go (1)

**284-288:** Budget suggestion message couples library to CLI interface.

The message references CLI flags (`--max-tokens`, `--max-cost`, `--max-wall-time`) alongside the library-level `Config.Budget`. Since this is a library API, consider making the message more generic or providing the raw data (limits vs. actual) so CLI and other consumers can format their own actionable message.

💡 Suggested alternative

```diff
 if halt != nil {
 	out = append(out, Suggestion{
 		Kind:    SuggestionBudget,
-		Message: "Raise the relevant --max-tokens, --max-cost, or --max-wall-time flag, or remove the Config.Budget value",
+		Message: "Budget limit reached — increase token, cost, or wall-time limits in your configuration",
 	})
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tracker_diagnose.go` around lines 284 - 288, The suggestion message in the Suggestion struct (where out = append(out, Suggestion{ Kind: SuggestionBudget, ... }) currently hardcodes CLI flags and couples library code to the CLI; change this to a library-appropriate message (e.g., "Increase configured budget limits or remove Config.Budget") or instead populate the Suggestion with structured data (e.g., include the budget limits and actual usage values alongside Kind: SuggestionBudget) so CLI/consumers can render actionable flag-specific text; modify the creation of this Suggestion (SuggestionBudget/Message) to either use a generic, non-CLI string or add fields on the Suggestion payload with the raw limit vs actual values for downstream formatting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8119966b-f0fb-457a-acd3-8523c7de80bd
📒 Files selected for processing (10)
- `CHANGELOG.md`
- `cmd/tracker/doctor.go`
- `tracker_activity.go`
- `tracker_audit.go`
- `tracker_diagnose.go`
- `tracker_doctor.go`
- `tracker_doctor_unix.go`
- `tracker_doctor_windows.go`
- `tracker_events.go`
- `tracker_simulate.go`
🚧 Files skipped from review as they are similar to previous changes (3)
- tracker_simulate.go
- tracker_events.go
- tracker_activity.go
Pull request overview
Copilot reviewed 37 out of 38 changed files in this pull request and generated 6 comments.
```go
	Message: fmt.Sprintf("%s valid (%d nodes, %d edges)", pipelineFile, len(graph.Nodes), len(graph.Edges)),
})
out.Status = "ok"
out.Message = fmt.Sprintf("%s is valid", pipelineFile)
return out
```
checkPipelineFileLib can append a warn-level detail for non-.dip/.dot files, but still sets section Status="ok" and a "%s is valid" Message when parsing/validation succeeds. This makes DoctorReport.Warnings/OK inconsistent with the per-detail warning. Consider promoting the section Status to "warn" (and adjusting Message) whenever any warn-level detail is present and there are no errors.
```go
func buildExecutionPlan(graph *pipeline.Graph) ([]PlanStep, []string) {
	if graph.StartNode == "" {
		return nil, nil
	}
```
buildExecutionPlan returns (nil, nil) when graph.StartNode is empty. Because SimulateReport.ExecutionPlan is not omitempty, this will JSON-marshal as null rather than an empty array, which is awkward for consumers and inconsistent with other report slices. Consider returning empty slices instead (e.g., []PlanStep{} / []string{}) so execution_plan is always an array.
```go
func (s *NDJSONWriter) Write(evt NDJSONEvent) {
	data, err := json.Marshal(evt)
	if err != nil {
		fmt.Fprintf(os.Stderr, "tracker: marshal NDJSON event: %v\n", err)
		return
	}
	data = append(data, '\n')
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, werr := s.w.Write(data); werr != nil {
		s.errOnce.Do(func() {
			fmt.Fprintf(os.Stderr, "tracker: NDJSON stream write error: %v (further write errors suppressed)\n", werr)
		})
	}
```
NDJSONWriter.Write writes marshal/write errors directly to os.Stderr. Since this is a public library type, emitting to process-global stderr is a surprising side effect and hard for consumers to test/control. Consider returning an error from Write (or exposing an ErrorHandler/logger option on NDJSONWriter) so callers decide how to surface failures.
```go
activity, lerr := LoadActivityLog(runDir)
if lerr != nil {
	fmt.Fprintf(os.Stderr, "warning: run %s: cannot read activity log: %v\n", name, lerr)
	activity = nil // continue with nil so the summary still builds
}
```
buildLibRunSummary prints a warning directly to os.Stderr when LoadActivityLog fails. For a library API (ListRuns), this creates an unexpected global side effect; callers may want to handle/log these errors themselves. Consider returning the error (or collecting per-run warnings in the RunSummary/AuditReport) instead of writing to stderr.
```go
fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)
return nil
```
loadNodeFailureLib logs JSON unmarshal failures to os.Stderr and then silently drops the node. For a library API intended to cleanly separate data/presentation, this stderr output is an unexpected side effect and makes failures hard to surface programmatically. Consider returning the parse error up the stack (or collecting it into the DiagnoseReport as a warning) rather than printing.
```diff
-	fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)
-	return nil
+	return &NodeFailure{
+		NodeID:  nodeID,
+		Outcome: "status_parse_error",
+		Stderr:  fmt.Sprintf("cannot parse %s: %v", statusPath, err),
+	}
```
```go
type AuditReport struct {
	RunID         string        `json:"run_id"`
	Status        string        `json:"status"` // "completed" | "in_progress" | "failed"
	TotalDuration time.Duration `json:"total_duration_ns"`
```
The spec doc's AuditReport API shape is out of sync with the implementation: it documents Status values as "completed"|"in_progress"|"failed" (and a different RunSummary shape), but tracker_audit.go currently uses Status values like "success"/"fail" and RunSummary has different fields. Please update this doc section to match the actual exported types so library consumers aren't misled.
- checkPipelineFileLib promotes section status to warn when any detail is warn (matches pattern already applied to checkProviders/Workdir).
- buildExecutionPlan returns empty slices instead of nil so SimulateReport.ExecutionPlan JSON-marshals as [] not null.
- Spec doc AuditReport section updated to match shipped types: Status is "success"/"fail", RunSummary has the implemented fields.

Three stderr-side-effect comments deferred to #104 (coordinated API decision; add Logger io.Writer to configs or return errors up-stack).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Batches six related pre-1.0 fixes flagged in the expert panel review of PR #101 (#102, #103, #104, #105, #106, #109). All changes touch the v0.18.0 library surface and are breaking, so ship together:

- Thread context.Context through Doctor, Diagnose, DiagnoseMostRecent, Audit, and Simulate. Provider probes and binary version lookups honor ctx; getBinaryVersion now uses exec.CommandContext with a 5-second timeout to match getDippinVersion.
- Introduce typed CheckStatus and SuggestionKind so consumers can switch-exhaust. Constant values are unchanged.
- NDJSONWriter.Write now returns error. First write failure is still logged once to stderr; subsequent failures surface via return value.
- Rename NDJSONEvent -> StreamEvent (wire format unchanged).
- Move DoctorConfig.TrackerVersion/TrackerCommit off the config struct behind a WithVersionInfo functional option.
- Add LogWriter to DoctorConfig/DiagnoseConfig/AuditConfig. CLI sets it to io.Discard so users stop seeing stray library warnings from audit/list/diagnose on os.Stderr.
- sanitizeProviderError strips API keys and bearer tokens from provider error bodies before they land in CheckDetail.Message.
- Panic-recover the NDJSON handler closures (pipeline, agent, LLM trace) so a misbehaving writer cannot crash the caller goroutine.
- Stream activity.jsonl in Diagnose via bufio.Scanner (1 MB buffer limit), matching LoadActivityLog and avoiding memory spikes on large runs.
- Rename internal *Lib/*NF/*ForDiagnose helpers to plain names.
- CHANGELOG notes deterministic diagnose suggestion ordering (already alphabetical post-#101, documented as a visible change).

Closes #102, #103, #104, #105, #106, #109.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Promotes the diagnose, audit, doctor, and simulate CLI commands into the `tracker` library package as JSON-serializable reports.
- Lifts `cmd/tracker/json_stream.go` into `tracker.NewNDJSONWriter`, closing Phase 1 of feat: achieve CLI↔library feature parity #76.
- Each command returns a `*<Command>Report`; the CLI prints byte-for-byte identical output.

Closes Phase 1 (NDJSON) and Phase 2 of #76. Phase 3 (`DescribeNodes`) remains as a follow-up.

Test plan

- `go build ./...` clean
- `go test ./... -short -race`: all 17 packages pass
- `dippin doctor`: A grade on the three core example pipelines
- `tracker doctor` output byte-identical before/after (verified in 4 scenarios during Task 6)
- `tracker simulate` output byte-identical (verified in Task 3)
- Call `tracker.Diagnose(runDir)` and get a non-empty `DiagnoseReport`

🤖 Generated with Claude Code