feat: CLI-library feature parity (Phase 1 NDJSON + Phase 2) by clintecker · Pull Request #101 · 2389-research/tracker

clintecker · 2026-04-17T17:17:55Z

Summary

Promotes four CLI-private commands (diagnose, audit, doctor, simulate) into the public tracker library package as JSON-serializable reports.
Lifts the private NDJSON event writer out of cmd/tracker/json_stream.go into tracker.NewNDJSONWriter, closing Phase 1 of feat: achieve CLI↔library feature parity #76.
Splits data / presentation cleanly — library returns pure *<Command>Report; CLI prints byte-for-byte identical output.

Closes Phase 1 (NDJSON) and Phase 2 of #76. Phase 3 (DescribeNodes) remains as a follow-up.

Test plan

go build ./... clean
go test ./... -short -race all 17 packages pass
dippin doctor A grade on the three core example pipelines
Smoke-diff: tracker doctor output byte-identical before / after (verified in 4 scenarios during Task 6)
Smoke-diff: tracker simulate output byte-identical (verified in Task 3)
Library consumer smoke: can call tracker.Diagnose(runDir) and get a non-empty DiagnoseReport

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Public tracker library exposing structured, JSON-serializable reports for diagnose, audit, doctor and simulate, plus an NDJSON event writer and handler factories for streaming events.
Documentation
- README examples showing library usage and NDJSON streaming; new design/plan docs and changelog entry.
Refactor
- CLI commands simplified to call the library while preserving human and --json output byte-for-byte; doctor probing now enabled by default.
Tests
- Added unit tests and run fixtures covering diagnostics, auditing, simulation, doctor checks, and NDJSON output.

Design for promoting the diagnose, audit, doctor, and simulate CLI commands into the top-level tracker package as structured reports, plus lifting the private NDJSON event writer to a public API. Closes the Phase 1 NDJSON gap and the full Phase 2 set from #76. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Seven-task plan covering shared activity helpers, NDJSON writer promotion, Simulate / Diagnose / Audit / Doctor library APIs, and docs. TDD steps with fixture run directories; byte-for-byte CLI output preserved via smoke-diff at each task boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Move ResolveRunDir, MostRecentRunID, ActivityEntry, LoadActivityLog, ParseActivityLine into the tracker package. Prerequisite for exposing Diagnose / Audit library APIs. No behavior change; CLI wired to the library helpers. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Log warning instead of silent skip on non-ENOENT checkpoint load errors in MostRecentRunID (CLAUDE.md: never silently swallow errors). - Remove misleading JSON tags from ActivityEntry — the struct is populated via ParseActivityLine, not json.Unmarshal (two timestamp formats can't be handled by time.Time's default unmarshal). - Drop gratuitous type-alias comment in cmd/tracker/audit.go. - Add direct tests for SortActivityByTime ordering and equal timestamps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

NewNDJSONWriter exposes the same --json wire format the CLI uses via PipelineHandler / AgentHandler / TraceObserver. Closes the Phase 1 NDJSON gap from #76. CLI json_stream.go removed; run.go wires the library writer. Wire format unchanged byte-for-byte. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Log marshal errors to stderr (CLAUDE.md: never silently swallow). - Log first w.Write error to stderr via sync.Once; further write errors suppressed to avoid flooding. - Add test covering AgentHandler content priority (ToolOutput wins over Text when both are populated). - Extract duplicated timestamp layout to a package-level constant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tracker.Simulate(source) returns a SimulateReport with nodes, edges, execution plan, and unreachable node list. CLI simulate command becomes a printer over the report; output byte-identical. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- SimulateReport.Unreachable is now sorted alphabetically; eliminates non-deterministic output driven by Go map iteration order. - SimulateReport gains GraphAttrs; CLI reads from the report instead of re-parsing the pipeline just for graph-level attrs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tracker.Diagnose(runDir) and tracker.DiagnoseMostRecent(workdir) return a structured DiagnoseReport with failures, budget halt, and typed suggestions. CLI diagnose command shrunk to a printer over the report; output byte-identical. Fixture runs added under testdata/runs/ (ok, failed, budget_halted) for library unit tests. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- CLI: skip SuggestionBudget in the suggestions printer; printBudgetHalt already renders the budget guidance. Eliminates duplicated prose on budget-halted runs. - Library: log non-ENOENT status.json parse errors to stderr instead of swallowing silently (CLAUDE.md: never silently swallow errors). - Clarify NodeFailure.RetryCount doc comment — it counts failures, not retries beyond the first attempt. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tracker.Audit(runDir) returns a structured AuditReport (timeline, retries, errors, recommendations). tracker.ListRuns(workdir) returns []RunSummary sorted newest first. CLI audit command shrunk to a printer over the reports; output byte-identical. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Log non-ENOENT activity-log load errors in the run summary builder instead of silently discarding (CLAUDE.md: never silently swallow). - Remove redundant sort.Strings from printRecommendations; library already sorts the recommendations slice. - ListRuns now uses sort.SliceStable for deterministic ordering on timestamp ties. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

tracker.Doctor(cfg) runs preflight checks and returns a DoctorReport. ProbeProviders defaults to false (no network calls); CLI sets it true to preserve today's behavior. Write side effects (gitignore fixup, workdir create) stay CLI-only. Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- CLI --probe flag default flipped to true, preserving pre-refactor tracker doctor behavior (live API auth check runs by default). - TestDoctor_NoProviderKeys now unsets all five known provider env vars to avoid false negatives when the dev environment has GOOGLE_API_KEY or OPENAI_COMPAT_API_KEY set. - Artifact-dirs check now promotes Status to "error" when any detail is "error" (e.g., unwritable .ai/ directory); previously those cases were silently demoted to "warn" and the run proceeded. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-17T17:18:12Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Library-first refactor: promotes CLI logic into a public tracker package (Diagnose/Audit/Doctor/Simulate reports, run/activity utilities, NDJSON writer/handlers), updates CLI commands to delegate to these APIs, removes the private JSON stream, adds tests and fixtures, and updates docs/README/CHANGELOG and CI tooling.

Changes

Cohort / File(s)	Summary
Library Report APIs `tracker_diagnose.go`, `tracker_diagnose_test.go`, `tracker_audit.go`, `tracker_audit_test.go`, `tracker_doctor.go`, `tracker_doctor_test.go`, `tracker_simulate.go`, `tracker_simulate_test.go`	Add exported, JSON-serializable report types and entrypoints: `Diagnose`/`DiagnoseMostRecent`, `Audit`, `Doctor` (with config/report types), `Simulate`, plus suggestion/check models and tests.
Activity & Run Utilities `tracker_activity.go`, `tracker_activity_test.go`	Add run-resolution helpers (`ResolveRunDir`, `MostRecentRunID`) and activity JSONL parsing/loading/sorting (`ActivityEntry`, `LoadActivityLog`, `ParseActivityLine`, `SortActivityByTime`) with unit tests.
NDJSON Event Writer `tracker_events.go`, `tracker_events_test.go`	Introduce public `NDJSONWriter` (`NewNDJSONWriter`) and handler factories (`PipelineHandler`, `TraceObserver`, `AgentHandler`) for thread-safe NDJSON streaming and adapters; tests ported to package `tracker`.
CLI: diagnose & audit refactors `cmd/tracker/diagnose.go`, `cmd/tracker/audit.go`	Replace prior in-file parsing/diagnosis/audit logic with calls to `tracker.Diagnose`, `tracker.Audit`, `tracker.ListRuns`, and adapt printing to consume library report types (large deletions of local helpers).
CLI: doctor, flags & tests `cmd/tracker/doctor.go`, `cmd/tracker/doctor_test.go`, `cmd/tracker/flags.go`	Delegate preflight checks to `tracker.Doctor`, change `--probe` default to true, adapt output/gitignore handling, and refocus CLI tests on shim behavior and side-effects.
CLI: run & simulate `cmd/tracker/run.go`, `cmd/tracker/simulate.go`	Wire `tracker.NewNDJSONWriter` in place of removed private stream; call `tracker.Simulate` and render via `SimulateReport` types (remove local BFS helpers).
Removed Private Impl `cmd/tracker/json_stream.go`	Delete prior internal JSON stream implementation (replaced by public `tracker_events.go`).
Docs & Examples `CHANGELOG.md`, `README.md`, `docs/superpowers/...`	Add changelog/readme examples and design/plan/spec docs describing CLI↔library parity, NDJSON writer, and migration steps.
Test fixtures `testdata/runs/ok/`, `testdata/runs/failed/`, `testdata/runs/budget_halted/*`	Add checkpoint and activity.jsonl fixtures for success, failure, and budget-halt scenarios used by new tests.
CI / Tooling / Make / examples `.github/workflows/ci.yml`, `Makefile`, `go.mod`, `examples/human_gate_test_suite.dip`	Bump dippin in CI, skip one example in lint loop, update module deps, and rename two example attributes for human gates.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as User/CLI
    participant Pipeline as Pipeline
    participant LLM as LLM
    participant Agent as Agent
    participant Writer as tracker.NDJSONWriter
    participant Stdout as Stdout

    CLI->>Pipeline: start run / invoke tracker.Run
    Pipeline-->>Writer: emit pipeline events (stage_started/completed/failed)
    alt LLM traces generated
        LLM-->>Writer: emit trace events (provider, model, preview)
    end
    alt Agent activities occur
        Agent-->>Writer: emit agent events (tool output or text)
    end
    Writer->>Stdout: marshal NDJSONEvent -> write newline (thread-safe)
    Stdout-->>CLI: stream of NDJSON lines (--json)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

fix(cli): suppress library stderr warnings in tracker audit/list/diagnose #104 — Surfaces stderr warnings from audit/diagnose code; this PR moves those code paths into library functions that currently write warnings to stderr and may relate to making logging pluggable/suppressible.
feat: achieve CLI↔library feature parity #76 — Implements the CLI↔library parity and NDJSON writer features requested by this issue.
docs(tracker): library API stability, migration guide, godoc gaps #110 — Adds the exported API surface, README/CHANGELOG examples, and godoc-targeted symbols described in the issue.
refactor(tracker): library API polish before v1.0 #102 — Overlaps API-shape decisions (Doctor/Diagnose/Audit/Simulate and NDJSON writer signatures) that the issue requests to refine.

Possibly related PRs

fix: comprehensive tracker doctor checks with actionable errors #83 — Overlaps on doctor preflight checks and the migration of doctor logic to an exported Doctor API.
feat(tracker): library API for workflow catalog and run resolution (#76 Phase 1) #88 — Related run-resolution and workflow utilities; both PRs add run-resolution helpers and workflow parsing utilities.
feat: token & cost governance (library exposure + pipeline ceilings) #68 — Related budget/cost event shape and budget-exceeded diagnostics consumed by Diagnose and fixtures.

Poem

🐰 I hop through logs and code anew,

Reports and streams now public, too.
NDJSON lines that safely write,
Runs diagnosed by morning light.
Tests pass — carrots for the crew!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 37.58% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title is specific and accurately describes the main change: promoting CLI commands into the library as part of a feature parity effort across two phases.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/library-parity-phase-2

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42579425b4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-17T17:24:32Z

+		source = string(data)
+	}

+	report, err := tracker.Simulate(source)


Preserve requested pipeline format in simulate report parsing

runSimulateCmd already parsed the pipeline with loadPipeline(resolved, formatOverride), but this second parse calls tracker.Simulate(source) with no format context. That drops --format/extension-based resolution and can misclassify valid DOT input (for example DOT files with a leading comment/BOM), causing tracker simulate to fail after the first parse succeeded. This is a regression from the previous single-parse flow and breaks valid simulate inputs.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-04-17T17:24:32Z

+		if d.Status == "error" {
+			out.Status = "error"
+			break


Keep artifact-directory problems as doctor warnings

This promotion turns artifact-directory permission problems into Status="error", which now counts as a hard failure in cmd/tracker/doctor.go (toDoctorResult maps report errors directly to failures). Previously this check was non-required and surfaced as a warning, so tracker doctor could exit with warning-only status; after this change the same .ai issues now return exit code 1 and can fail CI/scripts unexpectedly.

Useful? React with 👍 / 👎.

coderabbitai

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cmd/tracker/diagnose.go (1)
11-192: ⚠️ Potential issue | 🟡 Minor

Run gofmt on this file before merge.

CI is currently red on cmd/tracker/diagnose.go, so this change set does not pass the repository’s format gate yet.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/diagnose.go` around lines 11 - 192, The file
cmd/tracker/diagnose.go is not gofmt'ed; run a formatter to fix
whitespace/formatting so CI passes. Fix by running `gofmt -w` (or `gofmt -w
cmd/tracker/diagnose.go` / `goimports -w`) on the file containing functions like
diagnoseMostRecent, printDiagnoseHeader, printNodeDiagnosis, and
printDiagnoseSuggestions to normalize import formatting, spacing, and line
breaks, then re-run CI.

🧹 Nitpick comments (8)

tracker_activity.go (1)

78-82: Library function writes directly to os.Stderr.

MostRecentRunID writes warnings to os.Stderr, which is unexpected for a library function. Library consumers may want to capture, suppress, or redirect these warnings. Consider returning warnings as part of the function signature or using a logger interface.

♻️ Alternative: Return warnings alongside the result

-func MostRecentRunID(workdir string) (string, error) {
+// MostRecentRunIDResult contains the result and any warnings from MostRecentRunID.
+type MostRecentRunIDResult struct {
+	RunID    string
+	Warnings []string
+}
+
+func MostRecentRunID(workdir string) (MostRecentRunIDResult, error) {
 	runsDir := filepath.Join(workdir, ".tracker", "runs")
 	entries, err := os.ReadDir(runsDir)
 	if err != nil {
 		if os.IsNotExist(err) {
-			return "", fmt.Errorf("no runs found — run a pipeline first")
+			return MostRecentRunIDResult{}, fmt.Errorf("no runs found — run a pipeline first")
 		}
-		return "", fmt.Errorf("cannot read runs directory: %w", err)
+		return MostRecentRunIDResult{}, fmt.Errorf("cannot read runs directory: %w", err)
 	}
 	var latestID string
 	var latestTime time.Time
+	var warnings []string
 	for _, e := range entries {
 		// ...
 		if err != nil {
 			if !os.IsNotExist(err) {
-				fmt.Fprintf(os.Stderr, "warning: skipping run %s: %v\n", e.Name(), err)
+				warnings = append(warnings, fmt.Sprintf("skipping run %s: %v", e.Name(), err))
 			}
 			continue
 		}
 		// ...
 	}
 	if latestID == "" {
-		return "", fmt.Errorf("no runs found with valid checkpoints")
+		return MostRecentRunIDResult{}, fmt.Errorf("no runs found with valid checkpoints")
 	}
-	return latestID, nil
+	return MostRecentRunIDResult{RunID: latestID, Warnings: warnings}, nil
 }

Alternatively, this is acceptable if the CLI is the primary consumer and you document the stderr behavior.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tracker_activity.go` around lines 78 - 82, The MostRecentRunID function
currently writes warnings directly to os.Stderr; change it to avoid
side-effectful stderr writes by either (A) adding a logger/io.Writer parameter
(e.g., logger or io.Writer warnOut) and writing warnings to that instead, or (B)
accumulate warning messages in a slice and return them alongside the result
(e.g., change signature to return (runID string, warnings []string, err error));
update the error handling block that currently calls fmt.Fprintf(os.Stderr, ...)
to append the formatted message to the warnings slice or send it to the injected
logger/writer, and update call sites accordingly.

tracker_doctor.go (1)

521-531: Use exec.CommandContext for consistency.

getBinaryVersion also uses exec.Command without context. Apply the same fix for consistency and to prevent potential hangs.

🔧 Proposed fix

-func getBinaryVersion(path, flag string) string {
-	out, err := exec.Command(path, flag).CombinedOutput()
+func getBinaryVersion(ctx context.Context, path, flag string) string {
+	out, err := exec.CommandContext(ctx, path, flag).CombinedOutput()
 	if err != nil {
 		return "(version unknown)"
 	}

Then update the caller at line 486:

-		claudeVer := getBinaryVersion(claudePath, "--version")
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		claudeVer := getBinaryVersion(ctx, claudePath, "--version")
+		cancel()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 521 - 531, getBinaryVersion currently calls
exec.Command without a context which risks hangs; change its signature to accept
a context (e.g., getBinaryVersion(ctx context.Context, path, flag string)
string) and use exec.CommandContext(ctx, path, flag). Keep the same error/return
behavior, and update all callers to pass the appropriate context (the calling
function's ctx) so the command inherits cancellation/timeouts; reference
getBinaryVersion and use exec.CommandContext for the actual replacement.

tracker_audit.go (1)

215-219: Library function writes warning to os.Stderr.

Similar to MostRecentRunID, buildLibRunSummary writes warnings directly to os.Stderr. For consistency with library design, consider an alternative approach (returning warnings, using a logger, or documenting the behavior).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tracker_audit.go` around lines 215 - 219, buildLibRunSummary currently writes
a warning directly to os.Stderr when LoadActivityLog fails (similar to
MostRecentRunID); instead return the warning or use the library's logger so
callers can handle it. Modify buildLibRunSummary to not print to os.Stderr: have
LoadActivityLog return the error as before, and propagate that error or append a
warning string to a returned warnings slice (or accept a logger interface) so
callers receive the warning; specifically update the block handling (activity,
lerr := LoadActivityLog(runDir)) to remove fmt.Fprintf(os.Stderr, ...) and
either return lerr up the call chain or record the message into a []string
warnings return value (or call a provided logger), ensuring callers of
buildLibRunSummary are updated to handle the new returned warning or error.

cmd/tracker/run.go (1)

242-245: Cache the agent NDJSON handler once per run.

stream.AgentHandler() is rebuilt per event inside the closure. Cache it once to avoid per-event allocation churn.

♻️ Proposed refactor

 func buildJSONEventHandlers(
 	activityLog *pipeline.JSONLEventHandler,
 	llmClient *llm.Client,
 	logAgentEvent func(agent.Event),
 ) (agent.EventHandler, pipeline.PipelineEventHandler) {
 	stream := tracker.NewNDJSONWriter(os.Stdout)
 	if llmClient != nil {
 		llmClient.AddTraceObserver(stream.TraceObserver())
 	}
+	streamAgentHandler := stream.AgentHandler()
 	agentHandler := agent.EventHandlerFunc(func(evt agent.Event) {
 		logAgentEvent(evt)
-		stream.AgentHandler().HandleEvent(evt)
+		streamAgentHandler.HandleEvent(evt)
 	})
 	pipelineHandler := pipeline.PipelineMultiHandler(stream.PipelineHandler(), activityLog)
 	return agentHandler, pipelineHandler
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/run.go` around lines 242 - 245, The closure rebuilds the NDJSON
handler on every event; cache stream.AgentHandler() once before creating
agentHandler to avoid per-event allocation. Retrieve and store the handler in a
local variable (e.g., cachedAgentHandler := stream.AgentHandler()) and then use
cachedAgentHandler.HandleEvent(evt) inside the agent.EventHandlerFunc closure;
keep existing calls to logAgentEvent(evt) and the agentHandler variable name.

tracker_doctor_test.go (1)

11-14: Prefer t.Setenv over manual os.Setenv + cleanup.

Go 1.17+ provides t.Setenv which automatically handles cleanup. This is cleaner and less error-prone.

♻️ Suggested simplification

 func TestDoctor_NoProbe_KeyPresent(t *testing.T) {
 	workdir := t.TempDir()
-	must(t, os.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890"))
-	t.Cleanup(func() { os.Unsetenv("ANTHROPIC_API_KEY") })
+	t.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor_test.go` around lines 11 - 14, The test
TestDoctor_NoProbe_KeyPresent manually calls os.Setenv and t.Cleanup to restore
ANTHROPIC_API_KEY; replace that pattern with t.Setenv("ANTHROPIC_API_KEY",
"sk-ant-test-12345678901234567890") in the TestDoctor_NoProbe_KeyPresent
function so the testing framework handles teardown automatically and remove the
manual os.Setenv and t.Cleanup calls.

cmd/tracker/audit.go (1)

95-105: Stale comment describes removed behavior.

The comment on lines 97-99 mentions "Re-read timestamps from original entries" and "rebuild from the timeline slice itself," but the code simply iterates and prints each entry. The comment seems to be a leftover from refactoring.

📝 Suggested comment cleanup

-	// Track stage starts for inline duration computation during printing.
 	stageStarts := make(map[string]time.Time)
-	// Re-read timestamps from original entries; TimelineEntry carries Duration
-	// for completed stages but we need start times for the printTimelineEntry format.
-	// Rebuild from the timeline slice itself.
 	for _, entry := range timeline {
 		printTimelineEntryFromLib(entry, stageStarts)
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/audit.go` around lines 95 - 105, The existing comment incorrectly
describes rebuilding timestamps from entries; update or remove it to match the
actual behavior: stageStarts := make(map[string]time.Time) is created and the
code simply iterates over timeline calling printTimelineEntryFromLib(entry,
stageStarts) and then printTimelineTotalDurationFromLib(timeline). Replace the
stale lines about "Re-read timestamps..." and "Rebuild from the timeline slice
itself" with a concise comment stating that stageStarts is used for inline
duration computation and each timeline entry is printed via
printTimelineEntryFromLib, or remove the comment entirely.

cmd/tracker/doctor_test.go (1)

245-253: Consider using strings.Contains instead of a custom helper.

The custom contains helper reimplements strings.Contains. Adding "strings" to imports is trivial and avoids maintaining custom code.

♻️ Suggested simplification

+import (
+	...
+	"strings"
+	...
+)

-// contains is a tiny helper to avoid dragging strings.Contains into tests.
-func contains(haystack, needle string) bool {
-	for i := 0; i+len(needle) <= len(haystack); i++ {
-		if haystack[i:i+len(needle)] == needle {
-			return true
-		}
-	}
-	return false
-}
+// Then use strings.Contains directly in tests

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/doctor_test.go` around lines 245 - 253, The custom contains
helper reimplements standard library functionality; remove the
contains(haystack, needle string) function and replace its uses with
strings.Contains by adding "strings" to the imports and updating call sites to
strings.Contains(haystack, needle); ensure all tests compile after removing the
helper and adjust any package-scoped references to contains accordingly.

cmd/tracker/simulate.go (1)

17-67: Dual parsing reads the pipeline source twice.

The function first parses via loadPipeline/loadEmbeddedPipeline (lines 25-32) for validation warnings, then re-reads the source and parses again via tracker.Simulate (lines 46-64). While the comments acknowledge this is intentional (library Simulate doesn't surface validation warnings), consider whether pipeline.ValidateAll could be moved to the library or if the source could be cached after the first read.

This is a minor performance consideration—acceptable for now since pipeline files are typically small.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/simulate.go` around lines 17 - 67, runSimulateCmd currently
reads/parses the pipeline twice: first via loadPipeline/loadEmbeddedPipeline so
pipeline.ValidateAll can emit warnings, then it re-reads the file/string and
calls tracker.Simulate; to avoid double reads either (A) reuse the
already-loaded source by capturing the original file contents when calling
loadPipeline/loadEmbeddedPipeline and pass that string into tracker.Simulate, or
(B) move the validation call into tracker.Simulate so you only call
tracker.Simulate once; update runSimulateCmd to stop re-reading the file when
isEmbedded/is not embedded after loading (referencing runSimulateCmd,
loadPipeline, loadEmbeddedPipeline, pipeline.ValidateAll, and tracker.Simulate).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 10-23: Add a new "### Fixed" subsection under the Unreleased
heading and list the behavior/determinism and error-handling fixes introduced by
this PR (e.g., deterministic ordering and improved error handling in
event/tracking flows), referencing the affected APIs and files so readers can
find the fixes—mention tracker.NewNDJSONWriter and the factory handlers
(PipelineHandler, AgentHandler, TraceObserver) if they gained deterministic
output, and call out CLI/library behavior fixes in cmd/tracker/diagnose.go,
audit.go, doctor.go, and simulate.go (and any specific fixes to run-directory
resolution or activity parsing like ResolveRunDir, MostRecentRunID,
LoadActivityLog, ParseActivityLine) as concise bullet points under the new "###
Fixed" header.

In `@cmd/tracker/doctor.go`:
- Around line 240-247: The initial "suffix := \"\\n\"" is ineffectual because
it's immediately overwritten; remove that dead assignment and instead declare
suffix once (e.g., var suffix string) then set it inside the existing if/else
that checks content and content[len(content)-1] in the same block, then append
strings.Join(toAppend, "\n") + "\n" to suffix before calling os.WriteFile with
append(content, []byte(suffix)...). Reference variables/functions: suffix,
content, toAppend, gitignorePath, os.WriteFile.

In `@docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md`:
- Around line 62-63: Replace the hard-coded package-count expectation string
"Expected: build succeeds, all 14 packages pass." (and any similar occurrences
like the later "15" or other numeric counts) with a non-stale verification
instruction such as "Expected: build succeeds and go test ./... passes" or
update the number to match the current PR summary (e.g., 17) if you must keep a
count; ensure you change every matching phrase in this doc (and the repeated
occurrence noted) so the plan no longer hard-codes package counts.

In `@docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md`:
- Around line 149-172: Update the documented Audit API types to match the
implementation: change AuditReport.Status to the actual enum values used in code
(use the exact strings the implementation uses instead of "completed" |
"in_progress" | "failed"), update RunSummary to match the richer shape the PR
implements (replace {RunID, StartedAt, Status, NodeCount} with the actual fields
used in the branch), and ensure any other struct fields in AuditReport (e.g.,
TimelineEntry, RetryRecord, ActivityError) and their JSON tags match the
implementation signatures; locate the declarations named AuditReport,
TimelineEntry, RetryRecord, ActivityError and RunSummary in the spec and edit
their types and documented status values to mirror the shipped contract exactly.

In `@tracker_audit.go`:
- Around line 1-3: The file has gofmt formatting issues; run the formatter and
save the file to fix whitespace/indentation (e.g., run `gofmt -w
tracker_audit.go`) so the package declaration and leading comments in package
tracker conform to gofmt rules; ensure no other formatting changes are left
unstaged before committing.
- Around line 119-132: The classifyStatus function only treats
"pipeline_completed" and "pipeline_failed" as terminal events, causing
budget-halted runs with a "budget_exceeded" activity to be misclassified as
success; update the switch inside classifyStatus (iterating activity[i].Type) to
include case "budget_exceeded": return "fail" so budget-exceeded terminal events
are treated as failures (or alternatively extend the terminal-event check to
include other terminating activity types if preferred).

In `@tracker_diagnose.go`:
- Around line 100-129: collectNodeFailuresLib and loadNodeFailureLib currently
swallow I/O and JSON errors; change their signatures to return errors (e.g.,
collectNodeFailuresLib returns (map[string]*NodeFailure, error) and
loadNodeFailureLib returns (*NodeFailure, error)), treat os.ReadDir/os.ReadFile
and json.Unmarshal failures as real errors to propagate up (only treat
explicitly optional files like missing activity.jsonl as ignorable), replace
fmt.Fprintf to stderr with returned errors, and update callers to
handle/aggregate and expose these errors so a partial/incomplete diagnosis is
not silently reported (also apply the same change pattern to the related helpers
referenced around lines 156-166).

In `@tracker_doctor_test.go`:
- Around line 34-47: The test TestDoctor_NoProviderKeys unsets provider env vars
directly which doesn't restore prior values; replace the os.Unsetenv loop with
calls to t.Setenv for each variable (e.g. t.Setenv("ANTHROPIC_API_KEY", "")
etc.) so the original environment is automatically restored after the test when
invoking Doctor with DoctorConfig{WorkDir: workdir, ProbeProviders: false};
update the loop that references ANTHROPIC_API_KEY, OPENAI_API_KEY,
GEMINI_API_KEY, GOOGLE_API_KEY, OPENAI_COMPAT_API_KEY accordingly.

In `@tracker_doctor.go`:
- Around line 365-380: The getDippinVersion function uses exec.Command which can
hang; switch to exec.CommandContext with a short context timeout: create a
context.WithTimeout (e.g., 2s) and call exec.CommandContext(ctx, path,
"--version") and if that fails try exec.CommandContext with "version"; make sure
to cancel the context (defer cancel()) before returning and preserve the same
output-parsing logic and fallback "(version unknown)" behavior in
getDippinVersion.
- Around line 692-718: checkDiskSpaceLib uses syscall.Statfs which is Unix-only
and breaks Windows builds; either restrict this implementation to non-Windows
using a build constraint or provide a Windows counterpart. Fix by moving this
function into a file with a //go:build !windows (or // +build !windows) header
(keeping the same function name checkDiskSpaceLib) and implement an alternative
checkDiskSpaceLib in a separate file for Windows (or use a cross-platform
library such as golang.org/x/sys/unix for Unix and golang.org/x/sys/windows or a
portable disk-space package for Windows) so the symbol checkDiskSpaceLib is
defined on all platforms and Windows builds no longer fail.

In `@tracker_events.go`:
- Around line 108-117: The NDJSONEvent is being stamped with time.Now() which
overwrites the original agent event time; change the timestamp assignment to
prefer evt.Timestamp formatted with ndjsonTimestampLayout and only use
time.Now() when evt.Timestamp.IsZero(). Update the construction of NDJSONEvent
(the Timestamp field) to format evt.Timestamp if non-zero, otherwise fall back
to time.Now().Format(ndjsonTimestampLayout), keeping all other fields (Source,
Type, NodeID, Provider, Model, ToolName, Content) unchanged.

In `@tracker_simulate.go`:
- Around line 164-168: The orphan nodes appended in simBFSNodeOrder are
non-deterministic because iterating graph.Nodes (a map) yields arbitrary order;
collect orphans into a separate slice (e.g., orphanNodes), sort that slice by a
stable key such as node.ID (or node.Name) using the existing sort package, then
append the sorted orphanNodes to ordered instead of appending directly from the
map; update simBFSNodeOrder to perform these steps so output and tests become
deterministic.

---

Outside diff comments:
In `@cmd/tracker/diagnose.go`:
- Around line 11-192: The file cmd/tracker/diagnose.go is not gofmt'ed; run a
formatter to fix whitespace/formatting so CI passes. Fix by running `gofmt -w`
(or `gofmt -w cmd/tracker/diagnose.go` / `goimports -w`) on the file containing
functions like diagnoseMostRecent, printDiagnoseHeader, printNodeDiagnosis, and
printDiagnoseSuggestions to normalize import formatting, spacing, and line
breaks, then re-run CI.

---

Nitpick comments:
In `@cmd/tracker/audit.go`:
- Around line 95-105: The existing comment incorrectly describes rebuilding
timestamps from entries; update or remove it to match the actual behavior:
stageStarts := make(map[string]time.Time) is created and the code simply
iterates over timeline calling printTimelineEntryFromLib(entry, stageStarts) and
then printTimelineTotalDurationFromLib(timeline). Replace the stale lines about
"Re-read timestamps..." and "Rebuild from the timeline slice itself" with a
concise comment stating that stageStarts is used for inline duration computation
and each timeline entry is printed via printTimelineEntryFromLib, or remove the
comment entirely.

In `@cmd/tracker/doctor_test.go`:
- Around line 245-253: The custom contains helper reimplements standard library
functionality; remove the contains(haystack, needle string) function and replace
its uses with strings.Contains by adding "strings" to the imports and updating
call sites to strings.Contains(haystack, needle); ensure all tests compile after
removing the helper and adjust any package-scoped references to contains
accordingly.

In `@cmd/tracker/run.go`:
- Around line 242-245: The closure rebuilds the NDJSON handler on every event;
cache stream.AgentHandler() once before creating agentHandler to avoid per-event
allocation. Retrieve and store the handler in a local variable (e.g.,
cachedAgentHandler := stream.AgentHandler()) and then use
cachedAgentHandler.HandleEvent(evt) inside the agent.EventHandlerFunc closure;
keep existing calls to logAgentEvent(evt) and the agentHandler variable name.

In `@cmd/tracker/simulate.go`:
- Around line 17-67: runSimulateCmd currently reads/parses the pipeline twice:
first via loadPipeline/loadEmbeddedPipeline so pipeline.ValidateAll can emit
warnings, then it re-reads the file/string and calls tracker.Simulate; to avoid
double reads either (A) reuse the already-loaded source by capturing the
original file contents when calling loadPipeline/loadEmbeddedPipeline and pass
that string into tracker.Simulate, or (B) move the validation call into
tracker.Simulate so you only call tracker.Simulate once; update runSimulateCmd
to stop re-reading the file when isEmbedded/is not embedded after loading
(referencing runSimulateCmd, loadPipeline, loadEmbeddedPipeline,
pipeline.ValidateAll, and tracker.Simulate).

In `@tracker_activity.go`:
- Around line 78-82: The MostRecentRunID function currently writes warnings
directly to os.Stderr; change it to avoid side-effectful stderr writes by either
(A) adding a logger/io.Writer parameter (e.g., logger or io.Writer warnOut) and
writing warnings to that instead, or (B) accumulate warning messages in a slice
and return them alongside the result (e.g., change signature to return (runID
string, warnings []string, err error)); update the error handling block that
currently calls fmt.Fprintf(os.Stderr, ...) to append the formatted message to
the warnings slice or send it to the injected logger/writer, and update call
sites accordingly.

In `@tracker_audit.go`:
- Around line 215-219: buildLibRunSummary currently writes a warning directly to
os.Stderr when LoadActivityLog fails (similar to MostRecentRunID); instead
return the warning or use the library's logger so callers can handle it. Modify
buildLibRunSummary to not print to os.Stderr: have LoadActivityLog return the
error as before, and propagate that error or append a warning string to a
returned warnings slice (or accept a logger interface) so callers receive the
warning; specifically update the block handling (activity, lerr :=
LoadActivityLog(runDir)) to remove fmt.Fprintf(os.Stderr, ...) and either return
lerr up the call chain or record the message into a []string warnings return
value (or call a provided logger), ensuring callers of buildLibRunSummary are
updated to handle the new returned warning or error.

In `@tracker_doctor_test.go`:
- Around line 11-14: The test TestDoctor_NoProbe_KeyPresent manually calls
os.Setenv and t.Cleanup to restore ANTHROPIC_API_KEY; replace that pattern with
t.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890") in the
TestDoctor_NoProbe_KeyPresent function so the testing framework handles teardown
automatically and remove the manual os.Setenv and t.Cleanup calls.

In `@tracker_doctor.go`:
- Around line 521-531: getBinaryVersion currently calls exec.Command without a
context which risks hangs; change its signature to accept a context (e.g.,
getBinaryVersion(ctx context.Context, path, flag string) string) and use
exec.CommandContext(ctx, path, flag). Keep the same error/return behavior, and
update all callers to pass the appropriate context (the calling function's ctx)
so the command inherits cancellation/timeouts; reference getBinaryVersion and
use exec.CommandContext for the actual replacement.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c37ecbbe-fa56-4fde-9b45-6d379db58ae3

📥 Commits

Reviewing files that changed from the base of the PR and between 76164b0 and 4257942.

📒 Files selected for processing (31)

CHANGELOG.md
README.md
cmd/tracker/audit.go
cmd/tracker/diagnose.go
cmd/tracker/doctor.go
cmd/tracker/doctor_test.go
cmd/tracker/flags.go
cmd/tracker/json_stream.go
cmd/tracker/run.go
cmd/tracker/simulate.go
docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md
docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md
testdata/runs/budget_halted/activity.jsonl
testdata/runs/budget_halted/checkpoint.json
testdata/runs/failed/Build/status.json
testdata/runs/failed/activity.jsonl
testdata/runs/failed/checkpoint.json
testdata/runs/ok/activity.jsonl
testdata/runs/ok/checkpoint.json
tracker_activity.go
tracker_activity_test.go
tracker_audit.go
tracker_audit_test.go
tracker_diagnose.go
tracker_diagnose_test.go
tracker_doctor.go
tracker_doctor_test.go
tracker_events.go
tracker_events_test.go
tracker_simulate.go
tracker_simulate_test.go

💤 Files with no reviewable changes (1)

cmd/tracker/json_stream.go

coderabbitai · 2026-04-17T17:28:59Z

+Expected: build succeeds, all 14 packages pass. If any package is red before you start, stop — resolve that first.
+


⚠️ Potential issue | 🟡 Minor

Avoid hard-coding package counts in the verification steps.

This plan already disagrees with itself (14 here vs 15 later), and the PR summary for this branch reports 17 passing packages. Please switch these expectations to “go test ./... passes” or update them to the current count so the plan does not go stale immediately.

Also applies to: 2619-2619

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md` around lines 62 - 63, Replace the hard-coded package-count expectation string "Expected: build succeeds, all 14 packages pass." (and any similar occurrences like the later "15" or other numeric counts) with a non-stale verification instruction such as "Expected: build succeeds and go test ./... passes" or update the number to match the current PR summary (e.g., 17) if you must keep a count; ensure you change every matching phrase in this doc (and the repeated occurrence noted) so the plan no longer hard-codes package counts.

- enrichFromEntryNF now uses the dual-format timestamp parser shared with ParseActivityLine; previously dropped all timestamps from logs using the millisecond format, zeroing NodeFailure.Duration and silently disabling the suspicious-timing suggestion heuristic. - CLI --probe help text updated: usage shows [--probe=false] since probe is now the default. - Tests use t.Setenv for Doctor env-var mutations so pre-existing values are restored on cleanup. - CheckResult.Status and CheckDetail.Status doc comments list the fifth value "hint" used for informational sub-items. CHANGELOG note added. - Diagnose and Audit doc comments warn that runDir is trusted input; point callers at ResolveRunDir / DiagnoseMostRecent for untrusted paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Fix CI format check — realign struct field tags and trim a trailing blank line that slipped in with the Phase 2 work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 5

♻️ Duplicate comments (3)

tracker_doctor.go (2)
365-369: ⚠️ Potential issue | 🟠 Major

Use exec.CommandContext with timeout for external binary version calls.

Line 366/368 and Line 522 call external binaries without context; an unresponsive executable can hang Doctor indefinitely.
#!/bin/bash
set -euo pipefail

# Verify direct exec.Command usages in this file
rg -nP '\bexec\.Command\(' tracker_doctor.go

# Show the two helper functions for targeted patching
rg -n 'func getDippinVersion|func getBinaryVersion' tracker_doctor.go -A25
Also applies to: 521-523
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 365 - 369, Replace direct exec.Command calls
in getDippinVersion and getBinaryVersion with exec.CommandContext using a
context with a reasonable timeout (e.g., context.WithTimeout) to avoid hanging
if the external binary is unresponsive; create ctx, defer cancel, call
exec.CommandContext(ctx, ...).CombinedOutput(), and handle context deadline
exceeded or other errors appropriately when logging/returning the version from
those functions.
693-696: ⚠️ Potential issue | 🔴 Critical

syscall.Statfs is Unix-only; this implementation is not cross-platform.

Line 695 uses syscall.Statfs, which is unavailable on Windows builds. This can break compilation unless this function is split behind OS build tags.
#!/bin/bash
set -euo pipefail

echo "Statfs usage:"
rg -n 'syscall\.Statfs|Statfs_t' --type go

echo
echo "Build tags near doctor files:"
fd -e go | rg 'doctor' | while read -r f; do
  head -n 3 "$f" | sed "s|^|$f: |"
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 693 - 696, The checkDiskSpaceLib function
uses syscall.Statfs (Statfs_t) which is Unix-only and will fail Windows builds;
split the implementation by moving the current checkDiskSpaceLib into a file
with a non-Windows build tag (e.g., //go:build !windows) and create a
Windows-specific stub file (with //go:build windows) that defines
checkDiskSpaceLib and returns an appropriate CheckResult (e.g., "unsupported on
windows" or a safe default) so the symbol exists on all platforms and
compilation succeeds; reference the symbols checkDiskSpaceLib, syscall.Statfs,
and Statfs_t when locating and splitting the code.
tracker_diagnose.go (1)
106-110: ⚠️ Potential issue | 🟠 Major

Don’t silently downgrade unreadable or corrupt artifacts into a “clean” report.

Lines 110, 127, 134-135, 166, and 183-184 currently collapse artifact I/O / JSON failures into “no failures” or “no budget halt”. For a public library API, that produces an apparently valid DiagnoseReport with missing retry/timing data and no indication that diagnosis was incomplete. Only genuinely optional files should be ignored; the other read/parse failures need to be returned or surfaced on the report.

Also applies to: 123-135, 162-166, 183-184
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_diagnose.go` around lines 106 - 110, collectNodeFailuresLib (and
related readers used by collectNodeFailuresLib) currently swallows filesystem
and JSON parse errors and returns an empty map, which hides incomplete
diagnosis; modify collectNodeFailuresLib to surface I/O and parse failures
instead of treating them as “no failures” — either change its signature to
return (map[string]*NodeFailure, error) or attach error details to the resulting
DiagnoseReport (e.g., an Errors/Incomplete field) so callers can detect
incomplete runs; update any callers (and the creation of DiagnoseReport) to
propagate and/or include those errors (reference symbols:
collectNodeFailuresLib, NodeFailure, DiagnoseReport) and ensure only truly
optional files continue to be ignored.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tracker_diagnose.go`:
- Around line 200-205: When parsing timestamps for stage events, the code
currently stores a zero time when parseActivityTimestampLib fails and later uses
it for duration math; modify the logic so that in the "stage_started" branch you
only set stageStarts[entry.NodeID] when parseActivityTimestampLib returned a
successful timestamp (check the boolean/err result), and in
updateFailureTimingNF (the function called for "stage_failed") guard the
duration calculation by verifying both the start from stageStarts and the end ts
are non-zero (or otherwise valid) before subtracting to compute duration; this
prevents incorrect durations derived from time.Time{} defaults.
- Around line 95-101: DiagnoseMostRecent currently calls MostRecentRunID which
prints warnings to os.Stderr, leaking CLI output from a library function; change
DiagnoseMostRecent to call a non-printing variant (e.g., MostRecentRunIDQuiet)
or adjust MostRecentRunID to accept a suppressWarnings bool or an io.Writer for
diagnostics so it doesn't write to os.Stderr by default, and propagate any
skipped-directory warnings back as structured data (e.g., return an error type
or a slice of warning strings) instead of printing; update DiagnoseMostRecent to
use the quiet variant/signature and handle returned warnings or errors
accordingly so the public API emits no unsolicited stderr output.

In `@tracker_doctor.go`:
- Around line 217-233: The provider loop correctly appends CheckDetail entries
with Status "error" for invalid/auth-failed keys (see isValidAPIKey,
probeProvider, and out.Details), but later logic force-sets the section summary
to "ok" when any provider is configured; update the summary computation (the
code around where section/status is set after the loop) to scan out.Details (or
the provider-specific slice) and set the section's Status to "error" if any
CheckDetail has Status == "error" (otherwise "ok"), so the DoctorReport counts
reflect the detail-level errors instead of always becoming "ok".
- Around line 24-27: The PinnedDippinVersion constant is out of date; update the
declaration of PinnedDippinVersion to match the go.mod requirement by changing
its value to "v0.19.1" so Doctor's version check uses the correct pinned version
(look for the const PinnedDippinVersion = ... in tracker_doctor.go).
- Around line 561-583: The section unconditionally sets out.Status = "ok" even
after adding warn CheckDetail entries (from the workdir home check and
missingGitignoreEntries), which hides warnings; update the end of the function
to compute and set out.Status and out.Message based on the details (or at
minimum set out.Status = "warn" when either the home/workdir check or
missingGitignoreEntries produced a warn), e.g. inspect the details slice for any
CheckDetail with Status == "warn" (references: missingGitignoreEntries, the
workdir home check that appends a warn detail, and the final
out.Status/out.Message assignments) and promote the section status to "warn" so
warnings are reflected in DoctorReport.

---

Duplicate comments:
In `@tracker_diagnose.go`:
- Around line 106-110: collectNodeFailuresLib (and related readers used by
collectNodeFailuresLib) currently swallows filesystem and JSON parse errors and
returns an empty map, which hides incomplete diagnosis; modify
collectNodeFailuresLib to surface I/O and parse failures instead of treating
them as “no failures” — either change its signature to return
(map[string]*NodeFailure, error) or attach error details to the resulting
DiagnoseReport (e.g., an Errors/Incomplete field) so callers can detect
incomplete runs; update any callers (and the creation of DiagnoseReport) to
propagate and/or include those errors (reference symbols:
collectNodeFailuresLib, NodeFailure, DiagnoseReport) and ensure only truly
optional files continue to be ignored.

In `@tracker_doctor.go`:
- Around line 365-369: Replace direct exec.Command calls in getDippinVersion and
getBinaryVersion with exec.CommandContext using a context with a reasonable
timeout (e.g., context.WithTimeout) to avoid hanging if the external binary is
unresponsive; create ctx, defer cancel, call exec.CommandContext(ctx,
...).CombinedOutput(), and handle context deadline exceeded or other errors
appropriately when logging/returning the version from those functions.
- Around line 693-696: The checkDiskSpaceLib function uses syscall.Statfs
(Statfs_t) which is Unix-only and will fail Windows builds; split the
implementation by moving the current checkDiskSpaceLib into a file with a
non-Windows build tag (e.g., //go:build !windows) and create a Windows-specific
stub file (with //go:build windows) that defines checkDiskSpaceLib and returns
an appropriate CheckResult (e.g., "unsupported on windows" or a safe default) so
the symbol exists on all platforms and compilation succeeds; reference the
symbols checkDiskSpaceLib, syscall.Statfs, and Statfs_t when locating and
splitting the code.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1bf50865-4416-4313-8ec5-c8182d38442d

📥 Commits

Reviewing files that changed from the base of the PR and between 4257942 and fef3871.

📒 Files selected for processing (6)

CHANGELOG.md
cmd/tracker/flags.go
tracker_audit.go
tracker_diagnose.go
tracker_doctor.go
tracker_doctor_test.go

✅ Files skipped from review due to trivial changes (1)

tracker_audit.go

🚧 Files skipped from review as they are similar to previous changes (2)

cmd/tracker/flags.go
CHANGELOG.md

coderabbitai · 2026-04-17T18:33:27Z

+// PinnedDippinVersion is the dippin-lang version from go.mod.
+// Keep in sync with the require line in go.mod.
+const PinnedDippinVersion = "v0.18.0"
+


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Pinned constant:" rg -n 'PinnedDippinVersion' tracker_doctor.go echo echo "go.mod dippin-lang requires:" rg -n 'github.com/2389-research/dippin-lang' go.mod

Repository: 2389-research/tracker

Length of output: 790

Update PinnedDippinVersion constant to match go.mod requirement.

PinnedDippinVersion is hardcoded to v0.18.0 but go.mod requires v0.19.1. This causes Doctor to report false version mismatches even when dippin is correctly installed at the pinned version.

Update line 26 to const PinnedDippinVersion = "v0.19.1".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tracker_doctor.go` around lines 24 - 27, The PinnedDippinVersion constant is out of date; update the declaration of PinnedDippinVersion to match the go.mod requirement by changing its value to "v0.19.1" so Doctor's version check uses the correct pinned version (look for the const PinnedDippinVersion = ... in tracker_doctor.go).

coderabbitai · 2026-04-17T18:33:27Z

+		if !isValidAPIKey(p.name, key) {
+			out.Details = append(out.Details, CheckDetail{
+				Status:  "error",
+				Message: fmt.Sprintf("%-15s %s=%s (invalid format)", p.name, envName, masked),
+				Hint:    fmt.Sprintf("%s keys should match expected format — run `tracker setup`", p.name),
+			})
+			continue
+		}
+		if probe && p.buildAdapter != nil {
+			authOk, authMsg := probeProvider(p, key)
+			if !authOk {
+				out.Details = append(out.Details, CheckDetail{
+					Status:  "error",
+					Message: fmt.Sprintf("%-15s %s=%s (auth failed: %s)", p.name, envName, masked, authMsg),
+					Hint:    fmt.Sprintf("your %s key is invalid or expired — export a fresh key or run `tracker setup`", p.name),
+				})
+				continue


⚠️ Potential issue | 🟠 Major

Provider check can return overall ok while containing provider error details.

At Line 219/229, invalid/auth-failed providers are marked as error in details, but Line 275 forces section status to ok whenever at least one provider is configured. This under-reports health severity in DoctorReport summary counts.

Suggested fix

func checkProvidersLib(probe bool) CheckResult { out := CheckResult{Name: "LLM Providers"} var configuredNames []string var missingNames []string + hasProviderErrors := false @@ if !isValidAPIKey(p.name, key) { out.Details = append(out.Details, CheckDetail{ Status: "error", Message: fmt.Sprintf("%-15s %s=%s (invalid format)", p.name, envName, masked), Hint: fmt.Sprintf("%s keys should match expected format — run `tracker setup`", p.name), }) + hasProviderErrors = true continue } @@ if !authOk { out.Details = append(out.Details, CheckDetail{ Status: "error", Message: fmt.Sprintf("%-15s %s=%s (auth failed: %s)", p.name, envName, masked, authMsg), Hint: fmt.Sprintf("your %s key is invalid or expired — export a fresh key or run `tracker setup`", p.name), }) + hasProviderErrors = true continue } @@ - out.Status = "ok" + if hasProviderErrors { + out.Status = "warn" + } else { + out.Status = "ok" + }

Also applies to: 275-281

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tracker_doctor.go` around lines 217 - 233, The provider loop correctly appends CheckDetail entries with Status "error" for invalid/auth-failed keys (see isValidAPIKey, probeProvider, and out.Details), but later logic force-sets the section summary to "ok" when any provider is configured; update the summary computation (the code around where section/status is set after the loop) to scan out.Details (or the provider-specific slice) and set the section's Status to "error" if any CheckDetail has Status == "error" (otherwise "ok"), so the DoctorReport counts reflect the detail-level errors instead of always becoming "ok".

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

tracker_audit.go (1)

124-131: ⚠️ Potential issue | 🟠 Major

Handle budget_exceeded as a failure terminal event.

At Line 126, classifyStatus only recognizes pipeline_completed and pipeline_failed. Runs halted by budget limits can still be terminal failures and are currently at risk of being reported as "success" via the fallback path.

🔧 Proposed fix

 func classifyStatus(cp *pipeline.Checkpoint, activity []ActivityEntry) string {
 	for i := len(activity) - 1; i >= 0; i-- {
 		switch activity[i].Type {
 		case "pipeline_completed":
 			return "success"
 		case "pipeline_failed":
+			return "fail"
+		case "budget_exceeded":
 			return "fail"
 		}
 	}
 	if cp.CurrentNode != "" {
 		return "fail"
 	}
 	return "success"
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tracker_audit.go` around lines 124 - 131, The classifyStatus function
currently only treats "pipeline_completed" as success and "pipeline_failed" as a
terminal failure; add handling for budget-exceeded terminal events by adding a
case for activity[i].Type == "budget_exceeded" that returns "fail" so runs
halted by budget limits are classified as failures (update the switch in
classifyStatus that iterates ActivityEntry entries to include "budget_exceeded"
alongside "pipeline_failed").

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tracker_audit.go`:
- Around line 220-223: In buildLibRunSummary replace the direct write to
os.Stderr with returning structured warning information: remove the
fmt.Fprintf(os.Stderr...) call where activity, lerr := LoadActivityLog(runDir)
is handled, keep activity = nil for continuing, and add a warnings/errs return
value (e.g., []error or []string) or embed a Warnings field on the returned
summary type so callers (CLI) can decide how to present the message; ensure
LoadActivityLog error (lerr) and contextual info (name, runDir) are packaged
into that returned warning instead of printing.

---

Duplicate comments:
In `@tracker_audit.go`:
- Around line 124-131: The classifyStatus function currently only treats
"pipeline_completed" as success and "pipeline_failed" as a terminal failure; add
handling for budget-exceeded terminal events by adding a case for
activity[i].Type == "budget_exceeded" that returns "fail" so runs halted by
budget limits are classified as failures (update the switch in classifyStatus
that iterates ActivityEntry entries to include "budget_exceeded" alongside
"pipeline_failed").

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6a20e61b-f0e0-4cfb-a587-65448d175159

📥 Commits

Reviewing files that changed from the base of the PR and between fef3871 and 268f95b.

📒 Files selected for processing (2)

cmd/tracker/diagnose.go
tracker_audit.go

🚧 Files skipped from review as they are similar to previous changes (1)

cmd/tracker/diagnose.go

coderabbitai · 2026-04-17T18:36:36Z

+	activity, lerr := LoadActivityLog(runDir)
+	if lerr != nil {
+		fmt.Fprintf(os.Stderr, "warning: run %s: cannot read activity log: %v\n", name, lerr)
+		activity = nil // continue with nil so the summary still builds


⚠️ Potential issue | 🟠 Major

Avoid writing to os.Stderr from library code paths.

At Line 222, buildLibRunSummary emits a warning directly to stderr. This introduces presentation side effects in a library API and undermines the data/presentation separation goal.

A better pattern is to return structured warning/error information up to the caller (CLI) and let the CLI decide whether/how to print.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tracker_audit.go` around lines 220 - 223, In buildLibRunSummary replace the direct write to os.Stderr with returning structured warning information: remove the fmt.Fprintf(os.Stderr...) call where activity, lerr := LoadActivityLog(runDir) is handled, keep activity = nil for continuing, and add a warnings/errs return value (e.g., []error or []string) or embed a Warnings field on the returned summary type so callers (CLI) can decide how to present the message; ensure LoadActivityLog error (lerr) and contextual info (name, runDir) are packaged into that returned warning instead of printing.

Copilot

Pull request overview

This PR moves key CLI-only features (diagnose/audit/doctor/simulate + NDJSON --json stream writer) into the public tracker library as JSON-serializable report APIs, and refactors the CLI commands into thin presentation layers over those library APIs to achieve Phase 1 (NDJSON) + Phase 2 parity from #76.

Changes:

Added public library APIs: Simulate, Diagnose/DiagnoseMostRecent, Audit, ListRuns, Doctor, and shared run/activity helpers (ResolveRunDir, MostRecentRunID, LoadActivityLog, etc.).
Promoted the CLI NDJSON writer into tracker.NewNDJSONWriter with handler factories for pipeline/agent/LLM trace events; deleted the old CLI-private implementation.
Updated CLI commands to call the library and print the same outputs; added tests, fixtures, and docs/changelog entries.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
tracker_simulate.go	New library `Simulate` API and report types (graph introspection + BFS plan).
tracker_simulate_test.go	Unit tests for `SimulateReport` fields (nodes/edges/plan/unreachable/attrs).
tracker_events.go	New public NDJSON wire type + `NDJSONWriter` with handler factories.
tracker_events_test.go	Tests for NDJSON writer correctness, stable tags, and concurrency safety.
tracker_activity.go	New shared helpers for run-dir resolution and activity.jsonl parsing/sorting.
tracker_activity_test.go	Tests for run-dir resolution and activity parsing/timestamp formats.
tracker_diagnose.go	New library `Diagnose` / `DiagnoseMostRecent` and structured failure/suggestion reporting.
tracker_diagnose_test.go	Fixture-driven tests for clean/failing/budget-halted run diagnosis.
tracker_audit.go	New library `Audit` + `ListRuns` APIs and structured audit/timeline summaries.
tracker_audit_test.go	Tests for audit status, timeline, durations, retries, and listing order.
tracker_doctor.go	New library `Doctor` API (env/provider/binary/workdir/artifact/disk/pipeline checks).
tracker_doctor_test.go	Tests for `Doctor` behavior (probe opt-in, provider detection, pipeline validation).
cmd/tracker/run.go	CLI JSON event mode now uses `tracker.NewNDJSONWriter` handlers.
cmd/tracker/json_stream.go	Removed CLI-private NDJSON stream writer (superseded by library).
cmd/tracker/simulate.go	CLI simulate now prints from `tracker.SimulateReport` (plus legacy validation warnings).
cmd/tracker/diagnose.go	CLI diagnose now calls `tracker.Diagnose*` and prints the report.
cmd/tracker/audit.go	CLI audit/list now calls `tracker.Audit` / `tracker.ListRuns` and prints the report.
cmd/tracker/doctor.go	CLI doctor now calls `tracker.Doctor`, prints results, and applies CLI-only fixups.
cmd/tracker/doctor_test.go	Reduced scope to CLI shim behavior (flags/exit codes/printing/fixups).
cmd/tracker/flags.go	Doctor `--probe` default changed to true; usage updated.
testdata/runs/ok/checkpoint.json	New run fixture for successful run.
testdata/runs/ok/activity.jsonl	New run fixture activity log for successful run.
testdata/runs/failed/checkpoint.json	New run fixture for failed run with retries.
testdata/runs/failed/activity.jsonl	New run fixture activity log for failed run.
testdata/runs/failed/Build/status.json	New per-node status fixture for failed node.
testdata/runs/budget_halted/checkpoint.json	New run fixture for budget-halted run.
testdata/runs/budget_halted/activity.jsonl	New run fixture activity log for budget halt.
docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md	New design/spec doc describing the parity phases and APIs.
README.md	Added library usage examples for diagnosing runs and streaming NDJSON.
CHANGELOG.md	Documented newly added library APIs and CLI refactor.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T18:37:04Z

+	if err != nil {
+		return "", err
+	}
+	return filepath.Join(runsDir, matched), nil


ResolveRunDir's docstring says it returns an absolute path, but the implementation just filepath.Join's workdir/.tracker/runs/... and will return a relative path when workdir is relative (e.g. "."). Either call filepath.Abs on the result (and return that), or update the docstring to match actual behavior.

Suggested change

return filepath.Join(runsDir, matched), nil

runDir := filepath.Join(runsDir, matched)

absRunDir, err := filepath.Abs(runDir)

if err != nil {

return "", fmt.Errorf("cannot resolve absolute run directory path: %w", err)

}

return absRunDir, nil

Copilot · 2026-04-17T18:37:05Z

+	for _, node := range graph.Nodes {
+		if !visited[node.ID] {
+			ordered = append(ordered, node)
+		}
+	}


simBFSNodeOrder appends orphan nodes by ranging over graph.Nodes (a map), which makes the resulting SimulateReport.Nodes ordering nondeterministic when there are multiple unreachable/orphan nodes. For a stable library API (and stable CLI output), consider appending orphans in a deterministic order (e.g. by graph.NodeOrder when present, or by sorting orphan IDs).

Suggested change

for _, node := range graph.Nodes {

if !visited[node.ID] {

ordered = append(ordered, node)

}

}

var orphanIDs []string

for id := range graph.Nodes {

if !visited[id] {

orphanIDs = append(orphanIDs, id)

}

}

sort.Strings(orphanIDs)

for _, id := range orphanIDs {

ordered = append(ordered, graph.Nodes[id])

}

Copilot · 2026-04-17T18:37:05Z

+	path := filepath.Join(runDir, "activity.jsonl")
+	data, err := os.ReadFile(path)
+	if err != nil {
+		return nil
+	}


Diagnose reads the entire activity.jsonl into memory (os.ReadFile) and then strings.Split's it, which can be expensive / high-memory for large runs. Consider streaming the file (bufio.Scanner) similar to LoadActivityLog, so library consumers can safely diagnose very large activity logs.

Copilot · 2026-04-17T18:37:05Z

+	if !strings.HasSuffix(pipelineFile, ".dip") {
+		out.Details = append(out.Details, CheckDetail{
+			Status:  "warn",
+			Message: fmt.Sprintf("%s is not a .dip file — may not be a valid pipeline", pipelineFile),
+		})


checkPipelineFileLib warns for any non-.dip file, but DoctorConfig/PipelineFile is described as supporting both .dip and .dot. As written, passing a valid .dot will always produce a warning detail ("not a .dip file"). Consider only warning when the extension is neither .dip nor .dot, or adjust the message to reflect that .dot is supported (even if deprecated).

Copilot · 2026-04-17T18:37:05Z

+	report, err := tracker.Simulate(source)
+	if err != nil {
+		return err


runSimulateCmd respects formatOverride when parsing via loadPipeline, but the library report path always calls tracker.Simulate(source) which auto-detects format from content and ignores formatOverride. This can lead to mismatched validation vs printed report (and potentially different behavior) when --format is used. Consider generating the SimulateReport from the already-parsed graph, or adding a library API that accepts an explicit format.

Copilot · 2026-04-17T18:37:06Z

+	// Write-side-effect fix-ups must not run inside the library. The
+	// Working Directory check emits a gitignore hint in its details;
+	// patch the file here now that the user has seen the warning.
+	maybeFixGitignore(report, workdir)


maybeFixGitignore/checkGitignore will now modify an existing .gitignore automatically whenever the library emits a missing-entry warning. This is a potentially surprising write side effect (and the Phase 2 design doc describes this as a prompt-driven fix-up). Consider gating this behind an explicit user confirmation / flag, or update the docs to reflect the new non-interactive behavior.

Copilot · 2026-04-17T18:37:06Z

+```go
+type AuditReport struct {
+    RunID           string          `json:"run_id"`
+    Status          string          `json:"status"`          // "completed" | "in_progress" | "failed"


The design doc's AuditReport.Status comment lists values "completed" | "in_progress" | "failed", but the implementation (tracker_audit.go) uses "success" and "fail" (and ListRuns/Audit tests assert those). Please update this doc snippet to match the actual status strings so library consumers aren't misled.

Suggested change

Status string `json:"status"` // "completed" | "in_progress" | "failed"

Status string `json:"status"` // "success" | "in_progress" | "fail"

Copilot · 2026-04-17T18:37:06Z

+- `tracker.Diagnose(runDir)` / `tracker.DiagnoseMostRecent(workDir)` — structured `*DiagnoseReport` with node failures, budget halt, and typed suggestions (`Kind: "retry_pattern" | "escalate_limit" | "no_output" | "shell_command" | "go_test" | "suspicious_timing" | "budget"`).
+- `tracker.Audit(runDir)` — structured `*AuditReport` with timeline, retries, errors, and recommendations.
+- `tracker.ListRuns(workDir)` — sorted `[]RunSummary` for enumerating past runs (newest first).
+- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has five values: `"ok"`, `"warn"`, `"error"`, `"skip"`, and `"hint"` (informational sub-items such as optional providers not configured).


CHANGELOG entry says CheckDetail.Status supports "skip", but the code only uses/defines detail statuses like ok/warn/error/hint (and there are no "skip" detail emitters). Either remove "skip" from the CheckDetail.Status list in the changelog or implement/define what a skipped detail line should mean.

Suggested change

- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has five values: `"ok"`, `"warn"`, `"error"`, `"skip"`, and `"hint"` (informational sub-items such as optional providers not configured).

- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has four values: `"ok"`, `"warn"`, `"error"`, and `"hint"` (informational sub-items such as optional providers not configured).

- go.mod: upgrade github.com/2389-research/dippin-lang v0.19.1 → v0.20.0 - CI: install dippin CLI v0.20.0 (was v0.10.0 — 10 minor versions stale) - examples/human_gate_test_suite.dip: rename "default_choice:" to "default:" (the real IR field name on HumanConfig). - Makefile lint: skip human_gate_test_suite.dip with a note — the file exercises "timeout:" and "timeout_action:" on human nodes, which are tracker-level attrs but are not yet in the dippin-lang HumanConfig IR. v0.10.0 accepted unknown fields silently; v0.20.0 rejects them. See follow-up upstream issue. All local checks pass: fmt-check, vet, build, test-short, test-race, coverage, lint, doctor (ask_and_execute / build_product / build_product_with_superspec all A/100). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

Makefile (1)
136-140: Make the lint skip explicitly temporary to avoid a long-term CI blind spot.

Line 136 introduces a hard skip in a target that CI executes. Consider adding an expiry mechanism (e.g., version gate or tracked TODO/issue reference) so this file is automatically brought back under lint once dippin-lang IR supports timeout_action.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 136 - 140, The Makefile currently hard-skips
examples/human_gate_test_suite.dip inside the case "$$f" block which creates a
permanent CI blind spot; change the skip so it is explicitly temporary by gating
it on a clearly named env var or feature flag (e.g., LINT_ALLOW_HUMAN_TIMEOUT)
or by adding a TODO/issue reference and an expiry/version gate checked by CI.
Concretely, replace the unconditional skip for
examples/human_gate_test_suite.dip in the case "$$f" branch with a conditional
that only skips when the temp flag is set (or until a specified version/date),
and update the echo message to include the TODO/issue identifier so
CI/maintainers can track and remove the skip when dippin-lang IR supports
timeout_action.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@Makefile`:
- Around line 136-140: The Makefile currently hard-skips
examples/human_gate_test_suite.dip inside the case "$$f" block which creates a
permanent CI blind spot; change the skip so it is explicitly temporary by gating
it on a clearly named env var or feature flag (e.g., LINT_ALLOW_HUMAN_TIMEOUT)
or by adding a TODO/issue reference and an expiry/version gate checked by CI.
Concretely, replace the unconditional skip for
examples/human_gate_test_suite.dip in the case "$$f" branch with a conditional
that only skips when the temp flag is set (or until a specified version/date),
and update the echo message to include the TODO/issue identifier so
CI/maintainers can track and remove the skip when dippin-lang IR supports
timeout_action.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 28c850cd-97f0-43b1-9ea1-4dc8a3390c99

📥 Commits

Reviewing files that changed from the base of the PR and between 268f95b and d178f88.

⛔ Files ignored due to path filters (1)

go.sum is excluded by !**/*.sum

📒 Files selected for processing (4)

.github/workflows/ci.yml
Makefile
examples/human_gate_test_suite.dip
go.mod

✅ Files skipped from review due to trivial changes (2)

.github/workflows/ci.yml
go.mod

Applies fixes from CodeRabbit / Copilot / Codex review comments: - PinnedDippinVersion updated to v0.20.0 (was v0.18.0, mismatched go.mod) - checkDiskSpaceLib split to tracker_doctor_unix.go / _windows.go — syscall.Statfs is Unix-only and breaks Windows builds - classifyStatus now returns "fail" for budget_exceeded events (budget-halted runs were mis-classified as success) - checkProvidersLib propagates individual provider error details to section Status (was always "ok" when any provider configured) - checkWorkdirLib propagates warn details to section-level Status - getDippinVersion uses exec.CommandContext with 5s timeout (noctx lint fix) - NDJSONWriter.AgentHandler preserves agent.Event.Timestamp; falls back to time.Now() only when zero - simBFSNodeOrder collects orphan nodes into a sorted slice before appending (deterministic output) - ResolveRunDir calls filepath.Abs to satisfy its "returns absolute path" contract - MostRecentRunID no longer writes to os.Stderr; invalid checkpoints silently skipped - checkPipelineFileLib no longer warns on .dot files (both .dip and .dot are valid) - Fixed ineffectual suffix := "\n" assignment in cmd/tracker/doctor.go maybeFixGitignore - enrichFromEntryNF/updateFailureTimingNF guard against zero start timestamps in duration math - CHANGELOG: added ### Fixed section; removed "skip" from CheckDetail.Status values (not a real status) Skipped (noted in PR summary or tracked separately): - Copilot: stream activity.jsonl via bufio.Scanner in Diagnose (non-trivial refactor, out of scope) - Copilot: gate maybeFixGitignore behind user confirmation (behavior change, needs design decision) - CodeRabbit: buildLibRunSummary stderr write — minor, would require API surface change - CodeRabbit tracker_diagnose.go:135 — collectNodeFailuresLib/loadNodeFailureLib error propagation (larger refactor, separate PR) - CodeRabbit tracker_diagnose.go:101 — DiagnoseMostRecent stderr fixed by silencing MostRecentRunID instead - Codex cmd/tracker/simulate.go:61 — format-aware Simulate API (new feature, out of scope) - CodeRabbit tracker_doctor.go:674 — artifact directory errors as "warn" vs "error" (intentional severity) - Doc-only comments on spec/plan markdown files (not code, low value) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (2)

tracker_doctor.go (1)

529-539: Use exec.CommandContext for consistency with other version checks.

Static analysis flagged this exec.Command call. While claude --version should be quick, using CommandContext with a timeout would prevent potential hangs (consistent with how getDippinVersion was fixed).

♻️ Proposed fix

 func getBinaryVersion(path, flag string) string {
-	out, err := exec.Command(path, flag).CombinedOutput()
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	out, err := exec.CommandContext(ctx, path, flag).CombinedOutput()
 	if err != nil {
 		return "(version unknown)"
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 529 - 539, The getBinaryVersion function
currently uses exec.Command which can hang; change it to use exec.CommandContext
with a short context timeout (as done in getDippinVersion): create a context
with context.WithTimeout, defer cancel, build the command via
exec.CommandContext(ctx, path, flag), run CombinedOutput on the cmd, and
preserve the existing error handling and trimming logic (return "(version
unknown)" on error or empty output). Ensure you reference the function name
getBinaryVersion and reuse the same timeout duration pattern used by
getDippinVersion.

tracker_diagnose.go (1)

284-288: Budget suggestion message couples library to CLI interface.

The message references CLI flags (--max-tokens, --max-cost, --max-wall-time) alongside the library-level Config.Budget. Since this is a library API, consider making the message more generic or providing the raw data (limits vs. actual) so CLI and other consumers can format their own actionable message.

💡 Suggested alternative

 	if halt != nil {
 		out = append(out, Suggestion{
 			Kind:    SuggestionBudget,
-			Message: "Raise the relevant --max-tokens, --max-cost, or --max-wall-time flag, or remove the Config.Budget value",
+			Message: "Budget limit reached — increase token, cost, or wall-time limits in your configuration",
 		})
 	}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tracker_diagnose.go` around lines 284 - 288, The suggestion message in the
Suggestion struct (where out = append(out, Suggestion{ Kind: SuggestionBudget,
... }) currently hardcodes CLI flags and couples library code to the CLI; change
this to a library-appropriate message (e.g., "Increase configured budget limits
or remove Config.Budget") or instead populate the Suggestion with structured
data (e.g., include the budget limits and actual usage values alongside Kind:
SuggestionBudget) so CLI/consumers can render actionable flag-specific text;
modify the creation of this Suggestion (SuggestionBudget/Message) to either use
a generic, non-CLI string or add fields on the Suggestion payload with the raw
limit vs actual values for downstream formatting.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tracker_diagnose.go`:
- Around line 284-288: The suggestion message in the Suggestion struct (where
out = append(out, Suggestion{ Kind: SuggestionBudget, ... }) currently hardcodes
CLI flags and couples library code to the CLI; change this to a
library-appropriate message (e.g., "Increase configured budget limits or remove
Config.Budget") or instead populate the Suggestion with structured data (e.g.,
include the budget limits and actual usage values alongside Kind:
SuggestionBudget) so CLI/consumers can render actionable flag-specific text;
modify the creation of this Suggestion (SuggestionBudget/Message) to either use
a generic, non-CLI string or add fields on the Suggestion payload with the raw
limit vs actual values for downstream formatting.

In `@tracker_doctor.go`:
- Around line 529-539: The getBinaryVersion function currently uses exec.Command
which can hang; change it to use exec.CommandContext with a short context
timeout (as done in getDippinVersion): create a context with
context.WithTimeout, defer cancel, build the command via
exec.CommandContext(ctx, path, flag), run CombinedOutput on the cmd, and
preserve the existing error handling and trimming logic (return "(version
unknown)" on error or empty output). Ensure you reference the function name
getBinaryVersion and reuse the same timeout duration pattern used by
getDippinVersion.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8119966b-f0fb-457a-acd3-8523c7de80bd

📥 Commits

Reviewing files that changed from the base of the PR and between d178f88 and 9e0e6ab.

📒 Files selected for processing (10)

CHANGELOG.md
cmd/tracker/doctor.go
tracker_activity.go
tracker_audit.go
tracker_diagnose.go
tracker_doctor.go
tracker_doctor_unix.go
tracker_doctor_windows.go
tracker_events.go
tracker_simulate.go

🚧 Files skipped from review as they are similar to previous changes (3)

tracker_simulate.go
tracker_events.go
tracker_activity.go

Copilot

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 6 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T19:09:17Z

+		Message: fmt.Sprintf("%s valid (%d nodes, %d edges)", pipelineFile, len(graph.Nodes), len(graph.Edges)),
+	})
+	out.Status = "ok"
+	out.Message = fmt.Sprintf("%s is valid", pipelineFile)
+	return out


checkPipelineFileLib can append a warn-level detail for non-.dip/.dot files, but still sets section Status="ok" and Message=" is valid" when parsing/validation succeeds. This makes DoctorReport.Warnings/OK inconsistent with the per-detail warning. Consider promoting the section Status to "warn" (and adjusting Message) whenever any warn-level detail is present and there are no errors.

Copilot · 2026-04-17T19:09:17Z

+func buildExecutionPlan(graph *pipeline.Graph) ([]PlanStep, []string) {
+	if graph.StartNode == "" {
+		return nil, nil
+	}


buildExecutionPlan returns (nil, nil) when graph.StartNode is empty. Because SimulateReport.ExecutionPlan is not omitempty, this will JSON-marshal as null rather than an empty array, which is awkward for consumers and inconsistent with other report slices. Consider returning empty slices instead (e.g., []PlanStep{} / []string{}) so execution_plan is always an array.

Copilot · 2026-04-17T19:09:17Z

+func (s *NDJSONWriter) Write(evt NDJSONEvent) {
+	data, err := json.Marshal(evt)
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "tracker: marshal NDJSON event: %v\n", err)
+		return
+	}
+	data = append(data, '\n')
+	s.mu.Lock()
+	defer s.mu.Unlock()
+	if _, werr := s.w.Write(data); werr != nil {
+		s.errOnce.Do(func() {
+			fmt.Fprintf(os.Stderr, "tracker: NDJSON stream write error: %v (further write errors suppressed)\n", werr)
+		})
+	}


NDJSONWriter.Write writes marshal/write errors directly to os.Stderr. Since this is a public library type, emitting to process-global stderr is a surprising side effect and hard for consumers to test/control. Consider returning an error from Write (or exposing an ErrorHandler/logger option on NDJSONWriter) so callers decide how to surface failures.

Copilot · 2026-04-17T19:09:18Z

+	activity, lerr := LoadActivityLog(runDir)
+	if lerr != nil {
+		fmt.Fprintf(os.Stderr, "warning: run %s: cannot read activity log: %v\n", name, lerr)
+		activity = nil // continue with nil so the summary still builds
+	}


buildLibRunSummary prints a warning directly to os.Stderr when LoadActivityLog fails. For a library API (ListRuns), this creates an unexpected global side effect; callers may want to handle/log these errors themselves. Consider returning the error (or collecting per-run warnings in the RunSummary/AuditReport) instead of writing to stderr.

Copilot · 2026-04-17T19:09:18Z

+		fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)
+		return nil


loadNodeFailureLib logs JSON unmarshal failures to os.Stderr and then silently drops the node. For a library API intended to cleanly separate data/presentation, this stderr output is an unexpected side effect and makes failures hard to surface programmatically. Consider returning the parse error up the stack (or collecting it into the DiagnoseReport as a warning) rather than printing.

Suggested change

fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)

return nil

return &NodeFailure{

NodeID: nodeID,

Outcome: "status_parse_error",

Stderr: fmt.Sprintf("cannot parse %s: %v", statusPath, err),

}

Copilot · 2026-04-17T19:09:18Z

+type AuditReport struct {
+    RunID           string          `json:"run_id"`
+    Status          string          `json:"status"`          // "completed" | "in_progress" | "failed"
+    TotalDuration   time.Duration   `json:"total_duration_ns"`


The spec doc's AuditReport API shape is out of sync with the implementation: it documents Status values as "completed"|"in_progress"|"failed" (and a different RunSummary shape), but tracker_audit.go currently uses Status values like "success"/"fail" and RunSummary has different fields. Please update this doc section to match the actual exported types so library consumers aren't misled.

- checkPipelineFileLib promotes section status to warn when any detail is warn (matches pattern already applied to checkProviders/Workdir). - buildExecutionPlan returns empty slices instead of nil so SimulateReport.ExecutionPlan JSON-marshals as [] not null. - Spec doc AuditReport section updated to match shipped types: Status is "success"/"fail", RunSummary has the implemented fields. Three stderr-side-effect comments deferred to #104 (coordinated API decision; add Logger io.Writer to configs or return errors up-stack). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Batches six related pre-1.0 fixes flagged in the expert panel review of PR #101 (#102, #103, #104, #105, #106, #109). All changes touch the v0.18.0 library surface and are breaking, so ship together: - Thread context.Context through Doctor, Diagnose, DiagnoseMostRecent, Audit, and Simulate. Provider probes and binary version lookups honor ctx; getBinaryVersion now uses exec.CommandContext with a 5-second timeout to match getDippinVersion. - Introduce typed CheckStatus and SuggestionKind so consumers can switch-exhaust. Constant values are unchanged. - NDJSONWriter.Write now returns error. First write failure is still logged once to stderr; subsequent failures surface via return value. - Rename NDJSONEvent -> StreamEvent (wire format unchanged). - Move DoctorConfig.TrackerVersion/TrackerCommit off the config struct behind a WithVersionInfo functional option. - Add LogWriter to DoctorConfig/DiagnoseConfig/AuditConfig. CLI sets it to io.Discard so users stop seeing stray library warnings from audit/list/diagnose on os.Stderr. - sanitizeProviderError strips API keys and bearer tokens from provider error bodies before they land in CheckDetail.Message. - Panic-recover the NDJSON handler closures (pipeline, agent, LLM trace) so a misbehaving writer cannot crash the caller goroutine. - Stream activity.jsonl in Diagnose via bufio.Scanner (1 MB buffer limit), matching LoadActivityLog and avoiding memory spikes on large runs. - Rename internal *Lib/*NF/*ForDiagnose helpers to plain names. - CHANGELOG notes deterministic diagnose suggestion ordering (already alphabetical post-#101, documented as a visible change). Closes #102, #103, #104, #105, #106, #109. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

clintecker and others added 15 commits April 17, 2026 11:02

docs: CHANGELOG + README for Phase 2 library APIs

4257942

Refs #76 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector bot reviewed Apr 17, 2026

View reviewed changes

coderabbitai bot requested changes Apr 17, 2026

View reviewed changes

clintecker requested a review from Copilot April 17, 2026 18:31

style: gofmt cmd/tracker/diagnose.go and tracker_audit.go

268f95b

Fix CI format check — realign struct field tags and trim a trailing blank line that slipped in with the Phase 2 work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot started reviewing on behalf of clintecker April 17, 2026 18:31 View session

coderabbitai bot requested changes Apr 17, 2026

View reviewed changes

Copilot AI reviewed Apr 17, 2026

View reviewed changes

clintecker mentioned this pull request Apr 17, 2026

upstream(dippin-lang): HumanConfig IR missing timeout/timeout_action fields #112

Closed

coderabbitai bot reviewed Apr 17, 2026

View reviewed changes

clintecker mentioned this pull request Apr 17, 2026

HumanConfig IR: add Timeout and TimeoutAction fields 2389-research/dippin-lang#18

Closed

coderabbitai bot reviewed Apr 17, 2026

View reviewed changes

clintecker requested a review from Copilot April 17, 2026 19:03

Copilot started reviewing on behalf of clintecker April 17, 2026 19:04 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

clintecker merged commit 8e23397 into main Apr 17, 2026
2 checks passed

clintecker mentioned this pull request Apr 17, 2026

refactor(tracker): library API hardening for v1.0 #113

Merged

8 tasks

		Expected: build succeeds, all 14 packages pass. If any package is red before you start, stop — resolve that first.

-	return filepath.Join(runsDir, matched), nil
+	runDir := filepath.Join(runsDir, matched)
+	absRunDir, err := filepath.Abs(runDir)
+	if err != nil {
+		return "", fmt.Errorf("cannot resolve absolute run directory path: %w", err)
+	}
+	return absRunDir, nil

-	for _, node := range graph.Nodes {
-		if !visited[node.ID] {
-			ordered = append(ordered, node)
-		}
-	}
+	var orphanIDs []string
+	for id := range graph.Nodes {
+		if !visited[id] {
+			orphanIDs = append(orphanIDs, id)
+		}
+	}
+	sort.Strings(orphanIDs)
+	for _, id := range orphanIDs {
+		ordered = append(ordered, graph.Nodes[id])
+	}

	Status string `json:"status"` // "completed" \| "in_progress" \| "failed"
	Status string `json:"status"` // "success" \| "in_progress" \| "fail"

	- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has five values: `"ok"`, `"warn"`, `"error"`, `"skip"`, and `"hint"` (informational sub-items such as optional providers not configured).
	- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has four values: `"ok"`, `"warn"`, `"error"`, and `"hint"` (informational sub-items such as optional providers not configured).

		fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)
		return nil

-		fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)
-		return nil
+		return &NodeFailure{
+			NodeID:  nodeID,
+			Outcome: "status_parse_error",
+			Stderr:  fmt.Sprintf("cannot parse %s: %v", statusPath, err),
+		}

Conversation

clintecker commented Apr 17, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 17, 2026

clintecker commented Apr 17, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 17, 2026 •

edited

Loading