Skip to content

feat: CLI-library feature parity (Phase 1 NDJSON + Phase 2)#101

Merged
clintecker merged 20 commits intomainfrom
feat/library-parity-phase-2
Apr 17, 2026
Merged

feat: CLI-library feature parity (Phase 1 NDJSON + Phase 2)#101
clintecker merged 20 commits intomainfrom
feat/library-parity-phase-2

Conversation

@clintecker
Copy link
Copy Markdown
Collaborator

@clintecker clintecker commented Apr 17, 2026

Summary

  • Promotes four CLI-private commands (diagnose, audit, doctor, simulate) into the public tracker library package as JSON-serializable reports.
  • Lifts the private NDJSON event writer out of cmd/tracker/json_stream.go into tracker.NewNDJSONWriter, closing Phase 1 of feat: achieve CLI↔library feature parity #76.
  • Splits data / presentation cleanly — library returns pure *<Command>Report; CLI prints byte-for-byte identical output.

Closes Phase 1 (NDJSON) and Phase 2 of #76. Phase 3 (DescribeNodes) remains as a follow-up.

Test plan

  • go build ./... clean
  • go test ./... -short -race all 17 packages pass
  • dippin doctor A grade on the three core example pipelines
  • Smoke-diff: tracker doctor output byte-identical before / after (verified in 4 scenarios during Task 6)
  • Smoke-diff: tracker simulate output byte-identical (verified in Task 3)
  • Library consumer smoke: can call tracker.Diagnose(runDir) and get a non-empty DiagnoseReport

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Public tracker library exposing structured, JSON-serializable reports for diagnose, audit, doctor and simulate, plus an NDJSON event writer and handler factories for streaming events.
  • Documentation

    • README examples showing library usage and NDJSON streaming; new design/plan docs and changelog entry.
  • Refactor

    • CLI commands simplified to call the library while preserving human and --json output byte-for-byte; doctor probing now enabled by default.
  • Tests

    • Added unit tests and run fixtures covering diagnostics, auditing, simulation, doctor checks, and NDJSON output.

clintecker and others added 15 commits April 17, 2026 11:02
Design for promoting the diagnose, audit, doctor, and simulate CLI
commands into the top-level tracker package as structured reports, plus
lifting the private NDJSON event writer to a public API. Closes the
Phase 1 NDJSON gap and the full Phase 2 set from #76.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven-task plan covering shared activity helpers, NDJSON writer
promotion, Simulate / Diagnose / Audit / Doctor library APIs, and
docs. TDD steps with fixture run directories; byte-for-byte CLI
output preserved via smoke-diff at each task boundary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move ResolveRunDir, MostRecentRunID, ActivityEntry, LoadActivityLog,
ParseActivityLine into the tracker package. Prerequisite for exposing
Diagnose / Audit library APIs. No behavior change; CLI wired to the
library helpers.

Refs #76

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Log warning instead of silent skip on non-ENOENT checkpoint load errors
  in MostRecentRunID (CLAUDE.md: never silently swallow errors).
- Remove misleading JSON tags from ActivityEntry — the struct is populated
  via ParseActivityLine, not json.Unmarshal (two timestamp formats can't
  be handled by time.Time's default unmarshal).
- Drop gratuitous type-alias comment in cmd/tracker/audit.go.
- Add direct tests for SortActivityByTime ordering and equal timestamps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NewNDJSONWriter exposes the same --json wire format the CLI uses via
PipelineHandler / AgentHandler / TraceObserver. Closes the Phase 1
NDJSON gap from #76. CLI json_stream.go removed; run.go wires the
library writer. Wire format unchanged byte-for-byte.

Refs #76

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Log marshal errors to stderr (CLAUDE.md: never silently swallow).
- Log first w.Write error to stderr via sync.Once; further write
  errors suppressed to avoid flooding.
- Add test covering AgentHandler content priority (ToolOutput wins
  over Text when both are populated).
- Extract duplicated timestamp layout to a package-level constant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Simulate(source) returns a SimulateReport with nodes, edges,
execution plan, and unreachable node list. CLI simulate command becomes
a printer over the report; output byte-identical.

Refs #76

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- SimulateReport.Unreachable is now sorted alphabetically; eliminates
  non-deterministic output driven by Go map iteration order.
- SimulateReport gains GraphAttrs; CLI reads from the report instead of
  re-parsing the pipeline just for graph-level attrs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Diagnose(runDir) and tracker.DiagnoseMostRecent(workdir) return
a structured DiagnoseReport with failures, budget halt, and typed
suggestions. CLI diagnose command shrunk to a printer over the report;
output byte-identical.

Fixture runs added under testdata/runs/ (ok, failed, budget_halted) for
library unit tests.

Refs #76

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- CLI: skip SuggestionBudget in the suggestions printer; printBudgetHalt
  already renders the budget guidance. Eliminates duplicated prose on
  budget-halted runs.
- Library: log non-ENOENT status.json parse errors to stderr instead of
  swallowing silently (CLAUDE.md: never silently swallow errors).
- Clarify NodeFailure.RetryCount doc comment — it counts failures, not
  retries beyond the first attempt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Audit(runDir) returns a structured AuditReport (timeline,
retries, errors, recommendations). tracker.ListRuns(workdir) returns
[]RunSummary sorted newest first. CLI audit command shrunk to a printer
over the reports; output byte-identical.

Refs #76

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Log non-ENOENT activity-log load errors in the run summary builder
  instead of silently discarding (CLAUDE.md: never silently swallow).
- Remove redundant sort.Strings from printRecommendations; library
  already sorts the recommendations slice.
- ListRuns now uses sort.SliceStable for deterministic ordering on
  timestamp ties.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
tracker.Doctor(cfg) runs preflight checks and returns a DoctorReport.
ProbeProviders defaults to false (no network calls); CLI sets it true
to preserve today's behavior. Write side effects (gitignore fixup,
workdir create) stay CLI-only.

Refs #76

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- CLI --probe flag default flipped to true, preserving pre-refactor
  tracker doctor behavior (live API auth check runs by default).
- TestDoctor_NoProviderKeys now unsets all five known provider env
  vars to avoid false negatives when the dev environment has
  GOOGLE_API_KEY or OPENAI_COMPAT_API_KEY set.
- Artifact-dirs check now promotes Status to "error" when any detail
  is "error" (e.g., unwritable .ai/ directory); previously those cases
  were silently demoted to "warn" and the run proceeded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Refs #76

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Library-first refactor: promotes CLI logic into a public tracker package (Diagnose/Audit/Doctor/Simulate reports, run/activity utilities, NDJSON writer/handlers), updates CLI commands to delegate to these APIs, removes the private JSON stream, adds tests and fixtures, and updates docs/README/CHANGELOG and CI tooling.

Changes

Cohort / File(s) Summary
Library Report APIs
tracker_diagnose.go, tracker_diagnose_test.go, tracker_audit.go, tracker_audit_test.go, tracker_doctor.go, tracker_doctor_test.go, tracker_simulate.go, tracker_simulate_test.go
Add exported, JSON-serializable report types and entrypoints: Diagnose/DiagnoseMostRecent, Audit, Doctor (with config/report types), Simulate, plus suggestion/check models and tests.
Activity & Run Utilities
tracker_activity.go, tracker_activity_test.go
Add run-resolution helpers (ResolveRunDir, MostRecentRunID) and activity JSONL parsing/loading/sorting (ActivityEntry, LoadActivityLog, ParseActivityLine, SortActivityByTime) with unit tests.
NDJSON Event Writer
tracker_events.go, tracker_events_test.go
Introduce public NDJSONWriter (NewNDJSONWriter) and handler factories (PipelineHandler, TraceObserver, AgentHandler) for thread-safe NDJSON streaming and adapters; tests ported to package tracker.
CLI: diagnose & audit refactors
cmd/tracker/diagnose.go, cmd/tracker/audit.go
Replace prior in-file parsing/diagnosis/audit logic with calls to tracker.Diagnose, tracker.Audit, tracker.ListRuns, and adapt printing to consume library report types (large deletions of local helpers).
CLI: doctor, flags & tests
cmd/tracker/doctor.go, cmd/tracker/doctor_test.go, cmd/tracker/flags.go
Delegate preflight checks to tracker.Doctor, change --probe default to true, adapt output/gitignore handling, and refocus CLI tests on shim behavior and side-effects.
CLI: run & simulate
cmd/tracker/run.go, cmd/tracker/simulate.go
Wire tracker.NewNDJSONWriter in place of removed private stream; call tracker.Simulate and render via SimulateReport types (remove local BFS helpers).
Removed Private Impl
cmd/tracker/json_stream.go
Delete prior internal JSON stream implementation (replaced by public tracker_events.go).
Docs & Examples
CHANGELOG.md, README.md, docs/superpowers/...
Add changelog/readme examples and design/plan/spec docs describing CLI↔library parity, NDJSON writer, and migration steps.
Test fixtures
testdata/runs/ok/*, testdata/runs/failed/*, testdata/runs/budget_halted/*
Add checkpoint and activity.jsonl fixtures for success, failure, and budget-halt scenarios used by new tests.
CI / Tooling / Make / examples
.github/workflows/ci.yml, Makefile, go.mod, examples/human_gate_test_suite.dip
Bump dippin in CI, skip one example in lint loop, update module deps, and rename two example attributes for human gates.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CLI as User/CLI
    participant Pipeline as Pipeline
    participant LLM as LLM
    participant Agent as Agent
    participant Writer as tracker.NDJSONWriter
    participant Stdout as Stdout

    CLI->>Pipeline: start run / invoke tracker.Run
    Pipeline-->>Writer: emit pipeline events (stage_started/completed/failed)
    alt LLM traces generated
        LLM-->>Writer: emit trace events (provider, model, preview)
    end
    alt Agent activities occur
        Agent-->>Writer: emit agent events (tool output or text)
    end
    Writer->>Stdout: marshal NDJSONEvent -> write newline (thread-safe)
    Stdout-->>CLI: stream of NDJSON lines (--json)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Poem

🐰 I hop through logs and code anew,

Reports and streams now public, too.
NDJSON lines that safely write,
Runs diagnosed by morning light.
Tests pass — carrots for the crew!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.58% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is specific and accurately describes the main change: promoting CLI commands into the library as part of a feature parity effort across two phases.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/library-parity-phase-2

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42579425b4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cmd/tracker/simulate.go
source = string(data)
}

report, err := tracker.Simulate(source)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve requested pipeline format in simulate report parsing

runSimulateCmd already parsed the pipeline with loadPipeline(resolved, formatOverride), but this second parse calls tracker.Simulate(source) with no format context. That drops --format/extension-based resolution and can misclassify valid DOT input (for example DOT files with a leading comment/BOM), causing tracker simulate to fail after the first parse succeeded. This is a regression from the previous single-parse flow and breaks valid simulate inputs.

Useful? React with 👍 / 👎.

Comment thread tracker_doctor.go
Comment on lines +672 to +674
if d.Status == "error" {
out.Status = "error"
break
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep artifact-directory problems as doctor warnings

This promotion turns artifact-directory permission problems into Status="error", which now counts as a hard failure in cmd/tracker/doctor.go (toDoctorResult maps report errors directly to failures). Previously this check was non-required and surfaced as a warning, so tracker doctor could exit with warning-only status; after this change the same .ai issues now return exit code 1 and can fail CI/scripts unexpectedly.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cmd/tracker/diagnose.go (1)

11-192: ⚠️ Potential issue | 🟡 Minor

Run gofmt on this file before merge.

CI is currently red on cmd/tracker/diagnose.go, so this change set does not pass the repository’s format gate yet.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/diagnose.go` around lines 11 - 192, The file
cmd/tracker/diagnose.go is not gofmt'ed; run a formatter to fix
whitespace/formatting so CI passes. Fix by running `gofmt -w` (or `gofmt -w
cmd/tracker/diagnose.go` / `goimports -w`) on the file containing functions like
diagnoseMostRecent, printDiagnoseHeader, printNodeDiagnosis, and
printDiagnoseSuggestions to normalize import formatting, spacing, and line
breaks, then re-run CI.
🧹 Nitpick comments (8)
tracker_activity.go (1)

78-82: Library function writes directly to os.Stderr.

MostRecentRunID writes warnings to os.Stderr, which is unexpected for a library function. Library consumers may want to capture, suppress, or redirect these warnings. Consider returning warnings as part of the function signature or using a logger interface.

♻️ Alternative: Return warnings alongside the result
-func MostRecentRunID(workdir string) (string, error) {
+// MostRecentRunIDResult contains the result and any warnings from MostRecentRunID.
+type MostRecentRunIDResult struct {
+	RunID    string
+	Warnings []string
+}
+
+func MostRecentRunID(workdir string) (MostRecentRunIDResult, error) {
 	runsDir := filepath.Join(workdir, ".tracker", "runs")
 	entries, err := os.ReadDir(runsDir)
 	if err != nil {
 		if os.IsNotExist(err) {
-			return "", fmt.Errorf("no runs found — run a pipeline first")
+			return MostRecentRunIDResult{}, fmt.Errorf("no runs found — run a pipeline first")
 		}
-		return "", fmt.Errorf("cannot read runs directory: %w", err)
+		return MostRecentRunIDResult{}, fmt.Errorf("cannot read runs directory: %w", err)
 	}
 	var latestID string
 	var latestTime time.Time
+	var warnings []string
 	for _, e := range entries {
 		// ...
 		if err != nil {
 			if !os.IsNotExist(err) {
-				fmt.Fprintf(os.Stderr, "warning: skipping run %s: %v\n", e.Name(), err)
+				warnings = append(warnings, fmt.Sprintf("skipping run %s: %v", e.Name(), err))
 			}
 			continue
 		}
 		// ...
 	}
 	if latestID == "" {
-		return "", fmt.Errorf("no runs found with valid checkpoints")
+		return MostRecentRunIDResult{}, fmt.Errorf("no runs found with valid checkpoints")
 	}
-	return latestID, nil
+	return MostRecentRunIDResult{RunID: latestID, Warnings: warnings}, nil
 }

Alternatively, this is acceptable if the CLI is the primary consumer and you document the stderr behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_activity.go` around lines 78 - 82, The MostRecentRunID function
currently writes warnings directly to os.Stderr; change it to avoid
side-effectful stderr writes by either (A) adding a logger/io.Writer parameter
(e.g., logger or io.Writer warnOut) and writing warnings to that instead, or (B)
accumulate warning messages in a slice and return them alongside the result
(e.g., change signature to return (runID string, warnings []string, err error));
update the error handling block that currently calls fmt.Fprintf(os.Stderr, ...)
to append the formatted message to the warnings slice or send it to the injected
logger/writer, and update call sites accordingly.
tracker_doctor.go (1)

521-531: Use exec.CommandContext for consistency.

getBinaryVersion also uses exec.Command without context. Apply the same fix for consistency and to prevent potential hangs.

🔧 Proposed fix
-func getBinaryVersion(path, flag string) string {
-	out, err := exec.Command(path, flag).CombinedOutput()
+func getBinaryVersion(ctx context.Context, path, flag string) string {
+	out, err := exec.CommandContext(ctx, path, flag).CombinedOutput()
 	if err != nil {
 		return "(version unknown)"
 	}

Then update the caller at line 486:

-		claudeVer := getBinaryVersion(claudePath, "--version")
+		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+		claudeVer := getBinaryVersion(ctx, claudePath, "--version")
+		cancel()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 521 - 531, getBinaryVersion currently calls
exec.Command without a context which risks hangs; change its signature to accept
a context (e.g., getBinaryVersion(ctx context.Context, path, flag string)
string) and use exec.CommandContext(ctx, path, flag). Keep the same error/return
behavior, and update all callers to pass the appropriate context (the calling
function's ctx) so the command inherits cancellation/timeouts; reference
getBinaryVersion and use exec.CommandContext for the actual replacement.
tracker_audit.go (1)

215-219: Library function writes warning to os.Stderr.

Similar to MostRecentRunID, buildLibRunSummary writes warnings directly to os.Stderr. For consistency with library design, consider an alternative approach (returning warnings, using a logger, or documenting the behavior).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_audit.go` around lines 215 - 219, buildLibRunSummary currently writes
a warning directly to os.Stderr when LoadActivityLog fails (similar to
MostRecentRunID); instead return the warning or use the library's logger so
callers can handle it. Modify buildLibRunSummary to not print to os.Stderr: have
LoadActivityLog return the error as before, and propagate that error or append a
warning string to a returned warnings slice (or accept a logger interface) so
callers receive the warning; specifically update the block handling (activity,
lerr := LoadActivityLog(runDir)) to remove fmt.Fprintf(os.Stderr, ...) and
either return lerr up the call chain or record the message into a []string
warnings return value (or call a provided logger), ensuring callers of
buildLibRunSummary are updated to handle the new returned warning or error.
cmd/tracker/run.go (1)

242-245: Cache the agent NDJSON handler once per run.

stream.AgentHandler() is rebuilt per event inside the closure. Cache it once to avoid per-event allocation churn.

♻️ Proposed refactor
 func buildJSONEventHandlers(
 	activityLog *pipeline.JSONLEventHandler,
 	llmClient *llm.Client,
 	logAgentEvent func(agent.Event),
 ) (agent.EventHandler, pipeline.PipelineEventHandler) {
 	stream := tracker.NewNDJSONWriter(os.Stdout)
 	if llmClient != nil {
 		llmClient.AddTraceObserver(stream.TraceObserver())
 	}
+	streamAgentHandler := stream.AgentHandler()
 	agentHandler := agent.EventHandlerFunc(func(evt agent.Event) {
 		logAgentEvent(evt)
-		stream.AgentHandler().HandleEvent(evt)
+		streamAgentHandler.HandleEvent(evt)
 	})
 	pipelineHandler := pipeline.PipelineMultiHandler(stream.PipelineHandler(), activityLog)
 	return agentHandler, pipelineHandler
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/run.go` around lines 242 - 245, The closure rebuilds the NDJSON
handler on every event; cache stream.AgentHandler() once before creating
agentHandler to avoid per-event allocation. Retrieve and store the handler in a
local variable (e.g., cachedAgentHandler := stream.AgentHandler()) and then use
cachedAgentHandler.HandleEvent(evt) inside the agent.EventHandlerFunc closure;
keep existing calls to logAgentEvent(evt) and the agentHandler variable name.
tracker_doctor_test.go (1)

11-14: Prefer t.Setenv over manual os.Setenv + cleanup.

Go 1.17+ provides t.Setenv which automatically handles cleanup. This is cleaner and less error-prone.

♻️ Suggested simplification
 func TestDoctor_NoProbe_KeyPresent(t *testing.T) {
 	workdir := t.TempDir()
-	must(t, os.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890"))
-	t.Cleanup(func() { os.Unsetenv("ANTHROPIC_API_KEY") })
+	t.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor_test.go` around lines 11 - 14, The test
TestDoctor_NoProbe_KeyPresent manually calls os.Setenv and t.Cleanup to restore
ANTHROPIC_API_KEY; replace that pattern with t.Setenv("ANTHROPIC_API_KEY",
"sk-ant-test-12345678901234567890") in the TestDoctor_NoProbe_KeyPresent
function so the testing framework handles teardown automatically and remove the
manual os.Setenv and t.Cleanup calls.
cmd/tracker/audit.go (1)

95-105: Stale comment describes removed behavior.

The comment on lines 97-99 mentions "Re-read timestamps from original entries" and "rebuild from the timeline slice itself," but the code simply iterates and prints each entry. The comment seems to be a leftover from refactoring.

📝 Suggested comment cleanup
-	// Track stage starts for inline duration computation during printing.
 	stageStarts := make(map[string]time.Time)
-	// Re-read timestamps from original entries; TimelineEntry carries Duration
-	// for completed stages but we need start times for the printTimelineEntry format.
-	// Rebuild from the timeline slice itself.
 	for _, entry := range timeline {
 		printTimelineEntryFromLib(entry, stageStarts)
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/audit.go` around lines 95 - 105, The existing comment incorrectly
describes rebuilding timestamps from entries; update or remove it to match the
actual behavior: stageStarts := make(map[string]time.Time) is created and the
code simply iterates over timeline calling printTimelineEntryFromLib(entry,
stageStarts) and then printTimelineTotalDurationFromLib(timeline). Replace the
stale lines about "Re-read timestamps..." and "Rebuild from the timeline slice
itself" with a concise comment stating that stageStarts is used for inline
duration computation and each timeline entry is printed via
printTimelineEntryFromLib, or remove the comment entirely.
cmd/tracker/doctor_test.go (1)

245-253: Consider using strings.Contains instead of a custom helper.

The custom contains helper reimplements strings.Contains. Adding "strings" to imports is trivial and avoids maintaining custom code.

♻️ Suggested simplification
+import (
+	...
+	"strings"
+	...
+)

-// contains is a tiny helper to avoid dragging strings.Contains into tests.
-func contains(haystack, needle string) bool {
-	for i := 0; i+len(needle) <= len(haystack); i++ {
-		if haystack[i:i+len(needle)] == needle {
-			return true
-		}
-	}
-	return false
-}
+// Then use strings.Contains directly in tests
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/doctor_test.go` around lines 245 - 253, The custom contains
helper reimplements standard library functionality; remove the
contains(haystack, needle string) function and replace its uses with
strings.Contains by adding "strings" to the imports and updating call sites to
strings.Contains(haystack, needle); ensure all tests compile after removing the
helper and adjust any package-scoped references to contains accordingly.
cmd/tracker/simulate.go (1)

17-67: Dual parsing reads the pipeline source twice.

The function first parses via loadPipeline/loadEmbeddedPipeline (lines 25-32) for validation warnings, then re-reads the source and parses again via tracker.Simulate (lines 46-64). While the comments acknowledge this is intentional (library Simulate doesn't surface validation warnings), consider whether pipeline.ValidateAll could be moved to the library or if the source could be cached after the first read.

This is a minor performance consideration—acceptable for now since pipeline files are typically small.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/tracker/simulate.go` around lines 17 - 67, runSimulateCmd currently
reads/parses the pipeline twice: first via loadPipeline/loadEmbeddedPipeline so
pipeline.ValidateAll can emit warnings, then it re-reads the file/string and
calls tracker.Simulate; to avoid double reads either (A) reuse the
already-loaded source by capturing the original file contents when calling
loadPipeline/loadEmbeddedPipeline and pass that string into tracker.Simulate, or
(B) move the validation call into tracker.Simulate so you only call
tracker.Simulate once; update runSimulateCmd to stop re-reading the file when
isEmbedded/is not embedded after loading (referencing runSimulateCmd,
loadPipeline, loadEmbeddedPipeline, pipeline.ValidateAll, and tracker.Simulate).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 10-23: Add a new "### Fixed" subsection under the Unreleased
heading and list the behavior/determinism and error-handling fixes introduced by
this PR (e.g., deterministic ordering and improved error handling in
event/tracking flows), referencing the affected APIs and files so readers can
find the fixes—mention tracker.NewNDJSONWriter and the factory handlers
(PipelineHandler, AgentHandler, TraceObserver) if they gained deterministic
output, and call out CLI/library behavior fixes in cmd/tracker/diagnose.go,
audit.go, doctor.go, and simulate.go (and any specific fixes to run-directory
resolution or activity parsing like ResolveRunDir, MostRecentRunID,
LoadActivityLog, ParseActivityLine) as concise bullet points under the new "###
Fixed" header.

In `@cmd/tracker/doctor.go`:
- Around line 240-247: The initial "suffix := \"\\n\"" is ineffectual because
it's immediately overwritten; remove that dead assignment and instead declare
suffix once (e.g., var suffix string) then set it inside the existing if/else
that checks content and content[len(content)-1] in the same block, then append
strings.Join(toAppend, "\n") + "\n" to suffix before calling os.WriteFile with
append(content, []byte(suffix)...). Reference variables/functions: suffix,
content, toAppend, gitignorePath, os.WriteFile.

In `@docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md`:
- Around line 62-63: Replace the hard-coded package-count expectation string
"Expected: build succeeds, all 14 packages pass." (and any similar occurrences
like the later "15" or other numeric counts) with a non-stale verification
instruction such as "Expected: build succeeds and go test ./... passes" or
update the number to match the current PR summary (e.g., 17) if you must keep a
count; ensure you change every matching phrase in this doc (and the repeated
occurrence noted) so the plan no longer hard-codes package counts.

In `@docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md`:
- Around line 149-172: Update the documented Audit API types to match the
implementation: change AuditReport.Status to the actual enum values used in code
(use the exact strings the implementation uses instead of "completed" |
"in_progress" | "failed"), update RunSummary to match the richer shape the PR
implements (replace {RunID, StartedAt, Status, NodeCount} with the actual fields
used in the branch), and ensure any other struct fields in AuditReport (e.g.,
TimelineEntry, RetryRecord, ActivityError) and their JSON tags match the
implementation signatures; locate the declarations named AuditReport,
TimelineEntry, RetryRecord, ActivityError and RunSummary in the spec and edit
their types and documented status values to mirror the shipped contract exactly.

In `@tracker_audit.go`:
- Around line 1-3: The file has gofmt formatting issues; run the formatter and
save the file to fix whitespace/indentation (e.g., run `gofmt -w
tracker_audit.go`) so the package declaration and leading comments in package
tracker conform to gofmt rules; ensure no other formatting changes are left
unstaged before committing.
- Around line 119-132: The classifyStatus function only treats
"pipeline_completed" and "pipeline_failed" as terminal events, causing
budget-halted runs with a "budget_exceeded" activity to be misclassified as
success; update the switch inside classifyStatus (iterating activity[i].Type) to
include case "budget_exceeded": return "fail" so budget-exceeded terminal events
are treated as failures (or alternatively extend the terminal-event check to
include other terminating activity types if preferred).

In `@tracker_diagnose.go`:
- Around line 100-129: collectNodeFailuresLib and loadNodeFailureLib currently
swallow I/O and JSON errors; change their signatures to return errors (e.g.,
collectNodeFailuresLib returns (map[string]*NodeFailure, error) and
loadNodeFailureLib returns (*NodeFailure, error)), treat os.ReadDir/os.ReadFile
and json.Unmarshal failures as real errors to propagate up (only treat
explicitly optional files like missing activity.jsonl as ignorable), replace
fmt.Fprintf to stderr with returned errors, and update callers to
handle/aggregate and expose these errors so a partial/incomplete diagnosis is
not silently reported (also apply the same change pattern to the related helpers
referenced around lines 156-166).

In `@tracker_doctor_test.go`:
- Around line 34-47: The test TestDoctor_NoProviderKeys unsets provider env vars
directly which doesn't restore prior values; replace the os.Unsetenv loop with
calls to t.Setenv for each variable (e.g. t.Setenv("ANTHROPIC_API_KEY", "")
etc.) so the original environment is automatically restored after the test when
invoking Doctor with DoctorConfig{WorkDir: workdir, ProbeProviders: false};
update the loop that references ANTHROPIC_API_KEY, OPENAI_API_KEY,
GEMINI_API_KEY, GOOGLE_API_KEY, OPENAI_COMPAT_API_KEY accordingly.

In `@tracker_doctor.go`:
- Around line 365-380: The getDippinVersion function uses exec.Command which can
hang; switch to exec.CommandContext with a short context timeout: create a
context.WithTimeout (e.g., 2s) and call exec.CommandContext(ctx, path,
"--version") and if that fails try exec.CommandContext with "version"; make sure
to cancel the context (defer cancel()) before returning and preserve the same
output-parsing logic and fallback "(version unknown)" behavior in
getDippinVersion.
- Around line 692-718: checkDiskSpaceLib uses syscall.Statfs which is Unix-only
and breaks Windows builds; either restrict this implementation to non-Windows
using a build constraint or provide a Windows counterpart. Fix by moving this
function into a file with a //go:build !windows (or // +build !windows) header
(keeping the same function name checkDiskSpaceLib) and implement an alternative
checkDiskSpaceLib in a separate file for Windows (or use a cross-platform
library such as golang.org/x/sys/unix for Unix and golang.org/x/sys/windows or a
portable disk-space package for Windows) so the symbol checkDiskSpaceLib is
defined on all platforms and Windows builds no longer fail.

In `@tracker_events.go`:
- Around line 108-117: The NDJSONEvent is being stamped with time.Now() which
overwrites the original agent event time; change the timestamp assignment to
prefer evt.Timestamp formatted with ndjsonTimestampLayout and only use
time.Now() when evt.Timestamp.IsZero(). Update the construction of NDJSONEvent
(the Timestamp field) to format evt.Timestamp if non-zero, otherwise fall back
to time.Now().Format(ndjsonTimestampLayout), keeping all other fields (Source,
Type, NodeID, Provider, Model, ToolName, Content) unchanged.

In `@tracker_simulate.go`:
- Around line 164-168: The orphan nodes appended in simBFSNodeOrder are
non-deterministic because iterating graph.Nodes (a map) yields arbitrary order;
collect orphans into a separate slice (e.g., orphanNodes), sort that slice by a
stable key such as node.ID (or node.Name) using the existing sort package, then
append the sorted orphanNodes to ordered instead of appending directly from the
map; update simBFSNodeOrder to perform these steps so output and tests become
deterministic.

---

Outside diff comments:
In `@cmd/tracker/diagnose.go`:
- Around line 11-192: The file cmd/tracker/diagnose.go is not gofmt'ed; run a
formatter to fix whitespace/formatting so CI passes. Fix by running `gofmt -w`
(or `gofmt -w cmd/tracker/diagnose.go` / `goimports -w`) on the file containing
functions like diagnoseMostRecent, printDiagnoseHeader, printNodeDiagnosis, and
printDiagnoseSuggestions to normalize import formatting, spacing, and line
breaks, then re-run CI.

---

Nitpick comments:
In `@cmd/tracker/audit.go`:
- Around line 95-105: The existing comment incorrectly describes rebuilding
timestamps from entries; update or remove it to match the actual behavior:
stageStarts := make(map[string]time.Time) is created and the code simply
iterates over timeline calling printTimelineEntryFromLib(entry, stageStarts) and
then printTimelineTotalDurationFromLib(timeline). Replace the stale lines about
"Re-read timestamps..." and "Rebuild from the timeline slice itself" with a
concise comment stating that stageStarts is used for inline duration computation
and each timeline entry is printed via printTimelineEntryFromLib, or remove the
comment entirely.

In `@cmd/tracker/doctor_test.go`:
- Around line 245-253: The custom contains helper reimplements standard library
functionality; remove the contains(haystack, needle string) function and replace
its uses with strings.Contains by adding "strings" to the imports and updating
call sites to strings.Contains(haystack, needle); ensure all tests compile after
removing the helper and adjust any package-scoped references to contains
accordingly.

In `@cmd/tracker/run.go`:
- Around line 242-245: The closure rebuilds the NDJSON handler on every event;
cache stream.AgentHandler() once before creating agentHandler to avoid per-event
allocation. Retrieve and store the handler in a local variable (e.g.,
cachedAgentHandler := stream.AgentHandler()) and then use
cachedAgentHandler.HandleEvent(evt) inside the agent.EventHandlerFunc closure;
keep existing calls to logAgentEvent(evt) and the agentHandler variable name.

In `@cmd/tracker/simulate.go`:
- Around line 17-67: runSimulateCmd currently reads/parses the pipeline twice:
first via loadPipeline/loadEmbeddedPipeline so pipeline.ValidateAll can emit
warnings, then it re-reads the file/string and calls tracker.Simulate; to avoid
double reads either (A) reuse the already-loaded source by capturing the
original file contents when calling loadPipeline/loadEmbeddedPipeline and pass
that string into tracker.Simulate, or (B) move the validation call into
tracker.Simulate so you only call tracker.Simulate once; update runSimulateCmd
to stop re-reading the file when isEmbedded/is not embedded after loading
(referencing runSimulateCmd, loadPipeline, loadEmbeddedPipeline,
pipeline.ValidateAll, and tracker.Simulate).

In `@tracker_activity.go`:
- Around line 78-82: The MostRecentRunID function currently writes warnings
directly to os.Stderr; change it to avoid side-effectful stderr writes by either
(A) adding a logger/io.Writer parameter (e.g., logger or io.Writer warnOut) and
writing warnings to that instead, or (B) accumulate warning messages in a slice
and return them alongside the result (e.g., change signature to return (runID
string, warnings []string, err error)); update the error handling block that
currently calls fmt.Fprintf(os.Stderr, ...) to append the formatted message to
the warnings slice or send it to the injected logger/writer, and update call
sites accordingly.

In `@tracker_audit.go`:
- Around line 215-219: buildLibRunSummary currently writes a warning directly to
os.Stderr when LoadActivityLog fails (similar to MostRecentRunID); instead
return the warning or use the library's logger so callers can handle it. Modify
buildLibRunSummary to not print to os.Stderr: have LoadActivityLog return the
error as before, and propagate that error or append a warning string to a
returned warnings slice (or accept a logger interface) so callers receive the
warning; specifically update the block handling (activity, lerr :=
LoadActivityLog(runDir)) to remove fmt.Fprintf(os.Stderr, ...) and either return
lerr up the call chain or record the message into a []string warnings return
value (or call a provided logger), ensuring callers of buildLibRunSummary are
updated to handle the new returned warning or error.

In `@tracker_doctor_test.go`:
- Around line 11-14: The test TestDoctor_NoProbe_KeyPresent manually calls
os.Setenv and t.Cleanup to restore ANTHROPIC_API_KEY; replace that pattern with
t.Setenv("ANTHROPIC_API_KEY", "sk-ant-test-12345678901234567890") in the
TestDoctor_NoProbe_KeyPresent function so the testing framework handles teardown
automatically and remove the manual os.Setenv and t.Cleanup calls.

In `@tracker_doctor.go`:
- Around line 521-531: getBinaryVersion currently calls exec.Command without a
context which risks hangs; change its signature to accept a context (e.g.,
getBinaryVersion(ctx context.Context, path, flag string) string) and use
exec.CommandContext(ctx, path, flag). Keep the same error/return behavior, and
update all callers to pass the appropriate context (the calling function's ctx)
so the command inherits cancellation/timeouts; reference getBinaryVersion and
use exec.CommandContext for the actual replacement.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c37ecbbe-fa56-4fde-9b45-6d379db58ae3

📥 Commits

Reviewing files that changed from the base of the PR and between 76164b0 and 4257942.

📒 Files selected for processing (31)
  • CHANGELOG.md
  • README.md
  • cmd/tracker/audit.go
  • cmd/tracker/diagnose.go
  • cmd/tracker/doctor.go
  • cmd/tracker/doctor_test.go
  • cmd/tracker/flags.go
  • cmd/tracker/json_stream.go
  • cmd/tracker/run.go
  • cmd/tracker/simulate.go
  • docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md
  • docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md
  • testdata/runs/budget_halted/activity.jsonl
  • testdata/runs/budget_halted/checkpoint.json
  • testdata/runs/failed/Build/status.json
  • testdata/runs/failed/activity.jsonl
  • testdata/runs/failed/checkpoint.json
  • testdata/runs/ok/activity.jsonl
  • testdata/runs/ok/checkpoint.json
  • tracker_activity.go
  • tracker_activity_test.go
  • tracker_audit.go
  • tracker_audit_test.go
  • tracker_diagnose.go
  • tracker_diagnose_test.go
  • tracker_doctor.go
  • tracker_doctor_test.go
  • tracker_events.go
  • tracker_events_test.go
  • tracker_simulate.go
  • tracker_simulate_test.go
💤 Files with no reviewable changes (1)
  • cmd/tracker/json_stream.go

Comment thread CHANGELOG.md
Comment thread cmd/tracker/doctor.go Outdated
Comment on lines +62 to +63
Expected: build succeeds, all 14 packages pass. If any package is red before you start, stop — resolve that first.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid hard-coding package counts in the verification steps.

This plan already disagrees with itself (14 here vs 15 later), and the PR summary for this branch reports 17 passing packages. Please switch these expectations to “go test ./... passes” or update them to the current count so the plan does not go stale immediately.

Also applies to: 2619-2619

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/plans/2026-04-17-cli-library-parity-phase-2.md` around lines
62 - 63, Replace the hard-coded package-count expectation string "Expected:
build succeeds, all 14 packages pass." (and any similar occurrences like the
later "15" or other numeric counts) with a non-stale verification instruction
such as "Expected: build succeeds and go test ./... passes" or update the number
to match the current PR summary (e.g., 17) if you must keep a count; ensure you
change every matching phrase in this doc (and the repeated occurrence noted) so
the plan no longer hard-codes package counts.

Comment thread tracker_audit.go
Comment thread tracker_doctor_test.go
Comment thread tracker_doctor.go
Comment thread tracker_doctor.go Outdated
Comment thread tracker_events.go
Comment thread tracker_simulate.go
- enrichFromEntryNF now uses the dual-format timestamp parser shared
  with ParseActivityLine; previously dropped all timestamps from logs
  using the millisecond format, zeroing NodeFailure.Duration and
  silently disabling the suspicious-timing suggestion heuristic.
- CLI --probe help text updated: usage shows [--probe=false] since
  probe is now the default.
- Tests use t.Setenv for Doctor env-var mutations so pre-existing
  values are restored on cleanup.
- CheckResult.Status and CheckDetail.Status doc comments list the
  fifth value "hint" used for informational sub-items. CHANGELOG
  note added.
- Diagnose and Audit doc comments warn that runDir is trusted input;
  point callers at ResolveRunDir / DiagnoseMostRecent for untrusted
  paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix CI format check — realign struct field tags and trim a trailing
blank line that slipped in with the Phase 2 work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (3)
tracker_doctor.go (2)

365-369: ⚠️ Potential issue | 🟠 Major

Use exec.CommandContext with timeout for external binary version calls.

Line 366/368 and Line 522 call external binaries without context; an unresponsive executable can hang Doctor indefinitely.

#!/bin/bash
set -euo pipefail

# Verify direct exec.Command usages in this file
rg -nP '\bexec\.Command\(' tracker_doctor.go

# Show the two helper functions for targeted patching
rg -n 'func getDippinVersion|func getBinaryVersion' tracker_doctor.go -A25

Also applies to: 521-523

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 365 - 369, Replace direct exec.Command calls
in getDippinVersion and getBinaryVersion with exec.CommandContext using a
context with a reasonable timeout (e.g., context.WithTimeout) to avoid hanging
if the external binary is unresponsive; create ctx, defer cancel, call
exec.CommandContext(ctx, ...).CombinedOutput(), and handle context deadline
exceeded or other errors appropriately when logging/returning the version from
those functions.

693-696: ⚠️ Potential issue | 🔴 Critical

syscall.Statfs is Unix-only; this implementation is not cross-platform.

Line 695 uses syscall.Statfs, which is unavailable on Windows builds. This can break compilation unless this function is split behind OS build tags.

#!/bin/bash
set -euo pipefail

echo "Statfs usage:"
rg -n 'syscall\.Statfs|Statfs_t' --type go

echo
echo "Build tags near doctor files:"
fd -e go | rg 'doctor' | while read -r f; do
  head -n 3 "$f" | sed "s|^|$f: |"
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 693 - 696, The checkDiskSpaceLib function
uses syscall.Statfs (Statfs_t) which is Unix-only and will fail Windows builds;
split the implementation by moving the current checkDiskSpaceLib into a file
with a non-Windows build tag (e.g., //go:build !windows) and create a
Windows-specific stub file (with //go:build windows) that defines
checkDiskSpaceLib and returns an appropriate CheckResult (e.g., "unsupported on
windows" or a safe default) so the symbol exists on all platforms and
compilation succeeds; reference the symbols checkDiskSpaceLib, syscall.Statfs,
and Statfs_t when locating and splitting the code.
tracker_diagnose.go (1)

106-110: ⚠️ Potential issue | 🟠 Major

Don’t silently downgrade unreadable or corrupt artifacts into a “clean” report.

Lines 110, 127, 134-135, 166, and 183-184 currently collapse artifact I/O / JSON failures into “no failures” or “no budget halt”. For a public library API, that produces an apparently valid DiagnoseReport with missing retry/timing data and no indication that diagnosis was incomplete. Only genuinely optional files should be ignored; the other read/parse failures need to be returned or surfaced on the report.

Also applies to: 123-135, 162-166, 183-184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_diagnose.go` around lines 106 - 110, collectNodeFailuresLib (and
related readers used by collectNodeFailuresLib) currently swallows filesystem
and JSON parse errors and returns an empty map, which hides incomplete
diagnosis; modify collectNodeFailuresLib to surface I/O and parse failures
instead of treating them as “no failures” — either change its signature to
return (map[string]*NodeFailure, error) or attach error details to the resulting
DiagnoseReport (e.g., an Errors/Incomplete field) so callers can detect
incomplete runs; update any callers (and the creation of DiagnoseReport) to
propagate and/or include those errors (reference symbols:
collectNodeFailuresLib, NodeFailure, DiagnoseReport) and ensure only truly
optional files continue to be ignored.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tracker_diagnose.go`:
- Around line 200-205: When parsing timestamps for stage events, the code
currently stores a zero time when parseActivityTimestampLib fails and later uses
it for duration math; modify the logic so that in the "stage_started" branch you
only set stageStarts[entry.NodeID] when parseActivityTimestampLib returned a
successful timestamp (check the boolean/err result), and in
updateFailureTimingNF (the function called for "stage_failed") guard the
duration calculation by verifying both the start from stageStarts and the end ts
are non-zero (or otherwise valid) before subtracting to compute duration; this
prevents incorrect durations derived from time.Time{} defaults.
- Around line 95-101: DiagnoseMostRecent currently calls MostRecentRunID which
prints warnings to os.Stderr, leaking CLI output from a library function; change
DiagnoseMostRecent to call a non-printing variant (e.g., MostRecentRunIDQuiet)
or adjust MostRecentRunID to accept a suppressWarnings bool or an io.Writer for
diagnostics so it doesn't write to os.Stderr by default, and propagate any
skipped-directory warnings back as structured data (e.g., return an error type
or a slice of warning strings) instead of printing; update DiagnoseMostRecent to
use the quiet variant/signature and handle returned warnings or errors
accordingly so the public API emits no unsolicited stderr output.

In `@tracker_doctor.go`:
- Around line 217-233: The provider loop correctly appends CheckDetail entries
with Status "error" for invalid/auth-failed keys (see isValidAPIKey,
probeProvider, and out.Details), but later logic force-sets the section summary
to "ok" when any provider is configured; update the summary computation (the
code around where section/status is set after the loop) to scan out.Details (or
the provider-specific slice) and set the section's Status to "error" if any
CheckDetail has Status == "error" (otherwise "ok"), so the DoctorReport counts
reflect the detail-level errors instead of always becoming "ok".
- Around line 24-27: The PinnedDippinVersion constant is out of date; update the
declaration of PinnedDippinVersion to match the go.mod requirement by changing
its value to "v0.19.1" so Doctor's version check uses the correct pinned version
(look for the const PinnedDippinVersion = ... in tracker_doctor.go).
- Around line 561-583: The section unconditionally sets out.Status = "ok" even
after adding warn CheckDetail entries (from the workdir home check and
missingGitignoreEntries), which hides warnings; update the end of the function
to compute and set out.Status and out.Message based on the details (or at
minimum set out.Status = "warn" when either the home/workdir check or
missingGitignoreEntries produced a warn), e.g. inspect the details slice for any
CheckDetail with Status == "warn" (references: missingGitignoreEntries, the
workdir home check that appends a warn detail, and the final
out.Status/out.Message assignments) and promote the section status to "warn" so
warnings are reflected in DoctorReport.

---

Duplicate comments:
In `@tracker_diagnose.go`:
- Around line 106-110: collectNodeFailuresLib (and related readers used by
collectNodeFailuresLib) currently swallows filesystem and JSON parse errors and
returns an empty map, which hides incomplete diagnosis; modify
collectNodeFailuresLib to surface I/O and parse failures instead of treating
them as “no failures” — either change its signature to return
(map[string]*NodeFailure, error) or attach error details to the resulting
DiagnoseReport (e.g., an Errors/Incomplete field) so callers can detect
incomplete runs; update any callers (and the creation of DiagnoseReport) to
propagate and/or include those errors (reference symbols:
collectNodeFailuresLib, NodeFailure, DiagnoseReport) and ensure only truly
optional files continue to be ignored.

In `@tracker_doctor.go`:
- Around line 365-369: Replace direct exec.Command calls in getDippinVersion and
getBinaryVersion with exec.CommandContext using a context with a reasonable
timeout (e.g., context.WithTimeout) to avoid hanging if the external binary is
unresponsive; create ctx, defer cancel, call exec.CommandContext(ctx,
...).CombinedOutput(), and handle context deadline exceeded or other errors
appropriately when logging/returning the version from those functions.
- Around line 693-696: The checkDiskSpaceLib function uses syscall.Statfs
(Statfs_t) which is Unix-only and will fail Windows builds; split the
implementation by moving the current checkDiskSpaceLib into a file with a
non-Windows build tag (e.g., //go:build !windows) and create a Windows-specific
stub file (with //go:build windows) that defines checkDiskSpaceLib and returns
an appropriate CheckResult (e.g., "unsupported on windows" or a safe default) so
the symbol exists on all platforms and compilation succeeds; reference the
symbols checkDiskSpaceLib, syscall.Statfs, and Statfs_t when locating and
splitting the code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1bf50865-4416-4313-8ec5-c8182d38442d

📥 Commits

Reviewing files that changed from the base of the PR and between 4257942 and fef3871.

📒 Files selected for processing (6)
  • CHANGELOG.md
  • cmd/tracker/flags.go
  • tracker_audit.go
  • tracker_diagnose.go
  • tracker_doctor.go
  • tracker_doctor_test.go
✅ Files skipped from review due to trivial changes (1)
  • tracker_audit.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/tracker/flags.go
  • CHANGELOG.md

Comment thread tracker_diagnose.go
Comment thread tracker_diagnose.go
Comment thread tracker_doctor.go
Comment on lines +24 to +27
// PinnedDippinVersion is the dippin-lang version from go.mod.
// Keep in sync with the require line in go.mod.
const PinnedDippinVersion = "v0.18.0"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Pinned constant:"
rg -n 'PinnedDippinVersion' tracker_doctor.go

echo
echo "go.mod dippin-lang requires:"
rg -n 'github.com/2389-research/dippin-lang' go.mod

Repository: 2389-research/tracker

Length of output: 790


Update PinnedDippinVersion constant to match go.mod requirement.

PinnedDippinVersion is hardcoded to v0.18.0 but go.mod requires v0.19.1. This causes Doctor to report false version mismatches even when dippin is correctly installed at the pinned version.

Update line 26 to const PinnedDippinVersion = "v0.19.1".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 24 - 27, The PinnedDippinVersion constant is
out of date; update the declaration of PinnedDippinVersion to match the go.mod
requirement by changing its value to "v0.19.1" so Doctor's version check uses
the correct pinned version (look for the const PinnedDippinVersion = ... in
tracker_doctor.go).

Comment thread tracker_doctor.go
Comment on lines +217 to +233
if !isValidAPIKey(p.name, key) {
out.Details = append(out.Details, CheckDetail{
Status: "error",
Message: fmt.Sprintf("%-15s %s=%s (invalid format)", p.name, envName, masked),
Hint: fmt.Sprintf("%s keys should match expected format — run `tracker setup`", p.name),
})
continue
}
if probe && p.buildAdapter != nil {
authOk, authMsg := probeProvider(p, key)
if !authOk {
out.Details = append(out.Details, CheckDetail{
Status: "error",
Message: fmt.Sprintf("%-15s %s=%s (auth failed: %s)", p.name, envName, masked, authMsg),
Hint: fmt.Sprintf("your %s key is invalid or expired — export a fresh key or run `tracker setup`", p.name),
})
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Provider check can return overall ok while containing provider error details.

At Line 219/229, invalid/auth-failed providers are marked as error in details, but Line 275 forces section status to ok whenever at least one provider is configured. This under-reports health severity in DoctorReport summary counts.

Suggested fix
 func checkProvidersLib(probe bool) CheckResult {
 	out := CheckResult{Name: "LLM Providers"}
 	var configuredNames []string
 	var missingNames []string
+	hasProviderErrors := false
@@
 		if !isValidAPIKey(p.name, key) {
 			out.Details = append(out.Details, CheckDetail{
 				Status:  "error",
 				Message: fmt.Sprintf("%-15s %s=%s (invalid format)", p.name, envName, masked),
 				Hint:    fmt.Sprintf("%s keys should match expected format — run `tracker setup`", p.name),
 			})
+			hasProviderErrors = true
 			continue
 		}
@@
 			if !authOk {
 				out.Details = append(out.Details, CheckDetail{
 					Status:  "error",
 					Message: fmt.Sprintf("%-15s %s=%s (auth failed: %s)", p.name, envName, masked, authMsg),
 					Hint:    fmt.Sprintf("your %s key is invalid or expired — export a fresh key or run `tracker setup`", p.name),
 				})
+				hasProviderErrors = true
 				continue
 			}
@@
-	out.Status = "ok"
+	if hasProviderErrors {
+		out.Status = "warn"
+	} else {
+		out.Status = "ok"
+	}

Also applies to: 275-281

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 217 - 233, The provider loop correctly
appends CheckDetail entries with Status "error" for invalid/auth-failed keys
(see isValidAPIKey, probeProvider, and out.Details), but later logic force-sets
the section summary to "ok" when any provider is configured; update the summary
computation (the code around where section/status is set after the loop) to scan
out.Details (or the provider-specific slice) and set the section's Status to
"error" if any CheckDetail has Status == "error" (otherwise "ok"), so the
DoctorReport counts reflect the detail-level errors instead of always becoming
"ok".

Comment thread tracker_doctor.go
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
tracker_audit.go (1)

124-131: ⚠️ Potential issue | 🟠 Major

Handle budget_exceeded as a failure terminal event.

At Line 126, classifyStatus only recognizes pipeline_completed and pipeline_failed. Runs halted by budget limits can still be terminal failures and are currently at risk of being reported as "success" via the fallback path.

🔧 Proposed fix
 func classifyStatus(cp *pipeline.Checkpoint, activity []ActivityEntry) string {
 	for i := len(activity) - 1; i >= 0; i-- {
 		switch activity[i].Type {
 		case "pipeline_completed":
 			return "success"
 		case "pipeline_failed":
+			return "fail"
+		case "budget_exceeded":
 			return "fail"
 		}
 	}
 	if cp.CurrentNode != "" {
 		return "fail"
 	}
 	return "success"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_audit.go` around lines 124 - 131, The classifyStatus function
currently only treats "pipeline_completed" as success and "pipeline_failed" as a
terminal failure; add handling for budget-exceeded terminal events by adding a
case for activity[i].Type == "budget_exceeded" that returns "fail" so runs
halted by budget limits are classified as failures (update the switch in
classifyStatus that iterates ActivityEntry entries to include "budget_exceeded"
alongside "pipeline_failed").
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tracker_audit.go`:
- Around line 220-223: In buildLibRunSummary replace the direct write to
os.Stderr with returning structured warning information: remove the
fmt.Fprintf(os.Stderr...) call where activity, lerr := LoadActivityLog(runDir)
is handled, keep activity = nil for continuing, and add a warnings/errs return
value (e.g., []error or []string) or embed a Warnings field on the returned
summary type so callers (CLI) can decide how to present the message; ensure
LoadActivityLog error (lerr) and contextual info (name, runDir) are packaged
into that returned warning instead of printing.

---

Duplicate comments:
In `@tracker_audit.go`:
- Around line 124-131: The classifyStatus function currently only treats
"pipeline_completed" as success and "pipeline_failed" as a terminal failure; add
handling for budget-exceeded terminal events by adding a case for
activity[i].Type == "budget_exceeded" that returns "fail" so runs halted by
budget limits are classified as failures (update the switch in classifyStatus
that iterates ActivityEntry entries to include "budget_exceeded" alongside
"pipeline_failed").
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6a20e61b-f0e0-4cfb-a587-65448d175159

📥 Commits

Reviewing files that changed from the base of the PR and between fef3871 and 268f95b.

📒 Files selected for processing (2)
  • cmd/tracker/diagnose.go
  • tracker_audit.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/tracker/diagnose.go

Comment thread tracker_audit.go
Comment on lines +220 to +223
activity, lerr := LoadActivityLog(runDir)
if lerr != nil {
fmt.Fprintf(os.Stderr, "warning: run %s: cannot read activity log: %v\n", name, lerr)
activity = nil // continue with nil so the summary still builds
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid writing to os.Stderr from library code paths.

At Line 222, buildLibRunSummary emits a warning directly to stderr. This introduces presentation side effects in a library API and undermines the data/presentation separation goal.

A better pattern is to return structured warning/error information up to the caller (CLI) and let the CLI decide whether/how to print.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_audit.go` around lines 220 - 223, In buildLibRunSummary replace the
direct write to os.Stderr with returning structured warning information: remove
the fmt.Fprintf(os.Stderr...) call where activity, lerr :=
LoadActivityLog(runDir) is handled, keep activity = nil for continuing, and add
a warnings/errs return value (e.g., []error or []string) or embed a Warnings
field on the returned summary type so callers (CLI) can decide how to present
the message; ensure LoadActivityLog error (lerr) and contextual info (name,
runDir) are packaged into that returned warning instead of printing.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR moves key CLI-only features (diagnose/audit/doctor/simulate + NDJSON --json stream writer) into the public tracker library as JSON-serializable report APIs, and refactors the CLI commands into thin presentation layers over those library APIs to achieve Phase 1 (NDJSON) + Phase 2 parity from #76.

Changes:

  • Added public library APIs: Simulate, Diagnose/DiagnoseMostRecent, Audit, ListRuns, Doctor, and shared run/activity helpers (ResolveRunDir, MostRecentRunID, LoadActivityLog, etc.).
  • Promoted the CLI NDJSON writer into tracker.NewNDJSONWriter with handler factories for pipeline/agent/LLM trace events; deleted the old CLI-private implementation.
  • Updated CLI commands to call the library and print the same outputs; added tests, fixtures, and docs/changelog entries.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tracker_simulate.go New library Simulate API and report types (graph introspection + BFS plan).
tracker_simulate_test.go Unit tests for SimulateReport fields (nodes/edges/plan/unreachable/attrs).
tracker_events.go New public NDJSON wire type + NDJSONWriter with handler factories.
tracker_events_test.go Tests for NDJSON writer correctness, stable tags, and concurrency safety.
tracker_activity.go New shared helpers for run-dir resolution and activity.jsonl parsing/sorting.
tracker_activity_test.go Tests for run-dir resolution and activity parsing/timestamp formats.
tracker_diagnose.go New library Diagnose / DiagnoseMostRecent and structured failure/suggestion reporting.
tracker_diagnose_test.go Fixture-driven tests for clean/failing/budget-halted run diagnosis.
tracker_audit.go New library Audit + ListRuns APIs and structured audit/timeline summaries.
tracker_audit_test.go Tests for audit status, timeline, durations, retries, and listing order.
tracker_doctor.go New library Doctor API (env/provider/binary/workdir/artifact/disk/pipeline checks).
tracker_doctor_test.go Tests for Doctor behavior (probe opt-in, provider detection, pipeline validation).
cmd/tracker/run.go CLI JSON event mode now uses tracker.NewNDJSONWriter handlers.
cmd/tracker/json_stream.go Removed CLI-private NDJSON stream writer (superseded by library).
cmd/tracker/simulate.go CLI simulate now prints from tracker.SimulateReport (plus legacy validation warnings).
cmd/tracker/diagnose.go CLI diagnose now calls tracker.Diagnose* and prints the report.
cmd/tracker/audit.go CLI audit/list now calls tracker.Audit / tracker.ListRuns and prints the report.
cmd/tracker/doctor.go CLI doctor now calls tracker.Doctor, prints results, and applies CLI-only fixups.
cmd/tracker/doctor_test.go Reduced scope to CLI shim behavior (flags/exit codes/printing/fixups).
cmd/tracker/flags.go Doctor --probe default changed to true; usage updated.
testdata/runs/ok/checkpoint.json New run fixture for successful run.
testdata/runs/ok/activity.jsonl New run fixture activity log for successful run.
testdata/runs/failed/checkpoint.json New run fixture for failed run with retries.
testdata/runs/failed/activity.jsonl New run fixture activity log for failed run.
testdata/runs/failed/Build/status.json New per-node status fixture for failed node.
testdata/runs/budget_halted/checkpoint.json New run fixture for budget-halted run.
testdata/runs/budget_halted/activity.jsonl New run fixture activity log for budget halt.
docs/superpowers/specs/2026-04-17-cli-library-parity-phase-2-design.md New design/spec doc describing the parity phases and APIs.
README.md Added library usage examples for diagnosing runs and streaming NDJSON.
CHANGELOG.md Documented newly added library APIs and CLI refactor.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tracker_activity.go Outdated
if err != nil {
return "", err
}
return filepath.Join(runsDir, matched), nil
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResolveRunDir's docstring says it returns an absolute path, but the implementation just filepath.Join's workdir/.tracker/runs/... and will return a relative path when workdir is relative (e.g. "."). Either call filepath.Abs on the result (and return that), or update the docstring to match actual behavior.

Suggested change
return filepath.Join(runsDir, matched), nil
runDir := filepath.Join(runsDir, matched)
absRunDir, err := filepath.Abs(runDir)
if err != nil {
return "", fmt.Errorf("cannot resolve absolute run directory path: %w", err)
}
return absRunDir, nil

Copilot uses AI. Check for mistakes.
Comment thread tracker_simulate.go
Comment on lines +164 to +168
for _, node := range graph.Nodes {
if !visited[node.ID] {
ordered = append(ordered, node)
}
}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simBFSNodeOrder appends orphan nodes by ranging over graph.Nodes (a map), which makes the resulting SimulateReport.Nodes ordering nondeterministic when there are multiple unreachable/orphan nodes. For a stable library API (and stable CLI output), consider appending orphans in a deterministic order (e.g. by graph.NodeOrder when present, or by sorting orphan IDs).

Suggested change
for _, node := range graph.Nodes {
if !visited[node.ID] {
ordered = append(ordered, node)
}
}
var orphanIDs []string
for id := range graph.Nodes {
if !visited[id] {
orphanIDs = append(orphanIDs, id)
}
}
sort.Strings(orphanIDs)
for _, id := range orphanIDs {
ordered = append(ordered, graph.Nodes[id])
}

Copilot uses AI. Check for mistakes.
Comment thread tracker_diagnose.go
Comment on lines +163 to +167
path := filepath.Join(runDir, "activity.jsonl")
data, err := os.ReadFile(path)
if err != nil {
return nil
}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diagnose reads the entire activity.jsonl into memory (os.ReadFile) and then strings.Split's it, which can be expensive / high-memory for large runs. Consider streaming the file (bufio.Scanner) similar to LoadActivityLog, so library consumers can safely diagnose very large activity logs.

Copilot uses AI. Check for mistakes.
Comment thread tracker_doctor.go Outdated
Comment on lines +729 to +733
if !strings.HasSuffix(pipelineFile, ".dip") {
out.Details = append(out.Details, CheckDetail{
Status: "warn",
Message: fmt.Sprintf("%s is not a .dip file — may not be a valid pipeline", pipelineFile),
})
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkPipelineFileLib warns for any non-.dip file, but DoctorConfig/PipelineFile is described as supporting both .dip and .dot. As written, passing a valid .dot will always produce a warning detail ("not a .dip file"). Consider only warning when the extension is neither .dip nor .dot, or adjust the message to reflect that .dot is supported (even if deprecated).

Copilot uses AI. Check for mistakes.
Comment thread cmd/tracker/simulate.go
Comment on lines +61 to +63
report, err := tracker.Simulate(source)
if err != nil {
return err
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runSimulateCmd respects formatOverride when parsing via loadPipeline, but the library report path always calls tracker.Simulate(source) which auto-detects format from content and ignores formatOverride. This can lead to mismatched validation vs printed report (and potentially different behavior) when --format is used. Consider generating the SimulateReport from the already-parsed graph, or adding a library API that accepts an explicit format.

Copilot uses AI. Check for mistakes.
Comment thread cmd/tracker/doctor.go
Comment on lines +98 to +101
// Write-side-effect fix-ups must not run inside the library. The
// Working Directory check emits a gitignore hint in its details;
// patch the file here now that the user has seen the warning.
maybeFixGitignore(report, workdir)
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybeFixGitignore/checkGitignore will now modify an existing .gitignore automatically whenever the library emits a missing-entry warning. This is a potentially surprising write side effect (and the Phase 2 design doc describes this as a prompt-driven fix-up). Consider gating this behind an explicit user confirmation / flag, or update the docs to reflect the new non-interactive behavior.

Copilot uses AI. Check for mistakes.
```go
type AuditReport struct {
RunID string `json:"run_id"`
Status string `json:"status"` // "completed" | "in_progress" | "failed"
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The design doc's AuditReport.Status comment lists values "completed" | "in_progress" | "failed", but the implementation (tracker_audit.go) uses "success" and "fail" (and ListRuns/Audit tests assert those). Please update this doc snippet to match the actual status strings so library consumers aren't misled.

Suggested change
Status string `json:"status"` // "completed" | "in_progress" | "failed"
Status string `json:"status"` // "success" | "in_progress" | "fail"

Copilot uses AI. Check for mistakes.
Comment thread CHANGELOG.md Outdated
- `tracker.Diagnose(runDir)` / `tracker.DiagnoseMostRecent(workDir)` — structured `*DiagnoseReport` with node failures, budget halt, and typed suggestions (`Kind: "retry_pattern" | "escalate_limit" | "no_output" | "shell_command" | "go_test" | "suspicious_timing" | "budget"`).
- `tracker.Audit(runDir)` — structured `*AuditReport` with timeline, retries, errors, and recommendations.
- `tracker.ListRuns(workDir)` — sorted `[]RunSummary` for enumerating past runs (newest first).
- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has five values: `"ok"`, `"warn"`, `"error"`, `"skip"`, and `"hint"` (informational sub-items such as optional providers not configured).
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CHANGELOG entry says CheckDetail.Status supports "skip", but the code only uses/defines detail statuses like ok/warn/error/hint (and there are no "skip" detail emitters). Either remove "skip" from the CheckDetail.Status list in the changelog or implement/define what a skipped detail line should mean.

Suggested change
- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has five values: `"ok"`, `"warn"`, `"error"`, `"skip"`, and `"hint"` (informational sub-items such as optional providers not configured).
- `tracker.Doctor(cfg)` — structured `*DoctorReport` for preflight health checks. `ProbeProviders` defaults to false; set true to make real API calls for auth verification. `CheckDetail.Status` has four values: `"ok"`, `"warn"`, `"error"`, and `"hint"` (informational sub-items such as optional providers not configured).

Copilot uses AI. Check for mistakes.
- go.mod: upgrade github.com/2389-research/dippin-lang v0.19.1 → v0.20.0
- CI: install dippin CLI v0.20.0 (was v0.10.0 — 10 minor versions stale)
- examples/human_gate_test_suite.dip: rename "default_choice:" to
  "default:" (the real IR field name on HumanConfig).
- Makefile lint: skip human_gate_test_suite.dip with a note — the file
  exercises "timeout:" and "timeout_action:" on human nodes, which are
  tracker-level attrs but are not yet in the dippin-lang HumanConfig
  IR. v0.10.0 accepted unknown fields silently; v0.20.0 rejects them.
  See follow-up upstream issue.

All local checks pass: fmt-check, vet, build, test-short, test-race,
coverage, lint, doctor (ask_and_execute / build_product /
build_product_with_superspec all A/100).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
Makefile (1)

136-140: Make the lint skip explicitly temporary to avoid a long-term CI blind spot.

Line 136 introduces a hard skip in a target that CI executes. Consider adding an expiry mechanism (e.g., version gate or tracked TODO/issue reference) so this file is automatically brought back under lint once dippin-lang IR supports timeout_action.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 136 - 140, The Makefile currently hard-skips
examples/human_gate_test_suite.dip inside the case "$$f" block which creates a
permanent CI blind spot; change the skip so it is explicitly temporary by gating
it on a clearly named env var or feature flag (e.g., LINT_ALLOW_HUMAN_TIMEOUT)
or by adding a TODO/issue reference and an expiry/version gate checked by CI.
Concretely, replace the unconditional skip for
examples/human_gate_test_suite.dip in the case "$$f" branch with a conditional
that only skips when the temp flag is set (or until a specified version/date),
and update the echo message to include the TODO/issue identifier so
CI/maintainers can track and remove the skip when dippin-lang IR supports
timeout_action.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@Makefile`:
- Around line 136-140: The Makefile currently hard-skips
examples/human_gate_test_suite.dip inside the case "$$f" block which creates a
permanent CI blind spot; change the skip so it is explicitly temporary by gating
it on a clearly named env var or feature flag (e.g., LINT_ALLOW_HUMAN_TIMEOUT)
or by adding a TODO/issue reference and an expiry/version gate checked by CI.
Concretely, replace the unconditional skip for
examples/human_gate_test_suite.dip in the case "$$f" branch with a conditional
that only skips when the temp flag is set (or until a specified version/date),
and update the echo message to include the TODO/issue identifier so
CI/maintainers can track and remove the skip when dippin-lang IR supports
timeout_action.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 28c850cd-97f0-43b1-9ea1-4dc8a3390c99

📥 Commits

Reviewing files that changed from the base of the PR and between 268f95b and d178f88.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • .github/workflows/ci.yml
  • Makefile
  • examples/human_gate_test_suite.dip
  • go.mod
✅ Files skipped from review due to trivial changes (2)
  • .github/workflows/ci.yml
  • go.mod

Applies fixes from CodeRabbit / Copilot / Codex review comments:
- PinnedDippinVersion updated to v0.20.0 (was v0.18.0, mismatched go.mod)
- checkDiskSpaceLib split to tracker_doctor_unix.go / _windows.go — syscall.Statfs is Unix-only and breaks Windows builds
- classifyStatus now returns "fail" for budget_exceeded events (budget-halted runs were mis-classified as success)
- checkProvidersLib propagates individual provider error details to section Status (was always "ok" when any provider configured)
- checkWorkdirLib propagates warn details to section-level Status
- getDippinVersion uses exec.CommandContext with 5s timeout (noctx lint fix)
- NDJSONWriter.AgentHandler preserves agent.Event.Timestamp; falls back to time.Now() only when zero
- simBFSNodeOrder collects orphan nodes into a sorted slice before appending (deterministic output)
- ResolveRunDir calls filepath.Abs to satisfy its "returns absolute path" contract
- MostRecentRunID no longer writes to os.Stderr; invalid checkpoints silently skipped
- checkPipelineFileLib no longer warns on .dot files (both .dip and .dot are valid)
- Fixed ineffectual suffix := "\n" assignment in cmd/tracker/doctor.go maybeFixGitignore
- enrichFromEntryNF/updateFailureTimingNF guard against zero start timestamps in duration math
- CHANGELOG: added ### Fixed section; removed "skip" from CheckDetail.Status values (not a real status)

Skipped (noted in PR summary or tracked separately):
- Copilot: stream activity.jsonl via bufio.Scanner in Diagnose (non-trivial refactor, out of scope)
- Copilot: gate maybeFixGitignore behind user confirmation (behavior change, needs design decision)
- CodeRabbit: buildLibRunSummary stderr write — minor, would require API surface change
- CodeRabbit tracker_diagnose.go:135 — collectNodeFailuresLib/loadNodeFailureLib error propagation (larger refactor, separate PR)
- CodeRabbit tracker_diagnose.go:101 — DiagnoseMostRecent stderr fixed by silencing MostRecentRunID instead
- Codex cmd/tracker/simulate.go:61 — format-aware Simulate API (new feature, out of scope)
- CodeRabbit tracker_doctor.go:674 — artifact directory errors as "warn" vs "error" (intentional severity)
- Doc-only comments on spec/plan markdown files (not code, low value)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tracker_doctor.go (1)

529-539: Use exec.CommandContext for consistency with other version checks.

Static analysis flagged this exec.Command call. While claude --version should be quick, using CommandContext with a timeout would prevent potential hangs (consistent with how getDippinVersion was fixed).

♻️ Proposed fix
 func getBinaryVersion(path, flag string) string {
-	out, err := exec.Command(path, flag).CombinedOutput()
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+	defer cancel()
+	out, err := exec.CommandContext(ctx, path, flag).CombinedOutput()
 	if err != nil {
 		return "(version unknown)"
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_doctor.go` around lines 529 - 539, The getBinaryVersion function
currently uses exec.Command which can hang; change it to use exec.CommandContext
with a short context timeout (as done in getDippinVersion): create a context
with context.WithTimeout, defer cancel, build the command via
exec.CommandContext(ctx, path, flag), run CombinedOutput on the cmd, and
preserve the existing error handling and trimming logic (return "(version
unknown)" on error or empty output). Ensure you reference the function name
getBinaryVersion and reuse the same timeout duration pattern used by
getDippinVersion.
tracker_diagnose.go (1)

284-288: Budget suggestion message couples library to CLI interface.

The message references CLI flags (--max-tokens, --max-cost, --max-wall-time) alongside the library-level Config.Budget. Since this is a library API, consider making the message more generic or providing the raw data (limits vs. actual) so CLI and other consumers can format their own actionable message.

💡 Suggested alternative
 	if halt != nil {
 		out = append(out, Suggestion{
 			Kind:    SuggestionBudget,
-			Message: "Raise the relevant --max-tokens, --max-cost, or --max-wall-time flag, or remove the Config.Budget value",
+			Message: "Budget limit reached — increase token, cost, or wall-time limits in your configuration",
 		})
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tracker_diagnose.go` around lines 284 - 288, The suggestion message in the
Suggestion struct (where out = append(out, Suggestion{ Kind: SuggestionBudget,
... }) currently hardcodes CLI flags and couples library code to the CLI; change
this to a library-appropriate message (e.g., "Increase configured budget limits
or remove Config.Budget") or instead populate the Suggestion with structured
data (e.g., include the budget limits and actual usage values alongside Kind:
SuggestionBudget) so CLI/consumers can render actionable flag-specific text;
modify the creation of this Suggestion (SuggestionBudget/Message) to either use
a generic, non-CLI string or add fields on the Suggestion payload with the raw
limit vs actual values for downstream formatting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tracker_diagnose.go`:
- Around line 284-288: The suggestion message in the Suggestion struct (where
out = append(out, Suggestion{ Kind: SuggestionBudget, ... }) currently hardcodes
CLI flags and couples library code to the CLI; change this to a
library-appropriate message (e.g., "Increase configured budget limits or remove
Config.Budget") or instead populate the Suggestion with structured data (e.g.,
include the budget limits and actual usage values alongside Kind:
SuggestionBudget) so CLI/consumers can render actionable flag-specific text;
modify the creation of this Suggestion (SuggestionBudget/Message) to either use
a generic, non-CLI string or add fields on the Suggestion payload with the raw
limit vs actual values for downstream formatting.

In `@tracker_doctor.go`:
- Around line 529-539: The getBinaryVersion function currently uses exec.Command
which can hang; change it to use exec.CommandContext with a short context
timeout (as done in getDippinVersion): create a context with
context.WithTimeout, defer cancel, build the command via
exec.CommandContext(ctx, path, flag), run CombinedOutput on the cmd, and
preserve the existing error handling and trimming logic (return "(version
unknown)" on error or empty output). Ensure you reference the function name
getBinaryVersion and reuse the same timeout duration pattern used by
getDippinVersion.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8119966b-f0fb-457a-acd3-8523c7de80bd

📥 Commits

Reviewing files that changed from the base of the PR and between d178f88 and 9e0e6ab.

📒 Files selected for processing (10)
  • CHANGELOG.md
  • cmd/tracker/doctor.go
  • tracker_activity.go
  • tracker_audit.go
  • tracker_diagnose.go
  • tracker_doctor.go
  • tracker_doctor_unix.go
  • tracker_doctor_windows.go
  • tracker_events.go
  • tracker_simulate.go
🚧 Files skipped from review as they are similar to previous changes (3)
  • tracker_simulate.go
  • tracker_events.go
  • tracker_activity.go

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 37 out of 38 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread tracker_doctor.go
Comment on lines +779 to +783
Message: fmt.Sprintf("%s valid (%d nodes, %d edges)", pipelineFile, len(graph.Nodes), len(graph.Edges)),
})
out.Status = "ok"
out.Message = fmt.Sprintf("%s is valid", pipelineFile)
return out
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkPipelineFileLib can append a warn-level detail for non-.dip/.dot files, but still sets section Status="ok" and Message=" is valid" when parsing/validation succeeds. This makes DoctorReport.Warnings/OK inconsistent with the per-detail warning. Consider promoting the section Status to "warn" (and adjusting Message) whenever any warn-level detail is present and there are no errors.

Copilot uses AI. Check for mistakes.
Comment thread tracker_simulate.go
Comment on lines +102 to +105
func buildExecutionPlan(graph *pipeline.Graph) ([]PlanStep, []string) {
if graph.StartNode == "" {
return nil, nil
}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildExecutionPlan returns (nil, nil) when graph.StartNode is empty. Because SimulateReport.ExecutionPlan is not omitempty, this will JSON-marshal as null rather than an empty array, which is awkward for consumers and inconsistent with other report slices. Consider returning empty slices instead (e.g., []PlanStep{} / []string{}) so execution_plan is always an array.

Copilot uses AI. Check for mistakes.
Comment thread tracker_events.go
Comment on lines +51 to +64
func (s *NDJSONWriter) Write(evt NDJSONEvent) {
data, err := json.Marshal(evt)
if err != nil {
fmt.Fprintf(os.Stderr, "tracker: marshal NDJSON event: %v\n", err)
return
}
data = append(data, '\n')
s.mu.Lock()
defer s.mu.Unlock()
if _, werr := s.w.Write(data); werr != nil {
s.errOnce.Do(func() {
fmt.Fprintf(os.Stderr, "tracker: NDJSON stream write error: %v (further write errors suppressed)\n", werr)
})
}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NDJSONWriter.Write writes marshal/write errors directly to os.Stderr. Since this is a public library type, emitting to process-global stderr is a surprising side effect and hard for consumers to test/control. Consider returning an error from Write (or exposing an ErrorHandler/logger option on NDJSONWriter) so callers decide how to surface failures.

Copilot uses AI. Check for mistakes.
Comment thread tracker_audit.go
Comment on lines +222 to +226
activity, lerr := LoadActivityLog(runDir)
if lerr != nil {
fmt.Fprintf(os.Stderr, "warning: run %s: cannot read activity log: %v\n", name, lerr)
activity = nil // continue with nil so the summary still builds
}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildLibRunSummary prints a warning directly to os.Stderr when LoadActivityLog fails. For a library API (ListRuns), this creates an unexpected global side effect; callers may want to handle/log these errors themselves. Consider returning the error (or collecting per-run warnings in the RunSummary/AuditReport) instead of writing to stderr.

Copilot uses AI. Check for mistakes.
Comment thread tracker_diagnose.go
Comment on lines +134 to +135
fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)
return nil
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loadNodeFailureLib logs JSON unmarshal failures to os.Stderr and then silently drops the node. For a library API intended to cleanly separate data/presentation, this stderr output is an unexpected side effect and makes failures hard to surface programmatically. Consider returning the parse error up the stack (or collecting it into the DiagnoseReport as a warning) rather than printing.

Suggested change
fmt.Fprintf(os.Stderr, "warning: cannot parse %s: %v\n", statusPath, err)
return nil
return &NodeFailure{
NodeID: nodeID,
Outcome: "status_parse_error",
Stderr: fmt.Sprintf("cannot parse %s: %v", statusPath, err),
}

Copilot uses AI. Check for mistakes.
Comment on lines +149 to +152
type AuditReport struct {
RunID string `json:"run_id"`
Status string `json:"status"` // "completed" | "in_progress" | "failed"
TotalDuration time.Duration `json:"total_duration_ns"`
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec doc's AuditReport API shape is out of sync with the implementation: it documents Status values as "completed"|"in_progress"|"failed" (and a different RunSummary shape), but tracker_audit.go currently uses Status values like "success"/"fail" and RunSummary has different fields. Please update this doc section to match the actual exported types so library consumers aren't misled.

Copilot uses AI. Check for mistakes.
- checkPipelineFileLib promotes section status to warn when any detail
  is warn (matches pattern already applied to checkProviders/Workdir).
- buildExecutionPlan returns empty slices instead of nil so
  SimulateReport.ExecutionPlan JSON-marshals as [] not null.
- Spec doc AuditReport section updated to match shipped types:
  Status is "success"/"fail", RunSummary has the implemented fields.

Three stderr-side-effect comments deferred to #104 (coordinated API
decision; add Logger io.Writer to configs or return errors up-stack).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@clintecker clintecker merged commit 8e23397 into main Apr 17, 2026
2 checks passed
clintecker added a commit that referenced this pull request Apr 17, 2026
Batches six related pre-1.0 fixes flagged in the expert panel review
of PR #101 (#102, #103, #104, #105, #106, #109). All changes touch
the v0.18.0 library surface and are breaking, so ship together:

- Thread context.Context through Doctor, Diagnose, DiagnoseMostRecent,
  Audit, and Simulate. Provider probes and binary version lookups
  honor ctx; getBinaryVersion now uses exec.CommandContext with a
  5-second timeout to match getDippinVersion.
- Introduce typed CheckStatus and SuggestionKind so consumers can
  switch-exhaust. Constant values are unchanged.
- NDJSONWriter.Write now returns error. First write failure is still
  logged once to stderr; subsequent failures surface via return value.
- Rename NDJSONEvent -> StreamEvent (wire format unchanged).
- Move DoctorConfig.TrackerVersion/TrackerCommit off the config struct
  behind a WithVersionInfo functional option.
- Add LogWriter to DoctorConfig/DiagnoseConfig/AuditConfig. CLI sets
  it to io.Discard so users stop seeing stray library warnings from
  audit/list/diagnose on os.Stderr.
- sanitizeProviderError strips API keys and bearer tokens from
  provider error bodies before they land in CheckDetail.Message.
- Panic-recover the NDJSON handler closures (pipeline, agent, LLM
  trace) so a misbehaving writer cannot crash the caller goroutine.
- Stream activity.jsonl in Diagnose via bufio.Scanner (1 MB buffer
  limit), matching LoadActivityLog and avoiding memory spikes on
  large runs.
- Rename internal *Lib/*NF/*ForDiagnose helpers to plain names.
- CHANGELOG notes deterministic diagnose suggestion ordering (already
  alphabetical post-#101, documented as a visible change).

Closes #102, #103, #104, #105, #106, #109.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants