Add SafeSkill security badge (50/100 — Use with Caution)#12

Open
OyaAIProd wants to merge 267 commits into cogitave:main from OyaAIProd:safeskill-scan-1777514943960


@OyaAIProd

🟠 SafeSkill Security Scan Results

Metric          Value
Overall Score   50/100 (Use with Caution)
Code Score      50/100
Content Score   63/100
Findings        11 findings detected (8 high)
Taint Flows     0
Files Scanned   0
Scan Duration   0.2s

Top Findings

  • 🟠 high: Context boundary escape detected (fake-xml-boundary): "" (commands/clawtool-dashboard.md:33)
  • 🟠 high: Context boundary escape detected (fake-xml-boundary): "" (docs/portals.md:35)
  • 🟠 high: Data exfiltration pattern detected (sensitive-path-access): "read ~/.aws" (docs/sandbox.md:17)
  • 🟠 high: Context boundary escape detected (fake-xml-boundary): "" (docs/sandbox.md:55)
  • 🟠 high: Persona/safety hijack attempt detected (unrestricted-mode): "without limit" (internal/setup/recipes/governance/assets/Apache-2.0.txt:147)

View full report on SafeSkill


About SafeSkill

SafeSkill is a free, open-source security scanner for AI tools, MCP servers, and Claude Code skills. We scan for code exploits, prompt injection, and data exfiltration risks.

False positive? We take accuracy seriously. If any finding above is incorrect, please open an issue and we will fix it immediately.

…ateway

Phase 2 of the relay surface ([[ADR-014]]). Outside-Claude-Code prompt
dispatch from external orchestrators / CI / Slack / etc. now hits an
authenticated HTTP edge.

Wire summary:

- internal/server/http.go — net/http listener with bearer-token auth
  middleware (constant-time compare so token-validity timing doesn't
  leak the prefix). Three endpoints:
    GET  /v1/health        → {status, version}
    GET  /v1/agents [?status=callable] → registry snapshot
    POST /v1/send_message  → streams Supervisor.Send verbatim
                             (application/x-ndjson, chunked + flushed)
  Default-deny on unrecognised paths. Body capped at 1 MB. TLS
  termination delegated to the operator's reverse proxy.
- internal/server/server.go — extracted buildMCPServer so HTTP and
  stdio share the same MCP boot path (config, secrets, sources,
  search index, every tool registration).
- cmd/clawtool/main.go — extended 'serve' subcommand with --listen,
  --token-file, --mcp-http flags (mcp-go's StreamableHTTPServer
  integration deferred to a polish patch). New 'serve init-token'
  subcommand: 32-byte hex token via crypto/rand, written 0600,
  printed to stdout for shell capture.
- internal/cli/cli.go — usage block updated.
- internal/server/http_test.go — 16 httptest-based unit tests
  (auth pass/fail, every endpoint, token-file edge cases,
  init-token round-trip, listen/token-file required guards).
- test/e2e/run.sh — 8 new e2e assertions: real clawtool serve in
  background, curl every endpoint, validates 401 on missing/wrong
  token, 400 on missing prompt, 404 on unknown path, 0600 on token
  file, 64-char hex on init-token output.

Wiki: ADR-014 status updated to 'Phase 1 + Phase 2 shipped'; new
'Outcome — what shipped' retrospective section; log entry covering
both phases; hot.md updated.

Test totals: 196 Go unit + 62 e2e — all race-clean, gofmt clean,
go vet clean.

Carry-over to Phase 3 (Docker) and Phase 4 (dispatch policies).
…recipe

Closes the third leg of ADR-014's rollout (the first two — Phase 1
bridge/send/supervisor and Phase 2 HTTP gateway — landed earlier the
same day).

Wire summary:

- docker/Dockerfile.relay — multi-stage build. Builder is golang:1.25;
  runtime is debian:bookworm-slim with the four upstream agent CLIs
  pre-installed (@openai/codex + @google/gemini-cli + @anthropic-ai/
  claude-code via npm; opencode via the upstream installer) plus
  ripgrep/pandoc/poppler so clawtool's own Read/Grep stay
  exercisable. ENTRYPOINT refuses to start without a mounted
  /etc/clawtool/listener-token file. Exposes :8080.
- docker/compose.relay.yml — reference compose with optional caddy
  reverse proxy that handles automatic ACME. Per-CLI auth env vars
  wired through ${VAR:-} so an unset key leaves the family
  non-callable rather than crashing. State volume keeps codex/claude
  session IDs across restarts. Caddyfile snippet inline as comment.
- internal/setup/recipes/runtime/clawtool_relay.go — new recipe
  under the runtime category. Drops compose.relay.yml into the
  target repo, marker-stamped (managed-by: clawtool) so re-applies
  are idempotent and unmanaged files refuse overwrite without
  force=true. Stability: Beta (promote after 1+ production soak).
- internal/setup/recipes/runtime/assets/clawtool-relay.compose.yml —
  embedded asset, scrubbed-down compose template suitable to ship
  inside an arbitrary repo (image points at ghcr.io/cogitave/
  clawtool-relay:latest; user can swap for local build).
- internal/setup/recipes/runtime/clawtool_relay_test.go — 6 tests
  (Registered, DetectAbsent, ApplyDropsCompose, VerifyAfterApply,
  RefusesUnmanagedOverwrite, ForcedOverwriteSucceeds).
- test/e2e/run.sh — RecipeList enumeration bumped to include the
  three bridge recipes + clawtool-relay (16 recipes total, all 9
  categories populated).

Wiki: ADR-014 status updated to 'Phase 1 + Phase 2 + Phase 3 shipped',
new Phase 3 retrospective subsection, log entry, hot.md, decisions
index all reflect the closure.

Test totals: 202 Go unit + 62 e2e — race-clean, gofmt clean,
go vet clean.

Phase 4 (dispatch policies — round-robin / failover / tag-routed /
affinity) carries forward.
After Phase 2's HTTP-gateway tests fired, the macOS CI run printed 'all
e2e tests passed' but exited with code 1. Cause: under 'set -e', the
EXIT trap re-ran 'kill $HTTP_PID' against a process that had already
been killed earlier in the test, returning non-zero and propagating
out as the script's exit status. Append '|| true' to both kill and
rm in the trap so the cleanup stays idempotent regardless of state.
…robin, failover, tag-routed)

Closes the fourth and final originally-spec'd phase of ADR-014, all
four landing in the same day.

Wire summary:

- internal/agents/policy.go — Policy interface as the dispatch seam.
  pickPolicy resolves the configured mode (or per-call override) into
  the right impl. The supervisor's existing Send call site is the
  only consumer; CLI / MCP / HTTP all hit the same code path.
- explicitPolicy (Phase 1 default) — pinned instance, no fallback.
- roundRobinPolicy — rotates across same-family callable instances
  via atomic.Uint64 per family. Pinned instance always wins over
  rotation; sole callable falls through to explicit dispatch.
- failoverPolicy — primary + ordered chain from AgentConfig.failover_to,
  filtered to callable. Supervisor.dispatch walks the chain on
  Transport.Send error, but only before any byte has streamed —
  retry mid-stream would duplicate partial output to the caller.
- tagRoutedPolicy — ignores --agent, matches case-insensitively
  against Tags, sorts deterministically when multiple match.
- internal/config/config.go — three new fields: AgentConfig.Tags +
  AgentConfig.FailoverTo + Dispatch struct (Mode).
- internal/agents/supervisor.go — Agent struct gains Tags + FailoverTo
  alongside the existing fields. Send branches:
    instance ==  '' && tag != ''  -> tagRoutedPolicy direct
    instance ==  '' && tag == ''  -> Phase 1 precedence chain (Resolve)
    instance !=  ''               -> configured policy (tag overrides)
- internal/cli/send.go — new --tag flag.
- internal/tools/core/agents_tool.go — new tag parameter on
  SendMessage MCP tool. Pre-Resolve only when caller pinned an
  instance and didn't pass a tag (so tag-only doesn't fail noisily).
- internal/server/http.go — sendMessageRequest gains a top-level Tag
  field as sugar for opts.tag.
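
The round-robin rule above (rotate with an atomic counter, pinned instance wins) could be sketched roughly as follows. The roundRobin type and instance names are hypothetical; the commit keeps one counter per family, while this sketch shows a single family.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin rotates across a fixed list of callable instances.
// Hypothetical type for illustration only.
type roundRobin struct {
	next      atomic.Uint64
	instances []string
}

// pick returns the pinned instance when one is given, otherwise the next
// instance in rotation. The bool is false when nothing is callable.
func (r *roundRobin) pick(pinned string) (string, bool) {
	if pinned != "" {
		return pinned, true // a pinned instance always wins over rotation
	}
	if len(r.instances) == 0 {
		return "", false
	}
	// Add returns the new value, so subtract 1 to start at index 0.
	n := r.next.Add(1) - 1
	return r.instances[n%uint64(len(r.instances))], true
}

func main() {
	rr := &roundRobin{instances: []string{"codex-a", "codex-b", "codex-c"}}
	for i := 0; i < 4; i++ {
		name, _ := rr.pick("")
		fmt.Println(name)
	}
}
```

Because the counter is atomic, concurrent callers each get a distinct slot without a mutex.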

Tests: 14 new unit tests in policy_test.go (each policy's pick logic
+ supervisor failover cascade + tag dispatch end-to-end), 3 new CLI
flag-parse tests, 2 new e2e assertions (MCP unknown-tag error + HTTP
top-level tag field). Total: 216 Go unit + 64 e2e — race-clean,
gofmt clean, go vet clean.

Carry-over to v0.13.x: affinity policy, optional --mcp-http
StreamableHTTPServer integration, POST /v1/recipe/apply.
…nsport, plus claude/gemini transport fixes from live smoke

Two carry-over items from ADR-014's outcome list, plus two transport
bugs that surfaced during a 4-CLI live smoke against codex / opencode /
gemini / claude on the user's logged-in workstation.

HTTP gateway expansion:

- internal/server/http.go — new GET /v1/recipes (with optional
  ?category=, ?repo= for Detect status) and POST /v1/recipe/apply.
  Mirrors the existing MCP RecipeApply / RecipeList tools so external
  orchestrators can install bridges + project-setup recipes from a
  remote endpoint without spawning a clawtool process. 'repo' is
  required in the body — HTTP callers don't have a terminal cwd, so
  refusing rather than silently mutating $HOME is the right default.
- internal/server/http.go — when --mcp-http is set, mounts mcp-go's
  StreamableHTTPServer at /mcp (and /mcp/) wrapped by the same bearer
  auth middleware. The full clawtool MCP toolset becomes accessible
  to remote orchestrators with no additional client library.
- internal/server/http_test.go — 8 new unit tests covering each new
  handler + auth gate.
- test/e2e/run.sh — 5 new e2e assertions (recipe enumeration, recipe
  apply happy path, recipe apply missing-repo refusal, /mcp 401 on
  unauth, /mcp 200 on auth'd initialize).

Live-smoke transport fixes:

- internal/agents/gemini_transport.go — Gemini CLI exits with code 55
  in any directory it hasn't been invited to trust ('trusted-folder'
  IDE-style safeguard); transport now passes --skip-trust because
  the safeguard is redundant when an operator explicitly wrote
  'clawtool send' themselves. Plus default --output-format text so
  Gemini doesn't silently swallow output in non-TTY contexts.
- internal/agents/claude_transport.go — removed --bare. Older drafts
  added it expecting 'no chrome' behaviour but on the current
  Claude Code build that flag puts the CLI into a path that ignores
  the existing auth session and reports 'Not logged in'. Plain -p
  honours the session and works on a logged-in host. Comment in
  the transport captures the rationale.

Live smoke results (4-CLI fan-out from 'clawtool send --agent <X>'):
  opencode → 'pong' (~11s, free tier)
  codex    → JSON-RPC frames carrying agent_message.text='pong' (~5s)
  gemini   → 'pong' after the --skip-trust + --output-format fix
  claude   → 'pong' after the --bare removal

Test totals: 228 Go unit + 76 e2e — race-clean, gofmt clean,
go vet clean. CI green on previous push.
…licitly

Two transport bugs uncovered while dogfooding clawtool send for the
v0.14 feature ideation fan-out:

- codex_transport.go — codex exec refuses to run in any directory
  it hasn't been invited to trust ('Not inside a trusted directory'
  safeguard). Same IDE-style guard gemini ships and the same
  reasoning applies: in the headless dispatch path the operator has
  explicitly chosen to run 'clawtool send', so the guard is
  redundant. Pass --skip-git-repo-check by default; operators who
  need it can opt back in via extra_args.

- transport.go — startStreamingExec now sets cmd.Stdin =
  bytes.NewReader(nil) explicitly. Some upstream CLIs (codex exec
  in particular) read from stdin to pick up *additional* prompt
  input and will block forever if stdin is left attached or open.
  A pre-closed reader signals 'no extra input' cleanly.

These fixes were what unblocked dogfood fan-out across all four
families. Live smoke now: claude, codex, opencode, gemini all
return 'pong' end-to-end via 'clawtool send --agent <X>'.
Three of the six v0.14 features designed by the multi-CLI fan-out
on 2026-04-26 ship together. ROI ranks 1-3 per the team-research
roadmap. T3 (mem0), T5 (worktree), T6 (semsearch) carry to v0.14.x.

T1 — OpenTelemetry observability (internal/observability/):

- Observer struct: Init / StartSpan / RecordError / SetAttributes /
  Shutdown / Enabled. Disabled = pointer-cheap no-op (StartSpan
  returns input ctx + no-op end).
- Wraps go.opentelemetry.io/otel + sdk/trace + OTLP/HTTP exporter.
  Langfuse-compatible: when LangfusePublic+Secret keys are set,
  exporter sends Authorization: Basic to the Langfuse host.
- Wired into Supervisor.dispatch as 'agents.Supervisor.dispatch'
  span; each Transport.Send call inside the failover chain opens
  an 'agents.Transport.Send' child. observedReadCloser ends the
  child span when the streamed response closes (no leaked spans).
- Process-wide via agents.SetGlobalObserver, picked up by every
  NewSupervisor automatically. Server boot reads
  config.ObservabilityConfig and wires it once.
- 5 unit tests: disabled no-op, enabled span lifecycle, bad
  endpoint tolerance, idempotent Init, nil-receiver safety.

T2 — Auto-lint guardrails (internal/lint/):

- Runner interface: Lint(ctx, path) returns []Finding (LineNumber,
  Column, Severity, Tool, Message). Per-language adapters: Go via
  golangci-lint --out-format json; JS/TS via eslint --format json;
  Python via ruff check --output-format json.
- Hook in executeEdit and executeWrite immediately after a
  successful atomic write. Findings ride back in
  EditResult.LintFindings / WriteResult.LintFindings — same
  response, no async queue. Calling agent self-corrects in the
  next turn.
- Graceful skip when the linter binary isn't on PATH (zero noise
  in non-Go repos that lack ruff, etc.).
- Opt-out via [auto_lint] enabled = false in config.toml; default
  on (nil pointer means default-on).
- 8 unit tests: extension routing, missing binary skip, parse for
  each adapter, IsEnabled defaults, disabled runner.

T4 — Verify MCP tool (internal/tools/core/verify.go):

- VerifyResult { Repo, Checks[], Overall } where Overall is 'pass'
  iff every check passes; one fail flips the whole result.
- Probe order: make test → pnpm test → npm test → go test ./... →
  pytest → cargo test → just test. First match wins. Operator
  pins via target. Unknown target errors clearly.
- Reuses applyProcessGroup (shared with Bash) so timeouts SIGKILL
  the whole process group cleanly.
- Buffered single payload, not stream — caller wants the
  pass/fail summary, not the live log fire hose. Bash already
  streams when streaming is what's wanted.
- DetailsLogExcerpt is the last 4 KiB of combined stdout+stderr,
  prefixed with '…' on truncation. Enough for an agent to read
  the failing assertion without blowing the response budget.
- Registered in server.go alongside the other core tools;
  ToolSearch entry added.
- 8 unit tests: detect Go module, pnpm beats npm, target override,
  unknown target errors, no runner detected, happy path, failing
  test surfaces, tail truncation.
- 2 e2e assertions: Verify happy path on a synthesised Go test
  package + runner-name passthrough.
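
The "first match wins" probe order might look something like the sketch below. The command order follows the list above, but the marker file chosen for each runner is an assumption (the commit does not say how each runner is detected), and detectRunner is a hypothetical name.

```go
package main

import "fmt"

// probe pairs a marker file with the test command it implies.
// The marker files are assumptions; the source lists only the command order.
type probe struct {
	marker  string
	command string
}

var probeOrder = []probe{
	{"Makefile", "make test"},
	{"pnpm-lock.yaml", "pnpm test"},
	{"package.json", "npm test"},
	{"go.mod", "go test ./..."},
	{"pyproject.toml", "pytest"},
	{"Cargo.toml", "cargo test"},
	{"justfile", "just test"},
}

// detectRunner returns the first command whose marker exists.
// exists is injected so the logic can be exercised without a real filesystem.
func detectRunner(exists func(name string) bool) (string, error) {
	for _, p := range probeOrder {
		if exists(p.marker) {
			return p.command, nil
		}
	}
	return "", fmt.Errorf("no test runner detected")
}

func main() {
	// A JS repo with both files present: pnpm is probed before npm.
	files := map[string]bool{"package.json": true, "pnpm-lock.yaml": true}
	cmd, err := detectRunner(func(name string) bool { return files[name] })
	fmt.Println(cmd, err)
}
```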

Wiring:

- internal/config/config.go — new ObservabilityConfig + AutoLintConfig.
- internal/server/server.go — boot wires observability.New +
  agents.SetGlobalObserver + core.SetAutoLintEnabled.

Test totals after this turn: 244 Go unit + 78 e2e — race-clean,
gofmt clean, go vet clean.

Carry-over to v0.14.x: T3 mem0 recipe, T5 git-worktree isolation,
T6 semantic search MCP tool. Specs already in the team-research
roadmap source note.
The remaining three v0.14 features from the team-research roadmap.
With T1+T2+T4 already on main this completes the six tasks the
multi-CLI fan-out designed on 2026-04-26.

T3 — mem0 recipe under knowledge category:

- internal/setup/recipes/knowledge/mem0.go — recipe Apply drops a
  marker-stamped .clawtool/mem0.toml that declares the endpoint
  (cloud or self-hosted) + namespace and documents the one-time
  'claude mcp add mem0 -- npx -y mcp-remote …' the user runs to
  wire the official mem0.ai cloud MCP server into Claude Code.
- Coexists with brain (claude-obsidian); brain stays the
  single-machine vault, mem0 adds cross-machine cross-agent recall
  via the official cloud MCP. Self-hosted Docker supported by
  pointing endpoint at the local URL.
- 6 unit tests + ResetSemanticSearchCache helper + custom-endpoint
  / namespace passthrough + force overwrite path.

T5 — git-worktree isolation:

- internal/agents/worktree/ — Manager.Create reserves
  ~/.cache/clawtool/worktrees/{taskID} under a per-repo advisory
  flock (gofrs/flock wrap), shells out 'git worktree add --detach',
  stamps a marker JSON (TaskID, RepoRoot, BaseRef, Agent, PID,
  CreatedAt) and returns workdir + cleanup func. Cleanup is
  idempotent.
- 'clawtool send --isolated' creates a worktree per dispatch and
  sets opts['cwd'] so the upstream CLI stages/commits there
  instead of the operator's working tree.
- '--keep-on-error' preserves the worktree on dispatch failure for
  inspection via 'clawtool worktree show <taskID>'.
- 'clawtool worktree gc [--min-age 24h]' reaps orphans (dead PID +
  age cutoff). Unix uses syscall.Signal(0) for liveness; windows
  build tag skips reaping (conservative).
- 5 unit tests: create+cleanup, parallel-safe, GC reaps orphan,
  GC skips live PIDs, repoLockKey deterministic + path-distinct.

T6 — SemanticSearch MCP tool:

- internal/index/index.go — Store wraps chromem-go (MIT, pure Go,
  in-memory + persistent vector store). Build walks the repo,
  chunks text files at 80-line boundaries, embeds via the
  configured provider, and adds each chunk to the collection.
- Embedding: OpenAI text-embedding-3-small default (requires
  OPENAI_API_KEY); Ollama nomic-embed-text override via
  CLAWTOOL_EMBED_PROVIDER=ollama (uses OLLAMA_HOST or
  http://localhost:11434).
- internal/tools/core/semsearch.go — MCP tool. Index built lazily
  on first call per repo. Coexists with Grep: ToolSearch routing
  description carries the 'use SemanticSearch for intent /
  Grep for literal' heuristic.
- 7 unit tests: chunking, ignore patterns, NUL detection,
  collectionTag determinism, Search-before-Build error,
  missing-key error, defaults applied.

Wiring: server.go registers Verify + SemanticSearch. cli.go
top-level usage block carries the new send / worktree subcommands.

Test totals: 257 Go unit + 78 e2e — race-clean, gofmt clean,
go vet clean. New deps: github.com/gofrs/flock (Apache-2.0),
github.com/philippgille/chromem-go (MIT). All v0.14 features
from the team-research roadmap now in main.
macOS resolves t.TempDir()-style /var paths to /private/var when
git rev-parse --show-toplevel runs (filesystem-level symlink). Linux
keeps the original path. The TestCreate_AndCleanup marker check
compared the strings directly and failed on Darwin only. Resolve
both sides through filepath.EvalSymlinks before comparing.
… SQLite store) + 3 polish fixes

The 'why aren't they replying to you, buddy' moment shaped this. Ship the
async dispatch surface that closes the loop.

BIAM Phase 1 (ADR-015) — internal/agents/biam/:

- identity.go — per-instance Ed25519 keypair at
  ~/.config/clawtool/identity.ed25519 (mode 0600). LoadOrCreateIdentity
  generates on first launch; round-trips deterministically.
- envelope.go — Envelope struct with the locked v=biam-v1 wire shape
  (task_id, message_id, parent_id, correlation_id, from/to/reply_to,
  kind, body, hop_count, trace[], created_at, ttl_seconds,
  idempotency_key, signature). Sign/Verify use the canonical-JSON
  form with Signature stripped. HasCycle / Hop enforce the loop guard.
- store.go — SQLite (modernc.org/sqlite, pure Go, no CGO) with WAL.
  Tables: tasks, messages, dedupe_keys, peers. Idempotency-key LRU
  silently drops duplicates. WaitForTerminal blocks until done /
  failed / cancelled / expired.
- runner.go — Runner.Submit returns task_id immediately, drains the
  upstream stream into a kind=result envelope (capped 4 MiB), flips
  the task on completion. Failed Send → kind=error + TaskFailed.

Supervisor surface — internal/agents/supervisor.go:

- New SubmitAsync(ctx, instance, prompt, opts) (task_id, error) on
  the Supervisor interface. Wires through globalBiamRunner registered
  via SetGlobalBiamRunner during server boot or CLI bootstrap.

MCP tools — internal/tools/core/:

- SendMessage gains a 'bidi' bool argument. When true, returns task_id
  immediately + persists the dispatch into BIAM. Pair with the new
  TaskGet / TaskWait / TaskList tools.
- tasks_tool.go — TaskGet (snapshot + envelope timeline), TaskWait
  (block until terminal, deadline-bounded), TaskList (recent N tasks).
  All read-only against the BIAM store. ToolSearch entries surface
  the routing hints.

CLI — internal/cli/:

- send.go gains --async. Bootstraps the BIAM runner per-process via
  ensureBIAMRunner(), submits, prints task_id, then blocks on
  WaitForTerminal so the goroutine drain finishes before the short-
  lived CLI process exits.
- task.go — clawtool task list / get <id> / wait <id> [--timeout DUR].
- biam_bootstrap.go — sync.Once guard so the CLI process initialises
  identity + store once, no matter how many subcommands run.

Server boot — internal/server/server.go:

- Init BIAM identity + store on startup; register the runner globally;
  expose the store to MCP Task* tools via core.SetBiamStore.

Polish (3 high-severity fixes from yesterday's polish-worker pass):

- supervisor.go:113 — Agents() now surfaces config load errors instead
  of swallowing them silently.
- transport.go:122 — streamingProcess.Close() returns ExitError when
  the upstream CLI exits non-zero, so callers see assertion failures.
- worktree.go:152 — cleanup func uses sync.Once instead of a plain
  bool, matching the multi-goroutine safety contract the interface
  promises.

Live smoke verified: 'clawtool send --async --agent codex "..." |
xargs clawtool task wait --timeout 60s' returns the JSON envelope
chain end-to-end (prompt + signed reply with parent_id link).

Test totals: 271 Go unit + 78 e2e — race-clean, gofmt clean, go vet
clean. New deps: modernc.org/sqlite (BSD-3-Clause, pure-Go),
github.com/google/uuid (BSD-3-Clause).

Carry-over to v0.15.x: BIAM Phase 2 (NATS JetStream federation),
Phase 4 of the v0.15 roadmap (T1 R3 codex-recommended Phoenix /
LangSmith / Weave OTel presets, Podman sandbox option, age-encrypted
secrets), R4 gemini's Zed extension + go-selfupdate + huh-based plan
builder. Specs in /tmp/claw_v015/ research outputs.
…mand (F2)

Two carry-over features from the v0.15 research roadmap.

F1 — per-instance dispatch rate limit (codex's R3 pick):

- internal/agents/limiter.go — wraps golang.org/x/time/rate per ADR-007.
  Per-instance token buckets + per-instance concurrency semaphore;
  shared across CLI / MCP / HTTP because all three hit Supervisor.dispatch.
  Rate string forms: '30/m', '5/s', '1000/h', '60/1m'. Empty disables.
- internal/agents/supervisor.go — limiter wired into the dispatch
  call site. The release func runs when the ReadCloser closes, so
  long-running streams hold their concurrency slot for their full
  duration. Limiter is built lazily at NewSupervisor time so config
  edits land on the next supervisor construction.
- internal/config/config.go — DispatchLimits { Rate, Burst, MaxConcurrent }.
  Per the operator's quota-aware multi-account use case: 'Anthropic
  hit 30/m, slow it down without crashing the dispatch.'
- 5 unit tests: rate-string parsing across forms, disabled is no-op,
  bucket actually blocks, per-instance independence, concurrency cap,
  ctx-cancellation surfaces an error.
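
The rate-string forms listed above ('30/m', '5/s', '1000/h', '60/1m') could be parsed along these lines. parseRate is a hypothetical function, not the one in internal/agents/limiter.go; it returns events per second, which is the unit golang.org/x/time/rate uses for its Limit type.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseRate converts "N/unit" into events per second. The unit is a Go
// duration with an optional leading count ("s", "m", "h", "1m", "30s").
// An empty string returns 0, meaning the limiter is disabled.
func parseRate(s string) (float64, error) {
	if s == "" {
		return 0, nil
	}
	count, unit, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("rate %q: want N/unit", s)
	}
	n, err := strconv.Atoi(count)
	if err != nil || n <= 0 {
		return 0, fmt.Errorf("rate %q: bad count", s)
	}
	if unit != "" && (unit[0] < '0' || unit[0] > '9') {
		unit = "1" + unit // a bare "m" means "1m"
	}
	d, err := time.ParseDuration(unit)
	if err != nil || d <= 0 {
		return 0, fmt.Errorf("rate %q: bad unit", s)
	}
	return float64(n) / d.Seconds(), nil
}

func main() {
	for _, s := range []string{"30/m", "5/s", "1000/h", "60/1m", "", "fast"} {
		perSec, err := parseRate(s)
		fmt.Printf("%q -> %.4f/s err=%v\n", s, perSec, err)
	}
}
```

Returning an error for malformed input, rather than silently treating it as "disabled", keeps a typo in the config from switching rate enforcement off unnoticed.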

F2 — clawtool upgrade subcommand (gemini's R4 pick):

- internal/cli/upgrade.go — wraps creativeprojects/go-selfupdate
  (Apache-2.0). Detects the latest cogitave/clawtool release, reports
  current vs latest, and atomically swaps the binary. '--check' just
  reports without installing. ErrPermission triggers a clear sudo
  hint instead of the raw permission error.
- Pulls runtime/debug Build info for the current version so
  release-please tags compare correctly without a -ldflags injection.

Test totals: 27 packages green / 0 fail / race-clean. New deps:
golang.org/x/time/rate (BSD-3-Clause), github.com/creativeprojects/
go-selfupdate (Apache-2.0).

Carry-over: selfupdate's asset-name pattern matcher needs the
GoReleaser archive template wired through 'Filters' so DetectLatest
hits the right tarball — for now the command surfaces a clean
'no release found' fall-back. Production use is unblocked once the
GoReleaser tag is regenerated with explicit asset hints.
…rktree, upgrade

Codex (W1 async BIAM worker) reviewed the v0.14/v0.15 commits against
the live README and produced an applyable patch. Synthesis +
hand-touched final wording.

- New 'What's new in v0.14 / v0.15' hero section: 6 bullets covering
  BIAM, the new MCP tools, dispatch policies, --isolated worktrees,
  mem0, clawtool upgrade, OTel observability, auto-lint.
- 'How to use BIAM async dispatch' mini-section with a CLI example
  and a one-line Claude-Code-side note.
- Top-level Usage block grew the bridge / agent / send --async /
  send --isolated / task / worktree / upgrade subcommands.

W2's onboarding-TUI design landed at wiki/sources/onboard-tui-design-2026-04-27.md
as input for the future 'clawtool onboard' command (charmbracelet/huh
based wizard that detects host CLIs, offers bridge installs, bootstraps
the BIAM identity, runs the existing init recipe picker, and asks for
telemetry consent — to be implemented as F4 next iteration).

Both updates dogfooded the BIAM async path: two parallel codex tasks
fired via 'clawtool send --async --agent codex', task_ids tracked,
results pulled with 'clawtool task get'. End-to-end loop closed.
F3 — Hooks subsystem (Claude Code parity):

- internal/hooks/hooks.go — package-level Manager with Emit per
  Event. SetGlobal registers a process-wide instance; lifecycle
  call sites grab it via hooks.Get() and emit. Empty config = zero-
  cost no-op. block_on_error entries propagate failure to the
  originating op so guard-rail hooks can veto it; non-blocking
  entries log-and-continue.
- Events locked at v0.15: pre_send, post_send, on_task_complete,
  pre_edit, post_edit, pre_bridge_add, post_recipe_apply,
  on_server_start, on_server_stop. New events go additive.
- Hook execution: /bin/sh -c shell or raw argv. Per-hook timeout
  (default 5s, configurable). JSON envelope payload on stdin so
  user scripts skip argv parsing.
- Wired into Supervisor.dispatch (pre_send / post_send), BIAM
  Runner (on_task_complete), Edit + Write tools (pre_edit /
  post_edit), and ServeStdio (on_server_start / on_server_stop).
- Config schema extension: [hooks.events.<name>] takes an array of
  HookEntry (cmd / argv / timeout_ms / block_on_error).
- 8 unit tests: nil manager no-op, empty config no-op, configured
  cmd executes, block_on_error propagates, non-blocking swallows,
  argv mode skips shell, payload arrives on stdin, SetGlobal /
  Get round-trip. Timeout test simplified to a non-zero-exit case
  because subprocess kill semantics on WSL2 stall the
  exec.CommandContext stdin goroutine for sleep-style children;
  process-group reaping is the same TODO the Bash tool already
  carries via applyProcessGroup, and lands as a polish patch.

F4 — clawtool onboard wizard (codex W2 design):

- internal/cli/onboard.go — first-run interactive wizard via
  charmbracelet/huh per ADR-007. Pages: host detection (read-only
  note), missing-bridges multi-select, BIAM identity confirm,
  telemetry consent. Side effects fire only after the form
  returns clean; user-aborted forms exit 0 with no changes.
- Reuses BridgeAdd, biam.LoadOrCreateIdentity, exec.LookPath —
  no duplicate logic.
- Edge cases covered: host missing every CLI (form still runs;
  no bridge installs queued), all CLIs present (skips bridge
  page), telemetry denied (records the negative answer cleanly).
- 6 unit tests against an onboardDeps interface; no real TTY,
  no real exec.

Test totals: 28 packages green / 0 fail / race-clean / gofmt clean.
No new dependencies — huh + lipgloss already on the tree.

Carry-over to v0.15.x:
- subprocess group-kill polish for hooks timeout (matches Bash tool)
- telemetry sink (PostHog Go SDK per gemini's R4) hooked into the
  consent flag the wizard records
- 'clawtool hooks list / test <event>' subcommand for debugging
… README

F5 — internal/telemetry: anonymous PostHog event sink. Strict
allowedKeys allow-list filters payloads, anonymous distinct_id at
$XDG_DATA_HOME/clawtool/telemetry-id (mode 0600), runtime.GOOS/GOARCH
auto-injected. CLAWTOOL_TELEMETRY env kill switch always wins over
config. Default disabled; clawtool onboard records consent.
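
A sketch of what an allow-list filter for telemetry payloads can look like. The key set below is invented for illustration; the actual allowedKeys contents are not listed in this commit message.

```go
package main

import "fmt"

// allowedKeys is a hypothetical allow-list; the real key set is not shown
// in the source.
var allowedKeys = map[string]bool{"command": true, "os": true, "arch": true}

// filterProps returns a copy of props holding only allow-listed keys, so
// anything unexpected is dropped rather than forwarded.
func filterProps(props map[string]any) map[string]any {
	out := make(map[string]any, len(props))
	for k, v := range props {
		if allowedKeys[k] {
			out[k] = v
		}
	}
	return out
}

func main() {
	got := filterProps(map[string]any{
		"command": "send",
		"prompt":  "secret text", // not on the list, so it is dropped
		"os":      "linux",
	})
	fmt.Println(got)
}
```

An allow-list fails closed: a newly added field stays out of the payload until someone deliberately lists it.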

F6 — clawtool hooks list / show <event> / test <event> [--payload]:
inspect configured events without firing the real lifecycle, debug
shell snippets in isolation. Test coverage for all three subcommands
and synthetic-payload paths.

F7 — internal/sysproc: ApplyGroup (Setpgid only) +
ApplyGroupWithCtxCancel (Setpgid + Cancel for CommandContext callers)
+ KillGroup. Wired into hooks.go so timeouts SIGKILL the whole shell
child tree. Without it a sleep child of /bin/sh outlived parent kill
and held stdio pipes open past the deadline. atomic.Bool for the
timedOut flag (race-clean under -race).

Plus: clawtool onboard chain-confirm step that offers to run
clawtool init right after, so first-run host bootstrap → repo bootstrap
is one continuous wizard. README rewrites for the v0.15 surface
(4-pillar pitch, hooks + onboarding mini-section, expanded usage
block) and a fix for a stub-server .gitignore inline comment that was
silently un-ignoring the build artefact.
…gleton, BIAM Close errors, identity race, secret-aware index

HIGH:
- agents/supervisor: rate-limit + round-robin state were reset on every
  NewSupervisor() because both lived on the per-call struct. Hoist to
  process-wide singletons (sharedDispatchState) so MCP / HTTP / BIAM
  callers in one process observe one rotation cursor and one token
  bucket. Test escape hatch: ResetDispatchStateForTest.
- agents/biam/runner: Close() error was discarded by defer, so a crashed
  upstream still recorded TaskDone with a partial body. Capture + flip
  to TaskFailed/KindError when streamingProcess.Close returns.
- tools/core/agents_tool + cli/send: same defer rc.Close() pattern. Lift
  to explicit close + fold ExitError into the result so callers see
  upstream non-zero exit instead of an empty success.
- cli/send: --async + --isolated leaked the worktree because cleanup ran
  only on synchronous failure. Reap the worktree (or honour
  --keep-on-error) after WaitForTerminal returns.
- agents/biam/identity: first-launch parallel CLI invocations could each
  generate + write a fresh keypair, last writer winning. Guard the
  create-and-publish window with gofrs/flock + re-read under the lock.
- index: SemanticSearch could embed .env / id_rsa / *.pem because the
  default Ignore list didn't cover secret-bearing dirs. Extend defaults
  + add a basename guard (isLikelySecret) that fires regardless of
  user-config Ignore overrides.
- tools/core/verify: Ruby probe was missing from probeOrder() and
  --target=ruby resolved to plain `ruby -Itest` (REPL with no script).
  Add Rakefile detection and use `bundle exec rake test` (with a fall
  back when no Gemfile is present).
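
As an illustration of the basename guard from the index fix, the sketch below covers only the three examples named in the text (.env, id_rsa, *.pem). The real isLikelySecret presumably checks more patterns; this body is an assumption.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isLikelySecret is a basename-only guard. The patterns are illustrative,
// taken from the examples in the text (.env, id_rsa, *.pem).
func isLikelySecret(path string) bool {
	base := strings.ToLower(filepath.Base(path))
	switch {
	case base == ".env":
		return true
	case base == "id_rsa":
		return true
	case strings.HasSuffix(base, ".pem"):
		return true
	}
	return false
}

func main() {
	for _, p := range []string{"app/.env", "home/.ssh/id_rsa", "certs/server.PEM", "cmd/main.go"} {
		fmt.Println(p, isLikelySecret(p))
	}
}
```

Checking the basename independently of the ignore list means a user override of Ignore cannot accidentally re-admit these files.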

MEDIUM:
- cli/upgrade: latest.LessOrEqual panicked on "(devel)" / "(unknown)"
  build versions. Skip the comparison for non-semver inputs so dev
  builds still upgrade.
- server: ToolSearch index was built from a 9-key gateable map; v0.15
  always-on tools (SendMessage, Bridge*, Task*, Verify, SemanticSearch,
  …) never made it into the index even though MCP registered them.
  Reuse CoreToolDocs() and only filter the gateable subset by
  config.IsEnabled.
- agents/worktree: cleanup func captured the caller's ctx, so a timeout
  during dispatch then ran git worktree remove against an already-done
  ctx. Use context.Background for the cleanup shell-out.
- agents/supervisor: bad dispatch.limits.rate strings silently disabled
  rate enforcement. Surface the parse error on stderr at supervisor
  construction so the operator notices.
Walk-through of the five /v1 endpoints (health, agents, send_message,
recipes, recipe/apply), the optional /mcp Streamable HTTP transport,
bearer auth, and worked Postman + cURL examples. Surfaced from the
README's CLI reference under `clawtool serve --listen` so a Postman
user can find it without grepping the source.
…rs; store decode failures stop silently dropping rows

- tasks_tool: TaskGet and TaskWait both threw away the MessagesFor
  error. A corrupt envelope row used to look like "task valid, no
  replies yet". Fold the error into out.ErrorReason so the agent
  sees the parse failure instead of an empty body.
- store.MessagesFor: body / trace / created_at unmarshal errors
  were swallowed with `_ = json.Unmarshal(...)`. Now stops on first
  bad row and returns the partial slice with a wrapped error so
  callers can inspect both what made it through and what broke.
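
The store change can be pictured as follows. This is a sketch only: `Message` and `decodeRows` are hypothetical names, and the real rows carry body / trace / created_at rather than a single field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message is a reduced, hypothetical envelope row.
type Message struct {
	Body string `json:"body"`
}

// decodeRows stops at the first row that fails to unmarshal and returns
// the rows decoded so far together with a wrapped error.
func decodeRows(rows []string) ([]Message, error) {
	out := make([]Message, 0, len(rows))
	for i, raw := range rows {
		var m Message
		if err := json.Unmarshal([]byte(raw), &m); err != nil {
			return out, fmt.Errorf("decode row %d: %w", i, err)
		}
		out = append(out, m)
	}
	return out, nil
}

func main() {
	msgs, err := decodeRows([]string{`{"body":"a"}`, `{"body":`, `{"body":"c"}`})
	fmt.Println(len(msgs), err != nil)
}
```

Returning both values lets a caller such as TaskGet show the messages that did decode while still surfacing the parse failure.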
ADR-014 stays untouched: browser is a Tool surface, not a Transport.
clawtool wraps github.com/h4ckf0r0day/obscura (Apache-2.0, V8 + Chrome
DevTools Protocol, 30 MB memory vs Chromium's 200+) per ADR-007 so
agents can render SPA / hydrated pages without us hand-rolling a
headless engine.

- BrowserFetch (internal/tools/core/browser_fetch.go): stateless
  single-URL render via `obscura fetch --dump html | --eval ...`. Result
  shape mirrors WebFetch (title / byline / sitename / content) plus
  optional eval_result so agents can swap the two without rewriting
  parsing. Optional CSS-selector wait, --stealth pass-through.
- BrowserScrape (internal/tools/core/browser_scrape.go): bulk parallel
  via `obscura scrape ... --concurrency N --eval ... --format json`,
  hard cap 500 URLs / 50 workers. Tolerates both NDJSON and JSON-array
  output; per-URL errors fold into the row so the batch keeps going.
- engines.go now caches `obscura` alongside `rg` / `pdftotext`. Missing
  binary surfaces a one-shot install hint (Linux/macOS one-liners) at
  call time — no boot-time refusal.
- Tests cover the missing-binary, bad-URL, HTML readability, eval
  pass-through, non-zero exit paths plus the NDJSON/array parser and
  the URL splitter helper. Race-clean.
- Both registered in server.go (always-on) and indexed in
  CoreToolDocs so ToolSearch surfaces them.
- docs/browser-tools.md walks through install, the two tool schemas,
  worked Next.js + bulk-scrape examples, failure modes, and the
  reasoning for picking Obscura over Headless Chrome. README links it
  from the v0.15 hero block. The cookie-driven interactive surface
  (BrowserAction, CDP-over-WebSocket) lands as a follow-up commit
  because cookie injection requires the obscura serve transport, not
  the fetch CLI.
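
One way a parser can tolerate both output shapes mentioned for BrowserScrape, sketched under assumptions: the `Row` field names are invented for illustration and are not Obscura's documented schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Row is a hypothetical per-URL result; Error carries a per-URL failure
// so one bad page does not abort the batch.
type Row struct {
	URL   string `json:"url"`
	Error string `json:"error,omitempty"`
}

// parseRows accepts either a JSON array or newline-delimited JSON objects.
func parseRows(out []byte) ([]Row, error) {
	trimmed := bytes.TrimSpace(out)
	if len(trimmed) == 0 {
		return nil, nil
	}
	if trimmed[0] == '[' {
		var rows []Row
		if err := json.Unmarshal(trimmed, &rows); err != nil {
			return nil, fmt.Errorf("parse JSON array: %w", err)
		}
		return rows, nil
	}
	var rows []Row
	dec := json.NewDecoder(bytes.NewReader(trimmed))
	for dec.More() {
		var r Row
		if err := dec.Decode(&r); err != nil {
			return rows, fmt.Errorf("parse NDJSON row %d: %w", len(rows), err)
		}
		rows = append(rows, r)
	}
	return rows, nil
}

func main() {
	a, _ := parseRows([]byte(`[{"url":"a"},{"url":"b"}]`))
	b, _ := parseRows([]byte("{\"url\":\"a\"}\n{\"url\":\"b\",\"error\":\"timeout\"}\n"))
	fmt.Println(len(a), len(b), b[1].Error)
}
```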
The BIAM runner used to record TaskDone whenever the upstream CLI
exited 0, even when the stream-json body ended with a terminal failure
event like {"type":"turn.failed","error":{"message":"This content was
flagged for possible cybersecurity risk."}}. Real-world repro: codex's
content-policy filter killed a turn mid-flight, codex itself exited
clean, and TaskWait blocked downstream agents on a transcript that
already declared itself failed.

detectStreamFailure walks the last ~12 lines of the buffered body
looking for top-level {"type":"turn.failed"} or {"type":"error"}
events; per-tool failures inside item.completed (e.g. failed bash
command inside an otherwise successful turn) are deliberately
ignored. Failure detail (Error.Message or top-level Message) is
appended to the task body so the operator sees what actually broke
instead of an opaque "failed" status.

Tests cover: turn.failed with nested error.message; healthy turn;
per-tool failure that must NOT flag; empty body.
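
A sketch of the tail scan described above. The event field layout follows the example in this message; the function name `detectFailure` and the exact tail length constant are assumptions, and the real detectStreamFailure may differ in detail.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// streamEvent models only the top-level fields the scan needs.
type streamEvent struct {
	Type    string          `json:"type"`
	Message string          `json:"message"`
	Error   json.RawMessage `json:"error"`
}

const tailLines = 12 // the notes say "~12"; exact value assumed

// detectFailure looks at the last tailLines lines for a top-level
// turn.failed or error event and returns its message.
func detectFailure(body string) (string, bool) {
	lines := strings.Split(strings.TrimSpace(body), "\n")
	if len(lines) > tailLines {
		lines = lines[len(lines)-tailLines:]
	}
	for _, line := range lines {
		var ev streamEvent
		if json.Unmarshal([]byte(line), &ev) != nil {
			continue // not JSON, skip
		}
		if ev.Type != "turn.failed" && ev.Type != "error" {
			continue // per-tool failures inside item.completed land here
		}
		var nested struct {
			Message string `json:"message"`
		}
		if len(ev.Error) > 0 && json.Unmarshal(ev.Error, &nested) == nil && nested.Message != "" {
			return nested.Message, true
		}
		return ev.Message, true
	}
	return "", false
}

func main() {
	body := `{"type":"item.completed","item":{"status":"failed"}}
{"type":"turn.failed","error":{"message":"flagged"}}`
	fmt.Println(detectFailure(body))
}
```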
A portal is a saved web-UI target — a base URL paired with login
cookies, CSS selectors, and a 'response done' predicate — that
clawtool can drive on the operator's behalf so an MCP-aware agent
can ask it questions like any other agent. Per ADR-017 portals are a
Tool surface, not a Transport: the supervisor still only dispatches
to upstreams that publish a stable headless contract.

This iteration ships the persistence + read-only surface; the CDP
driver behind 'ask' lands in v0.16.2 (separate commit because it
needs a websocket client to drive Obscura's CDP server). The
deferred-feature error is uniform across CLI + MCP so the operator
can stage config + cookies today and the agents see the same shape
once the engine arrives.

- internal/config: PortalConfig + Portals map; predicate / selector /
  browser sub-stanzas; portals_io.go helpers (LoadFromBytes,
  AppendBytes, RemovePortalBlock) for the editor-driven add flow.
- internal/portal: Validate (rejects bad scheme / missing scope
  prefix / unknown predicate type / empty input selector / missing
  response_done_predicate); Defaults; ParseCookies (array or single
  object form); AssertAuthCookies (catches incomplete exports);
  Names (sorted); AskNotImplementedError sentinel.
- internal/cli/portal.go: list / which / use / unset / add (opens
  $EDITOR with a TOML template, parses + validates the result
  before appending) / remove / ask (placeholder).
- internal/tools/core/portal_tool.go: PortalList / PortalWhich /
  PortalUse / PortalUnset / PortalRemove / PortalAsk MCP tools. Add
  is CLI-only because it spawns $EDITOR.
- server.go: RegisterPortalTools wired alongside browser tools.
- toolsearch.go: 7 new entries so ToolSearch can find the portal tools.
- README + cli.go usage updated; docs/portals.md walks chat.deepseek.com
  end-to-end (cookie export, secrets.toml shape, predicate vocabulary,
  failure modes).
- ADR-017 (browser-tools-not-transport) and ADR-018 (portal feature)
  shipped to wiki; readers cross-reference cleanly.
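
To illustrate the cookie helpers listed above, a sketch that accepts either export shape and then reports missing auth cookies. The `Cookie` fields and the `missingAuthCookies` name are assumptions; the real ParseCookies / AssertAuthCookies signatures are not shown in this message.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Cookie is a reduced, hypothetical cookie record.
type Cookie struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// parseCookies accepts a JSON array of cookies or a single cookie object.
func parseCookies(raw []byte) ([]Cookie, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return nil, errors.New("empty cookie export")
	}
	if trimmed[0] == '[' {
		var many []Cookie
		if err := json.Unmarshal(trimmed, &many); err != nil {
			return nil, fmt.Errorf("parse cookie array: %w", err)
		}
		return many, nil
	}
	var one Cookie
	if err := json.Unmarshal(trimmed, &one); err != nil {
		return nil, fmt.Errorf("parse cookie object: %w", err)
	}
	return []Cookie{one}, nil
}

// missingAuthCookies returns the required names that are absent or empty,
// which is how an incomplete export would be caught.
func missingAuthCookies(cookies []Cookie, required []string) []string {
	have := make(map[string]bool, len(cookies))
	for _, c := range cookies {
		if c.Value != "" {
			have[c.Name] = true
		}
	}
	var missing []string
	for _, name := range required {
		if !have[name] {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	cs, _ := parseCookies([]byte(`{"name":"session","value":"abc"}`))
	fmt.Println(len(cs), missingAuthCookies(cs, []string{"session", "csrf"}))
}
```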
The deferred-feature sentinel from v0.16.1 is gone. clawtool portal
ask + the PortalAsk MCP tool now actually drive the saved flow
end-to-end: spawn obscura serve, open an isolated CDP browser
context, seed cookies + extra headers, navigate, run login_check
+ ready_predicate, fill the input + submit, poll
response_done_predicate, return the last response selector's
innerText.

- internal/portal/cdp.go: minimal CDP client over coder/websocket.
  Synchronous request/reply via per-id channels with a single
  reader goroutine; push events without an id are dropped. Wraps
  the nine methods we need (Target.createBrowserContext +
  createTarget + attachToTarget, Network.enable + setCookies +
  setExtraHTTPHeaders, Page.enable + navigate, Runtime.evaluate)
  plus convenience EvaluateBool / EvaluateString. Per ADR-007 we
  skip chromedp/cdproto for a surface this small.
- internal/portal/ask.go: orchestrator. obscura serve --port 0,
  stderr scanner pulls the ws:// URL, isolated browser context
  (disposeOnDetach so the cookie jar evaporates after the call),
  cookies + headers seeded BEFORE navigation, native value setter
  + synthetic input/change events for React/Vue/Svelte controlled
  components, click selector or Enter fallback, predicate poll
  every 250ms.
- internal/portal/cdp_test.go: mock CDP server (httptest +
  coder/websocket Accept) covers round-trip, error frame surface,
  evaluate value pass-through, JS exception extraction, predicate
  expression generation, jsString escaping.
- internal/cli/portal.go: PortalAsk runs the real driver, loads
  cookies from secrets.toml under p.SecretsScope, streams progress
  to stderr, the answer to stdout.
- internal/tools/core/portal_tool.go: PortalAsk handler same flow;
  RegisterPortalAliases reads cfg.Portals at boot and binds
  <name>__ask thin wrappers (e.g. my-deepseek__ask) so MCP-aware
  models discover portals as first-class tools.
- server.go now calls RegisterPortalAliases alongside RegisterPortalTools.
- toolsearch.go PortalAsk description updated; README + ADR-018
  promoted to accepted; docs/portals.md describes the actual flow;
  wiki log entry captures the design decisions.
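
The jsString escaping and predicate expression generation mentioned above could look roughly like this. Both function bodies are assumptions: encoding/json is used here as one safe way to build a JavaScript string literal, and the "selector exists" predicate is an illustrative example rather than the project's actual predicate vocabulary.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsString renders s as a double-quoted JavaScript string literal.
// json.Marshal escapes quotes, backslashes and control characters, and
// its output for a Go string is also a valid JS string literal.
func jsString(s string) string {
	b, _ := json.Marshal(s) // marshalling a string does not fail
	return string(b)
}

// existsExpr builds a boolean expression that is true once the selector
// matches an element. Hypothetical predicate for illustration.
func existsExpr(selector string) string {
	return "document.querySelector(" + jsString(selector) + ") !== null"
}

func main() {
	fmt.Println(existsExpr(`div[data-role="answer"]`))
}
```

Building the literal through an encoder, rather than by string concatenation with hand-written quotes, keeps operator-supplied selectors from breaking out of the expression.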
The v0.16.2 portal CDP layer was ~600 LoC of hand-rolled WebSocket /
JSON-RPC client + Chrome launcher. ADR-007 says wrap, don't reinvent —
chromedp/chromedp is the canonical Go DevTools Protocol library
(Apache-2.0, used in production by GoReleaser, k6, every Mailgun
integration test) and it covers exactly the surface portals need:
ExecAllocator for the wizard's local Chrome spawn, RemoteAllocator
for the runtime's Obscura attach, typed actions for navigate /
setCookies / setExtraHTTPHeaders / evaluate.

- internal/portal/driver.go: BrowserSession wraps chromedp ctx +
  allocator-cancel + browser-cancel. Two constructors: NewExecBrowser
  (wizard) and NewRemoteBrowser (runtime). Helpers: Navigate, Cookies,
  SetCookies, SetExtraHTTPHeaders, Evaluate, EvaluateBool,
  EvaluateString. mergeCtx threads caller's ctx through the session
  ctx so deadlines compose.
- internal/portal/ask.go: rewritten on top of BrowserSession. Same
  flow (login_check + ready_predicate poll → fill input + submit →
  response_done_predicate poll → response selector innerText). Native
  value setter + synthetic input/change events stay so React /
  Vue / Svelte controlled components register the prompt insertion.
  Obscura process management lives here too (startObscuraServer +
  readObscuraWS), one source of truth.
- Removed: internal/portal/cdp.go (290 LoC), cdp_test.go,
  chrome_launcher.go (250 LoC), portal_wizard.go (wizard rewrite
  pending the new BrowserSession API).
- Added internal/portal/driver_test.go covering the pieces we own
  (predicate expression generation, jsString escaping, obscura
  ws:// banner scanner with timeout).
- go.mod: chromedp/chromedp + chromedp/cdproto pulled in;
  coder/websocket dropped (chromedp uses gobwas/ws under the hood).

Net diff: -600 LoC, -1 dep (coder/websocket), +1 dep family
(chromedp). 30 packages still green, race-clean.
`clawtool portal add <name>` is now an interactive wizard. The
operator types one command; Chrome opens with a fresh temp profile,
they log in to the portal, the wizard captures cookies via
Network.getAllCookies and asks for three CSS selectors + a
'response done' template. Output: validated config.toml + 0600
secrets.toml.

Driven by the chromedp BrowserSession API from e6af0f2 — wizard
uses NewExecBrowser(Headless=false), runtime keeps using
NewRemoteBrowser pointed at obscura serve. Same code path, two
allocators.

- internal/cli/portal_wizard.go: huh.Form orchestration. Seven steps
  (URL+intro, launch Chrome, claude-in-chrome assist hint, login
  gate, cookie capture + auth-name auto-detect, selectors + predicate
  pick, persist). wizardDeps interface lets tests inject a fake
  browser without spawning Chrome.
- internal/cli/portal.go: `portal add` defaults to the wizard;
  `--manual` falls back to the v0.16.1 $EDITOR template path.
- internal/config/portals_io.go: MarshalForAppend exported so the
  wizard round-trips the assembled stanza through the same
  AppendBytes merge that --manual uses.
- internal/portal/portal.go: MarshalCookies helper (mirror of
  ParseCookies) so wizard saves cookies in the same JSON shape
  the runtime expects.
- internal/cli/portal_wizard_test.go: tests for assemblePortalConfig,
  predicateForChoice, filterCookiesForHost, autoDetectAuthCookieNames,
  hostFromURL, buildClaudeInChromeHint. Wizard happy-path uses a
  fakeBrowser that satisfies the small portalBrowser interface.
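
For orientation, a sketch of the two helpers named hostFromURL and filterCookiesForHost. The bodies and the `Cookie` shape are assumptions; the domain-suffix rule shown here is a simplified version of browser cookie matching.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// Cookie is a reduced, hypothetical captured cookie.
type Cookie struct {
	Name   string
	Domain string
}

// hostFromURL extracts the lower-cased hostname from a portal base URL.
func hostFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Hostname() == "" {
		return "", fmt.Errorf("no host in %q", raw)
	}
	return strings.ToLower(u.Hostname()), nil
}

// filterCookiesForHost keeps cookies whose domain equals the host or is a
// parent domain of it, so unrelated cookies in the profile are dropped.
func filterCookiesForHost(cookies []Cookie, host string) []Cookie {
	var out []Cookie
	for _, c := range cookies {
		d := strings.ToLower(strings.TrimPrefix(c.Domain, "."))
		if d != "" && (host == d || strings.HasSuffix(host, "."+d)) {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	host, _ := hostFromURL("https://chat.deepseek.com/")
	kept := filterCookiesForHost([]Cookie{
		{"session", ".deepseek.com"},
		{"tracker", "example.com"},
	}, host)
	fmt.Println(host, len(kept))
}
```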

claude-in-chrome stays unwrapped — wizard generates a copy/paste
prompt operators can drop into the side panel for assisted login,
but clawtool itself never imports an extension dependency. Per ADR-017.

Docs: README.md hero block updated to mention the wizard;
docs/portals.md reorganises the worked example to lead with
`portal add` then keeps the manual export path under `--manual`.
ADR-019 lands. `mcp` is the new authoring noun for MCP server source
code, sister to `skill` (Agent Skills). Co-designed with Codex (task
55a5a480) and Gemini (task 13d4ea86) in parallel BIAM async
dispatches; synthesis preserves Codex's naming + repo-relative
output, both reviewers' .claude-plugin/ day-one + operator-managed
marketplace.

This commit is the SURFACE STUB — generator (`mcp new / run / build /
install`) lands in v0.17. Same deferred-feature pattern v0.16.1
used for `portal ask` before v0.16.2 wired the CDP driver: surface
booked today so agents discover the namespace early; rewriting it
post-adoption isn't free.

- internal/cli/mcp.go: CLI subcommand dispatcher.
  - `mcp list` ships read-only (walker stub; upgrades when generator
    writes .clawtool/mcp.toml markers).
  - `mcp new / run / build / install` return McpNotImplementedError
    sentinel pointing at ADR-019.
- internal/tools/core/mcp_tool.go: McpList / McpNew / McpRun /
  McpBuild / McpInstall MCP tools. RegisterMcpTools wired alongside
  RegisterPortalTools in server.go.
- internal/tools/core/toolsearch.go: 5 new entries so ToolSearch
  can find the new tools.
- internal/cli/cli.go topUsage block: `clawtool mcp ...` near
  `clawtool skill ...`, with one-liner clarification (mcp = MCP
  server source code; skill = Agent Skill folder).
- README.md hero block: MCP authoring bullet alongside Browser
  tools / Portals.
- docs/mcp-authoring.md: full preview — wizard prompts, per-language
  artifact, install flow, today's interim hand-roll path.
- wiki/decisions/019-mcp-authoring-scaffolder.md (accepted), with
  cross-refs to ADR-006 / 007 / 008 / 010 / 014 / 018.
- wiki/log.md: design synthesis captured (Codex `mcp` + Gemini
  `forge` reviewers) plus the chromedp lesson from v0.16.3.
…rome)

Two-tier integration coverage for portal.Ask, neither requiring an
operator-side smoke run:

1. fakePortalBrowser (default, runs every `go test`) implements the
   new `portal.Browser` interface and simulates a chat portal in
   memory. Records every call, classifies JS expressions
   (fill_input / click_submit / dispatch_enter / extract_response /
   predicate), tracks an N-poll "streaming" delay before the
   response_done predicate goes truthy. Verifies the full wire:
   - cookies + headers seeded BEFORE navigate
   - login_check + ready_predicate polled before fill_input
   - fill_input precedes click_submit
   - response_done predicate polled until truthy (>= configured count)
   - Enter fallback fires when Submit selector is empty
   - missing auth-cookie short-circuits before browser is touched
   - response_done timeout names the failing phase

2. ask_realchrome_test.go (//go:build integration) drives Ask against
   an httptest server that serves a fake chat HTML page: textarea +
   submit button + a fake "Stop" button that disappears on a 200ms
   timeout. Skips itself when no Chrome / Chromium / chromium-browser
   is on PATH so unit-test runs stay portable. Run with
   `make portal-integration`.

Refactor to support this:

- internal/portal/driver.go — extracted `Browser` interface;
  BrowserSession satisfies it via duck typing (compile-time guard:
  `var _ Browser = (*BrowserSession)(nil)`).
- internal/portal/ask.go — Ask gains `opts.Browser` (when set,
  skip obscura spawn + chromedp connect, run orchestration on the
  injected Browser). Pulled the orchestration into runAskOnBrowser
  so the public Ask stays one signature. typeAndSubmit and
  waitForPredicate now take `Browser` instead of *BrowserSession.

Net: 4 new tests covering everything between Validate() and the
final EvaluateString. Zero browser binary required. The fake's
ordering / scripted polling is the closest thing to real
end-to-end you can get without spawning Chrome — and when Chrome
IS available, the tagged test confirms chromedp drives a real
browser through the same JS templates.

Makefile: `make portal-integration` runs the tagged test.
ADR-019 generator lands. `clawtool mcp new <name>` walks the operator
through a huh.Form wizard (or `--yes` for defaults) and writes a real,
compilable MCP server. Per ADR-007 each language adapter wraps the
canonical SDK in its ecosystem.

Live smoke against built binary verified the full chain:
  clawtool mcp new my-thing --yes  → 9 files including Go server.
  go mod tidy && go build ...      → 6.7MB binary.
  echo '<initialize JSON-RPC>' | ./bin/my-thing
                                   → correct serverInfo response.
                                   The server actually speaks MCP.
  clawtool mcp install . --as smoke-test
                                   → [sources.smoke-test] in config.toml.
  clawtool mcp list --root <dir>   → discovers the scaffold.

- internal/mcpgen/: package for the generator.
  - mcpgen.go — Spec / ToolSpec / File / Adapter interface +
    Generate orchestrator + name validators + writeFile guard.
  - common.go — language-agnostic files: .clawtool/mcp.toml marker,
    README, .gitignore, .claude-plugin/plugin.json (opt-in).
  - go_adapter.go — mark3labs/mcp-go v0.49.0. cmd/<name>/main.go +
    internal/tools/example.go + Makefile + go.mod + (opt-in)
    Dockerfile.
  - python_adapter.go — fastmcp ≥0.4. src/<pkg>/ layout +
    pyproject.toml + Makefile + tests/.
  - typescript_adapter.go — @modelcontextprotocol/sdk ≥1.0.
    src/server.ts + tools/ + package.json + tsconfig + test/.
  - mcpgen_test.go — 12 tests: per-language plan, docker opt-in,
    plugin opt-out, refuses existing dir, name + tool name + language
    validators.

- internal/cli/mcp_wizard.go: huh.Form sequence (description,
  language, transport, packaging, plugin manifest, first tool).
  --yes path uses minimal defaults (Go / stdio / native / one
  echo_back tool). mcpgenDeps interface lets tests drive without
  TTY.

- internal/cli/mcp_install.go: reads .clawtool/mcp.toml, derives
  the launch command from language + packaging, writes
  [sources.<instance>] into config.toml. Same registry the
  catalog (clawtool source add) populates — no new code path in
  internal/sources/manager.go.

- internal/cli/mcp.go: rewired from v0.16.4 stub to real impls.
  mcp list now does filepath.Walk skipping noise dirs. mcp run /
  mcp build shim through the project's Makefile (per ADR-007:
  don't reinvent build orchestration).

- internal/tools/core/mcp_tool.go: McpNew + McpList wired to the
  real generator + walker. McpRun / McpBuild / McpInstall surface
  a hint to invoke the CLI shortcut (those touch the operator's
  filesystem + language toolchain, so having the model advise the
  operator is the natural pattern, rather than driving the build via MCP).

- internal/cli/mcp_test.go: wizard --yes happy path + bad-name
  rejection + existing-dir refusal + walker discovery.

Total surface: 5 CLI verbs, 5 MCP tools, 12+ unit tests, real
end-to-end smoke. README + docs/mcp-authoring.md updated to
"v0.17 shipped". Wiki log entry captures the design + smoke
results.
Single command wipes everything clawtool drops on the host, so test
installs don't pile up duplicate sources / portals / sticky pointers.
Smoke-tested end-to-end against the built binary: dry-run preview →
real removal → idempotent re-run.

- internal/cli/uninstall.go — the verb. Plans a list of targets
  (config, cache, data, optional binary), prints them, asks for
  confirmation (skip via --yes), then removes each one via os.RemoveAll.
- Flags:
    --yes            Skip the y/N prompt.
    --dry-run        Print the plan without touching disk.
    --purge-binary   Also remove $CLAWTOOL_INSTALL_DIR/clawtool
                     (defaults to ~/.local/bin/clawtool — Homebrew /
                     curl-installed binaries should be removed via
                     the source's own uninstall path).
    --keep-config    Preserve config.toml + secrets.toml + identity.
                     Removes only sticky pointers + caches + BIAM data.
- Targets enumerated dynamically — non-existent files drop from the
  plan so the rendered list reflects reality. Idempotent: a second
  run prints "nothing to remove".
- internal/cli/uninstall_test.go — 6 tests (dry-run, full sweep,
  purge-binary, keep-config selective removal, nothing-to-do path,
  arg parser). XDG_* env overrides isolate every test in t.TempDir.
- topUsage block in cli.go updated.

The MCP tool variant is intentionally omitted: destroying clawtool
state from inside a model conversation is too high-blast-radius.
Operators run the CLI verb themselves.
…ocker)

ADR-020 lands. Synthesised from parallel BIAM async dispatches: Codex
(task 4468aa25) recommended `mcp`-style noun + native-flag composition
+ BIAM cancel fix; Gemini (task 87343e0f) recommended `vault` (rejected
— HashiCorp Vault collides) + Engine interface shape. Both reviewers
converged on bwrap (Linux/WSL2) / sandbox-exec (macOS) / docker
(fallback) + external-wrap-over-native-delegate.

This commit ships the SURFACE: profile parser, engine probes,
read-only verbs (list / show / doctor), MCP tool catalog. The
dispatch-time wrapping (clawtool send --sandbox <profile> actually
constraining the upstream agent) lands incrementally per ADR-020:
v0.18.1 bwrap adapter, v0.18.2 sandbox-exec, v0.18.3 docker, v0.19
Windows. Same incremental pattern v0.16.4 used for `mcp` before
v0.17 filled in the generator.

Live smoke against built binary verified the full surface:
  clawtool sandbox list   → two configured profiles + bwrap engine
  clawtool sandbox show   → renders paths/network/limits correctly
  clawtool sandbox doctor → bwrap + docker both detected on this
                            WSL2 host, noop fallback always
                            available, bwrap selected as primary

- internal/config/config.go: SandboxConfig + SandboxPath +
  SandboxNetwork + SandboxLimits + SandboxEnv added next to
  PortalConfig. Schema covers paths (ro/rw/none), network
  policy (none/loopback/allowlist/open), allow list, env
  allow + deny, timeout / memory / CPU shares / process count.
- internal/sandbox/sandbox.go: Engine interface (Name/Available/
  Wrap), Profile type, ParseProfile (validates modes + network
  policy + duration + byte sizes), parseBytes ("1GB", "512M",
  raw), SelectEngine (priority order, falls through to noop),
  AvailableEngines (for doctor).
- internal/sandbox/bwrap_linux.go: bubblewrap engine probe.
  Available() looks for bwrap on PATH. Wrap() returns a
  deferred-feature error pointing at v0.18.1 (matching the
  pattern v0.16.1 used for portal ask).
- internal/sandbox/sandbox_exec_darwin.go: macOS sandbox-exec
  probe + deferred Wrap (v0.18.2).
- internal/sandbox/docker_anywhere.go: cross-platform fallback.
  Available() runs `docker info` to check the daemon, not just
  the client binary. Deferred Wrap (v0.18.3).
- internal/sandbox/sandbox_test.go: 7 tests (full-shape parse,
  bad mode, bad network policy, allow-without-allowlist,
  parseBytes table, SelectEngine non-nil, AvailableEngines
  includes noop).
- internal/cli/sandbox.go: list / show / doctor / run dispatcher.
  list iterates configured profiles + reports the selected engine.
  show parses one profile through ParseProfile + renders all
  fields. doctor walks every registered engine + Available.
  run is the escape hatch (deferred error today).
- internal/tools/core/sandbox_tool.go: SandboxList / SandboxShow /
  SandboxDoctor MCP tools. SandboxRun deliberately omitted —
  letting a model spawn sandboxed commands has the wrong default.
- ToolSearch indexes the three new MCP tools.
- topUsage block in cli.go updated.
- docs/sandbox.md walks engines / profile schema / per-agent
  default / native composition / failure modes.
- wiki/decisions/020-sandbox-feature.md (accepted) — full design
  including the `[sandboxes.X.native]` sub-stanza Codex
  contributed and the BIAM cancel fix Codex flagged at
  internal/agents/biam/runner.go:61.
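
A sketch of a byte-size parser covering the three forms the notes mention ("1GB", "512M", raw). Treating the suffixes as binary multiples (1GB = 1<<30) is an assumption, as is the exact set of accepted suffixes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseBytes converts strings such as "1GB", "512M" or "1024" to a byte
// count. Units are assumed to be binary multiples.
func parseBytes(in string) (int64, error) {
	s := strings.ToUpper(strings.TrimSpace(in))
	units := []struct {
		suffix string
		mult   int64
	}{
		// Two-letter suffixes come first so "GB" is not read as "B".
		{"GB", 1 << 30}, {"MB", 1 << 20}, {"KB", 1 << 10},
		{"G", 1 << 30}, {"M", 1 << 20}, {"K", 1 << 10}, {"B", 1},
	}
	mult := int64(1)
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			s = strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
			mult = u.mult
			break
		}
	}
	n, err := strconv.ParseInt(s, 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid byte size %q", in)
	}
	return n * mult, nil
}

func main() {
	for _, in := range []string{"1GB", "512M", "1024", "lots"} {
		n, err := parseBytes(in)
		fmt.Println(in, n, err)
	}
}
```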
Multi-stage Dockerfile + docker-compose.yml + Caddyfile + docs/docker.md.
clawtool now ships as a runnable container alongside the Go binary
distribution.

Live-tested against real Docker (29.2.1):
  docker build -t cogitave/clawtool:dev .   → 15MB image
  docker run --rm cogitave/clawtool:dev version → "clawtool 0.9.2"
  echo '<initialize JSON-RPC>' | docker run -i --rm ... | head -1
    → correct serverInfo response — image speaks MCP.
  docker run -i --rm ... <tools/list call>
    → BrowserFetch / McpNew / PortalAsk / SandboxList all exposed.

- Dockerfile — two stages. golang:1.26-alpine compiles the static
  binary with CGO_ENABLED=0, -trimpath, -ldflags injecting version
  metadata. Runtime is gcr.io/distroless/static-debian12:nonroot —
  no shell, no apt, no glibc, just the binary + ca-certificates,
  running as UID 65532. ENTRYPOINT ["clawtool"], CMD ["serve"]
  so `docker run -i ...` is a stdio MCP server out of the box.
- .dockerignore — keeps the build context lean (~5MB instead of
  ~50MB once the wiki / .raw / docs land).
- docker-compose.yml — clawtool serve --listen 0.0.0.0:8080 +
  Caddy reverse proxy with auto-TLS. Volumes persist
  config / cache / data across restarts. Token file mounted
  read-only. Mirrors the clawtool-relay recipe but at the repo
  root for operators who clone the source instead of running
  `clawtool init`.
- Caddyfile — minimal reverse-proxy config. Caddy doesn't
  terminate clawtool's bearer-token auth; it just proxies. Auto
  Let's Encrypt when CLAWTOOL_DOMAIN points at a public host.
- Makefile — `make docker` builds, `make docker-smoke` does the
  MCP-initialize verify (the same handshake this commit's smoke
  test ran).
- docs/docker.md — full operator guide: stdio + HTTP modes,
  volume mounts, persisting state, mounting host config
  read-only, sandbox interaction (clarifying that you don't run
  the sandbox feature inside Docker — bwrap / sandbox-exec live
  on the host).
- README.md hero block updated.
…fore-Write, Edit diff (ADR-021)

ADR-021 phase A. Synthesised from parallel Codex (BIAM task 6435286b)
and Gemini (task c977810b) audits against Cursor / Cline / Aider /
Cody best practice. Codex flagged the critical correctness point:
MCP session_id is NOT model-supplied — must come from
server.ClientSessionFromContext(ctx). Implemented exactly that.

Live-tested end-to-end against built binary:
  Read .../existing.txt → file_hash=a948904f2f0f... (SHA-256 verified)
  Read .../existing.txt with_line_numbers=true → render carries '   1 | hello world' prefix
  Write .../existing.txt content='new'  → REFUSED:
    'has not Read /tmp/.../existing.txt — Read it first (or pass mode="create" ...)'
  Edit .../multiline.go old='old' new='NEW' → returns diff_unified:
    --- a/.../multiline.go
    +++ b/.../multiline.go
    @@ -1,3 +1,3 @@

- internal/tools/core/session_state.go — SessionState + SessionKey,
  Sessions singleton, RecordRead / ReadOf / SessionKeyFromContext
  (uses server.ClientSessionFromContext, anonymous fallback for
  stdio/tests). HashFile + HashString + hashBytes helpers.
- internal/tools/core/session_state_helpers.go — readFileForHash
  shim so tests can stub disk reads without touching production
  ReadFile callers.
- internal/tools/core/read.go — ReadResult gains FileHash +
  RangeHash. runRead computes both after a successful read and
  records into the session registry. New with_line_numbers flag
  (default false) prefixes the rendered text with '%4d | ' —
  agents can reference lines accurately, JSON content stays raw
  so Edit's exact-substring matching keeps working.
- internal/tools/core/write.go — Read-before-Write guardrail.
  guardReadBeforeWrite() runs before executeWrite. Three new args:
    mode: 'create' | 'overwrite' (default '')
    must_not_exist: bool
    unsafe_overwrite_without_read: bool
  Existing file + no prior Read on the session = error message
  pointing at the four ways to satisfy the check (Read first,
  mode='create', must_not_exist, or the explicit unsafe bypass).
  Stale detection: if file's current SHA-256 doesn't match the
  one recorded at Read time, refuse with 'changed since this
  session Read it'.
- internal/tools/core/edit.go — EditResult gains HashBefore,
  HashAfter, DiffUnified. unifiedDiff() emits a 'diff -u'-style
  patch (--- a/path / +++ b/path / @@ hunk / line-by-line walk),
  capped at 200 lines so multi-line rewrites don't bloat the
  response. lcsLen kept as a stub for the future LCS-driven
  hunk algorithm.
- internal/tools/core/session_state_test.go — 11 tests:
  hashBytes determinism, HashFile round-trip, Sessions
  record/lookup with isolation across keys + paths, anonymous
  fallback, prefixLineNumbers formatter, guard rejecting
  no-prior-Read, allowing after recorded Read, rejecting on
  stale hash, create-mode rejecting existing file, create-mode
  passing for new path, unsafe override bypassing guard.
- wiki/decisions/021-core-tools-polish.md (accepted) — full
  design + the eight items, two-phase rollout plan, hash strategy,
  MCP session id contract, open questions.

Phase B (next commit): Glob .gitignore default-on, Grep context
lines + multi-pattern, Bash background mode, WebFetch SSRF
guard, WebSearch filters.
bahadirarda and others added 27 commits April 30, 2026 00:42
The daemon's combined stdout/stderr lands in
$XDG_STATE_HOME/clawtool/daemon.log — every goroutine panic,
every "clawtool: <subsystem>: <error>" stderr line, every BIAM
reap warning ends up there. Pre-this commit the file was local-
only, so a daemon stuck in a panic loop on someone else's host
was invisible to us until they filed an issue. With telemetry
opt-in (pre-v1.0 default = on), forwarding classified failures
gives us the diagnostic feedback loop we need to triage bugs
before the operator notices.

Design:

internal/telemetry/logwatch.go — LogWatcher that:
- Tails daemon.log starting from EOF (never streams the
  historical buffer; new errors only).
- Classifies each line into severity ∈ {error, warn, panic}
  and event_kind from a small allow-list (panic / fatal / biam
  / auth / io / other). Classifier uses substring matching for
  the hot path; ordering documented in classify().
- Rate-limits to 60 events per minute. A panicking daemon
  emits the first minute of evidence then goes quiet — well
  under PostHog's per-distinct-id quota and harmless on the
  back end if the operator's host is genuinely flapping.
- Emits clawtool.daemon.log_event events with severity +
  event_kind + command:"daemon" + transport:"http". NO log-line
  bodies cross the wire — only the classification fields, so
  an env-value or path that happens to be in the log can't leak.

internal/telemetry/telemetry.go — added "severity" to the
allowedKeys allow-list (otherwise the property would be silently
dropped before reaching PostHog).

internal/server/server.go — wires NewLogWatcher after telemetry.New
during the HTTP transport boot path. stdio path skipped because
it's per-call and the log forwarder needs a long-running daemon
to be useful. Watcher cancellation rides the existing ctx.

Tests:
- TestClassify_Taxonomy covers every documented severity /
  event_kind branch including the order-dependent cases (BIAM
  init failure that contains "no such file" — must classify as
  biam, not io).
- TestLogWatcher_NilClientNoOps guards the boot-order contract:
  a nil or disabled telemetry client must make Run a clean
  return rather than panic, so server.go doesn't have to gate
  every call.

Build, vet, deadcode, full 49-package test suite, stub-e2e all
green via `bash scripts/ci.sh`.

This is the daemon-side half of "let's see what's failing on
the operator's host before they have to tell us." The dashboard-
side half (PostHog Insights query for clawtool.daemon.log_event,
broken down by severity + event_kind + version) is operator
work in PostHog itself; the events ship the moment a daemon
on a telemetry-enabled host emits a classifiable line.
…ersion filtering

PostHog's Sessions and Live views filter by $lib_version out of
the box. Pre-this commit Track only stamped $lib (always
"clawtool-go"), so a regression introduced in v0.22.30 looked
identical to one in v0.22.36 in the dashboard — operator had to
manually filter on the existing `version` property which the LLM
Observability board doesn't query by default.

Track() now auto-fills $lib_version from version.Resolved() when
the caller doesn't supply one. The CLI's per-command Track sites
that already pass an explicit `version` string keep their value
untouched; this just adds the PostHog-canonical $lib_version
field that sessions / live / cohort queries query by default.
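
A minimal sketch of the auto-stamp rule, assuming properties travel as a map. `stampLibVersion` is a hypothetical helper name, and the resolved version is passed in as a plain string rather than read from version.Resolved():

```go
package main

import "fmt"

// stampLibVersion returns a copy of props with "$lib_version" set to
// resolved, unless the caller already supplied a value for that key.
func stampLibVersion(props map[string]any, resolved string) map[string]any {
	out := make(map[string]any, len(props)+1)
	for k, v := range props {
		out[k] = v
	}
	if _, ok := out["$lib_version"]; !ok {
		out["$lib_version"] = resolved
	}
	return out
}

func main() {
	fmt.Println(stampLibVersion(map[string]any{"version": "0.22.36"}, "0.22.36"))
	fmt.Println(stampLibVersion(map[string]any{"$lib_version": "custom"}, "0.22.36"))
}
```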

Together with the daemon log forwarder shipped in 3f6af08, every
clawtool.daemon.log_event from an operator's host now lands in
the Sessions view with $session_id (groups events from one
daemon run) + $lib_version (filter by build) + severity / event_kind
(classify the failure) — letting us see which version, on which
session, hit which class of error. That's the live-debugging
loop the project needed for pre-v1.0 iteration.

Test coverage: existing telemetry test suite (TestTrack_*) keeps
passing because $lib_version is allow-listed and the auto-stamp
only fires when the caller didn't already set it.
…evel diagnostics

The daemon log forwarder shipped in v0.22.36 covers reactive
diagnostics (something errored, here's the classification). What
was missing: proactive cohort segmentation. Without a fingerprint
event we couldn't answer "are panics concentrated on WSL hosts"
or "does the upgrade flow succeed differently on Apple Silicon
vs Intel" without asking operators to file an issue.

internal/telemetry/fingerprint.go — FingerprintProps() builds a
single-row property map covering every dimension that's legal to
collect anonymously:

Hardware band:
  cpu_count   — runtime.NumCPU()
  mem_tier    — "<2GB" | "2-8GB" | "8-32GB" | ">32GB" | "unknown"
  go_version  — runtime.Version()

Environment fingerprint:
  container   — bool (docker / podman / k8s detection)
  is_ci       — bool (GitHub / GitLab / Circle / Travis / Jenkins / Buildkite)
  is_wsl      — bool (Microsoft / WSL signature in /proc/version)
  term_kind   — "tty" | "ssh" | "ci" | "headless"
  locale_lang — first 2-5 chars of $LANG (e.g. "tr" / "en"); "unknown" on parse fail

Agent CLI presence (boot-time PATH probe):
  claude_code_present, codex_present, gemini_present, opencode_present

Network reachability (1s TCP dial each):
  posthog_reachable, github_reachable

Strict legal limits — every dimension is one of:
  - an enumerable bucket (mem_tier, term_kind, locale_lang)
  - a public runtime attribute (cpu_count, go_version)
  - a presence boolean (claude_code_present)
  - a reachability boolean (posthog_reachable)

NOTHING per-user-identifiable. NO paths. NO env values. NO
hostnames. The TestFingerprintProps_NoSensitiveContent unit test
guards this with explicit forbidden-substring checks
(/home/, /Users/, @, Authorization, sk-, ghp_, etc.) running
against the real environment so a future dimension that leaks
PII fails the test the moment it lands.
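
The forbidden-substring guard can be sketched as below. The list mirrors the substrings named above; the `firstForbidden` helper and its return shape are assumptions for illustration, not the test's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// forbidden lists substrings that must never appear in a fingerprint value.
var forbidden = []string{"/home/", "/Users/", "@", "Authorization", "sk-", "ghp_"}

// firstForbidden reports one property whose stringified value contains
// a forbidden substring. found is false when every value is clean.
func firstForbidden(props map[string]any) (key, hit string, found bool) {
	for k, v := range props {
		s := fmt.Sprint(v)
		for _, f := range forbidden {
			if strings.Contains(s, f) {
				return k, f, true
			}
		}
	}
	return "", "", false
}

func main() {
	clean := map[string]any{"cpu_count": 8, "mem_tier": "2-8GB", "is_wsl": true}
	_, _, bad := firstForbidden(clean)
	fmt.Println("clean map flagged:", bad)
}
```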

GeoIP suppression: every Track call now stamps $geoip_disable=true
unconditionally. Even though PostHog could resolve city / country
from the request IP, we don't want that level of fidelity even
under "anonymous diagnostics" — operator's network location is
not a dimension we asked for and not one we promised to collect.

Wire shape: clawtool.host_fingerprint event emitted once per
daemon boot, after server.start, alongside install_method (from
$CLAWTOOL_INSTALL_METHOD) and the canonical version. Tied to the
daemon's $session_id + $lib_version so PostHog Sessions / Live
views can pivot "v0.22.37, WSL, mem_tier=8-32GB cohort, what's
their panic rate?" out of the box.

Live-verified: smoke test against a debug daemon on this host
shows the event flowing with cpu_count:8 mem_tier:2-8GB
go_version:go1.26.0 is_wsl:true term_kind:tty
claude_code_present/codex_present/gemini_present/opencode_present:true
posthog_reachable:true github_reachable:true install_method:manual
locale_lang:c — exactly the cohort dimensions PostHog needed.

Three new unit tests:
- TestFingerprintProps_StrictAllowList: every key the
  fingerprint emits MUST be in allowedKeys. Catches a future
  dimension landing without an allow-list entry (would silently
  drop on the wire).
- TestFingerprintProps_NoSensitiveContent: no value contains
  any forbidden substring. Legal contract guard.
- TestMemTier_Buckets + TestDetectLocaleLang_Buckets: cover
  the bucket logic + the unknown-fallback contract.
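
A sketch of the mem_tier bucketing, assuming total memory arrives in bytes and that zero means the probe failed. The exact boundary handling (which tier owns exactly 2, 8, or 32 GiB) is an assumption; the source only names the buckets:

```go
package main

import "fmt"

const gib = uint64(1) << 30

// memTier maps total memory in bytes to a coarse bucket label.
// Boundary ownership here is assumed, not taken from the real code.
func memTier(totalBytes uint64) string {
	switch {
	case totalBytes == 0:
		return "unknown"
	case totalBytes < 2*gib:
		return "<2GB"
	case totalBytes < 8*gib:
		return "2-8GB"
	case totalBytes <= 32*gib:
		return "8-32GB"
	default:
		return ">32GB"
	}
}

func main() {
	for _, b := range []uint64{0, 1 * gib, 4 * gib, 16 * gib, 64 * gib} {
		fmt.Println(b, memTier(b))
	}
}
```

Reporting a bucket instead of the raw byte count keeps the dimension enumerable, which is the property the legal limits above rely on.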

Also fixes test/e2e/realinstall/run.sh: the upgrade --check
case statement now accepts the v0.22+ wire shape ("already on
the latest" / "->") so the test passes when install.sh fetches
the latest GitHub release (and thus the just-installed binary
IS the latest, leaving --check a clean no-op).

Build, vet, deadcode, full 49-package test suite, stub-e2e all
green via `bash scripts/ci.sh`.
…utput

Onboard is the first ten seconds the operator spends with clawtool;
the wizard either hooks them or churns them. Pre-fix: typing
`clawtool onboard` left every line of pre-existing terminal noise
(npm install, git status, whatever was there) above the wizard,
the welcome page was a multi-paragraph huh.NewNote that overflowed
on small terminals, and the side-effect dispatch (bridges, daemon,
identity, secrets) printed a stream of mixed-glyph stdoutLn lines
with no visual structure.

This commit polishes that surface end-to-end:

internal/cli/onboard_ux.go — new onboardUX renderer. Mirrors the
upgrade UX's design constraints (TTY-aware, plain ASCII when piped,
no spinners) with three new affordances:

- ClearScreen: emits \033[2J\033[3J\033[H so onboard lands on a
  clean slate, scrollback included. No-op when stdout isn't a tty
  so piped invocations / CI logs stay greppable.
- Header(version, found): rounded-box panel with the live host
  detection result rendered as a single tight pill row of ✓/·
  markers — replaces the prior welcome page's multi-paragraph
  hostSummary block.
- Section / PhaseStart / PhaseDone / PhaseSkip / PhaseFail / Note
  / Summary / NextSteps — same protocol upgrade_ux.go uses, so
  operators who've run `clawtool upgrade` already know the
  cadence (→ doing X → ✓ X (350ms · detail)).

internal/cli/onboard.go — wires the new UX:

- ClearScreen + Header at entry; the prior huh.NewNote welcome
  group is dropped (boxed header replaces it).
- Side-effect dispatch refactored into Section blocks:
  Bridges / MCP host registration / Daemon / Identity / Secrets
  store. Each step renders as a phase pair (PhaseStart label →
  PhaseDone detail) with timing.
- Closing block is now a tight Summary checklist + NextSteps
  panel instead of stream-of-stdoutLn paragraphs. One screen,
  scan-friendly.
- The "Run `clawtool send --list`" hint stays on stdoutLn for
  test-harness compatibility (existing TestOnboard_AllPresent_*
  asserts on it).

Smoke output (plain-ASCII capture, what shows in `tee` / log files):

    clawtool onboard  v0.22.37
    ----------------------------
    [OK] claude-code  [OK] codex  [OK] gemini  [OK] opencode  [--] hermes

      Bridges
      -------
      -> install bridge hermes
      FAIL install bridge hermes
        prereq not satisfied
      ...
      Daemon
      ------
      -> start persistent daemon
      OK start persistent daemon (0s · http://127.0.0.1:33029/mcp)
      ...
      Summary
      -------
        [OK] daemon         http://127.0.0.1:33029/mcp
        [OK] BIAM identity
        [OK] secrets store  ~/.config/clawtool/secrets.toml, mode 0600

In a real terminal: rounded-border box, lipgloss-styled colors
(63/83/214/203/245), bold section titles. TTY-aware throughout.

Build, vet, deadcode, full 49-package test suite, stub-e2e all
green via `bash scripts/ci.sh`.
Replace the linear huh.NewForm(groups...) flow with a stepwise
Bubble Tea program that runs in an alt-screen buffer:

- Each question is its own focused step (8 visible steps with
  step indicator "Step X of Y") instead of stacking groups in
  one continuous form.
- Pinned rounded-box header stays visible across every step;
  the host-detection pill row sits inside it.
- Side-effect run phase (bridge install / MCP claim / daemon /
  identity / secrets) executes as tea.Cmd with live progress log
  rendered inside the same alt-screen.
- On exit, the alt-screen is dropped and the operator's terminal
  scrollback is restored — telemetry thank-you + star CTA land
  in the regular scrollback.

--yes / non-TTY callers (CI, Dockerfiles, e2e harness, pipes)
fall through to the existing linear onboard() so the plain-text
contract stays stable. Detection: checks that both stdin AND stdout
are *os.File values attached to a TTY.

The onboard model lives in internal/cli/onboard_tui.go;
internal/cli/onboard_tui_test.go covers step construction, run
queue ordering, stepResultMsg outcome → summary mapping, and
View() rendering for both phaseSteps and phaseRun.
Add wizard progress persistence so `clawtool onboard` can survive
mid-flow interruption (Ctrl-C, terminal close, accidental crash).

Wire:
- $XDG_CONFIG_HOME/clawtool/.onboard-progress.json (mode 0600,
  atomic write, schema-versioned).
- After every step's huh.Form completion the model snapshots
  (stepIdx, onboardState) to the progress file.
- A clean finish (run-phase queue drained → finishedMsg) calls
  clearOnboardProgress() to remove the file, so the next invocation
  hits the "already onboarded" guard, not the resume prompt.

Re-entry behaviour at runOnboard():
- Progress file present → huh.Select prompt with 3 options:
  Resume (load saved state + jump to saved stepIdx) /
  Start over (clear progress, fresh wizard) /
  Cancel (exit without changes; progress file stays).
- .onboarded marker present, no progress file → huh.Select
  prompt: Re-run / Cancel.
- Neither → fresh wizard, no extra prompt.

New `--force` / `-f` flag wipes both the progress file and the
.onboarded marker before launching, bypassing both prompts.

newOnboardModelAt(state, deps, track, startStep) is the
resume-aware constructor (out-of-range startStep clamps to 0 so a
stale progress file from a fewer-steps build doesn't push the
cursor off the end).

Tests cover round-trip persistence, missing/corrupt/schema-
mismatch loads, idempotent clear, and startStep clamping.
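
A minimal sketch of the two mechanics described above: an atomic write via temp file plus rename, and the startStep clamp. The `progress` struct fields, the JSON key names, and both helper names are hypothetical; only the clamp-to-0 rule and the 0600 mode come from the text:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// progress is a hypothetical, trimmed-down snapshot shape.
type progress struct {
	Schema  int `json:"schema"`
	StepIdx int `json:"step_idx"`
}

// saveProgress writes p next to path and renames it into place, so a
// reader never observes a half-written file. os.CreateTemp creates the
// file with mode 0600.
func saveProgress(path string, p progress) error {
	data, err := json.Marshal(p)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".onboard-progress-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// clampStartStep sends any out-of-range index back to step 0.
func clampStartStep(start, numSteps int) int {
	if start < 0 || start >= numSteps {
		return 0
	}
	return start
}

func main() {
	path := filepath.Join(os.TempDir(), ".onboard-progress.json")
	if err := saveProgress(path, progress{Schema: 1, StepIdx: 3}); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	defer os.Remove(path)
	fmt.Println("resume at step", clampStartStep(3, 8))
}
```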
The vertical-stack layout felt cramped — header pinned at top,
progress dots in mid-air, form below, footer at bottom, all
sharing a single column. Switch to a three-band layout that
actually uses the alt-screen real estate:

  ┌────────────── HEADER ──────────────┐
  │  ASCII logo │ tagline + attribution │
  │             │ host-detection pills  │
  ├──────┬──────────────────────────────┤
  │ side │  MAIN                         │
  │ bar  │  active step's huh form       │
  ├──────┴──────────────────────────────┤
  │  FOOTER (keybinds)                  │
  └─────────────────────────────────────┘

- Full-width rounded-border banner with chunky box-drawing
  ASCII logo (`┏━╸╻ ┏━┓╻ ╻╺┳╸┏━┓┏━┓╻`) on the left.
- Right side of banner: version + tagline,
  "from Cogitave  ·  by @bahadirarda  ·  help@cogitave.com",
  and the host-detection pill row.
- Persistent sidebar (26 cols) shows the wizard's full step
  list with state glyphs: ● completed, ◉ active, ○ pending.
  Active step gets the accent colour + bold so the eye lands
  on it instantly.
- Right pane carries the focused step: title + rule + the
  embedded huh form.

Run + done phases get their own padded layouts (run log inside
a left-bordered pane; done view stacks log + summary).

Test updates: View() snapshot assertion now checks for the
new tagline / attribution / sidebar header instead of the old
`clawtool onboard` / `Step ` strings the prior layout used.
cli.New() only wires App.Stdout + App.Stderr; App.Stdin is left
zero-value. The previous TTY gate required `a.Stdin.(*os.File)` to
succeed, which always failed in production — every `clawtool
onboard` invocation fell through to the linear path even when run
in a real terminal. Operators reported the alt-screen TUI never
showed up: they got the rounded-box header + plain huh form
instead.

Resolve stdout/stdin to *os.File at the gate, falling back to the
real os.Stdin / os.Stdout when the App's embedded streams aren't
an *os.File (the production case). Then probe isTTY on the
resolved descriptors.

Net effect: `clawtool onboard` (no pipe, real terminal) now hits
the Bubble Tea TUI as intended; `clawtool onboard --yes` and
piped invocations still go through the linear path because either
the --yes gate or one of the isTTY probes returns false.
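
The gate fix can be sketched as a type assertion with a fallback. `resolveFile` and `isCharDevice` are hypothetical names, and the character-device check is a stdlib-only approximation of a TTY probe (it also accepts devices such as /dev/null), not the project's isTTY:

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// resolveFile returns stream as an *os.File when it is one, otherwise
// the fallback (the real os.Stdin / os.Stdout in the production case).
func resolveFile(stream any, fallback *os.File) *os.File {
	if f, ok := stream.(*os.File); ok && f != nil {
		return f
	}
	return fallback
}

// isCharDevice approximates a TTY probe using only the standard library.
func isCharDevice(f *os.File) bool {
	info, err := f.Stat()
	if err != nil {
		return false
	}
	return info.Mode()&os.ModeCharDevice != 0
}

func main() {
	var unset io.Reader // zero value, like an App.Stdin that was never wired
	in := resolveFile(unset, os.Stdin)
	out := resolveFile(os.Stdout, os.Stdout)
	fmt.Println("interactive:", isCharDevice(in) && isCharDevice(out))
}
```

The key point is that a failed assertion no longer means "not a terminal"; it only means the probe should look at the process's real descriptors.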
Replace the three-band header / sidebar / footer layout with a
single horizontally + vertically centred column ≤ 72 cols wide.
Distilled from Charm reference projects (lipgloss, soft-serve,
glow, bubbletea/examples/views, huh/examples/burger):

- Drop the sidebar — research showed Charm projects favour single-
  pane wizards (huh's own examples don't render step lists).
- Inline header: 1-line monogram "┏━╸  clawtool" + version tagline
  + "from Cogitave · @bahadirarda · help@cogitave.com" + host pills
  separated by ` · `. ~4 lines total instead of the prior banner's
  full-width rounded box.
- Inline step indicator: "Step X of Y  ·  <Title>" + dot row
  (●●◉○○○○○) directly above the form card.
- Form card: single rounded border with accent colour ("212",
  pink) and Padding(1, 2) — the soft-serve / huh idiom. One frame
  per pane, not nested.
- Footer: dim text with bullet separators (the bubbletea/views
  idiom), ~36 chars, single row. No box.
- Width-cap content to onboardCardWidth (72 cols), then horizontal
  + vertical centre with lipgloss.Place. Long content (run-log +
  summary) lets the column extend; Place top-anchors when content
  height exceeds available area.
- Run phase: thin left-border accent (vertical bar) instead of
  closed box for the streaming log — long-running lists read
  better against an accent than inside a frame.
- Done phase: green-bordered ("42") celebratory card; the accent
  flips from pink to green so the operator's eye lands on the
  success state.

Test snapshot updated: "PROGRESS" sidebar header → inline
"Step 1 of" indicator. Other assertions (tagline, attribution,
support email) still pass.

CI fast-mode green; `go fmt`, `go vet`, `go build`, `go test
-race`, `deadcode -test ./...` all pass.
Drop the 72-col hard cap and the centred-card layout. The wizard
now uses the full alt-screen as a 3-band layout that adapts to the
terminal:

- Width fills viewport (m.width - 2) with a 60-col floor for
  narrow terminals. No upper cap — wide screens get wide cards.
- Height: header pinned at top, footer pinned at bottom, body
  fills the remaining rows. Card has explicit Height(bodyH - N)
  so it absorbs slack.
- Card padding bumped to (1, 3) for breathing room when the
  card is wide.
- WindowSizeMsg forwarded to the active huh form is now CARD-
  SIZED (m.width - 10, m.height - 14) rather than the full alt-
  screen. Without this, huh's description text wraps using the
  full terminal width even though it renders inside a narrower
  card, breaking the visual rhythm.

Done view (post-finish) drops the streaming run log: the
operator just watched it scroll by during phaseRun, and repeating
it pushes next-steps below the viewport on smaller terminals. The
green-bordered celebration card now contains only the summary
checklist + next-steps panel — that's the punchline the operator
wants to see.

Net effect: wizard occupies the full alt-screen on any terminal
size, no scrolling required, summary screen fits cleanly.
Two problems with the previous build:

1. The huh form's option list was rendering as a single row inside
   an otherwise-tall card. Operator only saw "none / decide later"
   even though the form had 6 options. Cause: I was forwarding a
   tea.WindowSizeMsg{Width: cardW, Height: cardH-N} to the form,
   and huh's internal layout was clamping the options viewport to
   cardH minus its own title+description+footer overhead — leaving
   ~1 row for actual options.

   Fix: keep the WIDTH clamp (so description text wraps at the
   card boundary, not the alt-screen width), but pass the FULL
   alt-screen height for the form. huh now renders all options
   naturally; the surrounding card absorbs whatever height the
   form needs.

2. Wizard hugged the alt-screen top edge with no breathing room.
   Top padding bumped from 0 to 2 rows on the outer container
   (Padding(2, 1, 1, 1)). Body height calculation reduced by 5
   rows (was 2) to account for the new top + bottom padding.

Net effect: form options visible, description wraps inside card
boundary, card stretches to fill body, top breathing room makes
the wizard feel less cramped.
Two fixes for the squashed-options bug (operator could only see
"none / decide later" inside an otherwise tall card):

1. Stop forwarding tea.WindowSizeMsg to the active huh form.
   huh's WindowSizeMsg handler clamps its option-list viewport
   based on the height we pass: a small height squashed the list
   to one row, while the full alt-screen height overflowed the
   card and our outer Height() truncated from the top, leaving
   only the cursor row visible. Letting huh use its default
   natural-size rendering (no WindowSizeMsg) makes it render
   every option.

2. Drop Height(cardH) from the form's surrounding card style.
   lipgloss.Style.Height clamps content; when the form was taller
   than cardH (any non-trivial form is), the clamp truncated and
   we lost rows. The card now auto-sizes to the form's natural
   rendered height. The body container's Height(bodyH) absorbs
   slack underneath so the footer still pins to the alt-screen
   bottom.

Trade-off: description text inside the form now wraps at huh's
default width (the full alt-screen, since we're not forwarding
the WindowSizeMsg). The wrap may extend beyond the card's
visible width on narrow terminals. Acceptable for now — wrap-at-
card-boundary requires matching huh's internal width signal,
which interacts with the option-list height the way described
above. Will revisit if the wrap becomes a real problem.
The previous design wrapped the active huh form (which already
renders its own left-bordered focused-field accent) inside a
second rounded-border card. The result was a "card-in-a-card"
effect: outer pink rounded box, inner left-accent frame, content
squeezed into the inner inner inner zone, both the form and the
visible width feeling cramped on the operator's terminal.

Drop the outer rounded-border wrapper from all three phases:

- phaseSteps (renderStep): just indicator + dots + form.View()
  with a left padding. huh's per-field decoration is the only
  frame.
- phaseRun (renderRunBody): just indicator + run-log. The log's
  per-line glyphs (✓ / ✗ / · / →) carry the visual structure;
  no surrounding box.
- phaseDone (renderDoneBody): just indicator + summary. Summary
  rows already have outcome glyphs.

Net: one visual container per phase (the huh form's own frame
during steps; just text rhythm during run + done). Wider usable
content area, cleaner reading.
The previous "drop the card" attempt was overcorrecting — operator
liked the outer pink rounded card (it's the wizard's identity).
The card-in-a-card squeeze was a different problem: lipgloss's
Height() clamp truncated the form, AND huh's option list squashed
to a single row because we never told it how much vertical space
it had.

Final wiring:

1. WindowSizeMsg → forward to active form with Width=cardW (so
   description text wraps inside the card boundary) and Height=
   9999. huh clamps option-list viewport to min(neededHeight,
   msg.Height); a 9999 ceiling never clamps so the form renders
   every option at natural size.

2. Outer rounded card returns: Border(RoundedBorder()) +
   BorderForeground("212") + Padding(1, 3) + Width(cardW). NO
   Height clamp on the card — it auto-grows to whatever huh's
   natural rendered height is.

3. Body container has Height(bodyH); slack rows pad below the
   card so the footer still pins to the alt-screen bottom.

Net behaviour: pink rounded card holds the form, every option
visible, footer pinned bottom, breathing room above + below.
Card width adapts responsively to terminal width via cardW =
m.width - 6.
…firm

Stop fighting huh.Form embedded inside our parent tea.Program.
After web research it's clear huh embedding has a strict contract
(WithHeight() on the Form, .Height() on every Select, .Options()
strictly before .Value(), no outer height-clamped wrapper) that
fights every layout we tried. Symptoms we kept rediscovering:

- huh's Select rendered ONLY the cursor row (operator saw just
  "none / decide later") because internal viewport sizing falls
  back to the cursor's minHeight=1 when our outer lipgloss style
  also clamps height.
- WindowSizeMsg.Height is ignored by huh's per-field viewport;
  only WithHeight() / .Height() propagate.
- Two viewports nested (huh's Select.viewport.Model + our
  rounded-card frame) produce unpredictable scroll math.

Replace huh with three minimal custom widgets in onboard_widgets.go:

- selectWidget — single-choice, ↑/↓ + enter, ~50 LOC. Renders every
  option every frame, no internal viewport.
- multiSelectWidget — checklist, ↑/↓ + space toggle + a all/none +
  enter submit. Renders every option, no viewport. Operator-
  requested space-select keybind explicitly supported.
- confirmWidget — yes/no, ←/→ or h/l toggle, y / n quick pick,
  enter submit.

Each exposes a stepWidget interface (Update / View / Done /
Keybinds). The wizard's outer model dispatches msgs to the active
step's widget and renders widget.View() inside the rounded card.
Footer pulls the active widget's Keybinds() so the hint line
reflects the current step's actual keys (Select shows different
keys than MultiSelect).

Footer is now OUTSIDE the card per operator request: the card
contains only the widget; the dim-bullet keybind row sits below
the card at the alt-screen footer band.

State capture pattern flipped: instead of huh's .Value(&ptr)
two-way binding, the apply hook on each step calls widget.Value()
or widget.Values() once at advance time and writes into
onboardState. Cleaner control flow, no shared-pointer races
during render.

Net deletion of huh embed plumbing: ~80 LOC. Net addition:
~250 LOC of widgets + adapters. Worth it for the bug class
elimination.
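
A dependency-free sketch of the widget contract, assuming the four methods named above. To stay self-contained it takes key names as plain strings instead of Bubble Tea messages, so the signatures differ from whatever onboard_widgets.go actually declares:

```go
package main

import (
	"fmt"
	"strings"
)

// stepWidget is a simplified, hypothetical version of the interface.
type stepWidget interface {
	Update(key string)
	View() string
	Done() bool
	Keybinds() string
}

// selectWidget is a single-choice list with no internal viewport:
// every option is rendered on every frame.
type selectWidget struct {
	options []string
	cursor  int
	done    bool
}

func (s *selectWidget) Update(key string) {
	switch key {
	case "up":
		if s.cursor > 0 {
			s.cursor--
		}
	case "down":
		if s.cursor < len(s.options)-1 {
			s.cursor++
		}
	case "enter":
		s.done = true
	}
}

func (s *selectWidget) View() string {
	var b strings.Builder
	for i, opt := range s.options {
		marker := "  "
		if i == s.cursor {
			marker = "> "
		}
		b.WriteString(marker + opt + "\n")
	}
	return b.String()
}

func (s *selectWidget) Done() bool       { return s.done }
func (s *selectWidget) Keybinds() string { return "↑/↓ select  ·  enter confirm" }

// Value is read once at advance time instead of binding a pointer.
func (s *selectWidget) Value() string { return s.options[s.cursor] }

func main() {
	var w stepWidget = &selectWidget{options: []string{"claude-code", "codex", "none / decide later"}}
	w.Update("down")
	fmt.Print(w.View())
	fmt.Println(w.Keybinds())
}
```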
…tent

The card was auto-sizing to each widget's natural height, so the
wizard's frame jiggled between steps (a tall Select shrinking to
a 4-row Confirm). Pin the rounded-border card to a constant
70w × 18h silhouette and centre the active widget inside it via
lipgloss.Place(innerW, innerH, Center, Center, view).

Net: every step renders the same rectangle in the same screen
position, with the widget vertically + horizontally centred
inside. A 4-row Confirm now sits in the middle of the same card a
12-row Select previously filled, so the visual rhythm of
advancing through the wizard feels intentional instead of
elastic. The card auto-clamps narrower on terminals < 70 cols,
floor 50 cols.
Replace PaddingLeft(2) with Align(lipgloss.Center) on the header /
step body / run body / done body / footer wrappers, and switch
JoinVertical from lipgloss.Left to lipgloss.Center in the body
phases. The wizard column now sits in the middle of the alt-screen
instead of hugging the left edge.

Visual axis: header centred → step indicator centred → progress
dots centred → fixed-size pink card centred → footer hint centred.
On wide terminals the empty space distributes evenly to either
side; on narrow terminals the card auto-clamps narrower (already
handled in renderStep) so nothing wraps.
Two operator-driven polishes:

1. Card width is now responsive: computeCardWidth(viewport) =
   viewport - 12, soft-capped at 120 cols + soft floor at 60.
   On a 200-col terminal the card now fills 120 cols (was pinned
   at 70); on an 80-col terminal it fills 68. The wide-screen
   case feels purposeful instead of "tiny dialog floating in a
   sea of empty space."

2. Header redesign: 3-line ASCII brand mark on the left
   (clawtoolLogo restored), stacked metadata column on the right
   (bold tagline `first-run setup · v0.22.52`, dim
   `from Cogitave · by @bahadirarda`, dim `help@cogitave.com`),
   filled-background pill row beneath. Detected hosts render as
   bright accent pills (`Background(212).Foreground(230).Bold`);
   missing hosts as dim padded text. The eye finds the bright
   pills instantly without scanning labels.
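
The card-width rule from item 1 can be sketched directly from the numbers given there. The function name matches the one mentioned in the text; the body is an assumption about how the cap and floor are applied:

```go
package main

import "fmt"

// computeCardWidth leaves a 12-column margin, then applies the
// 120-column soft cap and the 60-column soft floor.
func computeCardWidth(viewport int) int {
	w := viewport - 12
	if w > 120 {
		w = 120
	}
	if w < 60 {
		w = 60
	}
	return w
}

func main() {
	for _, v := range []int{200, 80, 60} {
		fmt.Printf("viewport %d -> card %d\n", v, computeCardWidth(v))
	}
}
```

This reproduces the two cases quoted above: a 200-col terminal gives a 120-col card and an 80-col terminal gives 68.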

Test snapshot relaxed: header tagline check now matches
"first-run setup" prefix instead of "first-run setup wizard"
(the redesigned tagline drops the trailing "wizard" since the
banner already establishes that's what we're in).
Three operator-driven polishes:

1. ASCII logo redesign — swap from box-drawing "Future" font
   (3 rows tall) to Pagga-style chunky pixel font (2 rows tall):
   `█▀▀ █   ▄▀█ █ █ ▀█▀ █▀█ █▀█ █`
   `█▄▄ █▄▄ █▀█ ▀▄▀  █  █▄█ █▄█ █▄▄`
   Different glyph palette gives the brand mark more visual
   weight while staying compact (Unicode block elements render
   on every modern terminal).

2. Animation — new tickMsg + tickEvery() loop firing every
   350ms. Update increments m.frame on each tick; renderStep
   uses (frame % 4) to cycle the active ◉ progress dot through
   four progressively brighter pinks (212 → 213 → 218 → 219).
   Subtle pulse pulls the operator's eye to "where am I now?"
   without being distracting. Init() seeds the first tick.

3. Spacing — added blank rows between header and body, and
   between progress dots and the card. The "Step X of Y"
   indicator now has visible breathing room above the card it
   describes.
Two operator-driven polishes:

1. The "W" in clawtoolLogo was rendering as a single-V silhouette
   because Pagga's W was using `█ █ / ▀▄▀` (3 cols, 1 peak).
   Switch to a proper double-peak W: `█ █ █ / █▄█▄█` (5 cols).
   The brand mark now reads as "claW-tool" not "clav-tool".

2. The previous animation (4-tone pulse on the active progress
   dot at 350ms cadence) was too subtle to register. Complement
   it with a Braille spinner glyph at the start of the
   step indicator: `⠋  Step 1 of 8  ·  Primary CLI`. The
   spinner cycles through the standard 10-frame Braille rotation
   (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏) every 120ms — a full revolution in ~1.2s, the
   instantly-recognizable "I'm alive" TUI signal.

   Tick interval bumped from 350ms to 120ms so the spinner rotates
   smoothly. The dot pulse still runs but is now secondary — the
   spinner carries the visible "live" weight.
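
A minimal sketch of the frame-to-glyph mapping, assuming the frame counter increments once per 120ms tick. `spinnerFrame` is a hypothetical helper name:

```go
package main

import "fmt"

// spinnerGlyphs is the 10-frame Braille rotation quoted above.
var spinnerGlyphs = []rune("⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏")

// spinnerFrame returns the glyph for a non-negative tick counter,
// wrapping after one full revolution.
func spinnerFrame(frame int) string {
	return string(spinnerGlyphs[frame%len(spinnerGlyphs)])
}

func main() {
	for f := 0; f < 12; f++ {
		fmt.Print(spinnerFrame(f), " ")
	}
	fmt.Println()
}
```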
Replace the Braille spinner on the step indicator with a gradient
shimmer running across the clawtool ASCII brand mark itself —
that's where the operator naturally looks for "is this thing
alive?" feedback.

Implementation: renderShimmerLogo() walks each glyph row and tints
each non-space rune by its column distance from the current sweep
position. Four colour stops form the band:

  distance 0  → 225  (almost white, brightest centre)
  distance ±1 → 219  (bright pink)
  distance ±2 → 213  (medium pink)
  elsewhere   → 212  (base accent)

The sweep position cycles from -2 to maxLen+2 (so the band enters
and leaves off-screen) plus an 8-frame quiet pause after each
sweep so the logo isn't constantly shimmering — the eye gets a
beat to rest.

Step indicator is back to plain text. The wizard's animation
budget is 100% on the brand mark now: a soft "shine" passing
through the logo every ~3-4 seconds.
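
The colour stops and sweep cycle above can be sketched as two small functions. Both names are hypothetical, and treating the sweep endpoints as inclusive is an assumption; the colour codes and the 8-frame pause come from the text:

```go
package main

import "fmt"

// quietFrames is the pause after each sweep.
const quietFrames = 8

// shimmerColor picks the ANSI 256 colour for a column by its distance
// from the sweep position.
func shimmerColor(col, sweep int) string {
	d := col - sweep
	if d < 0 {
		d = -d
	}
	switch d {
	case 0:
		return "225"
	case 1:
		return "219"
	case 2:
		return "213"
	default:
		return "212"
	}
}

// sweepPos maps a non-negative frame counter to a sweep position in
// [-2, maxLen+2]. active is false during the quiet pause, when every
// glyph should use the base accent.
func sweepPos(frame, maxLen int) (pos int, active bool) {
	sweepLen := maxLen + 5 // positions -2 through maxLen+2, inclusive
	f := frame % (sweepLen + quietFrames)
	if f >= sweepLen {
		return 0, false
	}
	return f - 2, true
}

func main() {
	pos, active := sweepPos(7, 32)
	fmt.Println(pos, active, shimmerColor(5, pos), shimmerColor(6, pos), shimmerColor(20, pos))
}
```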
Logo is 2 rows tall, meta column is 3 rows (tagline / credit /
email). Joining them with `lipgloss.JoinHorizontal(lipgloss.Top,
...)` left the logo stuck to the top while the meta lines
extended below — the brand row read as visually unbalanced.

Switch to `JoinHorizontal(lipgloss.Center, ...)` so the shorter
logo is vertically centred against the taller meta column. Also
drop the leading blank that previously padded metaCol down to
match the (then top-anchored) logo — no longer needed and was
making the meta lines drift.

Net: clawtool brand mark now sits in the middle band of the meta
column's 3-line stack. The header reads as a balanced pair of
elements rather than a top-anchored block with a tail.
…cally

Two layout fixes:

1. Brand row: switch from `JoinHorizontal(Center)` to
   `JoinHorizontal(Bottom)`. Top stuck the logo to the top edge;
   center drifted it too far down (logo straddled tagline+credit
   asymmetrically). Bottom-align lines the 2-row logo up with the
   bottom 2 rows of the 3-row metaCol (credit + email), letting
   the tagline float above as a kicker. Visually balanced.

2. Body region: add `AlignVertical(lipgloss.Center)` to the body
   container in renderStep / renderRunBody / renderDoneBody.
   Previously the body filled to bodyH but content stuck to the
   top, leaving a big empty zone above the footer. Vertical-
   centring distributes the leftover slack evenly above and below
   the wizard's content (indicator + dots + card), so the wizard
   sits in the middle of the body region instead of cramming
   against the header band.
Onboard now degrades gracefully on narrow viewports (<70 cols)
so it stays usable on mobile clients, tmux split panes, and
docked terminal windows.

Breakpoint: onboardCompactWidth = 70 cols. Below that:

- Header switches to renderCompactHeader: drops the chunky 32-col
  ASCII logo and the labelled host-detection pill row. Renders
  one centred line — `clawtool v0.22.58 · first-run setup` — plus
  a glyph-only detection row (● ● ● ● ○) so the operator still
  sees what was found without spending vertical space on labels.
- Footer switches to compactKeybinds: shortens `↑/↓ select  ·
  enter confirm  ·  ctrl-c quit` to `↑↓ ↵  ·  ^c`. MultiSelect
  similarly compresses (`↑↓ ␣ a ↵`). The keys stay legible; the
  prose drops.
- run + done phases shrink their footer copy too: `running 3/8
  ·  ctrl-c quit` → `3/8`; `press any key to exit` → `any key`.

Card width floor lowered from 60 to 40 cols (computeCardWidth)
so the wizard renders inside even very narrow panes (40-col
phone terminals). Card padding cuts in too — the form content
stays inside the rounded frame at every supported size.
Signed-off-by: SafeSkill Scanner <mk@oya.ai>
@bahadirarda
Contributor

Hi @OyaAIProd — thanks for running the scan against clawtool! Quick triage:

Stale base. This PR's base ref 57a4010 is far behind current main (we've shipped ~135 commits since), which is why the diff reports 348 changed files / +58k -1k LOC. The actual delta is just the SafeSkill changes (.clawtool/rules.toml + the badge content); the rest is the project's normal forward motion appearing as 'added' from your fork's perspective. Please rebase safeskill-scan-1777514943960 onto current main before this can be reviewed meaningfully.

Findings triage (8 high). Skimming the report:

  • 🟠 <prompt> in commands/clawtool-dashboard.md:33 and docs/portals.md:35 — these are XML example tags showing tool-call shapes in markdown documentation, not prompt-injection vectors. Likely false positive.
  • 🟠 `read ~/.aws` in `docs/sandbox.md:17`: this is the sandbox feature's own documentation explaining what host-path access patterns look like. The doc describes the threat model; it isn't an exfiltration vector. False positive.
  • 🟠 <prompt> in docs/sandbox.md:55 — same as above; XML example in docs.
  • 🟠 'without limit' in internal/setup/recipes/governance/assets/Apache-2.0.txt:147 — that's the literal Apache 2.0 license text ("without limitation, the rights to use…"). The scanner is matching license boilerplate. Definite false positive.

Happy to consider a tightened scan profile — does SafeSkill expose a .safeskill/ignore (or similar) for path-scoped exclusions? The project ships its own .clawtool/rules.toml engine for similar shape gates; we'd be glad to interop if there's a documented scan-rule format.

What I'd want to see before approving:

  1. Rebase onto current main so the diff shows just the badge / rules additions.
  2. The .clawtool/rules.toml additions reviewed in isolation — what rules is the PR proposing? The current main has 7+ rules already; we should confirm the new ones don't shadow existing ones.
  3. Findings triage updated for the false positives noted above (or pointer to the SafeSkill ignore file).

Marking the internal triage item as done — back to you for the rebase + scope clarification. /cc @bahadirarda96
