Add SafeSkill security badge (50/100 — Use with Caution)#12
Add SafeSkill security badge (50/100 — Use with Caution)#12OyaAIProd wants to merge 267 commits into
Conversation
…ateway
Phase 2 of the relay surface ([[ADR-014]]). Outside-Claude-Code prompt
dispatch from external orchestrators / CI / Slack / etc. now hits an
authenticated HTTP edge.
Wire summary:
- internal/server/http.go — net/http listener with bearer-token auth
middleware (constant-time compare so token-validity timing doesn't
leak the prefix). Three endpoints:
GET /v1/health → {status, version}
GET /v1/agents [?status=callable] → registry snapshot
POST /v1/send_message → streams Supervisor.Send verbatim
(application/x-ndjson, chunked + flushed)
Default-deny on unrecognised paths. Body capped at 1 MB. TLS
termination delegated to the operator's reverse proxy.
- internal/server/server.go — extracted buildMCPServer so HTTP and
stdio share the same MCP boot path (config, secrets, sources,
search index, every tool registration).
- cmd/clawtool/main.go — extended subcommand with --listen,
--token-file, --mcp-http flags (mcp-go's StreamableHTTPServer
integration deferred to a polish patch). New 'serve init-token'
subcommand: 32-byte hex token via crypto/rand, written 0600,
printed to stdout for shell capture.
- internal/cli/cli.go — usage block updated.
- internal/server/http_test.go — 16 httptest-based unit tests
(auth pass/fail, every endpoint, token-file edge cases,
init-token round-trip, listen/token-file required guards).
- test/e2e/run.sh — 8 new e2e assertions: real clawtool serve in
background, curl every endpoint, validates 401 on missing/wrong
token, 400 on missing prompt, 404 on unknown path, 0600 on token
file, 64-char hex on init-token output.
Wiki: ADR-014 status updated to 'Phase 1 + Phase 2 shipped'; new
'Outcome — what shipped' retrospective section; log entry covering
both phases; hot.md updated.
Test totals: 196 Go unit + 62 e2e — all race-clean, gofmt clean,
go vet clean.
Carry-over to Phase 3 (Docker) and Phase 4 (dispatch policies).
…recipe
Closes the third leg of ADR-014's rollout (the first two — Phase 1
bridge/send/supervisor and Phase 2 HTTP gateway — landed earlier the
same day).
Wire summary:
- docker/Dockerfile.relay — multi-stage build. Builder is golang:1.25;
runtime is debian:bookworm-slim with the four upstream agent CLIs
pre-installed (@openai/codex + @google/gemini-cli + @anthropic-ai/
claude-code via npm; opencode via the upstream installer) plus
ripgrep/pandoc/poppler so clawtool's own Read/Grep stay
exercisable. ENTRYPOINT refuses to start without a mounted
/etc/clawtool/listener-token file. Exposes :8080.
- docker/compose.relay.yml — reference compose with optional caddy
reverse proxy that handles automatic ACME. Per-CLI auth env vars
wired through ${VAR:-} so an unset key leaves the family
non-callable rather than crashing. State volume keeps codex/claude
session IDs across restarts. Caddyfile snippet inline as comment.
- internal/setup/recipes/runtime/clawtool_relay.go — new recipe
under the runtime category. Drops compose.relay.yml into the
target repo, marker-stamped (managed-by: clawtool) so re-applies
are idempotent and unmanaged files refuse overwrite without
force=true. Stability: Beta (promote after 1+ production soak).
- internal/setup/recipes/runtime/assets/clawtool-relay.compose.yml —
embedded asset, scrubbed-down compose template suitable to ship
inside an arbitrary repo (image points at ghcr.io/cogitave/
clawtool-relay:latest; user can swap for local build).
- internal/setup/recipes/runtime/clawtool_relay_test.go — 6 tests
(Registered, DetectAbsent, ApplyDropsCompose, VerifyAfterApply,
RefusesUnmanagedOverwrite, ForcedOverwriteSucceeds).
- test/e2e/run.sh — RecipeList enumeration bumped to include the
three bridge recipes + clawtool-relay (16 recipes total, all 9
categories populated).
Wiki: ADR-014 status updated to 'Phase 1 + Phase 2 + Phase 3 shipped',
new Phase 3 retrospective subsection, log entry, hot.md, decisions
index all reflect the closure.
Test totals: 202 Go unit + 62 e2e — race-clean, gofmt clean,
go vet clean.
Phase 4 (dispatch policies — round-robin / failover / tag-routed /
affinity) carries forward.
After Phase 2's HTTP-gateway tests fired, the macOS CI run printed 'all e2e tests passed' but exited with code 1. Cause: under 'set -e', the EXIT trap re-ran 'kill $HTTP_PID' against a process that had already been killed earlier in the test, returning non-zero and propagating out as the script's exit status. Append '|| true' to both kill and rm in the trap so the cleanup stays idempotent regardless of state.
…robin, failover, tag-routed)
Closes the fourth and final originally-spec'd phase of ADR-014, all
four landing in the same day.
Wire summary:
- internal/agents/policy.go — Policy interface as the dispatch seam.
pickPolicy resolves the configured mode (or per-call override) into
the right impl. The supervisor's existing Send call site is the
only consumer; CLI / MCP / HTTP all hit the same code path.
- explicitPolicy (Phase 1 default) — pinned instance, no fallback.
- roundRobinPolicy — rotates across same-family callable instances
via atomic.Uint64 per family. Pinned instance always wins over
rotation; sole callable falls through to explicit dispatch.
- failoverPolicy — primary + ordered chain from AgentConfig.failover_to,
filtered to callable. Supervisor.dispatch walks the chain on
Transport.Send error, but only before any byte has streamed —
retry mid-stream would duplicate partial output to the caller.
- tagRoutedPolicy — ignores --agent, matches case-insensitively
against Tags, sorts deterministically when multiple match.
- internal/config/config.go — three new fields: AgentConfig.Tags +
AgentConfig.FailoverTo + Dispatch struct (Mode).
- internal/agents/supervisor.go — Agent struct gains Tags + FailoverTo
alongside the existing fields. Send branches:
instance == '' && tag != '' -> tagRoutedPolicy direct
instance == '' && tag == '' -> Phase 1 precedence chain (Resolve)
instance != '' -> configured policy (tag overrides)
- internal/cli/send.go — new --tag flag.
- internal/tools/core/agents_tool.go — new tag parameter on
SendMessage MCP tool. Pre-Resolve only when caller pinned an
instance and didn't pass a tag (so tag-only doesn't fail noisily).
- internal/server/http.go — sendMessageRequest gains a top-level Tag
field as sugar for opts.tag.
Tests: 14 new unit tests in policy_test.go (each policy's pick logic
+ supervisor failover cascade + tag dispatch end-to-end), 3 new CLI
flag-parse tests, 2 new e2e assertions (MCP unknown-tag error + HTTP
top-level tag field). Total: 216 Go unit + 64 e2e — race-clean,
gofmt clean, go vet clean.
Carry-over to v0.13.x: affinity policy, optional --mcp-http
StreamableHTTPServer integration, POST /v1/recipe/apply.
…nsport, plus claude/gemini transport fixes from live smoke
Two carry-over items from ADR-014's outcome list, plus two transport
bugs that surfaced during a 4-CLI live smoke against codex / opencode /
gemini / claude on the user's logged-in workstation.
HTTP gateway expansion:
- internal/server/http.go — new GET /v1/recipes (with optional
?category=, ?repo= for Detect status) and POST /v1/recipe/apply.
Mirrors the existing MCP RecipeApply / RecipeList tools so external
orchestrators can install bridges + project-setup recipes from a
remote endpoint without spawning a clawtool process. 'repo' is
required in the body — HTTP callers don't have a terminal cwd, so
refusing rather than silently mutating $HOME is the right default.
- internal/server/http.go — when --mcp-http is set, mounts mcp-go's
StreamableHTTPServer at /mcp (and /mcp/) wrapped by the same bearer
auth middleware. The full clawtool MCP toolset becomes accessible
to remote orchestrators with no additional client library.
- internal/server/http_test.go — 8 new unit tests covering each new
handler + auth gate.
- test/e2e/run.sh — 5 new e2e assertions (recipe enumeration, recipe
apply happy path, recipe apply missing-repo refusal, /mcp 401 on
unauth, /mcp 200 on auth'd initialize).
Live-smoke transport fixes:
- internal/agents/gemini_transport.go — Gemini CLI exits with code 55
in any directory it hasn't been invited to trust ('trusted-folder'
IDE-style safeguard); transport now passes --skip-trust because
the safeguard is redundant when an operator explicitly wrote
'clawtool send' themselves. Plus default --output-format text so
Gemini doesn't silently swallow output in non-TTY contexts.
- internal/agents/claude_transport.go — removed --bare. Older drafts
added it expecting 'no chrome' behaviour but on the current
Claude Code build that flag puts the CLI into a path that ignores
the existing auth session and reports 'Not logged in'. Plain -p
honours the session and works on a logged-in host. Comment in
the transport captures the rationale.
Live smoke results (4-CLI fan-out from 'clawtool send --agent <X>'):
opencode → 'pong' (~11s, free tier)
codex → JSON-RPC frames carrying agent_message.text='pong' (~5s)
gemini → 'pong' after the --skip-trust + --output-format fix
claude → 'pong' after the --bare removal
Test totals: 228 Go unit + 76 e2e — race-clean, gofmt clean,
go vet clean. CI green on previous push.
…licitly
Two transport bugs uncovered while dogfooding clawtool send for the
v0.14 feature ideation fan-out:
- codex_transport.go — codex exec refuses to run in any directory
it hasn't been invited to trust ('Not inside a trusted directory'
safeguard). Same IDE-style guard gemini ships and the same
reasoning applies: in the headless dispatch path the operator has
explicitly chosen to run 'clawtool send', so the guard is
redundant. Pass --skip-git-repo-check by default; operators who
need it can opt back in via extra_args.
- transport.go — startStreamingExec now sets cmd.Stdin =
bytes.NewReader(nil) explicitly. Some upstream CLIs (codex exec
in particular) read from stdin to pick up *additional* prompt
input and will block forever if stdin is left attached or open.
A pre-closed reader signals 'no extra input' cleanly.
These fixes were what unblocked dogfood fan-out across all four
families. Live smoke now: claude, codex, opencode, gemini all
return 'pong' end-to-end via 'clawtool send --agent <X>'.
Three of the six v0.14 features designed by the multi-CLI fan-out
on 2026-04-26 ship together. ROI ranks 1-3 per the team-research
roadmap. T3 (mem0), T5 (worktree), T6 (semsearch) carry to v0.14.x.
T1 — OpenTelemetry observability (internal/observability/):
- Observer struct: Init / StartSpan / RecordError / SetAttributes /
Shutdown / Enabled. Disabled = pointer-cheap no-op (StartSpan
returns input ctx + no-op end).
- Wraps go.opentelemetry.io/otel + sdk/trace + OTLP/HTTP exporter.
Langfuse-compatible: when LangfusePublic+Secret keys are set,
exporter sends Authorization: Basic to the Langfuse host.
- Wired into Supervisor.dispatch as 'agents.Supervisor.dispatch'
span; each Transport.Send call inside the failover chain opens
an 'agents.Transport.Send' child. observedReadCloser ends the
child span when the streamed response closes (no leaked spans).
- Process-wide via agents.SetGlobalObserver, picked up by every
NewSupervisor automatically. Server boot reads
config.ObservabilityConfig and wires it once.
- 5 unit tests: disabled no-op, enabled span lifecycle, bad
endpoint tolerance, idempotent Init, nil-receiver safety.
T2 — Auto-lint guardrails (internal/lint/):
- Runner interface: Lint(ctx, path) returns []Finding (LineNumber,
Column, Severity, Tool, Message). Per-language adapters: Go via
golangci-lint --out-format json; JS/TS via eslint --format json;
Python via ruff check --output-format json.
- Hook in executeEdit and executeWrite immediately after a
successful atomic write. Findings ride back in
EditResult.LintFindings / WriteResult.LintFindings — same
response, no async queue. Calling agent self-corrects in the
next turn.
- Graceful skip when the linter binary isn't on PATH (zero noise
in non-Go repos that lack ruff, etc.).
- Opt-out via [auto_lint] enabled = false in config.toml; default
on (nil pointer means default-on).
- 8 unit tests: extension routing, missing binary skip, parse for
each adapter, IsEnabled defaults, disabled runner.
T4 — Verify MCP tool (internal/tools/core/verify.go):
- VerifyResult { Repo, Checks[], Overall } where Overall is 'pass'
iff every check passes; one fail flips the whole result.
- Probe order: make test → pnpm test → npm test → go test ./... →
pytest → cargo test → just test. First match wins. Operator
pins via target. Unknown target errors clearly.
- Reuses applyProcessGroup (shared with Bash) so timeouts SIGKILL
the whole process group cleanly.
- Buffered single payload, not stream — caller wants the
pass/fail summary, not the live log fire hose. Bash already
streams when streaming is what's wanted.
- DetailsLogExcerpt is the last 4 KiB of combined stdout+stderr,
prefixed with '…' on truncation. Enough for an agent to read
the failing assertion without blowing the response budget.
- Registered in server.go alongside the other core tools;
ToolSearch entry added.
- 8 unit tests: detect Go module, pnpm beats npm, target override,
unknown target errors, no runner detected, happy path, failing
test surfaces, tail truncation.
- 2 e2e assertions: Verify happy path on a synthesised Go test
package + runner-name passthrough.
Wiring:
- internal/config/config.go — new ObservabilityConfig + AutoLintConfig.
- internal/server/server.go — boot wires observability.New +
agents.SetGlobalObserver + core.SetAutoLintEnabled.
Test totals after this turn: 244 Go unit + 78 e2e — race-clean,
gofmt clean, go vet clean.
Carry-over to v0.14.x: T3 mem0 recipe, T5 git-worktree isolation,
T6 semantic search MCP tool. Specs already in the team-research
roadmap source note.
The remaining three v0.14 features from the team-research roadmap.
With T1+T2+T4 already on main this completes the six tasks the
multi-CLI fan-out designed on 2026-04-26.
T3 — mem0 recipe under knowledge category:
- internal/setup/recipes/knowledge/mem0.go — recipe Apply drops a
marker-stamped .clawtool/mem0.toml that declares the endpoint
(cloud or self-hosted) + namespace and documents the one-time
'claude mcp add mem0 -- npx -y mcp-remote …' the user runs to
wire the official mem0.ai cloud MCP server into Claude Code.
- Coexists with brain (claude-obsidian); brain stays the
single-machine vault, mem0 adds cross-machine cross-agent recall
via the official cloud MCP. Self-hosted Docker supported by
pointing endpoint at the local URL.
- 6 unit tests + ResetSemanticSearchCache helper + custom-endpoint
/ namespace passthrough + force overwrite path.
T5 — git-worktree isolation:
- internal/agents/worktree/ — Manager.Create reserves
~/.cache/clawtool/worktrees/{taskID} under a per-repo advisory
flock (gofrs/flock wrap), shells out 'git worktree add --detach',
stamps a marker JSON (TaskID, RepoRoot, BaseRef, Agent, PID,
CreatedAt) and returns workdir + cleanup func. Cleanup is
idempotent.
- 'clawtool send --isolated' creates a worktree per dispatch and
sets opts['cwd'] so the upstream CLI stages/commits there
instead of the operator's working tree.
- '--keep-on-error' preserves the worktree on dispatch failure for
inspection via 'clawtool worktree show <taskID>'.
- 'clawtool worktree gc [--min-age 24h]' reaps orphans (dead PID +
age cutoff). Unix uses syscall.Signal(0) for liveness; windows
build tag skips reaping (conservative).
- 5 unit tests: create+cleanup, parallel-safe, GC reaps orphan,
GC skips live PIDs, repoLockKey deterministic + path-distinct.
T6 — SemanticSearch MCP tool:
- internal/index/index.go — Store wraps chromem-go (MIT, pure Go,
in-memory + persistent vector store). Build walks the repo,
chunks text files at 80-line boundaries, embeds via the
configured provider, and adds each chunk to the collection.
- Embedding: OpenAI text-embedding-3-small default (requires
OPENAI_API_KEY); Ollama nomic-embed-text override via
CLAWTOOL_EMBED_PROVIDER=ollama (uses OLLAMA_HOST or
http://localhost:11434).
- internal/tools/core/semsearch.go — MCP tool. Index built lazily
on first call per repo. Coexists with Grep: ToolSearch routing
description carries the 'use SemanticSearch for intent /
Grep for literal' heuristic.
- 7 unit tests: chunking, ignore patterns, NUL detection,
collectionTag determinism, Search-before-Build error,
missing-key error, defaults applied.
Wiring: server.go registers Verify + SemanticSearch. cli.go
top-level usage block carries the new send / worktree subcommands.
Test totals: 257 Go unit + 78 e2e — race-clean, gofmt clean,
go vet clean. New deps: github.com/gofrs/flock (Apache-2.0),
github.com/philippgille/chromem-go (MIT). All v0.14 features
from the team-research roadmap now in main.
macOS resolves t.TempDir()-style /var paths to /private/var when git rev-parse --show-toplevel runs (filesystem-level symlink). Linux keeps the original path. The TestCreate_AndCleanup marker check compared the strings directly and failed on Darwin only. Resolve both sides through filepath.EvalSymlinks before comparing.
… SQLite store) + 3 polish fixes The 'sana niye yanıt dönmüyorlar kanka' moment shaped this. Ship the async dispatch surface that closes the loop. BIAM Phase 1 (ADR-015) — internal/agents/biam/: - identity.go — per-instance Ed25519 keypair at ~/.config/clawtool/identity.ed25519 (mode 0600). LoadOrCreateIdentity generates on first launch; round-trips deterministically. - envelope.go — Envelope struct with the locked v=biam-v1 wire shape (task_id, message_id, parent_id, correlation_id, from/to/reply_to, kind, body, hop_count, trace[], created_at, ttl_seconds, idempotency_key, signature). Sign/Verify use the canonical-JSON form with Signature stripped. HasCycle / Hop enforce the loop guard. - store.go — SQLite (modernc.org/sqlite, pure Go, no CGO) with WAL. Tables: tasks, messages, dedupe_keys, peers. Idempotency-key LRU silently drops duplicates. WaitForTerminal blocks until done / failed / cancelled / expired. - runner.go — Runner.Submit returns task_id immediately, drains the upstream stream into a kind=result envelope (capped 4 MiB), flips the task on completion. Failed Send → kind=error + TaskFailed. Supervisor surface — internal/agents/supervisor.go: - New SubmitAsync(ctx, instance, prompt, opts) (task_id, error) on the Supervisor interface. Wires through globalBiamRunner registered via SetGlobalBiamRunner during server boot or CLI bootstrap. MCP tools — internal/tools/core/: - SendMessage gains a 'bidi' bool argument. When true, returns task_id immediately + persists the dispatch into BIAM. Pair with the new TaskGet / TaskWait / TaskList tools. - tasks_tool.go — TaskGet (snapshot + envelope timeline), TaskWait (block until terminal, deadline-bounded), TaskList (recent N tasks). All read-only against the BIAM store. ToolSearch entries surface the routing hints. CLI — internal/cli/: - send.go gains --async. Bootstraps the BIAM runner per-process via ensureBIAMRunner(), submits, prints task_id, then blocks on WaitForTerminal so the goroutine drain finishes before the short- lived CLI process exits. - task.go — clawtool task list / get <id> / wait <id> [--timeout DUR]. - biam_bootstrap.go — sync.Once guard so the CLI process initialises identity + store once, no matter how many subcommands run. Server boot — internal/server/server.go: - Init BIAM identity + store on startup; register the runner globally; expose the store to MCP Task* tools via core.SetBiamStore. Polish (3 high-severity fixes from yesterday's polish-worker pass): - supervisor.go:113 — Agents() now surfaces config load errors instead of swallowing them silently. - transport.go:122 — streamingProcess.Close() returns ExitError when the upstream CLI exits non-zero, so callers see assertion failures. - worktree.go:152 — cleanup func uses sync.Once instead of a plain bool, matching the multi-goroutine safety contract the interface promises. Live smoke verified: 'clawtool send --async --agent codex "..." | xargs clawtool task wait --timeout 60s' returns the JSON envelope chain end-to-end (prompt + signed reply with parent_id link). Test totals: 271 Go unit + 78 e2e — race-clean, gofmt clean, go vet clean. New deps: modernc.org/sqlite (BSD-3-Clause, pure-Go), github.com/google/uuid (BSD-3-Clause). Carry-over to v0.15.x: BIAM Phase 2 (NATS JetStream federation), Phase 4 of the v0.15 roadmap (T1 R3 codex-recommended Phoenix / LangSmith / Weave OTel presets, Podman sandbox option, age-encrypted secrets), R4 gemini's Zed extension + go-selfupdate + huh-based plan builder. Specs in /tmp/claw_v015/ research outputs.
…mand (F2)
Two carry-over features from the v0.15 research roadmap.
F1 — per-instance dispatch rate limit (codex's R3 pick):
- internal/agents/limiter.go — wraps golang.org/x/time/rate per ADR-007.
Per-instance token buckets + per-instance concurrency semaphore;
shared across CLI / MCP / HTTP because all three hit Supervisor.dispatch.
Rate string forms: '30/m', '5/s', '1000/h', '60/1m'. Empty disables.
- internal/agents/supervisor.go — limiter wired into the dispatch
call site. The release func runs when the ReadCloser closes, so
long-running streams hold their concurrency slot for their full
duration. Limiter is built lazily at NewSupervisor time so config
edits land on the next supervisor construction.
- internal/config/config.go — DispatchLimits { Rate, Burst, MaxConcurrent }.
Per the operator's quota-aware multi-account use case: 'Anthropic
hit 30/m, slow it down without crashing the dispatch.'
- 5 unit tests: rate-string parsing across forms, disabled is no-op,
bucket actually blocks, per-instance independence, concurrency cap,
ctx-cancellation surfaces an error.
F2 — clawtool upgrade subcommand (gemini's R4 pick):
- internal/cli/upgrade.go — wraps creativeprojects/go-selfupdate
(Apache-2.0). Detects the latest cogitave/clawtool release, reports
current vs latest, and atomically swaps the binary. '--check' just
reports without installing. ErrPermission triggers a clear sudo
hint instead of the raw permission error.
- Pulls runtime/debug Build info for the current version so
release-please tags compare correctly without a -ldflags injection.
Test totals: 27 packages green / 0 fail / race-clean. New deps:
golang.org/x/time/rate (BSD-3-Clause), github.com/creativeprojects/
go-selfupdate (Apache-2.0).
Carry-over: selfupdate's asset-name pattern matcher needs the
GoReleaser archive template wired through 'Filters' so DetectLatest
hits the right tarball — for now the command surfaces a clean
'no release found' fall-back. Production unblocking once the
GoReleaser tag is regenerated with explicit asset hints.
…rktree, upgrade Codex (W1 async BIAM worker) reviewed the v0.14/v0.15 commits against the live README and produced an applyable patch. Synthesis + hand-touched final wording. - New 'What's new in v0.14 / v0.15' hero section: 6 bullets covering BIAM, the new MCP tools, dispatch policies, --isolated worktrees, mem0, clawtool upgrade, OTel observability, auto-lint. - 'How to use BIAM async dispatch' mini-section with a CLI example and a one-line Claude-Code-side note. - Top-level Usage block grew the bridge / agent / send --async / send --isolated / task / worktree / upgrade subcommands. W2's onboarding-TUI design landed at wiki/sources/onboard-tui-design-2026-04-27.md as input for the future 'clawtool onboard' command (charmbracelet/huh based wizard that detects host CLIs, offers bridge installs, bootstraps the BIAM identity, runs the existing init recipe picker, and asks for telemetry consent — to be implemented as F4 next iteration). Both updates dogfooded the BIAM async path: two parallel codex tasks fired via 'clawtool send --async --agent codex', task_ids tracked, results pulled with 'clawtool task get'. End-to-end loop closed.
F3 — Hooks subsystem (Claude Code parity): - internal/hooks/hooks.go — package-level Manager with Emit per Event. SetGlobal registers a process-wide instance; lifecycle call sites grab it via hooks.Get() and emit. Empty config = zero- cost no-op. block_on_error entries propagate failure to the originating op so guard-rail hooks can veto it; non-blocking entries log-and-continue. - Events locked at v0.15: pre_send, post_send, on_task_complete, pre_edit, post_edit, pre_bridge_add, post_recipe_apply, on_server_start, on_server_stop. New events go additive. - Hook execution: /bin/sh -c shell or raw argv. Per-hook timeout (default 5s, configurable). JSON envelope payload on stdin so user scripts skip argv parsing. - Wired into Supervisor.dispatch (pre_send / post_send), BIAM Runner (on_task_complete), Edit + Write tools (pre_edit / post_edit), and ServeStdio (on_server_start / on_server_stop). - Config schema extension: [hooks.events.<name>] takes an array of HookEntry (cmd / argv / timeout_ms / block_on_error). - 8 unit tests: nil manager no-op, empty config no-op, configured cmd executes, block_on_error propagates, non-blocking swallows, argv mode skips shell, payload arrives on stdin, SetGlobal / Get round-trip. Timeout test simplified to a non-zero-exit case because subprocess kill semantics on WSL2 stall the exec.CommandContext stdin goroutine for sleep-style children; process-group reaping is the same TODO the Bash tool already carries via applyProcessGroup, and lands as a polish patch. F4 — clawtool onboard wizard (codex W2 design): - internal/cli/onboard.go — first-run interactive wizard via charmbracelet/huh per ADR-007. Pages: host detection (read-only note), missing-bridges multi-select, BIAM identity confirm, telemetry consent. Side effects fire only after the form returns clean; user-aborted forms exit 0 with no changes. - Reuses BridgeAdd, biam.LoadOrCreateIdentity, exec.LookPath — no duplicate logic. - Edge cases covered: host missing every CLI (form still runs; no bridge installs queued), all CLIs present (skips bridge page), telemetry denied (records the negative answer cleanly). - 6 unit tests against an onboardDeps interface; no real TTY, no real exec. Test totals: 28 packages green / 0 fail / race-clean / gofmt clean. No new dependencies — huh + lipgloss already on the tree. Carry-over to v0.15.x: - subprocess group-kill polish for hooks timeout (matches Bash tool) - telemetry sink (PostHog Go SDK per gemini's R4) hooked into the consent flag the wizard records - 'clawtool hooks list / test <event>' subcommand for debugging
… README F5 — internal/telemetry: anonymous PostHog event sink. Strict allowedKeys allow-list filters payloads, anonymous distinct_id at $XDG_DATA_HOME/clawtool/telemetry-id (mode 0600), runtime.GOOS/GOARCH auto-injected. CLAWTOOL_TELEMETRY env kill switch always wins over config. Default disabled; clawtool onboard records consent. F6 — clawtool hooks list / show <event> / test <event> [--payload]: inspect configured events without firing the real lifecycle, debug shell snippets in isolation. Test coverage for all three subcommands and synthetic-payload paths. F7 — internal/sysproc: ApplyGroup (Setpgid only) + ApplyGroupWithCtxCancel (Setpgid + Cancel for CommandContext callers) + KillGroup. Wired into hooks.go so timeouts SIGKILL the whole shell child tree. Without it a sleep child of /bin/sh outlived parent kill and held stdio pipes open past the deadline. atomic.Bool for the timedOut flag (race-clean under -race). Plus: clawtool onboard chain-confirm step that offers to run clawtool init right after, so first-run host bootstrap → repo bootstrap is one continuous wizard. README rewrites for the v0.15 surface (4-pillar pitch, hooks + onboarding mini-section, expanded usage block) and a stub-server .gitignore inline-comment fix that was silently un-ignoring the build artefact.
…gleton, BIAM Close errors, identity race, secret-aware index HIGH: - agents/supervisor: rate-limit + round-robin state were reset on every NewSupervisor() because both lived on the per-call struct. Hoist to process-wide singletons (sharedDispatchState) so MCP / HTTP / BIAM callers in one process observe one rotation cursor and one token bucket. Test escape hatch: ResetDispatchStateForTest. - agents/biam/runner: Close() error was discarded by defer, so a crashed upstream still recorded TaskDone with a partial body. Capture + flip to TaskFailed/KindError when streamingProcess.Close returns. - tools/core/agents_tool + cli/send: same defer rc.Close() pattern. Lift to explicit close + fold ExitError into the result so callers see upstream non-zero exit instead of an empty success. - cli/send: --async + --isolated leaked the worktree because cleanup ran only on synchronous failure. Reap the worktree (or honour --keep-on-error) after WaitForTerminal returns. - agents/biam/identity: first-launch parallel CLI invocations could each generate + write a fresh keypair, last writer winning. Guard the create-and-publish window with gofrs/flock + re-read under the lock. - index: SemanticSearch could embed .env / id_rsa / *.pem because the default Ignore list didn't cover secret-bearing dirs. Extend defaults + add a basename guard (isLikelySecret) that fires regardless of user-config Ignore overrides. - tools/core/verify: Ruby probe was missing from probeOrder() and --target=ruby resolved to plain `ruby -Itest` (REPL with no script). Add Rakefile detection and use `bundle exec rake test` (with a fall back when no Gemfile is present). MEDIUM: - cli/upgrade: latest.LessOrEqual panicked on "(devel)" / "(unknown)" build versions. Skip the comparison for non-semver inputs so dev builds still upgrade. - server: ToolSearch index was built from a 9-key gateable map; v0.15 always-on tools (SendMessage, Bridge*, Task*, Verify, SemanticSearch, …) never made it into the index even though MCP registered them. Reuse CoreToolDocs() and only filter the gateable subset by config.IsEnabled. - agents/worktree: cleanup func captured the caller's ctx, so a timeout during dispatch then ran git worktree remove against an already-done ctx. Use context.Background for the cleanup shell-out. - agents/supervisor: bad dispatch.limits.rate strings silently disabled rate enforcement. Surface the parse error on stderr at supervisor construction so the operator notices.
Walk-through of the four /v1 endpoints (health, agents, send_message, recipes, recipe/apply), the optional /mcp Streamable HTTP transport, bearer auth, and worked Postman + cURL examples. Surfaced from the README's CLI reference under `clawtool serve --listen` so a Postman user can find it without grepping the source.
…rs; store decode failures stop silently dropping rows - tasks_tool: TaskGet and TaskWait both threw away the MessagesFor error. A corrupt envelope row used to look like "task valid, no replies yet". Fold the error into out.ErrorReason so the agent sees the parse failure instead of an empty body. - store.MessagesFor: body / trace / created_at unmarshal errors were swallowed with `_ = json.Unmarshal(...)`. Now stops on first bad row and returns the partial slice with a wrapped error so callers can inspect both what made it through and what broke.
ADR-014 stays untouched: browser is a Tool surface, not a Transport. clawtool wraps github.com/h4ckf0r0day/obscura (Apache-2.0, V8 + Chrome DevTools Protocol, 30 MB memory vs Chromium's 200+) per ADR-007 so agents can render SPA / hydrated pages without us hand-rolling a headless engine. - BrowserFetch (internal/tools/core/browser_fetch.go): stateless single-URL render via `obscura fetch --dump html | --eval ...`. Result shape mirrors WebFetch (title / byline / sitename / content) plus optional eval_result so agents can swap the two without rewriting parsing. Optional CSS-selector wait, --stealth pass-through. - BrowserScrape (internal/tools/core/browser_scrape.go): bulk parallel via `obscura scrape ... --concurrency N --eval ... --format json`, hard cap 500 URLs / 50 workers. Tolerates both NDJSON and JSON-array output; per-URL errors fold into the row so the batch keeps going. - engines.go now caches `obscura` alongside `rg` / `pdftotext`. Missing binary surfaces a one-shot install hint (Linux/macOS one-liners) at call time — no boot-time refusal. - Tests cover the missing-binary, bad-URL, HTML readability, eval pass-through, non-zero exit paths plus the NDJSON/array parser and the URL splitter helper. Race-clean. - Both registered in server.go (always-on) and indexed in CoreToolDocs so ToolSearch surfaces them. - docs/browser-tools.md walks through install, the two tool schemas, worked Next.js + bulk-scrape examples, failure modes, and the reasoning for picking Obscura over Headless Chrome. README links it from the v0.15 hero block. The cookie-driven interactive surface (BrowserAction, CDP-over-WebSocket) lands as a follow-up commit because cookie injection requires the obscura serve transport, not the fetch CLI.
The BIAM runner used to record TaskDone whenever the upstream CLI
exited 0, even when the stream-json body ended with a terminal failure
event like {"type":"turn.failed","error":{"message":"This content was
flagged for possible cybersecurity risk."}}. Real-world repro: codex's
content-policy filter killed a turn mid-flight, codex itself exited
clean, and TaskWait blocked downstream agents on a transcript that
already declared itself failed.
detectStreamFailure walks the last ~12 lines of the buffered body
looking for top-level {"type":"turn.failed"} or {"type":"error"}
events; per-tool failures inside item.completed (e.g. failed bash
command inside an otherwise successful turn) are deliberately
ignored. Failure detail (Error.Message or top-level Message) is
appended to the task body so the operator sees what actually broke
instead of an opaque "failed" status.
Tests cover: turn.failed with nested error.message; healthy turn;
per-tool failure that must NOT flag; empty body.
A portal is a saved web-UI target — a base URL paired with login cookies, CSS selectors, and a 'response done' predicate — that clawtool can drive on the operator's behalf so an MCP-aware agent can ask it questions like any other agent. Per ADR-017 portals are a Tool surface, not a Transport: the supervisor still only dispatches to upstreams that publish a stable headless contract. This iteration ships the persistence + read-only surface; the CDP driver behind 'ask' lands in v0.16.2 (separate commit because it needs a websocket client to drive Obscura's CDP server). The deferred-feature error is uniform across CLI + MCP so the operator can stage config + cookies today and the agents see the same shape once the engine arrives. - internal/config: PortalConfig + Portals map; predicate / selector / browser sub-stanzas; portals_io.go helpers (LoadFromBytes, AppendBytes, RemovePortalBlock) for the editor-driven add flow. - internal/portal: Validate (rejects bad scheme / missing scope prefix / unknown predicate type / empty input selector / missing response_done_predicate); Defaults; ParseCookies (array or single object form); AssertAuthCookies (catches incomplete exports); Names (sorted); AskNotImplementedError sentinel. - internal/cli/portal.go: list / which / use / unset / add (opens $EDITOR with a TOML template, parses + validates the result before appending) / remove / ask (placeholder). - internal/tools/core/portal_tool.go: PortalList / PortalWhich / PortalUse / PortalUnset / PortalRemove / PortalAsk MCP tools. Add is CLI-only because it spawns $EDITOR. - server.go: RegisterPortalTools wired alongside browser tools. - toolsearch.go: 7 new entries so ToolSearch surfaces the surface. - README + cli.go usage updated; docs/portals.md walks chat.deepseek.com end-to-end (cookie export, secrets.toml shape, predicate vocabulary, failure modes). - ADR-017 (browser-tools-not-transport) and ADR-018 (portal feature) shipped to wiki; readers cross-reference cleanly.
The deferred-feature sentinel from v0.16.1 is gone. clawtool portal ask + the PortalAsk MCP tool now actually drive the saved flow end-to-end: spawn obscura serve, open an isolated CDP browser context, seed cookies + extra headers, navigate, run login_check + ready_predicate, fill the input + submit, poll response_done_predicate, return the last response selector's innerText. - internal/portal/cdp.go: minimal CDP client over coder/websocket. Synchronous request/reply via per-id channels with a single reader goroutine; push events without an id are dropped. Wraps the six methods we need (Target.createBrowserContext + createTarget + attachToTarget, Network.enable + setCookies + setExtraHTTPHeaders, Page.enable + navigate, Runtime.evaluate) plus convenience EvaluateBool / EvaluateString. Per ADR-007 we skip chromedp/cdproto for a surface this small. - internal/portal/ask.go: orchestrator. obscura serve --port 0, stderr scanner pulls the ws:// URL, isolated browser context (disposeOnDetach so the cookie jar evaporates after the call), cookies + headers seeded BEFORE navigation, native value setter + synthetic input/change events for React/Vue/Svelte controlled components, click selector or Enter fallback, predicate poll every 250ms. - internal/portal/cdp_test.go: mock CDP server (httptest + coder/websocket Accept) covers round-trip, error frame surface, evaluate value pass-through, JS exception extraction, predicate expression generation, jsString escaping. - internal/cli/portal.go: PortalAsk runs the real driver, loads cookies from secrets.toml under p.SecretsScope, streams progress to stderr, the answer to stdout. - internal/tools/core/portal_tool.go: PortalAsk handler same flow; RegisterPortalAliases reads cfg.Portals at boot and binds <name>__ask thin wrappers (e.g. my-deepseek__ask) so MCP-aware models discover portals as first-class tools. - server.go now calls RegisterPortalAliases alongside RegisterPortalTools. - toolsearch.go PortalAsk description updated; README + ADR-018 promoted to accepted; docs/portals.md describes the actual flow; wiki log entry captures the design decisions.
The v0.16.2 portal CDP layer was ~600 LoC of hand-rolled WebSocket / JSON-RPC client + Chrome launcher. ADR-007 says wrap, don't reinvent — chromedp/chromedp is the canonical Go DevTools Protocol library (Apache-2.0, used in production by GoReleaser, k6, every Mailgun integration test) and it covers exactly the surface portals need: ExecAllocator for the wizard's local Chrome spawn, RemoteAllocator for the runtime's Obscura attach, typed actions for navigate / setCookies / setExtraHTTPHeaders / evaluate. - internal/portal/driver.go: BrowserSession wraps chromedp ctx + allocator-cancel + browser-cancel. Two constructors: NewExecBrowser (wizard) and NewRemoteBrowser (runtime). Helpers: Navigate, Cookies, SetCookies, SetExtraHTTPHeaders, Evaluate, EvaluateBool, EvaluateString. mergeCtx threads caller's ctx through the session ctx so deadlines compose. - internal/portal/ask.go: rewritten on top of BrowserSession. Same flow (login_check + ready_predicate poll → fill input + submit → response_done_predicate poll → response selector innerText). Native value setter + synthetic input/change events stay so React / Vue / Svelte controlled components register the prompt insertion. Obscura process management lives here too (startObscuraServer + readObscuraWS), one source of truth. - Removed: internal/portal/cdp.go (290 LoC), cdp_test.go, chrome_launcher.go (250 LoC), portal_wizard.go (wizard rewrite pending the new BrowserSession API). - Added internal/portal/driver_test.go covering the pieces we own (predicate expression generation, jsString escaping, obscura ws:// banner scanner with timeout). - go.mod: chromedp/chromedp + chromedp/cdproto pulled in; coder/websocket dropped (chromedp uses gobwas/ws under the hood). Net diff: -600 LoC, -1 dep (coder/websocket), +1 dep family (chromedp). 30 packages still green, race-clean.
`clawtool portal add <name>` is now an interactive wizard. The operator types one command; Chrome opens with a fresh temp profile, they log in to the portal, the wizard captures cookies via Network.getAllCookies and asks for three CSS selectors + a 'response done' template. Output: validated config.toml + 0600 secrets.toml. Driven by the chromedp BrowserSession API from e6af0f2 — wizard uses NewExecBrowser(Headless=false), runtime keeps using NewRemoteBrowser pointed at obscura serve. Same code path, two allocators. - internal/cli/portal_wizard.go: huh.Form orchestration. Six steps (URL+intro, launch Chrome, claude-in-chrome assist hint, login gate, cookie capture + auth-name auto-detect, selectors + predicate pick, persist). wizardDeps interface lets tests inject a fake browser without spawning Chrome. - internal/cli/portal.go: `portal add` defaults to the wizard; `--manual` falls back to the v0.16.1 $EDITOR template path. - internal/config/portals_io.go: MarshalForAppend exported so the wizard round-trips the assembled stanza through the same AppendBytes merge that --manual uses. - internal/portal/portal.go: MarshalCookies helper (mirror of ParseCookies) so wizard saves cookies in the same JSON shape the runtime expects. - internal/cli/portal_wizard_test.go: tests for assemblePortalConfig, predicateForChoice, filterCookiesForHost, autoDetectAuthCookieNames, hostFromURL, buildClaudeInChromeHint. Wizard happy-path uses a fakeBrowser that satisfies the small portalBrowser interface. claude-in-chrome stays unwrapped — wizard generates a copy/paste prompt operators can drop into the side panel for assisted login, but clawtool itself never imports an extension dependency. Per ADR-017. Docs: README.md hero block updated to mention the wizard; docs/portals.md reorganises the worked example to lead with `portal add` then keeps the manual export path under `--manual`.
ADR-019 lands. `mcp` is the new authoring noun for MCP server source
code, sister to `skill` (Agent Skills). Co-designed with Codex (task
55a5a480) and Gemini (task 13d4ea86) in parallel BIAM async
dispatches; synthesis preserves Codex's naming + repo-relative
output, both reviewers' .claude-plugin/ day-one + operator-managed
marketplace.
This commit is the SURFACE STUB — generator (`mcp new / run / build /
install`) lands in v0.17. Same deferred-feature pattern v0.16.1
used for `portal ask` before v0.16.2 wired the CDP driver: surface
booked today so agents discover the namespace early; rewriting it
post-adoption isn't free.
- internal/cli/mcp.go: CLI subcommand dispatcher.
- `mcp list` ships read-only (walker stub; upgrades when generator
writes .clawtool/mcp.toml markers).
- `mcp new / run / build / install` return McpNotImplementedError
sentinel pointing at ADR-019.
- internal/tools/core/mcp_tool.go: McpList / McpNew / McpRun /
McpBuild / McpInstall MCP tools. RegisterMcpTools wired alongside
RegisterPortalTools in server.go.
- internal/tools/core/toolsearch.go: 5 new entries so ToolSearch
surfaces the surface.
- internal/cli/cli.go topUsage block: `clawtool mcp ...` near
`clawtool skill ...`, with one-liner clarification (mcp = MCP
server source code; skill = Agent Skill folder).
- README.md hero block: MCP authoring bullet alongside Browser
tools / Portals.
- docs/mcp-authoring.md: full preview — wizard prompts, per-language
artifact, install flow, today's interim hand-roll path.
- wiki/decisions/019-mcp-authoring-scaffolder.md (accepted), with
cross-refs to ADR-006 / 007 / 008 / 010 / 014 / 018.
- wiki/log.md: design synthesis captured (Codex `mcp` + Gemini
`forge` reviewers) plus the chromedp lesson from v0.16.3.
…rome) Two-tier integration coverage for portal.Ask, neither requiring an operator-side smoke run: 1. fakePortalBrowser (default, runs every `go test`) implements the new `portal.Browser` interface and simulates a chat portal in memory. Records every call, classifies JS expressions (fill_input / click_submit / dispatch_enter / extract_response / predicate), tracks an N-poll "streaming" delay before the response_done predicate goes truthy. Verifies the full wire: - cookies + headers seeded BEFORE navigate - login_check + ready_predicate polled before fill_input - fill_input precedes click_submit - response_done predicate polled until truthy (>= configured count) - Enter fallback fires when Submit selector is empty - missing auth-cookie short-circuits before browser is touched - response_done timeout names the failing phase 2. ask_realchrome_test.go (//go:build integration) drives Ask against a httptest server that serves a sahte chat HTML — textarea + submit button + a fake "Stop" button that disappears on a 200ms timeout. Skips itself when no Chrome / Chromium / chromium-browser is on PATH so unit-test runs stay portable. Run with `make portal-integration`. Refactor to support this: - internal/portal/driver.go — extracted `Browser` interface; BrowserSession satisfies it via duck typing (compile-time guard: `var _ Browser = (*BrowserSession)(nil)`). - internal/portal/ask.go — Ask gains `opts.Browser` (when set, skip obscura spawn + chromedp connect, run orchestration on the injected Browser). Pulled the orchestration into runAskOnBrowser so the public Ask stays one signature. typeAndSubmit and waitForPredicate now take `Browser` instead of *BrowserSession. Net: 4 new tests covering everything between Validate() and the final EvaluateString. Zero browser binary required. The fake's ordering / scripted polling is the closest thing to real end-to-end you can get without spawning Chrome — and when Chrome IS available, the tagged test confirms chromedp drives a real browser through the same JS templates. Makefile: `make portal-integration` runs the tagged test.
ADR-019 generator lands. `clawtool mcp new <name>` walks the operator
through a huh.Form wizard (or `--yes` for defaults) and writes a real,
compilable MCP server. Per ADR-007 each language adapter wraps the
canonical SDK in its ecosystem.
Live smoke against built binary verified the full chain:
clawtool mcp new my-thing --yes → 9 files including Go server.
go mod tidy && go build ... → 6.7MB binary.
echo '<initialize JSON-RPC>' | ./bin/my-thing
→ correct serverInfo response.
The server actually speaks MCP.
clawtool mcp install . --as smoke-test
→ [sources.smoke-test] in config.toml.
clawtool mcp list --root <dir> → discovers the scaffold.
- internal/mcpgen/: package for the generator.
- mcpgen.go — Spec / ToolSpec / File / Adapter interface +
Generate orchestrator + name validators + writeFile guard.
- common.go — language-agnostic files: .clawtool/mcp.toml marker,
README, .gitignore, .claude-plugin/plugin.json (opt-in).
- go_adapter.go — mark3labs/mcp-go v0.49.0. cmd/<name>/main.go +
internal/tools/example.go + Makefile + go.mod + (opt-in)
Dockerfile.
- python_adapter.go — fastmcp ≥0.4. src/<pkg>/ layout +
pyproject.toml + Makefile + tests/.
- typescript_adapter.go — @modelcontextprotocol/sdk ≥1.0.
src/server.ts + tools/ + package.json + tsconfig + test/.
- mcpgen_test.go — 12 tests: per-language plan, docker opt-in,
plugin opt-out, refuses existing dir, name + tool name + language
validators.
- internal/cli/mcp_wizard.go: huh.Form sequence (description,
language, transport, packaging, plugin manifest, first tool).
--yes path uses minimal defaults (Go / stdio / native / one
echo_back tool). mcpgenDeps interface lets tests drive without
TTY.
- internal/cli/mcp_install.go: reads .clawtool/mcp.toml, derives
the launch command from language + packaging, writes
[sources.<instance>] into config.toml. Same registry the
catalog (clawtool source add) populates — no new code path in
internal/sources/manager.go.
- internal/cli/mcp.go: rewired from v0.16.4 stub to real impls.
mcp list now does filepath.Walk skipping noise dirs. mcp run /
mcp build shim through the project's Makefile (per ADR-007:
don't reinvent build orchestration).
- internal/tools/core/mcp_tool.go: McpNew + McpList wired to the
real generator + walker. McpRun / McpBuild / McpInstall surface
a hint to invoke the CLI shortcut (those touch the operator's
filesystem + language toolchain so the model giving advice
is the natural pattern, not driving the build via MCP).
- internal/cli/mcp_test.go: wizard --yes happy path + bad-name
rejection + existing-dir refusal + walker discovery.
Total surface: 5 CLI verbs, 5 MCP tools, 12+ unit tests, real
end-to-end smoke. README + docs/mcp-authoring.md updated to
"v0.17 shipped". Wiki log entry captures the design + smoke
results.
Single command wipes everything clawtool drops on the host, so test
installs don't pile up duplicate sources / portals / sticky pointers.
Smoke-tested end-to-end against the built binary: dry-run preview →
real removal → idempotent re-run.
- internal/cli/uninstall.go — the verb. Plans a list of targets
(config, cache, data, optional binary), prints them, asks for
confirmation (skip via --yes), atomically removes via os.RemoveAll.
- Flags:
--yes Skip the y/N prompt.
--dry-run Print the plan without touching disk.
--purge-binary Also remove $CLAWTOOL_INSTALL_DIR/clawtool
(defaults to ~/.local/bin/clawtool — Homebrew /
curl-installed binaries should be removed via
the source's own uninstall path).
--keep-config Preserve config.toml + secrets.toml + identity.
Removes only sticky pointers + caches + BIAM data.
- Targets enumerated dynamically — non-existent files drop from the
plan so the rendered list reflects reality. Idempotent: a second
run prints "nothing to remove".
- internal/cli/uninstall_test.go — 6 tests (dry-run, full sweep,
purge-binary, keep-config selective removal, nothing-to-do path,
arg parser). XDG_* env overrides isolate every test in t.TempDir.
- topUsage block in cli.go updated.
The MCP tool variant is intentionally omitted: destroying clawtool
state from inside a model conversation is too high-blast-radius.
Operators run the CLI verb themselves.
…ocker)
ADR-020 lands. Synthesised from parallel BIAM async dispatches: Codex
(task 4468aa25) recommended `mcp`-style noun + native-flag composition
+ BIAM cancel fix; Gemini (task 87343e0f) recommended `vault` (rejected
— HashiCorp Vault collides) + Engine interface shape. Both reviewers
converged on bwrap (Linux/WSL2) / sandbox-exec (macOS) / docker
(fallback) + external-wrap-over-native-delegate.
This commit ships the SURFACE: profile parser, engine probes,
read-only verbs (list / show / doctor), MCP tool catalog. The
dispatch-time wrapping (clawtool send --sandbox <profile> actually
constraining the upstream agent) lands incrementally per ADR-020:
v0.18.1 bwrap adapter, v0.18.2 sandbox-exec, v0.18.3 docker, v0.19
Windows. Same incremental pattern v0.16.4 used for `mcp` before
v0.17 filled in the generator.
Live smoke against built binary verified the full surface:
clawtool sandbox list → two configured profiles + bwrap engine
clawtool sandbox show → renders paths/network/limits correctly
clawtool sandbox doctor → bwrap + docker both detected on this
WSL2 host, noop fallback always
available, bwrap selected as primary
- internal/config/config.go: SandboxConfig + SandboxPath +
SandboxNetwork + SandboxLimits + SandboxEnv added next to
PortalConfig. Schema covers paths (ro/rw/none), network
policy (none/loopback/allowlist/open), allow list, env
allow + deny, timeout / memory / CPU shares / process count.
- internal/sandbox/sandbox.go: Engine interface (Name/Available/
Wrap), Profile type, ParseProfile (validates modes + network
policy + duration + byte sizes), parseBytes ("1GB", "512M",
raw), SelectEngine (priority order, falls through to noop),
AvailableEngines (for doctor).
- internal/sandbox/bwrap_linux.go: bubblewrap engine probe.
Available() looks for bwrap on PATH. Wrap() returns a
deferred-feature error pointing at v0.18.1 (matching the
pattern v0.16.1 used for portal ask).
- internal/sandbox/sandbox_exec_darwin.go: macOS sandbox-exec
probe + deferred Wrap (v0.18.2).
- internal/sandbox/docker_anywhere.go: cross-platform fallback.
Available() runs `docker info` to check the daemon, not just
the client binary. Deferred Wrap (v0.18.3).
- internal/sandbox/sandbox_test.go: 7 tests (full-shape parse,
bad mode, bad network policy, allow-without-allowlist,
parseBytes table, SelectEngine non-nil, AvailableEngines
includes noop).
- internal/cli/sandbox.go: list / show / doctor / run dispatcher.
list iterates configured profiles + reports the selected engine.
show parses one profile through ParseProfile + renders all
fields. doctor walks every registered engine + Available.
run is the escape hatch (deferred error today).
- internal/tools/core/sandbox_tool.go: SandboxList / SandboxShow /
SandboxDoctor MCP tools. SandboxRun deliberately omitted —
letting a model spawn sandboxed commands has the wrong default.
- ToolSearch indexes the three new MCP tools.
- topUsage block in cli.go updated.
- docs/sandbox.md walks engines / profile schema / per-agent
default / native composition / failure modes.
- wiki/decisions/020-sandbox-feature.md (accepted) — full design
including the `[sandboxes.X.native]` sub-stanza Codex
contributed and the BIAM cancel fix Codex flagged at
internal/agents/biam/runner.go:61.
Multi-stage Dockerfile + docker-compose.yml + Caddyfile + docs/docker.md.
clawtool now ships as a runnable container alongside the Go binary
distribution.
Live-tested against real Docker (29.2.1):
docker build -t cogitave/clawtool:dev . → 15MB image
docker run --rm cogitave/clawtool:dev version → "clawtool 0.9.2"
echo '<initialize JSON-RPC>' | docker run -i --rm ... | head -1
→ correct serverInfo response — image speaks MCP.
docker run -i --rm ... <tools/list call>
→ BrowserFetch / McpNew / PortalAsk / SandboxList all exposed.
- Dockerfile — two stages. golang:1.26-alpine compiles the static
binary with CGO_ENABLED=0, -trimpath, -ldflags injecting version
metadata. Runtime is gcr.io/distroless/static-debian12:nonroot —
no shell, no apt, no glibc, just the binary + ca-certificates,
running as UID 65532. ENTRYPOINT ["clawtool"], CMD ["serve"]
so `docker run -i ...` is a stdio MCP server out of the box.
- .dockerignore — keeps the build context lean (~5MB instead of
~50MB once the wiki / .raw / docs land).
- docker-compose.yml — clawtool serve --listen 0.0.0.0:8080 +
Caddy reverse proxy with auto-TLS. Volumes persist
config / cache / data across restarts. Token file mounted
read-only. Mirrors the clawtool-relay recipe but at the repo
root for operators who clone the source instead of running
`clawtool init`.
- Caddyfile — minimal reverse-proxy config. Caddy doesn't
terminate clawtool's bearer-token auth; it just proxies. Auto
Let's Encrypt when CLAWTOOL_DOMAIN points at a public host.
- Makefile — `make docker` builds, `make docker-smoke` does the
MCP-initialize verify (the same handshake this commit's smoke
test ran).
- docs/docker.md — full operator guide: stdio + HTTP modes,
volume mounts, persisting state, mounting host config
read-only, sandbox interaction (clarifying that you don't run
the sandbox feature inside Docker — bwrap / sandbox-exec live
on the host).
- README.md hero block updated.
…fore-Write, Edit diff (ADR-021)
ADR-021 phase A. Synthesised from parallel Codex (BIAM task 6435286b)
and Gemini (task c977810b) audits against Cursor / Cline / Aider /
Cody best practice. Codex flagged the critical correctness point:
MCP session_id is NOT model-supplied — must come from
server.ClientSessionFromContext(ctx). Implemented exactly that.
Live-tested end-to-end against built binary:
Read .../existing.txt → file_hash=a948904f2f0f... (SHA-256 verified)
Read .../existing.txt with_line_numbers=true → render carries ' 1 | hello world' prefix
Write .../existing.txt content='new' → REFUSED:
'has not Read /tmp/.../existing.txt — Read it first (or pass mode="create" ...)'
Edit .../multiline.go old='old' new='NEW' → returns diff_unified:
--- a/.../multiline.go
+++ b/.../multiline.go
@@ -1,3 +1,3 @@
- internal/tools/core/session_state.go — SessionState + SessionKey,
Sessions singleton, RecordRead / ReadOf / SessionKeyFromContext
(uses server.ClientSessionFromContext, anonymous fallback for
stdio/tests). HashFile + HashString + hashBytes helpers.
- internal/tools/core/session_state_helpers.go — readFileForHash
shim so tests can stub disk reads without touching production
ReadFile callers.
- internal/tools/core/read.go — ReadResult gains FileHash +
RangeHash. runRead computes both after a successful read and
records into the session registry. New with_line_numbers flag
(default false) prefixes the rendered text with '%4d | ' —
agents can reference lines accurately, JSON content stays raw
so Edit's exact-substring matching keeps working.
- internal/tools/core/write.go — Read-before-Write guardrail.
guardReadBeforeWrite() runs before executeWrite. Three new args:
mode: 'create' | 'overwrite' (default '')
must_not_exist: bool
unsafe_overwrite_without_read: bool
Existing file + no prior Read on the session = error message
pointing at the four ways to satisfy the check (Read first,
mode='create', must_not_exist, or the explicit unsafe bypass).
Stale detection: if file's current SHA-256 doesn't match the
one recorded at Read time, refuse with 'changed since this
session Read it'.
- internal/tools/core/edit.go — EditResult gains HashBefore,
HashAfter, DiffUnified. unifiedDiff() emits a 'diff -u'-style
patch (--- a/path / +++ b/path / @@ hunk / line-by-line walk),
capped at 200 lines so multi-line rewrites don't bloat the
response. lcsLen kept as a stub for the future LCS-driven
hunk algorithm.
- internal/tools/core/session_state_test.go — 11 tests:
hashBytes determinism, HashFile round-trip, Sessions
record/lookup with isolation across keys + paths, anonymous
fallback, prefixLineNumbers formatter, guard rejecting
no-prior-Read, allowing after recorded Read, rejecting on
stale hash, create-mode rejecting existing file, create-mode
passing for new path, unsafe override bypassing guard.
- wiki/decisions/021-core-tools-polish.md (accepted) — full
design + the eight items, two-phase rollout plan, hash strategy,
MCP session id contract, open questions.
Phase B (next commit): Glob .gitignore default-on, Grep context
lines + multi-pattern, Bash background mode, WebFetch SSRF
guard, WebSearch filters.
The daemon's combined stdout/stderr lands in
$XDG_STATE_HOME/clawtool/daemon.log — every goroutine panic,
every "clawtool: <subsystem>: <error>" stderr line, every BIAM
reap warning ends up there. Pre-this commit the file was local-
only, so a daemon stuck in a panic loop on someone else's host
was invisible to us until they filed an issue. With telemetry
opt-in (pre-v1.0 default = on), forwarding classified failures
gives us the diagnostic feedback loop we need to triage bugs
before the operator notices.
Design:
internal/telemetry/logwatch.go — LogWatcher that:
- Tails daemon.log starting from EOF (never streams the
historical buffer; new errors only).
- Classifies each line into severity ∈ {error, warn, panic}
and event_kind from a small allow-list (panic / fatal / biam
/ auth / io / other). Classifier uses substring matching for
the hot path; ordering documented in classify().
- Rate-limits to 60 events per minute. A panicking daemon
emits the first minute of evidence then goes quiet — well
under PostHog's per-distinct-id quota and harmless on the
back end if the operator's host is genuinely flapping.
- Emits clawtool.daemon.log_event events with severity +
event_kind + command:"daemon" + transport:"http". NO log-line
bodies cross the wire — only the classification fields, so
an env-value or path that happens to be in the log can't leak.
internal/telemetry/telemetry.go — added "severity" to the
allowedKeys allow-list (otherwise the property would be silently
dropped before reaching PostHog).
internal/server/server.go — wires NewLogWatcher after telemetry.New
during the HTTP transport boot path. stdio path skipped because
it's per-call and the log forwarder needs a long-running daemon
to be useful. Watcher cancellation rides the existing ctx.
Tests:
- TestClassify_Taxonomy covers every documented severity /
event_kind branch including the order-dependent cases (BIAM
init failure that contains "no such file" — must classify as
biam, not io).
- TestLogWatcher_NilClientNoOps guards the boot-order contract:
a nil or disabled telemetry client must make Run a clean
return rather than panic, so server.go doesn't have to gate
every call.
Build, vet, deadcode, full 49-package test suite, stub-e2e all
green via `bash scripts/ci.sh`.
This is the daemon-side half of "let's see what's failing on
the operator's host before they have to tell us." The dashboard-
side half (PostHog Insights query for clawtool.daemon.log_event,
broken down by severity + event_kind + version) is operator
work in PostHog itself; the events ship the moment a daemon
on a telemetry-enabled host emits a classifiable line.
…ersion filtering PostHog's Sessions and Live views filter by $lib_version out of the box. Pre-this commit Track only stamped $lib (always "clawtool-go"), so a regression introduced in v0.22.30 looked identical to one in v0.22.36 in the dashboard — operator had to manually filter on the existing `version` property which the LLM Observability board doesn't query by default. Track() now auto-fills $lib_version from version.Resolved() when the caller doesn't supply one. The CLI's per-command Track sites that already pass an explicit `version` string keep their value untouched; this just adds the PostHog-canonical $lib_version field that sessions / live / cohort queries query by default. Together with the daemon log forwarder shipped in 3f6af08, every clawtool.daemon.log_event from an operator's host now lands in the Sessions view with $session_id (groups events from one daemon run) + $lib_version (filter by build) + severity / event_kind (classify the failure) — letting us see which version, on which session, hit which class of error. That's the live-debugging loop the project needed for pre-v1.0 iteration. Test coverage: existing telemetry test suite (TestTrack_*) keeps passing because $lib_version is allow-listed and the auto-stamp only fires when the caller didn't already set it.
…evel diagnostics
The daemon log forwarder shipped in v0.22.36 covers reactive
diagnostics (something errored, here's the classification). What
was missing: proactive cohort segmentation. Without a fingerprint
event we couldn't answer "are panics concentrated on WSL hosts"
or "does the upgrade flow succeed differently on Apple Silicon
vs Intel" without asking operators to file an issue.
internal/telemetry/fingerprint.go — FingerprintProps() builds a
single-row property map covering every dimension that's legal to
collect anonymously:
Hardware band:
cpu_count — runtime.NumCPU()
mem_tier — "<2GB" | "2-8GB" | "8-32GB" | ">32GB" | "unknown"
go_version — runtime.Version()
Environment fingerprint:
container — bool (docker / podman / k8s detection)
is_ci — bool (GitHub / GitLab / Circle / Travis / Jenkins / Buildkite)
is_wsl — bool (Microsoft / WSL signature in /proc/version)
term_kind — "tty" | "ssh" | "ci" | "headless"
locale_lang — first 2-5 chars of $LANG (e.g. "tr" / "en"); "unknown" on parse fail
Agent CLI presence (boot-time PATH probe):
claude_code_present, codex_present, gemini_present, opencode_present
Network reachability (1s TCP dial each):
posthog_reachable, github_reachable
Strict legal limits — every dimension is one of:
- an enumerable bucket (mem_tier, term_kind, locale_lang)
- a public runtime attribute (cpu_count, go_version)
- a presence boolean (claude_code_present)
- a reachability boolean (posthog_reachable)
NOTHING per-user-identifiable. NO paths. NO env values. NO
hostnames. The TestFingerprintProps_NoSensitiveContent unit test
guards this with explicit forbidden-substring checks
(/home/, /Users/, @, Authorization, sk-, ghp_, etc.) running
against the real environment so a future dimension that leaks
PII fails the test the moment it lands.
GeoIP suppression: every Track call now stamps $geoip_disable=true
unconditionally. Even though PostHog could resolve city / country
from the request IP, we don't want that level of fidelity even
under "anonymous diagnostics" — operator's network location is
not a dimension we asked for and not one we promised to collect.
Wire shape: clawtool.host_fingerprint event emitted once per
daemon boot, after server.start, alongside install_method (from
$CLAWTOOL_INSTALL_METHOD) and the canonical version. Tied to the
daemon's $session_id + $lib_version so PostHog Sessions / Live
views can pivot "v0.22.37, WSL, mem_tier=8-32GB cohort, what's
their panic rate?" out of the box.
Live-verified: smoke test against a debug daemon on this host
shows the event flowing with cpu_count:8 mem_tier:2-8GB
go_version:go1.26.0 is_wsl:true term_kind:tty
claude_code_present/codex_present/gemini_present/opencode_present:true
posthog_reachable:true github_reachable:true install_method:manual
locale_lang:c — exactly the cohort dimensions PostHog needed.
Three new unit tests:
- TestFingerprintProps_StrictAllowList: every key the
fingerprint emits MUST be in allowedKeys. Catches a future
dimension landing without an allow-list entry (would silently
drop on the wire).
- TestFingerprintProps_NoSensitiveContent: no value contains
any forbidden substring. Legal contract guard.
- TestMemTier_Buckets + TestDetectLocaleLang_Buckets: cover
the bucket logic + the unknown-fallback contract.
Also fixes test/e2e/realinstall/run.sh: the upgrade --check
case statement now accepts the v0.22+ wire shape ("already on
the latest" / "->") so the test passes when install.sh fetches
the latest GitHub release (and thus the just-installed binary
IS the latest, leaving --check a clean no-op).
Build, vet, deadcode, full 49-package test suite, stub-e2e all
green via `bash scripts/ci.sh`.
…utput
Onboard is the first ten seconds the operator spends with clawtool;
the wizard either hooks them or churns them. Pre-fix: typing
`clawtool onboard` left every line of pre-existing terminal noise
(npm install, git status, whatever was there) above the wizard,
the welcome page was a multi-paragraph huh.NewNote that overflowed
on small terminals, and the side-effect dispatch (bridges, daemon,
identity, secrets) printed a stream of mixed-glyph stdoutLn lines
with no visual structure.
This commit polishes that surface end-to-end:
internal/cli/onboard_ux.go — new onboardUX renderer. Mirrors the
upgrade UX's design constraints (TTY-aware, plain ASCII when piped,
no spinners) with three new affordances:
- ClearScreen: emits \033[2J\033[3J\033[H so onboard lands on a
clean slate, scrollback included. No-op when stdout isn't a tty
so piped invocations / CI logs stay greppable.
- Header(version, found): rounded-box panel with the live host
detection result rendered as a single tight pill row of ✓/·
markers — replaces the prior welcome page's multi-paragraph
hostSummary block.
- Section / PhaseStart / PhaseDone / PhaseSkip / PhaseFail / Note
/ Summary / NextSteps — same protocol upgrade_ux.go uses, so
operators who've run `clawtool upgrade` already know the
cadence (→ doing X → ✓ X (350ms · detail)).
internal/cli/onboard.go — wires the new UX:
- ClearScreen + Header at entry; the prior huh.NewNote welcome
group is dropped (boxed header replaces it).
- Side-effect dispatch refactored into Section blocks:
Bridges / MCP host registration / Daemon / Identity / Secrets
store. Each step renders as a phase pair (PhaseStart label →
PhaseDone detail) with timing.
- Closing block is now a tight Summary checklist + NextSteps
panel instead of stream-of-stdoutLn paragraphs. One screen,
scan-friendly.
- The "Run `clawtool send --list`" hint stays on stdoutLn for
test-harness compatibility (existing TestOnboard_AllPresent_*
asserts on it).
Smoke output (plain-ASCII capture, what shows in `tee` / log files):
clawtool onboard v0.22.37
----------------------------
[OK] claude-code [OK] codex [OK] gemini [OK] opencode [--] hermes
Bridges
-------
-> install bridge hermes
FAIL install bridge hermes
prereq not satisfied
...
Daemon
------
-> start persistent daemon
OK start persistent daemon (0s · http://127.0.0.1:33029/mcp)
...
Summary
-------
[OK] daemon http://127.0.0.1:33029/mcp
[OK] BIAM identity
[OK] secrets store ~/.config/clawtool/secrets.toml, mode 0600
In a real terminal: rounded-border box, lipgloss-styled colors
(63/83/214/203/245), bold section titles. TTY-aware throughout.
Build, vet, deadcode, full 49-package test suite, stub-e2e all
green via `bash scripts/ci.sh`.
Replace the linear huh.NewForm(groups...) flow with a stepwise Bubble Tea program that runs in an alt-screen buffer: - Each question is its own focused step (8 visible steps with step indicator "Step X of Y") instead of stacking groups in one continuous form. - Pinned rounded-box header stays visible across every step; the host-detection pill row sits inside it. - Side-effect run phase (bridge install / MCP claim / daemon / identity / secrets) executes as tea.Cmd with live progress log rendered inside the same alt-screen. - On exit, the alt-screen is dropped and the operator's terminal scrollback is restored — telemetry thank-you + star CTA land in the regular scrollback. --yes / non-TTY callers (CI, Dockerfiles, e2e harness, pipes) fall through to the existing linear onboard() so the plain-text contract stays stable. Detection: checks both stdin AND stdout are *os.File and TTY. The onboard model lives in internal/cli/onboard_tui.go; internal/cli/onboard_tui_test.go covers step construction, run queue ordering, stepResultMsg outcome → summary mapping, and View() rendering for both phaseSteps and phaseRun.
Add wizard progress persistence so `clawtool onboard` can survive mid-flow interruption (Ctrl-C, terminal close, accidental crash). Wire: - $XDG_CONFIG_HOME/clawtool/.onboard-progress.json (mode 0600, atomic write, schema-versioned). - After every step's huh.Form completion the model snapshots (stepIdx, onboardState) to the progress file. - A clean finish (run-phase queue drained → finishedMsg) clearOnboardProgress()s the file so the next invocation hits the "already onboarded" guard, not the resume prompt. Re-entry behaviour at runOnboard(): - Progress file present → huh.Select prompt with 3 options: Resume (load saved state + jump to saved stepIdx) / Start over (clear progress, fresh wizard) / Cancel (exit without changes; progress file stays). - .onboarded marker present, no progress file → huh.Select prompt: Re-run / Cancel. - Neither → fresh wizard, no extra prompt. New `--force` / `-f` flag wipes both the progress file and the .onboarded marker before launching, bypassing both prompts. newOnboardModelAt(state, deps, track, startStep) is the resume-aware constructor (out-of-range startStep clamps to 0 so a stale progress file from a fewer-steps build doesn't push the cursor off the end). Tests cover round-trip persistence, missing/corrupt/schema- mismatch loads, idempotent clear, and startStep clamping.
The vertical-stack layout felt cramped — header pinned at top, progress dots in mid-air, form below, footer at bottom, all sharing a single column. Switch to a three-band layout that actually uses the alt-screen real estate: ┌────────────── HEADER ──────────────┐ │ ASCII logo │ tagline + attribution │ │ │ host-detection pills │ ├──────┬──────────────────────────────┤ │ side │ MAIN │ │ bar │ active step's huh form │ ├──────┴──────────────────────────────┤ │ FOOTER (keybinds) │ └─────────────────────────────────────┘ - Full-width rounded-border banner with chunky box-drawing ASCII logo (`┏━╸╻ ┏━┓╻ ╻╺┳╸┏━┓┏━┓╻`) on the left. - Right side of banner: version + tagline, "from Cogitave · by @bahadirarda · help@cogitave.com", and the host-detection pill row. - Persistent sidebar (26 cols) shows the wizard's full step list with state glyphs: ● completed, ◉ active, ○ pending. Active step gets the accent colour + bold so the eye lands on it instantly. - Right pane carries the focused step: title + rule + the embedded huh form. Run + done phases get their own padded layouts (run log inside a left-bordered pane; done view stacks log + summary). Test updates: View() snapshot assertion now checks for the new tagline / attribution / sidebar header instead of the old `clawtool onboard` / `Step ` strings the prior layout used.
cli.New() only wires App.Stdout + App.Stderr; App.Stdin is left zero-value. The previous TTY gate required `a.Stdin.(*os.File)` to succeed, which always failed in production — every `clawtool onboard` invocation fell through to the linear path even when run in a real terminal. Operators reported the alt-screen TUI never showed up: they got the rounded-box header + plain huh form instead. Resolve stdout/stdin to *os.File at the gate, falling back to the real os.Stdin / os.Stdout when the App's embedded streams aren't an *os.File (the production case). Then probe isTTY on the resolved descriptors. Net effect: `clawtool onboard` (no pipe, real terminal) now hits the Bubble Tea TUI as intended; `clawtool onboard --yes` and piped invocations still go through the linear path because either the --yes gate or one of the isTTY probes returns false.
Replace the three-band header / sidebar / footer layout with a single horizontally + vertically centred column ≤ 72 cols wide. Distilled from Charm reference projects (lipgloss, soft-serve, glow, bubbletea/examples/views, huh/examples/burger): - Drop the sidebar — research showed Charm projects favour single- pane wizards (huh's own examples don't render step lists). - Inline header: 1-line monogram "┏━╸ clawtool" + version tagline + "from Cogitave · @bahadirarda · help@cogitave.com" + host pills separated by ` · `. ~4 lines total instead of the prior banner's full-width rounded box. - Inline step indicator: "Step X of Y · <Title>" + dot row (●●◉○○○○○) directly above the form card. - Form card: single rounded border with accent colour ("212", pink) and Padding(1, 2) — the soft-serve / huh idiom. One frame per pane, not nested. - Footer: dim text with bullet separators (the bubbletea/views idiom), ~36 chars, single row. No box. - Width-cap content to onboardCardWidth (72 cols), then horizontal + vertical centre with lipgloss.Place. Long content (run-log + summary) lets the column extend; Place top-anchors when content height exceeds available area. - Run phase: thin left-border accent (vertical bar) instead of closed box for the streaming log — long-running lists read better against an accent than inside a frame. - Done phase: green-bordered ("42") celebratory card; the accent flips from pink to green so the operator's eye lands on the success state. Test snapshot updated: "PROGRESS" sidebar header → inline "Step 1 of" indicator. Other assertions (tagline, attribution, support email) still pass. CI fast-mode green; `go fmt`, `go vet`, `go build`, `go test -race`, `deadcode -test ./...` all pass.
Drop the 72-col hard cap and the centred-card layout. The wizard now uses the full alt-screen as a 3-band layout that adapts to the terminal: - Width fills viewport (m.width - 2) with a 60-col floor for narrow terminals. No upper cap — wide screens get wide cards. - Height: header pinned at top, footer pinned at bottom, body fills the remaining rows. Card has explicit Height(bodyH - N) so it absorbs slack. - Card padding bumped to (1, 3) for breathing room when the card is wide. - WindowSizeMsg forwarded to the active huh form is now CARD- SIZED (m.width - 10, m.height - 14) rather than the full alt- screen. Without this, huh's description text wraps using the full terminal width even though it renders inside a narrower card, breaking the visual rhythm. Done view (post-finish) drops the streaming run log — the operator just watched it scroll by during phaseRun, repeating it pushes next-steps below the viewport on smaller terminals. The green-bordered celebration card now contains only the summary checklist + next-steps panel — that's the punchline the operator wants to see. Net effect: wizard occupies the full alt-screen on any terminal size, no scrolling required, summary screen fits cleanly.
Two problems with the previous build:
1. The huh form's option list was rendering as a single row inside
an otherwise-tall card. Operator only saw "none / decide later"
even though the form had 6 options. Cause: I was forwarding a
tea.WindowSizeMsg{Width: cardW, Height: cardH-N} to the form,
and huh's internal layout was clamping the options viewport to
cardH minus its own title+description+footer overhead — leaving
~1 row for actual options.
Fix: keep the WIDTH clamp (so description text wraps at the
card boundary, not the alt-screen width), but pass the FULL
alt-screen height for the form. huh now renders all options
naturally; the surrounding card absorbs whatever height the
form needs.
2. Wizard hugged the alt-screen top edge with no breathing room.
Top padding bumped from 0 to 2 rows on the outer container
(Padding(2, 1, 1, 1)). Body height calculation reduced by 5
rows (was 2) to account for the new top + bottom padding.
Net effect: form options visible, description wraps inside card
boundary, card stretches to fill body, top breathing room makes
the wizard feel less cramped.
Two fixes for the squashed-options bug (operator could only see "none / decide later" inside an otherwise tall card): 1. Stop forwarding tea.WindowSizeMsg to the active huh form. huh's WindowSizeMsg handler clamps its option-list viewport based on the height we pass — sending a small height squashed to one row, sending the full alt-screen overflowed the card and our outer Height() truncated from the top, leaving only the cursor row visible. Letting huh use its default natural- size rendering (no WindowSizeMsg) makes it render every option. 2. Drop Height(cardH) from the form's surrounding card style. lipgloss.Style.Height clamps content; when the form was taller than cardH (any non-trivial form is), the clamp truncated and we lost rows. The card now auto-sizes to the form's natural rendered height. The body container's Height(bodyH) absorbs slack underneath so the footer still pins to the alt-screen bottom. Trade-off: description text inside the form now wraps at huh's default width (the full alt-screen, since we're not forwarding the WindowSizeMsg). The wrap may extend beyond the card's visible width on narrow terminals. Acceptable for now — wrap-at- card-boundary requires matching huh's internal width signal, which interacts with the option-list height the way described above. Will revisit if the wrap becomes a real problem.
The previous design wrapped the active huh form (which already renders its own left-bordered focused-field accent) inside a second rounded-border card. The result was a "card-in-a-card" effect: outer pink rounded box, inner left-accent frame, content squeezed into the inner inner inner zone, both the form and the visible width feeling cramped on the operator's terminal. Drop the outer rounded-border wrapper from all three phases: - phaseSteps (renderStep): just indicator + dots + form.View() with a left padding. huh's per-field decoration is the only frame. - phaseRun (renderRunBody): just indicator + run-log. The log's per-line glyphs (✓ / ✗ / · / →) carry the visual structure; no surrounding box. - phaseDone (renderDoneBody): just indicator + summary. Summary rows already have outcome glyphs. Net: one visual container per phase (the huh form's own frame during steps; just text rhythm during run + done). Wider usable content area, cleaner reading.
The previous "drop the card" attempt was overcorrecting — operator
liked the outer pink rounded card (it's the wizard's identity).
The card-in-a-card squeeze was a different problem: lipgloss's
Height() clamp truncated the form, AND huh's option list squashed
to a single row because we never told it how much vertical space
it had.
Final wiring:
1. WindowSizeMsg → forward to active form with Width=cardW (so
description text wraps inside the card boundary) and Height=
9999. huh clamps option-list viewport to min(neededHeight,
msg.Height); a 9999 ceiling never clamps so the form renders
every option at natural size.
2. Outer rounded card returns: Border(RoundedBorder()) +
BorderForeground("212") + Padding(1, 3) + Width(cardW). NO
Height clamp on the card — it auto-grows to whatever huh's
natural rendered height is.
3. Body container has Height(bodyH); slack rows pad below the
card so the footer still pins to the alt-screen bottom.
Net behaviour: pink rounded card holds the form, every option
visible, footer pinned bottom, breathing room above + below.
Card width adapts responsively to terminal width via cardW =
m.width - 6.
…firm Stop fighting huh.Form embedded inside our parent tea.Program. After web research it's clear huh embedding has a strict contract (WithHeight() on the Form, .Height() on every Select, .Options() strictly before .Value(), no outer height-clamped wrapper) that fights every layout we tried. Symptoms we kept rediscovering: - huh's Select rendered ONLY the cursor row (operator saw just "none / decide later") because internal viewport sizing falls back to the cursor's minHeight=1 when our outer lipgloss style also clamps height. - WindowSizeMsg.Height is ignored by huh's per-field viewport; only WithHeight() / .Height() propagate. - Two viewports nested (huh's Select.viewport.Model + our rounded-card frame) produce unpredictable scroll math. Replace huh with three minimal custom widgets in onboard_widgets.go: - selectWidget — single-choice, ↑/↓ + enter, ~50 LOC. Renders every option every frame, no internal viewport. - multiSelectWidget — checklist, ↑/↓ + space toggle + a all/none + enter submit. Renders every option, no viewport. Operator- requested space-select keybind explicitly supported. - confirmWidget — yes/no, ←/→ or h/l toggle, y / n quick pick, enter submit. Each exposes a stepWidget interface (Update / View / Done / Keybinds). The wizard's outer model dispatches msgs to the active step's widget and renders widget.View() inside the rounded card. Footer pulls the active widget's Keybinds() so the hint line reflects the current step's actual keys (Select shows different keys than MultiSelect). Footer is now OUTSIDE the card per operator request: the card contains only the widget; the dim-bullet keybind row sits below the card at the alt-screen footer band. State capture pattern flipped: instead of huh's .Value(&ptr) two-way binding, the apply hook on each step calls widget.Value() or widget.Values() once at advance time and writes into onboardState. Cleaner control flow, no shared-pointer races during render. Net deletion of huh embed plumbing: ~80 LOC. Net addition: ~250 LOC of widgets + adapters. Worth it for the bug class elimination.
…tent The card was auto-sizing to each widget's natural height, so the wizard's frame jiggled between steps (a tall Select shrinking to a 4-row Confirm). Pin the rounded-border card to a constant 70w × 18h silhouette and centre the active widget inside it via lipgloss.Place(innerW, innerH, Center, Center, view). Net: every step renders the same rectangle in the same screen position, with the widget vertically + horizontally centred inside. A 4-row Confirm now sits in the middle of the same card a 12-row Select previously filled, so the visual rhythm of advancing through the wizard feels intentional instead of elastic. The card auto-clamps narrower on terminals < 70 cols, floor 50 cols.
Replace PaddingLeft(2) with Align(lipgloss.Center) on the header / step body / run body / done body / footer wrappers, and switch JoinVertical from lipgloss.Left to lipgloss.Center in the body phases. The wizard column now sits in the middle of the alt-screen instead of hugging the left edge. Visual axis: header centred → step indicator centred → progress dots centred → fixed-size pink card centred → footer hint centred. On wide terminals the empty space distributes evenly to either side; on narrow terminals the card auto-clamps narrower (already handled in renderStep) so nothing wraps.
Two operator-driven polishes: 1. Card width is now responsive: computeCardWidth(viewport) = viewport - 12, soft-capped at 120 cols + soft floor at 60. On a 200-col terminal the card now fills 120 cols (was pinned at 70); on an 80-col terminal it fills 68. The wide-screen case feels purposeful instead of "tiny dialog floating in a sea of empty space." 2. Header redesign: 3-line ASCII brand mark on the left (clawtoolLogo restored), stacked metadata column on the right (bold tagline `first-run setup · v0.22.52`, dim `from Cogitave · by @bahadirarda`, dim `help@cogitave.com`), filled-background pill row beneath. Detected hosts render as bright accent pills (`Background(212).Foreground(230).Bold`); missing hosts as dim padded text. The eye finds the bright pills instantly without scanning labels. Test snapshot relaxed: header tagline check now matches "first-run setup" prefix instead of "first-run setup wizard" (the redesigned tagline drops the trailing "wizard" since the banner already establishes that's what we're in).
Three operator-driven polishes: 1. ASCII logo redesign — swap from box-drawing "Future" font (3 rows tall) to Pagga-style chunky pixel font (2 rows tall): `█▀▀ █ ▄▀█ █ █ ▀█▀ █▀█ █▀█ █` `█▄▄ █▄▄ █▀█ ▀▄▀ █ █▄█ █▄█ █▄▄` Different glyph palette gives the brand mark more visual weight while staying compact (Unicode block elements render on every modern terminal). 2. Animation — new tickMsg + tickEvery() loop firing every 350ms. Update increments m.frame on each tick; renderStep uses (frame % 4) to cycle the active ◉ progress dot through four progressively brighter pinks (212 → 213 → 218 → 219). Subtle pulse pulls the operator's eye to "where am I now?" without being distracting. Init() seeds the first tick. 3. Spacing — added blank rows between header and body, and between progress dots and the card. The "Step X of Y" indicator now has visible breathing room above the card it describes.
Two operator-driven polishes: 1. The "W" in clawtoolLogo was rendering as a single-V silhouette because Pagga's W was using `█ █ / ▀▄▀` (3 cols, 1 peak). Switch to a proper double-peak W: `█ █ █ / █▄█▄█` (5 cols). The brand mark now reads as "claW-tool" not "clav-tool". 2. The previous animation (4-tone pulse on the active progress dot at 350ms cadence) was too subtle to register. Replace + complement with a Braille spinner glyph at the start of the step indicator: `⠋ Step 1 of 8 · Primary CLI`. The spinner cycles through the standard 10-frame Braille rotation (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏) every 120ms — a full revolution in ~1.2s, the instantly-recognizable "I'm alive" TUI signal. Tick interval bumped from 350ms to 120ms so the spinner rotates smoothly. The dot pulse still runs but is now secondary — the spinner carries the visible "live" weight.
Replace the Braille spinner on the step indicator with a gradient shimmer running across the clawtool ASCII brand mark itself — that's where the operator naturally looks for "is this thing alive?" feedback. Implementation: renderShimmerLogo() walks each glyph row and tints each non-space rune by its column distance from the current sweep position. Four colour stops form the band: distance 0 → 225 (almost white, brightest centre) distance ±1 → 219 (bright pink) distance ±2 → 213 (medium pink) elsewhere → 212 (base accent) The sweep position cycles from -2 to maxLen+2 (so the band enters and leaves off-screen) plus an 8-frame quiet pause after each sweep so the logo isn't constantly shimmering — the eye gets a beat to rest. Step indicator is back to plain text. The wizard's animation budget is 100% on the brand mark now: a soft "shine" passing through the logo every ~3-4 seconds.
Logo is 2 rows tall, meta column is 3 rows (tagline / credit / email). Joining them with `lipgloss.JoinHorizontal(lipgloss.Top, ...)` left the logo stuck to the top while the meta lines extended below — the brand row read as visually unbalanced. Switch to `JoinHorizontal(lipgloss.Center, ...)` so the shorter logo is vertically centred against the taller meta column. Also drop the leading blank that previously padded metaCol down to match the (then top-anchored) logo — no longer needed and was making the meta lines drift. Net: clawtool brand mark now sits in the middle band of the meta column's 3-line stack. The header reads as a balanced pair of elements rather than a top-anchored block with a tail.
…cally Two layout fixes: 1. Brand row: switch from `JoinHorizontal(Center)` to `JoinHorizontal(Bottom)`. Top stuck the logo to the top edge; center drifted it too far down (logo straddled tagline+credit asymmetrically). Bottom-align lines the 2-row logo up with the bottom 2 rows of the 3-row metaCol (credit + email), letting the tagline float above as a kicker. Visually balanced. 2. Body region: add `AlignVertical(lipgloss.Center)` to the body container in renderStep / renderRunBody / renderDoneBody. Previously the body filled to bodyH but content stuck to the top, leaving a big empty zone above the footer. Vertical- centring distributes the leftover slack evenly above and below the wizard's content (indicator + dots + card), so the wizard sits in the middle of the body region instead of cramming against the header band.
Onboard now degrades gracefully on narrow viewports (<70 cols) so it stays usable on mobile clients, tmux split panes, and docked terminal windows. Breakpoint: onboardCompactWidth = 70 cols. Below that: - Header switches to renderCompactHeader: drops the chunky 32-col ASCII logo and the labelled host-detection pill row. Renders one centred line — `clawtool v0.22.58 · first-run setup` — plus a glyph-only detection row (● ● ● ● ○) so the operator still sees what was found without spending vertical space on labels. - Footer switches to compactKeybinds: shortens `↑/↓ select · enter confirm · ctrl-c quit` to `↑↓ ↵ · ^c`. MultiSelect similarly compresses (`↑↓ ␣ a ↵`). The keys stay legible; the prose drops. - run + done phases shrink their footer copy too: `running 3/8 · ctrl-c quit` → `3/8`; `press any key to exit` → `any key`. Card width floor lowered from 60 to 40 cols (computeCardWidth) so the wizard renders inside even very narrow panes (40-col phone terminals). Card padding cuts in too — the form content stays inside the rounded frame at every supported size.
Signed-off-by: SafeSkill Scanner <mk@oya.ai>
|
Hi @OyaAIProd — thanks for running the scan against clawtool! Quick triage: Stale base. This PR's base ref Findings triage (8 high). Skimming the report:
Happy to consider a tightened scan profile — does SafeSkill expose a What I'd want to see before approving:
Marking the internal triage item as done — back to you for the rebase + scope clarification. /cc @bahadirarda96 |
🟠 SafeSkill Security Scan Results
Top Findings
commands/clawtool-dashboard.md:33)docs/portals.md:35)~/.aws" (docs/sandbox.md:17`)docs/sandbox.md:55)internal/setup/recipes/governance/assets/Apache-2.0.txt:147)View full report on SafeSkill
About SafeSkill
SafeSkill is a free, open-source security scanner for AI tools, MCP servers, and Claude Code skills. We scan for code exploits, prompt injection, and data exfiltration risks.
False positive? We take accuracy seriously. If any finding above is incorrect, please open an issue and we will fix it immediately.