From 02f2fe66b69dac86e4b8e034a0e6b5c175662961 Mon Sep 17 00:00:00 2001 From: Santosh Date: Wed, 8 Apr 2026 11:49:35 +0530 Subject: [PATCH 1/4] feat(skill+cli): agentfield-multi-reasoner-builder skill, af doctor, af init --docker MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds a Claude-Code-style skill that teaches any coding agent to design and ship complete multi-reasoner systems on AgentField, plus two new CLI commands that make the skill agent-first instead of human-first. ## New skill: skills/agentfield-multi-reasoner-builder/ A self-contained skill (1 SKILL.md + 6 reference files, ~11k words total) that turns a one-line user request into a runnable Docker-compose multi-reasoner system. Designed to be portable across coding agents (Claude Code, Cursor, Codex, Gemini) — references use plain markdown and absolute paths, no Claude-only @-syntax. Key sections: - HARD GATE blocking code-writing until references are read - Hard rejections + rationalization counters + red-flags table inlined in SKILL.md so they fire on every invocation - "Build now, key later" grooming protocol — one question max - Mandatory patterns: per-request model propagation, router-package layout when reasoners > 4, tags=["entry"] on the public reasoner - Harness availability gate that forbids app.harness() in default scaffolds (the python:3.11-slim container has no coding-agent CLI) - 3-option fallback pattern for .ai() gates: deeper reasoner, safe default (recommended for safety/regulated systems), or harness - Output contract that requires the agent to print the UI URL, verification ladder, and a sample curl using realistic data References: choosing-primitives.md (philosophy + verified SDK signatures), architecture-patterns.md (8 composition patterns from real AgentField examples), scaffold-recipe.md (canonical 4-file router-package layout + universal Dockerfile + offline validation checklist), verification.md (discovery API ladder), 
project-claude-template.md, anti-patterns.md. ## New CLI command: af doctor control-plane/internal/cli/doctor.go Single command that returns ground-truth environment JSON the skill consumes once instead of probing each tool by hand. Reports: - Available harness provider CLIs (claude-code, codex, gemini, opencode) - Provider API keys set (OPENROUTER, OPENAI, ANTHROPIC, GOOGLE) without leaking the value - Docker availability + control-plane image cache state + reachability - Python/Node versions - A `recommendation` block with the suggested provider, AI_MODEL string, and an explicit harness_usable boolean — directives for the agent, not just facts Both --json (canonical for tools/skills) and pretty (human) output modes. ## af init --docker flag control-plane/internal/cli/init.go control-plane/internal/templates/templates.go control-plane/internal/templates/docker/{Dockerfile,compose,env,dockerignore}.tmpl Adds --docker to af init, generating four zero-change infrastructure files alongside the existing language scaffold: - Dockerfile: universal python:3.11-slim, build context = project dir, installs agentfield from requirements.txt — works in-repo and standalone - docker-compose.yml: control-plane + agent service with healthcheck and depends_on condition: service_healthy - .env.example: all four provider keys + AI_MODEL with the --default-model value baked in - .dockerignore Visible flags trimmed to --docker and --default-model. Granular flags (--control-plane-image, --control-plane-port, --agent-port) hidden via MarkHidden() because they have correct defaults. CLAUDE.md and README.md are intentionally NOT generated by af init — those are produced by the skill AFTER the agent writes the real reasoner architecture, so they contain real names and real curl examples instead of TODO placeholders. 
## Tested end-to-end A fresh non-Claude codex CLI subprocess (no auto-skill discovery, no prior context) successfully used the skill + new CLI commands to build a complete loan underwriting backend at /tmp/af-skill-test-build/ loan-underwriter/ — 14 files, 1412 lines, real composite intelligence (8 reasoners, parallel hunters + HUNT->PROVE adversarial + dynamic intake routing + deterministic governance overrides + fact registry with citation IDs). Both python3 -m py_compile and docker compose config validated cleanly. The codex self-assessment surfaced 5 real skill bugs which are fixed in this commit: 1. scaffold-recipe.md Dockerfile section now matches af init --docker output (universal, not repo-coupled) 2. Canonical 4-file router-package layout documented explicitly 3. .ai() fallback pattern lists 3 options including the deterministic safe-default Pydantic instance (the right answer for regulated/safety-critical systems) 4. python -> python3 across all validation commands for portability 5. 
"Build now, key later" rule explicit in the grooming protocol Co-Authored-By: Claude Opus 4.6 (1M context) --- control-plane/internal/cli/doctor.go | 349 ++++++++++++ control-plane/internal/cli/init.go | 67 ++- control-plane/internal/cli/root.go | 3 + .../templates/docker/.dockerignore.tmpl | 8 + .../templates/docker/.env.example.tmpl | 24 + .../templates/docker/docker-compose.yml.tmpl | 39 ++ .../templates/docker/python.Dockerfile.tmpl | 16 + control-plane/internal/templates/templates.go | 45 +- .../SKILL.md | 323 +++++++++++ .../references/anti-patterns.md | 143 +++++ .../references/architecture-patterns.md | 243 +++++++++ .../references/choosing-primitives.md | 502 ++++++++++++++++++ .../references/project-claude-template.md | 118 ++++ .../references/scaffold-recipe.md | 502 ++++++++++++++++++ .../references/verification.md | 108 ++++ 15 files changed, 2473 insertions(+), 17 deletions(-) create mode 100644 control-plane/internal/cli/doctor.go create mode 100644 control-plane/internal/templates/docker/.dockerignore.tmpl create mode 100644 control-plane/internal/templates/docker/.env.example.tmpl create mode 100644 control-plane/internal/templates/docker/docker-compose.yml.tmpl create mode 100644 control-plane/internal/templates/docker/python.Dockerfile.tmpl create mode 100644 skills/agentfield-multi-reasoner-builder/SKILL.md create mode 100644 skills/agentfield-multi-reasoner-builder/references/anti-patterns.md create mode 100644 skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md create mode 100644 skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md create mode 100644 skills/agentfield-multi-reasoner-builder/references/project-claude-template.md create mode 100644 skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md create mode 100644 skills/agentfield-multi-reasoner-builder/references/verification.md diff --git a/control-plane/internal/cli/doctor.go b/control-plane/internal/cli/doctor.go new file 
mode 100644 index 000000000..6abcae296 --- /dev/null +++ b/control-plane/internal/cli/doctor.go @@ -0,0 +1,349 @@ +package cli + +import ( + "context" + "encoding/json" + "fmt" + "net/http" + "os" + "os/exec" + "runtime" + "strings" + "time" + + "github.com/fatih/color" + "github.com/spf13/cobra" +) + +// DoctorReport is the JSON structure returned by `af doctor --json`. +// Coding agents (and skills like agentfield-multi-reasoner-builder) call this once +// to learn what's actually available in the environment instead of probing manually. +type DoctorReport struct { + OS string `json:"os"` + Arch string `json:"arch"` + Python ToolStatus `json:"python"` + Node ToolStatus `json:"node"` + Docker ToolStatus `json:"docker"` + HarnessProviders map[string]ToolStatus `json:"harness_providers"` + ProviderKeys map[string]ProviderKey `json:"provider_keys"` + ControlPlane ControlPlaneStatus `json:"control_plane"` + Recommendation Recommendation `json:"recommendation"` +} + +// ToolStatus describes whether a CLI is available and, if so, where. +type ToolStatus struct { + Available bool `json:"available"` + Path string `json:"path,omitempty"` + Version string `json:"version,omitempty"` +} + +// ProviderKey reports whether a provider's API key env var is set +// (without ever leaking the value). +type ProviderKey struct { + EnvVar string `json:"env_var"` + Set bool `json:"set"` +} + +// ControlPlaneStatus reports whether a local control plane is reachable +// and whether the Docker image is locally available. +type ControlPlaneStatus struct { + URL string `json:"url"` + Reachable bool `json:"reachable"` + HealthStatus string `json:"health_status,omitempty"` + DockerImageName string `json:"docker_image_name"` + DockerImageLocal bool `json:"docker_image_local"` +} + +// Recommendation tells the caller (a skill or a coding agent) what to default to, +// based on what's actually present in the environment. 
+type Recommendation struct { + Provider string `json:"provider"` // "openrouter" / "openai" / "anthropic" / "google" / "none" + AIModel string `json:"ai_model"` // suggested LiteLLM-style model string + HarnessUsable bool `json:"harness_usable"` // true only if at least one provider CLI is on PATH + HarnessProviders []string `json:"harness_providers"` // available provider CLI names + Notes []string `json:"notes"` // human-readable suggestions +} + +// providerEnvVars maps provider name -> env var. Order matters for the recommendation. +var providerEnvVars = []struct { + Name string + EnvVar string + Model string // suggested default model when this provider is the chosen one +}{ + {Name: "openrouter", EnvVar: "OPENROUTER_API_KEY", Model: "openrouter/anthropic/claude-3.5-sonnet"}, + {Name: "anthropic", EnvVar: "ANTHROPIC_API_KEY", Model: "claude-3-5-sonnet-20241022"}, + {Name: "openai", EnvVar: "OPENAI_API_KEY", Model: "gpt-4o"}, + {Name: "google", EnvVar: "GOOGLE_API_KEY", Model: "gemini-1.5-pro"}, +} + +// harnessProviders is the canonical list of CLIs `app.harness()` knows how to drive. +var harnessProviders = []struct { + Name string // value passed to provider= in app.harness() + Binary string // executable name to look up on PATH +}{ + {Name: "claude-code", Binary: "claude"}, + {Name: "codex", Binary: "codex"}, + {Name: "gemini", Binary: "gemini"}, + {Name: "opencode", Binary: "opencode"}, +} + +// NewDoctorCommand builds the `af doctor` command. 
+func NewDoctorCommand() *cobra.Command { + var jsonOut bool + var controlPlaneURL string + + cmd := &cobra.Command{ + Use: "doctor", + Short: "Inspect the local environment for AgentField development capabilities", + Long: `Doctor inspects the local environment and reports what's available for +building AgentField multi-reasoner systems: + + • Available harness provider CLIs (claude-code, codex, gemini, opencode) + • Provider API keys set in the environment (without leaking values) + • Docker availability and whether the control-plane image is locally cached + • Whether a local control plane is reachable + • A recommended default provider, model, and whether app.harness() is usable + +Coding agents and skills (e.g. agentfield-multi-reasoner-builder) should call +this once at the start of a build to learn ground truth instead of probing +each tool by hand. + +Examples: + af doctor # Pretty human-readable output + af doctor --json # Machine-readable JSON for tooling/skills + af doctor --json | jq # Pipe to jq for filtering`, + RunE: func(cmd *cobra.Command, args []string) error { + report := buildDoctorReport(controlPlaneURL) + + if jsonOut { + enc := json.NewEncoder(os.Stdout) + enc.SetIndent("", " ") + return enc.Encode(report) + } + + printDoctorReport(report) + return nil + }, + } + + cmd.Flags().BoolVar(&jsonOut, "json", false, "Output the report as JSON (recommended for tools and skills)") + cmd.Flags().StringVar(&controlPlaneURL, "server", "http://localhost:8080", "Control plane URL to probe for /api/v1/health") + + return cmd +} + +// buildDoctorReport collects the full environment snapshot. 
+func buildDoctorReport(controlPlaneURL string) DoctorReport { + report := DoctorReport{ + OS: runtime.GOOS, + Arch: runtime.GOARCH, + Python: checkTool("python3", "--version"), + Node: checkTool("node", "--version"), + Docker: checkTool("docker", "--version"), + HarnessProviders: map[string]ToolStatus{}, + ProviderKeys: map[string]ProviderKey{}, + } + + // Some systems use "python" instead of "python3" + if !report.Python.Available { + report.Python = checkTool("python", "--version") + } + + // Harness CLIs + availableHarness := []string{} + for _, h := range harnessProviders { + status := checkTool(h.Binary, "--version") + report.HarnessProviders[h.Name] = status + if status.Available { + availableHarness = append(availableHarness, h.Name) + } + } + + // Provider keys + chosenProvider := "" + chosenModel := "" + for _, p := range providerEnvVars { + set := strings.TrimSpace(os.Getenv(p.EnvVar)) != "" + report.ProviderKeys[p.Name] = ProviderKey{EnvVar: p.EnvVar, Set: set} + if set && chosenProvider == "" { + chosenProvider = p.Name + chosenModel = p.Model + } + } + + // Control plane + report.ControlPlane = checkControlPlane(controlPlaneURL) + + // Recommendation + notes := []string{} + if chosenProvider == "" { + chosenProvider = "none" + chosenModel = "openrouter/anthropic/claude-3.5-sonnet" + notes = append(notes, "No provider API key detected. Set OPENROUTER_API_KEY (recommended) or OPENAI_API_KEY / ANTHROPIC_API_KEY before building.") + } else { + notes = append(notes, fmt.Sprintf("Provider key detected: %s. Default model: %s", chosenProvider, chosenModel)) + } + + if len(availableHarness) == 0 { + notes = append(notes, "No harness provider CLIs available. Do NOT use app.harness() in scaffolds — use app.ai(tools=[...]) or chunked-loop reasoners instead.") + } else { + notes = append(notes, fmt.Sprintf("Harness providers available: %s. app.harness(provider=...) 
is usable.", strings.Join(availableHarness, ", "))) + } + + if !report.Docker.Available { + notes = append(notes, "Docker not available. Generated docker-compose.yml will validate with `docker compose config` but cannot be run locally.") + } + + if !report.ControlPlane.DockerImageLocal && report.Docker.Available { + notes = append(notes, fmt.Sprintf("Control plane image %s not present locally. First `docker compose up` will pull it.", report.ControlPlane.DockerImageName)) + } + + report.Recommendation = Recommendation{ + Provider: chosenProvider, + AIModel: chosenModel, + HarnessUsable: len(availableHarness) > 0, + HarnessProviders: availableHarness, + Notes: notes, + } + + return report +} + +// checkTool runs `<bin> <versionFlag>` and reports whether the binary is on PATH. +func checkTool(bin, versionFlag string) ToolStatus { + path, err := exec.LookPath(bin) + if err != nil { + return ToolStatus{Available: false} + } + status := ToolStatus{Available: true, Path: path} + + ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second) + defer cancel() + + cmd := exec.CommandContext(ctx, bin, versionFlag) + out, err := cmd.CombinedOutput() + if err == nil { + // First line of version output is usually enough; trim noise. + first := strings.SplitN(strings.TrimSpace(string(out)), "\n", 2)[0] + status.Version = first + } + return status +} + +// checkControlPlane probes the control plane and checks for the docker image.
+func checkControlPlane(url string) ControlPlaneStatus { + status := ControlPlaneStatus{ + URL: url, + DockerImageName: "agentfield/control-plane:latest", + } + + // Health check + ctx, cancel := context.WithTimeout(context.Background(), 1500*time.Millisecond) + defer cancel() + req, _ := http.NewRequestWithContext(ctx, http.MethodGet, strings.TrimRight(url, "/")+"/api/v1/health", nil) + resp, err := http.DefaultClient.Do(req) + if err == nil { + defer resp.Body.Close() + status.Reachable = resp.StatusCode == 200 + status.HealthStatus = resp.Status + } + + // Local docker image + if _, err := exec.LookPath("docker"); err == nil { + ctx2, cancel2 := context.WithTimeout(context.Background(), 2*time.Second) + defer cancel2() + out, err := exec.CommandContext(ctx2, "docker", "image", "inspect", status.DockerImageName).CombinedOutput() + if err == nil && len(out) > 0 && !strings.Contains(string(out), "No such image") { + status.DockerImageLocal = true + } + } + + return status +} + +// printDoctorReport renders the report in human-readable form. 
+func printDoctorReport(r DoctorReport) { + bold := color.New(color.Bold) + green := color.New(color.FgGreen) + red := color.New(color.FgRed) + yellow := color.New(color.FgYellow) + cyan := color.New(color.FgCyan) + + bold.Println("AgentField environment doctor") + fmt.Printf(" os/arch: %s/%s\n\n", r.OS, r.Arch) + + bold.Println("Runtimes") + printToolLine("python", r.Python, green, red) + printToolLine("node", r.Node, green, red) + printToolLine("docker", r.Docker, green, red) + fmt.Println() + + bold.Println("Harness provider CLIs (for app.harness)") + for _, h := range harnessProviders { + printToolLine(h.Name, r.HarnessProviders[h.Name], green, red) + } + fmt.Println() + + bold.Println("Provider API keys") + for _, p := range providerEnvVars { + key := r.ProviderKeys[p.Name] + mark := "✗" + c := red + if key.Set { + mark = "✓" + c = green + } + c.Printf(" %s %-12s %s (%s)\n", mark, p.Name, key.EnvVar, ifThen(key.Set, "set", "unset")) + } + fmt.Println() + + bold.Println("Control plane") + cyan.Printf(" url: %s\n", r.ControlPlane.URL) + mark := "✗ unreachable" + c := red + if r.ControlPlane.Reachable { + mark = "✓ reachable (" + r.ControlPlane.HealthStatus + ")" + c = green + } + c.Printf(" %s\n", mark) + mark = "✗ image not cached" + c = yellow + if r.ControlPlane.DockerImageLocal { + mark = "✓ image cached locally" + c = green + } + c.Printf(" %s (%s)\n", mark, r.ControlPlane.DockerImageName) + fmt.Println() + + bold.Println("Recommendation") + cyan.Printf(" provider: %s\n", r.Recommendation.Provider) + cyan.Printf(" AI_MODEL: %s\n", r.Recommendation.AIModel) + cyan.Printf(" harness usable: %v\n", r.Recommendation.HarnessUsable) + if len(r.Recommendation.HarnessProviders) > 0 { + cyan.Printf(" harness providers: %s\n", strings.Join(r.Recommendation.HarnessProviders, ", ")) + } + for _, note := range r.Recommendation.Notes { + fmt.Printf(" • %s\n", note) + } + fmt.Println() + fmt.Println("Tip: pipe to jq for tooling — `af doctor --json | jq`") +} + +func 
printToolLine(name string, status ToolStatus, ok, fail *color.Color) { + if status.Available { + ok.Printf(" ✓ %-12s %s", name, status.Path) + if status.Version != "" { + fmt.Printf(" (%s)", status.Version) + } + fmt.Println() + } else { + fail.Printf(" ✗ %-12s not found\n", name) + } +} + +func ifThen(b bool, t, f string) string { + if b { + return t + } + return f +} diff --git a/control-plane/internal/cli/init.go b/control-plane/internal/cli/init.go index c3479e8a4..5434c6fc9 100644 --- a/control-plane/internal/cli/init.go +++ b/control-plane/internal/cli/init.go @@ -207,6 +207,11 @@ func NewInitCommand() *cobra.Command { var language string var nonInteractive bool var useDefaults bool + var withDocker bool + var controlPlaneImage string + var controlPlanePort int + var agentPort int + var defaultModel string cmd := &cobra.Command{ Use: "init [project-name]", @@ -327,14 +332,18 @@ Example: // Prepare template data data := templates.TemplateData{ - ProjectName: projectName, - NodeID: nodeID, - GoModule: projectName, // Use project name as Go module - AuthorName: authorName, - AuthorEmail: authorEmail, - CurrentYear: time.Now().Year(), - CreatedAt: time.Now().Format("2006-01-02 15:04:05 MST"), - Language: language, + ProjectName: projectName, + NodeID: nodeID, + GoModule: projectName, // Use project name as Go module + AuthorName: authorName, + AuthorEmail: authorEmail, + CurrentYear: time.Now().Year(), + CreatedAt: time.Now().Format("2006-01-02 15:04:05 MST"), + Language: language, + ControlPlaneImage: controlPlaneImage, + ControlPlanePort: controlPlanePort, + AgentPort: agentPort, + DefaultModel: defaultModel, } // Create project directory @@ -380,6 +389,34 @@ Example: printSuccess(" ✓ %s", destPath) } + // If --docker is set, also generate Docker scaffold templates + if withDocker { + printInfo("🐳 Adding Docker scaffold...") + dockerTemplates := templates.GetDockerTemplateFiles(language) + for tmplPath, destPath := range dockerTemplates { + tmpl, err := 
templates.GetTemplate(tmplPath) + if err != nil { + printError("Error parsing docker template %s: %v", tmplPath, err) + return fmt.Errorf("error parsing docker template %s: %w", tmplPath, err) + } + + var buf strings.Builder + if err := tmpl.Execute(&buf, data); err != nil { + printError("Error executing docker template %s: %v", tmplPath, err) + return fmt.Errorf("error executing docker template %s: %w", tmplPath, err) + } + + fullDestPath := filepath.Join(projectPath, destPath) + // Docker scaffold files overwrite any same-named files the language template already wrote. + if err := os.WriteFile(fullDestPath, []byte(buf.String()), 0o644); err != nil { + printError("Error writing docker file %s: %v", fullDestPath, err) + return fmt.Errorf("error writing docker file %s: %w", fullDestPath, err) + } + + printSuccess(" ✓ %s", destPath) + } + } + // Print success message fmt.Println() printSuccess("🚀 Agent '%s' created successfully!", projectName) @@ -445,6 +482,20 @@ Example: cmd.Flags().StringVarP(&authorEmail, "email", "e", "", "Author email for the project") cmd.Flags().BoolVar(&nonInteractive, "non-interactive", false, "Run in non-interactive mode (use defaults)") cmd.Flags().BoolVar(&useDefaults, "defaults", false, "Skip prompts and generate a project with default settings") + // --docker: the only new visible flag. Ships the four infrastructure files + // (Dockerfile, docker-compose.yml, .env.example, .dockerignore). CLAUDE.md + // and README.md are intentionally NOT generated — the skill produces them + // after the agent has written real reasoners. + cmd.Flags().BoolVar(&withDocker, "docker", false, "Also generate a Docker scaffold (Dockerfile, docker-compose.yml, .env.example, .dockerignore)") + cmd.Flags().StringVar(&defaultModel, "default-model", "openrouter/anthropic/claude-3.5-sonnet", "Default AI_MODEL string baked into the docker scaffold (LiteLLM-style, e.g.
gpt-4o, anthropic/claude-3-5-sonnet-20241022)") + + // Hidden flags — sensible defaults; only set when you have a real reason. + cmd.Flags().StringVar(&controlPlaneImage, "control-plane-image", "agentfield/control-plane:latest", "") + cmd.Flags().IntVar(&controlPlanePort, "control-plane-port", 8080, "") + cmd.Flags().IntVar(&agentPort, "agent-port", 8001, "") + _ = cmd.Flags().MarkHidden("control-plane-image") + _ = cmd.Flags().MarkHidden("control-plane-port") + _ = cmd.Flags().MarkHidden("agent-port") if err := viper.BindPFlag("language", cmd.Flags().Lookup("language")); err != nil { printError("failed to bind language flag: %v", err) diff --git a/control-plane/internal/cli/root.go b/control-plane/internal/cli/root.go index 06c16c09f..a3fe15f19 100644 --- a/control-plane/internal/cli/root.go +++ b/control-plane/internal/cli/root.go @@ -90,6 +90,9 @@ AI Agent? Run "af agent help" for structured JSON output optimized for programma // Add init command RootCmd.AddCommand(NewInitCommand()) + // Add doctor command — environment introspection for skills/coding agents + RootCmd.AddCommand(NewDoctorCommand()) + // Create service container for framework commands cfg := &config.Config{} // Use default config for now services := application.CreateServiceContainer(cfg, getAgentFieldHomeDir()) diff --git a/control-plane/internal/templates/docker/.dockerignore.tmpl b/control-plane/internal/templates/docker/.dockerignore.tmpl new file mode 100644 index 000000000..edabc43f1 --- /dev/null +++ b/control-plane/internal/templates/docker/.dockerignore.tmpl @@ -0,0 +1,8 @@ +__pycache__ +*.pyc +.pytest_cache +.env +.venv +*.log +.git +.DS_Store diff --git a/control-plane/internal/templates/docker/.env.example.tmpl b/control-plane/internal/templates/docker/.env.example.tmpl new file mode 100644 index 000000000..29e52e091 --- /dev/null +++ b/control-plane/internal/templates/docker/.env.example.tmpl @@ -0,0 +1,24 @@ +# --- Provider keys (set ONE) --- +# OpenRouter (recommended default — 
single key, many models) +OPENROUTER_API_KEY=sk-or-v1-... + +# Or OpenAI +# OPENAI_API_KEY=sk-... + +# Or Anthropic +# ANTHROPIC_API_KEY=sk-ant-... + +# Or Google +# GOOGLE_API_KEY=... + +# --- Model selection --- +# Must match the provider above. LiteLLM-style model strings. +AI_MODEL={{.DefaultModel}} +# AI_MODEL=gpt-4o +# AI_MODEL=anthropic/claude-3-5-sonnet-20241022 +# AI_MODEL=gemini/gemini-1.5-pro + +# --- Optional overrides --- +AGENT_NODE_ID={{.NodeID}} +AGENT_NODE_PORT={{.AgentPort}} +AGENTFIELD_HTTP_PORT={{.ControlPlanePort}} diff --git a/control-plane/internal/templates/docker/docker-compose.yml.tmpl b/control-plane/internal/templates/docker/docker-compose.yml.tmpl new file mode 100644 index 000000000..32b9a048d --- /dev/null +++ b/control-plane/internal/templates/docker/docker-compose.yml.tmpl @@ -0,0 +1,39 @@ +services: + control-plane: + image: {{.ControlPlaneImage}} + environment: + AGENTFIELD_STORAGE_MODE: local + AGENTFIELD_HTTP_ADDR: 0.0.0.0:8080 + ports: + - "${AGENTFIELD_HTTP_PORT:-{{.ControlPlanePort}}}:8080" + volumes: + - agentfield-data:/data + healthcheck: + test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"] + interval: 3s + timeout: 2s + retries: 20 + + {{.NodeID}}: + build: + context: . 
+ dockerfile: Dockerfile + environment: + AGENTFIELD_SERVER: http://control-plane:8080 + AGENT_CALLBACK_URL: http://{{.NodeID}}:{{.AgentPort}} + AGENT_NODE_ID: ${AGENT_NODE_ID:-{{.NodeID}}} + OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-} + OPENAI_API_KEY: ${OPENAI_API_KEY:-} + ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-} + GOOGLE_API_KEY: ${GOOGLE_API_KEY:-} + AI_MODEL: ${AI_MODEL:-{{.DefaultModel}}} + PORT: ${PORT:-{{.AgentPort}}} + ports: + - "${AGENT_NODE_PORT:-{{.AgentPort}}}:{{.AgentPort}}" + depends_on: + control-plane: + condition: service_healthy + restart: on-failure + +volumes: + agentfield-data: diff --git a/control-plane/internal/templates/docker/python.Dockerfile.tmpl b/control-plane/internal/templates/docker/python.Dockerfile.tmpl new file mode 100644 index 000000000..214196823 --- /dev/null +++ b/control-plane/internal/templates/docker/python.Dockerfile.tmpl @@ -0,0 +1,16 @@ +FROM python:3.11-slim + +ENV PYTHONDONTWRITEBYTECODE=1 \ + PYTHONUNBUFFERED=1 + +WORKDIR /app + +COPY requirements.txt /app/requirements.txt +RUN pip install --no-cache-dir --upgrade pip && \ + pip install --no-cache-dir -r /app/requirements.txt + +COPY . /app/ + +EXPOSE {{.AgentPort}} + +CMD ["python", "main.py"] diff --git a/control-plane/internal/templates/templates.go b/control-plane/internal/templates/templates.go index 1fa6230c6..16c27ac1a 100644 --- a/control-plane/internal/templates/templates.go +++ b/control-plane/internal/templates/templates.go @@ -8,19 +8,24 @@ import ( "text/template" ) -//go:embed python/*.tmpl go/*.tmpl typescript/*.tmpl +//go:embed python/*.tmpl go/*.tmpl typescript/*.tmpl docker/*.tmpl var content embed.FS // TemplateData holds the data to be passed to the templates. 
type TemplateData struct { - ProjectName string // "my-awesome-agent" - NodeID string // "my-awesome-agent" (same as ProjectName) - GoModule string // "my-awesome-agent" (Go module name) - AuthorName string // "John Doe" - AuthorEmail string // "john@example.com" - CurrentYear int // 2025 - CreatedAt string // "2025-01-05 10:30:00 EST" - Language string // "python", "go", or "typescript" + ProjectName string // "my-awesome-agent" + NodeID string // "my-awesome-agent" (same as ProjectName) + GoModule string // "my-awesome-agent" (Go module name) + AuthorName string // "John Doe" + AuthorEmail string // "john@example.com" + CurrentYear int // 2025 + CreatedAt string // "2025-01-05 10:30:00 EST" + Language string // "python", "go", or "typescript" + // Docker scaffold fields (used only when --docker is set on `af init`) + ControlPlaneImage string // "agentfield/control-plane:latest" + ControlPlanePort int // 8080 + AgentPort int // 8001 + DefaultModel string // "openrouter/anthropic/claude-3.5-sonnet" } // GetTemplate retrieves a specific template by its path. @@ -74,3 +79,25 @@ func ReadTemplateContent(path string) ([]byte, error) { func GetSupportedLanguages() []string { return []string{"python", "go", "typescript"} } + +// GetDockerTemplateFiles returns the minimal Docker infrastructure scaffold for a +// given language. Deliberately scoped to the four files an agent will NEVER need +// to customize: Dockerfile, docker-compose.yml, .env.example, .dockerignore. +// +// CLAUDE.md and README.md are NOT generated here — those are produced by the +// agentfield-multi-reasoner-builder skill AFTER the agent has written the real +// reasoner architecture in main.py, so they can contain real reasoner names, +// real curl examples, and a real architectural justification instead of +// placeholders that ship with TODO markers. 
+func GetDockerTemplateFiles(language string) map[string]string { + files := map[string]string{ + "docker/docker-compose.yml.tmpl": "docker-compose.yml", + "docker/.env.example.tmpl": ".env.example", + "docker/.dockerignore.tmpl": ".dockerignore", + } + switch language { + case "python": + files["docker/python.Dockerfile.tmpl"] = "Dockerfile" + } + return files +} diff --git a/skills/agentfield-multi-reasoner-builder/SKILL.md b/skills/agentfield-multi-reasoner-builder/SKILL.md new file mode 100644 index 000000000..9614c8bc1 --- /dev/null +++ b/skills/agentfield-multi-reasoner-builder/SKILL.md @@ -0,0 +1,323 @@ +--- +name: agentfield-multi-reasoner-builder +description: Architect and ship a complete multi-agent backend system on AgentField from a one-line user request. Use when the user asks to build, scaffold, design, or ship an agent system, multi-agent pipeline, reasoner network, AgentField project, financial reviewer, research agent, compliance agent, or any LLM composition that should outperform LangChain/CrewAI/AutoGen — especially when they want a runnable Docker-compose stack and a working curl smoke test. +--- + +# AgentField Multi-Reasoner Builder + +You are not a prompt engineer. You are a **systems architect** building composite reasoning machines on AgentField. The intelligence is in the composition, not the components. + +## HARD GATE — READ BEFORE ANYTHING ELSE + +> **Do NOT write any code, generate any file, or scaffold any project until you have:** +> 1. Asked the user the ONE grooming question (below) and received their answer +> 2. Read `references/choosing-primitives.md` (mandatory — sets the philosophy and the real SDK signatures) +> 3. Designed the reasoner topology (which `@app.reasoner` units, who calls whom, which are `.ai` vs deterministic skills, where the dynamic routing happens) +> +> **Do NOT default to a single big reasoner with one `app.ai` call.** That's a CrewAI clone. Decompose. 
If you cannot draw your system as a non-trivial graph, you have not architected anything. +> +> Violating the letter of this gate is violating the spirit of the gate. There are no exceptions for "simple" use cases. + +## The non-negotiable promise + +Every invocation of this skill must end with the user able to run **two commands** and get a working multi-reasoner system: + +```bash +docker compose up --build +curl -X POST http://localhost:8080/api/v1/execute/<node-id>.<reasoner-name> \ + -H 'Content-Type: application/json' \ + -d '{"input": {"...": "..."}}' +``` + +If you cannot deliver that, you have failed. No theoretical architectures. No "here's how you would do it." A running stack and a curl that returns a real reasoned answer. + +**Note the curl body shape: `{"input": {...kwargs...}}`** — the control plane wraps reasoner kwargs in an `input` field. Verified against `control-plane/internal/handlers/execute.go:1000`. Many coding agents get this wrong. + +## Workflow (universal — works for any coding agent) + +1. **Announce** you're using the `agentfield-multi-reasoner-builder` skill. +2. **Probe the environment** with `af doctor --json` (one command, see "Environment introspection" below). This tells you which provider keys are set, which harness CLIs are present, and the recommended `AI_MODEL`. Use this output instead of guessing. +3. **Ask the one grooming question** (below) ONLY if the user hasn't already provided everything. +4. **Read `choosing-primitives.md` ALWAYS.** Read other references when their trigger fires (table below). +5. **Design the topology** before writing files. +6. **Lay down infrastructure** with `af init --language python --docker --defaults --non-interactive --default-model <model-from-doctor>` (one command, see "Infrastructure scaffold" below). +7. **Customize `main.py` and `reasoners.py`** with the real reasoner architecture per `scaffold-recipe.md`.
Generate `CLAUDE.md` (from `project-claude-template.md`) and `README.md` AFTER you know the entry reasoner name and the curl payload. +8. **Validate**: `python3 -m py_compile main.py`, `docker compose config`, ideally `docker compose up --build` + verification ladder. +9. **Hand off** with the output contract below. + +## Environment introspection: `af doctor` + +Run this **once** at the start of every build. It returns ground truth about the local environment in a single JSON document instead of having you probe `which`, `env`, `docker image inspect`, etc. yourself: + +```bash +af doctor --json +``` + +Key fields you'll consume: +- `recommendation.provider` — `openrouter` / `openai` / `anthropic` / `google` / `none` +- `recommendation.ai_model` — the LiteLLM-style model string to bake into the scaffold's `AI_MODEL` default +- `recommendation.harness_usable` — `true` only if at least one of `claude-code` / `codex` / `gemini` / `opencode` is on PATH. **If `false`, do not use `app.harness()` in the scaffold under any circumstance.** +- `recommendation.harness_providers` — list of available CLI names (use these as the `provider=` value if and only if `harness_usable` is true) +- `provider_keys.{name}.set` — boolean per provider (no values leaked) +- `control_plane.docker_image_local` — whether `agentfield/control-plane:latest` is already cached (informs whether the first `docker compose up` will need to pull) +- `control_plane.reachable` — whether a control plane is already running locally (so you can curl test reasoners against it before building your own) + +**Use the doctor's output to set the `--default-model` flag on `af init` and to decide whether `app.harness()` is even an option in the architecture.** Do not hardcode your assumptions about the environment. + +## Infrastructure scaffold: `af init --docker` + +Run this **once** after `af doctor` and your architecture design. 
It produces the language scaffold (Python `main.py`, `reasoners.py`, `requirements.txt`) plus the four infrastructure files you should not need to customize:
+
+```bash
+af init --language python --docker --defaults --non-interactive \
+  --default-model <recommended-model>
+```
+
+What it generates:
+- `Dockerfile` — universal Python 3.11-slim, builds from project dir, no repo coupling
+- `docker-compose.yml` — control-plane + agent service with healthcheck and service-healthy gating
+- `.env.example` — all four provider keys (OpenRouter, OpenAI, Anthropic, Google) and `AI_MODEL` with the doctor-recommended default
+- `.dockerignore`
+- `main.py`, `reasoners.py`, `requirements.txt`, `README.md`, `.gitignore` — the standard language scaffold (you'll **rewrite `main.py` and `reasoners.py`** with your real architecture)
+
+What it does NOT generate (intentionally):
+- `CLAUDE.md` — you generate this from `references/project-claude-template.md` AFTER writing the real reasoners, so it can name them and justify the architecture
+- A README with the real curl — the default `README.md` is generic; you replace it AFTER picking the entry reasoner so the curl uses real kwargs
+
+The four infrastructure files are zero-change for the agent: Dockerfile installs `agentfield` from `requirements.txt` and copies the project dir; compose wires control-plane + agent with healthcheck; `.env.example` exposes all providers; `.dockerignore` covers the standard cases.
**Do not modify them unless you have a real reason.** + +## Reference table — load when + +| File | Load when | +|---|---| +| `choosing-primitives.md` | **Every invocation** — before any code | +| `architecture-patterns.md` | Designing inter-reasoner flow / picking HUNT→PROVE, parallel hunters, fan-out, streaming, meta-prompting | +| `scaffold-recipe.md` | Actually writing files / docker-compose / Dockerfile | +| `verification.md` | Writing the smoke test ladder or declaring done | +| `project-claude-template.md` | Generating the per-project CLAUDE.md (always) | +| `anti-patterns.md` | When tempted to take a shortcut OR when the user pushes back on a rejection | + +Reference files are one level deep from this file. Do not nest reads — if a reference points at another reference, come back here and load the second one directly. + +## The grooming protocol (1 question, then build) + +Ask **exactly one** question and **one** key request. Nothing else upfront: + +> "Tell me in 1–2 sentences what you want this agent system to do, and paste your provider key. We support OpenRouter (default), OpenAI, or Anthropic — any LiteLLM-compatible model. Example: `OPENROUTER_API_KEY=sk-or-v1-...`" + +**Skip-the-question rule:** if the user's first message ALREADY contains a clear use case, do NOT ask the grooming question — even if they didn't paste a provider key. This is the **"build now, key later"** policy: + +- If the user gives a clear use case AND a provider key → proceed straight to design + build +- If the user gives a clear use case AND says they'll paste the key into `.env` later → ALSO proceed straight to design + build. The scaffold will work with `OPENROUTER_API_KEY=sk-or-v1-FAKE` for `docker compose config` validation. The user runs the real key from `.env` when they're ready +- If the user gives a clear use case AND says nothing about a key → proceed straight to design + build. 
The `.env.example` you generate makes it obvious where to put the key +- If the user's request is genuinely vague or ambiguous along an architecture-changing axis → THEN ask one question + +The point is to **never block the build on a key the user is going to drop into `.env` themselves**. Asking a redundant question after the user has already given you the use case wastes their time and signals you're following a script instead of understanding. + +Then proceed. Infer everything else from the use case. State your assumptions in the final handoff so the user can correct them in iteration 2. + +**Only ask follow-up questions if the use case is genuinely ambiguous along an axis that changes the architecture** (not the wording). Examples that warrant a follow-up: + +- Input is a 200-page document vs. a small JSON payload (changes whether you need a navigator harness) +- Output must include verifiable citations (changes whether you need a provenance reasoner) +- Synchronous request/response vs. event-driven (pattern 8 vs. patterns 1–7) + +Examples that do **NOT** warrant a follow-up: model preference, file naming, port number, code style, what to call the entry reasoner. Decide and state. + +## The five primitives (cheat sheet — full detail in `choosing-primitives.md`) + +- **`@app.reasoner()`** — every cognitive unit. Schemas come from **type hints** (no `input_schema=` param exists). +- **`@app.skill()`** — deterministic functions. No LLM. Use whenever an LLM call is overkill. +- **`app.ai(system, user, schema, model, tools, ...)`** — single OR multi-turn LLM call. `tools=[...]` makes it stateful. `model="..."` per call overrides AIConfig default. +- **`app.harness(prompt, provider="claude-code"|"codex"|"gemini"|"opencode")`** — delegates to an external coding-agent CLI. **Not** a generic tool-using LLM (that's `app.ai(tools=[...])`). **REQUIRES** the chosen provider's CLI to be installed inside the agent container — see "Harness availability gate" below. 
+- **`app.call(target, **kwargs)`** — inter-reasoner traffic THROUGH the control plane. Returns `dict`. **No model override param** — thread `model` as a regular reasoner kwarg. + +**The bias:** many small `@app.reasoner()` units. `@app.skill()` for anything code can do. `app.ai()` with explicit prompts. Reserve `app.harness()` for real coding-agent delegation. + +## Harness availability gate (READ BEFORE USING `app.harness()`) + +`app.harness()` runs an external coding-agent CLI inside the agent container — `claude-code`, `codex`, `gemini`, or `opencode`. **The default `python:3.11-slim` Docker image has none of these installed.** A scaffold that uses `app.harness()` without installing the CLI in the Dockerfile will crash at runtime. + +**The check is automated.** `af doctor --json` reports `recommendation.harness_usable` (true/false) and `recommendation.harness_providers` (the list of CLIs on PATH). Use the doctor output as the source of truth — do not assume. + +**Default rule:** scaffolds **MUST NOT** use `app.harness()` at all when `recommendation.harness_usable == false`. Use `app.ai(tools=[...])` for stateful reasoning, or a `@app.reasoner()` that loops `app.ai()` for chunked work. These work in the default container with zero extra setup. + +**You may use `app.harness()` ONLY when ALL of the following are true:** + +1. The use case **genuinely requires a real coding agent** in the loop — i.e. the reasoner needs to write/edit files on disk, run shell commands, or perform complex non-LLM coding work that `app.ai(tools=[...])` cannot do. +2. You modify the Dockerfile to install the chosen provider's CLI. Example for Claude Code: + ```dockerfile + RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \ + && npm install -g @anthropic-ai/claude-code \ + && rm -rf /var/lib/apt/lists/* + ``` +3. 
You add a **startup availability check** in `main.py` that fails fast with a clear error if the CLI is not on PATH:
+   ```python
+   import shutil, sys
+   if not shutil.which("claude"):  # or "codex" / "gemini" / "opencode"
+       print("ERROR: app.harness(provider='claude-code') requires the `claude` CLI in PATH.", file=sys.stderr)
+       sys.exit(1)
+   ```
+4. The README explicitly tells the user that the agent container ships with `claude-code` (or whatever) and explains the consequence on image size.
+
+**If any of the four are not satisfied, do not use `app.harness()`.** Refactor the reasoner to use `app.ai(tools=[...])` or a chunked `@app.reasoner()` loop. There is no scenario where it's OK to write `app.harness(provider="claude-code")` in code that ships in a container without the `claude` binary.
+
+When in doubt: **don't use harness.** The user can ask for it in iteration 2. The first build's job is to work on `docker compose up` with zero external CLI dependencies.
+
+## Mandatory patterns (every build must have all three)
+
+### 1. Per-request model propagation
+
+The entry reasoner accepts `model: str | None = None` and threads it through every `app.ai(..., model=model)` and `app.call(..., model=model)`. Child reasoners accept `model` the same way and use it. The user can A/B test models per request:
+
+```bash
+curl -X POST http://localhost:8080/api/v1/execute/<node-id>.<entry-reasoner> \
+  -d '{"input": {"...": "...", "model": "openrouter/openai/gpt-4o"}}'
+```
+
+If `model` is omitted, the AIConfig default from the env var `AI_MODEL` is used. **`app.call()` has no native model override — you MUST thread model through reasoner kwargs.**
+
+### 2. Routers when reasoners > 4
+
+Use `AgentRouter(prefix="domain", tags=["domain"])` and `app.include_router(router)` to split reasoners into separate files. Tags merge between router and per-decorator. **Note:** `prefix="clauses"` auto-namespaces reasoner IDs as `clauses_<name>` — call them as `app.call(f"{app.node_id}.clauses_<name>", ...)`.
+
+### 3.
Tags on the entry reasoner + +The public entry reasoner is decorated with `tags=["entry"]` so it surfaces in the discovery API. Tags are free-form (not reserved); use domain tags for internal reasoners. + +## Hard rejections — refuse these without negotiation + +| ❌ Rejected pattern | ✅ AgentField alternative | +|---|---| +| Direct HTTP between reasoners (`httpx.post(...)`) | `await app.call(f"{app.node_id}.X", ...)` — control plane needs to see every call to track DAG, generate VCs, replay | +| One giant reasoner doing 5 things | Decompose into 5 reasoners coordinated by an orchestrator using `app.call` + `asyncio.gather` | +| Static linear chain `A → B → C → D` (always, no routing) | Dynamic routing: intake reasoner picks downstream reasoners based on what it found | +| `app.ai(prompt=full_50_page_doc)` | `@app.reasoner` that loops `app.ai` per chunk, OR `app.ai(tools=[...])` with explicit tool calls | +| Unbounded `while not confident: app.ai(...)` | Hard cap: `for _ in range(MAX_ROUNDS): ...` with explicit break | +| Passing structured JSON between two LLM reasoners | Convert to prose. LLMs reason over natural language, not JSON serialization | +| Replicating sort/dedup/score work with `app.ai` | `@app.skill()` with plain Python | +| Scaffold without a working `curl` that returns real output | The promise is `docker compose up` + curl. Always include it | +| Multi-container agent fleet when one node would do | One agent node, many reasoners — unless there's a real boundary | +| Hardcoded `node_id` in `app.call("financial-reviewer.X", ...)` | `app.call(f"{app.node_id}.X", ...)` — survives `AGENT_NODE_ID` rename | +| Hardcoded model | `model=os.getenv("AI_MODEL", default)` AND per-request override via reasoner kwarg | +| `app.ai()` schema with no `confident` field and no fallback | Schema must include `confident: bool`, call site checks it and escalates | +| `app.harness(provider="claude-code")` in a default scaffold | Default container has no `claude` CLI. 
Use `app.ai(tools=[...])` or a chunked-loop reasoner. See "Harness availability gate" | +| `input_schema=` or `output_schema=` parameter on `@app.reasoner` | These don't exist. Schemas come from type hints | +| `app.serve()` in `__main__` | `app.run()` — auto-detects CLI vs server mode | + +When the user explicitly demands a rejected pattern, name the rejection, explain *why* in one sentence, propose the AgentField alternative, and only build it their way after they've confirmed they understand the tradeoff. Add a `# NOTE: User requested X over canonical Y` comment. + +## Rationalization counters & red flags + +These thoughts mean STOP. If you notice any of them, re-read the linked reference and reconsider. + +| Thought / symptom | Reality / re-read | +|---|---| +| "Quick demo, I'll skip the architecture" | The skill exists to be stronger than a chain. Weak demo proves nothing | +| "I'll pass JSON between two reasoners" | LLMs reason over prose. Strings between LLMs, JSON only for code | +| "One big `analyze()` reasoner is fewer files" | Decompose. Granularity is the forcing function for parallelism. `choosing-primitives.md` | +| "I'll skip the CLAUDE.md / README" | They're how the next coding agent extends without breaking it. Always generate | +| "I'll ask 5 questions to be safe" | One question. State assumptions. Iterate | +| "Curl is enough, skip discovery API" | Discovery API tells you in 2s which step actually failed. `verification.md` | +| "I need stateful tool-using → `app.harness()`" | NO. `app.harness()` is external coding-agent CLI delegation AND requires the CLI in the container. Use `app.ai(tools=[...])` or a chunked-loop reasoner | +| "I'll add `app.harness(provider='claude-code')` for the deep reasoning step" | The default Python container has no `claude` CLI. The scaffold will crash on first run. Read "Harness availability gate" | +| "I'll add `input_schema=` to the decorator" | That param doesn't exist. 
Schemas come from type hints |
+| ".ai() for a 50-page document" | `app.ai(tools=[...])` or a chunked-loop reasoner. `choosing-primitives.md` |
+| "Static `for` loop of LLM calls, no routing" | Add dynamic routing or admit AgentField isn't justified. `architecture-patterns.md` |
+| "Skipping `python3 -m py_compile` and `docker compose config`" | Always run. `scaffold-recipe.md` |
+| "I'll write `import requests` to call the other reasoner" | Use `app.call(f"{app.node_id}.X", ...)`. `choosing-primitives.md` |
+| "I'll use `app.serve()` in main" | Use `app.run()`. Auto-detects CLI vs server |
+
+## Output contract (every build)
+
+The final message to the user MUST contain these sections, in this order, in a clean copy-pasteable format. The whole point is that a first-time user can read the message top to bottom and, within 60 seconds, have the system running and a working curl in another terminal.
+
+### 1. What was scaffolded
+
+Generated file tree with absolute paths.
+
+### 2. Architecture sketch
+
+4–6 bullets: what each reasoner does, who calls whom, where the dynamic routing happens, where the safety guardrails fire.
+
+### 3. Assumptions made
+
+5–10 bullets — the things you inferred without asking.
+
+### 4. 🚀 Run it (3 commands)
+
+```bash
+cd <project-dir>
+cp .env.example .env   # then paste your OPENROUTER_API_KEY into .env
+docker compose up --build
+```
+
+Wait until you see `agent registered` in the logs (~30–90 seconds first run).
+
+### 5. 🌐 Open the UI
+
+After the stack is up, open these URLs in your browser:
+
+| URL | What it shows |
+|---|---|
+| **http://localhost:8080/ui/** | AgentField control plane web UI — live workflow DAG, reasoner discovery, execution history, verifiable credential chains |
+| **http://localhost:8080/api/v1/discovery/capabilities** | JSON: every reasoner registered with the control plane (proves your build deployed) |
+| **http://localhost:8080/api/v1/health** | Health check |
+
+### 6.
✅ Verify the build (in another terminal)
+
+```bash
+# 1. Control plane up?
+curl -fsS http://localhost:8080/api/v1/health | jq
+
+# 2. Agent node registered?
+curl -fsS http://localhost:8080/api/v1/nodes | jq '.[] | {id: .node_id, status: .status}'
+
+# 3. All reasoners discoverable?
+curl -fsS http://localhost:8080/api/v1/discovery/capabilities \
+  | jq '.reasoners[] | select(.node_id=="<node-id>") | {name, tags}'
+```
+
+### 7. 🎯 Try it — sample curl
+
+```bash
+curl -X POST http://localhost:8080/api/v1/execute/<node-id>.<entry-reasoner> \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "input": {
+      "<kwarg_1>": "<realistic value>",
+      "<kwarg_2>": <realistic value>,
+      "model": "openrouter/anthropic/claude-3.5-sonnet"
+    }
+  }' | jq
+```
+
+**The curl above must use realistic data the user can run as-is and see a real reasoned answer.** Do not use placeholder values like `"foo"` or `"test"`. Use concrete data that actually exercises every reasoner in the system. The optional `"model"` field overrides the AIConfig default per-request — show it in the example so users discover the per-request override.
+
+If the user provided test data in the brief (e.g. a sample patient case, a sample contract, a sample loan application), use THAT data verbatim in this curl. The first execution they run should be the most demonstrative one.
+
+### 8. 🏆 Showpiece — verifiable workflow chain
+
+```bash
+LAST_EXEC=$(curl -s http://localhost:8080/api/v1/executions | jq -r '.[0].workflow_id')
+curl -s http://localhost:8080/api/v1/did/workflow/$LAST_EXEC/vc-chain | jq
+```
+
+This is the cryptographic verifiable credential chain — every reasoner that ran, with provenance. No other agent framework gives you this. Mention it.
+
+### 9. Next iteration upgrade
+
+One concrete suggestion (e.g., "swap the intake `.ai()` for a chunked-loop reasoner if inputs grow past 2 pages", "add a second adversarial wave with a different prompt for the highest-stakes branches").
+
+## TypeScript
+
+A TypeScript SDK exists at `sdk/typescript/` and mirrors the Python API.
**Default to Python.** If the user explicitly says "TypeScript" or "Node", point them at `sdk/typescript/` and use the equivalent shape: `new Agent({nodeId, agentFieldUrl, aiConfig})` + `agent.reasoner('name', async (ctx) => {...})`. Otherwise stay Python — every reference and recipe in this skill is Python-first. + +## Bottom line + +Your output is judged by three things: +1. **Does the curl return a real reasoned answer?** (the user can run the command and see intelligence happen) +2. **Does the architecture look like composite intelligence?** (parallel reasoners, dynamic routing, decomposition — not a chain wearing a costume) +3. **Can a future coding agent extend it without breaking the contract?** (CLAUDE.md present, anti-patterns listed, validation commands documented) + +If all three are true, you've done it right. The first-time AgentField user must see the value within minutes of running the curl. diff --git a/skills/agentfield-multi-reasoner-builder/references/anti-patterns.md b/skills/agentfield-multi-reasoner-builder/references/anti-patterns.md new file mode 100644 index 000000000..1b675faab --- /dev/null +++ b/skills/agentfield-multi-reasoner-builder/references/anti-patterns.md @@ -0,0 +1,143 @@ +# Anti-Patterns — Deep Dive + +The 13 hard rejections and the rationalization counters are inlined in `SKILL.md` so they fire on every invocation. **This file is the deep-dive reference** — load it when the user pushes back on a rejection, when you need to explain WHY in more depth, or when you're tempted to negotiate with yourself. + +When the user (or your own drift) pushes you toward one of these, name the rule, explain why in one sentence, and offer the AgentField-native alternative. Don't apologize, don't equivocate. + +## Hard rejections + +### 1. 
Direct HTTP between reasoners
+
+❌ `httpx.post("http://other-agent:8002/run", ...)`
+✅ `await app.call(f"{app.node_id}.other_reasoner", ...)`
+
+**Why:** The control plane needs to see every call to track the workflow DAG, generate verifiable credentials, replay executions, and apply observability. Direct HTTP makes the system invisible.
+
+---
+
+### 2. One giant reasoner doing 5 things
+
+❌ `async def review_everything(doc): ...` (200 lines, 4 LLM calls inside)
+✅ Decompose into 5 reasoners that the orchestrator coordinates with `app.call` and `asyncio.gather`.
+
+**Why:** Granular decomposition is the forcing function for parallelism, observability, replayability, and quality. A monolithic reasoner is just a script with extra steps.
+
+---
+
+### 3. Static linear chain where the path depends on discoveries
+
+❌ `intake → analyze → score → report` (always, in this order, regardless of intake)
+✅ Intake routes to different downstream reasoners based on what it found. If risk is high, spawn a deep-dive reasoner. If complexity is low, skip the adversary.
+
+**Why:** Dynamic routing IS the meta-level intelligence that distinguishes AgentField from chain frameworks. A static chain can be written in 30 lines of LangChain.
+
+---
+
+### 4. `.ai()` on a long document
+
+❌ `await app.ai(prompt=full_50_page_contract, schema=Result)`
+✅ A `@app.reasoner` that loops `app.ai()` per chunk, or `app.ai(tools=[...])` with `read_section` / `lookup_definition` tools. Reach for `app.harness()` only when the availability gate in SKILL.md is satisfied.
+
+**Why:** `.ai()` without tools is single-shot. It cannot adapt, navigate, or escalate. Stuffing a long doc into the prompt either truncates silently, blows the context window, or produces shallow answers because the model never reads past page 3.
+
+---
+
+### 5. Unbounded loops
+
+❌ `while not confident: result = await app.ai(...)`
+✅ `for _ in range(MAX_ROUNDS): ...` with a hard cap and an explicit break condition.
+
+**Why:** "Keep going until confident" is how you get a $400 bug report. Every loop has a cap. Period.
+
+---
+
+### 6.
Structured JSON shoved into another LLM as "context" + +❌ `await app.ai(user=str(previous_findings.model_dump()), ...)` +✅ `await app.ai(user=format_findings_as_prose(previous_findings), ...)` + +**Why:** LLMs reason over natural language, not over JSON serialization. Structured output between code and a reasoner is correct. Structured output between two reasoners is a smell — convert it to prose with the relevant context. + +--- + +### 7. Replicating programmatic work with an LLM + +❌ `await app.ai(prompt="Sort these 50 items by score", ...)` +✅ `sorted(items, key=lambda x: x.score, reverse=True)` + +**Why:** You are paying for intelligence. Sorting is not intelligence. If a `for` loop or a sort function would do it, do it. Save the LLM calls for things that previously required a human expert. + +--- + +### 8. Scaffold without a working `curl` + +❌ "Here are the files; you can figure out how to test it." +✅ A README with the exact verification ladder (health → nodes → capabilities → execute) and a curl that returns a real reasoned answer. + +**Why:** The promise is `docker compose up` + curl. If the user can't run those two commands and see real output, the build failed regardless of how nice the architecture looks on paper. + +--- + +### 9. Multi-container agent fleet when one node would do + +❌ Five Docker services for "research agent", "writer agent", "editor agent", "fact-checker agent", "publisher agent" +✅ ONE agent node with five reasoners. Same orchestration capability, 5× less ops surface. + +**Why:** Reasoners are cheaper than containers. Use multiple containers only when there's a real boundary (separate teams, separate language runtimes, separate scaling profiles, separate trust domains). Otherwise, one node with many reasoners is the right shape. + +--- + +### 10. 
Hardcoded model strings
+
+❌ `ai_config=AIConfig(model="gpt-4o")`
+✅ `ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/anthropic/claude-3.5-sonnet"))` AND accept a `model` parameter on the entry reasoner that propagates via `app.call(..., model=model)`.
+
+**Why:** Users need to swap models per-request to A/B test without rebuilding the container. Make the model dynamic at three layers: env default, container override, per-request override.
+
+---
+
+### 11. Hardcoded `node_id` in `app.call`
+
+❌ `await app.call("financial-reviewer.score", ...)`
+✅ `await app.call(f"{app.node_id}.score", ...)`
+
+**Why:** When the user renames the node via `AGENT_NODE_ID`, hardcoded calls break. Always reference your own reasoners through `app.node_id`.
+
+---
+
+### 12. `.ai()` with no `confident` flag and no fallback
+
+❌ Schema is `{decision: str, reason: str}` and the call site doesn't validate.
+✅ Schema is `{decision: str, reason: str, confident: bool}` and the call site checks `if not result.confident: escalate_to_harness()`.
+
+**Why:** Every `.ai()` has a failure mode. A failed `.ai()` that propagates a confidently-wrong answer is the single most expensive bug an AgentField system can ship.
+
+---
+
+## Rationalization counters
+
+When you (or the user) start producing one of these, recognize it and refuse:
+
+| Rationalization | Counter |
+|---|---|
+| "Just for the demo, a chain is fine" | The demo is the proof. A weak demo proves nothing. |
+| "The LLM is smart enough to handle the whole document in one call" | The LLM is 0.3-grade. The architecture is 0.8-grade. Don't mix them up. |
+| "I'll add the harness later if it doesn't work" | You'll never know it doesn't work because the .ai() will silently truncate. Start with a chunked-loop reasoner or `app.ai(tools=[...])` — reach for harness only if the availability gate passes. |
+| "Routing is overkill, the workflow is always the same" | Then the workflow doesn't justify AgentField. Tell the user honestly. |
+| "I'll skip the curl smoke test, the user will figure it out" | The user invoked a skill.
The skill's whole point is they don't have to figure it out. | +| "The CLAUDE.md is bureaucratic, the code is self-documenting" | Code documents WHAT. CLAUDE.md documents WHY this is the architecture and what NOT to undo. The next agent needs both. | +| "Two grooming questions is barely anything" | One question. The point is to feel magical to the first-time user. Infer the rest. | +| "I'll skip the discovery API check, I trust the build" | A curl that hangs at 30s tells you nothing about which step failed. Discovery API tells you in 2s. | +| "I'll ship the JSON directly to the next reasoner, it's cleaner" | Cleaner for you. Worse for the LLM. Convert to prose. | +| "More containers means better separation" | More containers means more YAML, more network hops, more failure modes. Use one node unless you have a real reason. | + +## When the user explicitly demands a rejected pattern + +Some users will insist. Honor that — but only after you've named the rejection, explained why in one sentence, and they've confirmed they understand the tradeoff. Then build it their way and add a comment in the code: + +```python +# NOTE: User explicitly requested static chain over dynamic routing despite +# the canonical AgentField pattern being dynamic. See README "Tradeoffs" section. +``` + +The point is not to be a tyrant — it's to refuse drift. Conscious choices are fine. Drift is not. diff --git a/skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md b/skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md new file mode 100644 index 000000000..d83fc4bed --- /dev/null +++ b/skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md @@ -0,0 +1,243 @@ +# Architecture Patterns — The 8 AgentField Compositions + +These are battle-tested patterns from real AgentField systems (`sec-af`, `af-swe`, `contract-af`, `af-deep-research`, `reactive-atlas`). 
Pick one, compose two, or invent your own — but never default to a static linear chain. + +For each pattern: when to use it, the shape, and a real-system reference. + +--- + +## 1. Parallel Hunters + Signal Cascade + +**Shape:** +``` +input ──┬──> hunter_A ──┐ + ├──> hunter_B ──┼──> findings_pool ──> downstream + ├──> hunter_C ──┘ + └──> hunter_D +``` + +**When:** Any problem with multiple independent analysis dimensions that can be examined concurrently. Each hunter is a specialist that knows about ONE dimension deeply. + +**Reference:** `examples/sec-af/` — parallel strategy hunters analyzing SEC filings; `examples/contract-af/` — parallel clause analysts (IP / liability / non-compete / data / termination). + +**Code shape:** +```python +@app.reasoner() +async def review(document: str) -> dict: + findings = await asyncio.gather(*[ + app.call(f"{app.node_id}.{h}", document=document) + for h in ["profitability_hunter", "liquidity_hunter", "risk_hunter", "efficiency_hunter"] + ]) + return await app.call(f"{app.node_id}.synthesizer", findings=findings) +``` + +**Common mistake:** Making the hunters do "everything" each. Each hunter is a NARROW specialist. If hunters overlap heavily, you decomposed wrong. + +--- + +## 2. HUNT → PROVE Adversarial Tension + +**Shape:** +``` +input ──> hunters ──> candidate findings ──> provers ──> verified findings + ↑ + adversary tries to disprove each one +``` + +**When:** Any problem where false positives are catastrophic — security, legal, compliance, medical, financial. + +**Reference:** `examples/sec-af/` — vulnerability hunters → exploit provers; `examples/contract-af/` — clause analysts → adversary reviewer. + +**Why it works:** Hunters are biased toward sensitivity (find everything). Provers are biased toward specificity (refuse anything unproven). The tension between them is the intelligence — neither alone produces a good answer. 
+
+```python
+@app.reasoner()
+async def adversarial_review(input: str) -> dict:
+    candidates = await app.call(f"{app.node_id}.hunter_pool", input=input)
+    verified = await asyncio.gather(*[
+        app.call(f"{app.node_id}.prover", finding=f, original=input)
+        for f in candidates
+    ])
+    return {"verified": [v for v in verified if v["proven"]]}
+```
+
+---
+
+## 3. Streaming Pipeline (asyncio.Queue)
+
+**Shape:**
+```
+upstream ──emits──> queue ──consumes──> downstream
+                           (starts working before upstream finishes)
+```
+
+**When:** Downstream reasoners can start working on partial results — and waiting for the full upstream batch wastes time and misses interaction effects.
+
+**Reference:** `examples/sec-af/` — HUNT→PROVE streaming; `examples/contract-af/` — analysts → cross-reference + adversary streaming.
+
+```python
+findings_queue = asyncio.Queue()
+
+async def producer(items):
+    for item in items:
+        finding = await app.call(f"{app.node_id}.analyze", item=item)
+        await findings_queue.put(finding)
+    await findings_queue.put(None)  # sentinel
+
+async def consumer():
+    seen = []
+    while (f := await findings_queue.get()) is not None:
+        # Check this finding against everything seen so far
+        await app.call(f"{app.node_id}.cross_ref", new=f, prior=seen)
+        seen.append(f)
+```
+
+---
+
+## 4. Meta-Prompting (Harnesses Spawning Harnesses)
+
+**Shape:**
+```
+parent_harness ──discovers X──> crafts a SPECIFIC prompt ──spawns──> child_harness ──> findings
+       ↑                                                                    │
+       └────────────────── integrates findings ─────────────────────────────┘
+```
+
+**When:** The investigation path depends on what gets discovered. You cannot pre-define which sub-reasoners will run, because you don't know yet what's there.
+
+**Reference:** `examples/contract-af/` — clause analysts spawning definition-impact analyzers when they discover a referenced defined term; cross-reference resolver spawning combination deep-dives.
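Every spawn decision in this pattern needs the depth cap the hard rule below demands, or a discovered term that references another term recurses forever. A minimal, framework-free sketch of that guard — `MAX_DEPTH`, `spawn_child`, and the `fake_call` stub are all illustrative, not AgentField API; in a real build the call would be `await app.call(f"{app.node_id}.<child>", ...)`:

```python
import asyncio

MAX_DEPTH = 2  # absolute cap on meta-spawn recursion — illustrative value

async def spawn_child(call, node_id: str, reasoner: str, depth: int, **kwargs) -> dict:
    """Guard every meta-spawn point: refuse to recurse past MAX_DEPTH."""
    if depth >= MAX_DEPTH:
        return {"capped": True, "depth": depth}
    # In a real build `call` is `app.call`; the child receives its own depth.
    return await call(f"{node_id}.{reasoner}", depth=depth + 1, **kwargs)

# Stub standing in for app.call: a child that immediately tries to spawn
# its own child, to show the cap firing.
async def fake_call(target: str, **kwargs) -> dict:
    return await spawn_child(fake_call, "demo-node", "term_impact_analyzer", kwargs["depth"])

result = asyncio.run(spawn_child(fake_call, "demo-node", "term_impact_analyzer", depth=0))
print(result)  # {'capped': True, 'depth': 2}
```

Threading `depth` as a plain reasoner kwarg mirrors how the skill already threads `model` — no framework support needed.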
+ +**This is the pattern that no framework chain can replicate.** It's pure dynamic intelligence. + +```python +@app.reasoner() +async def clause_analyst(clause: str, context: str) -> dict: + initial = await app.harness( + goal=f"Analyze this clause: {clause}", + tools=["read_section", "lookup_definition"], + max_iterations=10, + ) + + # The harness discovered a defined term that needs deeper analysis. + # Craft a SPECIFIC prompt for a child harness at runtime. + if initial.discovered_terms: + for term in initial.discovered_terms: + sub_prompt = ( + f"You are analyzing the cascading impact of the defined term '{term}' " + f"in the context of clause: {clause}. " + f"Read every section that references '{term}' and determine if any " + f"interaction creates risk. Return: affected_sections, risk_level, rationale." + ) + sub = await app.call( + f"{app.node_id}.term_impact_analyzer", + prompt=sub_prompt, + term=term, + ) + initial.term_impacts.append(sub) + return initial.model_dump() +``` + +**Hard rule:** every meta-spawn point has a depth cap. + +--- + +## 5. Three Nested Control Loops (Inner / Middle / Outer) + +**Shape:** + +| Loop | Scope | Trigger | Cap | +|---|---|---|---| +| **Inner** | Per-reasoner self-adaptation | Found a reference, escalation needed | `max_follows=3`, `max_escalations=1` | +| **Middle** | Cross-reasoner deep-dives | Critical combination, hidden interaction | `max_spawns=5` | +| **Outer** | Pipeline-wide coverage | Coverage gate detects a gap | `max_iterations=3` | + +**When:** Long-running analysis where you can't predict upfront how deep you need to go. Coverage matters and edge cases are dangerous. + +**Reference:** `examples/af-swe/` — inner coding loop / middle sprint loop / outer factory loop; `examples/contract-af/` — analyst loop / cross-ref loop / coverage loop. + +**Hard rule:** every loop has an absolute cap. "Keep going until confident" is how you get a $400 bug report. + +--- + +## 6. 
Fan-Out → Filter → Gap-Find → Recurse + +**Shape:** +``` +seed ──> [generate N candidates] ──> [filter to top K] ──> [gap analysis] + │ + ├─ gaps found ──> recurse with new seeds + └─ no gaps ──> done +``` + +**When:** Comprehensive coverage problems where you don't know the shape of the answer upfront — research, due diligence, audits, literature reviews. + +**Reference:** `examples/af-deep-research/` — recursive research with quality-driven loops. + +```python +@app.reasoner() +async def deep_research(question: str, max_rounds: int = 3) -> dict: + seeds = [question] + all_findings = [] + for round_num in range(max_rounds): + findings = await asyncio.gather(*[ + app.call(f"{app.node_id}.investigator", seed=s) for s in seeds + ]) + all_findings.extend(findings) + gaps = await app.call(f"{app.node_id}.gap_finder", findings=all_findings, original=question) + if not gaps["gaps"]: # app.call returns a dict, not a Pydantic instance + break + seeds = gaps["gaps"] # next round's seeds + return await app.call(f"{app.node_id}.synthesizer", findings=all_findings) +``` + +--- + +## 7. Factory Control Loops + +**Shape:** Three nested loops for long-running multi-step execution with adaptive replanning. + +``` +outer (factory) ──> sprint planner ──> goals +middle (sprint) ──> task executor ──> tasks +inner (coding) ──> per-task agent ──> code + │ + └─ fails ──> outer replan +``` + +**When:** Multi-step execution that needs to replan based on intermediate results — code generation, document production, migration execution, multi-step research. + +**Reference:** `examples/af-swe/`. + +--- + +## 8. Reactive Document Enrichment + +**Shape:** +``` +event source (DB change stream / webhook) ──> enrichment pipeline ──> output +``` + +**When:** Work is triggered by data arriving — incidents, PRs, contracts on upload, form submissions, telemetry events. + +**Reference:** `examples/reactive-atlas/` — MongoDB change streams → enrichment agents. + +**The point:** the engine is domain-agnostic; the config defines the domain.
The same pattern handles "new contract uploaded → enrich → score → route" as it handles "new incident filed → triage → assign → notify". + +--- + +## How to pick a pattern (or compose your own) + +1. **What triggers the work?** Event stream → pattern 8. Direct API call → patterns 1–7. +2. **Is the input large/navigable?** Yes → harness-first, consider meta-prompting (pattern 4). +3. **Multiple independent analysis dimensions?** Yes → parallel hunters (pattern 1). +4. **False positives expensive?** Yes → add HUNT→PROVE (pattern 2) on top of pattern 1. +5. **Downstream can start before upstream finishes?** Yes → streaming (pattern 3). +6. **Coverage matters and you can't predict shape upfront?** Pattern 6. +7. **Multi-round adaptive execution?** Pattern 5 or 7. +8. **The investigation path depends on discoveries?** Pattern 4 (meta-prompting), always. + +Most strong systems compose 2–3 patterns. Example: contract-af = parallel hunters (1) + HUNT→PROVE (2) + streaming (3) + meta-prompting (4) + nested loops (5). + +## When NONE of these fit + +Then the use case probably doesn't justify AgentField at all — it's a one-shot LLM call wearing a costume. Tell the user honestly. diff --git a/skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md b/skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md new file mode 100644 index 000000000..6db1efef0 --- /dev/null +++ b/skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md @@ -0,0 +1,502 @@ +# Choosing Primitives — Philosophy + Real SDK Surface + +The most consequential architectural decision in any AgentField build. This file is one read because the philosophy IS the primitive choice — you cannot decide between `.ai()` and a `@reasoner` loop without first knowing what kind of reasoning you're trying to amplify. Read top to bottom before writing code. 
+ +--- + +## Part 1 — Composite Intelligence (the "why") + +A single LLM call reasons at ~0.3–0.4 on a normalized scale where 1.0 is human-expert. **You cannot prompt your way to 0.8.** You can architect your way there. + +A well-composed system of ten 0.3-grade reasoners can outperform a single 0.4-grade monolith by 5–10× on complex tasks — because the architecture itself encodes intelligence about how to break down problems, allocate cognitive work, combine partial solutions, and stay coherent across steps. + +You are not a prompt engineer. You are a **systems architect**. Your job is to design the cognitive architecture; the LLMs are interchangeable parts. + +### What this is NOT + +- ❌ A single super-intelligent generalist that solves anything in one call +- ❌ A linear chain of LLM calls dressed up with "agent" branding (LangChain, CrewAI, AutoGen patterns) +- ❌ A pile of unbounded autonomous agents "thinking" their way to an answer +- ❌ A tool to orchestrate tools (that's what a script is for) + +### What it IS + +- ✅ A network of **specialized cognitive functions**, each tightly scoped +- ✅ **Architecture patterns** that elevate collective reasoning above any individual call +- ✅ **Decomposed atomic reasoning units** that can run in parallel +- ✅ **Guided autonomy**: agents have freedom inside a tight scope, not unbounded freedom +- ✅ **Dynamic routing**: the path adapts to what gets discovered, not a hardcoded DAG +- ✅ **Verifiable provenance**: every claim traces to its source + +### The five foundational principles + +**1. Granular decomposition is mandatory.** No complex problem is solved by a single agent in a single step. The constraint is a forcing function that produces parallelism, observability, and quality. If your "AI agent" is one 200-line function, you decomposed wrong. + +**2. Guided autonomy, never unbounded.** A reasoner has freedom in HOW it accomplishes its goal, but **zero freedom** in WHAT the goal is. 
The orchestrator is a CEO, not a babysitter — it sets objectives and verifies outcomes. + +**3. Dynamic, state-responsive orchestration.** The flow of control is not static. Agent A's output determines what subsystem B even looks like. This is the **meta-level** intelligence that distinguishes AgentField from chain frameworks: the chain shape itself is intelligence. + +**4. Contextual fidelity & verifiable provenance.** The orchestrator is a context broker. Every reasoner gets exactly what it needs — no more, no less. Every claim carries a citation key that propagates to the final output. + +**5. Asynchronous parallelism.** Decompose to parallelize. If your reasoner pipeline runs sequentially, your decomposition is wrong. Use `asyncio.gather` aggressively. + +### The intelligence test + +The whole point is **intelligence**. If something can be done programmatically — sorting, scoring, deduping, filtering, regex extraction, schema validation — **do it in code** (`@app.skill()`). LLMs are reserved for things that previously required a human expert: judgment, discovery, synthesis, routing decisions on ambiguous data, recognizing patterns that don't have clean rules. + +If your "AI agent" is doing work a Python `for` loop could do, you're burning money and intelligence on the wrong layer. + +### Why AgentField, not LangChain or CrewAI + +LangChain and CrewAI give you **tools to build chains**. 
AgentField gives you a **control plane** that: + +- Routes every inter-reasoner call through a server you can introspect, replay, and audit +- Tracks the live workflow DAG so you can see the system's reasoning shape +- Generates W3C verifiable credentials for every execution (cryptographic audit trail) +- Lets reasoners spawn sub-reasoners with dynamic prompts at runtime (meta-prompting) +- Enforces a clean separation between agent nodes (deployable units) and reasoners (cognitive units) +- Gives you per-call model overrides so a parent reasoner can route different sub-tasks to different LLMs + +You are not building "an agent." You are deploying a **reasoning system** as production infrastructure. + +--- + +## Part 2 — The Real Python SDK Surface (the "how") + +Signatures here come from reading `sdk/python/agentfield/agent.py`, `router.py`, and `tool_calling.py` directly. Many docs describe an idealized API — this section is what actually works. + +## The five primitives + +| Primitive | What it really does | When to use | +|---|---|---| +| `@app.reasoner()` | Registers a function as a reasoner with the control plane. The function body is yours — make as many `app.ai()` / `app.call()` calls as you want | Wrap **every cognitive unit** in your system | +| `@app.skill()` | Registers a deterministic function. No LLM | Pure transforms, scoring, parsing, dedup, validation — anything code can do | +| `app.ai(...)` | Single call OR multi-turn tool-using LLM call (when `tools=` is passed). Returns text or a Pydantic schema | Classification, routing, structured analysis, **and** stateful tool-using reasoning when you give it tools | +| `app.call(target, **kwargs)` | Calls another reasoner/skill THROUGH the control plane. Tracks the workflow DAG | All inter-reasoner traffic. Never use direct HTTP | +| `app.harness(prompt, provider=...)` | **Delegates to an external coding-agent CLI** (claude-code, codex, gemini, opencode). 
Returns a `HarnessResult` | When you need a real coding agent to read/write files, run shell commands, or execute a non-trivial coding task as part of your pipeline | + +## What `app.ai()` actually accepts + +```python +result = await app.ai( + *args, # positional: text, urls, file paths, bytes, dicts, lists (multimodal) + system: str | None, # system prompt + user: str | None, # user prompt (alternative to positional) + schema: type[BaseModel] | None, # Pydantic class for structured output + model: str | None, # PER-CALL model override (e.g. "gpt-4o", "openrouter/anthropic/claude-3.5-sonnet") + temperature: float | None, + max_tokens: int | None, + stream: bool | None, + response_format: "auto" | "json" | "text" | dict | None, + tools: list | str | None, # tool definitions for tool-calling, OR "discover" to auto-discover + context: dict | None, + memory_scope: list[str] | None, # ["workflow", "session", "reasoner"] etc. + **kwargs, # provider-specific extras +) +``` + +**Critical things most coding agents miss:** +- `model=` is per-call. You can override the AIConfig default on any specific call. **Always** thread `model` through from the entry reasoner so the user can A/B test models per request. +- `tools=` makes `app.ai()` a multi-turn tool-using LLM. This is how you build "stateful reasoning agents" — not via `app.harness()`. Pass `tools="discover"` to auto-discover available tools, or pass a list of tool definitions. +- `memory_scope=["workflow", "session", "reasoner"]` injects relevant memory state into the prompt automatically. +- `schema=` returns a validated Pydantic instance, not a dict. Call `.model_dump()` to serialize. 
+ +## What `app.harness()` actually accepts + +```python +result = await app.harness( + prompt: str, # task description + schema: type[BaseModel] | None, # optional structured output + provider: "claude-code" | "codex" | "gemini" | "opencode" | None, + model: str | None, # override the provider's default model + max_turns: int | None, # iteration cap + max_budget_usd: float | None, # cost cap + tools: list[str] | None, # which tools the coding agent is allowed to use + permission_mode: "plan" | "auto" | None, + system_prompt: str | None, + env: dict[str, str] | None, + cwd: str | None, + **kwargs, +) +# Returns HarnessResult with .text, .parsed (validated schema), .result +``` + +**Use harness when:** you need a real coding agent (Claude Code, Codex, Gemini CLI) to perform a task that requires actual file I/O, shell access, or multi-step coding. Example: a "fix-this-failing-test" reasoner spawns a Claude Code harness to actually edit the test file. + +**Do NOT use harness for:** in-process stateful LLM reasoning over a document. That's `app.ai(..., tools=[...])`. Harness is heavyweight — it spawns a subprocess running an entire agent CLI. + +## What `app.call()` actually does + +```python +result: dict = await app.call( + target: str, # "node_id.reasoner_name" + *args, # positional args (auto-mapped to target's params for local calls) + **kwargs, # keyword args passed to the target reasoner +) +``` + +**Always returns a `dict`** — even if the target reasoner returns a Pydantic model. Convert manually: +```python +result_dict = await app.call(f"{app.node_id}.score", text=passage) +result = ScoreResult(**result_dict) +``` + +**Critical:** always reference reasoners as `f"{app.node_id}.reasoner_name"` so renaming the node via `AGENT_NODE_ID` env doesn't break the system. Hardcoding the node ID is a bug waiting to happen. + +**Workflow tracking:** every `app.call` is recorded in the control plane's workflow DAG. 
Direct HTTP between reasoners bypasses this and is forbidden. + +## The decision tree (real, not aspirational) + +``` +What is this reasoner doing? + +├─ Pure deterministic transform (sort, parse, dedup, score-with-formula)? +│ → @app.skill() (no LLM, free, replayable) +│ +├─ Single classification with ≤4 flat fields, input fits comfortably in ~2k tokens? +│ → app.ai(system, user, schema=FlatModel) (with confident: bool, with fallback) +│ +├─ Stateful reasoning where the LLM needs to call tools, search, iterate? +│ → app.ai(system, user, tools=[...]) (multi-turn tool-using mode) +│ +├─ Long input (a document, a transcript, a corpus) that needs navigation? +│ → @app.reasoner() that does LOOPED app.ai() calls with chunking, +│ OR app.ai(..., tools=["read_section", ...]) if you've defined the tools, +│ OR pre-process with a @app.skill() chunker then fan-out via asyncio.gather +│ +├─ Need an actual coding agent to write/edit files / run shell? +│ → app.harness(prompt, provider="claude-code", tools=[...]) +│ +└─ Composing multiple reasoners? + → @app.reasoner() that uses app.call() and asyncio.gather +``` + +**The bias:** decompose into many small `@app.reasoner()` units. Use `app.ai()` with explicit prompts. Use `tools=` when you need tool-calling. Reserve `app.harness()` for when you literally need a coding agent in the loop. + +## The model-propagation pattern (mandatory in every build) + +The user must be able to swap models per request without rebuilding the container. Implement it like this in **every** generated entry reasoner: + +```python +@app.reasoner(tags=["entry"]) +async def review_financials( + company_name: str, + business_summary: str, + financial_snapshot: dict, + analyst_question: str = "Should we proceed?", + model: str | None = None, # ← per-request model override +) -> dict: + # 1. Use it in app.ai + plan = await app.ai( + system="You are a financial intake router.", + user=f"...", + schema=IntakePlan, + model=model, # ← propagate + ) + + # 2. 
Pass it to child reasoners via app.call + reviews = await asyncio.gather(*[ + app.call( + f"{app.node_id}.{axis}_reviewer", + company_name=company_name, + business_summary=business_summary, + model=model, # ← propagate + ) + for axis in plan.focus_areas + ]) + + # 3. Each child reasoner accepts and uses model the same way +``` + +And in every child reasoner: +```python +@app.reasoner() +async def profitability_reviewer( + company_name: str, + business_summary: str, + model: str | None = None, # ← accept it +) -> dict: + review = await app.ai( + system="You are a profitability reviewer.", + user=f"...", + schema=TrackReview, + model=model, # ← use it + ) + return review.model_dump() +``` + +The user can now pick the model per request: +```bash +curl -X POST http://localhost:8080/api/v1/execute/financial-reviewer.review_financials \ + -H 'Content-Type: application/json' \ + -d '{ + "company_name": "Acme", + "business_summary": "...", + "financial_snapshot": {...}, + "model": "openrouter/openai/gpt-4o" + }' +``` + +If `model` is omitted, the AIConfig default from the env var `AI_MODEL` is used. **This pattern is non-negotiable.** Every generated build must support per-request model override. + +## The router pattern (organize reasoners across files) + +When a build has more than ~4 reasoners, split them into router files. + +**Important detail from the SDK:** `AgentRouter(prefix="...")` **auto-namespaces** the reasoner IDs. A router with `prefix="clauses"` containing a reasoner `analyze_ip` registers as `clauses_analyze_ip`. Call it as `app.call(f"{app.node_id}.clauses_analyze_ip", ...)`. 
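The single-segment namespacing rule is mechanical enough to pin down as a pure function. A sketch of the described behavior — not the actual `router.py` implementation:

```python
def registered_id(prefix: str, func_name: str) -> str:
    """AgentRouter ID namespacing as described: '{prefix}_{func}', or the bare name when prefix is empty."""
    return f"{prefix}_{func_name}" if prefix else func_name

assert registered_id("clauses", "analyze_ip") == "clauses_analyze_ip"
assert registered_id("", "analyze_ip") == "analyze_ip"
```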
+ +**Three prefix variations and what they do:** + +| Constructor call | Reasoner `analyze_ip` registers as | Use when | +|---|---|---| +| `AgentRouter(prefix="clauses")` | `clauses_analyze_ip` | You want grouped namespacing | +| `AgentRouter(prefix="")` (or omit `prefix`) | `analyze_ip` | You want raw function names — **the canonical default** | +| `@router.reasoner(name="explicit")` overrides any prefix | `explicit` | You want full control over the registered ID | + +**Canonical default:** use `AgentRouter(prefix="", tags=["domain"])` so reasoner IDs match function names and your `app.call(f"{app.node_id}.func_name", ...)` calls stay readable. Only use `prefix=` when you have ID collisions across routers. + +`reasoners/finance.py`: +```python +from agentfield import AgentRouter +from pydantic import BaseModel + +# prefix="" → no auto-namespace; tags merge with per-decorator tags +router = AgentRouter(prefix="", tags=["finance"]) + +class TrackReview(BaseModel): + axis: str + score: int + rationale: str + +@router.reasoner() +async def profitability_reviewer( + company_name: str, + business_summary: str, + model: str | None = None, +) -> TrackReview: # type-hinted return drives schema + return await router.ai( # router.ai proxies to the attached agent + system="You are a profitability reviewer.", + user=f"Company: {company_name}\n{business_summary}", + schema=TrackReview, + model=model, + ) +``` + +`main.py`: +```python +import os +from agentfield import Agent, AIConfig +from reasoners.finance import router as finance_router +from reasoners.risk import router as risk_router + +app = Agent( + node_id=os.getenv("AGENT_NODE_ID", "financial-reviewer"), + ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/anthropic/claude-3.5-sonnet")), + dev_mode=True, +) + +app.include_router(finance_router) +app.include_router(risk_router) + +# Entry reasoner stays in main.py +@app.reasoner(tags=["entry"]) +async def review_financials(...): ... 
+ +if __name__ == "__main__": + app.run() # auto-detects CLI vs server +``` + +**Router facts (verified against `router.py`):** +- `AgentRouter` proxies *every* agent attribute via `__getattr__` — so `router.ai()`, `router.call()`, `router.memory`, `router.harness()` all work identically to `app.ai()` etc. +- Tags merge: `AgentRouter(tags=["finance"])` + `@router.reasoner(tags=["scoring"])` → reasoner has BOTH tags. +- `prefix` auto-namespaces IDs as `{prefix_segments}_{func_name}`. +- The canonical pattern is one router per domain file; one `Agent(...)` + multiple `include_router(...)` calls in `main.py`. + +**When to use a router vs. keep everything in main.py:** +- ≤ 4 reasoners → main.py only +- 5–10 reasoners → split by domain into 2–3 router files +- > 10 reasoners → consider whether you've decomposed correctly OR whether you need multiple nodes + +## Tags + +Tags are **free-form** metadata attached to reasoners (verified against the control plane source — there are no reserved tag names). They surface in the discovery API: + +```bash +curl -s http://localhost:8080/api/v1/discovery/capabilities \ + | jq '.reasoners[] | select(.tags[]? == "entry")' +``` + +**Conventions used by AgentField examples (not enforced, just convention):** +- `"entry"` — mark the public-facing entry reasoner. Always tag it. +- A domain tag (e.g., `"finance"`, `"risk"`, `"intake"`) — for filtering in discovery and the UI. + +**Hard rule:** every entry reasoner gets `tags=["entry"]` so the user can find it via discovery without reading the source. + +## `Agent(...)` constructor — verified signature + +From `sdk/python/agentfield/agent.py:464`: + +```python +app = Agent( + node_id: str, # REQUIRED. e.g. "customer-triage" + agentfield_server: str | None = None, # control plane URL. env: AGENTFIELD_SERVER. 
default http://localhost:8080 + version: str = "1.0.0", + description: str | None = None, + tags: list[str] | None = None, # agent-LEVEL tags (distinct from per-reasoner tags) + author: dict[str, str] | None = None, + ai_config: AIConfig | None = None, # default AIConfig.from_env(). Pass AIConfig(model="...") to set default + harness_config: HarnessConfig | None = None, + memory_config: MemoryConfig | None = None, + dev_mode: bool = False, # verbose logs + DEBUG level. Always set True in scaffolds + callback_url: str | None = None, # else AGENT_CALLBACK_URL env, else auto-detect + auto_register: bool = True, + vc_enabled: bool | None = True, # generate verifiable credentials for executions + api_key: str | None = None, # X-API-Key header to control plane + # ... other auth/DID parameters +) +``` + +**Critical things scaffolds get wrong:** +- The parameter is **`agentfield_server`** (not `agentfield_url`, not `server_url`). Verified in `agent.py:464`. +- Read it from env: `agentfield_server=os.getenv("AGENTFIELD_SERVER", "http://localhost:8080")`. +- Set `dev_mode=True` in every scaffold so the user sees what's happening on first run. +- `Agent` subclasses FastAPI — you can use any FastAPI feature on it directly. + +### `AGENT_CALLBACK_URL` env var + +The agent node needs a URL the control plane can use to call back into it (for sync execution dispatch). In Docker Compose this is `http://<service-name>:<agent-port>`. The SDK reads it from `AGENT_CALLBACK_URL`. You set it in the compose file: + +```yaml +environment: + AGENT_CALLBACK_URL: http://customer-triage:8001 +``` + +If you don't set it, the SDK auto-detects, which works locally but is unreliable inside containers. **Always set it explicitly in the compose file** to the in-network DNS name of the service.
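The rule is simple but worth making explicit: an explicit env var always beats auto-detection. A sketch of that precedence (the helper is illustrative; the SDK's real auto-detection logic is more involved):

```python
import os

def effective_callback_url(autodetected: str) -> str:
    """AGENT_CALLBACK_URL, when set, always beats whatever the SDK auto-detects."""
    return os.environ.get("AGENT_CALLBACK_URL", autodetected)

# Unset: the auto-detected address (fine locally, flaky inside containers) is used.
os.environ.pop("AGENT_CALLBACK_URL", None)
assert effective_callback_url("http://192.168.1.20:8001") == "http://192.168.1.20:8001"

# Set explicitly, as the compose file should do: the in-network DNS name wins.
os.environ["AGENT_CALLBACK_URL"] = "http://customer-triage:8001"
assert effective_callback_url("http://192.168.1.20:8001") == "http://customer-triage:8001"
```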
+ +## `@app.reasoner()` real signature + +Based on `agent.py:1612`, the decorator only accepts these parameters: + +```python +@app.reasoner( + path: str | None = None, # default /reasoners/{func_name} + name: str | None = None, # override the registered ID + tags: list[str] | None = None, + *, + vc_enabled: bool | None = None, # inherits agent default + require_realtime_validation: bool = False, +) +``` + +**Important things it does NOT accept:** `input_schema=`, `output_schema=`, `description=`, `version=`. **Schemas are derived from type hints.** The function's parameter type hints become the input schema; the return type hint becomes the output schema. + +```python +class IntakeResult(BaseModel): + contract_type: str + confident: bool + +@app.reasoner(tags=["entry"]) +async def classify(text: str, model: str | None = None) -> IntakeResult: + return await app.ai(system="...", user=text, schema=IntakeResult, model=model) +``` + +## `app.run()` is the entry point + +`agent.py:4194` confirms `app.run()` auto-detects whether to launch in CLI mode (`af call`, `af list`, `af shell`) or server mode. **Always use `app.run()` in `__main__`**, not `app.serve()`: + +```python +if __name__ == "__main__": + app.run(host="0.0.0.0", port=int(os.getenv("PORT", "8001")), auto_port=False) +``` + +## Memory scopes (one paragraph) + +```python +await app.memory.set(key, value, scope="global"|"agent"|"session"|"run") +await app.memory.get(key, default=None, scope=...) +``` + +**global** = cross-everything; **agent** = this node, all sessions; **session** = one conversation; **run** = single workflow execution. Use `session` for chat-like workflows, `run` for per-execution scratch state, `agent` for cached embeddings, `global` for shared knowledge. + +## The `confident` flag pattern (mandatory for every `.ai()` gate) + +Every `.ai()` schema includes a `confident: bool` field, and the call site checks it. 
**Three valid fallback options exist** when `confident` is false — pick the right one for the situation: + +| Fallback option | When to use | Cost | +|---|---|---| +| **(a) Escalate to a deeper reasoner** | The system has another `@app.reasoner()` that can handle the harder case (chunked-loop, multi-call, more context) | Extra call | +| **(b) Deterministic safe default (RECOMMENDED for safety/regulated systems)** | The use case has a "safe" terminal state — `REFER_TO_HUMAN`, `REJECT`, `RETRY_LATER`, `NEEDS_HUMAN_REVIEW`. Return a Pydantic instance hard-coded to that safe state | Free | +| **(c) Escalate to `app.harness()`** | ONLY when `recommendation.harness_usable == true` from `af doctor`, AND the Dockerfile installs the CLI, AND there's a startup `shutil.which()` check | Heavy | + +**Default for regulated, safety-critical, or judgment-based systems: option (b).** A confident-wrong automated decision is almost always worse than a referral. Build `fallback_*` constructors in `helpers.py` that return Pydantic instances hard-coded to the safe-default state. 
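Option (b) composes upward: at the orchestrator, the governance override is plain Python over the specialists' outputs. A framework-free sketch — a dataclass stands in for the Pydantic schema, and all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class SpecialistReview:
    axis: str
    verdict: str      # "APPROVE" | "REJECT" | "NEEDS_HUMAN_REVIEW"
    confident: bool

def final_decision(reviews: list[SpecialistReview]) -> str:
    """Deterministic governance override: one non-confident specialist forces referral."""
    if any(not r.confident for r in reviews):
        return "REFER_TO_HUMAN"
    if all(r.verdict == "APPROVE" for r in reviews):
        return "AUTO_APPROVE"
    return "REJECT"

reviews = [
    SpecialistReview("profitability", "APPROVE", confident=True),
    SpecialistReview("liquidity", "APPROVE", confident=False),  # gate not confident
]
assert final_decision(reviews) == "REFER_TO_HUMAN"
```

The intelligence stays in the specialists; the safety stays in this `if` statement.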
+ +### Pattern (a) — escalate to a deeper reasoner + +```python +class IntakeDecision(BaseModel): + contract_type: str + complexity: str + confident: bool + +result = await app.ai(system="...", user="...", schema=IntakeDecision, model=model) + +if not result.confident or result.complexity == "high": + # Escalate to a deeper reasoner that can navigate more context + result_dict = await app.call( + f"{app.node_id}.deep_intake", + document=full_document, + partial=result.model_dump(), + model=model, + ) + result = DeepIntakeResult(**result_dict) +``` + +### Pattern (b) — deterministic safe default + +```python +# In helpers.py: +def fallback_specialist_review(*, axis: str, reason: str) -> SpecialistReview: + """Safe default Pydantic instance returned when an .ai() gate isn't confident.""" + return SpecialistReview( + axis=axis, + verdict="NEEDS_HUMAN_REVIEW", + confidence_score=0.0, + confident=False, + rationale=reason, + decisive_fact_ids=[], + ) + +# In specialists.py: +review = await router.ai(system="...", user="...", schema=SpecialistReview, model=model) +if not review.confident: + return fallback_specialist_review( + axis=axis, + reason=f"{axis} reviewer was not confident enough to automate a terminal view.", + ) +return review +``` + +This is the dominant pattern in real builds. The orchestrator at the top of the pipeline uses **deterministic governance overrides** (plain Python `if` statements) to convert any non-confident specialist into a `REFER_TO_HUMAN` final decision. The intelligence stays in the LLM; the safety stays in the code. + +Every `.ai()` gate has a `confident` flag and one of these three fallback paths. No exceptions. + +## What about long-document navigation? + +The philosophy doc talks about "navigating documents" with a harness that has tools. 
In the actual SDK, you have three real options: + +**Option A — `app.ai(tools=[...])` with custom tool definitions.** Define tools (e.g., `read_section(section_id)`, `search_document(query)`) the LLM can call iteratively. The `app.ai()` call becomes multi-turn automatically. + +**Option B — Loop yourself in a `@app.reasoner()`.** Chunk the document with a `@app.skill()`, fan out `app.ai()` calls per chunk via `asyncio.gather`, then synthesize. + +**Option C — `app.harness(provider="claude-code", tools=["read", "grep"])`.** Spawn a real coding agent CLI to navigate the document on the filesystem. Most powerful, also the most expensive. + +Pick A for in-process tool-calling, B for embarrassingly-parallel chunked analysis, C for "I need a real agent to do file system work". + +## The cost-of-being-wrong test + +Before choosing `.ai()` without tools, ask: **"What does it cost the system if this call gets the wrong answer?"** + +- Cheap to be wrong (a routing hint that gets corrected) → plain `.ai()` with `confident` flag +- Expensive to be wrong (a verdict the system commits to) → `.ai(tools=[...])` for iterative reasoning, or decompose into multiple narrower `.ai()` calls with adversarial verification + +The financial cost of more reasoner calls is real but bounded. The reputation cost of a confidently-wrong answer propagating through your pipeline is unbounded. diff --git a/skills/agentfield-multi-reasoner-builder/references/project-claude-template.md b/skills/agentfield-multi-reasoner-builder/references/project-claude-template.md new file mode 100644 index 000000000..415252390 --- /dev/null +++ b/skills/agentfield-multi-reasoner-builder/references/project-claude-template.md @@ -0,0 +1,118 @@ +# Project `CLAUDE.md` Template + +Every generated AgentField project ships with a `CLAUDE.md` at its root. This file is the contract that any *future* coding agent (including a fresh Claude Code session next week) must follow when extending the project. 
+ +Without this file, the next agent will refactor the system back into a CrewAI-style chain. With it, the architecture survives. + +## Required structure + +Generate a `CLAUDE.md` with these exact sections, customized to the specific build. + +```markdown +# CLAUDE.md — <project-name> + +## Mission + +<one-paragraph statement of what this system does and who calls it> + +External callers should hit `<node_id>.<entry_reasoner>` first. + +## Architecture at a glance + +- **Pattern(s):** <pattern names from architecture-patterns.md> +- **Topology:** one AgentField node (`<node_id>`) with N reasoners +- **Entry reasoner:** `<entry_reasoner>` — orchestrates the full pipeline +- **Internal reasoners:** + - `<reasoner_name>` (`.ai()` / `.harness()`) — <one-line role> + - `<reasoner_name>` (`.ai()` / `.harness()`) — <one-line role> + - … +- **Inter-reasoner traffic:** all internal calls go through `app.call("<node_id>.X", ...)`. Never direct HTTP. + +## Why this architecture (not a chain) + +<2–3 sentences explaining what makes this composite intelligence rather than a linear chain. Cite the dynamic-routing decisions, the parallelism, the harness/ai split. This is the "do not undo this" justification for the next agent.> + +## Primitive selection rules (binding) + +- `.ai()` is used ONLY at gates and routers (currently: `<gate reasoner names>`). Every `.ai()` here has a `confident` field and a `.harness()` fallback. +- `.harness()` is used for `<harness reasoner names>`. Each has hard caps on iterations and cost. +- `@app.skill()` is used for deterministic transforms (`<skill names>`). +- New reasoners default to `.harness()`. To use `.ai()`, prove the input fits in <2k tokens AND output fits in 4 flat fields AND there's a fallback. + +## Data-flow rules + +- Structured JSON between code and reasoners (when code branches on the result). +- Natural-language strings between reasoners that feed each other context. +- Hybrid only when both consumers exist. Do not use hybrid by default. + +## Model selection + +- Default model: `<model-string>` via `AI_MODEL` env. +- The entry reasoner accepts an OPTIONAL `model` parameter in the request body. When present, it propagates to all child reasoners via `app.call(..., model=model)`. This lets users A/B models per request without redeploying.
+- Provider keys: `OPENROUTER_API_KEY` (default), `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` — any LiteLLM-compatible model works. + +## Runtime contract + +- Local runtime is `docker-compose.yml` in this directory. +- One container: `agentfield/control-plane:latest` (local mode, SQLite/BoltDB). +- One container: this Python agent node, built from `Dockerfile`. +- The agent node depends on the control plane being healthy before it boots. +- Default ports: control plane `8080`, agent node `8001`. Override via env if needed. + +## Delivery contract — every change must preserve + +- ✅ A runnable `docker compose up --build` (validated with `docker compose config`) +- ✅ A valid `.env.example` listing all required keys +- ✅ A `README.md` with the exact verification ladder (health → nodes → capabilities → execute) +- ✅ The canonical curl smoke test in the README — body shape `{"input": {...kwargs...}}`, returns a real reasoned answer not a stub +- ✅ This `CLAUDE.md` + +## Validation commands (run after every change) + +```bash +python3 -m py_compile main.py +docker compose config > /dev/null +docker compose up --build -d +sleep 8 +curl -fsS http://localhost:8080/api/v1/health +curl -fsS http://localhost:8080/api/v1/nodes | jq '.[].node_id' +curl -fsS http://localhost:8080/api/v1/discovery/capabilities | jq '.reasoners | map(select(.node_id=="")) | map(.name)' +# the canonical curl from README.md +docker compose down +``` + +If any of those fail, the change is not done. + +## Anti-patterns (reject these) + +- ❌ Direct HTTP between reasoners. All internal traffic uses `app.call`. +- ❌ Replacing a `.harness()` with `.ai()` "for speed" without proving the input fits. +- ❌ Adding a new reasoner without registering it through the entry reasoner OR through a router that's included in `main.py`. +- ❌ Removing the smoke test from README "because it's obvious." +- ❌ Hardcoding `node_id` in `app.call`. Always use `f"{app.node_id}.X"` so renaming the node doesn't break the system. 
+- ❌ Hardcoding the model. Always read from env (`AI_MODEL`) and accept a per-request override. +- ❌ Replacing the dynamic routing in `` with a static `for` loop. +- ❌ Unbounded loops or recursive harness spawns without explicit caps. +- ❌ Removing the `confident` field from a `.ai()` schema without replacing the validation check. + +## Extension points (where to safely add work) + +<3–5 bullets specific to the architecture. Examples:> +- Add a new analysis dimension: create a new `@app.reasoner()` that takes the same inputs as the existing dimension reviewers, and add it to the dispatch list in ``. +- Switch from `.ai()` intake to `.harness()` intake when inputs grow past 2 pages: replace `intake_router` with `intake_navigator` per `references/primitives.md` in the skill. +- Add provenance: have each dimension reviewer return citation keys, then add a `provenance_collector` that aggregates them into the final response. + +## Owner + +This system was scaffolded by the `agentfield-multi-reasoner-builder` skill. To rebuild, run that skill again with the same use case description. To extend, follow this CLAUDE.md. +``` + +## Generation rules + +When you write the actual `CLAUDE.md` for a build: + +1. **Fill in every ``.** Do not ship a CLAUDE.md with `` still in it. +2. **List every reasoner you actually generated** with its primitive (`.ai()` or `.harness()`) and one-line role. +3. **Justify the architecture** in 2–3 sentences. The "Why this architecture" section is the most important part — it tells the next agent what NOT to undo. +4. **Customize the extension points** to the specific build. Don't copy the generic examples. +5. **Match the validation commands to the actual reasoners and node ID.** No `` placeholders in the final file. 
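The data-flow rules in this template (structured JSON when code branches on a result, natural-language prose when another LLM consumes it) are the rules future agents most often violate, so a minimal plain-Python illustration may help. The function and field names below are hypothetical, not part of the contract:

```python
# Sketch: the same review results travel in two forms.
# Deterministic code branches on the structured form; a downstream
# LLM reasoner receives a prose rendering instead of raw JSON.

def render_review_prose(reviews: list[dict]) -> str:
    """Render structured reviews as natural language for the next LLM."""
    return "\n".join(
        f"- [{r['axis']}] score={r['score']}: {r['rationale']}"
        for r in reviews
    )

def needs_escalation(reviews: list[dict]) -> bool:
    """Code path: branch on structured JSON, never on prose."""
    return any(r["score"] <= 3 for r in reviews)

reviews = [
    {"axis": "credit", "score": 7, "rationale": "stable income history"},
    {"axis": "fraud", "score": 2, "rationale": "mismatched addresses"},
]
print(needs_escalation(reviews))
print(render_review_prose(reviews))
```

Hybrid flows keep both forms alive only when both consumers actually exist, per the rule above.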
diff --git a/skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md b/skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md new file mode 100644 index 000000000..5368a7a08 --- /dev/null +++ b/skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md @@ -0,0 +1,502 @@ +# Scaffold Recipe — Exact Files to Generate + +This is the file-by-file generation contract. Every AgentField multi-reasoner build produces ALL of these files. No omissions, no "I'll add that later." + +## Where it goes + +``` +examples/python_agent_nodes// +├── main.py +├── reasoners.py # if the system has > 4 reasoners +├── Dockerfile +├── docker-compose.yml +├── .env.example +├── .dockerignore +├── requirements.txt +├── README.md +└── CLAUDE.md +``` + +`` is lowercase-hyphenated, derived from the use case (e.g., `financial-reviewer`, `clinical-triage`, `sec-filing-auditor`). + +## Step 0: Use `af init` if it speeds you up, then layer on top + +```bash +cd /Users/santoshkumarradha/Documents/agentfield/code/platform/agentfield +go run ./control-plane/cmd/af init --language python --defaults --non-interactive +``` + +This produces `main.py`, `reasoners.py`, `requirements.txt`, `README.md`, `.gitignore`. You then **rewrite `main.py` and `reasoners.py`** with your real architecture and **add** the Docker / compose / CLAUDE.md / .env files. + +If `af init` gets in the way, just generate the files directly. The output matters, not the path. + +## File 1: `main.py` + +```python +""". 
+ +Entry reasoner: `.` +Architecture: +""" +import asyncio +import os +from typing import Any + +from agentfield import Agent, AIConfig +from pydantic import BaseModel, Field + + +# ---- Schemas (type-hinted; AgentField derives them automatically) ---- + +class IntakePlan(BaseModel): + focus_areas: list[str] + confident: bool # MANDATORY on every .ai gate + +class TrackReview(BaseModel): + axis: str + score: int = Field(ge=1, le=10) + rationale: str + +class FinalVerdict(BaseModel): + overall: str + strengths: list[str] + risks: list[str] + + +# ---- Agent ---- + +app = Agent( + node_id=os.getenv("AGENT_NODE_ID", ""), + agentfield_server=os.getenv("AGENTFIELD_SERVER", "http://localhost:8080"), + ai_config=AIConfig( + model=os.getenv("AI_MODEL", "openrouter/anthropic/claude-3.5-sonnet"), + ), + dev_mode=True, +) + + +# ---- Internal reasoners ---- + +@app.reasoner() +async def intake_router( + payload: dict, + model: str | None = None, # propagate model +) -> IntakePlan: + plan = await app.ai( + system="You classify the input and pick the smallest set of analysis tracks needed.", + user=str(payload), + schema=IntakePlan, + model=model, + ) + if not plan.confident or not plan.focus_areas: + # FALLBACK: escalate (could be a chunked-loop reasoner or a deeper pass) + plan.focus_areas = ["default_a", "default_b"] + return plan + + +@app.reasoner() +async def dimension_reviewer( + payload: dict, + axis: str, + model: str | None = None, +) -> TrackReview: + return await app.ai( + system=f"You are a {axis} reviewer. 
Score and rationalize.", + user=f"Axis: {axis}\nPayload: {payload}", + schema=TrackReview, + model=model, + ) + + +# ---- Entry reasoner (the public surface) ---- + +@app.reasoner(tags=["entry"]) +async def review( + payload: dict, + model: str | None = None, # per-request model override +) -> dict: + plan_dict = await app.call( + f"{app.node_id}.intake_router", + payload=payload, + model=model, + ) + plan = IntakePlan(**plan_dict) + + # Parallel fan-out across selected dimensions + review_dicts = await asyncio.gather(*[ + app.call( + f"{app.node_id}.dimension_reviewer", + payload=payload, + axis=axis, + model=model, + ) + for axis in plan.focus_areas + ]) + + # Synthesize via another LLM reasoner — pass prose, not JSON + review_prose = "\n".join( + f"- [{r['axis']}] score={r['score']} — {r['rationale']}" + for r in review_dicts + ) + verdict = await app.ai( + system="You are the lead reviewer. Synthesize the dimension findings into a verdict.", + user=review_prose, + schema=FinalVerdict, + model=model, + ) + + return { + "plan": plan.model_dump(), + "reviews": review_dicts, + "verdict": verdict.model_dump(), + } + + +if __name__ == "__main__": + # app.run() auto-detects CLI vs server mode (verified at sdk/python/agentfield/agent.py:4194) + app.run(host="0.0.0.0", port=int(os.getenv("PORT", "8001")), auto_port=False) +``` + +**Hard requirements:** +- `node_id`, `agentfield_server`, `model` all read from env with sensible defaults +- `auto_port=False` so the port is deterministic and the curl works +- Exactly ONE entry reasoner with `tags=["entry"]` for discovery +- Schemas are derived from **type hints** — do NOT pass `input_schema=` or `output_schema=` to `@app.reasoner` (those parameters do not exist) +- Every `.ai()` gate has a `confident: bool` field in its schema and a fallback path +- Every reasoner that calls `.ai()` accepts an optional `model: str | None = None` parameter and threads it through `app.ai(model=model)` +- The entry reasoner accepts `model` and 
propagates it via `app.call(..., model=model)` to all children +- All inter-reasoner calls use `app.call(f"{app.node_id}.X", ...)` — never hardcoded node IDs +- Never `requests.post()` to another reasoner. Use `app.call` +- Use `app.run()` in `__main__`, not `app.serve()` + +## File 2: the `reasoners/` package (canonical layout for non-trivial systems) + +When the system has more than 4 reasoners, **use this canonical 4-file router package layout**. It separates concerns cleanly and makes the build extensible without breaking the orchestrator: + +``` +/ +├── main.py # Agent + entry reasoner + orchestration +└── reasoners/ + ├── __init__.py # Re-exports the routers so main.py can include them + ├── models.py # Pydantic schemas — every BaseModel used by every reasoner + ├── helpers.py # Plain Python utilities: math, prose renderers, fact registry, fallbacks + ├── specialists.py # AgentRouter for the parallel "hunter" / specialist reasoners + └── committee.py # AgentRouter for the orchestration-layer reasoners (intake router, adversarial reviewer, synthesizer) +``` + +**`reasoners/__init__.py`:** +```python +from .committee import router as committee_router +from .specialists import router as specialists_router + +__all__ = ["committee_router", "specialists_router"] +``` + +**`reasoners/models.py`** — every Pydantic schema in one place. Includes the input application schema, the per-specialist review schema (with `confident: bool` mandatory), the routing plan schema, the adversarial review schema, the final decision schema, and any deterministic-metric schemas. Keeping these in one file makes type-checking trivial and prevents circular imports between routers. 
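As a sketch of what a `reasoners/models.py` following these rules might contain (schema names and fields are illustrative only, not prescribed):

```python
# reasoners/models.py (sketch): every schema lives here, with no imports
# from the router modules, so circular imports are impossible.
from pydantic import BaseModel, Field


class RoutingPlan(BaseModel):
    """Output of the intake router's .ai() gate."""
    focus_areas: list[str]
    confident: bool  # mandatory on every .ai() gate schema


class SpecialistReview(BaseModel):
    """One specialist's verdict on a single analysis dimension."""
    axis: str
    score: int = Field(ge=1, le=10)  # deterministic bounds, enforced by pydantic
    rationale: str
    confident: bool
```

Routers then import only from `models.py` (and `helpers.py`), never from each other.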
+ +**`reasoners/helpers.py`** — plain Python (NOT decorated with `@app.skill`) for: deterministic math (DTI, payment amount, employment-gap calc), `render_specialist_review()` and similar **prose renderers** that convert Pydantic instances to natural-language strings before passing them to another LLM, the fact-registry builder for citation IDs, and **fallback constructors** like `fallback_specialist_review(axis, reason)` that produce safe-default Pydantic instances when an `.ai()` call returns `confident=False`. + +> **Why plain helpers vs `@app.skill()`?** `@app.skill()` makes a function discoverable and callable through `app.call`. Use it when the deterministic function is something the system might call from a reasoner OR something an external caller might want to invoke directly through the control plane. For purely internal helpers used inside reasoner bodies (math, prose rendering, schema construction), plain Python is cleaner — no decorator overhead, no registration ceremony. Promote a helper to `@app.skill()` only when you actually want to call it via `app.call`. + +**`reasoners/specialists.py`** — one `AgentRouter(prefix="", tags=["specialist"])`, one `@router.reasoner` per analysis dimension. Often these specialists share a `_run_specialist_review()` private helper that takes a system prompt + focus prompt as parameters, runs `router.ai(...)`, and applies the `confident=False` fallback. This keeps each specialist body to ~5 lines of configuration. + +**`reasoners/committee.py`** — one `AgentRouter(prefix="", tags=["committee"])` with the orchestration-layer reasoners: `intake_router` (decides which specialists to run), `adversarial_challenger` (the HUNT→PROVE counterpart), `committee_reconciler` (synthesizes specialists + adversarials → final decision). + +**`main.py`** does three things: +1. Construct `Agent(...)` with `node_id`, `agentfield_server`, `ai_config` +2. `app.include_router(committee_router)` and `app.include_router(specialists_router)` +3. 
Define the public **entry reasoner** with `tags=["entry"]` that orchestrates the full pipeline using `app.call(f"{app.node_id}.X", ...)` and `asyncio.gather` for parallel fan-out, plus deterministic governance overrides at the end + +**This is the layout that emerges naturally** when you decompose a real composite-intelligence system. If your build has fewer than 4 reasoners, keep everything in `main.py` and skip the package. If it has more, use this layout. Do not invent a different layout. + +### Smaller systems (≤4 reasoners): keep everything in `main.py` + +For trivial builds, skip the package and inline everything. Use `@app.reasoner()` directly on `app`. Don't create a router with one reasoner in it. + +## File 3: `Dockerfile` + +**Use `af init --docker` to generate this. The command produces the universal shape below — do not customize.** + +```dockerfile +FROM python:3.11-slim + +ENV PYTHONDONTWRITEBYTECODE=1 \ + PYTHONUNBUFFERED=1 + +WORKDIR /app + +COPY requirements.txt /app/requirements.txt +RUN pip install --no-cache-dir --upgrade pip && \ + pip install --no-cache-dir -r /app/requirements.txt + +COPY . /app/ + +EXPOSE 8001 + +CMD ["python", "main.py"] +``` + +**Key properties of this Dockerfile (verified against `af init --docker`):** +- **Universal — no repo coupling.** The build context is the project directory itself (`docker-compose.yml` uses `context: .`), so the same scaffold works whether the project lives inside the agentfield repo at `examples/python_agent_nodes//` or completely standalone at `/tmp/my-build/`. +- The SDK is installed via `pip install -r requirements.txt`, where `requirements.txt` lists `agentfield`. **Do not** add `COPY sdk/python /tmp/python-sdk` — that's the old repo-coupled pattern, and it breaks for out-of-repo builds. +- `requirements.txt` must contain at least `agentfield` (one line). Add `pydantic>=2,<3` and any libraries the reasoners actually need. 
+ +## File 4: `docker-compose.yml` + +**Use `af init --docker` to generate this. The command produces the universal shape below — do not customize unless you have a specific reason.** + +```yaml +services: + control-plane: + image: agentfield/control-plane:latest + environment: + AGENTFIELD_STORAGE_MODE: local + AGENTFIELD_HTTP_ADDR: 0.0.0.0:8080 + ports: + - "${AGENTFIELD_HTTP_PORT:-8080}:8080" + volumes: + - agentfield-data:/data + healthcheck: + test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"] + interval: 3s + timeout: 2s + retries: 20 + + : + build: + context: . + dockerfile: Dockerfile + environment: + AGENTFIELD_SERVER: http://control-plane:8080 + AGENT_CALLBACK_URL: http://:8001 + AGENT_NODE_ID: ${AGENT_NODE_ID:-} + OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-} + OPENAI_API_KEY: ${OPENAI_API_KEY:-} + ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-} + GOOGLE_API_KEY: ${GOOGLE_API_KEY:-} + AI_MODEL: ${AI_MODEL:-openrouter/anthropic/claude-3.5-sonnet} + PORT: ${PORT:-8001} + ports: + - "${AGENT_NODE_PORT:-8001}:8001" + depends_on: + control-plane: + condition: service_healthy + restart: on-failure + +volumes: + agentfield-data: +``` + +**Build context is `.` (the project directory itself), not `../../..`.** This makes the scaffold portable to any location on disk. All four provider env vars are exposed with `:-` defaults so missing keys don't crash compose validation. 
+ +**Hard requirements:** + - Control plane has a healthcheck so the agent only starts after the control plane is ready + - Agent uses `depends_on: condition: service_healthy` (not just `depends_on: [control-plane]`) + - All four provider env vars (`OPENROUTER_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`) are exposed so the user can swap providers without editing compose + - Default model is OpenRouter Claude 3.5 Sonnet (most reliable for reasoning) but trivially overridable via `AI_MODEL` + - Port 8080 = control plane, port 8001 = agent node, never co-located + + ## File 5: `.env.example` + + ```bash + # Required: pick ONE provider + OPENROUTER_API_KEY=sk-or-v1-... + # OPENAI_API_KEY=sk-... + # ANTHROPIC_API_KEY=sk-ant-... + # GOOGLE_API_KEY=... + + # Model — must match the provider above + AI_MODEL=openrouter/anthropic/claude-3.5-sonnet + # AI_MODEL=gpt-4o + # AI_MODEL=anthropic/claude-3-5-sonnet-20241022 + + # Optional overrides + AGENT_NODE_ID= + AGENT_NODE_PORT=8001 + AGENTFIELD_HTTP_PORT=8080 + ``` + + ## File 6: `requirements.txt` + + ``` + # Everything the agent needs at runtime, including the SDK itself. + agentfield + pydantic>=2.0 + ``` + + Add libraries the reasoners actually use (httpx, beautifulsoup4, pdfplumber, etc.). Keep `agentfield` listed here: the Dockerfile installs the SDK via `pip install -r requirements.txt`, not from a local repo copy. 
+ +## File 7: `.dockerignore` + +``` +__pycache__ +*.pyc +.pytest_cache +.env +.venv +*.log +``` + +## File 8: `README.md` + +```markdown +# + +<2-sentence description.> + +## Architecture + +- **Entry reasoner:** `.` +- **Pattern(s):** +- **Reasoners:** + - `intake_router` — `.ai()` gate that classifies inputs and selects active dimensions + - `_reviewer` — analyzer for dimension A (parallel) + - `_reviewer` — analyzer for dimension B (parallel) + - `synthesizer` — combines dimension findings into a final verdict + +## Run + +```bash +cp .env.example .env +# edit .env and set OPENROUTER_API_KEY (or your provider of choice) +docker compose up --build +``` + +Wait until you see `agent registered` in the logs. + +## Verify (run in another terminal) + +```bash +# 1. Control plane is up +curl -fsS http://localhost:8080/api/v1/health | jq + +# 2. Agent node has registered +curl -fsS http://localhost:8080/api/v1/nodes | jq '.[] | {id: .node_id, status: .status}' + +# 3. All reasoners are discoverable (look for tags=["entry"]) +curl -fsS http://localhost:8080/api/v1/discovery/capabilities \ + | jq '.reasoners[] | select(.node_id=="") | {name, tags}' +``` + +## Run a real reasoned answer + +**Important:** the control plane wraps reasoner kwargs in an `input` field. Body shape is `{"input": {...kwargs...}}` — verified against `control-plane/internal/handlers/execute.go`. + +```bash +curl -X POST http://localhost:8080/api/v1/execute/. \ + -H 'Content-Type: application/json' \ + -d '{ + "input": { + "": "", + "": , + "model": "openrouter/anthropic/claude-3.5-sonnet" + } + }' | jq +``` + +The optional `"model"` field overrides the AIConfig default for THIS request. Try different models: + +```bash +# Same request, different model +curl -X POST http://localhost:8080/api/v1/execute/. 
\ + -H 'Content-Type: application/json' \ + -d '{"input": {"": "...", "model": "openrouter/openai/gpt-4o"}}' | jq +``` + +## Showpiece — see the cryptographic workflow trail + +```bash +LAST_EXEC=$(curl -s http://localhost:8080/api/v1/executions | jq -r '.[0].workflow_id') +curl -s http://localhost:8080/api/v1/did/workflow/$LAST_EXEC/vc-chain | jq +``` + +This is the verifiable credential chain — every reasoner that ran, with cryptographic provenance. No other agent framework gives you this. + +## Stop + +```bash +docker compose down +docker compose down --volumes # also clears local control-plane state +``` +``` + +## File 9: `CLAUDE.md` + +See `references/project-claude-template.md` for the template. Generate it specific to this build. + +## Generation order (do these in this order) + +1. Decide the architecture (pattern + reasoner roles + which are `.ai()` vs `.harness()`) +2. Create the directory `examples/python_agent_nodes//` +3. Write `main.py` with real reasoners (NOT a placeholder) +4. Write `requirements.txt`, `Dockerfile`, `.dockerignore` +5. Write `docker-compose.yml` +6. Write `.env.example` +7. Write `CLAUDE.md` (use the template from `references/project-claude-template.md`) +8. Write `README.md` with the actual curl payload for THIS use case +9. Validate (see next section) + +## Validation (every build) + +### Online validation (when Docker can pull images and you have a key) + +```bash +# 1. Python syntax — must pass +python3 -m py_compile examples/python_agent_nodes//main.py +# Plus any reasoner files if you split with routers: +python3 -m py_compile examples/python_agent_nodes//reasoners/*.py + +# 2. Compose file is valid +cd examples/python_agent_nodes/ +OPENROUTER_API_KEY=sk-or-v1-FAKE docker compose config > /dev/null + +# 3. Start the stack and run the smoke test +docker compose up --build -d +sleep 10 && curl -fs http://localhost:8080/api/v1/health +curl -X POST http://localhost:8080/api/v1/execute/. 
\ + -H 'Content-Type: application/json' \ + -d '{"input": {"...": "..."}}' +docker compose logs --tail=50 +docker compose down +``` + +### Offline validation (sandbox / CI / no docker pull) + +If the environment cannot pull `agentfield/control-plane:latest` or doesn't have a real provider key, you **still validate**. These are the static checks that count as "validated": + +```bash +# Syntax check +python3 -m py_compile examples/python_agent_nodes//main.py +python3 -m py_compile examples/python_agent_nodes//reasoners/*.py 2>/dev/null || true + +# Compose syntax check (no image pull required) +cd examples/python_agent_nodes/ +OPENROUTER_API_KEY=sk-or-v1-FAKE docker compose config > /dev/null +``` + +Then **run this visual-invariant checklist** against the generated files. Every box must be checked: + +- [ ] `app.run(...)` in `__main__` (NOT `app.serve(...)`) +- [ ] Entry reasoner has `tags=["entry"]` +- [ ] Every `app.ai(...)` call's schema includes a `confident: bool` field if used as a gate, AND the call site has a fallback path +- [ ] Every reasoner that calls `app.ai(...)` accepts `model: str | None = None` and threads `model=model` +- [ ] Entry reasoner accepts `model` and propagates via `app.call(..., model=model)` to every child +- [ ] All `app.call(...)` use `f"{app.node_id}.X"` — no hardcoded node IDs +- [ ] No `requests.post()` / `httpx.post()` between reasoners (use `app.call`) +- [ ] No `app.harness(provider="...")` unless the Dockerfile installs the CLI AND main.py has a startup `shutil.which()` check +- [ ] No `input_schema=` / `output_schema=` parameters on `@app.reasoner()` +- [ ] README curl uses body shape `{"input": {...kwargs...}}` (NOT raw kwargs at top level) +- [ ] `Agent(agentfield_server=os.getenv("AGENTFIELD_SERVER", ...))` — exact parameter name +- [ ] `AGENT_CALLBACK_URL` set in compose to the in-network DNS name (`http://:8001`) +- [ ] Control plane has a healthcheck and the agent service uses `condition: service_healthy` +- [ ] 
`auto_port=False` in `app.run()` so the port is deterministic +- [ ] CLAUDE.md exists with no `` tokens left in it +- [ ] `.env.example` lists `OPENROUTER_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` +- [ ] If reasoners are split across files: routers use `prefix=""` (or document the namespacing in the curl path) +- [ ] LLM-to-LLM context is passed as natural-language strings, not raw JSON dicts +- [ ] Returning `dict` from an orchestrator reasoner is fine — Pydantic model returns are also fine — both work because schemas come from type hints + +If any box fails, **fix before handing off**. A "scaffold that almost works" is worth zero. + +### Return-type note + +Orchestrator reasoners that return heterogeneous results (e.g. `{"plan": ..., "reviews": [...], "verdict": ...}`) should declare `-> dict` as the return type. Single-purpose reasoners that produce one validated result should declare `-> SomePydanticModel`. Both work — schemas are derived from the type hint either way. diff --git a/skills/agentfield-multi-reasoner-builder/references/verification.md b/skills/agentfield-multi-reasoner-builder/references/verification.md new file mode 100644 index 000000000..4b0a6163d --- /dev/null +++ b/skills/agentfield-multi-reasoner-builder/references/verification.md @@ -0,0 +1,108 @@ +# Verification — Prove the Build Is Real + +A scaffold that "looks right" but isn't actually wired up is worse than no scaffold. The control plane exposes a discovery API that lets you prove the system works in seconds. Use it. + +## The verification ladder (run all four, in order) + +```bash +# 1. Control plane health +curl -fsS http://localhost:8080/api/v1/health | jq + +# 2. Agent node has registered itself with the control plane +curl -fsS http://localhost:8080/api/v1/nodes | jq '.[] | {id: .node_id, status, last_seen}' + +# 3. 
Every reasoner you defined is discoverable +curl -fsS http://localhost:8080/api/v1/discovery/capabilities \ + | jq --arg slug "" '.reasoners[] | select(.node_id==$slug) | {name, tags, description}' + +# 4. The entry reasoner produces a real reasoned answer +# NOTE: control plane wraps kwargs in {"input": {...}} (verified at execute.go:1000) +curl -X POST http://localhost:8080/api/v1/execute/. \ + -H 'Content-Type: application/json' \ + -d '{ + "input": { + "": "", + "": , + "model": "openrouter/anthropic/claude-3.5-sonnet" + } + }' | jq +``` + +If any step fails, **do not hand off**. Diagnose and fix. + +## Common failures and fast diagnosis + +| Symptom | Likely cause | Fix | +|---|---|---| +| `/api/v1/health` hangs or refuses connection | Control plane container is still booting | Wait 5–10s, retry. If still failing, `docker compose logs control-plane` | +| `/api/v1/nodes` returns `[]` | Agent node hasn't registered. Network issue or agent crashed at boot | `docker compose logs ` — look for `OPENROUTER_API_KEY` missing, import errors, or `agent registered` | +| Node listed but no reasoners in `/discovery/capabilities` | The Python file imported, but the `@app.reasoner()` decorators didn't run (e.g., reasoners are in a router that wasn't included) | Verify `app.include_router(...)` is called in `main.py` before `app.run()` | +| Reasoners present but execute hangs | Reasoner is making an LLM call that's failing silently | `docker compose logs --follow` while running curl. Look for litellm errors | +| Execute returns 500 with "model not found" | `AI_MODEL` env var doesn't match the provider key you set | Check `.env` — `OPENROUTER_API_KEY` requires `openrouter/...` model names, etc. 
| +| Execute returns 200 but the output is empty/garbage | The reasoner ran but the architecture is wrong (e.g., `.ai()` got truncated input) | Look at logs to see what input each reasoner actually got | + +## Useful introspection endpoints + +| Endpoint | What it tells you | +|---|---| +| `GET /api/v1/health` | Control plane up | +| `GET /api/v1/nodes` | Which agent nodes have registered | +| `GET /api/v1/nodes/:node_id` | Details of one node | +| `GET /api/v1/discovery/capabilities` | All reasoners and skills across all nodes | +| `GET /api/v1/agentic/discover?q=` | Search the API catalog by keyword (use to find an endpoint you forgot) | +| `POST /api/v1/execute/:target` | Sync execute a reasoner. Body is the kwargs dict | +| `POST /api/v1/execute/async/:target` | Async execute, returns an execution_id | +| `GET /api/v1/executions/:id` | Status of an async execution | +| `GET /api/v1/did/workflow/:workflow_id/vc-chain` | Verifiable credential chain for an executed workflow (the AgentField superpower no other framework has) | + +## Inspect the live workflow DAG + +After running an execution, hit: + +```bash +# Get the most recent executions +curl -s http://localhost:8080/api/v1/executions | jq '.[0:3]' + +# Get the VC chain for one — this shows you the full reasoning DAG with cryptographic provenance +curl -s http://localhost:8080/api/v1/did/workflow//vc-chain | jq +``` + +This is the **single best demo** of why AgentField beats CrewAI: you get a cryptographic, replayable, introspectable record of every reasoner that ran, what it called, and what came back. Show the user this output in the handoff — it makes the "this is composite intelligence as production infrastructure" case for itself. + +## The smoke-test contract (every build) + +In the README, give the user EXACTLY these commands in this order. Do not abbreviate. Do not say "and so on." + +```bash +# After docker compose up, in another terminal: + +# 1. 
Health +curl -fsS http://localhost:8080/api/v1/health + +# 2. Node registered? +curl -fsS http://localhost:8080/api/v1/nodes | jq '.[].node_id' + +# 3. Reasoners discoverable? +curl -fsS http://localhost:8080/api/v1/discovery/capabilities | jq '.reasoners | map(select(.node_id=="")) | map(.name)' + +# 4. THE BIG ONE — run the entry reasoner with real data +# Body shape: {"input": {...kwargs...}} — kwargs are NEVER raw at the top level +curl -X POST http://localhost:8080/api/v1/execute/. \ + -H 'Content-Type: application/json' \ + -d '{"input": {"": "", "model": "openrouter/anthropic/claude-3.5-sonnet"}}' | jq + +# 5. (Optional showpiece) the full verifiable workflow chain +LAST_EXEC=$(curl -s http://localhost:8080/api/v1/executions | jq -r '.[0].workflow_id') +curl -s http://localhost:8080/api/v1/did/workflow/$LAST_EXEC/vc-chain | jq +``` + +## When you cannot run docker locally + +If the environment running the skill doesn't have Docker, you can still: + +1. `python3 -m py_compile main.py` — catches syntax errors +2. `docker compose config` — catches compose errors +3. Read the generated files back with `cat` to spot obvious issues +4. Provide the verification commands in the README as a checklist for the user to run themselves + +You **must** still validate the Python and the compose file syntactically. "I generated it but didn't check" is a failure mode. From 3ae3c86c93e7a2c2d472a1884c5285131ce0fe55 Mon Sep 17 00:00:00 2001 From: Santosh Date: Wed, 8 Apr 2026 13:32:22 +0530 Subject: [PATCH 2/4] fix(skill+cli): 5 bugs from end-to-end test + reasoner-as-API philosophy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Surfaced by the first end-to-end docker test of a codex-built medical-triage backend. Fixes 5 real bugs that hid behind py_compile + docker compose config validation, plus pushes the architecture philosophy from "flat orchestrator fans out specialists" to "deep DAG of reasoners as software APIs". 
## Bugs fixed 1. **Broken healthcheck — agentfield/control-plane:latest is distroless.** The image has no /bin/sh, no wget, no curl. The CMD-based healthcheck ["wget", "--quiet", ...] always failed, blocking every first build with "dependency failed to start: container is unhealthy". Drop the healthcheck entirely + switch depends_on to condition: service_started. The agent SDK already retries connection on startup. File: control-plane/internal/templates/docker/docker-compose.yml.tmpl 2. **Dead default model — openrouter/anthropic/claude-3.5-sonnet returns 404 from OpenRouter** (litellm.NotFoundError: No endpoints found for anthropic/claude-3.5-sonnet). Every previously generated example would crash on first real curl. Replace with openrouter/google/gemini-2.5-flash (verified working in the live test) across: - SKILL.md, all 6 reference files - control-plane/internal/cli/doctor.go (Recommendation block) - control-plane/internal/cli/init.go (--default-model default) - control-plane/internal/templates/templates.go (TemplateData doc comment) - control-plane/internal/templates/python/main.py.tmpl (env default) 3. **90s sync execute timeout undocumented.** The control plane has a hard 90-second timeout on POST /api/v1/execute/. Slow models (minimax-m2.7, Claude Sonnet, o1) and large fan-outs blow it. Generated systems would hit HTTP 400 {"error":"execution timeout after 1m30s"} with no guidance. Document the limit + the async fallback path (POST /api/v1/execute/async) in verification.md, plus point at gemini-2.5-flash as the recommended fast default. 4. **Discovery API curl shape was wrong everywhere.** The skill teaches `.reasoners[] | select(.node_id=="X") | .name` but the actual response is `.capabilities[].reasoners[]` with `agent_id` (not `node_id`) and `id` (not `name`). Same for /api/v1/nodes — its default ?health_status=active filter hides healthy nodes that haven't reported "active" yet, so use ?health_status=any. Fix in SKILL.md and verification.md. 5. 
**Python init template violated the skill's own hard rules.** The scaffold from `af init` was using app.serve(auto_port=True) and hardcoding agentfield_server, which the skill explicitly rejects. Codex had to fully rewrite main.py on every build. Update the template to use app.run(auto_port=False), env-driven AGENT_NODE_ID/AGENTFIELD_SERVER/ AI_MODEL/PORT, and a real AIConfig. The scaffold is now consistent with the skill's mandatory patterns out of the box. ## New philosophy: reasoners as software APIs Codex's first build (and the loan-underwriter before it) produced a "fat orchestrator + flat specialists" star pattern: depth-2 DAG, single-layer parallelism, every specialist has a 50-line .ai() prompt, no reuse across branches. That's basically asyncio.gather([llm_call_1, llm_call_2, ...]) with extra ceremony. The right shape is **deep composition cascade**: each reasoner has a single cognitive responsibility, the orchestrator pushes calls DOWN into sub-reasoners, parallelism happens at multiple depths, common sub-reasoners get reused across branches. Each reasoner has a one-line API contract you could write down — they are software APIs. 
Added to the skill:

- New mandatory section "The unit of intelligence is the reasoner — treat them as software APIs" in SKILL.md, with bad/good shape ASCII diagrams, concrete decomposition rules (30-line ceiling, single-judgment rule, reuse-signal extraction), and depth ≥ 3 minimum
- New "Reasoner Composition Cascade" pattern (#8) in architecture-patterns.md marked as the master pattern that every other pattern layers onto
- Updated "How to pick a pattern" picker to start from cascade as the backbone instead of treating it as one option among many
- HARD GATE updated: "If you cannot draw your system as a non-trivial graph with depth ≥ 3, you have not architected anything"
- Grooming rule conflict resolved: the skip-the-question rule now lives inside the HARD GATE block so agents see them together, not as competing instructions in separate sections

## Tested end-to-end

Live test of the v1 medical-triage build:

- docker compose up --build → both containers up
- 9 reasoners discovered through /api/v1/discovery/capabilities
- Real curl with the Maria Hernandez patient case → CALL_911_NOW with full provenance, 17 second wall clock, HTTP 200, 16KB structured response
- The adversarial reviewer correctly steel-manned Pulmonary Embolism (because the chest pain is pleuritic) on top of the AMI primary concern
- Deterministic governance overrides fired correctly when committee confidence dipped — the safe-default fallback pattern works in production

The build only succeeded after the manual healthcheck patch + the model swap to gemini-2.5-flash. Both fixes are now baked into the templates so the next codex run will produce a working build on first try.
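The cascade shape is easy to see in plain asyncio, independent of the SDK. The sketch below is a mock, not AgentField SDK code — reasoner names are hypothetical, `asyncio.sleep` stands in for an `.ai()` call, and plain async functions stand in for `@app.reasoner` / `app.call` — but it shows the two parallelism waves and the reuse of a common sub-reasoner across branches:

```python
import asyncio

# Plain-asyncio mock of the composition cascade. Names are hypothetical;
# asyncio.sleep stands in for an .ai() call, plain async functions stand in
# for @app.reasoner / app.call. Not AgentField SDK code.

async def deterministic_skill(name: str) -> str:
    # No LLM judgment — plain Python, returns immediately.
    return f"{name}:ok"

async def ai_judgment(name: str, latency: float = 0.05) -> str:
    await asyncio.sleep(latency)  # simulated model latency
    return f"{name}:judged"

async def dimension_assessor(dim: str) -> dict:
    # A "specialist" is itself a small orchestrator: it fans out to its own
    # sub-reasoners instead of holding one fat 200-line prompt.
    calc, judge, score = await asyncio.gather(
        deterministic_skill(f"{dim}_metric_calc"),
        ai_judgment(f"{dim}_pattern_judge"),
        ai_judgment("confidence_scorer"),  # reused by every dimension
    )
    return {"dimension": dim, "calc": calc, "judge": judge, "score": score}

async def entry_reasoner(dimensions: list[str]) -> dict:
    # Wave 1: fan out across dimensions. Wave 2 happens inside each
    # dimension_assessor — depth 3, not a flat star.
    assessments = await asyncio.gather(
        *(dimension_assessor(d) for d in dimensions)
    )
    synthesis = await ai_judgment("adversarial_synthesizer")
    return {"assessments": list(assessments), "synthesis": synthesis}

report = asyncio.run(entry_reasoner(["cardiac", "pulmonary", "neuro"]))
print(len(report["assessments"]))  # 3
```

Because each wave runs concurrently, wall-clock time tracks the slowest path through the DAG rather than the sum of calls — the same budget argument behind the 90s sync-execute limit above.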
Co-Authored-By: Claude Opus 4.6 (1M context)
---
 control-plane/internal/cli/doctor.go          |  4 +-
 control-plane/internal/cli/init.go            |  2 +-
 .../templates/docker/docker-compose.yml.tmpl  | 10 +--
 .../internal/templates/python/main.py.tmpl    | 35 +++++---
 control-plane/internal/templates/templates.go |  2 +-
 .../SKILL.md                                  | 86 ++++++++++++++++--
 .../references/anti-patterns.md               |  2 +-
 .../references/architecture-patterns.md       | 89 +++++++++++++++++--
 .../references/choosing-primitives.md         |  4 +-
 .../references/project-claude-template.md     |  2 +-
 .../references/scaffold-recipe.md             |  8 +-
 .../references/verification.md                | 36 +++++---
 12 files changed, 222 insertions(+), 58 deletions(-)

diff --git a/control-plane/internal/cli/doctor.go b/control-plane/internal/cli/doctor.go
index 6abcae296..85d641c36 100644
--- a/control-plane/internal/cli/doctor.go
+++ b/control-plane/internal/cli/doctor.go
@@ -70,7 +70,7 @@ var providerEnvVars = []struct {
 	EnvVar string
 	Model  string // suggested default model when this provider is the chosen one
 }{
-	{Name: "openrouter", EnvVar: "OPENROUTER_API_KEY", Model: "openrouter/anthropic/claude-3.5-sonnet"},
+	{Name: "openrouter", EnvVar: "OPENROUTER_API_KEY", Model: "openrouter/google/gemini-2.5-flash"},
 	{Name: "anthropic", EnvVar: "ANTHROPIC_API_KEY", Model: "claude-3-5-sonnet-20241022"},
 	{Name: "openai", EnvVar: "OPENAI_API_KEY", Model: "gpt-4o"},
 	{Name: "google", EnvVar: "GOOGLE_API_KEY", Model: "gemini-1.5-pro"},
@@ -178,7 +178,7 @@ func buildDoctorReport(controlPlaneURL string) DoctorReport {
 	notes := []string{}
 	if chosenProvider == "" {
 		chosenProvider = "none"
-		chosenModel = "openrouter/anthropic/claude-3.5-sonnet"
+		chosenModel = "openrouter/google/gemini-2.5-flash"
 		notes = append(notes, "No provider API key detected. Set OPENROUTER_API_KEY (recommended) or OPENAI_API_KEY / ANTHROPIC_API_KEY before building.")
 	} else {
 		notes = append(notes, fmt.Sprintf("Provider key detected: %s. Default model: %s", chosenProvider, chosenModel))

diff --git a/control-plane/internal/cli/init.go b/control-plane/internal/cli/init.go
index 5434c6fc9..91775a044 100644
--- a/control-plane/internal/cli/init.go
+++ b/control-plane/internal/cli/init.go
@@ -487,7 +487,7 @@ Example:
 	// and README.md are intentionally NOT generated — the skill produces them
 	// after the agent has written real reasoners.
 	cmd.Flags().BoolVar(&withDocker, "docker", false, "Also generate a Docker scaffold (Dockerfile, docker-compose.yml, .env.example, .dockerignore)")
-	cmd.Flags().StringVar(&defaultModel, "default-model", "openrouter/anthropic/claude-3.5-sonnet", "Default AI_MODEL string baked into the docker scaffold (LiteLLM-style, e.g. gpt-4o, anthropic/claude-3-5-sonnet-20241022)")
+	cmd.Flags().StringVar(&defaultModel, "default-model", "openrouter/google/gemini-2.5-flash", "Default AI_MODEL string baked into the docker scaffold (LiteLLM-style, e.g. gpt-4o, anthropic/claude-3-5-sonnet-20241022)")

 	// Hidden flags — sensible defaults; only set when you have a real reason.
 	cmd.Flags().StringVar(&controlPlaneImage, "control-plane-image", "agentfield/control-plane:latest", "")

diff --git a/control-plane/internal/templates/docker/docker-compose.yml.tmpl b/control-plane/internal/templates/docker/docker-compose.yml.tmpl
index 32b9a048d..7f61f68e7 100644
--- a/control-plane/internal/templates/docker/docker-compose.yml.tmpl
+++ b/control-plane/internal/templates/docker/docker-compose.yml.tmpl
@@ -8,11 +8,9 @@ services:
       - "${AGENTFIELD_HTTP_PORT:-{{.ControlPlanePort}}}:8080"
     volumes:
       - agentfield-data:/data
-    healthcheck:
-      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"]
-      interval: 3s
-      timeout: 2s
-      retries: 20
+    # NOTE: agentfield/control-plane:latest is a distroless image — no sh, no wget, no curl.
+    # A CMD-based healthcheck cannot run inside the container. The agent SDK retries
+    # connection to the control plane on startup, so service_started is sufficient.

   {{.NodeID}}:
     build:
@@ -32,7 +30,7 @@ services:
       - "${AGENT_NODE_PORT:-{{.AgentPort}}}:{{.AgentPort}}"
     depends_on:
       control-plane:
-        condition: service_healthy
+        condition: service_started
     restart: on-failure
     volumes:

diff --git a/control-plane/internal/templates/python/main.py.tmpl b/control-plane/internal/templates/python/main.py.tmpl
index db91d49fc..5e7b84e98 100644
--- a/control-plane/internal/templates/python/main.py.tmpl
+++ b/control-plane/internal/templates/python/main.py.tmpl
@@ -1,25 +1,32 @@
+"""{{.ProjectName}} — AgentField agent node.
+
+When you build a multi-reasoner system, REWRITE this file and the reasoners
+package per the agentfield-multi-reasoner-builder skill's scaffold-recipe.
+This template ships with one minimal entry reasoner so the scaffold runs
+end-to-end on day one.
+"""
+
+import os
+
 from agentfield import Agent, AIConfig
+
 from reasoners import reasoners_router

-# Basic agent setup - works immediately
 app = Agent(
-    node_id="{{.NodeID}}",
-    agentfield_server="http://localhost:8080",
+    node_id=os.getenv("AGENT_NODE_ID", "{{.NodeID}}"),
+    agentfield_server=os.getenv("AGENTFIELD_SERVER", "http://localhost:8080"),
     version="1.0.0",
+    ai_config=AIConfig(
+        # LiteLLM auto-detects provider from the model string.
+        # Override per-request by passing model="..." to your entry reasoner.
+        model=os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash"),
+    ),
     dev_mode=True,
-
-    # 🔧 Uncomment to enable AI features:
-    # ai_config=AIConfig(
-    #     model="gpt-4o",  # LiteLLM auto-detects provider from model name
-    #     # Optional: api_key=os.getenv("OPENAI_API_KEY"),  # or set OPENAI_API_KEY env var
-    #     # temperature=0.7,
-    #     # max_tokens=4096,
-    # ),
 )

-# Include reasoners from separate file
 app.include_router(reasoners_router)

 if __name__ == "__main__":
-    # Auto-discover available port starting from 8000
-    app.serve(auto_port=True, dev=True, reload=False)
+    # app.run() auto-detects CLI vs server mode (sdk/python/agentfield/agent.py:4194).
+    # auto_port=False keeps the port deterministic so the README curl works.
+    app.run(host="0.0.0.0", port=int(os.getenv("PORT", "8001")), auto_port=False)

diff --git a/control-plane/internal/templates/templates.go b/control-plane/internal/templates/templates.go
index 16c27ac1a..03ec78372 100644
--- a/control-plane/internal/templates/templates.go
+++ b/control-plane/internal/templates/templates.go
@@ -25,7 +25,7 @@ type TemplateData struct {
 	ControlPlaneImage string // "agentfield/control-plane:latest"
 	ControlPlanePort  int    // 8080
 	AgentPort         int    // 8001
-	DefaultModel      string // "openrouter/anthropic/claude-3.5-sonnet"
+	DefaultModel      string // "openrouter/google/gemini-2.5-flash"
 }

 // GetTemplate retrieves a specific template by its path.

diff --git a/skills/agentfield-multi-reasoner-builder/SKILL.md b/skills/agentfield-multi-reasoner-builder/SKILL.md
index 9614c8bc1..c2fa3a97d 100644
--- a/skills/agentfield-multi-reasoner-builder/SKILL.md
+++ b/skills/agentfield-multi-reasoner-builder/SKILL.md
@@ -10,14 +10,83 @@ You are not a prompt engineer. You are a **systems architect** building composit
 ## HARD GATE — READ BEFORE ANYTHING ELSE

 > **Do NOT write any code, generate any file, or scaffold any project until you have:**
-> 1. Asked the user the ONE grooming question (below) and received their answer
+> 1. Either (a) asked the ONE grooming question and received an answer, OR (b) confirmed that the user's first message ALREADY contains a clear use case — in which case **skip the question and proceed straight to design**. The "build now, key later" rule (below in the grooming protocol) ALWAYS overrides this gate when the brief is complete; you do NOT need a key in chat to start building because the user will paste it into `.env` themselves
 > 2. Read `references/choosing-primitives.md` (mandatory — sets the philosophy and the real SDK signatures)
-> 3. Designed the reasoner topology (which `@app.reasoner` units, who calls whom, which are `.ai` vs deterministic skills, where the dynamic routing happens)
+> 3. Designed the reasoner topology with **depth, not just breadth** (see "Reasoners are software APIs" below) — which `@app.reasoner` units, who calls whom, which are `.ai` vs deterministic skills, where the dynamic routing happens
 >
-> **Do NOT default to a single big reasoner with one `app.ai` call.** That's a CrewAI clone. Decompose. If you cannot draw your system as a non-trivial graph, you have not architected anything.
+> **Do NOT default to a single big reasoner with one `app.ai` call.** That's a CrewAI clone. Decompose.
+>
+> **Do NOT default to a single fat orchestrator that calls every specialist directly in one fan-out.** That's a star pattern, also a CrewAI clone wearing a different costume. Build deep call chains (see below).
+>
+> If you cannot draw your system as a non-trivial graph **with depth ≥ 3**, you have not architected anything.
 >
 > Violating the letter of this gate is violating the spirit of the gate. There are no exceptions for "simple" use cases.

+## The unit of intelligence is the reasoner — treat them as software APIs
+
+This is the most important framing in the entire skill. **Each reasoner is a microservice. Reasoners call other reasoners the way one REST API calls another.** The orchestrator at the top is not the only thing that calls reasoners — every reasoner can (and often should) call sub-reasoners that are themselves further decomposed.
+
+**Bad shape — flat star (the default a coding agent will reach for):**
+```
+entry_orchestrator
+├── specialist_1 ──┐
+├── specialist_2 ──┤
+├── specialist_3 ──┼── all called once, in parallel, by the orchestrator
+├── specialist_4 ──┤
+└── specialist_5 ──┘
+        │
+        v
+   synthesizer
+```
+
+This is depth = 2 (entry → specialist → done). It's basically `asyncio.gather([llm_call_1, llm_call_2, ...])` with extra ceremony. Easy to write, but it doesn't earn the AgentField label.
+
+**Good shape — composition cascade (depth ≥ 3, parallelism at multiple levels):**
+```
+triage_case (entry)
+├── case_classifier ─────────────┐
+│   └── chief_complaint_parser   │
+│       └── medical_term_normalizer
+│
+├── ami_assessor                 │  ← all parallel
+│   ├── cardiac_risk_calculator (deterministic skill)
+│   ├── ami_pattern_matcher (.ai)
+│   │   └── ecg_finding_classifier (.ai called by ami_pattern_matcher when needed)
+│   └── biomarker_predictor (.ai)
+│
+├── pe_assessor                  │
+│   ├── wells_score_calculator (deterministic skill)
+│   ├── dyspnea_grader (.ai)
+│   └── dvt_history_checker (.ai)
+│
+├── stroke_assessor              │
+│   ├── fast_screen (.ai)
+│   └── nihss_estimator (.ai called only if fast_screen positive)
+│
+└── adversarial_synthesizer ─────┘
+    ├── steel_man_alternative_dx (.ai called once per primary assessment)
+    └── confidence_reconciler (.ai)
+        └── deterministic_safety_overrides (plain Python)
+```
+
+This system has depth 4, runs **at least three parallelism waves**, and each "specialist" is itself composed of 2–4 sub-reasoners that may call each other. **Each reasoner has a single cognitive responsibility you could write a one-line API contract for.** Reasoners that always co-execute become one reasoner; reasoners that have distinct judgment surfaces stay separate.
+
+**Why this matters:**
+1. **Each reasoner is replaceable.** Want to swap `wells_score_calculator` for a more accurate one? Change one file. The flat-star pattern would have that logic buried inside a 200-line `pe_assessor` reasoner.
+2. **Each reasoner is testable in isolation.** You can `curl /api/v1/execute/medical-triage.wells_score_calculator` directly with a synthetic input. The flat-star pattern only exposes the entry reasoner.
+3. **Each reasoner is reusable.** `medical_term_normalizer` can be called from `chief_complaint_parser` AND from `comorbidity_amplifier` AND from a future `discharge_summary_generator`. The flat-star pattern duplicates logic across specialists.
+4. **Each reasoner is observable.** The control plane workflow DAG shows the full call tree, not just a single `gather`. The verifiable credential chain has structure.
+5. **Parallelism happens at multiple levels.** The flat star fans out N specialists once. The deep DAG fans out N specialists × M sub-calls each, with the orchestration `asyncio.gather`-ing at each layer. Total wall-clock time goes down even though total calls go up.
+
+**Concrete rules:**
+- If a reasoner has more than ~30 lines of body code, it's probably 2 reasoners
+- If two reasoners always call each other in sequence, they should be one reasoner (or one reasoner with a deterministic helper)
+- If your entry reasoner is the ONLY thing that calls `app.call`, the architecture is too flat — push the calls down into the specialists
+- If your topology can be drawn as a literal star, throw it out and design for depth
+- A reasoner should have a clear API contract you could write in one sentence: *"Given X, return Y. Calls Z, W."*
+
+**The unit of intelligence is the reasoner. Treat them like software APIs and the system writes itself.**
+
 ## The non-negotiable promise

 Every invocation of this skill must end with the user able to run **two commands** and get a working multi-reasoner system:
@@ -268,14 +337,15 @@ After the stack is up, open these URLs in your browser:

 ```bash
 # 1. Control plane up?
-curl -fsS http://localhost:8080/api/v1/health | jq
+curl -fsS http://localhost:8080/api/v1/health | jq '.status'

-# 2. Agent node registered?
-curl -fsS http://localhost:8080/api/v1/nodes | jq '.[] | {id: .node_id, status: .status}'
+# 2. Agent node registered? (use ?health_status=any — default filter can hide healthy nodes)
+curl -fsS 'http://localhost:8080/api/v1/nodes?health_status=any' | jq '.nodes[] | {id: .node_id, status: .status}'

 # 3. All reasoners discoverable?
+# Response shape: .capabilities[].reasoners[].id (NOT .reasoners[].name)
 curl -fsS http://localhost:8080/api/v1/discovery/capabilities \
-  | jq '.reasoners[] | select(.node_id=="") | {name, tags}'
+  | jq '.capabilities[] | select(.agent_id=="") | .reasoners | map({id, tags})'
 ```

 ### 7. 🎯 Try it — sample curl
@@ -287,7 +357,7 @@
     "input": {
       "": "",
       "": ,
-      "model": "openrouter/anthropic/claude-3.5-sonnet"
+      "model": "openrouter/google/gemini-2.5-flash"
     }
   }' | jq
 ```

diff --git a/skills/agentfield-multi-reasoner-builder/references/anti-patterns.md b/skills/agentfield-multi-reasoner-builder/references/anti-patterns.md
index 1b675faab..e6966ef2a 100644
--- a/skills/agentfield-multi-reasoner-builder/references/anti-patterns.md
+++ b/skills/agentfield-multi-reasoner-builder/references/anti-patterns.md
@@ -90,7 +90,7 @@ When the user (or your own drift) pushes you toward one of these, name the rule,
 ### 10. Hardcoded model strings

 ❌ `ai_config=AIConfig(model="gpt-4o")`
-✅ `ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/anthropic/claude-3.5-sonnet"))` AND accept a `model` parameter on the entry reasoner that propagates via `app.call(..., model=model)`.
+✅ `ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash"))` AND accept a `model` parameter on the entry reasoner that propagates via `app.call(..., model=model)`.

 **Why:** Users need to swap models per-request to A/B test without rebuilding the container. Make the model dynamic at three layers: env default, container override, per-request override.

diff --git a/skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md b/skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md
index d83fc4bed..fef4201f3 100644
--- a/skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md
+++ b/skills/agentfield-multi-reasoner-builder/references/architecture-patterns.md
@@ -210,7 +210,80 @@ inner (coding) ──> per-task agent ──> code

 ---

-## 8. Reactive Document Enrichment
+## 8. Reasoner Composition Cascade (READ THIS — it's the master pattern)
+
+**This is the pattern that distinguishes a real AgentField system from a fancy `asyncio.gather` wrapper.** Every other pattern in this file should be interpreted through this lens.
+
+**Shape — depth, not breadth:**
+
+```
+entry_reasoner
+├── classifier_reasoner ─────────────────┐
+│   ├── input_normalizer (skill)         │
+│   └── intent_extractor (.ai)           │
+│       └── slot_filler (.ai called by intent_extractor when ambiguous)
+│
+├── analysis_dimension_A_reasoner ───────┤  ← all parallel via asyncio.gather
+│   ├── deterministic_metric_calc (skill)
+│   ├── pattern_judge (.ai)
+│   │   └── citation_finder (.ai called by pattern_judge)
+│   └── confidence_scorer (.ai)
+│
+├── analysis_dimension_B_reasoner ───────│
+│   ├── different_metric_calc (skill)
+│   ├── different_pattern_judge (.ai)
+│   └── confidence_scorer (.ai REUSED — same reasoner)  ← reuse across branches!
+│
+├── analysis_dimension_C_reasoner ───────┤
+│   └── (3 sub-calls, similar shape)
+│
+└── adversarial_synthesizer ─────────────┘
+    ├── steel_man_alternative (.ai)  ← called once per dimension
+    ├── disagreement_detector (.ai)
+    └── final_decision_reasoner (.ai)
+        └── safety_override (deterministic skill)
+```
+
+**Each layer fans out via `asyncio.gather`. Each reasoner has a single cognitive responsibility.** The orchestrator at the top is NOT the only thing that calls `app.call` — every dimension reasoner is itself a small orchestrator that calls 2–4 sub-reasoners.
+
+**Used in:** This is the pattern the medical-triage and loan-underwriter examples should follow when they're deep enough. Most large AgentField systems compose this pattern as the backbone, with the other 8 patterns layered on top (HUNT→PROVE between layers, streaming for partial results, etc.).
+
+**Why it's the master pattern:**
+
+1. **Reasoners as software APIs.** Each reasoner has a one-line API contract: *"Given X, return Y. Calls Z, W."* Other reasoners call it the way one microservice calls another.
+2. **Composability over monolithic prompts.** A specialist reasoner like `pe_assessor` is NOT a 200-line `.ai()` prompt — it's an orchestrator that calls `wells_score_calculator`, `dyspnea_grader`, and `dvt_history_checker` and synthesizes their outputs. Each piece is testable, replaceable, reusable.
+3. **Reuse across branches.** `confidence_scorer` is called from THREE different dimension reasoners. The flat-star pattern would have to copy-paste the logic three times. The composition cascade calls it once per branch — same code, three different contexts.
+4. **Multi-layer parallelism.** `asyncio.gather` runs at the entry-reasoner layer (across dimensions A/B/C) AND inside each dimension reasoner (across its sub-calls). Total wall-clock time is dominated by the slowest path through the DAG, not by the sum.
+5. **Observability has structure.** The control plane workflow DAG shows the actual call tree. The verifiable credential chain has hierarchy. A future debugger can ask "which sub-call inside `pe_assessor` flagged the concern" — the flat-star pattern can only tell you "pe_assessor returned X."
+6. **Each reasoner is independently curl-able.** You can `POST /api/v1/execute/.wells_score_calculator` directly with synthetic input to debug or A/B test it. The flat-star only exposes the entry reasoner.
+
+**Decomposition rules:**
+
+- **30-line ceiling.** If a reasoner body is > 30 lines, it's probably 2 reasoners. Look for the seam — usually a "compute X then judge Y" boundary becomes "X is a `@app.skill`, Y is a `@app.reasoner` that calls X".
+- **Single-judgment rule.** A reasoner makes ONE judgment call. If your reasoner is making three judgments ("is this concerning, is this acute, what's the risk score"), split into three reasoners.
+- **Deterministic-vs-judgment split.** Anything that doesn't require LLM judgment (math, formula, regex, lookup, sort) is `@app.skill()` or a plain helper, not part of an `.ai()` reasoner body.
+- **Reuse signal.** If the same logic appears in 2+ reasoners, extract it as its own reasoner and call it from both.
+- **One-sentence API contract test.** Can you write a one-sentence contract for each reasoner ("Given a chief complaint string, return a list of red flag categories with confidence scores")? If not, the reasoner is doing too many things.
+
+**Anti-patterns that mean you fell back to a flat star:**
+
+- Your entry reasoner is the ONLY thing that calls `app.call`
+- Your specialists each have a single fat `.ai()` call with a 500-token prompt
+- Your DAG is depth 2 (`entry → specialists → done`)
+- You can draw the architecture as a literal asterisk
+- Two specialists have the same 50-line prompt with one line different — you should have had one parameterized sub-reasoner
+
+**Concrete medical-triage example:**
+
+A flat-star `red_flag_detector` reasoner with one big `.ai()` prompt → bad.
+
+A `red_flag_detector` reasoner that calls `cardiac_red_flag_checker`, `stroke_red_flag_checker`, `bleeding_red_flag_checker`, `psych_red_flag_checker` in parallel via `asyncio.gather`, each of which is itself a focused `.ai()` with its own narrow prompt and confidence flag → good. The deeper structure means a future agent can swap the cardiac checker for one with a more accurate prompt without touching anything else.
+
+**When you finish your design, count the depth.** If max depth from entry to leaf is < 3, redesign. A real composite-intelligence system has at least 3 layers of reasoner-calling-reasoner.
+
+---
+
+## 9. Reactive Document Enrichment

 **Shape:**
 ```
@@ -227,16 +300,20 @@ event source (DB change stream / webhook) ──> enrichment pipeline ──> ou

 ## How to pick a pattern (or compose your own)

-1. **What triggers the work?** Event stream → pattern 8. Direct API call → patterns 1–7.
-2. **Is the input large/navigable?** Yes → harness-first, consider meta-prompting (pattern 4).
-3. **Multiple independent analysis dimensions?** Yes → parallel hunters (pattern 1).
-4. **False positives expensive?** Yes → add HUNT→PROVE (pattern 2) on top of pattern 1.
+**Always start with pattern 8 (Reasoner Composition Cascade) as the backbone.** It's not optional. Every other pattern is layered on top.
+
+Then ask:
+
+1. **What triggers the work?** Event stream → pattern 9 (reactive enrichment). Direct API call → patterns 1–7 layered onto 8.
+2. **Is the input large/navigable?** Yes → consider meta-prompting (pattern 4) inside one of your dimension reasoners.
+3. **Multiple independent analysis dimensions?** Yes → parallel hunters (pattern 1) becomes the second-layer fan-out inside the cascade.
+4. **False positives expensive?** Yes → add HUNT→PROVE (pattern 2) as a second-stage reasoner per dimension or one global adversarial reasoner.
 5. **Downstream can start before upstream finishes?** Yes → streaming (pattern 3).
 6. **Coverage matters and you can't predict shape upfront?** Pattern 6.
 7. **Multi-round adaptive execution?** Pattern 5 or 7.
 8. **The investigation path depends on discoveries?** Pattern 4 (meta-prompting), always.

-Most strong systems compose 2–3 patterns. Example: contract-af = parallel hunters (1) + HUNT→PROVE (2) + streaming (3) + meta-prompting (4) + nested loops (5).
+Most strong systems compose **pattern 8 (cascade) as the backbone + 2–3 of the others as layers**. Example: contract-af = composition cascade (8) + parallel hunters (1) at the second layer + HUNT→PROVE (2) at the third layer + streaming (3) between layers + meta-prompting (4) inside the deepest reasoners + nested loops (5).
 ## When NONE of these fit

diff --git a/skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md b/skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md
index 6db1efef0..5e7f40608 100644
--- a/skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md
+++ b/skills/agentfield-multi-reasoner-builder/references/choosing-primitives.md
@@ -83,7 +83,7 @@ result = await app.ai(
     system: str | None,               # system prompt
     user: str | None,                 # user prompt (alternative to positional)
     schema: type[BaseModel] | None,   # Pydantic class for structured output
-    model: str | None,                # PER-CALL model override (e.g. "gpt-4o", "openrouter/anthropic/claude-3.5-sonnet")
+    model: str | None,                # PER-CALL model override (e.g. "gpt-4o", "openrouter/google/gemini-2.5-flash")
     temperature: float | None,
     max_tokens: int | None,
     stream: bool | None,
@@ -291,7 +291,7 @@ from reasoners.risk import router as risk_router

 app = Agent(
     node_id=os.getenv("AGENT_NODE_ID", "financial-reviewer"),
-    ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/anthropic/claude-3.5-sonnet")),
+    ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash")),
     dev_mode=True,
 )

diff --git a/skills/agentfield-multi-reasoner-builder/references/project-claude-template.md b/skills/agentfield-multi-reasoner-builder/references/project-claude-template.md
index 415252390..e5187de2c 100644
--- a/skills/agentfield-multi-reasoner-builder/references/project-claude-template.md
+++ b/skills/agentfield-multi-reasoner-builder/references/project-claude-template.md
@@ -47,7 +47,7 @@ External callers should hit `.` first.

 ## Model selection

-- Default model: `` via `AI_MODEL` env.
+- Default model: `` via `AI_MODEL` env.
 - The entry reasoner accepts an OPTIONAL `model` parameter in the request body. When present, it propagates to all child reasoners via `app.call(..., model=model)`. This lets users A/B models per request without redeploying.
 - Provider keys: `OPENROUTER_API_KEY` (default), `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` — any LiteLLM-compatible model works.

diff --git a/skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md b/skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md
index 5368a7a08..dd3d4822e 100644
--- a/skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md
+++ b/skills/agentfield-multi-reasoner-builder/references/scaffold-recipe.md
@@ -69,7 +69,7 @@ app = Agent(
     node_id=os.getenv("AGENT_NODE_ID", ""),
     agentfield_server=os.getenv("AGENTFIELD_SERVER", "http://localhost:8080"),
     ai_config=AIConfig(
-        model=os.getenv("AI_MODEL", "openrouter/anthropic/claude-3.5-sonnet"),
+        model=os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash"),
     ),
     dev_mode=True,
 )
@@ -274,7 +274,7 @@ services:
       OPENAI_API_KEY: ${OPENAI_API_KEY:-}
       ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
       GOOGLE_API_KEY: ${GOOGLE_API_KEY:-}
-      AI_MODEL: ${AI_MODEL:-openrouter/anthropic/claude-3.5-sonnet}
+      AI_MODEL: ${AI_MODEL:-openrouter/google/gemini-2.5-flash}
       PORT: ${PORT:-8001}
     ports:
       - "${AGENT_NODE_PORT:-8001}:8001"
@@ -305,7 +305,7 @@
 OPENROUTER_API_KEY=sk-or-v1-...
 # ANTHROPIC_API_KEY=sk-ant-...

 # Model — must match the provider above
-AI_MODEL=openrouter/anthropic/claude-3.5-sonnet
+AI_MODEL=openrouter/google/gemini-2.5-flash
 # AI_MODEL=gpt-4o
 # AI_MODEL=anthropic/claude-3-5-sonnet-20241022
@@ -388,7 +388,7 @@ curl -X POST http://localhost:8080/api/v1/execute/. \
     "input": {
       "": "",
       "": ,
-      "model": "openrouter/anthropic/claude-3.5-sonnet"
+      "model": "openrouter/google/gemini-2.5-flash"
     }
   }' | jq
 ```

diff --git a/skills/agentfield-multi-reasoner-builder/references/verification.md b/skills/agentfield-multi-reasoner-builder/references/verification.md
index 4b0a6163d..e5b11287a 100644
--- a/skills/agentfield-multi-reasoner-builder/references/verification.md
+++ b/skills/agentfield-multi-reasoner-builder/references/verification.md
@@ -23,7 +23,7 @@ curl -X POST http://localhost:8080/api/v1/execute/. \
     "input": {
       "": "",
       "": ,
-      "model": "openrouter/anthropic/claude-3.5-sonnet"
+      "model": "openrouter/google/gemini-2.5-flash"
     }
   }' | jq
 ```
@@ -41,17 +41,28 @@ If any step fails, **do not hand off**. Diagnose and fix.
 | Execute returns 500 with "model not found" | `AI_MODEL` env var doesn't match the provider key you set | Check `.env` — `OPENROUTER_API_KEY` requires `openrouter/...` model names, etc. |
 | Execute returns 200 but the output is empty/garbage | The reasoner ran but the architecture is wrong (e.g., `.ai()` got truncated input) | Look at logs to see what input each reasoner actually got |

+## Sync execute timeout (90s) — IMPORTANT
+
+`POST /api/v1/execute/` is a **synchronous** endpoint with a hard **90-second timeout** at the control plane. If the entry reasoner's full pipeline (including all child `app.call`s, all `app.ai` calls, and any retries) takes longer than 90s, the control plane returns `HTTP 400 {"error":"execution timeout after 1m30s"}`.
+
+**Implications for the architecture you generate:**
+- **Pick fast models for the default.** `openrouter/google/gemini-2.5-flash` and `openrouter/openai/gpt-4o-mini` finish a 6–10 step parallel pipeline in 10–25 seconds. Slower models like `openrouter/anthropic/claude-3-5-sonnet-*`, `openrouter/minimax/minimax-m2.7`, or `openrouter/openai/o1` often blow the budget.
+- **Parallelize aggressively at multiple depths.** A pipeline of 10 sequential `app.ai` calls at 5s each = 50s (close to the limit). The same 10 calls organized as a deep DAG with 3 parallelism waves = 15s. Use `asyncio.gather` for every fan-out, and push fan-outs DOWN into sub-reasoners (see `architecture-patterns.md` "Reasoner Composition Cascade"), not just at the entry orchestrator. +- **For workflows that genuinely need >90s** (large fan-outs, slow models, navigation-heavy harnesses): use `POST /api/v1/execute/async/` instead. It returns immediately with an `execution_id`; poll `GET /api/v1/executions/` for the result. Document this in the README so users know which endpoint to hit. + +When the user's brief implies a slow pipeline, default to `gemini-2.5-flash` and document the async endpoint as the upgrade path. + ## Useful introspection endpoints | Endpoint | What it tells you | |---|---| | `GET /api/v1/health` | Control plane up | -| `GET /api/v1/nodes` | Which agent nodes have registered | +| `GET /api/v1/nodes?health_status=any` | Which agent nodes have registered (the default filter is `active`, which can return empty even when agents are healthy — use `?health_status=any` to be safe) | | `GET /api/v1/nodes/:node_id` | Details of one node | -| `GET /api/v1/discovery/capabilities` | All reasoners and skills across all nodes | -| `GET /api/v1/agentic/discover?q=` | Search the API catalog by keyword (use to find an endpoint you forgot) | -| `POST /api/v1/execute/:target` | Sync execute a reasoner. Body is the kwargs dict | -| `POST /api/v1/execute/async/:target` | Async execute, returns an execution_id | +| `GET /api/v1/discovery/capabilities` | All reasoners and skills. **Response shape:** `{capabilities: [{agent_id, reasoners: [{id, tags, ...}]}]}` — note `agent_id` not `node_id`, and reasoners live under `.capabilities[].reasoners[]` not `.reasoners[]`. 
The reasoner identifier field is `id` not `name` | +| `GET /api/v1/agentic/discover?q=` | Search the API catalog by keyword | +| `POST /api/v1/execute/:target` | **Sync** execute. Body is `{"input": {...kwargs...}}`. **90-second hard timeout at the control plane.** | +| `POST /api/v1/execute/async/:target` | Async execute, returns an `execution_id` immediately. Use this when the pipeline > 90s | | `GET /api/v1/executions/:id` | Status of an async execution | | `GET /api/v1/did/workflow/:workflow_id/vc-chain` | Verifiable credential chain for an executed workflow (the AgentField superpower no other framework has) | @@ -77,19 +88,20 @@ In the README, give the user EXACTLY these commands in this order. Do not abbrev # After docker compose up, in another terminal: # 1. Health -curl -fsS http://localhost:8080/api/v1/health +curl -fsS http://localhost:8080/api/v1/health | jq '.status' -# 2. Node registered? -curl -fsS http://localhost:8080/api/v1/nodes | jq '.[].node_id' +# 2. Node registered? (use ?health_status=any — default filter can hide healthy nodes) +curl -fsS 'http://localhost:8080/api/v1/nodes?health_status=any' | jq '.nodes[] | {id: .node_id, status: .status}' -# 3. Reasoners discoverable? -curl -fsS http://localhost:8080/api/v1/discovery/capabilities | jq '.reasoners | map(select(.node_id=="")) | map(.name)' +# 3. Reasoners discoverable? (note .capabilities[].reasoners[].id, NOT .reasoners[].name) +curl -fsS http://localhost:8080/api/v1/discovery/capabilities \ + | jq '.capabilities[] | select(.agent_id=="") | .reasoners | map({id, tags})' # 4. THE BIG ONE — run the entry reasoner with real data # Body shape: {"input": {...kwargs...}} — kwargs are NEVER raw at the top level curl -X POST http://localhost:8080/api/v1/execute/. \ -H 'Content-Type: application/json' \ - -d '{"input": {"": "", "model": "openrouter/anthropic/claude-3.5-sonnet"}}' | jq + -d '{"input": {"": "", "model": "openrouter/google/gemini-2.5-flash"}}' | jq # 5. 
# (Optional showpiece) the full verifiable workflow chain
LAST_EXEC=$(curl -s http://localhost:8080/api/v1/executions | jq -r '.[0].workflow_id')

From 54d98f8f216cf28f756de7eb2cd299273c65da10 Mon Sep 17 00:00:00 2001
From: Santosh
Date: Wed, 8 Apr 2026 13:48:26 +0530
Subject: [PATCH 3/4] =?UTF-8?q?feat(skill):=20af=20skill=20install=20?=
 =?UTF-8?q?=E2=80=94=20embed=20+=20multi-target=20install=20architecture?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mirrors plandb's installer pattern (labs/plandb/install.sh) but lives inside
the af binary so existing users can install the skill without re-running the
shell bootstrapper, and new users get it automatically as part of
`curl install.sh | bash`.

## What ships

The af binary now embeds the agentfield-multi-reasoner-builder skill and
exposes it through a new `af skill` command tree:

  af skill install                        # interactive picker
  af skill install --all                  # all detected coding agents
  af skill install --all-targets          # all registered (even undetected)
  af skill install --target               # one specific agent
  af skill install --version 0.2.0        # pin a specific embedded version
  af skill install --force                # reinstall even if state matches
  af skill install --dry-run              # plan without writing
  af skill list                           # show installed skills + targets
  af skill update                         # re-install at the binary version
  af skill uninstall [--target X]         # remove from one or all targets
  af skill uninstall --remove-canonical   # also delete ~/.agentfield/skills/
  af skill print                          # SKILL.md to stdout
  af skill path                           # canonical store location
  af skill catalog                        # list shipped skills

## Canonical on-disk layout (mirrors ~/.cargo, ~/.npm, ~/.rustup)

  ~/.agentfield/
  └── skills/
      ├── .state.json                  # tracks installs across targets
      └── agentfield-multi-reasoner-builder/
          ├── current → ./0.2.0/       # symlink (relative)
          └── 0.2.0/
              ├── SKILL.md
              └── references/
                  ├── choosing-primitives.md
                  ├── architecture-patterns.md
                  ├── scaffold-recipe.md
                  ├── verification.md
                  ├── project-claude-template.md
                  └── anti-patterns.md

The versioned-store + current-symlink shape lets multiple versions coexist and
makes `af skill update` an atomic symlink swap. All target integrations point
at `current/` so updates flow through automatically.

## Target integrations (7 supported, all idempotent)

| Target      | Method       | Path                                         |
|-------------|--------------|----------------------------------------------|
| claude-code | symlink      | ~/.claude/skills/ -> .agentfield/.../current |
| codex       | marker-block | ~/.codex/AGENTS.override.md                  |
| gemini      | marker-block | ~/.gemini/GEMINI.md                          |
| opencode    | marker-block | ~/.config/opencode/AGENTS.md                 |
| aider       | marker-block | ~/.aider.conventions.md (+ ~/.aider.conf.yml read line) |
| windsurf    | marker-block | ~/.codeium/windsurf/memories/global_rules.md |
| cursor      | manual       | Settings → Rules for AI (printed instructions) |

Marker-block targets append a small pointer block bracketed by:

...

The block points the agent at the canonical SKILL.md path so updates to the
canonical store flow through automatically — no need to re-edit every agent
rules file when the skill changes. Re-installs find the existing block by name
(regardless of version) and replace it cleanly.

## install.sh integration

scripts/install.sh now runs `af skill install` (interactive by default) after
the binary verification step. New flags:

  --no-skill            Skip the skill install entirely
  --all-skills          Install into every detected coding agent (no prompt)
  --all-skill-targets   Install into every registered target

Or via the SKILL_MODE env var: interactive | all | all-targets | none.

This means `curl https://agentfield.ai/install.sh | bash` is now a single
command that gives a new user the binary AND the skill installed across every
coding agent they have on their machine.
## Source-of-truth sync

The Go embed directive can only reach files inside the skillkit package, so a
mirror at control-plane/internal/skillkit/skill_data// holds copies of the
canonical files in skills//. scripts/sync-embedded-skills.sh keeps them in
sync (call it before `go build` after editing skills/, or run
`./scripts/sync-embedded-skills.sh --check` in CI to verify).

New skills are added by:

1. Creating skills//
2. Adding the directory to scripts/sync-embedded-skills.sh
3. Running the sync script
4. Adding an entry to skillkit.Catalog in catalog.go
5. Adding the embed line to embed.go

## Tested end-to-end on this machine

  af skill install --all                 # → 7 targets installed in one shot
  af skill list                          # → all 7 reported with version + path + method
  af skill install --target X            # → idempotent re-install correctly skipped
  af skill uninstall --remove-canonical  # → fully clean removal
  af skill install --all                 # → fresh re-install, all 7 targets back

Verified on disk: ~/.claude/skills/agentfield-multi-reasoner-builder is a
symlink to ~/.agentfield/skills/.../current/ and the SKILL.md resolves
transparently. Codex/Gemini/OpenCode/Aider/Windsurf marker blocks present and
correctly bracketed. Aider's ~/.aider.conf.yml has the read: line. State file
records all 7 targets with version 0.2.0 and ISO timestamps.
Co-Authored-By: Claude Opus 4.6 (1M context) --- control-plane/internal/cli/root.go | 3 + control-plane/internal/cli/skill.go | 461 ++++++++++++++++ control-plane/internal/skillkit/catalog.go | 93 ++++ control-plane/internal/skillkit/embed.go | 36 ++ control-plane/internal/skillkit/install.go | 348 ++++++++++++ .../internal/skillkit/markerblock.go | 136 +++++ .../SKILL.md | 393 ++++++++++++++ .../references/anti-patterns.md | 143 +++++ .../references/architecture-patterns.md | 320 +++++++++++ .../references/choosing-primitives.md | 502 ++++++++++++++++++ .../references/project-claude-template.md | 118 ++++ .../references/scaffold-recipe.md | 502 ++++++++++++++++++ .../references/verification.md | 120 +++++ control-plane/internal/skillkit/state.go | 125 +++++ .../internal/skillkit/target_aider.go | 101 ++++ .../internal/skillkit/target_claude_code.go | 118 ++++ .../internal/skillkit/target_codex.go | 67 +++ .../internal/skillkit/target_cursor.go | 72 +++ .../internal/skillkit/target_gemini.go | 66 +++ .../internal/skillkit/target_opencode.go | 66 +++ .../internal/skillkit/target_windsurf.go | 67 +++ control-plane/internal/skillkit/targets.go | 154 ++++++ scripts/install.sh | 84 ++- scripts/sync-embedded-skills.sh | 80 +++ 24 files changed, 4168 insertions(+), 7 deletions(-) create mode 100644 control-plane/internal/cli/skill.go create mode 100644 control-plane/internal/skillkit/catalog.go create mode 100644 control-plane/internal/skillkit/embed.go create mode 100644 control-plane/internal/skillkit/install.go create mode 100644 control-plane/internal/skillkit/markerblock.go create mode 100644 control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/SKILL.md create mode 100644 control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/anti-patterns.md create mode 100644 control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/architecture-patterns.md create mode 100644 
control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/choosing-primitives.md create mode 100644 control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/project-claude-template.md create mode 100644 control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/scaffold-recipe.md create mode 100644 control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/verification.md create mode 100644 control-plane/internal/skillkit/state.go create mode 100644 control-plane/internal/skillkit/target_aider.go create mode 100644 control-plane/internal/skillkit/target_claude_code.go create mode 100644 control-plane/internal/skillkit/target_codex.go create mode 100644 control-plane/internal/skillkit/target_cursor.go create mode 100644 control-plane/internal/skillkit/target_gemini.go create mode 100644 control-plane/internal/skillkit/target_opencode.go create mode 100644 control-plane/internal/skillkit/target_windsurf.go create mode 100644 control-plane/internal/skillkit/targets.go create mode 100755 scripts/sync-embedded-skills.sh diff --git a/control-plane/internal/cli/root.go b/control-plane/internal/cli/root.go index a3fe15f19..3e1a25b27 100644 --- a/control-plane/internal/cli/root.go +++ b/control-plane/internal/cli/root.go @@ -93,6 +93,9 @@ AI Agent? 
Run "af agent help" for structured JSON output optimized for programma // Add doctor command — environment introspection for skills/coding agents RootCmd.AddCommand(NewDoctorCommand()) + // Add skill command — install/manage AgentField skills across coding agents + RootCmd.AddCommand(NewSkillCommand()) + // Create service container for framework commands cfg := &config.Config{} // Use default config for now services := application.CreateServiceContainer(cfg, getAgentFieldHomeDir()) diff --git a/control-plane/internal/cli/skill.go b/control-plane/internal/cli/skill.go new file mode 100644 index 000000000..d74db1473 --- /dev/null +++ b/control-plane/internal/cli/skill.go @@ -0,0 +1,461 @@ +package cli + +import ( + "fmt" + "os" + "sort" + "strings" + + "github.com/fatih/color" + "github.com/spf13/cobra" + + "github.com/Agent-Field/agentfield/control-plane/internal/skillkit" +) + +// NewSkillCommand builds the `af skill` command tree. The skill subsystem +// embeds skill content into the af binary, installs it into multiple +// coding-agent integrations (Claude Code, Codex, Gemini, OpenCode, Aider, +// Windsurf, Cursor), and tracks state in ~/.agentfield/skills/.state.json. +// +// Mirrors the plandb installer pattern but lives inside the binary so that +// existing af users can run `af skill install` directly without re-running +// the install.sh shell bootstrapper. +func NewSkillCommand() *cobra.Command { + cmd := &cobra.Command{ + Use: "skill", + Short: "Install and manage AgentField skills across coding agents", + Long: `Manage AgentField skills bundled with the af binary. + +A skill is a self-contained instruction packet (a SKILL.md file plus +reference markdown) that teaches a coding agent (Claude Code, Codex, +Gemini, etc.) how to use AgentField properly. 
The af binary ships with +the agentfield-multi-reasoner-builder skill embedded — install it once +into every agent you use and they will know how to architect, scaffold, +and ship multi-reasoner systems on AgentField. + +Examples: + af skill install # Interactive picker (default) + af skill install --all # All detected agents, no prompt + af skill install --all-targets # Every registered agent, even undetected + af skill install --target claude-code # Just one agent + af skill list # Show what is installed where + af skill update # Re-install at the binary's embedded version + af skill uninstall # Remove from all targets + af skill print # Print SKILL.md to stdout (pipe to clipboard) + af skill path # Print canonical store location`, + } + + cmd.AddCommand(newSkillInstallCommand()) + cmd.AddCommand(newSkillListCommand()) + cmd.AddCommand(newSkillUpdateCommand()) + cmd.AddCommand(newSkillUninstallCommand()) + cmd.AddCommand(newSkillPrintCommand()) + cmd.AddCommand(newSkillPathCommand()) + cmd.AddCommand(newSkillCatalogCommand()) + + return cmd +} + +// ── install ────────────────────────────────────────────────────────────── + +func newSkillInstallCommand() *cobra.Command { + var ( + skillName string + version string + targets []string + allDetected bool + allTargets bool + force bool + dryRun bool + nonInteractive bool + ) + + cmd := &cobra.Command{ + Use: "install [skill-name]", + Short: "Install a skill into one or more coding-agent integrations", + Args: cobra.MaximumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + if len(args) == 1 { + skillName = args[0] + } + + // If no targets explicitly chosen and not in --all/--all-targets mode, + // run the interactive picker. 
+ if len(targets) == 0 && !allDetected && !allTargets && !nonInteractive { + picked, err := runInteractivePicker() + if err != nil { + return err + } + if len(picked) == 0 { + printInfo("Skill install skipped — no targets selected") + return nil + } + targets = picked + } + + report, err := skillkit.Install(skillkit.InstallOptions{ + SkillName: skillName, + Version: version, + Targets: targets, + AllDetected: allDetected, + AllRegistered: allTargets, + Force: force, + DryRun: dryRun, + }) + if err != nil { + return err + } + printInstallReport(report, dryRun) + return nil + }, + } + + cmd.Flags().StringVar(&skillName, "skill", "", "Skill name to install (defaults to the first/only skill in the catalog)") + cmd.Flags().StringVar(&version, "version", "", "Specific skill version to install (defaults to the version embedded in the binary)") + cmd.Flags().StringSliceVar(&targets, "target", nil, "Specific target(s) to install into (claude-code, codex, gemini, opencode, aider, windsurf, cursor). 
Repeatable.") + cmd.Flags().BoolVar(&allDetected, "all", false, "Install into every detected target without prompting") + cmd.Flags().BoolVar(&allTargets, "all-targets", false, "Install into every registered target even if not detected on this machine") + cmd.Flags().BoolVar(&force, "force", false, "Reinstall even if the same version is already present in state") + cmd.Flags().BoolVar(&dryRun, "dry-run", false, "Print the planned operations without writing") + cmd.Flags().BoolVar(&nonInteractive, "non-interactive", false, "Skip the interactive picker; default to detected targets") + + return cmd +} + +// ── list ───────────────────────────────────────────────────────────────── + +func newSkillListCommand() *cobra.Command { + return &cobra.Command{ + Use: "list", + Short: "List installed skills and their target integrations", + RunE: func(cmd *cobra.Command, args []string) error { + state, err := skillkit.ListInstalled() + if err != nil { + return err + } + printSkillList(state) + return nil + }, + } +} + +// ── update ─────────────────────────────────────────────────────────────── + +func newSkillUpdateCommand() *cobra.Command { + var skillName string + cmd := &cobra.Command{ + Use: "update [skill-name]", + Short: "Re-install a skill at the binary's embedded version into every target it is currently installed at", + Args: cobra.MaximumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + if len(args) == 1 { + skillName = args[0] + } + report, err := skillkit.Update(skillName) + if err != nil { + return err + } + printInstallReport(report, false) + return nil + }, + } + cmd.Flags().StringVar(&skillName, "skill", "", "Skill name to update") + return cmd +} + +// ── uninstall ──────────────────────────────────────────────────────────── + +func newSkillUninstallCommand() *cobra.Command { + var ( + skillName string + targets []string + removeCanonical bool + ) + cmd := &cobra.Command{ + Use: "uninstall [skill-name]", + Short: "Remove a skill from one or 
more targets", + Args: cobra.MaximumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + if len(args) == 1 { + skillName = args[0] + } + err := skillkit.Uninstall(skillkit.UninstallOptions{ + SkillName: skillName, + Targets: targets, + RemoveCanonical: removeCanonical, + }) + if err != nil { + return err + } + printSuccess("Skill uninstalled") + return nil + }, + } + cmd.Flags().StringVar(&skillName, "skill", "", "Skill name to uninstall") + cmd.Flags().StringSliceVar(&targets, "target", nil, "Specific target(s) to uninstall from (default: all installed targets)") + cmd.Flags().BoolVar(&removeCanonical, "remove-canonical", false, "Also delete the canonical ~/.agentfield/skills// directory") + return cmd +} + +// ── print ──────────────────────────────────────────────────────────────── + +func newSkillPrintCommand() *cobra.Command { + var skillName string + cmd := &cobra.Command{ + Use: "print [skill-name]", + Short: "Print SKILL.md for the named skill to stdout", + Args: cobra.MaximumNArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + if len(args) == 1 { + skillName = args[0] + } + if skillName == "" { + skillName = skillkit.Catalog[0].Name + } + skill, err := skillkit.CatalogByName(skillName) + if err != nil { + return err + } + content, err := skill.EntryContent() + if err != nil { + return err + } + _, err = os.Stdout.Write(content) + return err + }, + } + cmd.Flags().StringVar(&skillName, "skill", "", "Skill name (defaults to the first in the catalog)") + return cmd +} + +// ── path ───────────────────────────────────────────────────────────────── + +func newSkillPathCommand() *cobra.Command { + return &cobra.Command{ + Use: "path", + Short: "Print the canonical skill store location (~/.agentfield/skills)", + RunE: func(cmd *cobra.Command, args []string) error { + root, err := skillkit.CanonicalRoot() + if err != nil { + return err + } + fmt.Println(root) + return nil + }, + } +} + +// ── catalog 
────────────────────────────────────────────────────────────── + +func newSkillCatalogCommand() *cobra.Command { + return &cobra.Command{ + Use: "catalog", + Short: "List skills bundled with this af binary", + RunE: func(cmd *cobra.Command, args []string) error { + bold := color.New(color.Bold) + bold.Println("Skills shipped with this af binary") + fmt.Println() + for _, s := range skillkit.Catalog { + bold.Printf(" %s ", s.Name) + color.New(color.FgCyan).Printf("v%s\n", s.Version) + fmt.Printf(" %s\n\n", s.Description) + } + fmt.Println("Install with:") + fmt.Println(" af skill install # interactive picker") + fmt.Println(" af skill install --all # all detected agents") + fmt.Println(" af skill install --target codex # one specific agent") + return nil + }, + } +} + +// ── interactive picker ─────────────────────────────────────────────────── + +func runInteractivePicker() ([]string, error) { + bold := color.New(color.Bold) + cyan := color.New(color.FgCyan) + dim := color.New(color.Faint) + green := color.New(color.FgGreen) + + bold.Println("\nInstall agentfield-multi-reasoner-builder skill") + fmt.Println() + fmt.Println(" This skill teaches any coding agent how to architect and ship") + fmt.Println(" multi-reasoner systems on AgentField. It uses composite-") + fmt.Println(" intelligence patterns: parallel reasoner hunters, dynamic") + fmt.Println(" routing, deep DAG composition, safe-default fallbacks, and") + fmt.Println(" the canonical scaffold-to-curl workflow.") + fmt.Println() + bold.Println(" Targets") + fmt.Println() + + targets := skillkit.AllTargets() + for i, t := range targets { + marker := dim.Sprint("○ ") + suffix := "" + if t.Detected() { + marker = green.Sprint("● ") + suffix = dim.Sprint(" (detected)") + } + fmt.Printf(" %s%d. 
%s%s\n", marker, i+1, t.DisplayName(), suffix) + } + + fmt.Println() + bold.Println(" Options") + fmt.Println() + fmt.Printf(" %s Install into all detected targets\n", cyan.Sprint("a")) + fmt.Printf(" %s Install into ALL targets (even undetected)\n", cyan.Sprint("A")) + fmt.Printf(" %s Skip skill install\n", cyan.Sprint("n")) + fmt.Printf(" %s Toggle individual targets (comma-separated)\n", cyan.Sprint("1-7")) + fmt.Println() + fmt.Printf(" Choice [%s]: ", cyan.Sprint("a")) + + var choice string + if _, err := fmt.Scanln(&choice); err != nil { + // blank input → default + choice = "a" + } + choice = strings.TrimSpace(choice) + if choice == "" { + choice = "a" + } + + switch choice { + case "a": + var picked []string + for _, t := range targets { + if t.Detected() { + picked = append(picked, t.Name()) + } + } + return picked, nil + case "A": + var picked []string + for _, t := range targets { + picked = append(picked, t.Name()) + } + return picked, nil + case "n", "N": + return nil, nil + default: + var picked []string + for _, num := range strings.Split(choice, ",") { + num = strings.TrimSpace(num) + var idx int + if _, err := fmt.Sscanf(num, "%d", &idx); err == nil { + if idx >= 1 && idx <= len(targets) { + picked = append(picked, targets[idx-1].Name()) + } + } + } + return picked, nil + } +} + +// ── output rendering ───────────────────────────────────────────────────── + +func printInstallReport(report *skillkit.InstallReport, dryRun bool) { + bold := color.New(color.Bold) + green := color.New(color.FgGreen) + yellow := color.New(color.FgYellow) + red := color.New(color.FgRed) + cyan := color.New(color.FgCyan) + + fmt.Println() + if dryRun { + bold.Println("Skill install — DRY RUN") + } else { + bold.Println("Skill install") + } + fmt.Println() + + cyan.Printf(" Skill: %s v%s\n", report.Skill.Name, report.Skill.Version) + cyan.Printf(" Canonical: %s\n", report.CanonicalDir) + cyan.Printf(" Current link: %s\n", report.CurrentLink) + fmt.Println() + + if 
len(report.TargetsInstalled) > 0 { + bold.Println(" Installed") + for _, t := range report.TargetsInstalled { + green.Printf(" ✓ %-12s ", t.TargetName) + fmt.Printf("(%s) %s\n", t.Method, t.Path) + } + fmt.Println() + } + + if len(report.TargetsSkipped) > 0 { + bold.Println(" Skipped") + for _, s := range report.TargetsSkipped { + yellow.Printf(" ○ %-12s ", s.TargetName) + fmt.Printf("(%s)\n", s.Reason) + } + fmt.Println() + } + + if len(report.TargetsFailed) > 0 { + bold.Println(" Failed") + for _, e := range report.TargetsFailed { + red.Printf(" ✗ %-12s ", e.TargetName) + fmt.Printf("%s\n", e.Err) + } + fmt.Println() + } + + bold.Println(" Verify") + fmt.Println(" af skill list") + fmt.Println() + bold.Println(" Use") + fmt.Println(" Open Claude Code / Codex / etc. and ask:") + fmt.Println(` "Build me a multi-reasoner agent on AgentField that..."`) + fmt.Println(" The skill will fire automatically.") + fmt.Println() +} + +func printSkillList(state *skillkit.State) { + bold := color.New(color.Bold) + dim := color.New(color.Faint) + green := color.New(color.FgGreen) + cyan := color.New(color.FgCyan) + yellow := color.New(color.FgYellow) + + fmt.Println() + bold.Println("Installed skills") + fmt.Println() + + if len(state.Skills) == 0 { + dim.Println(" No skills installed yet.") + fmt.Println() + fmt.Println(" Install with:") + fmt.Println(" af skill install # interactive picker") + fmt.Println(" af skill install --all # all detected agents") + fmt.Println() + return + } + + skillNames := make([]string, 0, len(state.Skills)) + for name := range state.Skills { + skillNames = append(skillNames, name) + } + sort.Strings(skillNames) + + for _, name := range skillNames { + s := state.Skills[name] + bold.Printf(" %s ", name) + cyan.Printf("v%s\n", s.CurrentVersion) + dim.Printf(" Installed: %s\n", s.InstalledAt.Format("2006-01-02 15:04:05 MST")) + dim.Printf(" Versions available locally: %s\n", strings.Join(s.AvailableVersions, ", ")) + + if len(s.Targets) == 0 { + 
yellow.Println(" (no active target integrations)") + fmt.Println() + continue + } + + fmt.Println(" Targets:") + for _, tname := range s.SortedTargetNames() { + t := s.Targets[tname] + green.Printf(" ✓ %-12s ", tname) + fmt.Printf("v%s %s %s\n", t.Version, t.Method, t.Path) + } + fmt.Println() + } +} diff --git a/control-plane/internal/skillkit/catalog.go b/control-plane/internal/skillkit/catalog.go new file mode 100644 index 000000000..84837f224 --- /dev/null +++ b/control-plane/internal/skillkit/catalog.go @@ -0,0 +1,93 @@ +package skillkit + +import ( + "fmt" + "io/fs" + "path" + "strings" +) + +// Skill describes a skill that ships with the af binary. The catalog below +// is the only place new skills get registered. Bump Version on every change +// so `af skill update` knows there's a new build. +type Skill struct { + Name string // canonical skill name (kebab-case, used as directory name) + Version string // semver-ish version string baked into the binary + Description string // one-line description for `af skill list` + EmbedRoot string // root path inside SkillData where this skill's files live + EntryFile string // relative path to the skill's main file (usually SKILL.md) +} + +// Catalog is the registry of every skill the binary ships. Add a new entry +// here when adding a new skill, and drop the source files into +// skill_data// so the embed picks them up. +var Catalog = []Skill{ + { + Name: "agentfield-multi-reasoner-builder", + Version: "0.2.0", + Description: "Architect and ship complete multi-agent backends on AgentField — composite intelligence patterns, deep DAG composition, scaffold-to-curl in one workflow.", + EmbedRoot: "skill_data/agentfield-multi-reasoner-builder", + EntryFile: "SKILL.md", + }, +} + +// CatalogByName returns the skill with the given name, or an error if it +// is not in the registry. 
+func CatalogByName(name string) (Skill, error) {
+	for _, s := range Catalog {
+		if s.Name == name {
+			return s, nil
+		}
+	}
+	available := make([]string, len(Catalog))
+	for i, s := range Catalog {
+		available[i] = s.Name
+	}
+	return Skill{}, fmt.Errorf("skill %q not found in the af binary catalog (available: %s)", name, strings.Join(available, ", "))
+}
+
+// EnumerateFiles walks the embedded skill data and returns every file path
+// relative to the skill's EmbedRoot, paired with its raw bytes. Used by the
+// installer to write the canonical on-disk copy.
+func (s Skill) EnumerateFiles() (map[string][]byte, error) {
+	files := make(map[string][]byte)
+	err := fs.WalkDir(SkillData, s.EmbedRoot, func(p string, d fs.DirEntry, walkErr error) error {
+		if walkErr != nil {
+			return walkErr
+		}
+		if d.IsDir() {
+			return nil
+		}
+		rel, err := relativeUnderEmbed(s.EmbedRoot, p)
+		if err != nil {
+			return err
+		}
+		data, err := fs.ReadFile(SkillData, p)
+		if err != nil {
+			return fmt.Errorf("read embedded %s: %w", p, err)
+		}
+		files[rel] = data
+		return nil
+	})
+	if err != nil {
+		return nil, fmt.Errorf("enumerate embedded skill %q: %w", s.Name, err)
+	}
+	if len(files) == 0 {
+		return nil, fmt.Errorf("embedded skill %q is empty — did the embed directive in embed.go include this skill's files?", s.Name)
+	}
+	return files, nil
+}
+
+// EntryContent returns the raw bytes of the skill's entry file (SKILL.md).
+// Used by `af skill print` and by Cursor's clipboard fallback.
+func (s Skill) EntryContent() ([]byte, error) { + return fs.ReadFile(SkillData, path.Join(s.EmbedRoot, s.EntryFile)) +} + +func relativeUnderEmbed(root, p string) (string, error) { + rootSlash := strings.TrimSuffix(root, "/") + "/" + if !strings.HasPrefix(p, rootSlash) { + return "", fmt.Errorf("path %q is not under embed root %q", p, root) + } + return strings.TrimPrefix(p, rootSlash), nil +} diff --git a/control-plane/internal/skillkit/embed.go b/control-plane/internal/skillkit/embed.go new file mode 100644 index 000000000..f79c0e03a --- /dev/null +++ b/control-plane/internal/skillkit/embed.go @@ -0,0 +1,36 @@ +// Package skillkit owns the skill catalog: it embeds skill content into the +// af binary, installs / uninstalls / lists skills against multiple coding-agent +// targets (Claude Code, Codex, Gemini, OpenCode, Aider, Windsurf, Cursor), +// and tracks state in ~/.agentfield/skills/.state.json. +// +// The canonical on-disk layout (after `af skill install`) is: +// +// ~/.agentfield/skills/ +// ├── .state.json # tracking +// └── / +// ├── current → .// # symlink +// └── / # versioned store +// ├── SKILL.md +// └── references/ +// └── ... +// +// Each target then either: +// - symlinks into ~/./skills/ (Claude Code style), OR +// - appends a marker block to the agent's global rules file pointing at the +// canonical SKILL.md path so updates flow through automatically. +// +// New skills are added by dropping a directory into skill_data/ and registering +// it in catalog.go. The skill content is embedded at build time via go:embed +// (the source-of-truth lives in repo-root skills// — keep them in sync +// via scripts/sync-embedded-skills.sh). +package skillkit + +import "embed" + +// SkillData is the embedded filesystem containing the source-of-truth content +// for every shipped skill. Files live under skill_data// and are +// copied from the repo-root skills/ directory at build time. 
+// +//go:embed skill_data/agentfield-multi-reasoner-builder/SKILL.md +//go:embed skill_data/agentfield-multi-reasoner-builder/references/*.md +var SkillData embed.FS diff --git a/control-plane/internal/skillkit/install.go b/control-plane/internal/skillkit/install.go new file mode 100644 index 000000000..781df824a --- /dev/null +++ b/control-plane/internal/skillkit/install.go @@ -0,0 +1,348 @@ +package skillkit + +import ( + "fmt" + "os" + "path/filepath" + "sort" + "time" +) + +// InstallOptions controls how a skill is installed across targets. +type InstallOptions struct { + SkillName string // canonical skill name; empty = first in catalog + Version string // explicit version; empty = current binary's embedded version + Targets []string // explicit target list; empty = use AllDetected/AllRegistered/Selection + AllDetected bool // install into every target Detected() reports true + AllRegistered bool // install into every registered target (even undetected) + Force bool // re-install even if state shows the same version is already present + DryRun bool // print what would happen, don't write +} + +// InstallReport summarizes one install operation. The CLI uses this to print +// a clean handoff message. +type InstallReport struct { + Skill Skill + CanonicalDir string // ~/.agentfield/skills// + CurrentLink string // ~/.agentfield/skills//current + WroteCanonical bool + TargetsInstalled []InstalledTarget + TargetsSkipped []SkipReason + TargetsFailed []TargetError +} + +type SkipReason struct { + TargetName string + Reason string +} + +type TargetError struct { + TargetName string + Err error +} + +// Install runs an install pass according to opts. It performs the canonical +// write first, switches the `current` symlink, then installs into each +// selected target. Idempotent and safe to re-run. 
+func Install(opts InstallOptions) (*InstallReport, error) { + skill, err := resolveSkill(opts.SkillName, opts.Version) + if err != nil { + return nil, err + } + + root, err := CanonicalRoot() + if err != nil { + return nil, err + } + + report := &InstallReport{ + Skill: skill, + CanonicalDir: filepath.Join(root, skill.Name, skill.Version), + CurrentLink: filepath.Join(root, skill.Name, "current"), + } + + // 1. Write canonical store (versioned dir + current symlink) + if !opts.DryRun { + if err := writeCanonical(skill, report.CanonicalDir); err != nil { + return nil, fmt.Errorf("write canonical store: %w", err) + } + if err := updateCurrentLink(report.CurrentLink, report.CanonicalDir); err != nil { + return nil, fmt.Errorf("update current symlink: %w", err) + } + report.WroteCanonical = true + } + + // 2. Resolve target selection + selected, skipped, err := resolveTargets(opts) + if err != nil { + return nil, err + } + report.TargetsSkipped = append(report.TargetsSkipped, skipped...) + + // 3. Install into each selected target + state, err := LoadState() + if err != nil { + return nil, err + } + skillState, ok := state.Skills[skill.Name] + if !ok { + skillState = InstalledSkill{ + CurrentVersion: skill.Version, + InstalledAt: time.Now().UTC(), + AvailableVersions: []string{skill.Version}, + Targets: map[string]InstalledTarget{}, + } + } else { + // Track new version if not seen before + seen := false + for _, v := range skillState.AvailableVersions { + if v == skill.Version { + seen = true + break + } + } + if !seen { + skillState.AvailableVersions = append(skillState.AvailableVersions, skill.Version) + sort.Strings(skillState.AvailableVersions) + } + skillState.CurrentVersion = skill.Version + if skillState.Targets == nil { + skillState.Targets = map[string]InstalledTarget{} + } + } + + for _, t := range selected { + // Skip if already at this version and not forced. 
+ if !opts.Force { + if existing, ok := skillState.Targets[t.Name()]; ok && existing.Version == skill.Version { + report.TargetsSkipped = append(report.TargetsSkipped, SkipReason{ + TargetName: t.Name(), + Reason: fmt.Sprintf("already installed at v%s (use --force to reinstall)", existing.Version), + }) + continue + } + } + + if opts.DryRun { + report.TargetsInstalled = append(report.TargetsInstalled, InstalledTarget{ + TargetName: t.Name(), + Method: t.Method(), + Version: skill.Version, + }) + continue + } + + inst, err := t.Install(skill, report.CurrentLink) + if err != nil { + report.TargetsFailed = append(report.TargetsFailed, TargetError{TargetName: t.Name(), Err: err}) + continue + } + inst.TargetName = t.Name() + skillState.Targets[t.Name()] = inst + report.TargetsInstalled = append(report.TargetsInstalled, inst) + } + + if !opts.DryRun { + state.Skills[skill.Name] = skillState + if err := SaveState(state); err != nil { + return nil, fmt.Errorf("save state: %w", err) + } + } + + return report, nil +} + +// Uninstall removes a skill from the named targets (or all if empty), and +// optionally drops the canonical store entirely (if RemoveCanonical=true). 
+type UninstallOptions struct { + SkillName string + Targets []string + RemoveCanonical bool +} + +func Uninstall(opts UninstallOptions) error { + skill, err := resolveSkill(opts.SkillName, "") + if err != nil { + return err + } + + state, err := LoadState() + if err != nil { + return err + } + skillState, ok := state.Skills[skill.Name] + if !ok { + return fmt.Errorf("skill %q is not installed", skill.Name) + } + + targetNames := opts.Targets + if len(targetNames) == 0 { + // uninstall from all currently-installed targets + for name := range skillState.Targets { + targetNames = append(targetNames, name) + } + } + sort.Strings(targetNames) + + for _, name := range targetNames { + t, err := TargetByName(name) + if err != nil { + return err + } + if err := t.Uninstall(); err != nil { + return fmt.Errorf("uninstall from %s: %w", name, err) + } + delete(skillState.Targets, name) + } + + if len(skillState.Targets) == 0 || opts.RemoveCanonical { + delete(state.Skills, skill.Name) + if opts.RemoveCanonical { + root, err := CanonicalRoot() + if err == nil { + _ = os.RemoveAll(filepath.Join(root, skill.Name)) + } + } + } else { + state.Skills[skill.Name] = skillState + } + + return SaveState(state) +} + +// Update is a convenience wrapper that re-installs the skill into every +// target it's currently installed at, using the binary's embedded version. 
+func Update(skillName string) (*InstallReport, error) { + state, err := LoadState() + if err != nil { + return nil, err + } + skill, err := resolveSkill(skillName, "") + if err != nil { + return nil, err + } + skillState, ok := state.Skills[skill.Name] + if !ok { + return nil, fmt.Errorf("skill %q is not installed (run `af skill install` first)", skill.Name) + } + var targets []string + for name := range skillState.Targets { + targets = append(targets, name) + } + sort.Strings(targets) + return Install(InstallOptions{ + SkillName: skill.Name, + Targets: targets, + Force: true, + }) +} + +// ListInstalled returns the on-disk state for `af skill list`. +func ListInstalled() (*State, error) { + return LoadState() +} + +// ── Internals ──────────────────────────────────────────────────────────── + +func resolveSkill(name, version string) (Skill, error) { + if name == "" { + if len(Catalog) == 0 { + return Skill{}, fmt.Errorf("no skills registered in this binary") + } + name = Catalog[0].Name + } + skill, err := CatalogByName(name) + if err != nil { + return Skill{}, err + } + if version != "" && version != skill.Version { + return Skill{}, fmt.Errorf("skill %q version %q is not embedded in this binary (binary ships v%s); upgrade the binary or build with the desired version", name, version, skill.Version) + } + return skill, nil +} + +func resolveTargets(opts InstallOptions) (selected []Target, skipped []SkipReason, err error) { + if len(opts.Targets) > 0 { + for _, name := range opts.Targets { + t, err := TargetByName(name) + if err != nil { + return nil, nil, err + } + selected = append(selected, t) + } + return selected, nil, nil + } + if opts.AllRegistered { + return AllTargets(), nil, nil + } + if opts.AllDetected { + for _, t := range AllTargets() { + if t.Detected() { + selected = append(selected, t) + } else { + skipped = append(skipped, SkipReason{ + TargetName: t.Name(), + Reason: "not detected on this machine (use --all-targets to force)", + }) + } + } + 
return selected, skipped, nil + } + // Default: detected only + for _, t := range AllTargets() { + if t.Detected() { + selected = append(selected, t) + } else { + skipped = append(skipped, SkipReason{ + TargetName: t.Name(), + Reason: "not detected", + }) + } + } + return selected, skipped, nil +} + +func writeCanonical(skill Skill, dir string) error { + if err := os.MkdirAll(dir, 0o755); err != nil { + return err + } + files, err := skill.EnumerateFiles() + if err != nil { + return err + } + for rel, data := range files { + dest := filepath.Join(dir, rel) + if err := os.MkdirAll(filepath.Dir(dest), 0o755); err != nil { + return fmt.Errorf("mkdir %s: %w", filepath.Dir(dest), err) + } + if err := os.WriteFile(dest, data, 0o644); err != nil { + return fmt.Errorf("write %s: %w", dest, err) + } + } + return nil +} + +func updateCurrentLink(linkPath, targetPath string) error { + // Remove any existing symlink, file, or directory at the link path + if info, err := os.Lstat(linkPath); err == nil { + if info.Mode()&os.ModeSymlink != 0 { + if err := os.Remove(linkPath); err != nil { + return err + } + } else if info.IsDir() { + if err := os.RemoveAll(linkPath); err != nil { + return err + } + } else { + if err := os.Remove(linkPath); err != nil { + return err + } + } + } + // Use a relative symlink so the canonical store is portable across home moves + rel, err := filepath.Rel(filepath.Dir(linkPath), targetPath) + if err != nil { + rel = targetPath + } + return os.Symlink(rel, linkPath) +} diff --git a/control-plane/internal/skillkit/markerblock.go b/control-plane/internal/skillkit/markerblock.go new file mode 100644 index 000000000..1585394bf --- /dev/null +++ b/control-plane/internal/skillkit/markerblock.go @@ -0,0 +1,136 @@ +package skillkit + +import ( + "fmt" + "os" + "path/filepath" + "strings" + "time" +) + +// installMarkerBlock is the shared install logic used by every file-append +// target (Codex, Gemini, OpenCode, Aider, Windsurf). It: +// +// 1. 
Ensures the parent directory exists +// 2. Reads the existing file (or starts empty) +// 3. Strips any prior block belonging to THIS skill (regardless of version) +// 4. Appends the freshly rendered pointer block (with the current version) +// 5. Writes the file back atomically +// +// The marker pattern is a paired `agentfield-skill` start/end comment +// (rendered by markerStartPattern and markerEnd) so re-installs replace cleanly and +// other tools (plandb, etc.) can append their own blocks without collision. +func installMarkerBlock(skill Skill, canonicalCurrentDir, targetPath string) (InstalledTarget, error) { + if err := os.MkdirAll(filepath.Dir(targetPath), 0o755); err != nil { + return InstalledTarget{}, fmt.Errorf("create parent dir for %s: %w", targetPath, err) + } + + existing := "" + if data, err := os.ReadFile(targetPath); err == nil { + existing = string(data) + } else if !os.IsNotExist(err) { + return InstalledTarget{}, fmt.Errorf("read %s: %w", targetPath, err) + } + + cleaned := stripMarkerBlock(existing, skill) + cleaned = strings.TrimRight(cleaned, "\n") + + block := renderPointerBlock(skill, canonicalCurrentDir) + var sb strings.Builder + if cleaned != "" { + sb.WriteString(cleaned) + sb.WriteString("\n\n") + } + sb.WriteString(block) + sb.WriteString("\n") + + tmp := targetPath + ".af-tmp" + if err := os.WriteFile(tmp, []byte(sb.String()), 0o644); err != nil { + return InstalledTarget{}, fmt.Errorf("write %s: %w", tmp, err) + } + if err := os.Rename(tmp, targetPath); err != nil { + return InstalledTarget{}, fmt.Errorf("rename into %s: %w", targetPath, err) + } + + return InstalledTarget{ + Method: "marker-block", + Path: targetPath, + Version: skill.Version, + InstalledAt: time.Now().UTC(), + }, nil +} + +// uninstallMarkerBlock strips a skill's marker block from a target file. If +// the file is empty after the strip, it is removed.
+func uninstallMarkerBlock(skill Skill, targetPath string) error { + data, err := os.ReadFile(targetPath) + if os.IsNotExist(err) { + return nil + } + if err != nil { + return fmt.Errorf("read %s: %w", targetPath, err) + } + cleaned := strings.TrimRight(stripMarkerBlock(string(data), skill), "\n") + if cleaned == "" { + // Don't leave an empty file lying around if it was created solely for our block. + _ = os.Remove(targetPath) + return nil + } + tmp := targetPath + ".af-tmp" + if err := os.WriteFile(tmp, []byte(cleaned+"\n"), 0o644); err != nil { + return fmt.Errorf("write %s: %w", tmp, err) + } + return os.Rename(tmp, targetPath) +} + +// stripMarkerBlock removes any agentfield-skill block for the named skill +// from the input string. Tolerates multiple occurrences (defensive). +func stripMarkerBlock(input string, skill Skill) string { + startNeedle := markerStartPattern(skill) + endNeedle := markerEnd(skill) + + out := input + for { + startIdx := strings.Index(out, startNeedle) + if startIdx < 0 { + return out + } + endIdx := strings.Index(out[startIdx:], endNeedle) + if endIdx < 0 { + // Malformed: opening marker but no close. Drop everything from the + // opening marker to end-of-file to avoid leaving half a block. + return strings.TrimRight(out[:startIdx], "\n") + } + endIdx += startIdx + len(endNeedle) + // Trim a single trailing newline after the end marker for cleanliness. + if endIdx < len(out) && out[endIdx] == '\n' { + endIdx++ + } + // Trim trailing whitespace before the start marker too. + before := strings.TrimRight(out[:startIdx], " \t\n") + out = before + "\n" + out[endIdx:] + } +} + +// readMarkerVersion scans a target file and returns the version of the skill +// currently installed there, or empty if not present. Used by Status(). 
+func readMarkerVersion(skill Skill, targetPath string) string { + data, err := os.ReadFile(targetPath) + if err != nil { + return "" + } + content := string(data) + pattern := markerStartPattern(skill) + idx := strings.Index(content, pattern) + if idx < 0 { + return "" + } + rest := content[idx+len(pattern):] // version token follows the marker prefix — extract up to space. + end := strings.IndexAny(rest, " ") + if end < 0 { + return "" + } + v := strings.TrimPrefix(rest[:end], "v") + return v +} diff --git a/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/SKILL.md b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/SKILL.md new file mode 100644 index 000000000..c2fa3a97d --- /dev/null +++ b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/SKILL.md @@ -0,0 +1,393 @@ +--- +name: agentfield-multi-reasoner-builder +description: Architect and ship a complete multi-agent backend system on AgentField from a one-line user request. Use when the user asks to build, scaffold, design, or ship an agent system, multi-agent pipeline, reasoner network, AgentField project, financial reviewer, research agent, compliance agent, or any LLM composition that should outperform LangChain/CrewAI/AutoGen — especially when they want a runnable Docker-compose stack and a working curl smoke test. +--- + +# AgentField Multi-Reasoner Builder + +You are not a prompt engineer. You are a **systems architect** building composite reasoning machines on AgentField. The intelligence is in the composition, not the components. + +## HARD GATE — READ BEFORE ANYTHING ELSE + +> **Do NOT write any code, generate any file, or scaffold any project until you have:** +> 1. Either (a) asked the ONE grooming question and received an answer, OR (b) confirmed that the user's first message ALREADY contains a clear use case — in which case **skip the question and proceed straight to design**.
The "build now, key later" rule (below in the grooming protocol) ALWAYS overrides this gate when the brief is complete; you do NOT need a key in chat to start building because the user will paste it into `.env` themselves +> 2. Read `references/choosing-primitives.md` (mandatory — sets the philosophy and the real SDK signatures) +> 3. Designed the reasoner topology with **depth, not just breadth** (see "Reasoners are software APIs" below) — which `@app.reasoner` units, who calls whom, which are `.ai` vs deterministic skills, where the dynamic routing happens +> +> **Do NOT default to a single big reasoner with one `app.ai` call.** That's a CrewAI clone. Decompose. +> +> **Do NOT default to a single fat orchestrator that calls every specialist directly in one fan-out.** That's a star pattern, also a CrewAI clone wearing a different costume. Build deep call chains (see below). +> +> If you cannot draw your system as a non-trivial graph **with depth ≥ 3**, you have not architected anything. +> +> Violating the letter of this gate is violating the spirit of the gate. There are no exceptions for "simple" use cases. + +## The unit of intelligence is the reasoner — treat them as software APIs + +This is the most important framing in the entire skill. **Each reasoner is a microservice. Reasoners call other reasoners the way one REST API calls another.** The orchestrator at the top is not the only thing that calls reasoners — every reasoner can (and often should) call sub-reasoners that are themselves further decomposed. + +**Bad shape — flat star (the default a coding agent will reach for):** +``` +entry_orchestrator +├── specialist_1 ──┐ +├── specialist_2 ──┤ +├── specialist_3 ──┼── all called once, in parallel, by the orchestrator +├── specialist_4 ──┤ +└── specialist_5 ──┘ + │ + v + synthesizer +``` + +This is depth = 2 (entry → specialist → done). It's basically `asyncio.gather([llm_call_1, llm_call_2, ...])` with extra ceremony. 
Easy to write, but it doesn't earn the AgentField label. + +**Good shape — composition cascade (depth ≥ 3, parallelism at multiple levels):** +``` +triage_case (entry) +├── case_classifier ─────────────┐ +│ └── chief_complaint_parser │ +│ └── medical_term_normalizer +│ +├── ami_assessor │ ← all parallel +│ ├── cardiac_risk_calculator (deterministic skill) +│ ├── ami_pattern_matcher (.ai) +│ │ └── ecg_finding_classifier (.ai called by ami_pattern_matcher when needed) +│ └── biomarker_predictor (.ai) +│ +├── pe_assessor │ +│ ├── wells_score_calculator (deterministic skill) +│ ├── dyspnea_grader (.ai) +│ └── dvt_history_checker (.ai) +│ +├── stroke_assessor │ +│ ├── fast_screen (.ai) +│ └── nihss_estimator (.ai called only if fast_screen positive) +│ +└── adversarial_synthesizer ─────┘ + ├── steel_man_alternative_dx (.ai called once per primary assessment) + └── confidence_reconciler (.ai) + └── deterministic_safety_overrides (plain Python) +``` + +This system has depth 4, runs **at least three parallelism waves**, and each "specialist" is itself composed of 2–4 sub-reasoners that may call each other. **Each reasoner has a single cognitive responsibility you could write a one-line API contract for.** Reasoners that always co-execute become one reasoner; reasoners that have distinct judgment surfaces stay separate. + +**Why this matters:** +1. **Each reasoner is replaceable.** Want to swap `wells_score_calculator` for a more accurate one? Change one file. The flat-star pattern would have that logic buried inside a 200-line `pe_assessor` reasoner. +2. **Each reasoner is testable in isolation.** You can `curl /api/v1/execute/medical-triage.wells_score_calculator` directly with a synthetic input. The flat-star pattern only exposes the entry reasoner. +3. **Each reasoner is reusable.** `medical_term_normalizer` can be called from `chief_complaint_parser` AND from `comorbidity_amplifier` AND from a future `discharge_summary_generator`. 
The flat-star pattern duplicates logic across specialists. +4. **Each reasoner is observable.** The control plane workflow DAG shows the full call tree, not just a single `gather`. The verifiable credential chain has structure. +5. **Parallelism happens at multiple levels.** The flat-star pattern fans out N specialists once. The deep DAG fans out N specialists × M sub-calls each, with the orchestration `asyncio.gather`-ing at each layer. Total wall-clock time goes down even though total calls go up. + +**Concrete rules:** +- If a reasoner has more than ~30 lines of body code, it's probably 2 reasoners +- If two reasoners always call each other in sequence, they should be one reasoner (or one reasoner with a deterministic helper) +- If your entry reasoner is the ONLY thing that calls `app.call`, the architecture is too flat — push the calls down into the specialists +- If your topology can be drawn as a literal star, throw it out and design for depth +- A reasoner should have a clear API contract you could write in one sentence: *"Given X, return Y. Calls Z, W."* + +**The unit of intelligence is the reasoner. Treat them like software APIs and the system writes itself.** + +## The non-negotiable promise + +Every invocation of this skill must end with the user able to run **two commands** and get a working multi-reasoner system: + +```bash +docker compose up --build +curl -X POST http://localhost:8080/api/v1/execute/<node_id>.<entry_reasoner> \ + -H 'Content-Type: application/json' \ + -d '{"input": {"...": "..."}}' +``` + +If you cannot deliver that, you have failed. No theoretical architectures. No "here's how you would do it." A running stack and a curl that returns a real reasoned answer. + +**Note the curl body shape: `{"input": {...kwargs...}}`** — the control plane wraps reasoner kwargs in an `input` field. Verified against `control-plane/internal/handlers/execute.go:1000`. Many coding agents get this wrong. + +## Workflow (universal — works for any coding agent) + +1.
**Announce** you're using the `agentfield-multi-reasoner-builder` skill. +2. **Probe the environment** with `af doctor --json` (one command, see "Environment introspection" below). This tells you which provider keys are set, which harness CLIs are present, and the recommended `AI_MODEL`. Use this output instead of guessing. +3. **Ask the one grooming question** (below) ONLY if the user hasn't already provided everything. +4. **Read `choosing-primitives.md` ALWAYS.** Read other references when their trigger fires (table below). +5. **Design the topology** before writing files. +6. **Lay down infrastructure** with `af init --language python --docker --defaults --non-interactive --default-model <model>` (one command, see "Infrastructure scaffold" below). +7. **Customize `main.py` and `reasoners.py`** with the real reasoner architecture per `scaffold-recipe.md`. Generate `CLAUDE.md` (from `project-claude-template.md`) and `README.md` AFTER you know the entry reasoner name and the curl payload. +8. **Validate**: `python3 -m py_compile main.py`, `docker compose config`, ideally `docker compose up --build` + verification ladder. +9. **Hand off** with the output contract below. + +## Environment introspection: `af doctor` + +Run this **once** at the start of every build. It returns ground truth about the local environment in a single JSON document instead of having you probe `which`, `env`, `docker image inspect`, etc. yourself: + +```bash +af doctor --json +``` + +Key fields you'll consume: +- `recommendation.provider` — `openrouter` / `openai` / `anthropic` / `google` / `none` +- `recommendation.ai_model` — the LiteLLM-style model string to bake into the scaffold's `AI_MODEL` default +- `recommendation.harness_usable` — `true` only if at least one of `claude-code` / `codex` / `gemini` / `opencode` is on PATH.
**If `false`, do not use `app.harness()` in the scaffold under any circumstance.** +- `recommendation.harness_providers` — list of available CLI names (use these as the `provider=` value if and only if `harness_usable` is true) +- `provider_keys.{name}.set` — boolean per provider (no values leaked) +- `control_plane.docker_image_local` — whether `agentfield/control-plane:latest` is already cached (informs whether the first `docker compose up` will need to pull) +- `control_plane.reachable` — whether a control plane is already running locally (so you can curl test reasoners against it before building your own) + +**Use the doctor's output to set the `--default-model` flag on `af init` and to decide whether `app.harness()` is even an option in the architecture.** Do not hardcode your assumptions about the environment. + +## Infrastructure scaffold: `af init --docker` + +Run this **once** after `af doctor` and your architecture design. It produces the four infrastructure files (which you should not customize) plus the language scaffold (Python `main.py`, `reasoners.py`, `requirements.txt`): + +```bash +af init --language python --docker --defaults --non-interactive \ + --default-model <model> +``` + +What it generates: +- `Dockerfile` — universal Python 3.11-slim, builds from project dir, no repo coupling +- `docker-compose.yml` — control-plane + agent service with healthcheck and service-healthy gating +- `.env.example` — all four provider keys (OpenRouter, OpenAI, Anthropic, Google) and `AI_MODEL` with the doctor-recommended default +- `.dockerignore` +- `main.py`, `reasoners.py`, `requirements.txt`, `README.md`, `.gitignore` — the standard language scaffold (you'll **rewrite `main.py` and `reasoners.py`** with your real architecture) + +What it does NOT generate (intentionally): +- `CLAUDE.md` — you generate this from `references/project-claude-template.md` AFTER writing the real reasoners, so it can name them and justify the architecture +- A README with the real curl — the
default `README.md` is generic; you replace it AFTER picking the entry reasoner so the curl uses real kwargs + +The four infrastructure files are zero-change for the agent: Dockerfile installs `agentfield` from `requirements.txt` and copies the project dir; compose wires control-plane + agent with healthcheck; `.env.example` exposes all providers; `.dockerignore` covers the standard cases. **Do not modify them unless you have a real reason.** + +## Reference table — load when + +| File | Load when | +|---|---| +| `choosing-primitives.md` | **Every invocation** — before any code | +| `architecture-patterns.md` | Designing inter-reasoner flow / picking HUNT→PROVE, parallel hunters, fan-out, streaming, meta-prompting | +| `scaffold-recipe.md` | Actually writing files / docker-compose / Dockerfile | +| `verification.md` | Writing the smoke test ladder or declaring done | +| `project-claude-template.md` | Generating the per-project CLAUDE.md (always) | +| `anti-patterns.md` | When tempted to take a shortcut OR when the user pushes back on a rejection | + +Reference files are one level deep from this file. Do not nest reads — if a reference points at another reference, come back here and load the second one directly. + +## The grooming protocol (1 question, then build) + +Ask **exactly one** question and **one** key request. Nothing else upfront: + +> "Tell me in 1–2 sentences what you want this agent system to do, and paste your provider key. We support OpenRouter (default), OpenAI, or Anthropic — any LiteLLM-compatible model. Example: `OPENROUTER_API_KEY=sk-or-v1-...`" + +**Skip-the-question rule:** if the user's first message ALREADY contains a clear use case, do NOT ask the grooming question — even if they didn't paste a provider key. 
This is the **"build now, key later"** policy: + +- If the user gives a clear use case AND a provider key → proceed straight to design + build +- If the user gives a clear use case AND says they'll paste the key into `.env` later → ALSO proceed straight to design + build. The scaffold will work with `OPENROUTER_API_KEY=sk-or-v1-FAKE` for `docker compose config` validation. The user runs the real key from `.env` when they're ready +- If the user gives a clear use case AND says nothing about a key → proceed straight to design + build. The `.env.example` you generate makes it obvious where to put the key +- If the user's request is genuinely vague or ambiguous along an architecture-changing axis → THEN ask one question + +The point is to **never block the build on a key the user is going to drop into `.env` themselves**. Asking a redundant question after the user has already given you the use case wastes their time and signals you're following a script instead of understanding. + +Then proceed. Infer everything else from the use case. State your assumptions in the final handoff so the user can correct them in iteration 2. + +**Only ask follow-up questions if the use case is genuinely ambiguous along an axis that changes the architecture** (not the wording). Examples that warrant a follow-up: + +- Input is a 200-page document vs. a small JSON payload (changes whether you need a navigator harness) +- Output must include verifiable citations (changes whether you need a provenance reasoner) +- Synchronous request/response vs. event-driven (pattern 8 vs. patterns 1–7) + +Examples that do **NOT** warrant a follow-up: model preference, file naming, port number, code style, what to call the entry reasoner. Decide and state. + +## The five primitives (cheat sheet — full detail in `choosing-primitives.md`) + +- **`@app.reasoner()`** — every cognitive unit. Schemas come from **type hints** (no `input_schema=` param exists). +- **`@app.skill()`** — deterministic functions. 
No LLM. Use whenever an LLM call is overkill. +- **`app.ai(system, user, schema, model, tools, ...)`** — single OR multi-turn LLM call. `tools=[...]` makes it stateful. `model="..."` per call overrides AIConfig default. +- **`app.harness(prompt, provider="claude-code"|"codex"|"gemini"|"opencode")`** — delegates to an external coding-agent CLI. **Not** a generic tool-using LLM (that's `app.ai(tools=[...])`). **REQUIRES** the chosen provider's CLI to be installed inside the agent container — see "Harness availability gate" below. +- **`app.call(target, **kwargs)`** — inter-reasoner traffic THROUGH the control plane. Returns `dict`. **No model override param** — thread `model` as a regular reasoner kwarg. + +**The bias:** many small `@app.reasoner()` units. `@app.skill()` for anything code can do. `app.ai()` with explicit prompts. Reserve `app.harness()` for real coding-agent delegation. + +## Harness availability gate (READ BEFORE USING `app.harness()`) + +`app.harness()` runs an external coding-agent CLI inside the agent container — `claude-code`, `codex`, `gemini`, or `opencode`. **The default `python:3.11-slim` Docker image has none of these installed.** A scaffold that uses `app.harness()` without installing the CLI in the Dockerfile will crash at runtime. + +**The check is automated.** `af doctor --json` reports `recommendation.harness_usable` (true/false) and `recommendation.harness_providers` (the list of CLIs on PATH). Use the doctor output as the source of truth — do not assume. + +**Default rule:** scaffolds **MUST NOT** use `app.harness()` at all when `recommendation.harness_usable == false`. Use `app.ai(tools=[...])` for stateful reasoning, or a `@app.reasoner()` that loops `app.ai()` for chunked work. These work in the default container with zero extra setup. + +**You may use `app.harness()` ONLY when ALL of the following are true:** + +1. The use case **genuinely requires a real coding agent** in the loop — i.e. 
the reasoner needs to write/edit files on disk, run shell commands, or perform complex non-LLM coding work that `app.ai(tools=[...])` cannot do. +2. You modify the Dockerfile to install the chosen provider's CLI. Example for Claude Code: + ```dockerfile + RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \ + && npm install -g @anthropic-ai/claude-code \ + && rm -rf /var/lib/apt/lists/* + ``` +3. You add a **startup availability check** in `main.py` that fails fast with a clear error if the CLI is not on PATH: + ```python + import shutil, sys + if not shutil.which("claude"): # or "codex" / "gemini" / "opencode" + print("ERROR: app.harness(provider='claude-code') requires the `claude` CLI in PATH.", file=sys.stderr) + sys.exit(1) + ``` +4. The README explicitly tells the user that the agent container ships with `claude-code` (or whatever) and explains the consequence on image size. + +**If any of the four are not satisfied, do not use `app.harness()`.** Refactor the reasoner to use `app.ai(tools=[...])` or a chunked `@app.reasoner()` loop. There is no scenario where it's OK to write `app.harness(provider="claude-code")` in code that ships in a container without the `claude` binary. + +When in doubt: **don't use harness.** The user can ask for it in iteration 2. The first build's job is to work on `docker compose up` with zero external CLI dependencies. + +## Mandatory patterns (every build must have all three) + +### 1. Per-request model propagation + +The entry reasoner accepts `model: str | None = None` and threads it through every `app.ai(..., model=model)` and `app.call(..., model=model)`. Child reasoners accept `model` the same way and use it. The user can A/B test models per request: + +```bash +curl -X POST http://localhost:8080/api/v1/execute/<node_id>.<entry_reasoner> \ + -d '{"input": {"...": "...", "model": "openrouter/openai/gpt-4o"}}' +``` + +If `model` is omitted, the AIConfig default from the env var `AI_MODEL` is used.
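The threading pattern can be sketched end to end. The snippet below is a runnable illustration that uses a minimal stand-in for the AgentField `app` object (the real SDK provides `app.ai`, `app.call`, and `app.node_id`); the reasoner name `deep_dive` and all field names are hypothetical:

```python
import asyncio
from typing import Optional

class _App:
    """Stand-in for the AgentField app object — only here to show the kwarg threading."""
    node_id = "demo-node"

    async def ai(self, system: str, user: str, model: Optional[str] = None) -> dict:
        # Real SDK: one LLM call; model=None falls back to the AIConfig default (env AI_MODEL).
        return {"model_used": model or "AI_MODEL default"}

    async def call(self, target: str, **kwargs) -> dict:
        # Real SDK: routes through the control plane. There is no model= parameter on
        # app.call itself; `model` below is just another reasoner kwarg the child receives.
        return {"target": target, "model_received": kwargs.get("model")}

app = _App()

async def entry_reasoner(text: str, model: Optional[str] = None) -> dict:
    # Thread `model` into every LLM call and every downstream reasoner call.
    triage = await app.ai(system="Classify the input.", user=text, model=model)
    detail = await app.call(f"{app.node_id}.deep_dive", text=text, model=model)
    return {"triage": triage, "detail": detail}

result = asyncio.run(entry_reasoner("sample input", model="openrouter/openai/gpt-4o"))
print(result["detail"]["model_received"])  # the per-request override reaches the child
```

Every child reasoner repeats the same signature (`model: Optional[str] = None`) so the override propagates all the way down the call tree.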
**`app.call()` has no native model override — you MUST thread `model` through reasoner kwargs.** + +### 2. Routers when reasoners > 4 + +Use `AgentRouter(prefix="domain", tags=["domain"])` and `app.include_router(router)` to split reasoners into separate files. Tags merge between router and per-decorator. **Note:** `prefix="clauses"` auto-namespaces reasoner IDs as `clauses_<name>` — call them as `app.call(f"{app.node_id}.clauses_<name>", ...)`. + +### 3. Tags on the entry reasoner + +The public entry reasoner is decorated with `tags=["entry"]` so it surfaces in the discovery API. Tags are free-form (not reserved); use domain tags for internal reasoners. + +## Hard rejections — refuse these without negotiation + +| ❌ Rejected pattern | ✅ AgentField alternative | +|---|---| +| Direct HTTP between reasoners (`httpx.post(...)`) | `await app.call(f"{app.node_id}.X", ...)` — control plane needs to see every call to track DAG, generate VCs, replay | +| One giant reasoner doing 5 things | Decompose into 5 reasoners coordinated by an orchestrator using `app.call` + `asyncio.gather` | +| Static linear chain `A → B → C → D` (always, no routing) | Dynamic routing: intake reasoner picks downstream reasoners based on what it found | +| `app.ai(prompt=full_50_page_doc)` | `@app.reasoner` that loops `app.ai` per chunk, OR `app.ai(tools=[...])` with explicit tool calls | +| Unbounded `while not confident: app.ai(...)` | Hard cap: `for _ in range(MAX_ROUNDS): ...` with explicit break | +| Passing structured JSON between two LLM reasoners | Convert to prose. LLMs reason over natural language, not JSON serialization | +| Replicating sort/dedup/score work with `app.ai` | `@app.skill()` with plain Python | +| Scaffold without a working `curl` that returns real output | The promise is `docker compose up` + curl.
Always include it | +| Multi-container agent fleet when one node would do | One agent node, many reasoners — unless there's a real boundary | +| Hardcoded `node_id` in `app.call("financial-reviewer.X", ...)` | `app.call(f"{app.node_id}.X", ...)` — survives `AGENT_NODE_ID` rename | +| Hardcoded model | `model=os.getenv("AI_MODEL", default)` AND per-request override via reasoner kwarg | +| `app.ai()` schema with no `confident` field and no fallback | Schema must include `confident: bool`, call site checks it and escalates | +| `app.harness(provider="claude-code")` in a default scaffold | Default container has no `claude` CLI. Use `app.ai(tools=[...])` or a chunked-loop reasoner. See "Harness availability gate" | +| `input_schema=` or `output_schema=` parameter on `@app.reasoner` | These don't exist. Schemas come from type hints | +| `app.serve()` in `__main__` | `app.run()` — auto-detects CLI vs server mode | + +When the user explicitly demands a rejected pattern, name the rejection, explain *why* in one sentence, propose the AgentField alternative, and only build it their way after they've confirmed they understand the tradeoff. Add a `# NOTE: User requested X over canonical Y` comment. + +## Rationalization counters & red flags + +These thoughts mean STOP. If you notice any of them, re-read the linked reference and reconsider. + +| Thought / symptom | Reality / re-read | +|---|---| +| "Quick demo, I'll skip the architecture" | The skill exists to be stronger than a chain. Weak demo proves nothing | +| "I'll pass JSON between two reasoners" | LLMs reason over prose. Strings between LLMs, JSON only for code | +| "One big `analyze()` reasoner is fewer files" | Decompose. Granularity is the forcing function for parallelism. `choosing-primitives.md` | +| "I'll skip the CLAUDE.md / README" | They're how the next coding agent extends without breaking it. Always generate | +| "I'll ask 5 questions to be safe" | One question. State assumptions. 
Iterate | +| "Curl is enough, skip discovery API" | Discovery API tells you in 2s which step actually failed. `verification.md` | +| "I need stateful tool-using → `app.harness()`" | NO. `app.harness()` is external coding-agent CLI delegation AND requires the CLI in the container. Use `app.ai(tools=[...])` or a chunked-loop reasoner | +| "I'll add `app.harness(provider='claude-code')` for the deep reasoning step" | The default Python container has no `claude` CLI. The scaffold will crash on first run. Read "Harness availability gate" | +| "I'll add `input_schema=` to the decorator" | That param doesn't exist. Schemas come from type hints | +| ".ai() for a 50-page document" | `app.ai(tools=[...])` or a chunked-loop reasoner. `choosing-primitives.md` | +| "Static `for` loop of LLM calls, no routing" | Add dynamic routing or admit AgentField isn't justified. `architecture-patterns.md` | +| "Skipping `python3 -m py_compile` and `docker compose config`" | Always run. `scaffold-recipe.md` | +| "I'll write `import requests` to call the other reasoner" | Use `app.call(f"{app.node_id}.X", ...)`. `choosing-primitives.md` | +| "I'll use `app.serve()` in main" | Use `app.run()`. Auto-detects CLI vs server | + +## Output contract (every build) + +The final message to the user MUST contain these sections, in this order, in a clean copy-pasteable format. The whole point is the first-time user can read the message top to bottom and within 60 seconds have the system running and a working curl in another terminal. + +### 1. What was scaffolded + +Generated file tree with absolute paths. + +### 2. Architecture sketch + +4–6 bullets: what each reasoner does, who calls whom, where the dynamic routing happens, where the safety guardrails fire. + +### 3. Assumptions made + +5–10 bullets — the things you inferred without asking. + +### 4. 
🚀 Run it (3 commands) + +```bash +cd <project-dir> +cp .env.example .env # then paste your OPENROUTER_API_KEY into .env +docker compose up --build +``` + +Wait until you see `agent registered` in the logs (~30–90 seconds first run). + +### 5. 🌐 Open the UI + +After the stack is up, open these URLs in your browser: + +| URL | What it shows | +|---|---| +| **http://localhost:8080/ui/** | AgentField control plane web UI — live workflow DAG, reasoner discovery, execution history, verifiable credential chains | +| **http://localhost:8080/api/v1/discovery/capabilities** | JSON: every reasoner registered with the control plane (proves your build deployed) | +| **http://localhost:8080/api/v1/health** | Health check | + +### 6. ✅ Verify the build (in another terminal) + +```bash +# 1. Control plane up? +curl -fsS http://localhost:8080/api/v1/health | jq '.status' + +# 2. Agent node registered? (use ?health_status=any — default filter can hide healthy nodes) +curl -fsS 'http://localhost:8080/api/v1/nodes?health_status=any' | jq '.nodes[] | {id: .node_id, status: .status}' + +# 3. All reasoners discoverable? +# Response shape: .capabilities[].reasoners[].id (NOT .reasoners[].name) +curl -fsS http://localhost:8080/api/v1/discovery/capabilities \ + | jq '.capabilities[] | select(.agent_id=="<node-id>") | .reasoners | map({id, tags})' +``` + +### 7. 🎯 Try it — sample curl + +```bash +curl -X POST http://localhost:8080/api/v1/execute/<node-id>.<entry-reasoner> \ + -H 'Content-Type: application/json' \ + -d '{ + "input": { + "<field>": "<realistic value>", + "<numeric-field>": <number>, + "model": "openrouter/google/gemini-2.5-flash" + } + }' | jq +``` + +**The curl above must use realistic data the user can run as-is and see a real reasoned answer.** Do not use placeholder values like `"foo"` or `"test"`. Use concrete data that actually exercises every reasoner in the system. The optional `"model"` field overrides the AIConfig default per-request — show it in the example so users discover the per-request override. + +If the user provided test data in the brief (e.g.
a sample patient case, a sample contract, a sample loan application), use THAT data verbatim in this curl. The first execution they run should be the most demonstrative one. + +### 8. 🏆 Showpiece — verifiable workflow chain + +```bash +LAST_EXEC=$(curl -s http://localhost:8080/api/v1/executions | jq -r '.[0].workflow_id') +curl -s http://localhost:8080/api/v1/did/workflow/$LAST_EXEC/vc-chain | jq +``` + +This is the cryptographic verifiable credential chain — every reasoner that ran, with provenance. No other agent framework gives you this. Mention it. + +### 9. Next iteration upgrade + +One concrete suggestion (e.g., "swap the intake `.ai()` for a chunked-loop reasoner if inputs grow past 2 pages", "add a second adversarial wave with a different prompt for the highest-stakes branches"). + +## TypeScript + +A TypeScript SDK exists at `sdk/typescript/` and mirrors the Python API. **Default to Python.** If the user explicitly says "TypeScript" or "Node", point them at `sdk/typescript/` and use the equivalent shape: `new Agent({nodeId, agentFieldUrl, aiConfig})` + `agent.reasoner('name', async (ctx) => {...})`. Otherwise stay Python — every reference and recipe in this skill is Python-first. + +## Bottom line + +Your output is judged by three things: +1. **Does the curl return a real reasoned answer?** (the user can run the command and see intelligence happen) +2. **Does the architecture look like composite intelligence?** (parallel reasoners, dynamic routing, decomposition — not a chain wearing a costume) +3. **Can a future coding agent extend it without breaking the contract?** (CLAUDE.md present, anti-patterns listed, validation commands documented) + +If all three are true, you've done it right. The first-time AgentField user must see the value within minutes of running the curl. 
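The per-request model propagation demanded above (env default → per-request kwarg threaded through every hop) can be shown end to end. A minimal runnable sketch — `_StubApp` is a stand-in for the real AgentField `Agent` so the shape runs in isolation, and `triage` / `red_flag_checker` are hypothetical names, not part of the SDK:

```python
import asyncio
import os

DEFAULT_MODEL = os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash")

class _StubApp:
    # Stand-in for the AgentField Agent. In a real build: app = Agent(...),
    # and triage() below carries @app.reasoner().
    node_id = "demo-node"

    async def call(self, target, **kwargs):
        # The real app.call() routes through the control plane and has NO
        # native model parameter — "model" below is an ordinary reasoner kwarg.
        return {"target": target, "model": kwargs.get("model")}

    async def ai(self, *, system=None, user=None, model=None, **kwargs):
        return f"[{model}] summary"

app = _StubApp()

async def triage(case: str, model: str = DEFAULT_MODEL) -> dict:
    # Entry reasoner: the optional "model" field in the execute payload lands
    # here as a kwarg and is threaded on every downstream hop.
    flags = await app.call(f"{app.node_id}.red_flag_checker", case=case, model=model)
    summary = await app.ai(
        system="Summarize the triage findings as prose.",
        user=f"Case: {case}\nFlags: {flags}",
        model=model,  # per-call override of the AIConfig default
    )
    return {"flags": flags, "summary": summary}

result = asyncio.run(triage("chest pain, 54M", model="openrouter/openai/gpt-4o-mini"))
print(result["flags"]["model"])  # → openrouter/openai/gpt-4o-mini
```

The discipline to preserve: the model string is never read from the environment anywhere except the kwarg default, so one execute payload can A/B a different model without touching the container.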
diff --git a/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/anti-patterns.md b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/anti-patterns.md new file mode 100644 index 000000000..e6966ef2a --- /dev/null +++ b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/anti-patterns.md @@ -0,0 +1,143 @@ +# Anti-Patterns — Deep Dive + +The hard rejections and the rationalization counters are inlined in `SKILL.md` so they fire on every invocation. **This file is the deep-dive reference** — load it when the user pushes back on a rejection, when you need to explain WHY in more depth, or when you're tempted to negotiate with yourself. + +When the user (or your own drift) pushes you toward one of these, name the rule, explain why in one sentence, and offer the AgentField-native alternative. Don't apologize, don't equivocate. + +## Hard rejections + +### 1. Direct HTTP between reasoners + +❌ `httpx.post("http://other-agent:8002/run", ...)` +✅ `await app.call(f"{app.node_id}.other_reasoner", ...)` + +**Why:** The control plane needs to see every call to track the workflow DAG, generate verifiable credentials, replay executions, and apply observability. Direct HTTP makes the system invisible. + +--- + +### 2. One giant reasoner doing 5 things + +❌ `async def review_everything(doc): ...` (200 lines, 4 LLM calls inside) +✅ Decompose into 5 reasoners that the orchestrator coordinates with `app.call` and `asyncio.gather`. + +**Why:** Granular decomposition is the forcing function for parallelism, observability, replayability, and quality. A monolithic reasoner is just a script with extra steps. + +--- + +### 3. Static linear chain where the path depends on discoveries + +❌ `intake → analyze → score → report` (always, in this order, regardless of intake) +✅ Intake routes to different downstream reasoners based on what it found.
If risk is high, spawn a deep-dive harness. If complexity is low, skip the adversary. + +**Why:** Dynamic routing IS the meta-level intelligence that distinguishes AgentField from chain frameworks. A static chain can be written in 30 lines of LangChain. + +--- + +### 4. `.ai()` on a long document + +❌ `await app.ai(prompt=full_50_page_contract, schema=Result)` +✅ A `.harness()` that can navigate the document with `read_section` / `lookup_definition` tools. + +**Why:** `.ai()` is single-shot. It cannot adapt, navigate, or escalate. Stuffing a long doc into the prompt either truncates silently, blows the context window, or produces shallow answers because the model never reads past page 3. + +--- + +### 5. Unbounded loops + +❌ `while not confident: result = await app.ai(...)` +✅ `for _ in range(MAX_ROUNDS): ...` with a hard cap and an explicit break condition. + +**Why:** "Keep going until confident" is how you get a $400 bug report. Every loop has a cap. Period. + +--- + +### 6. Structured JSON shoved into another LLM as "context" + +❌ `await app.ai(user=str(previous_findings.model_dump()), ...)` +✅ `await app.ai(user=format_findings_as_prose(previous_findings), ...)` + +**Why:** LLMs reason over natural language, not over JSON serialization. Structured output between code and a reasoner is correct. Structured output between two reasoners is a smell — convert it to prose with the relevant context. + +--- + +### 7. Replicating programmatic work with an LLM + +❌ `await app.ai(prompt="Sort these 50 items by score", ...)` +✅ `sorted(items, key=lambda x: x.score, reverse=True)` + +**Why:** You are paying for intelligence. Sorting is not intelligence. If a `for` loop or a sort function would do it, do it. Save the LLM calls for things that previously required a human expert. + +--- + +### 8. Scaffold without a working `curl` + +❌ "Here are the files; you can figure out how to test it." 
+✅ A README with the exact verification ladder (health → nodes → capabilities → execute) and a curl that returns a real reasoned answer. + +**Why:** The promise is `docker compose up` + curl. If the user can't run those two commands and see real output, the build failed regardless of how nice the architecture looks on paper. + +--- + +### 9. Multi-container agent fleet when one node would do + +❌ Five Docker services for "research agent", "writer agent", "editor agent", "fact-checker agent", "publisher agent" +✅ ONE agent node with five reasoners. Same orchestration capability, 5× less ops surface. + +**Why:** Reasoners are cheaper than containers. Use multiple containers only when there's a real boundary (separate teams, separate language runtimes, separate scaling profiles, separate trust domains). Otherwise, one node with many reasoners is the right shape. + +--- + +### 10. Hardcoded model strings + +❌ `ai_config=AIConfig(model="gpt-4o")` +✅ `ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash"))` AND accept a `model` parameter on the entry reasoner that propagates via `app.call(..., model=model)`. + +**Why:** Users need to swap models per-request to A/B test without rebuilding the container. Make the model dynamic at three layers: env default, container override, per-request override. + +--- + +### 11. Hardcoded `node_id` in `app.call` + +❌ `await app.call("financial-reviewer.score", ...)` +✅ `await app.call(f"{app.node_id}.score", ...)` + +**Why:** When the user renames the node via `AGENT_NODE_ID`, hardcoded calls break. Always reference your own reasoners through `app.node_id`. + +--- + +### 12. `.ai()` with no `confident` flag and no fallback + +❌ Schema is `{decision: str, reason: str}` and the call site doesn't validate. +✅ Schema is `{decision: str, reason: str, confident: bool}` and the call site checks `if not result.confident: escalate_to_harness()`. + +**Why:** Every `.ai()` has a failure mode. 
A failed `.ai()` that propagates a confidently-wrong answer is the single most expensive bug an AgentField system can ship. + +--- + +## Rationalization counters + +When you (or the user) start producing one of these, recognize it and refuse: + +| Rationalization | Counter | +|---|---| +| "Just for the demo, a chain is fine" | The demo is the proof. A weak demo proves nothing. | +| "The LLM is smart enough to handle the whole document in one call" | The LLM is 0.3-grade. The architecture is 0.8-grade. Don't mix them up. | +| "I'll add the harness later if it doesn't work" | You'll never know it doesn't work because the .ai() will silently truncate. Start with harness. | +| "Routing is overkill, the workflow is always the same" | Then the workflow doesn't justify AgentField. Tell the user honestly. | +| "I'll skip the curl smoke test, the user will figure it out" | The user invoked a skill. The skill's whole point is they don't have to figure it out. | +| "The CLAUDE.md is bureaucratic, the code is self-documenting" | Code documents WHAT. CLAUDE.md documents WHY this is the architecture and what NOT to undo. The next agent needs both. | +| "Two grooming questions is barely anything" | One question. The point is to feel magical to the first-time user. Infer the rest. | +| "I'll skip the discovery API check, I trust the build" | A curl that hangs at 30s tells you nothing about which step failed. Discovery API tells you in 2s. | +| "I'll ship the JSON directly to the next reasoner, it's cleaner" | Cleaner for you. Worse for the LLM. Convert to prose. | +| "More containers means better separation" | More containers means more YAML, more network hops, more failure modes. Use one node unless you have a real reason. | + +## When the user explicitly demands a rejected pattern + +Some users will insist. Honor that — but only after you've named the rejection, explained why in one sentence, and they've confirmed they understand the tradeoff. 
Then build it their way and add a comment in the code: + +```python +# NOTE: User explicitly requested static chain over dynamic routing despite +# the canonical AgentField pattern being dynamic. See README "Tradeoffs" section. +``` + +The point is not to be a tyrant — it's to refuse drift. Conscious choices are fine. Drift is not. diff --git a/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/architecture-patterns.md b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/architecture-patterns.md new file mode 100644 index 000000000..fef4201f3 --- /dev/null +++ b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/architecture-patterns.md @@ -0,0 +1,320 @@ +# Architecture Patterns — The 9 AgentField Compositions + +These are battle-tested patterns from real AgentField systems (`sec-af`, `af-swe`, `contract-af`, `af-deep-research`, `reactive-atlas`). Pick one, compose two, or invent your own — but never default to a static linear chain. + +For each pattern: when to use it, the shape, and a real-system reference. + +--- + +## 1. Parallel Hunters + Signal Cascade + +**Shape:** +``` +input ──┬──> hunter_A ──┐ + ├──> hunter_B ──┼──> findings_pool ──> downstream + ├──> hunter_C ──┘ + └──> hunter_D +``` + +**When:** Any problem with multiple independent analysis dimensions that can be examined concurrently. Each hunter is a specialist that knows about ONE dimension deeply. + +**Reference:** `examples/sec-af/` — parallel strategy hunters analyzing SEC filings; `examples/contract-af/` — parallel clause analysts (IP / liability / non-compete / data / termination).
+ +**Code shape:** +```python +@app.reasoner() +async def review(document: str) -> dict: + findings = await asyncio.gather(*[ + app.call(f"{app.node_id}.{h}", document=document) + for h in ["profitability_hunter", "liquidity_hunter", "risk_hunter", "efficiency_hunter"] + ]) + return await app.call(f"{app.node_id}.synthesizer", findings=findings) +``` + +**Common mistake:** Making the hunters do "everything" each. Each hunter is a NARROW specialist. If hunters overlap heavily, you decomposed wrong. + +--- + +## 2. HUNT → PROVE Adversarial Tension + +**Shape:** +``` +input ──> hunters ──> candidate findings ──> provers ──> verified findings + ↑ + adversary tries to disprove each one +``` + +**When:** Any problem where false positives are catastrophic — security, legal, compliance, medical, financial. + +**Reference:** `examples/sec-af/` — vulnerability hunters → exploit provers; `examples/contract-af/` — clause analysts → adversary reviewer. + +**Why it works:** Hunters are biased toward sensitivity (find everything). Provers are biased toward specificity (refuse anything unproven). The tension between them is the intelligence — neither alone produces a good answer. + +```python +@app.reasoner() +async def adversarial_review(input: str) -> dict: + candidates = await app.call(f"{app.node_id}.hunter_pool", input=input) + verified = await asyncio.gather(*[ + app.call(f"{app.node_id}.prover", finding=f, original=input) + for f in candidates + ]) + return [v for v in verified if v["proven"]] +``` + +--- + +## 3. Streaming Pipeline (asyncio.Queue) + +**Shape:** +``` +upstream ──emits──> queue ──consumes──> downstream + (starts working before upstream finishes) +``` + +**When:** Downstream reasoners can start working on partial results — and waiting for the full upstream batch wastes time and misses interaction effects. + +**Reference:** `examples/sec-af/` — HUNT→PROVE streaming; `examples/contract-af/` — analysts → cross-reference + adversary streaming. 
+ +```python +findings_queue = asyncio.Queue() + +async def producer(items): + for item in items: + finding = await app.call(f"{app.node_id}.analyze", item=item) + await findings_queue.put(finding) + await findings_queue.put(None) # sentinel + +async def consumer(): + seen = [] + while (f := await findings_queue.get()) is not None: + # Check this finding against everything seen so far + await app.call(f"{app.node_id}.cross_ref", new=f, prior=seen) + seen.append(f) +``` + +--- + +## 4. Meta-Prompting (Harnesses Spawning Harnesses) + +**Shape:** +``` +parent_harness ──discovers X──> crafts a SPECIFIC prompt ──spawns──> child_harness ──> findings + ↑ │ + └────────────────── integrates findings ─────────────────────────────────────────────┘ +``` + +**When:** The investigation path depends on what gets discovered. You cannot pre-define which sub-reasoners will run, because you don't know yet what's there. + +**Reference:** `examples/contract-af/` — clause analysts spawning definition-impact analyzers when they discover a referenced defined term; cross-reference resolver spawning combination deep-dives. + +**This is the pattern that no framework chain can replicate.** It's pure dynamic intelligence. + +```python +@app.reasoner() +async def clause_analyst(clause: str, context: str) -> dict: + initial = await app.harness( + goal=f"Analyze this clause: {clause}", + tools=["read_section", "lookup_definition"], + max_iterations=10, + ) + + # The harness discovered a defined term that needs deeper analysis. + # Craft a SPECIFIC prompt for a child harness at runtime. + if initial.discovered_terms: + for term in initial.discovered_terms: + sub_prompt = ( + f"You are analyzing the cascading impact of the defined term '{term}' " + f"in the context of clause: {clause}. " + f"Read every section that references '{term}' and determine if any " + f"interaction creates risk. Return: affected_sections, risk_level, rationale." 
+ ) + sub = await app.call( + f"{app.node_id}.term_impact_analyzer", + prompt=sub_prompt, + term=term, + ) + initial.term_impacts.append(sub) + return initial.model_dump() +``` + +**Hard rule:** every meta-spawn point has a depth cap. + +--- + +## 5. Three Nested Control Loops (Inner / Middle / Outer) + +**Shape:** + +| Loop | Scope | Trigger | Cap | +|---|---|---|---| +| **Inner** | Per-reasoner self-adaptation | Found a reference, escalation needed | `max_follows=3`, `max_escalations=1` | +| **Middle** | Cross-reasoner deep-dives | Critical combination, hidden interaction | `max_spawns=5` | +| **Outer** | Pipeline-wide coverage | Coverage gate detects a gap | `max_iterations=3` | + +**When:** Long-running analysis where you can't predict upfront how deep you need to go. Coverage matters and edge cases are dangerous. + +**Reference:** `examples/af-swe/` — inner coding loop / middle sprint loop / outer factory loop; `examples/contract-af/` — analyst loop / cross-ref loop / coverage loop. + +**Hard rule:** every loop has an absolute cap. "Keep going until confident" is how you get a $400 bug report. + +--- + +## 6. Fan-Out → Filter → Gap-Find → Recurse + +**Shape:** +``` +seed ──> [generate N candidates] ──> [filter to top K] ──> [gap analysis] + │ + ├─ gaps found ──> recurse with new seeds + └─ no gaps ──> done +``` + +**When:** Comprehensive coverage problems where you don't know the shape of the answer upfront — research, due diligence, audits, literature reviews. + +**Reference:** `examples/af-deep-research/` — recursive research with quality-driven loops. 
+ +```python +@app.reasoner() +async def deep_research(question: str, max_rounds: int = 3) -> dict: + seeds = [question] + all_findings = [] + for round in range(max_rounds): + findings = await asyncio.gather(*[ + app.call(f"{app.node_id}.investigator", seed=s) for s in seeds + ]) + all_findings.extend(findings) + gaps = await app.call(f"{app.node_id}.gap_finder", findings=all_findings, original=question) + if not gaps.gaps: + break + seeds = gaps.gaps # next round's seeds + return await app.call(f"{app.node_id}.synthesizer", findings=all_findings) +``` + +--- + +## 7. Factory Control Loops + +**Shape:** Three nested loops for long-running multi-step execution with adaptive replanning. + +``` +outer (factory) ──> sprint planner ──> goals +middle (sprint) ──> task executor ──> tasks +inner (coding) ──> per-task agent ──> code + │ + └─ fails ──> outer replan +``` + +**When:** Multi-step execution that needs to replan based on intermediate results — code generation, document production, migration execution, multi-step research. + +**Reference:** `examples/af-swe/`. + +--- + +## 8. Reasoner Composition Cascade (READ THIS — it's the master pattern) + +**This is the pattern that distinguishes a real AgentField system from a fancy `asyncio.gather` wrapper.** Every other pattern in this file should be interpreted through this lens. 
+ +**Shape — depth, not breadth:** + +``` +entry_reasoner +├── classifier_reasoner ─────────────────┐ +│ ├── input_normalizer (skill) │ +│ └── intent_extractor (.ai) │ +│ └── slot_filler (.ai called by intent_extractor when ambiguous) +│ +├── analysis_dimension_A_reasoner ────── ┤ ← all parallel via asyncio.gather +│ ├── deterministic_metric_calc (skill) +│ ├── pattern_judge (.ai) +│ │ └── citation_finder (.ai called by pattern_judge) +│ └── confidence_scorer (.ai) +│ +├── analysis_dimension_B_reasoner ────── │ +│ ├── different_metric_calc (skill) +│ ├── different_pattern_judge (.ai) +│ └── confidence_scorer (.ai REUSED — same reasoner) ← reuse across branches! +│ +├── analysis_dimension_C_reasoner ───────┤ +│ └── (3 sub-calls, similar shape) +│ +└── adversarial_synthesizer ─────────────┘ + ├── steel_man_alternative (.ai) ← called once per dimension + ├── disagreement_detector (.ai) + └── final_decision_reasoner (.ai) + └── safety_override (deterministic skill) +``` + +**Each layer fans out via `asyncio.gather`. Each reasoner has a single cognitive responsibility.** The orchestrator at the top is NOT the only thing that calls `app.call` — every dimension reasoner is itself a small orchestrator that calls 2–4 sub-reasoners. + +**Used in:** This is the pattern the medical-triage and loan-underwriter examples should follow when they're deep enough. Most large AgentField systems compose this pattern as the backbone, with the other 8 patterns layered on top (HUNT→PROVE between layers, streaming for partial results, etc.). + +**Why it's the master pattern:** + +1. **Reasoners as software APIs.** Each reasoner has a one-line API contract: *"Given X, return Y. Calls Z, W."* Other reasoners call it the way one microservice calls another. +2. 
**Composability over monolithic prompts.** A specialist reasoner like `pe_assessor` is NOT a 200-line `.ai()` prompt — it's an orchestrator that calls `wells_score_calculator`, `dyspnea_grader`, and `dvt_history_checker` and synthesizes their outputs. Each piece is testable, replaceable, reusable. +3. **Reuse across branches.** `confidence_scorer` is called from THREE different dimension reasoners. The flat-star pattern would have to copy-paste the logic three times. The composition cascade calls it once per branch — same code, three different contexts. +4. **Multi-layer parallelism.** `asyncio.gather` runs at the entry-reasoner layer (across dimensions A/B/C) AND inside each dimension reasoner (across its sub-calls). Total wall-clock time is dominated by the slowest path through the DAG, not by the sum. +5. **Observability has structure.** The control plane workflow DAG shows the actual call tree. The verifiable credential chain has hierarchy. A future debugger can ask "which sub-call inside `pe_assessor` flagged the concern" — the flat-star pattern can only tell you "pe_assessor returned X." +6. **Each reasoner is independently curl-able.** You can `POST /api/v1/execute/<node-id>.wells_score_calculator` directly with synthetic input to debug or A/B test it. The flat-star only exposes the entry reasoner. + +**Decomposition rules:** + +- **30-line ceiling.** If a reasoner body is > 30 lines, it's probably 2 reasoners. Look for the seam — usually a "compute X then judge Y" boundary becomes "X is a `@app.skill`, Y is a `@app.reasoner` that calls X". +- **Single-judgment rule.** A reasoner makes ONE judgment call. If your reasoner is making three judgments ("is this concerning, is this acute, what's the risk score"), split into three reasoners. +- **Deterministic-vs-judgment split.** Anything that doesn't require LLM judgment (math, formula, regex, lookup, sort) is `@app.skill()` or a plain helper, not part of an `.ai()` reasoner body.
+- **Reuse signal.** If the same logic appears in 2+ reasoners, extract it as its own reasoner and call it from both. +- **One-sentence API contract test.** Can you write a one-sentence contract for each reasoner ("Given a chief complaint string, return a list of red flag categories with confidence scores")? If not, the reasoner is doing too many things. + +**Anti-patterns that mean you fell back to a flat star:** + +- Your entry reasoner is the ONLY thing that calls `app.call` +- Your specialists each have a single fat `.ai()` call with a 500-token prompt +- Your DAG is depth 2 (`entry → specialists → done`) +- You can draw the architecture as a literal asterisk +- Two specialists have the same 50-line prompt with one line different — you should have had one parameterized sub-reasoner + +**Concrete medical-triage example:** + +A flat-star `red_flag_detector` reasoner with one big `.ai()` prompt → bad. + +A `red_flag_detector` reasoner that calls `cardiac_red_flag_checker`, `stroke_red_flag_checker`, `bleeding_red_flag_checker`, `psych_red_flag_checker` in parallel via `asyncio.gather`, each of which is itself a focused `.ai()` with its own narrow prompt and confidence flag → good. The deeper structure means a future agent can swap the cardiac checker for one with a more accurate prompt without touching anything else. + +**When you finish your design, count the depth.** If max depth from entry to leaf is < 3, redesign. A real composite-intelligence system has at least 3 layers of reasoner-calling-reasoner. + +--- + +## 9. Reactive Document Enrichment + +**Shape:** +``` +event source (DB change stream / webhook) ──> enrichment pipeline ──> output +``` + +**When:** Work is triggered by data arriving — incidents, PRs, contracts on upload, form submissions, telemetry events. + +**Reference:** `examples/reactive-atlas/` — MongoDB change streams → enrichment agents. + +**The point:** the engine is domain-agnostic; the config defines the domain. 
The same pattern handles "new contract uploaded → enrich → score → route" as it handles "new incident filed → triage → assign → notify". + +--- + +## How to pick a pattern (or compose your own) + +**Always start with pattern 8 (Reasoner Composition Cascade) as the backbone.** It's not optional. Every other pattern is layered on top. + +Then ask: + +1. **What triggers the work?** Event stream → pattern 9 (reactive enrichment). Direct API call → patterns 1–7 layered onto 8. +2. **Is the input large/navigable?** Yes → consider meta-prompting (pattern 4) inside one of your dimension reasoners. +3. **Multiple independent analysis dimensions?** Yes → parallel hunters (pattern 1) becomes the second-layer fan-out inside the cascade. +4. **False positives expensive?** Yes → add HUNT→PROVE (pattern 2) as a second-stage reasoner per dimension or one global adversarial reasoner. +5. **Downstream can start before upstream finishes?** Yes → streaming (pattern 3). +6. **Coverage matters and you can't predict shape upfront?** Pattern 6. +7. **Multi-round adaptive execution?** Pattern 5 or 7. +8. **The investigation path depends on discoveries?** Pattern 4 (meta-prompting), always. + +Most strong systems compose **pattern 8 (cascade) as the backbone + 2–3 of the others as layers**. Example: contract-af = composition cascade (8) + parallel hunters (1) at the second layer + HUNT→PROVE (2) at the third layer + streaming (3) between layers + meta-prompting (4) inside the deepest reasoners + nested loops (5). + +## When NONE of these fit + +Then the use case probably doesn't justify AgentField at all — it's a one-shot LLM call wearing a costume. Tell the user honestly. 
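When pattern 8 does fit — the usual case — its backbone can be sketched concretely as a dimension reasoner that is itself a small orchestrator: deterministic work in a plain function (an `@app.skill()` in a real build), judgment fanned out in parallel. A minimal runnable sketch; `_StubApp` is a stand-in for the AgentField `Agent` and every checker name below is hypothetical:

```python
import asyncio

class _StubApp:
    # Stand-in for the AgentField Agent so the shape runs in isolation.
    # In a real build: app = Agent(...), with @app.skill() / @app.reasoner().
    node_id = "triage-demo"

    async def call(self, target, **kwargs):
        # Real app.call() goes through the control plane; here every narrow
        # sub-reasoner just returns a canned confident finding.
        return {"target": target, "confident": True}

app = _StubApp()

def vitals_spread(vitals: list[int]) -> int:
    # Deterministic metric: plain code, never an .ai() call.
    return max(vitals) - min(vitals)

async def cardiac_dimension(case: str, vitals: list[int]) -> dict:
    # A dimension reasoner is itself an orchestrator: one skill call, then a
    # parallel fan-out of single-judgment sub-reasoners via asyncio.gather.
    spread = vitals_spread(vitals)
    checkers = ["ischemia_checker", "arrhythmia_checker", "confidence_scorer"]
    findings = await asyncio.gather(*[
        app.call(f"{app.node_id}.{name}", case=case, vitals_spread=spread)
        for name in checkers
    ])
    return {"vitals_spread": spread, "findings": findings}

result = asyncio.run(cardiac_dimension("chest pain, 54M", [72, 110, 95]))
print(len(result["findings"]))  # → 3
```

In a real scaffold, `confidence_scorer` is the reused reasoner called from every dimension branch, and each checker body is a focused `.ai()` whose schema carries a `confident: bool` field.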
diff --git a/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/choosing-primitives.md b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/choosing-primitives.md new file mode 100644 index 000000000..5e7f40608 --- /dev/null +++ b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/choosing-primitives.md @@ -0,0 +1,502 @@ +# Choosing Primitives — Philosophy + Real SDK Surface + +The most consequential architectural decision in any AgentField build. This file is one read because the philosophy IS the primitive choice — you cannot decide between `.ai()` and a `@reasoner` loop without first knowing what kind of reasoning you're trying to amplify. Read top to bottom before writing code. + +--- + +## Part 1 — Composite Intelligence (the "why") + +A single LLM call reasons at ~0.3–0.4 on a normalized scale where 1.0 is human-expert. **You cannot prompt your way to 0.8.** You can architect your way there. + +A well-composed system of ten 0.3-grade reasoners can outperform a single 0.4-grade monolith by 5–10× on complex tasks — because the architecture itself encodes intelligence about how to break down problems, allocate cognitive work, combine partial solutions, and stay coherent across steps. + +You are not a prompt engineer. You are a **systems architect**. Your job is to design the cognitive architecture; the LLMs are interchangeable parts. 
+ +### What this is NOT + +- ❌ A single super-intelligent generalist that solves anything in one call +- ❌ A linear chain of LLM calls dressed up with "agent" branding (LangChain, CrewAI, AutoGen patterns) +- ❌ A pile of unbounded autonomous agents "thinking" their way to an answer +- ❌ A tool to orchestrate tools (that's what a script is for) + +### What it IS + +- ✅ A network of **specialized cognitive functions**, each tightly scoped +- ✅ **Architecture patterns** that elevate collective reasoning above any individual call +- ✅ **Decomposed atomic reasoning units** that can run in parallel +- ✅ **Guided autonomy**: agents have freedom inside a tight scope, not unbounded freedom +- ✅ **Dynamic routing**: the path adapts to what gets discovered, not a hardcoded DAG +- ✅ **Verifiable provenance**: every claim traces to its source + +### The five foundational principles + +**1. Granular decomposition is mandatory.** No complex problem is solved by a single agent in a single step. The constraint is a forcing function that produces parallelism, observability, and quality. If your "AI agent" is one 200-line function, you decomposed wrong. + +**2. Guided autonomy, never unbounded.** A reasoner has freedom in HOW it accomplishes its goal, but **zero freedom** in WHAT the goal is. The orchestrator is a CEO, not a babysitter — it sets objectives and verifies outcomes. + +**3. Dynamic, state-responsive orchestration.** The flow of control is not static. Agent A's output determines what subsystem B even looks like. This is the **meta-level** intelligence that distinguishes AgentField from chain frameworks: the chain shape itself is intelligence. + +**4. Contextual fidelity & verifiable provenance.** The orchestrator is a context broker. Every reasoner gets exactly what it needs — no more, no less. Every claim carries a citation key that propagates to the final output. + +**5. Asynchronous parallelism.** Decompose to parallelize. 
If your reasoner pipeline runs sequentially, your decomposition is wrong. Use `asyncio.gather` aggressively. + +### The intelligence test + +The whole point is **intelligence**. If something can be done programmatically — sorting, scoring, deduping, filtering, regex extraction, schema validation — **do it in code** (`@app.skill()`). LLMs are reserved for things that previously required a human expert: judgment, discovery, synthesis, routing decisions on ambiguous data, recognizing patterns that don't have clean rules. + +If your "AI agent" is doing work a Python `for` loop could do, you're burning money and intelligence on the wrong layer. + +### Why AgentField, not LangChain or CrewAI + +LangChain and CrewAI give you **tools to build chains**. AgentField gives you a **control plane** that: + +- Routes every inter-reasoner call through a server you can introspect, replay, and audit +- Tracks the live workflow DAG so you can see the system's reasoning shape +- Generates W3C verifiable credentials for every execution (cryptographic audit trail) +- Lets reasoners spawn sub-reasoners with dynamic prompts at runtime (meta-prompting) +- Enforces a clean separation between agent nodes (deployable units) and reasoners (cognitive units) +- Gives you per-call model overrides so a parent reasoner can route different sub-tasks to different LLMs + +You are not building "an agent." You are deploying a **reasoning system** as production infrastructure. + +--- + +## Part 2 — The Real Python SDK Surface (the "how") + +Signatures here come from reading `sdk/python/agentfield/agent.py`, `router.py`, and `tool_calling.py` directly. Many docs describe an idealized API — this section is what actually works. + +## The five primitives + +| Primitive | What it really does | When to use | +|---|---|---| +| `@app.reasoner()` | Registers a function as a reasoner with the control plane. 
The function body is yours — make as many `app.ai()` / `app.call()` calls as you want | Wrap **every cognitive unit** in your system | +| `@app.skill()` | Registers a deterministic function. No LLM | Pure transforms, scoring, parsing, dedup, validation — anything code can do | +| `app.ai(...)` | Single call OR multi-turn tool-using LLM call (when `tools=` is passed). Returns text or a Pydantic schema | Classification, routing, structured analysis, **and** stateful tool-using reasoning when you give it tools | +| `app.call(target, **kwargs)` | Calls another reasoner/skill THROUGH the control plane. Tracks the workflow DAG | All inter-reasoner traffic. Never use direct HTTP | +| `app.harness(prompt, provider=...)` | **Delegates to an external coding-agent CLI** (claude-code, codex, gemini, opencode). Returns a `HarnessResult` | When you need a real coding agent to read/write files, run shell commands, or execute a non-trivial coding task as part of your pipeline | + +## What `app.ai()` actually accepts + +```python +result = await app.ai( + *args, # positional: text, urls, file paths, bytes, dicts, lists (multimodal) + system: str | None, # system prompt + user: str | None, # user prompt (alternative to positional) + schema: type[BaseModel] | None, # Pydantic class for structured output + model: str | None, # PER-CALL model override (e.g. "gpt-4o", "openrouter/google/gemini-2.5-flash") + temperature: float | None, + max_tokens: int | None, + stream: bool | None, + response_format: "auto" | "json" | "text" | dict | None, + tools: list | str | None, # tool definitions for tool-calling, OR "discover" to auto-discover + context: dict | None, + memory_scope: list[str] | None, # ["workflow", "session", "reasoner"] etc. + **kwargs, # provider-specific extras +) +``` + +**Critical things most coding agents miss:** +- `model=` is per-call. You can override the AIConfig default on any specific call. 
**Always** thread `model` through from the entry reasoner so the user can A/B test models per request. +- `tools=` makes `app.ai()` a multi-turn tool-using LLM. This is how you build "stateful reasoning agents" — not via `app.harness()`. Pass `tools="discover"` to auto-discover available tools, or pass a list of tool definitions. +- `memory_scope=["workflow", "session", "reasoner"]` injects relevant memory state into the prompt automatically. +- `schema=` returns a validated Pydantic instance, not a dict. Call `.model_dump()` to serialize. + +## What `app.harness()` actually accepts + +```python +result = await app.harness( + prompt: str, # task description + schema: type[BaseModel] | None, # optional structured output + provider: "claude-code" | "codex" | "gemini" | "opencode" | None, + model: str | None, # override the provider's default model + max_turns: int | None, # iteration cap + max_budget_usd: float | None, # cost cap + tools: list[str] | None, # which tools the coding agent is allowed to use + permission_mode: "plan" | "auto" | None, + system_prompt: str | None, + env: dict[str, str] | None, + cwd: str | None, + **kwargs, +) +# Returns HarnessResult with .text, .parsed (validated schema), .result +``` + +**Use harness when:** you need a real coding agent (Claude Code, Codex, Gemini CLI) to perform a task that requires actual file I/O, shell access, or multi-step coding. Example: a "fix-this-failing-test" reasoner spawns a Claude Code harness to actually edit the test file. + +**Do NOT use harness for:** in-process stateful LLM reasoning over a document. That's `app.ai(..., tools=[...])`. Harness is heavyweight — it spawns a subprocess running an entire agent CLI. 
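Before a scaffold is allowed to call `app.harness()`, gate on CLI availability at import time. A minimal sketch — the binary names below are assumptions, not part of the SDK; `af doctor` reports `harness_usable` directly and is the authoritative source:

```python
import shutil

# Assumed CLI binary per harness provider — confirm against `af doctor` output.
PROVIDER_BINARIES = {
    "claude-code": "claude",
    "codex": "codex",
    "gemini": "gemini",
    "opencode": "opencode",
}

def harness_usable(provider: str) -> bool:
    """True only when the provider's coding-agent CLI is actually on PATH."""
    binary = PROVIDER_BINARIES.get(provider)
    return binary is not None and shutil.which(binary) is not None
```

Run this once at startup and refuse to register harness-backed reasoners when it returns `False` — failing at boot is cheaper than failing mid-request inside a `python:3.11-slim` container that never had the CLI.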
+ +## What `app.call()` actually does + +```python +result: dict = await app.call( + target: str, # "node_id.reasoner_name" + *args, # positional args (auto-mapped to target's params for local calls) + **kwargs, # keyword args passed to the target reasoner +) +``` + +**Always returns a `dict`** — even if the target reasoner returns a Pydantic model. Convert manually: +```python +result_dict = await app.call(f"{app.node_id}.score", text=passage) +result = ScoreResult(**result_dict) +``` + +**Critical:** always reference reasoners as `f"{app.node_id}.reasoner_name"` so renaming the node via `AGENT_NODE_ID` env doesn't break the system. Hardcoding the node ID is a bug waiting to happen. + +**Workflow tracking:** every `app.call` is recorded in the control plane's workflow DAG. Direct HTTP between reasoners bypasses this and is forbidden. + +## The decision tree (real, not aspirational) + +``` +What is this reasoner doing? + +├─ Pure deterministic transform (sort, parse, dedup, score-with-formula)? +│ → @app.skill() (no LLM, free, replayable) +│ +├─ Single classification with ≤4 flat fields, input fits comfortably in ~2k tokens? +│ → app.ai(system, user, schema=FlatModel) (with confident: bool, with fallback) +│ +├─ Stateful reasoning where the LLM needs to call tools, search, iterate? +│ → app.ai(system, user, tools=[...]) (multi-turn tool-using mode) +│ +├─ Long input (a document, a transcript, a corpus) that needs navigation? +│ → @app.reasoner() that does LOOPED app.ai() calls with chunking, +│ OR app.ai(..., tools=["read_section", ...]) if you've defined the tools, +│ OR pre-process with a @app.skill() chunker then fan-out via asyncio.gather +│ +├─ Need an actual coding agent to write/edit files / run shell? +│ → app.harness(prompt, provider="claude-code", tools=[...]) +│ +└─ Composing multiple reasoners? + → @app.reasoner() that uses app.call() and asyncio.gather +``` + +**The bias:** decompose into many small `@app.reasoner()` units. 
Use `app.ai()` with explicit prompts. Use `tools=` when you need tool-calling. Reserve `app.harness()` for when you literally need a coding agent in the loop. + +## The model-propagation pattern (mandatory in every build) + +The user must be able to swap models per request without rebuilding the container. Implement it like this in **every** generated entry reasoner: + +```python +@app.reasoner(tags=["entry"]) +async def review_financials( + company_name: str, + business_summary: str, + financial_snapshot: dict, + analyst_question: str = "Should we proceed?", + model: str | None = None, # ← per-request model override +) -> dict: + # 1. Use it in app.ai + plan = await app.ai( + system="You are a financial intake router.", + user=f"...", + schema=IntakePlan, + model=model, # ← propagate + ) + + # 2. Pass it to child reasoners via app.call + reviews = await asyncio.gather(*[ + app.call( + f"{app.node_id}.{axis}_reviewer", + company_name=company_name, + business_summary=business_summary, + model=model, # ← propagate + ) + for axis in plan.focus_areas + ]) + + # 3. Each child reasoner accepts and uses model the same way +``` + +And in every child reasoner: +```python +@app.reasoner() +async def profitability_reviewer( + company_name: str, + business_summary: str, + model: str | None = None, # ← accept it +) -> dict: + review = await app.ai( + system="You are a profitability reviewer.", + user=f"...", + schema=TrackReview, + model=model, # ← use it + ) + return review.model_dump() +``` + +The user can now pick the model per request: +```bash +curl -X POST http://localhost:8080/api/v1/execute/financial-reviewer.review_financials \ + -H 'Content-Type: application/json' \ + -d '{ + "company_name": "Acme", + "business_summary": "...", + "financial_snapshot": {...}, + "model": "openrouter/openai/gpt-4o" + }' +``` + +If `model` is omitted, the AIConfig default from the env var `AI_MODEL` is used. 
**This pattern is non-negotiable.** Every generated build must support per-request model override. + +## The router pattern (organize reasoners across files) + +When a build has more than ~4 reasoners, split them into router files. + +**Important detail from the SDK:** `AgentRouter(prefix="...")` **auto-namespaces** the reasoner IDs. A router with `prefix="clauses"` containing a reasoner `analyze_ip` registers as `clauses_analyze_ip`. Call it as `app.call(f"{app.node_id}.clauses_analyze_ip", ...)`. + +**Three prefix variations and what they do:** + +| Constructor call | Reasoner `analyze_ip` registers as | Use when | +|---|---|---| +| `AgentRouter(prefix="clauses")` | `clauses_analyze_ip` | You want grouped namespacing | +| `AgentRouter(prefix="")` (or omit `prefix`) | `analyze_ip` | You want raw function names — **the canonical default** | +| `@router.reasoner(name="explicit")` overrides any prefix | `explicit` | You want full control over the registered ID | + +**Canonical default:** use `AgentRouter(prefix="", tags=["domain"])` so reasoner IDs match function names and your `app.call(f"{app.node_id}.func_name", ...)` calls stay readable. Only use `prefix=` when you have ID collisions across routers. 
+ +`reasoners/finance.py`: +```python +from agentfield import AgentRouter +from pydantic import BaseModel + +# prefix="" → no auto-namespace; tags merge with per-decorator tags +router = AgentRouter(prefix="", tags=["finance"]) + +class TrackReview(BaseModel): + axis: str + score: int + rationale: str + +@router.reasoner() +async def profitability_reviewer( + company_name: str, + business_summary: str, + model: str | None = None, +) -> TrackReview: # type-hinted return drives schema + return await router.ai( # router.ai proxies to the attached agent + system="You are a profitability reviewer.", + user=f"Company: {company_name}\n{business_summary}", + schema=TrackReview, + model=model, + ) +``` + +`main.py`: +```python +import os +from agentfield import Agent, AIConfig +from reasoners.finance import router as finance_router +from reasoners.risk import router as risk_router + +app = Agent( + node_id=os.getenv("AGENT_NODE_ID", "financial-reviewer"), + ai_config=AIConfig(model=os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash")), + dev_mode=True, +) + +app.include_router(finance_router) +app.include_router(risk_router) + +# Entry reasoner stays in main.py +@app.reasoner(tags=["entry"]) +async def review_financials(...): ... + +if __name__ == "__main__": + app.run() # auto-detects CLI vs server +``` + +**Router facts (verified against `router.py`):** +- `AgentRouter` proxies *every* agent attribute via `__getattr__` — so `router.ai()`, `router.call()`, `router.memory`, `router.harness()` all work identically to `app.ai()` etc. +- Tags merge: `AgentRouter(tags=["finance"])` + `@router.reasoner(tags=["scoring"])` → reasoner has BOTH tags. +- `prefix` auto-namespaces IDs as `{prefix_segments}_{func_name}`. +- The canonical pattern is one router per domain file; one `Agent(...)` + multiple `include_router(...)` calls in `main.py`. + +**When to use a router vs. 
keep everything in main.py:** +- ≤ 4 reasoners → main.py only +- 5–10 reasoners → split by domain into 2–3 router files +- > 10 reasoners → consider whether you've decomposed correctly OR whether you need multiple nodes + +## Tags + +Tags are **free-form** metadata attached to reasoners (verified against the control plane source — there are no reserved tag names). They surface in the discovery API: + +```bash +curl -s http://localhost:8080/api/v1/discovery/capabilities \ + | jq '.reasoners[] | select(.tags[]? == "entry")' +``` + +**Conventions used by AgentField examples (not enforced, just convention):** +- `"entry"` — mark the public-facing entry reasoner. Always tag it. +- A domain tag (e.g., `"finance"`, `"risk"`, `"intake"`) — for filtering in discovery and the UI. + +**Hard rule:** every entry reasoner gets `tags=["entry"]` so the user can find it via discovery without reading the source. + +## `Agent(...)` constructor — verified signature + +From `sdk/python/agentfield/agent.py:464`: + +```python +app = Agent( + node_id: str, # REQUIRED. e.g. "customer-triage" + agentfield_server: str | None = None, # control plane URL. env: AGENTFIELD_SERVER. default http://localhost:8080 + version: str = "1.0.0", + description: str | None = None, + tags: list[str] | None = None, # agent-LEVEL tags (distinct from per-reasoner tags) + author: dict[str, str] | None = None, + ai_config: AIConfig | None = None, # default AIConfig.from_env(). Pass AIConfig(model="...") to set default + harness_config: HarnessConfig | None = None, + memory_config: MemoryConfig | None = None, + dev_mode: bool = False, # verbose logs + DEBUG level. Always set True in scaffolds + callback_url: str | None = None, # else AGENT_CALLBACK_URL env, else auto-detect + auto_register: bool = True, + vc_enabled: bool | None = True, # generate verifiable credentials for executions + api_key: str | None = None, # X-API-Key header to control plane + # ... 
other auth/DID parameters +) +``` + +**Critical things scaffolds get wrong:** +- The parameter is **`agentfield_server`** (not `agentfield_url`, not `server_url`). Verified in `agent.py:464`. +- Read it from env: `agentfield_server=os.getenv("AGENTFIELD_SERVER", "http://localhost:8080")`. +- Set `dev_mode=True` in every scaffold so the user sees what's happening on first run. +- `Agent` subclasses FastAPI — you can use any FastAPI feature on it directly. + +### `AGENT_CALLBACK_URL` env var + +The agent node needs a URL the control plane can use to call back into it (for sync execution dispatch). In Docker Compose this is `http://<service-name>:<agent-port>`. The SDK reads it from `AGENT_CALLBACK_URL`. You set it in the compose file: + +```yaml +environment: + AGENT_CALLBACK_URL: http://customer-triage:8001 +``` + +If you don't set it, the SDK auto-detects, which works locally but is unreliable inside containers. **Always set it explicitly in the compose file** to the in-network DNS name of the service. + +## `@app.reasoner()` real signature + +Based on `agent.py:1612`, the decorator only accepts these parameters: + +```python +@app.reasoner( + path: str | None = None, # default /reasoners/{func_name} + name: str | None = None, # override the registered ID + tags: list[str] | None = None, + *, + vc_enabled: bool | None = None, # inherits agent default + require_realtime_validation: bool = False, +) +``` + +**Important things it does NOT accept:** `input_schema=`, `output_schema=`, `description=`, `version=`. **Schemas are derived from type hints.** The function's parameter type hints become the input schema; the return type hint becomes the output schema.
+ +```python +class IntakeResult(BaseModel): + contract_type: str + confident: bool + +@app.reasoner(tags=["entry"]) +async def classify(text: str, model: str | None = None) -> IntakeResult: + return await app.ai(system="...", user=text, schema=IntakeResult, model=model) +``` + +## `app.run()` is the entry point + +`agent.py:4194` confirms `app.run()` auto-detects whether to launch in CLI mode (`af call`, `af list`, `af shell`) or server mode. **Always use `app.run()` in `__main__`**, not `app.serve()`: + +```python +if __name__ == "__main__": + app.run(host="0.0.0.0", port=int(os.getenv("PORT", "8001")), auto_port=False) +``` + +## Memory scopes (one paragraph) + +```python +await app.memory.set(key, value, scope="global"|"agent"|"session"|"run") +await app.memory.get(key, default=None, scope=...) +``` + +**global** = cross-everything; **agent** = this node, all sessions; **session** = one conversation; **run** = single workflow execution. Use `session` for chat-like workflows, `run` for per-execution scratch state, `agent` for cached embeddings, `global` for shared knowledge. + +## The `confident` flag pattern (mandatory for every `.ai()` gate) + +Every `.ai()` schema includes a `confident: bool` field, and the call site checks it. **Three valid fallback options exist** when `confident` is false — pick the right one for the situation: + +| Fallback option | When to use | Cost | +|---|---|---| +| **(a) Escalate to a deeper reasoner** | The system has another `@app.reasoner()` that can handle the harder case (chunked-loop, multi-call, more context) | Extra call | +| **(b) Deterministic safe default (RECOMMENDED for safety/regulated systems)** | The use case has a "safe" terminal state — `REFER_TO_HUMAN`, `REJECT`, `RETRY_LATER`, `NEEDS_HUMAN_REVIEW`. 
Return a Pydantic instance hard-coded to that safe state | Free | +| **(c) Escalate to `app.harness()`** | ONLY when `recommendation.harness_usable == true` from `af doctor`, AND the Dockerfile installs the CLI, AND there's a startup `shutil.which()` check | Heavy | + +**Default for regulated, safety-critical, or judgment-based systems: option (b).** A confident-wrong automated decision is almost always worse than a referral. Build `fallback_*` constructors in `helpers.py` that return Pydantic instances hard-coded to the safe-default state. + +### Pattern (a) — escalate to a deeper reasoner + +```python +class IntakeDecision(BaseModel): + contract_type: str + complexity: str + confident: bool + +result = await app.ai(system="...", user="...", schema=IntakeDecision, model=model) + +if not result.confident or result.complexity == "high": + # Escalate to a deeper reasoner that can navigate more context + result_dict = await app.call( + f"{app.node_id}.deep_intake", + document=full_document, + partial=result.model_dump(), + model=model, + ) + result = DeepIntakeResult(**result_dict) +``` + +### Pattern (b) — deterministic safe default + +```python +# In helpers.py: +def fallback_specialist_review(*, axis: str, reason: str) -> SpecialistReview: + """Safe default Pydantic instance returned when an .ai() gate isn't confident.""" + return SpecialistReview( + axis=axis, + verdict="NEEDS_HUMAN_REVIEW", + confidence_score=0.0, + confident=False, + rationale=reason, + decisive_fact_ids=[], + ) + +# In specialists.py: +review = await router.ai(system="...", user="...", schema=SpecialistReview, model=model) +if not review.confident: + return fallback_specialist_review( + axis=axis, + reason=f"{axis} reviewer was not confident enough to automate a terminal view.", + ) +return review +``` + +This is the dominant pattern in real builds. 
The orchestrator at the top of the pipeline uses **deterministic governance overrides** (plain Python `if` statements) to convert any non-confident specialist into a `REFER_TO_HUMAN` final decision. The intelligence stays in the LLM; the safety stays in the code. + +Every `.ai()` gate has a `confident` flag and one of these three fallback paths. No exceptions. + +## What about long-document navigation? + +The philosophy doc talks about "navigating documents" with a harness that has tools. In the actual SDK, you have three real options: + +**Option A — `app.ai(tools=[...])` with custom tool definitions.** Define tools (e.g., `read_section(section_id)`, `search_document(query)`) the LLM can call iteratively. The `app.ai()` call becomes multi-turn automatically. + +**Option B — Loop yourself in a `@app.reasoner()`.** Chunk the document with a `@app.skill()`, fan out `app.ai()` calls per chunk via `asyncio.gather`, then synthesize. + +**Option C — `app.harness(provider="claude-code", tools=["read", "grep"])`.** Spawn a real coding agent CLI to navigate the document on the filesystem. Most powerful, also the most expensive. + +Pick A for in-process tool-calling, B for embarrassingly-parallel chunked analysis, C for "I need a real agent to do file system work". + +## The cost-of-being-wrong test + +Before choosing `.ai()` without tools, ask: **"What does it cost the system if this call gets the wrong answer?"** + +- Cheap to be wrong (a routing hint that gets corrected) → plain `.ai()` with `confident` flag +- Expensive to be wrong (a verdict the system commits to) → `.ai(tools=[...])` for iterative reasoning, or decompose into multiple narrower `.ai()` calls with adversarial verification + +The financial cost of more reasoner calls is real but bounded. The reputation cost of a confidently-wrong answer propagating through your pipeline is unbounded. 
diff --git a/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/project-claude-template.md b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/project-claude-template.md new file mode 100644 index 000000000..e5187de2c --- /dev/null +++ b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/project-claude-template.md @@ -0,0 +1,118 @@ +# Project `CLAUDE.md` Template + +Every generated AgentField project ships with a `CLAUDE.md` at its root. This file is the contract that any *future* coding agent (including a fresh Claude Code session next week) must follow when extending the project. + +Without this file, the next agent will refactor the system back into a CrewAI-style chain. With it, the architecture survives. + +## Required structure + +Generate a `CLAUDE.md` with these exact sections, customized to the specific build. + +```markdown +# CLAUDE.md — <project-name> + +## Mission + +<one-paragraph mission: what this system does and for whom> + +External callers should hit `<node-id>.<entry-reasoner>` first. + +## Architecture at a glance + +- **Pattern(s):** <architecture pattern names> +- **Topology:** one AgentField node (`<node-id>`) with N reasoners +- **Entry reasoner:** `<entry-reasoner>` — orchestrates the full pipeline +- **Internal reasoners:** + - `<reasoner-name>` (`.ai()` / `.harness()`) — <one-line role> + - `<reasoner-name>` (`.ai()` / `.harness()`) — <one-line role> + - … +- **Inter-reasoner traffic:** all internal calls go through `app.call("<node-id>.X", ...)`. Never direct HTTP. + +## Why this architecture (not a chain) + +<2–3 sentences explaining what makes this composite intelligence rather than a linear chain. Cite the dynamic-routing decisions, the parallelism, the harness/ai split. This is the "do not undo this" justification for the next agent.> + +## Primitive selection rules (binding) + +- `.ai()` is used ONLY at gates and routers (currently: `<gate reasoner names>`). Every `.ai()` here has a `confident` field and a `.harness()` fallback. +- `.harness()` is used for `<harness reasoner names>`. Each has hard caps on iterations and cost. +- `@app.skill()` is used for deterministic transforms (`<skill names>`).
+- New reasoners default to `.harness()`. To use `.ai()`, prove the input fits in <2k tokens AND output fits in 4 flat fields AND there's a fallback. + +## Data-flow rules + +- Structured JSON between code and reasoners (when code branches on the result). +- Natural-language strings between reasoners that feed each other context. +- Hybrid only when both consumers exist. Do not use hybrid by default. + +## Model selection + +- Default model: `<default-model>` via `AI_MODEL` env. +- The entry reasoner accepts an OPTIONAL `model` parameter in the request body. When present, it propagates to all child reasoners via `app.call(..., model=model)`. This lets users A/B models per request without redeploying. +- Provider keys: `OPENROUTER_API_KEY` (default), `OPENAI_API_KEY`, `ANTHROPIC_API_KEY` — any LiteLLM-compatible model works. + +## Runtime contract + +- Local runtime is `docker-compose.yml` in this directory. +- One container: `agentfield/control-plane:latest` (local mode, SQLite/BoltDB). +- One container: this Python agent node, built from `Dockerfile`. +- The agent node depends on the control plane being healthy before it boots. +- Default ports: control plane `8080`, agent node `8001`. Override via env if needed.
+ +## Delivery contract — every change must preserve + +- ✅ A runnable `docker compose up --build` (validated with `docker compose config`) +- ✅ A valid `.env.example` listing all required keys +- ✅ A `README.md` with the exact verification ladder (health → nodes → capabilities → execute) +- ✅ The canonical curl smoke test in the README — body shape `{"input": {...kwargs...}}`, returns a real reasoned answer not a stub +- ✅ This `CLAUDE.md` + +## Validation commands (run after every change) + +```bash +python3 -m py_compile main.py +docker compose config > /dev/null +docker compose up --build -d +sleep 8 +curl -fsS http://localhost:8080/api/v1/health +curl -fsS http://localhost:8080/api/v1/nodes | jq '.[].node_id' +curl -fsS http://localhost:8080/api/v1/discovery/capabilities | jq '.reasoners | map(select(.node_id=="<node-id>")) | map(.name)' +# the canonical curl from README.md +docker compose down +``` + +If any of those fail, the change is not done. + +## Anti-patterns (reject these) + +- ❌ Direct HTTP between reasoners. All internal traffic uses `app.call`. +- ❌ Replacing a `.harness()` with `.ai()` "for speed" without proving the input fits. +- ❌ Adding a new reasoner without registering it through the entry reasoner OR through a router that's included in `main.py`. +- ❌ Removing the smoke test from README "because it's obvious." +- ❌ Hardcoding `node_id` in `app.call`. Always use `f"{app.node_id}.X"` so renaming the node doesn't break the system. +- ❌ Hardcoding the model. Always read from env (`AI_MODEL`) and accept a per-request override. +- ❌ Replacing the dynamic routing in `<router reasoner>` with a static `for` loop. +- ❌ Unbounded loops or recursive harness spawns without explicit caps. +- ❌ Removing the `confident` field from a `.ai()` schema without replacing the validation check. + +## Extension points (where to safely add work) + +<3–5 bullets specific to the architecture.
Examples:> +- Add a new analysis dimension: create a new `@app.reasoner()` that takes the same inputs as the existing dimension reviewers, and add it to the dispatch list in `<entry-reasoner>`. +- Switch from `.ai()` intake to `.harness()` intake when inputs grow past 2 pages: replace `intake_router` with `intake_navigator` per `references/choosing-primitives.md` in the skill. +- Add provenance: have each dimension reviewer return citation keys, then add a `provenance_collector` that aggregates them into the final response. + +## Owner + +This system was scaffolded by the `agentfield-multi-reasoner-builder` skill. To rebuild, run that skill again with the same use case description. To extend, follow this CLAUDE.md. +``` + +## Generation rules + +When you write the actual `CLAUDE.md` for a build: + +1. **Fill in every `<placeholder>`.** Do not ship a CLAUDE.md with `<angle-bracket placeholders>` still in it. +2. **List every reasoner you actually generated** with its primitive (`.ai()` or `.harness()`) and one-line role. +3. **Justify the architecture** in 2–3 sentences. The "Why this architecture" section is the most important part — it tells the next agent what NOT to undo. +4. **Customize the extension points** to the specific build. Don't copy the generic examples. +5. **Match the validation commands to the actual reasoners and node ID.** No `<...>` placeholders in the final file. diff --git a/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/scaffold-recipe.md b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/scaffold-recipe.md new file mode 100644 index 000000000..dd3d4822e --- /dev/null +++ b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/scaffold-recipe.md @@ -0,0 +1,502 @@ +# Scaffold Recipe — Exact Files to Generate + +This is the file-by-file generation contract. Every AgentField multi-reasoner build produces ALL of these files. No omissions, no "I'll add that later."
+ +## Where it goes + +``` +examples/python_agent_nodes/<project-name>/ +├── main.py +├── reasoners.py # if the system has > 4 reasoners +├── Dockerfile +├── docker-compose.yml +├── .env.example +├── .dockerignore +├── requirements.txt +├── README.md +└── CLAUDE.md +``` + +`<project-name>` is lowercase-hyphenated, derived from the use case (e.g., `financial-reviewer`, `clinical-triage`, `sec-filing-auditor`). + +## Step 0: Use `af init` if it speeds you up, then layer on top + +```bash +cd /Users/santoshkumarradha/Documents/agentfield/code/platform/agentfield +go run ./control-plane/cmd/af init --language python --defaults --non-interactive +``` + +This produces `main.py`, `reasoners.py`, `requirements.txt`, `README.md`, `.gitignore`. You then **rewrite `main.py` and `reasoners.py`** with your real architecture and **add** the Docker / compose / CLAUDE.md / .env files. + +If `af init` gets in the way, just generate the files directly. The output matters, not the path. + +## File 1: `main.py` + +```python +"""<One-line use-case description>. + +Entry reasoner: `<node-id>.<entry-reasoner>` +Architecture: <pattern names> +""" +import asyncio +import os +from typing import Any + +from agentfield import Agent, AIConfig +from pydantic import BaseModel, Field + + +# ---- Schemas (type-hinted; AgentField derives them automatically) ---- + +class IntakePlan(BaseModel): + focus_areas: list[str] + confident: bool # MANDATORY on every .ai gate + +class TrackReview(BaseModel): + axis: str + score: int = Field(ge=1, le=10) + rationale: str + +class FinalVerdict(BaseModel): + overall: str + strengths: list[str] + risks: list[str] + + +# ---- Agent ---- + +app = Agent( + node_id=os.getenv("AGENT_NODE_ID", "<node-id>"), + agentfield_server=os.getenv("AGENTFIELD_SERVER", "http://localhost:8080"), + ai_config=AIConfig( + model=os.getenv("AI_MODEL", "openrouter/google/gemini-2.5-flash"), + ), + dev_mode=True, +) + + +# ---- Internal reasoners ---- + +@app.reasoner() +async def intake_router( + payload: dict, + model: str | None = None, # propagate model +) -> IntakePlan: + plan = await
app.ai( + system="You classify the input and pick the smallest set of analysis tracks needed.", + user=str(payload), + schema=IntakePlan, + model=model, + ) + if not plan.confident or not plan.focus_areas: + # FALLBACK: escalate (could be a chunked-loop reasoner or a deeper pass) + plan.focus_areas = ["default_a", "default_b"] + return plan + + +@app.reasoner() +async def dimension_reviewer( + payload: dict, + axis: str, + model: str | None = None, +) -> TrackReview: + return await app.ai( + system=f"You are a {axis} reviewer. Score and rationalize.", + user=f"Axis: {axis}\nPayload: {payload}", + schema=TrackReview, + model=model, + ) + + +# ---- Entry reasoner (the public surface) ---- + +@app.reasoner(tags=["entry"]) +async def review( + payload: dict, + model: str | None = None, # per-request model override +) -> dict: + plan_dict = await app.call( + f"{app.node_id}.intake_router", + payload=payload, + model=model, + ) + plan = IntakePlan(**plan_dict) + + # Parallel fan-out across selected dimensions + review_dicts = await asyncio.gather(*[ + app.call( + f"{app.node_id}.dimension_reviewer", + payload=payload, + axis=axis, + model=model, + ) + for axis in plan.focus_areas + ]) + + # Synthesize via another LLM reasoner — pass prose, not JSON + review_prose = "\n".join( + f"- [{r['axis']}] score={r['score']} — {r['rationale']}" + for r in review_dicts + ) + verdict = await app.ai( + system="You are the lead reviewer. 
Synthesize the dimension findings into a verdict.", + user=review_prose, + schema=FinalVerdict, + model=model, + ) + + return { + "plan": plan.model_dump(), + "reviews": review_dicts, + "verdict": verdict.model_dump(), + } + + +if __name__ == "__main__": + # app.run() auto-detects CLI vs server mode (verified at sdk/python/agentfield/agent.py:4194) + app.run(host="0.0.0.0", port=int(os.getenv("PORT", "8001")), auto_port=False) +``` + +**Hard requirements:** +- `node_id`, `agentfield_server`, `model` all read from env with sensible defaults +- `auto_port=False` so the port is deterministic and the curl works +- Exactly ONE entry reasoner with `tags=["entry"]` for discovery +- Schemas are derived from **type hints** — do NOT pass `input_schema=` or `output_schema=` to `@app.reasoner` (those parameters do not exist) +- Every `.ai()` gate has a `confident: bool` field in its schema and a fallback path +- Every reasoner that calls `.ai()` accepts an optional `model: str | None = None` parameter and threads it through `app.ai(model=model)` +- The entry reasoner accepts `model` and propagates it via `app.call(..., model=model)` to all children +- All inter-reasoner calls use `app.call(f"{app.node_id}.X", ...)` — never hardcoded node IDs +- Never `requests.post()` to another reasoner. Use `app.call` +- Use `app.run()` in `__main__`, not `app.serve()` + +## File 2: the `reasoners/` package (canonical layout for non-trivial systems) + +When the system has more than 4 reasoners, **use this canonical 4-file router package layout**. 
It separates concerns cleanly and makes the build extensible without breaking the orchestrator: + +``` +/ +├── main.py # Agent + entry reasoner + orchestration +└── reasoners/ + ├── __init__.py # Re-exports the routers so main.py can include them + ├── models.py # Pydantic schemas — every BaseModel used by every reasoner + ├── helpers.py # Plain Python utilities: math, prose renderers, fact registry, fallbacks + ├── specialists.py # AgentRouter for the parallel "hunter" / specialist reasoners + └── committee.py # AgentRouter for the orchestration-layer reasoners (intake router, adversarial reviewer, synthesizer) +``` + +**`reasoners/__init__.py`:** +```python +from .committee import router as committee_router +from .specialists import router as specialists_router + +__all__ = ["committee_router", "specialists_router"] +``` + +**`reasoners/models.py`** — every Pydantic schema in one place. Includes the input application schema, the per-specialist review schema (with `confident: bool` mandatory), the routing plan schema, the adversarial review schema, the final decision schema, and any deterministic-metric schemas. Keeping these in one file makes type-checking trivial and prevents circular imports between routers. + +**`reasoners/helpers.py`** — plain Python (NOT decorated with `@app.skill`) for: deterministic math (DTI, payment amount, employment-gap calc), `render_specialist_review()` and similar **prose renderers** that convert Pydantic instances to natural-language strings before passing them to another LLM, the fact-registry builder for citation IDs, and **fallback constructors** like `fallback_specialist_review(axis, reason)` that produce safe-default Pydantic instances when an `.ai()` call returns `confident=False`. + +> **Why plain helpers vs `@app.skill()`?** `@app.skill()` makes a function discoverable and callable through `app.call`. 
Use it when the deterministic function is something the system might call from a reasoner OR something an external caller might want to invoke directly through the control plane. For purely internal helpers used inside reasoner bodies (math, prose rendering, schema construction), plain Python is cleaner — no decorator overhead, no registration ceremony. Promote a helper to `@app.skill()` only when you actually want to call it via `app.call`. + +**`reasoners/specialists.py`** — one `AgentRouter(prefix="", tags=["specialist"])`, one `@router.reasoner` per analysis dimension. Often these specialists share a `_run_specialist_review()` private helper that takes a system prompt + focus prompt as parameters, runs `router.ai(...)`, and applies the `confident=False` fallback. This keeps each specialist body to ~5 lines of configuration. + +**`reasoners/committee.py`** — one `AgentRouter(prefix="", tags=["committee"])` with the orchestration-layer reasoners: `intake_router` (decides which specialists to run), `adversarial_challenger` (the HUNT→PROVE counterpart), `committee_reconciler` (synthesizes specialists + adversarials → final decision). + +**`main.py`** does three things: +1. Construct `Agent(...)` with `node_id`, `agentfield_server`, `ai_config` +2. `app.include_router(committee_router)` and `app.include_router(specialists_router)` +3. Define the public **entry reasoner** with `tags=["entry"]` that orchestrates the full pipeline using `app.call(f"{app.node_id}.X", ...)` and `asyncio.gather` for parallel fan-out, plus deterministic governance overrides at the end + +**This is the layout that emerges naturally** when you decompose a real composite-intelligence system. If your build has fewer than 4 reasoners, keep everything in `main.py` and skip the package. If it has more, use this layout. Do not invent a different layout. + +### Smaller systems (≤4 reasoners): keep everything in `main.py` + +For trivial builds, skip the package and inline everything. 
Use `@app.reasoner()` directly on `app`. Don't create a router with one reasoner in it.
+
+## File 3: `Dockerfile`
+
+**Use `af init --docker` to generate this. The command produces the universal shape below — do not customize.**
+
+```dockerfile
+FROM python:3.11-slim
+
+ENV PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1
+
+WORKDIR /app
+
+COPY requirements.txt /app/requirements.txt
+RUN pip install --no-cache-dir --upgrade pip && \
+    pip install --no-cache-dir -r /app/requirements.txt
+
+COPY . /app/
+
+EXPOSE 8001
+
+CMD ["python", "main.py"]
+```
+
+**Key properties of this Dockerfile (verified against `af init --docker`):**
+- **Universal — no repo coupling.** The build context is the project directory itself (`docker-compose.yml` uses `context: .`), so the same scaffold works whether the project lives inside the agentfield repo at `examples/python_agent_nodes/<project-slug>/` or completely standalone at `/tmp/my-build/`.
+- The SDK is installed via `pip install -r requirements.txt`, where `requirements.txt` lists `agentfield`. **Do not** add `COPY sdk/python /tmp/python-sdk` — that's the old repo-coupled pattern, and it breaks for out-of-repo builds.
+- `requirements.txt` must contain at least `agentfield` (one line). Add `pydantic>=2,<3` and any libraries the reasoners actually need.
+
+## File 4: `docker-compose.yml`
+
+**Use `af init --docker` to generate this. The command produces the universal shape below — do not customize unless you have a specific reason.**
+
+```yaml
+services:
+  control-plane:
+    image: agentfield/control-plane:latest
+    environment:
+      AGENTFIELD_STORAGE_MODE: local
+      AGENTFIELD_HTTP_ADDR: 0.0.0.0:8080
+    ports:
+      - "${AGENTFIELD_HTTP_PORT:-8080}:8080"
+    volumes:
+      - agentfield-data:/data
+    healthcheck:
+      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:8080/api/v1/health"]
+      interval: 3s
+      timeout: 2s
+      retries: 20
+
+  <project-slug>:
+    build:
+      context: .
+      dockerfile: Dockerfile
+    environment:
+      AGENTFIELD_SERVER: http://control-plane:8080
+      AGENT_CALLBACK_URL: http://<project-slug>:8001
+      AGENT_NODE_ID: ${AGENT_NODE_ID:-<project-slug>}
+      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY:-}
+      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
+      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:-}
+      GOOGLE_API_KEY: ${GOOGLE_API_KEY:-}
+      AI_MODEL: ${AI_MODEL:-openrouter/google/gemini-2.5-flash}
+      PORT: ${PORT:-8001}
+    ports:
+      - "${AGENT_NODE_PORT:-8001}:8001"
+    depends_on:
+      control-plane:
+        condition: service_healthy
+    restart: on-failure
+
+volumes:
+  agentfield-data:
+```
+
+**Build context is `.` (the project directory itself), not `../../..`.** This makes the scaffold portable to any location on disk. All four provider env vars are exposed with `:-` defaults so missing keys don't crash compose validation.
+
+**Hard requirements:**
+- Control plane has a healthcheck so the agent only starts after the control plane is ready
+- Agent uses `depends_on: condition: service_healthy` (not just `depends_on: [control-plane]`)
+- All four provider env vars (`OPENROUTER_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `GOOGLE_API_KEY`) are exposed so the user can swap providers without editing compose
+- Default model is `openrouter/google/gemini-2.5-flash` (fast enough to stay inside the 90-second sync-execute timeout) but trivially overridable via `AI_MODEL`
+- Port 8080 = control plane, port 8001 = agent node, never co-located
+
+## File 5: `.env.example`
+
+```bash
+# Required: pick ONE provider
+OPENROUTER_API_KEY=sk-or-v1-...
+# OPENAI_API_KEY=sk-...
+# ANTHROPIC_API_KEY=sk-ant-...
+
+# Model — must match the provider above
+AI_MODEL=openrouter/google/gemini-2.5-flash
+# AI_MODEL=gpt-4o
+# AI_MODEL=anthropic/claude-3-5-sonnet-20241022
+
+# Optional overrides
+AGENT_NODE_ID=
+AGENT_NODE_PORT=8001
+AGENTFIELD_HTTP_PORT=8080
+```
+
+## File 6: `requirements.txt`
+
+```
+# agentfield is installed from this file by the Dockerfile (pip install -r requirements.txt).
+# Add any additional runtime deps the reasoners need below.
+agentfield
+pydantic>=2.0
+```
+
+Add libraries the reasoners actually use (httpx, beautifulsoup4, pdfplumber, etc.). Keep `agentfield` listed — the Dockerfile installs the SDK from this file, not from a local SDK copy.
+
+## File 7: `.dockerignore`
+
+```
+__pycache__
+*.pyc
+.pytest_cache
+.env
+.venv
+*.log
+```
+
+## File 8: `README.md`
+
+```markdown
+# <Project Title>
+
+<2-sentence description.>
+
+## Architecture
+
+- **Entry reasoner:** `<project-slug>.<entry_reasoner>`
+- **Pattern(s):** <pattern names>
+- **Reasoners:**
+  - `intake_router` — `.ai()` gate that classifies inputs and selects active dimensions
+  - `<dimension_a>_reviewer` — analyzer for dimension A (parallel)
+  - `<dimension_b>_reviewer` — analyzer for dimension B (parallel)
+  - `synthesizer` — combines dimension findings into a final verdict
+
+## Run
+
+```bash
+cp .env.example .env
+# edit .env and set OPENROUTER_API_KEY (or your provider of choice)
+docker compose up --build
+```
+
+Wait until you see `agent registered` in the logs.
+
+## Verify (run in another terminal)
+
+```bash
+# 1. Control plane is up
+curl -fsS http://localhost:8080/api/v1/health | jq
+
+# 2. Agent node has registered (use ?health_status=any — the default filter can hide healthy nodes)
+curl -fsS 'http://localhost:8080/api/v1/nodes?health_status=any' | jq '.nodes[] | {id: .node_id, status: .status}'
+
+# 3. All reasoners are discoverable (look for tags=["entry"]; note the .capabilities[].reasoners[] shape)
+curl -fsS http://localhost:8080/api/v1/discovery/capabilities \
+  | jq '.capabilities[] | select(.agent_id=="<project-slug>") | .reasoners | map({id, tags})'
+```
+
+## Run a real reasoned answer
+
+**Important:** the control plane wraps reasoner kwargs in an `input` field. Body shape is `{"input": {...kwargs...}}` — verified against `control-plane/internal/handlers/execute.go`.
+
+```bash
+curl -X POST http://localhost:8080/api/v1/execute/<project-slug>.<entry_reasoner> \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "input": {
+      "<kwarg_1>": "<realistic value>",
+      "<kwarg_2>": <realistic value>,
+      "model": "openrouter/google/gemini-2.5-flash"
+    }
+  }' | jq
+```
+
+The optional `"model"` field overrides the AIConfig default for THIS request. Try different models:
+
+```bash
+# Same request, different model
+curl -X POST http://localhost:8080/api/v1/execute/<project-slug>.<entry_reasoner> \
+  -H 'Content-Type: application/json' \
+  -d '{"input": {"<kwarg_1>": "...", "model": "openrouter/openai/gpt-4o"}}' | jq
+```
+
+## Showpiece — see the cryptographic workflow trail
+
+```bash
+LAST_EXEC=$(curl -s http://localhost:8080/api/v1/executions | jq -r '.[0].workflow_id')
+curl -s http://localhost:8080/api/v1/did/workflow/$LAST_EXEC/vc-chain | jq
+```
+
+This is the verifiable credential chain — every reasoner that ran, with cryptographic provenance. No other agent framework gives you this.
+
+## Stop
+
+```bash
+docker compose down
+docker compose down --volumes   # also clears local control-plane state
+```
+```
+
+## File 9: `CLAUDE.md`
+
+See `references/project-claude-template.md` for the template. Generate it specific to this build.
+
+## Generation order (do these in this order)
+
+1. Decide the architecture (pattern + reasoner roles + which are `.ai()` vs `.harness()`)
+2. Create the directory `examples/python_agent_nodes/<project-slug>/`
+3. Write `main.py` with real reasoners (NOT a placeholder)
+4. Write `requirements.txt`, `Dockerfile`, `.dockerignore`
+5. Write `docker-compose.yml`
+6. Write `.env.example`
+7. Write `CLAUDE.md` (use the template from `references/project-claude-template.md`)
+8. Write `README.md` with the actual curl payload for THIS use case
+9. Validate (see next section)
+
+## Validation (every build)
+
+### Online validation (when Docker can pull images and you have a key)
+
+```bash
+# 1. Python syntax — must pass
+python3 -m py_compile examples/python_agent_nodes/<project-slug>/main.py
+# Plus any reasoner files if you split with routers:
+python3 -m py_compile examples/python_agent_nodes/<project-slug>/reasoners/*.py
+
+# 2. Compose file is valid
+cd examples/python_agent_nodes/<project-slug>
+OPENROUTER_API_KEY=sk-or-v1-FAKE docker compose config > /dev/null
+
+# 3. Start the stack and run the smoke test
+docker compose up --build -d
+sleep 10 && curl -fs http://localhost:8080/api/v1/health
+curl -X POST http://localhost:8080/api/v1/execute/<project-slug>.<entry_reasoner> \
+  -H 'Content-Type: application/json' \
+  -d '{"input": {"...": "..."}}'
+docker compose logs --tail=50
+docker compose down
+```
+
+### Offline validation (sandbox / CI / no docker pull)
+
+If the environment cannot pull `agentfield/control-plane:latest` or doesn't have a real provider key, you **still validate**. These are the static checks that count as "validated":
+
+```bash
+# Syntax check
+python3 -m py_compile examples/python_agent_nodes/<project-slug>/main.py
+python3 -m py_compile examples/python_agent_nodes/<project-slug>/reasoners/*.py 2>/dev/null || true
+
+# Compose syntax check (no image pull required)
+cd examples/python_agent_nodes/<project-slug>
+OPENROUTER_API_KEY=sk-or-v1-FAKE docker compose config > /dev/null
+```
+
+Then **run this visual-invariant checklist** against the generated files. Every box must be checked:
+
+- [ ] `app.run(...)` in `__main__` (NOT `app.serve(...)`)
+- [ ] Entry reasoner has `tags=["entry"]`
+- [ ] Every `app.ai(...)` call's schema includes a `confident: bool` field if used as a gate, AND the call site has a fallback path
+- [ ] Every reasoner that calls `app.ai(...)` accepts `model: str | None = None` and threads `model=model`
+- [ ] Entry reasoner accepts `model` and propagates via `app.call(..., model=model)` to every child
+- [ ] All `app.call(...)` use `f"{app.node_id}.X"` — no hardcoded node IDs
+- [ ] No `requests.post()` / `httpx.post()` between reasoners (use `app.call`)
+- [ ] No `app.harness(provider="...")` unless the Dockerfile installs the CLI AND main.py has a startup `shutil.which()` check
+- [ ] No `input_schema=` / `output_schema=` parameters on `@app.reasoner()`
+- [ ] README curl uses body shape `{"input": {...kwargs...}}` (NOT raw kwargs at top level)
+- [ ] `Agent(agentfield_server=os.getenv("AGENTFIELD_SERVER", ...))` — exact parameter name
+- [ ] `AGENT_CALLBACK_URL` set in compose to the in-network DNS name (`http://<project-slug>:8001`)
+- [ ] Control plane has a healthcheck and the agent service uses `condition: service_healthy`
+- [ ] `auto_port=False` in `app.run()` so the port is deterministic
+- [ ] CLAUDE.md exists with no `<placeholder>` tokens left in it
+- [ ] `.env.example` lists `OPENROUTER_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`
+- [ ] If reasoners are split across files: routers use `prefix=""` (or document the namespacing in the curl path)
+- [ ] LLM-to-LLM context is passed as natural-language strings, not raw JSON dicts
+- [ ] Returning `dict` from an orchestrator reasoner is fine — Pydantic model returns are also fine — both work because schemas come from type hints
+
+If any box fails, **fix before handing off**. A "scaffold that almost works" is worth zero.
+
+### Return-type note
+
+Orchestrator reasoners that return heterogeneous results (e.g. `{"plan": ..., "reviews": [...], "verdict": ...}`) should declare `-> dict` as the return type. Single-purpose reasoners that produce one validated result should declare `-> SomePydanticModel`. Both work — schemas are derived from the type hint either way.
diff --git a/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/verification.md b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/verification.md
new file mode 100644
index 000000000..e5b11287a
--- /dev/null
+++ b/control-plane/internal/skillkit/skill_data/agentfield-multi-reasoner-builder/references/verification.md
@@ -0,0 +1,120 @@
+# Verification — Prove the Build Is Real
+
+A scaffold that "looks right" but isn't actually wired up is worse than no scaffold. The control plane exposes a discovery API that lets you prove the system works in seconds. Use it.
+
+## The verification ladder (run all four, in order)
+
+```bash
+# 1. Control plane health
+curl -fsS http://localhost:8080/api/v1/health | jq
+
+# 2. Agent node has registered itself with the control plane (use ?health_status=any)
+curl -fsS 'http://localhost:8080/api/v1/nodes?health_status=any' | jq '.nodes[] | {id: .node_id, status, last_seen}'
+
+# 3. Every reasoner you defined is discoverable (note the .capabilities[].reasoners[] shape)
+curl -fsS http://localhost:8080/api/v1/discovery/capabilities \
+  | jq --arg slug "<project-slug>" '.capabilities[] | select(.agent_id==$slug) | .reasoners | map({id, tags, description})'
+
+# 4. The entry reasoner produces a real reasoned answer
+# NOTE: control plane wraps kwargs in {"input": {...}} (verified at execute.go:1000)
+curl -X POST http://localhost:8080/api/v1/execute/<project-slug>.<entry_reasoner> \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "input": {
+      "<kwarg_1>": "<realistic value>",
+      "<kwarg_2>": <realistic value>,
+      "model": "openrouter/google/gemini-2.5-flash"
+    }
+  }' | jq
+```
+
+If any step fails, **do not hand off**. Diagnose and fix.
+
+## Common failures and fast diagnosis
+
+| Symptom | Likely cause | Fix |
+|---|---|---|
+| `/api/v1/health` hangs or refuses connection | Control plane container is still booting | Wait 5–10s, retry. If still failing, `docker compose logs control-plane` |
+| `/api/v1/nodes` returns `[]` | Agent node hasn't registered. Network issue or agent crashed at boot | `docker compose logs <project-slug>` — look for `OPENROUTER_API_KEY` missing, import errors, or `agent registered` |
+| Node listed but no reasoners in `/discovery/capabilities` | The Python file imported, but the `@app.reasoner()` decorators didn't run (e.g., reasoners are in a router that wasn't included) | Verify `app.include_router(...)` is called in `main.py` before `app.run()` |
+| Reasoners present but execute hangs | Reasoner is making an LLM call that's failing silently | `docker compose logs --follow` while running curl. Look for litellm errors |
+| Execute returns 500 with "model not found" | `AI_MODEL` env var doesn't match the provider key you set | Check `.env` — `OPENROUTER_API_KEY` requires `openrouter/...` model names, etc. |
+| Execute returns 200 but the output is empty/garbage | The reasoner ran but the architecture is wrong (e.g., `.ai()` got truncated input) | Look at logs to see what input each reasoner actually got |
+
+## Sync execute timeout (90s) — IMPORTANT
+
+`POST /api/v1/execute/<target>` is a **synchronous** endpoint with a hard **90-second timeout** at the control plane. If the entry reasoner's full pipeline (including all child `app.call`s, all `app.ai` calls, and any retries) takes longer than 90s, the control plane returns `HTTP 400 {"error":"execution timeout after 1m30s"}`.
+
+**Implications for the architecture you generate:**
+- **Pick fast models for the default.** `openrouter/google/gemini-2.5-flash` and `openrouter/openai/gpt-4o-mini` finish a 6–10 step parallel pipeline in 10–25 seconds. Slower models like `openrouter/anthropic/claude-3-5-sonnet-*`, `openrouter/minimax/minimax-m2.7`, or `openrouter/openai/o1` often blow the budget.
+- **Parallelize aggressively at multiple depths.** A pipeline of 10 sequential `app.ai` calls at 5s each = 50s (close to the limit). The same 10 calls organized as a deep DAG with 3 parallelism waves = 15s. Use `asyncio.gather` for every fan-out, and push fan-outs DOWN into sub-reasoners (see `architecture-patterns.md` "Reasoner Composition Cascade"), not just at the entry orchestrator.
+- **For workflows that genuinely need >90s** (large fan-outs, slow models, navigation-heavy harnesses): use `POST /api/v1/execute/async/<target>` instead. It returns immediately with an `execution_id`; poll `GET /api/v1/executions/<execution_id>` for the result. Document this in the README so users know which endpoint to hit.
+
+When the user's brief implies a slow pipeline, default to `gemini-2.5-flash` and document the async endpoint as the upgrade path.
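The sequential-vs-waves arithmetic above can be sanity-checked with a toy simulation. This is a pure stand-in: `fake_ai_call` just sleeps for a scaled-down latency and no AgentField API is involved.

```python
import asyncio
import time

async def fake_ai_call(name: str, latency: float = 0.05) -> str:
    # Stand-in for an app.ai() call: just sleeps for `latency` seconds.
    await asyncio.sleep(latency)
    return name

async def sequential(n: int) -> float:
    # n calls one after another: total ≈ n * latency
    t0 = time.perf_counter()
    for i in range(n):
        await fake_ai_call(f"step-{i}")
    return time.perf_counter() - t0

async def in_waves(n: int, waves: int) -> float:
    # Same n calls as `waves` parallel fan-outs: total ≈ waves * latency
    t0 = time.perf_counter()
    per_wave = n // waves
    for w in range(waves):
        await asyncio.gather(*[fake_ai_call(f"wave{w}-call{i}") for i in range(per_wave)])
    return time.perf_counter() - t0

seq = asyncio.run(sequential(9))
par = asyncio.run(in_waves(9, 3))
print(f"sequential: {seq:.2f}s, 3 waves: {par:.2f}s")
```

With 9 calls at 50ms each, the sequential run takes roughly 9× the per-call latency while the 3-wave DAG takes roughly 3× — the same 3× reduction the bullet above describes at real model latencies.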
+ +## Useful introspection endpoints + +| Endpoint | What it tells you | +|---|---| +| `GET /api/v1/health` | Control plane up | +| `GET /api/v1/nodes?health_status=any` | Which agent nodes have registered (the default filter is `active`, which can return empty even when agents are healthy — use `?health_status=any` to be safe) | +| `GET /api/v1/nodes/:node_id` | Details of one node | +| `GET /api/v1/discovery/capabilities` | All reasoners and skills. **Response shape:** `{capabilities: [{agent_id, reasoners: [{id, tags, ...}]}]}` — note `agent_id` not `node_id`, and reasoners live under `.capabilities[].reasoners[]` not `.reasoners[]`. The reasoner identifier field is `id` not `name` | +| `GET /api/v1/agentic/discover?q=` | Search the API catalog by keyword | +| `POST /api/v1/execute/:target` | **Sync** execute. Body is `{"input": {...kwargs...}}`. **90-second hard timeout at the control plane.** | +| `POST /api/v1/execute/async/:target` | Async execute, returns an `execution_id` immediately. Use this when the pipeline > 90s | +| `GET /api/v1/executions/:id` | Status of an async execution | +| `GET /api/v1/did/workflow/:workflow_id/vc-chain` | Verifiable credential chain for an executed workflow (the AgentField superpower no other framework has) | + +## Inspect the live workflow DAG + +After running an execution, hit: + +```bash +# Get the most recent executions +curl -s http://localhost:8080/api/v1/executions | jq '.[0:3]' + +# Get the VC chain for one — this shows you the full reasoning DAG with cryptographic provenance +curl -s http://localhost:8080/api/v1/did/workflow//vc-chain | jq +``` + +This is the **single best demo** of why AgentField beats CrewAI: you get a cryptographic, replayable, introspectable record of every reasoner that ran, what it called, and what came back. Show the user this output in the handoff — it makes the "this is composite intelligence as production infrastructure" case for itself. 
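Because the capabilities response shape trips people up, here is a minimal sketch of walking it in Python. The sample payload is illustrative, constructed to match the documented `{capabilities: [{agent_id, reasoners: [...]}]}` shape; the slug and reasoner names are hypothetical.

```python
sample = {
    "capabilities": [
        {
            "agent_id": "financial-reviewer",
            "reasoners": [
                {"id": "review", "tags": ["entry"]},
                {"id": "intake_router", "tags": ["committee"]},
            ],
        }
    ]
}

def entry_targets(payload: dict) -> list[str]:
    # Walk .capabilities[].reasoners[] — note agent_id / id, not node_id / name.
    return [
        f"{cap['agent_id']}.{r['id']}"
        for cap in payload.get("capabilities", [])
        for r in cap.get("reasoners", [])
        if "entry" in r.get("tags", [])
    ]

print(entry_targets(sample))  # ['financial-reviewer.review']
```

The returned strings are exactly the `<node_id>.<reasoner>` targets the execute endpoints accept.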
+
+## The smoke-test contract (every build)
+
+In the README, give the user EXACTLY these commands in this order. Do not abbreviate. Do not say "and so on."
+
+```bash
+# After docker compose up, in another terminal:
+
+# 1. Health
+curl -fsS http://localhost:8080/api/v1/health | jq '.status'
+
+# 2. Node registered? (use ?health_status=any — default filter can hide healthy nodes)
+curl -fsS 'http://localhost:8080/api/v1/nodes?health_status=any' | jq '.nodes[] | {id: .node_id, status: .status}'
+
+# 3. Reasoners discoverable? (note .capabilities[].reasoners[].id, NOT .reasoners[].name)
+curl -fsS http://localhost:8080/api/v1/discovery/capabilities \
+  | jq '.capabilities[] | select(.agent_id=="<project-slug>") | .reasoners | map({id, tags})'
+
+# 4. THE BIG ONE — run the entry reasoner with real data
+# Body shape: {"input": {...kwargs...}} — kwargs are NEVER raw at the top level
+curl -X POST http://localhost:8080/api/v1/execute/<project-slug>.<entry_reasoner> \
+  -H 'Content-Type: application/json' \
+  -d '{"input": {"<kwarg_1>": "<realistic value>", "model": "openrouter/google/gemini-2.5-flash"}}' | jq
+
+# 5. (Optional showpiece) the full verifiable workflow chain
+LAST_EXEC=$(curl -s http://localhost:8080/api/v1/executions | jq -r '.[0].workflow_id')
+curl -s http://localhost:8080/api/v1/did/workflow/$LAST_EXEC/vc-chain | jq
+```
+
+## When you cannot run docker locally
+
+If the environment running the skill doesn't have Docker, you can still:
+
+1. `python3 -m py_compile main.py` — catches syntax errors
+2. `docker compose config` — catches compose errors
+3. Read the generated files back with `cat` to spot obvious issues
+4. Provide the verification commands in the README as a checklist for the user to run themselves
+
+You **must** still validate the Python and the compose file syntactically. "I generated it but didn't check" is a failure mode.
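Part of that static validation can be scripted. A minimal sketch — the regexes below cover only a few of the checklist invariants, and the `good` sample source is illustrative, not a real scaffold:

```python
import re

REQUIRED = {
    "entry tag": re.compile(r'tags=\["entry"\]'),
    "app.run present": re.compile(r"app\.run\("),
    "model propagation": re.compile(r"model=model"),
}
FORBIDDEN = {
    "app.serve": re.compile(r"app\.serve\("),
    "input_schema kwarg": re.compile(r"input_schema="),
}

def check_source(src: str) -> list[str]:
    # Returns the violated invariants; an empty list means all checks pass.
    problems = [f"missing: {name}" for name, pat in REQUIRED.items() if not pat.search(src)]
    problems += [f"forbidden: {name}" for name, pat in FORBIDDEN.items() if pat.search(src)]
    return problems

good = (
    '@app.reasoner(tags=["entry"])\n'
    "async def review(payload: dict, model: str | None = None) -> dict:\n"
    "    return await app.ai(user=str(payload), model=model)\n"
    "\n"
    'app.run(host="0.0.0.0", port=8001, auto_port=False)\n'
)
print(check_source(good))  # []
```

Run `check_source(open("main.py").read())` as a fifth offline step; a non-empty list means a checklist box would fail.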
diff --git a/control-plane/internal/skillkit/state.go b/control-plane/internal/skillkit/state.go new file mode 100644 index 000000000..6f468c650 --- /dev/null +++ b/control-plane/internal/skillkit/state.go @@ -0,0 +1,125 @@ +package skillkit + +import ( + "encoding/json" + "fmt" + "os" + "path/filepath" + "sort" + "time" +) + +// State is the on-disk record of which skills are installed, at which version, +// and which target integrations are active. Persisted at +// ~/.agentfield/skills/.state.json. +type State struct { + Version string `json:"state_version"` + Skills map[string]InstalledSkill `json:"skills"` +} + +// InstalledSkill records the installed-version state of a single skill. +type InstalledSkill struct { + CurrentVersion string `json:"current_version"` + InstalledAt time.Time `json:"installed_at"` + AvailableVersions []string `json:"available_versions"` + Targets map[string]InstalledTarget `json:"targets"` +} + +// InstalledTarget records one target integration for one skill. +type InstalledTarget struct { + TargetName string `json:"target_name"` // "claude-code", "codex", ... + Method string `json:"method"` // "symlink", "marker-block", "manual" + Path string `json:"path"` // file or directory the integration writes to + Version string `json:"version"` // version installed at this target + InstalledAt time.Time `json:"installed_at"` +} + +const stateFileVersion = "1" + +// CanonicalRoot returns ~/.agentfield/skills/. Honors $AGENTFIELD_HOME if set +// (useful for tests and for users who want a non-default location). 
+func CanonicalRoot() (string, error) { + if root := os.Getenv("AGENTFIELD_HOME"); root != "" { + return filepath.Join(root, "skills"), nil + } + home, err := os.UserHomeDir() + if err != nil { + return "", fmt.Errorf("resolve home directory: %w", err) + } + return filepath.Join(home, ".agentfield", "skills"), nil +} + +func stateFilePath() (string, error) { + root, err := CanonicalRoot() + if err != nil { + return "", err + } + return filepath.Join(root, ".state.json"), nil +} + +// LoadState reads the state file from disk. If the file does not exist yet, +// returns an empty State so first-install flows just write fresh. +func LoadState() (*State, error) { + path, err := stateFilePath() + if err != nil { + return nil, err + } + data, err := os.ReadFile(path) + if errIsNotExist(err) { + return &State{Version: stateFileVersion, Skills: map[string]InstalledSkill{}}, nil + } + if err != nil { + return nil, fmt.Errorf("read state file %s: %w", path, err) + } + var s State + if err := json.Unmarshal(data, &s); err != nil { + return nil, fmt.Errorf("parse state file %s: %w", path, err) + } + if s.Skills == nil { + s.Skills = map[string]InstalledSkill{} + } + if s.Version == "" { + s.Version = stateFileVersion + } + return &s, nil +} + +// SaveState writes the state file atomically (write-temp + rename). 
+func SaveState(s *State) error { + path, err := stateFilePath() + if err != nil { + return err + } + if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil { + return fmt.Errorf("create state dir: %w", err) + } + s.Version = stateFileVersion + data, err := json.MarshalIndent(s, "", " ") + if err != nil { + return fmt.Errorf("marshal state: %w", err) + } + tmp := path + ".tmp" + if err := os.WriteFile(tmp, data, 0o644); err != nil { + return fmt.Errorf("write state tmp: %w", err) + } + if err := os.Rename(tmp, path); err != nil { + return fmt.Errorf("rename state file: %w", err) + } + return nil +} + +// SortedTargetNames returns the keys of the Targets map in stable order so +// `af skill list` output is deterministic. +func (i InstalledSkill) SortedTargetNames() []string { + names := make([]string, 0, len(i.Targets)) + for name := range i.Targets { + names = append(names, name) + } + sort.Strings(names) + return names +} + +// errIsNotExist reports whether err indicates a missing file. +func errIsNotExist(err error) bool { + return err != nil && os.IsNotExist(err) +} diff --git a/control-plane/internal/skillkit/target_aider.go b/control-plane/internal/skillkit/target_aider.go new file mode 100644 index 000000000..193c743ba --- /dev/null +++ b/control-plane/internal/skillkit/target_aider.go @@ -0,0 +1,101 @@ +package skillkit + +import ( + "errors" + "fmt" + "os" + "path/filepath" + "strings" +) + +// aiderTarget installs into Aider by appending a marker block to +// ~/.aider.conventions.md AND ensuring ~/.aider.conf.yml has a "read:" line +// that loads the conventions file. 
+type aiderTarget struct{} + +func init() { RegisterTarget(aiderTarget{}) } + +func (aiderTarget) Name() string { return "aider" } +func (aiderTarget) DisplayName() string { return "Aider" } +func (aiderTarget) Method() string { return "marker-block" } + +func (aiderTarget) Detected() bool { + return commandAvailable("aider") || + fileExists(filepath.Join(homeDir(), ".aider.conventions.md")) || + fileExists(filepath.Join(homeDir(), ".aider.conf.yml")) +} + +func (aiderTarget) TargetPath() (string, error) { + h := homeDir() + if h == "" { + return "", errors.New("could not resolve home directory") + } + return filepath.Join(h, ".aider.conventions.md"), nil +} + +func (t aiderTarget) Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) { + path, err := t.TargetPath() + if err != nil { + return InstalledTarget{}, err + } + inst, err := installMarkerBlock(skill, canonicalCurrentDir, path) + if err != nil { + return InstalledTarget{}, err + } + inst.TargetName = t.Name() + + // Make sure ~/.aider.conf.yml references the conventions file. Aider only + // loads it if explicitly told to. + confPath := filepath.Join(homeDir(), ".aider.conf.yml") + readLine := "read: " + path + if err := ensureLineInFile(confPath, readLine); err != nil { + return InstalledTarget{}, fmt.Errorf("update aider conf: %w", err) + } + + return inst, nil +} + +func (t aiderTarget) Uninstall() error { + path, err := t.TargetPath() + if err != nil { + return err + } + for _, s := range Catalog { + if err := uninstallMarkerBlock(s, path); err != nil { + return err + } + } + return nil +} + +func (t aiderTarget) Status() (bool, string, error) { + path, err := t.TargetPath() + if err != nil { + return false, "", err + } + v := readMarkerVersion(Catalog[0], path) + if v == "" { + return false, "", nil + } + return true, v, nil +} + +// ensureLineInFile ensures the given line exists in the file at path. Creates +// the file if it doesn't exist. Idempotent — re-runs are no-ops. 
+func ensureLineInFile(path, line string) error { + data, err := os.ReadFile(path) + if err != nil && !os.IsNotExist(err) { + return err + } + if strings.Contains(string(data), line) { + return nil + } + var sb strings.Builder + sb.Write(data) + if len(data) > 0 && !strings.HasSuffix(string(data), "\n") { + sb.WriteString("\n") + } + sb.WriteString(line) + sb.WriteString("\n") + return os.WriteFile(path, []byte(sb.String()), 0o644) +} diff --git a/control-plane/internal/skillkit/target_claude_code.go b/control-plane/internal/skillkit/target_claude_code.go new file mode 100644 index 000000000..969b21bcf --- /dev/null +++ b/control-plane/internal/skillkit/target_claude_code.go @@ -0,0 +1,118 @@ +package skillkit + +import ( + "errors" + "fmt" + "os" + "path/filepath" + "time" +) + +// claudeCodeTarget installs the skill into Claude Code via the +// ~/.claude/skills// directory using a symlink to the canonical +// versioned-store location. This is the Anthropic-recommended way: Claude +// Code natively understands SKILL.md + references and the symlink ensures +// updates to the canonical store flow through automatically. 
+type claudeCodeTarget struct{} + +func init() { RegisterTarget(claudeCodeTarget{}) } + +func (claudeCodeTarget) Name() string { return "claude-code" } +func (claudeCodeTarget) DisplayName() string { return "Claude Code" } +func (claudeCodeTarget) Method() string { return "symlink" } + +func (claudeCodeTarget) Detected() bool { + return dirExists(filepath.Join(homeDir(), ".claude")) +} + +func (claudeCodeTarget) TargetPath() (string, error) { + h := homeDir() + if h == "" { + return "", errors.New("could not resolve home directory") + } + return filepath.Join(h, ".claude", "skills"), nil +} + +func (t claudeCodeTarget) skillLink(skill Skill) (string, error) { + root, err := t.TargetPath() + if err != nil { + return "", err + } + return filepath.Join(root, skill.Name), nil +} + +func (t claudeCodeTarget) Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) { + root, err := t.TargetPath() + if err != nil { + return InstalledTarget{}, err + } + if err := os.MkdirAll(root, 0o755); err != nil { + return InstalledTarget{}, fmt.Errorf("create %s: %w", root, err) + } + link, err := t.skillLink(skill) + if err != nil { + return InstalledTarget{}, err + } + + // Remove any existing entry (regular dir, file, or symlink). Claude Code + // reads symlinks transparently, so we always replace with a fresh link to + // the canonical current/ directory. 
+ if info, err := os.Lstat(link); err == nil { + if info.Mode()&os.ModeSymlink != 0 || info.IsDir() || info.Mode().IsRegular() { + if err := os.RemoveAll(link); err != nil { + return InstalledTarget{}, fmt.Errorf("remove existing %s: %w", link, err) + } + } + } + + if err := os.Symlink(canonicalCurrentDir, link); err != nil { + return InstalledTarget{}, fmt.Errorf("symlink %s -> %s: %w", link, canonicalCurrentDir, err) + } + + return InstalledTarget{ + TargetName: t.Name(), + Method: t.Method(), + Path: link, + Version: skill.Version, + InstalledAt: time.Now().UTC(), + }, nil +} + +func (t claudeCodeTarget) Uninstall() error { + // Remove every shipped skill's symlink. (Currently a single skill, but the + // catalog can grow.) + for _, s := range Catalog { + link, err := t.skillLink(s) + if err != nil { + continue + } + if info, err := os.Lstat(link); err == nil { + if info.Mode()&os.ModeSymlink != 0 || info.IsDir() || info.Mode().IsRegular() { + if err := os.RemoveAll(link); err != nil { + return fmt.Errorf("remove %s: %w", link, err) + } + } + } + } + return nil +} + +func (t claudeCodeTarget) Status() (bool, string, error) { + link, err := t.skillLink(Catalog[0]) + if err != nil { + return false, "", err + } + info, err := os.Lstat(link) + if err != nil { + return false, "", nil + } + if info.Mode()&os.ModeSymlink == 0 { + return true, "manual", nil // a regular dir/file lives there — not ours + } + dest, err := os.Readlink(link) + if err != nil { + return false, "", nil + } + // dest looks like .../.agentfield/skills/<skill-name>/<version> + return true, filepath.Base(dest), nil +} diff --git a/control-plane/internal/skillkit/target_codex.go b/control-plane/internal/skillkit/target_codex.go new file mode 100644 index 000000000..9f33dea8c --- /dev/null +++ b/control-plane/internal/skillkit/target_codex.go @@ -0,0 +1,67 @@ +package skillkit + +import ( + "errors" + "path/filepath" +) + +// codexTarget installs the skill into Codex (OpenAI's coding agent CLI) by +// appending a marker
block to ~/.codex/AGENTS.override.md. The block points +// at the canonical SKILL.md so updates flow through automatically. +type codexTarget struct{} + +func init() { RegisterTarget(codexTarget{}) } + +func (codexTarget) Name() string { return "codex" } +func (codexTarget) DisplayName() string { return "Codex (OpenAI)" } +func (codexTarget) Method() string { return "marker-block" } + +func (codexTarget) Detected() bool { + return commandAvailable("codex") || dirExists(filepath.Join(homeDir(), ".codex")) +} + +func (codexTarget) TargetPath() (string, error) { + h := homeDir() + if h == "" { + return "", errors.New("could not resolve home directory") + } + return filepath.Join(h, ".codex", "AGENTS.override.md"), nil +} + +func (t codexTarget) Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) { + path, err := t.TargetPath() + if err != nil { + return InstalledTarget{}, err + } + inst, err := installMarkerBlock(skill, canonicalCurrentDir, path) + if err != nil { + return InstalledTarget{}, err + } + inst.TargetName = t.Name() + return inst, nil +} + +func (t codexTarget) Uninstall() error { + path, err := t.TargetPath() + if err != nil { + return err + } + for _, s := range Catalog { + if err := uninstallMarkerBlock(s, path); err != nil { + return err + } + } + return nil +} + +func (t codexTarget) Status() (bool, string, error) { + path, err := t.TargetPath() + if err != nil { + return false, "", err + } + v := readMarkerVersion(Catalog[0], path) + if v == "" { + return false, "", nil + } + return true, v, nil +} diff --git a/control-plane/internal/skillkit/target_cursor.go b/control-plane/internal/skillkit/target_cursor.go new file mode 100644 index 000000000..e52930aa3 --- /dev/null +++ b/control-plane/internal/skillkit/target_cursor.go @@ -0,0 +1,72 @@ +package skillkit + +import ( + "errors" + "fmt" + "path/filepath" + "time" +) + +// cursorTarget is a "manual" target — Cursor's global rules live in the +// Settings UI rather than a file 
we can write. Install() prints instructions +// for the user to copy/paste the SKILL.md content into Cursor → Settings → +// Rules for AI, and records the install in state so `af skill list` shows it +// as "manual / pending user action". +type cursorTarget struct{} + +func init() { RegisterTarget(cursorTarget{}) } + +func (cursorTarget) Name() string { return "cursor" } +func (cursorTarget) DisplayName() string { return "Cursor" } +func (cursorTarget) Method() string { return "manual" } + +func (cursorTarget) Detected() bool { + return commandAvailable("cursor") || + dirExists(filepath.Join(homeDir(), ".cursor")) || + dirExists(filepath.Join(homeDir(), "Library", "Application Support", "Cursor")) +} + +func (cursorTarget) TargetPath() (string, error) { + return "", errors.New("Cursor stores global rules in the Settings UI; no file path") +} + +func (t cursorTarget) Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) { + skillPath := filepath.Join(canonicalCurrentDir, skill.EntryFile) + fmt.Printf(` + ⚠ Cursor manual install required + + Cursor's global rules live in the Settings UI, not a file the af binary + can write to. To enable the skill in Cursor: + + 1. Open Cursor + 2. Cmd+, → Settings → General → Rules for AI + 3. Add a rule like: + + When the user asks you to architect or build a multi-agent system on + AgentField, read this skill first: + %s + + The skill is self-contained — every reference is one level deep + from SKILL.md. + + (You can also add a per-project rule at .cursor/rules/agentfield.mdc.) 
+`, skillPath) + + return InstalledTarget{ + TargetName: t.Name(), + Method: t.Method(), + Path: "Cursor Settings → Rules for AI (manual)", + Version: skill.Version, + InstalledAt: time.Now().UTC(), + }, nil +} + +func (cursorTarget) Uninstall() error { + fmt.Println(" ⚠ Cursor manual uninstall: remove the AgentField rule from Settings → Rules for AI") + return nil +} + +func (cursorTarget) Status() (bool, string, error) { + // Cursor's UI state isn't readable from disk. Always report unknown. + return false, "", nil +} diff --git a/control-plane/internal/skillkit/target_gemini.go b/control-plane/internal/skillkit/target_gemini.go new file mode 100644 index 000000000..4a1c4cf8a --- /dev/null +++ b/control-plane/internal/skillkit/target_gemini.go @@ -0,0 +1,66 @@ +package skillkit + +import ( + "errors" + "path/filepath" +) + +// geminiTarget installs into the Gemini CLI by appending a marker block to +// ~/.gemini/GEMINI.md. +type geminiTarget struct{} + +func init() { RegisterTarget(geminiTarget{}) } + +func (geminiTarget) Name() string { return "gemini" } +func (geminiTarget) DisplayName() string { return "Gemini CLI" } +func (geminiTarget) Method() string { return "marker-block" } + +func (geminiTarget) Detected() bool { + return commandAvailable("gemini") || dirExists(filepath.Join(homeDir(), ".gemini")) +} + +func (geminiTarget) TargetPath() (string, error) { + h := homeDir() + if h == "" { + return "", errors.New("could not resolve home directory") + } + return filepath.Join(h, ".gemini", "GEMINI.md"), nil +} + +func (t geminiTarget) Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) { + path, err := t.TargetPath() + if err != nil { + return InstalledTarget{}, err + } + inst, err := installMarkerBlock(skill, canonicalCurrentDir, path) + if err != nil { + return InstalledTarget{}, err + } + inst.TargetName = t.Name() + return inst, nil +} + +func (t geminiTarget) Uninstall() error { + path, err := t.TargetPath() + if err != nil { + 
return err + } + for _, s := range Catalog { + if err := uninstallMarkerBlock(s, path); err != nil { + return err + } + } + return nil +} + +func (t geminiTarget) Status() (bool, string, error) { + path, err := t.TargetPath() + if err != nil { + return false, "", err + } + v := readMarkerVersion(Catalog[0], path) + if v == "" { + return false, "", nil + } + return true, v, nil +} diff --git a/control-plane/internal/skillkit/target_opencode.go b/control-plane/internal/skillkit/target_opencode.go new file mode 100644 index 000000000..cd3bd11e3 --- /dev/null +++ b/control-plane/internal/skillkit/target_opencode.go @@ -0,0 +1,66 @@ +package skillkit + +import ( + "errors" + "path/filepath" +) + +// opencodeTarget installs into OpenCode by appending a marker block to +// ~/.config/opencode/AGENTS.md. +type opencodeTarget struct{} + +func init() { RegisterTarget(opencodeTarget{}) } + +func (opencodeTarget) Name() string { return "opencode" } +func (opencodeTarget) DisplayName() string { return "OpenCode" } +func (opencodeTarget) Method() string { return "marker-block" } + +func (opencodeTarget) Detected() bool { + return commandAvailable("opencode") || dirExists(filepath.Join(homeDir(), ".config", "opencode")) +} + +func (opencodeTarget) TargetPath() (string, error) { + h := homeDir() + if h == "" { + return "", errors.New("could not resolve home directory") + } + return filepath.Join(h, ".config", "opencode", "AGENTS.md"), nil +} + +func (t opencodeTarget) Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) { + path, err := t.TargetPath() + if err != nil { + return InstalledTarget{}, err + } + inst, err := installMarkerBlock(skill, canonicalCurrentDir, path) + if err != nil { + return InstalledTarget{}, err + } + inst.TargetName = t.Name() + return inst, nil +} + +func (t opencodeTarget) Uninstall() error { + path, err := t.TargetPath() + if err != nil { + return err + } + for _, s := range Catalog { + if err := uninstallMarkerBlock(s, path); err 
!= nil { + return err + } + } + return nil +} + +func (t opencodeTarget) Status() (bool, string, error) { + path, err := t.TargetPath() + if err != nil { + return false, "", err + } + v := readMarkerVersion(Catalog[0], path) + if v == "" { + return false, "", nil + } + return true, v, nil +} diff --git a/control-plane/internal/skillkit/target_windsurf.go b/control-plane/internal/skillkit/target_windsurf.go new file mode 100644 index 000000000..8dac04da5 --- /dev/null +++ b/control-plane/internal/skillkit/target_windsurf.go @@ -0,0 +1,67 @@ +package skillkit + +import ( + "errors" + "path/filepath" +) + +// windsurfTarget installs into Windsurf by appending a marker block to +// ~/.codeium/windsurf/memories/global_rules.md. +type windsurfTarget struct{} + +func init() { RegisterTarget(windsurfTarget{}) } + +func (windsurfTarget) Name() string { return "windsurf" } +func (windsurfTarget) DisplayName() string { return "Windsurf" } +func (windsurfTarget) Method() string { return "marker-block" } + +func (windsurfTarget) Detected() bool { + return dirExists(filepath.Join(homeDir(), ".codeium")) || + dirExists(filepath.Join(homeDir(), "Library", "Application Support", "Windsurf")) +} + +func (windsurfTarget) TargetPath() (string, error) { + h := homeDir() + if h == "" { + return "", errors.New("could not resolve home directory") + } + return filepath.Join(h, ".codeium", "windsurf", "memories", "global_rules.md"), nil +} + +func (t windsurfTarget) Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) { + path, err := t.TargetPath() + if err != nil { + return InstalledTarget{}, err + } + inst, err := installMarkerBlock(skill, canonicalCurrentDir, path) + if err != nil { + return InstalledTarget{}, err + } + inst.TargetName = t.Name() + return inst, nil +} + +func (t windsurfTarget) Uninstall() error { + path, err := t.TargetPath() + if err != nil { + return err + } + for _, s := range Catalog { + if err := uninstallMarkerBlock(s, path); err != nil { + 
return err + } + } + return nil +} + +func (t windsurfTarget) Status() (bool, string, error) { + path, err := t.TargetPath() + if err != nil { + return false, "", err + } + v := readMarkerVersion(Catalog[0], path) + if v == "" { + return false, "", nil + } + return true, v, nil +} diff --git a/control-plane/internal/skillkit/targets.go b/control-plane/internal/skillkit/targets.go new file mode 100644 index 000000000..df6e73bc3 --- /dev/null +++ b/control-plane/internal/skillkit/targets.go @@ -0,0 +1,154 @@ +package skillkit + +import ( + "fmt" + "os" + "os/exec" + "path/filepath" + "runtime" +) + +// Target is a coding-agent integration the skill can be installed into. +// Each target knows how to detect itself, install, uninstall, and report +// its current installed version (if any). +type Target interface { + Name() string // canonical short name, e.g. "claude-code" + DisplayName() string // pretty name for UI, e.g. "Claude Code" + Detected() bool // is this target installed on the user's machine? + Method() string // "symlink", "marker-block", "manual" + TargetPath() (string, error) // canonical path the target writes to + Install(skill Skill, canonicalCurrentDir string) (InstalledTarget, error) // performs the install (idempotent) + Uninstall() error // removes the integration + Status() (installed bool, version string, err error) // currently installed? +} + +// allTargets is the registered list of targets, in stable registration order. +// New targets register themselves in init() and append to this slice. +var allTargets []Target + +// RegisterTarget adds a target to the global registry. Called from init() in +// each per-target file. +func RegisterTarget(t Target) { + allTargets = append(allTargets, t) +} + +// AllTargets returns the registered targets. +func AllTargets() []Target { + return allTargets +} + +// TargetByName looks up a target by its short name.
+func TargetByName(name string) (Target, error) { + for _, t := range allTargets { + if t.Name() == name { + return t, nil + } + } + available := make([]string, len(allTargets)) + for i, t := range allTargets { + available[i] = t.Name() + } + return nil, fmt.Errorf("target %q not registered (available: %v)", name, available) +} + +// DetectedTargets returns the subset of registered targets that are +// currently installed on the user's machine. +func DetectedTargets() []Target { + var out []Target + for _, t := range allTargets { + if t.Detected() { + out = append(out, t) + } + } + return out +} + +// ── Shared utilities used by per-target install logic ──────────────────── + +// homeDir returns the user's home directory. +func homeDir() string { + if h, err := os.UserHomeDir(); err == nil { + return h + } + return "" +} + +// commandAvailable returns true if the named binary is on PATH. +func commandAvailable(bin string) bool { + _, err := exec.LookPath(bin) + return err == nil +} + +// dirExists returns true if a directory exists at path. +func dirExists(path string) bool { + if path == "" { + return false + } + info, err := os.Stat(path) + return err == nil && info.IsDir() +} + +// fileExists returns true if a regular file exists at path. +func fileExists(path string) bool { + if path == "" { + return false + } + info, err := os.Stat(path) + return err == nil && !info.IsDir() +} + +// markerStart returns the opening marker line for a skill's marker block. +// Used by file-append targets (Codex, Gemini, OpenCode, Aider, Windsurf). +func markerStart(skill Skill) string { + return fmt.Sprintf("<!-- agentfield:skill:%s:v%s -->", skill.Name, skill.Version) +} + +// markerStartPattern returns a substring (without version) used to find an +// existing block of THIS skill regardless of installed version, so re-installs +// can replace older versions cleanly.
+func markerStartPattern(skill Skill) string { + return fmt.Sprintf("<!-- agentfield:skill:%s:", skill.Name) +} + +// renderPointerBlock returns the marker-bracketed text that file-append +// targets write into the agent's global rules file. The block points the +// agent at the canonical SKILL.md path so updates to the canonical store +// flow through automatically — no need to re-edit every agent rules file. +func renderPointerBlock(skill Skill, canonicalCurrentDir string) string { + skillPath := filepath.Join(canonicalCurrentDir, skill.EntryFile) + return fmt.Sprintf(`%s +## %s + +When the user asks you to architect or build a multi-agent system on +AgentField (composite-intelligence backends, multi-reasoner pipelines, +financial reviewer / clinical triage / research agent / etc.), you MUST +read this skill first: + + %s + +The skill is self-contained and every reference file is one level deep +from SKILL.md. It teaches the philosophy, the SDK primitives, the +canonical scaffold layout, the verification workflow, and the curl +smoke test. + +Skill version: %s +%s`, + markerStart(skill), + skill.Description, + skillPath, + skill.Version, + markerEnd(skill), + ) +} + +// platformInfo is a small helper for diagnostics — used by Cursor's manual +// install path to give the user an OS-appropriate hint about where the +// "Settings → Rules for AI" UI lives. +func platformInfo() string { + return runtime.GOOS +} diff --git a/scripts/install.sh b/scripts/install.sh index 7e5d8d840..ca315afa0 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -16,6 +16,13 @@ SKIP_PATH_CONFIG="${SKIP_PATH_CONFIG:-0}" # Can be set via --staging flag or STAGING=1 environment variable STAGING="${STAGING:-0}" +# Skill install mode (interactive | all | all-targets | none) +# Defaults to "interactive" — runs `af skill install` with the picker after +# the binary lands so first-time users get the skill into Claude Code / Codex +# / etc. without a second step.
Override with --no-skill / --all-skills / +# --all-skill-targets, or via SKILL_MODE env var. +SKILL_MODE="${SKILL_MODE:-interactive}" + # Color codes RED='\033[0;31m' GREEN='\033[0;32m' @@ -42,24 +49,44 @@ parse_args() { VERBOSE=1 shift ;; + --no-skill) + SKILL_MODE="none" + shift + ;; + --all-skills) + SKILL_MODE="all" + shift + ;; + --all-skill-targets) + SKILL_MODE="all-targets" + shift + ;; --help|-h) echo "AgentField CLI Installer" echo "" echo "Usage:" echo " curl -fsSL https://agentfield.ai/install.sh | bash" echo " curl -fsSL https://agentfield.ai/install.sh | bash -s -- --staging" + echo " curl -fsSL https://agentfield.ai/install.sh | bash -s -- --all-skills" echo "" echo "Options:" - echo " --staging Install latest prerelease/staging version" - echo " --verbose Enable verbose output" - echo " --help Show this help message" + echo " --staging Install latest prerelease/staging version" + echo " --verbose Enable verbose output" + echo " --no-skill Skip the agentfield-multi-reasoner-builder" + echo " skill install step" + echo " --all-skills Install the skill into every detected" + echo " coding agent (no interactive prompt)" + echo " --all-skill-targets Install the skill into every registered" + echo " coding agent target, even if not detected" + echo " --help Show this help message" echo "" echo "Environment variables:" - echo " VERSION Specific version to install (e.g., v0.1.19)" - echo " STAGING=1 Same as --staging flag" - echo " VERBOSE=1 Same as --verbose flag" - echo " SKIP_PATH_CONFIG=1 Skip PATH configuration" + echo " VERSION Specific version to install (e.g., v0.1.19)" + echo " STAGING=1 Same as --staging flag" + echo " VERBOSE=1 Same as --verbose flag" + echo " SKIP_PATH_CONFIG=1 Skip PATH configuration" echo " AGENTFIELD_INSTALL_DIR Custom install directory" + echo " SKILL_MODE interactive (default) | all | all-targets | none" exit 0 ;; *) @@ -482,6 +509,44 @@ verify_installation() { fi } +# Install the agentfield-multi-reasoner-builder skill 
into coding-agent +# integrations (Claude Code, Codex, Gemini, OpenCode, Aider, Windsurf, Cursor). +# Delegated to the freshly-installed `af` binary so the install logic stays +# in one place. Honors $SKILL_MODE: interactive | all | all-targets | none. +install_skill() { + local install_dir="$1" + local af_bin="$install_dir/agentfield" + + if [[ ! -x "$af_bin" ]]; then + print_warning "af binary not executable, skipping skill install" + return 0 + fi + + case "$SKILL_MODE" in + none|skip) + printf "\n" + print_info "Skipping skill install (SKILL_MODE=none)" + printf " ${DIM}Run later: ${CYAN}af skill install${NC}\n" 2>/dev/null || \ + printf " Run later: af skill install\n" + return 0 + ;; + all) + printf "\n" + print_info "Installing skill into all detected coding agents..." + "$af_bin" skill install --all || print_warning "Skill install reported errors" + ;; + all-targets) + printf "\n" + print_info "Installing skill into all registered coding agents (even undetected)..." + "$af_bin" skill install --all-targets || print_warning "Skill install reported errors" + ;; + interactive|*) + printf "\n" + "$af_bin" skill install || print_warning "Skill install reported errors" + ;; + esac +} + # Print success message print_success_message() { printf "\n" @@ -622,6 +687,11 @@ main() { # Verify installation verify_installation "$INSTALL_DIR" + # Install the agentfield-multi-reasoner-builder skill into coding agents. + # Default mode is interactive — runs `af skill install` with the picker. + # Override via --no-skill / --all-skills / --all-skill-targets or SKILL_MODE. 
+ install_skill "$INSTALL_DIR" + # Print success message print_success_message } diff --git a/scripts/sync-embedded-skills.sh b/scripts/sync-embedded-skills.sh new file mode 100755 index 000000000..e5dedfe75 --- /dev/null +++ b/scripts/sync-embedded-skills.sh @@ -0,0 +1,80 @@ +#!/usr/bin/env bash +# sync-embedded-skills.sh — keep the Go embed copy of every shipped skill in +# sync with the canonical source-of-truth files in skills/. +# +# The af binary embeds skill content at build time via go:embed in +# control-plane/internal/skillkit/embed.go. The embed directive can only +# reach files inside the skillkit package, so we maintain a mirror at +# control-plane/internal/skillkit/skill_data/<skill-name>/ that is bytewise +# identical to skills/<skill-name>/. +# +# Run this script whenever you edit a skill in skills/ before committing, +# or before running `go build` if you've made local edits. The Makefile's +# build target should also call this. +# +# Usage: +# ./scripts/sync-embedded-skills.sh # sync all shipped skills +# ./scripts/sync-embedded-skills.sh --check # exit non-zero if out of sync (CI) + +set -euo pipefail + +REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" +SOURCE_DIR="${REPO_ROOT}/skills" +EMBED_DIR="${REPO_ROOT}/control-plane/internal/skillkit/skill_data" + +# Skills to mirror. Add new skills here when they're added to the catalog. +SKILLS=( + "agentfield-multi-reasoner-builder" +) + +CHECK_ONLY=0 +if [[ "${1:-}" == "--check" ]]; then + CHECK_ONLY=1 +fi + +if [[ ! -d "$SOURCE_DIR" ]]; then + echo "ERROR: source directory $SOURCE_DIR does not exist" >&2 + exit 1 +fi + +mkdir -p "$EMBED_DIR" + +drift_found=0 + +for skill in "${SKILLS[@]}"; do + src="${SOURCE_DIR}/${skill}" + dst="${EMBED_DIR}/${skill}" + + if [[ ! -d "$src" ]]; then + echo "ERROR: skill source $src does not exist" >&2 + exit 1 + fi + + if [[ "$CHECK_ONLY" == "1" ]]; then + if [[ ! -d "$dst" ]] || !
diff -rq "$src" "$dst" >/dev/null 2>&1; then + echo "DRIFT: $skill — embed copy out of sync with source" >&2 + drift_found=1 + fi + continue + fi + + # Sync: rsync if available, otherwise rm + cp -R + if command -v rsync >/dev/null 2>&1; then + rsync -a --delete "${src}/" "${dst}/" + else + rm -rf "$dst" + mkdir -p "$dst" + cp -R "${src}/." "${dst}/" + fi + + echo " ✓ synced $skill" +done + +if [[ "$CHECK_ONLY" == "1" ]]; then + if [[ "$drift_found" == "1" ]]; then + echo "" >&2 + echo "Run ./scripts/sync-embedded-skills.sh to fix the drift, then commit." >&2 + exit 1 + fi + echo "All embedded skills are in sync with sources." +fi From 7585b6f8684f82da91759f96c10a4fe0fd910d9f Mon Sep 17 00:00:00 2001 From: Santosh Date: Wed, 8 Apr 2026 14:15:02 +0530 Subject: [PATCH 4/4] feat(install): default to all-skills install + document in README MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two related changes: 1. Make `--all-skills` the default behaviour of scripts/install.sh. The previous default was "interactive" which ran `af skill install` with a TTY picker. That's broken for the canonical install path `curl -fsSL https://agentfield.ai/install.sh | bash` because there is no TTY for the picker to read from — it would hang or fall through without installing anything. The new default `SKILL_MODE=all` calls `af skill install --all` which installs the agentfield-multi-reasoner-builder skill into every coding agent the binary detects on the user's machine, with no prompts. New flag `--interactive-skill` is added for users running install.sh from a real terminal who do want the picker. The old `--all-skills` flag is kept as a backwards-compat alias so existing docs / READMEs / bookmarks keep working. Help text and SKILL_MODE doc comment updated to reflect the new default. 2. Update README Quick Start to surface the install + skill behaviour. 
The Quick Start now explicitly tells users that the one-line install drops the skill into every coding agent on their machine — Claude Code, Codex, Gemini, OpenCode, Aider, Windsurf, Cursor — without any prompts or second step. Adds opt-out instructions (`--no-skill` for binary-only) and points existing `af` users at `af skill install` / `af skill install --all` for installing the skill without re-running install.sh. Adds a one-line explanation of what the skill actually does so a first-time reader understands why they would want it. Co-Authored-By: Claude Opus 4.6 (1M context) --- README.md | 12 ++++++++++- scripts/install.sh | 53 ++++++++++++++++++++++++++++++++-------------- 2 files changed, 48 insertions(+), 17 deletions(-) diff --git a/README.md b/README.md index 9b8e0ca8d..ce4efadd1 100644 --- a/README.md +++ b/README.md @@ -75,11 +75,21 @@ app.run() ## Quick Start ```bash -curl -fsSL https://agentfield.ai/install.sh | bash # Install CLI +# Installs the af CLI AND drops the agentfield-multi-reasoner-builder skill +# into every coding agent on your machine (Claude Code, Codex, Gemini, +# OpenCode, Aider, Windsurf, Cursor) — no prompts, no second step. +curl -fsSL https://agentfield.ai/install.sh | bash + af init my-agent --defaults # Scaffold agent cd my-agent && pip install -r requirements.txt ``` +> **Just want the binary?** `curl -fsSL https://agentfield.ai/install.sh | bash -s -- --no-skill` +> +> **Already have `af` installed and just want the skill?** `af skill install` (interactive picker) or `af skill install --all` (every detected agent). See [`af skill --help`](#) for `list`, `update`, `uninstall`, version pinning, and per-target installs. + +The skill teaches any coding agent how to architect and ship a complete multi-reasoner backend on AgentField — composite-intelligence patterns, deep DAG composition, scaffold-to-curl in one workflow. Once installed, just open Claude Code / Codex / etc.
and ask **"build me a multi-reasoner agent that does X"** — the skill fires automatically. + ```bash af server # Terminal 1 → Dashboard at http://localhost:8080 python main.py # Terminal 2 → Agent auto-registers diff --git a/scripts/install.sh b/scripts/install.sh index ca315afa0..567cc2098 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -16,12 +16,22 @@ SKIP_PATH_CONFIG="${SKIP_PATH_CONFIG:-0}" # Can be set via --staging flag or STAGING=1 environment variable STAGING="${STAGING:-0}" -# Skill install mode (interactive | all | all-targets | none) -# Defaults to "interactive" — runs `af skill install` with the picker after -# the binary lands so first-time users get the skill into Claude Code / Codex -# / etc. without a second step. Override with --no-skill / --all-skills / -# --all-skill-targets, or via SKILL_MODE env var. -SKILL_MODE="${SKILL_MODE:-interactive}" +# Skill install mode (all | all-targets | interactive | none) +# +# Defaults to "all" — installs the agentfield-multi-reasoner-builder skill +# into every coding agent the binary detects on the user's machine, without +# any prompts. This is the right default for `curl … | bash` because there +# is no TTY for an interactive picker to read from, and the whole point of +# the one-line install is to just work. +# +# Override with: +# --no-skill → SKILL_MODE=none (skip the skill install) +# --interactive-skill → SKILL_MODE=interactive (run the picker) +# --all-skill-targets → SKILL_MODE=all-targets (install into every +# registered target, +# even ones we did not detect) +# SKILL_MODE= → env var override +SKILL_MODE="${SKILL_MODE:-all}" # Color codes RED='\033[0;31m' @@ -54,6 +64,8 @@ parse_args() { shift ;; --all-skills) + # Backwards-compat alias — "all" is now the default, but the flag + # stays so existing scripts and READMEs keep working. 
SKILL_MODE="all" shift ;; @@ -61,23 +73,31 @@ parse_args() { SKILL_MODE="all-targets" shift ;; + --interactive-skill) + SKILL_MODE="interactive" + shift + ;; --help|-h) echo "AgentField CLI Installer" echo "" echo "Usage:" - echo " curl -fsSL https://agentfield.ai/install.sh | bash" - echo " curl -fsSL https://agentfield.ai/install.sh | bash -s -- --staging" - echo " curl -fsSL https://agentfield.ai/install.sh | bash -s -- --all-skills" + echo " curl -fsSL https://agentfield.ai/install.sh | bash # binary + skill into all detected agents (no prompts)" + echo " curl -fsSL https://agentfield.ai/install.sh | bash -s -- --no-skill # binary only, skip the skill install" + echo " curl -fsSL https://agentfield.ai/install.sh | bash -s -- --staging # latest prerelease" echo "" echo "Options:" echo " --staging Install latest prerelease/staging version" echo " --verbose Enable verbose output" echo " --no-skill Skip the agentfield-multi-reasoner-builder" - echo " skill install step" - echo " --all-skills Install the skill into every detected" - echo " coding agent (no interactive prompt)" + echo " skill install step (binary only)" + echo " --all-skills Install the skill into every detected coding" + echo " agent (default behaviour — flag kept for" + echo " backwards compatibility with older docs)" echo " --all-skill-targets Install the skill into every registered" echo " coding agent target, even if not detected" + echo " --interactive-skill Run the interactive skill picker (only useful" + echo " when you run install.sh from a real terminal," + echo " not from 'curl … | bash')" echo " --help Show this help message" echo "" echo "Environment variables:" @@ -86,7 +106,7 @@ parse_args() { echo " VERBOSE=1 Same as --verbose flag" echo " SKIP_PATH_CONFIG=1 Skip PATH configuration" echo " AGENTFIELD_INSTALL_DIR Custom install directory" - echo " SKILL_MODE interactive (default) | all | all-targets | none" + echo " SKILL_MODE all (default) | all-targets | interactive | none" exit 0 ;; 
*) @@ -512,7 +532,7 @@ verify_installation() { # Install the agentfield-multi-reasoner-builder skill into coding-agent # integrations (Claude Code, Codex, Gemini, OpenCode, Aider, Windsurf, Cursor). # Delegated to the freshly-installed `af` binary so the install logic stays -# in one place. Honors $SKILL_MODE: interactive | all | all-targets | none. +# in one place. Honors $SKILL_MODE: all (default) | all-targets | interactive | none. install_skill() { local install_dir="$1" local af_bin="$install_dir/agentfield" @@ -688,8 +708,9 @@ main() { verify_installation "$INSTALL_DIR" # Install the agentfield-multi-reasoner-builder skill into coding agents. - # Default mode is interactive — runs `af skill install` with the picker. - # Override via --no-skill / --all-skills / --all-skill-targets or SKILL_MODE. + # Default mode is `all` — installs into every detected coding agent without + # any prompts (the right behaviour for `curl … | bash`). Override via + # --no-skill / --all-skill-targets / --interactive-skill or SKILL_MODE. install_skill "$INSTALL_DIR" # Print success message