Document callback URL and compose storage #8
Merged
santoshkumarradha merged 1 commit into main on Nov 11, 2025
Adds environment variables to the Docker Compose file to configure the control plane to use PostgreSQL for storage. Updates the README with instructions on setting AGENT_CALLBACK_URL for agents running outside Docker, because the containerized control plane cannot reach them via localhost.
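A minimal Python sketch of why the callback URL matters, assuming the agent reads AGENT_CALLBACK_URL from its environment. The helper names `resolve_callback_url` and `is_loopback_url` and the localhost fallback are hypothetical and only illustrate the check; they are not part of the SDK.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True when the URL's host is localhost or a loopback IP literal."""
    host = urlparse(url).hostname or ""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name), so not loopback by this check.
        return False


def resolve_callback_url(env: dict, port: int) -> str:
    """Prefer AGENT_CALLBACK_URL; otherwise fall back to localhost (hypothetical default)."""
    return env.get("AGENT_CALLBACK_URL") or f"http://localhost:{port}"


if __name__ == "__main__":
    import os

    url = resolve_callback_url(dict(os.environ), port=8001)
    if is_loopback_url(url):
        # Inside a container, a loopback address refers to the container itself.
        print(f"warning: {url} is a loopback address; set AGENT_CALLBACK_URL")
```

A loopback address advertised to a containerized control plane resolves to the container itself, so an agent on the host would typically advertise an address the container can route to instead.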
santoshkumarradha added a commit that referenced this pull request on Apr 8, 2026
Surfaced by the first end-to-end docker test of a codex-built medical-triage
backend. Fixes 5 real bugs that hid behind py_compile + docker compose config
validation, plus pushes the architecture philosophy from "flat orchestrator
fans out specialists" to "deep DAG of reasoners as software APIs".
## Bugs fixed
1. **Broken healthcheck — agentfield/control-plane:latest is distroless.**
The image has no /bin/sh, no wget, no curl. The CMD-based healthcheck
["wget", "--quiet", ...] always failed, blocking every first build with
"dependency failed to start: container is unhealthy". Drop the healthcheck
entirely + switch depends_on to condition: service_started. The agent SDK
already retries connection on startup.
File: control-plane/internal/templates/docker/docker-compose.yml.tmpl
2. **Dead default model — openrouter/anthropic/claude-3.5-sonnet returns 404
from OpenRouter** (litellm.NotFoundError: No endpoints found for
anthropic/claude-3.5-sonnet). Every previously generated example would
crash on first real curl. Replace with openrouter/google/gemini-2.5-flash
(verified working in the live test) across:
- SKILL.md, all 6 reference files
- control-plane/internal/cli/doctor.go (Recommendation block)
- control-plane/internal/cli/init.go (--default-model default)
- control-plane/internal/templates/templates.go (TemplateData doc comment)
- control-plane/internal/templates/python/main.py.tmpl (env default)
3. **90s sync execute timeout undocumented.** The control plane has a hard
90-second timeout on POST /api/v1/execute/<target>. Slow models (minimax-
m2.7, Claude Sonnet, o1) and large fan-outs blow it. Generated systems
would hit HTTP 400 {"error":"execution timeout after 1m30s"} with no
guidance. Document the limit + the async fallback path
(POST /api/v1/execute/async) in verification.md, plus point at
gemini-2.5-flash as the recommended fast default.
4. **Discovery API curl shape was wrong everywhere.** The skill teaches
`.reasoners[] | select(.node_id=="X") | .name` but the actual response
is `.capabilities[].reasoners[]` with `agent_id` (not `node_id`) and
`id` (not `name`). Same for /api/v1/nodes — its default ?health_status=
active filter hides healthy nodes that haven't reported "active" yet,
so use ?health_status=any. Fix in SKILL.md and verification.md.
5. **Python init template violated the skill's own hard rules.** The
scaffold from `af init` was using app.serve(auto_port=True) and
hardcoding agentfield_server, which the skill explicitly rejects. Codex
had to fully rewrite main.py on every build. Update the template to use
app.run(auto_port=False), env-driven AGENT_NODE_ID/AGENTFIELD_SERVER/
AI_MODEL/PORT, and a real AIConfig. The scaffold is now consistent with
the skill's mandatory patterns out of the box.
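Bugs 3 and 4 can be illustrated with a small Python sketch. It assumes the discovery response is a JSON object shaped like `capabilities[].reasoners[]`, and because the exact placement of `agent_id` is not spelled out here, the lookup checks the reasoner first and then the enclosing capability entry. The function names are hypothetical helpers, not control-plane or SDK APIs.

```python
from typing import Any


def reasoner_ids_for_agent(payload: dict[str, Any], agent_id: str) -> list[str]:
    """Collect reasoner `id` values for one agent from a discovery payload.

    Assumption: `agent_id` may sit on the reasoner or on its capability entry.
    """
    ids: list[str] = []
    for capability in payload.get("capabilities", []):
        for reasoner in capability.get("reasoners", []):
            owner = reasoner.get("agent_id", capability.get("agent_id"))
            if owner == agent_id:
                ids.append(reasoner["id"])
    return ids


def is_sync_timeout(status_code: int, body: dict[str, Any]) -> bool:
    """Recognize the sync-execute timeout: HTTP 400 with an 'execution timeout' error."""
    return status_code == 400 and "execution timeout" in str(body.get("error", ""))
```

A caller that sees `is_sync_timeout(...)` return True could then retry through the async path documented in verification.md rather than treating the 400 as a malformed request.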
## New philosophy: reasoners as software APIs
Codex's first build (and the loan-underwriter before it) produced a "fat
orchestrator + flat specialists" star pattern: depth-2 DAG, single-layer
parallelism, every specialist has a 50-line .ai() prompt, no reuse across
branches. That's basically asyncio.gather([llm_call_1, llm_call_2, ...])
with extra ceremony.
The right shape is **deep composition cascade**: each reasoner has a
single cognitive responsibility, the orchestrator pushes calls DOWN into
sub-reasoners, parallelism happens at multiple depths, common sub-reasoners
get reused across branches. Each reasoner has a one-line API contract you
could write down — they are software APIs.
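The cascade shape can be sketched with plain asyncio coroutines standing in for reasoners. This is an illustration under stated assumptions: every function name below is hypothetical, each coroutine replaces what would be a small model call, and nothing here uses the AgentField SDK.

```python
import asyncio


async def extract_facts(text: str) -> list[str]:
    """Leaf reasoner (stand-in for one small model call): text -> list of facts."""
    return [part.strip() for part in text.split(".") if part.strip()]


async def summarize_risk(text: str) -> dict:
    """Mid-level reasoner: reuses extract_facts, then makes a single judgment."""
    facts = await extract_facts(text)
    return {"fact_count": len(facts)}


async def detect_urgency(text: str) -> dict:
    """Mid-level reasoner: reuses the same leaf from a different branch."""
    facts = await extract_facts(text)
    return {"urgent": any("now" in fact.lower() for fact in facts)}


async def review(history: str, complaint: str) -> dict:
    """Coordinator one level down: runs its two sub-reasoners in parallel."""
    risk, urgency = await asyncio.gather(
        summarize_risk(history), detect_urgency(complaint)
    )
    return {"risk": risk, "urgency": urgency}


async def orchestrate(history: str, complaint: str) -> dict:
    """Top-level reasoner: pushes work down and also fans out at its own level."""
    reviewed, complaint_facts = await asyncio.gather(
        review(history, complaint), extract_facts(complaint)
    )
    return {"review": reviewed, "complaint_facts": complaint_facts}


if __name__ == "__main__":
    print(asyncio.run(orchestrate("First item. Second item.", "Needs help now.")))
```

Here `extract_facts` is the reused sub-reasoner, `asyncio.gather` appears at two depths, and the longest call chain (orchestrate, review, summarize_risk, extract_facts) is four levels deep, in contrast to a single flat gather.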
Added to the skill:
- New mandatory section "The unit of intelligence is the reasoner — treat
them as software APIs" in SKILL.md, with bad/good shape ASCII diagrams,
concrete decomposition rules (30-line ceiling, single-judgment rule,
reuse-signal extraction), and depth ≥ 3 minimum
- New "Reasoner Composition Cascade" pattern (#8) in architecture-patterns.md
marked as the master pattern that every other pattern layers onto
- Updated "How to pick a pattern" picker to start from cascade as the
backbone instead of treating it as one option among many
- HARD GATE updated: "If you cannot draw your system as a non-trivial
graph with depth ≥ 3, you have not architected anything"
- Grooming rule conflict resolved: the skip-the-question rule now lives
inside the HARD GATE block so agents see them together, not as
competing instructions in separate sections
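The depth ≥ 3 gate can be checked mechanically once the call graph is written down. The sketch below assumes depth means the number of reasoners on the longest root-to-leaf path, so the star pattern above counts as depth 2. The graph encoding and the names `dag_depth` and `passes_hard_gate` are hypothetical, not part of the skill.

```python
def dag_depth(graph: dict[str, list[str]], root: str) -> int:
    """Number of nodes on the longest path from root to a leaf (lone node = 1)."""
    memo: dict[str, int] = {}

    def depth(node: str, stack: frozenset) -> int:
        if node in stack:
            raise ValueError(f"cycle detected at {node!r}; not a DAG")
        if node in memo:
            return memo[node]
        children = graph.get(node, [])
        result = 1 + max((depth(c, stack | {node}) for c in children), default=0)
        memo[node] = result
        return result

    return depth(root, frozenset())


def passes_hard_gate(graph: dict[str, list[str]], root: str, minimum: int = 3) -> bool:
    """True when the call graph is at least `minimum` levels deep."""
    return dag_depth(graph, root) >= minimum


# Flat star: one orchestrator fanning out to leaf specialists.
star = {"orchestrator": ["a", "b", "c"]}

# Cascade: work pushed down, with a shared leaf reused by two branches.
cascade = {
    "orchestrator": ["review"],
    "review": ["risk", "urgency"],
    "risk": ["facts"],
    "urgency": ["facts"],
}
```

With these definitions the star graph fails the gate and the cascade passes it; memoization keeps the shared `facts` node from being walked once per branch.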
## Tested end-to-end
Live test of the v1 medical-triage build:
- docker compose up --build → both containers up
- 9 reasoners discovered through /api/v1/discovery/capabilities
- Real curl with the Maria Hernandez patient case →
CALL_911_NOW with full provenance, 17 second wall clock,
HTTP 200, 16KB structured response
- The adversarial reviewer correctly steel-manned Pulmonary Embolism
(because the chest pain is pleuritic) on top of the AMI primary concern
- Deterministic governance overrides fired correctly when committee
confidence dipped — the safe-default fallback pattern works in production
The build only succeeded after the manual healthcheck patch + the model
swap to gemini-2.5-flash. Both fixes are now baked into the templates so
the next codex run will produce a working build on first try.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>