feat(ai): AI-native development scaffolding + e2e testnet verification by boboliu-1010 · Pull Request #69 · BofAI/x402

boboliu-1010 · 2026-04-24T08:24:55Z

Summary

Adopts a Claude-Code-native development layout and verifies the three payment schemes on real testnets.

AI scaffolding — root + 7 component CLAUDE.md, full .claude/rules/ knowledge base (common / schemes×3 / networks×2 / python / typescript / testing + index), 3 specialized subagents (code-reviewer, security-reviewer, scheme-author), 2 slash commands (/x402:compound, /x402:create-scenario), and a PostToolUse test-dispatch hook that routes Edit|Write|MultiEdit|NotebookEdit to the right suite by path.

Spec layout — collapses redundant docs/specs/ into specs/ to mirror upstream x402-foundation/x402/specs/. 46 cross-references updated. specs/ now holds both protocol-level (top-level + schemes/) and in-flight feature specs (001-exact-v2-compat/). docs/ is now solutions + design only.

E2E testnet tier — 3 real-chain scenarios (exact_testnet, exact_permit_testnet, exact_gasfree_testnet) now runnable with a populated .env. Two scenario bugs fixed (${VAR:-default} syntax, tx-hash regex) + examples/facilitator/server.py gains FACILITATOR_BASE_FEE env so permit/gasfree mechanisms actually register tokens.

Knowledge base — 4 new docs/solutions.md entries capturing incidents discovered during real-chain validation (GasFree address-mapping trap, TRON multisig permission_id requirement, missing base_fee config, integration runner ${VAR:-default} limitation).

Verified on real chain (2026-04-24)

Scenario	Network	Tx hash
`exact_testnet`	BSC Testnet	`461d17baae60b83031de55bb1635bd5f8453d359887e7e82ebee605ab6c159a0`
`exact_permit_testnet`	BSC Testnet	`0xfc8b32decb99d02cdfc684d3f6f1c7c0a91c8b0ff1f632f98d86d9f334198a23` (block 103475005)
`exact_gasfree_testnet`	TRON Nile	`1d77f242b72293116e65c46b5ad756dd2f8355ebc625078aec0eb4ea54d148d2`

Regression

pytest:  217 passed
vitest:   51 passed
e2e mock: 8 scenarios green (run_all.sh, ~35s, no secrets)
e2e testnet: 3 scenarios green (run_testnet.sh with .env, ~45s real chain)

Scope boundary

This PR is docs + .claude/ scaffolding + reference facilitator enhancements + e2e tier. Does NOT port the MCP package or the matrix harness (P2+ in docs/design/ai-transformation.md — separate initiatives).

Migration notes for reviewers

Spec paths changed: any downstream doc or code referencing docs/specs/<X>.md must update to specs/<X>.md (protocol) or specs/schemes/<X>.md (per-scheme). Run git grep "docs/specs" on your branch before rebasing onto this.
New env var FACILITATOR_BASE_FEE required when running exact_permit or exact_gasfree via the reference facilitator. JSON dict of {symbol: smallest-unit fee}. See examples/facilitator/server.py docstring.
New hook at .claude/hooks/run-tests-for-change.sh runs automatically when Claude edits .py / .ts / .tsx / e2e/ / integration/ files. Docs/config edits skip. Failures surface via asyncRewake.

Known follow-ups (not in this PR)

.github/workflows/check_e2e.yml — confirm it's wired as a PR required-check so run_all.sh blocks merges on regression.
Port 8013 collision with x402-demo / other local SDK sessions — document or move scenarios to 8113.
exact_permit_testnet is verified on BSC USDT, but FACILITATOR_BASE_FEE tuning will vary per deployment.
Pre-existing Dependabot alerts (4 moderate on default branch) — orthogonal.

Commits (13, conventional)

```
84d26e6 docs(gasfree): record testnet win + two new gotchas
ea04ab2 feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
f4c0c34 refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
3130e21 docs(e2e): record real-chain evidence + testnet gotchas
8045ba1 fix(e2e): make testnet scenarios runnable
45be4ca chore(ai): gitignore .DS_Store and .claude/hooks/logs/
e0136ec feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
1983edf feat(e2e): add testnet tier with real chain settlement
3509b70 feat(e2e): expand harness to 8 scenarios across EVM + TRON
900e5bd feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
aec6836 docs(design): draft AI transformation plan
```

Test plan

Fresh clone + `uv run pytest` in `python/x402` — expect 217 passed
`pnpm test --run` in `typescript/packages/x402` — expect 51 passed
`./e2e/scenarios/run_all.sh` — expect "ALL GREEN (8 scenarios)"
Inspect `specs/` layout vs upstream `x402-foundation/x402/specs/` for mirror correctness
Verify `.claude/hooks/run-tests-for-change.sh` is executable and `.claude/settings.json` loads without error
(optional) Fill `.env` from `.env.example` and `./e2e/scenarios/run_testnet.sh` for real-chain sanity

🤖 Generated with Claude Code

Consolidates the P1-P4 roadmap for turning x402 into a Claude-Code- native, agent-native payment platform. References Coinbase x402, sun-protocol/apollo-arena, and BofAI/skills as prior art; enumerates what already exists on main, what already exists in sibling repos, and what actually needs to be built (.claude/ scaffolding, MCP transport, declarative e2e scenarios).

…runner Land the first phase of the AI transformation (see docs/design/ai-transformation.md). What this adds: - CLAUDE.md (root): component index + key reading order + conventions + AI-native layout pointers + safety rules - .claude/rules/common/conventions.md: repo-wide invariants (addressing, encoding, payment ID, pipeline step ordering, commit message format) - .claude/rules/schemes/{exact,exact-permit,exact-gasfree}.md: per-scheme rules of thumb with gotchas cross-referenced to solutions.md - .claude/rules/networks/{tron,evm}.md: network identifiers, signing rules, contract registry pointers - .claude/commands/x402/{compound,create-scenario}.md: two wizards (compound implements the /x402:compound referenced by solutions.md; create-scenario scaffolds e2e scenarios) - .claude/agents/{code-reviewer,security-reviewer,scheme-author}.md: x402-flavored subagent definitions - integration/: generic Python step runner (no external deps) with @json_diff built-in, JSON config format, Markdown report output. Clean-room implementation; pattern inspired by sun-protocol/apollo-arena. - integration/configs/smoke.json: self-test verified passing - e2e/scenarios/exact_permit_happy_path/: canonical scenario template; runnable once facilitator --mock mode lands in P2 Smoke test: python3 integration/run.py --config integration/configs/smoke.json → 2/2 ✓, exit 0 Non-goals for this phase (deferred to P2+): - MCP transport packages (mcp-ts, mcp-py) - Matrix e2e harness - mock-chainrpc and mock-facilitator - Per-package CLAUDE.md files

Adds a full e2e test harness — mock facilitator, fixture resource server, test client, and 8 declarative scenarios covering happy + unhappy paths for all three schemes. Scenarios (all green, ~30s total, fully deterministic, no chain RPC): - exact_happy_path ERC-3009 on BSC testnet - exact_permit_happy_path EIP-2612 on BSC testnet (allowance skipped) - exact_gasfree_happy_path TIP-712 on TRON Nile (GasFree API mocked) - unhappy_verify_invalid_sig / _expired / _network_mismatch - unhappy_settle_insufficient / _revert Key design choices: - Live signing, mock verify — client signs real payloads; the mock facilitator echoes success/failure based on its mode. - One mock process serves both facilitator and GasFree API roles on port 4020 — no second service needed. - Test-only signer subclasses (_AllowanceSkippingEvmSigner, _OfflineTronSigner) bypass on-chain balance/allowance checks without touching SDK code. - Resource server monkey-patches get_verifier_for_network when E2E_SKIP_TX_VERIFICATION=1 so the TRON scenario doesn't try to look up the fake tx hash on a live node. Also: - integration/commands.py: add @start_bg / @wait_http / @stop_bg / @stop_all_bg builtins so scenarios can manage background services. - .github/workflows/check_e2e.yml: runs run_all.sh on every PR. - docs/design/ai-transformation.md: refresh §6.2 scenario table with status column; mark P1 immediate-steps items done. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Extends the e2e harness with a second tier that settles real on-chain transactions on BSC testnet and TRON Nile, alongside the existing mock suite. Scenarios self-skip when their env vars are blank, so the mock path remains the default (and the CI default on PR). New pieces: examples/facilitator/ FastAPI server wrapping X402Facilitator. Registers EVM/TRON mechanisms based on which *_PRIVATE_KEY env vars are set. Run with: python -m examples.facilitator .env.example Template listing every testnet variable (RPC URLs, signing keys, pay-to addresses, GasFree API URL, price specs). .env is already gitignored. e2e/scenarios/testnet/ exact_testnet, exact_permit_testnet, exact_gasfree_testnet — each uses the example facilitator + real chain, with @json_match assertions (tx hash is dynamic so @json_diff won't work). e2e/scenarios/run_testnet.sh Driver that checks required env per scenario and skips when blank. .github/workflows/nightly_testnet.yml Scheduled + workflow_dispatch runner consuming GitHub Secrets. Harness additions: integration/commands.py @json_match New fuzzy assertion builtin: @json_match <file> key.path=literal key.path~regex Supports JSON literals (true/false/null/numbers) and regex for dynamic fields like tx hashes. integration/commands.py @start_bg Now expands ${VAR} before quoting so scenario configs can template from env (needed to thread secrets through nested `env KEY=${SECRET} python -m ...` invocations). Mock suite (8 scenarios) remains the default and all stay green. The testnet suite is a superset — same test_client.py and resource_server.py drive both. Developers fill in .env to run locally; CI uses Secrets. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

… + test hook - Component CLAUDE.md in python/x402, typescript/, typescript/packages/x402, e2e/, examples/, docs/, specs/ — local build/test commands and conventions. - .claude/rules/CLAUDE.md index + python/testing/typescript rule files, so Claude loads the right rule layer per touched path. - .claude/settings.json + hooks/run-tests-for-change.sh: PostToolUse hook dispatches to pytest / vitest / e2e run_all.sh by path prefix; docs/config edits skip. Dropped --timeout=60 (pytest-timeout not a dep; outer 300s hook timeout already covers). - Root CLAUDE.md: refresh the rules table and add component-CLAUDE.md pointer row. Scope: docs + .claude/ only, per the AI-transformation scope. MCP package and tron-contribution/ stay out of this commit. Verified: 217 pytest + 51 vitest + 8 e2e scenarios all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Prevent macOS artefacts and per-session hook run logs from showing up in status / PRs. Logs are regenerated by run-tests-for-change.sh each time Claude edits code; they have no cross-session value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two bugs surfaced when actually running ./e2e/scenarios/run_testnet.sh with populated .env: 1. ${VAR:-default} shell-style default expansion isn't handled by the integration runner — commands.py:151 uses os.path.expandvars(), which only recognizes $VAR / ${VAR} and leaves ${VAR:-x} untouched. The literal string then gets shlex-split on the space, breaking env parsing. Drop the :-default fallback; run_testnet.sh already checks required vars before invoking each scenario. 2. Transaction-hash assertion regex required a 0x prefix, but the SDK returns bare 64-hex hashes on BSC. Loosen to ^(0x)?[0-9a-fA-F]{64}$, matching the gasfree scenario's existing pattern. Verified: exact_testnet now settles a real BSC Testnet tx and passes end to end (7 steps, ~11s). exact_permit_testnet progresses past config to reveal a separate ecosystem issue (BSC testnet USDT doesn't implement EIP-2612 permit — tracked as a follow-up). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- solutions.md #7: BSC testnet has no EIP-2612 permit-capable token. Captures why exact_permit_testnet returns 404 and lists the options (Sepolia / custom test token / rely on unit suites), so the next person doesn't rediscover it. - solutions.md #8: integration runner doesn't expand ${VAR:-default}. Captures the os.path.expandvars limitation that caused the testnet scenarios to silently break on a literal fallback string. - testnet/README.md: add "Current status" table with today's verified BSC tx hash as evidence, and a "Gotchas" section linking back to solutions #7 / #8, the tx-hash regex, and the log locations. No code change — just freezing today's testnet run state so future attempts start from known ground. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…pecs/ Upstream x402-foundation/x402 keeps all protocol specs under a single top-level specs/ dir. We had a redundant docs/specs/ shadowing what belongs at specs/, which made "which one is authoritative" a coin flip and drifted us further from upstream every time we edited. Moves: - docs/specs/{protocol,roles,config}.md → specs/*.md - docs/specs/{exact,exact-permit,exact-gasfree}.md → specs/schemes/*.md specs/ now hosts both the authoritative protocol specs (top-level .md + schemes/) and the in-flight feature dirs (001-exact-v2-compat/, ...), matching upstream's schemes/ + transports-v*/ shape. Updated 46 cross-references across CLAUDE.md tree, .claude/rules/, .claude/agents/, docs/, and typescript/packages/x402/. docs/ is now just user-facing: solutions.md (hard-won knowledge) and design/ (in-flight design docs). Not authoritative; defers to specs/. Verified: 217 pytest + 51 vitest + 8 e2e scenarios still green after the rename — nothing in the code referenced the old paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…anisms Permit-family facilitator mechanisms (EVM exact_permit, TRON exact_permit, TRON exact_gasfree) all inherit BaseExactPermitFacilitatorMechanism which requires a base_fee: dict[str, int] allow-list at construction time. examples/facilitator/server.py was constructing them without base_fee, so every token in every registry was reported Unsupported and every testnet run of the permit/gasfree tiers silently returned HTTP 404. - server.py now reads FACILITATOR_BASE_FEE as a JSON dict and passes it to all three permit/gasfree mechanism constructors; logs a clear warning when a permit/gasfree scheme is registered without it. - testnet scenario configs now forward FACILITATOR_BASE_FEE into the facilitator subprocess. - .env.example / README updated. - docs/solutions.md #7 corrected — the earlier entry blaming BSC testnet USDT for not implementing EIP-2612 was a misdiagnosis; USDT does implement permit. The real cause was the missing base_fee config. Verified live 2026-04-24: - exact_permit_testnet: BSC tx 0xfc8b32decb99d02cdfc684d3f6f1c7c0a91c8b0ff1f632f98d86d9f334198a23 (USDT permit + transferFrom, block 103475005) - exact_gasfree_testnet: fee_quote / verify / settle all traverse; final settle reports insufficient balance in the derived GasFree custodial wallet when it isn't funded — structural path end-to-end works. Regression: 217 pytest + 51 vitest + 8 mock e2e all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

All three testnet scenarios now verified end-to-end on real chain: - exact_testnet — BSC tx 461d17b...159a0 - exact_permit_testnet — BSC tx 0xfc8b32d...98a23 - exact_gasfree_testnet — TRON Nile tx 1d77f242b72293116e65c46b5ad756dd2f8355ebc625078aec0eb4ea54d148d2 Two new solutions entries from closing the gasfree loop: #9 GasFree gasFreeAddress is per-query, not absolute. The API returns a fresh mapping for every queried address; our SDK queries with the main wallet and writes that layer's gasFreeAddress into the payload. If you deposit to a gasFreeAddress obtained by querying any *other* address (e.g. one resolved from a second hop), the balance won't register. Always deposit to api_client.get_address_info(main_wallet)["gasFreeAddress"]. #10 TRON multisig / witness accounts reject single-key signatures under the default permission_id=0 (owner). If owner_permission.threshold > 1, sign under an active permission via .permission_id(2) — confirmed on the Nile witness account used in our testnet .env. tronpy default path would otherwise return "Validate signature error: sig error" with no hint at the permission cause. Testnet README status table updated: all three rows ✅ verified. Regression: 217 pytest + 51 vitest + 8 mock e2e still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-24T08:31:05Z

Code Review Report

Project: x402 (BankofAI SDK)
PR: main → feat/x402-ai
Review Date: 2026-04-24
Reviewer: AI Code Reviewer (Code Review Skill v1.0.0)

PR Overview

Branch Information

Property	Value
From Branch	`main`
To Branch	`feat/x402-ai`
Commits	11
Files Changed	79
Lines Added	+4579
Lines Removed	-0 (pure addition)

Commit History

Hash	Message
`84d26e6`	docs(gasfree): record testnet win + two new gotchas
`ea04ab2`	feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
`f4c0c34`	refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
`3130e21`	docs(e2e): record real-chain evidence + testnet gotchas
`8045ba1`	fix(e2e): make testnet scenarios runnable
`45be4ca`	chore(ai): gitignore .DS_Store and .claude/hooks/logs/
`e0136ec`	feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
`1983edf`	feat(e2e): add testnet tier with real chain settlement
`3509b70`	feat(e2e): expand harness to 8 scenarios across EVM + TRON
`900e5bd`	feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
`aec6836`	docs(design): draft AI transformation plan

Review Summary

Verdict

Verdict: Request Changes

One Critical defect (hardcoded developer path) must be fixed before merge. Three Major issues (file-descriptor leak, module-level side-effect monkey-patch, unhandled startup crash) should be addressed. Five Minor issues and two Suggestions are noted below.

Findings at a Glance

	Critical	Major	Minor	Suggestion
Count	1	3	5	2

Summary

This PR introduces the full P1 AI-native scaffolding for the x402 repository: a .claude/ rules and agent knowledge base, a component-level CLAUDE.md tree, a generic declarative integration step-runner (integration/), and an eight-scenario e2e harness (e2e/) covering EVM and TRON happy paths plus five failure modes. It also reorganises docs/specs/ → specs/ and adds a testnet tier wired into a nightly GitHub Actions workflow.

The overall architecture is well-considered. The JSON-driven scenario runner, the dual mock/testnet tier split, the GasFree API mock colocation, and the asyncRewake hook pattern all reflect careful design. Secret hygiene is solid: .env.example carries no real values, .env is git-ignored, and CI consumes GitHub Secrets.

However, one defect is blocking: the PostToolUse hook contains a hardcoded absolute developer path (/Users/bobo/code/x402/x402), which silently renders the entire hook infrastructure non-functional on every other machine. Three additional Major issues — a file-descriptor leak in @start_bg, an import-time monkey-patch in the resource server, and an unhandled json.loads crash path in the example facilitator — should be resolved before the testnet tier goes to production.

Change Summary

1. AI-Native Scaffolding (`.claude/`)

File	Change Type	Description
`CLAUDE.md`	Added	Root-level repo guide for Claude Code
`.claude/rules/CLAUDE.md`	Added	Rules knowledge-base index and authoring guide
`.claude/rules/common/conventions.md`	Added	Repo-wide conventions (addressing, amounts, headers, pipeline)
`.claude/rules/networks/evm.md`	Added	EVM network conventions (signing, contracts, smoke tests)
`.claude/rules/networks/tron.md`	Added	TRON network conventions (TIP-712, GasFree, RPC defaults)
`.claude/rules/schemes/exact.md`	Added	`exact` scheme invariants and gotchas
`.claude/rules/schemes/exact-permit.md`	Added	`exact_permit` scheme invariants
`.claude/rules/schemes/exact-gasfree.md`	Added	`exact_gasfree` scheme invariants (TRON-only)
`.claude/rules/python/conventions.md`	Added	Python 3.11+ tooling and idiom rules
`.claude/rules/typescript/conventions.md`	Added	Node 20+/pnpm/ESM/viem conventions
`.claude/rules/testing/scenarios.md`	Added	e2e scenario authoring conventions
`.claude/agents/code-reviewer.md`	Added	x402-flavored code-review agent definition
`.claude/agents/security-reviewer.md`	Added	Security-focused review agent definition
`.claude/agents/scheme-author.md`	Added	Scheme-authoring wizard agent definition
`.claude/commands/x402/compound.md`	Added	`/x402:compound` slash-command wizard
`.claude/commands/x402/create-scenario.md`	Added	`/x402:create-scenario` wizard
`.claude/hooks/run-tests-for-change.sh`	Added	PostToolUse bash hook dispatching pytest/vitest/e2e
`.claude/settings.json`	Added	Hook wiring with `asyncRewake`

Purpose: Establishes Claude Code as a first-class participant in the development workflow by providing scoped rule sheets, agent personas, slash-command wizards, and an automated test hook.

2. Integration Step Runner (`integration/`)

File	Change Type	Description
`integration/run.py`	Added	Generic JSON-driven sequential step executor with Markdown reporter
`integration/commands.py`	Added	`@builtin` registry: `@json_diff`, `@json_match`, `@start_bg`, `@stop_bg`, `@stop_all_bg`, `@wait_http`
`integration/README.md`	Added	CLI reference
`integration/CLAUDE.md`	Added	Component guide
`integration/configs/smoke.json`	Added	Self-test config

Purpose: Provides the infrastructure that e2e scenario configs (config.json) are executed against — a generic, dependency-free Python 3.9+ runner.

3. E2E Test Harness (`e2e/`)

File	Change Type	Description
`e2e/mock_facilitator/app.py`	Added	FastAPI mock serving facilitator + GasFree API with mode-driven behavior
`e2e/mock_facilitator/__main__.py`	Added	CLI entry point for mock facilitator
`e2e/clients/test_client.py`	Added	Real-signature EVM + TRON e2e client
`e2e/servers/resource_server.py`	Added	FastAPI resource server using `@x402_protected`
`e2e/scenarios/*/config.json`	Added	8 mock scenarios + 3 testnet scenarios
`e2e/scenarios//expected_.json`	Added	Diff-asserted expected responses
`e2e/scenarios/run_all.sh`	Added	Mock suite CI entry point
`e2e/scenarios/run_testnet.sh`	Added	Testnet suite with credential-gating
`e2e/README.md`, `e2e/CLAUDE.md`	Added	Documentation

Purpose: End-to-end validation across EVM (BSC) and TRON (Nile) for all three schemes and five failure modes, runnable offline (mock tier) and on real chains (testnet tier).

4. Example Facilitator (`examples/facilitator/`)

File	Change Type	Description
`examples/facilitator/server.py`	Added	Reference FastAPI facilitator wiring EVM + TRON mechanisms
`examples/facilitator/__main__.py`	Added	CLI entry point
`examples/CLAUDE.md`	Added	Component guide

Purpose: Provides a runnable reference facilitator used by the testnet e2e tier and as a canonical implementation example.

5. Docs, Specs Reorganisation, CI

File	Change Type	Description
`specs/` (renamed from `docs/specs/`)	Renamed	Mirrors upstream `x402-foundation/x402/specs/` layout
`docs/design/ai-transformation.md`	Added	AI-native development plan
`docs/solutions.md`	Added	Incident-driven bug-avoidance checklist
`.env.example`	Added	Template for testnet credentials
`.github/workflows/check_e2e.yml`	Added	PR CI: mock e2e suite
`.github/workflows/nightly_testnet.yml`	Added	Nightly: real-chain testnet scenarios
`.gitignore`	Modified	Adds `.DS_Store`, `.claude/hooks/logs/`

Detailed Findings

Critical

[C-01] Hardcoded Developer Absolute Path in PostToolUse Hook

Property	Value
Severity	Critical
Category	Correctness
File	`.claude/hooks/run-tests-for-change.sh` : Line 20

Description

The hook hard-codes an absolute path to the original developer's local checkout:

REPO_ROOT="/Users/bobo/code/x402/x402"

All subsequent path logic depends on this variable. On any other machine, FILE_PATH#$REPO_ROOT/ will not strip the repo prefix, leaving REL equal to the full absolute path. Every case match will fail, and the hook will silently exit 0 (the "*" catch-all logs "skip (no rule for …)" and exits 0). The asyncRewake mechanism will therefore never fire for anyone other than the original author. The entire test automation promised by the hook is non-functional on CI and all other developer machines.

Code

REPO_ROOT="/Users/bobo/code/x402/x402"
LOG_DIR="$REPO_ROOT/.claude/hooks/logs"
...
REL="${FILE_PATH#$REPO_ROOT/}"

Recommendation

Derive REPO_ROOT dynamically from the script's own location (which Claude Code passes as a resolved absolute path via the hook input):

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

This makes the hook portable across machines, containers, and CI workers. The LOG_DIR derivation then follows naturally.

Major

[MJ-01] File Descriptor Leak in `@start_bg` Builtin

Property	Value
Severity	Major
Category	Correctness
File	`integration/commands.py` : Lines 164–173

Description

log_fd is opened for writing and passed as the stdout of the background subprocess, but the parent process never closes it after Popen returns. The parent retains an open file descriptor for the duration of the step runner process. For suites with many scenarios and frequent restarts, this leaks one file descriptor per @start_bg call. On Linux the default RLIMIT_NOFILE is 1024; large test suites could exhaust this.

Code

log_fd = open(log_path, "w", encoding="utf-8")   # opened, never closed in parent
proc = subprocess.Popen(
    command,
    shell=True,
    cwd=workdir,
    stdout=log_fd,
    stderr=subprocess.STDOUT,
    start_new_session=True,
)
pid_file.write_text(str(proc.pid))
return (0, f"started tag={tag} pid={proc.pid} log={log_path}", "")
# log_fd is never closed here

Recommendation

with open(log_path, "w", encoding="utf-8") as log_fd:
    proc = subprocess.Popen(
        command,
        shell=True,
        cwd=workdir,
        stdout=log_fd,
        stderr=subprocess.STDOUT,
        start_new_session=True,
    )
# After 'with' exits, the parent's copy of the fd is closed.
# The child inherits its own copy and keeps writing.
pid_file.write_text(str(proc.pid))
return (0, f"started tag={tag} pid={proc.pid} log={log_path}", "")

[MJ-02] Module-Level Monkey-Patch Applied at Import Time in Resource Server

Property	Value
Severity	Major
Category	Correctness / Quality
File	`e2e/servers/resource_server.py` : Lines 35–53

Description

_install_noop_tx_verifier_if_requested() is called unconditionally at module-import time (line 53), before build_app(). It replaces tx_verification.get_verifier_for_network with a function that always raises ValueError, globally disabling on-chain transaction verification for the entire Python process whenever E2E_SKIP_TX_VERIFICATION=1 is set.

Two concrete problems:

Silent global mutation: any other code co-located in the same uvicorn worker (including future middleware) that calls get_verifier_for_network will also see the monkey-patched version, even if it wasn't intended to skip verification.
Import-time side effects: if this module is ever imported in a test context (unit tests, introspection tools) with the env var set, verification is silently disabled for the entire test run.

app = build_app() on line 140 also runs all mechanism registrations and environment variable reads at import time — any misconfiguration raises an exception that surfaces as an import error rather than a startup error, which is harder to diagnose.

Code

# line 53 — runs unconditionally at import time
_install_noop_tx_verifier_if_requested()

# line 140 — also runs at import time
app = build_app()

Recommendation

Move the monkey-patch inside build_app() (or inside main()), and gate it explicitly:

def build_app() -> FastAPI:
    # Apply e2e overrides only when explicitly requested, and only during app construction
    if os.environ.get("E2E_SKIP_TX_VERIFICATION") == "1":
        _install_noop_tx_verifier_if_requested()
    ...

The app = build_app() module-level call is unavoidable for uvicorn's "module:app" string import — that pattern is standard — but the monkey-patch should be deferred to inside build_app() rather than before it.

[MJ-03] Unhandled Exceptions in `_env_base_fee()` Crash the Facilitator at Startup

Property	Value
Severity	Major
Category	Correctness
File	`examples/facilitator/server.py` : Lines 69–77

Description

_env_base_fee() parses FACILITATOR_BASE_FEE with json.loads() and casts values via int(). Neither call is guarded:

A malformed JSON string raises json.JSONDecodeError.
A non-integer value raises ValueError.

Both propagate through _register_evm() and _register_tron() into build_app(), which runs at module import time. The crash surfaces as an ImportError / traceback from uvicorn with no actionable error message, making misconfiguration very hard to diagnose in production.

Code

def _env_base_fee() -> dict[str, int] | None:
    raw = os.environ.get("FACILITATOR_BASE_FEE")
    if not raw:
        return None
    import json
    parsed = json.loads(raw)                        # unguarded — JSONDecodeError
    return {str(k).upper(): int(v) for k, v in parsed.items()}  # unguarded — ValueError

Recommendation

def _env_base_fee() -> dict[str, int] | None:
    raw = os.environ.get("FACILITATOR_BASE_FEE")
    if not raw:
        return None
    import json
    try:
        parsed = json.loads(raw)
        return {str(k).upper(): int(v) for k, v in parsed.items()}
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        print(
            f"error: FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}",
            file=sys.stderr,
        )
        sys.exit(2)

Minor

[MN-01] PostToolUse Hook Mode Names Documented Incorrectly in `e2e/CLAUDE.md`

Property	Value
Severity	Minor
Category	Documentation
File	`e2e/CLAUDE.md` : Gotchas section; `e2e/mock_facilitator/__main__.py` : Lines 20–28

Description

e2e/CLAUDE.md documents the mock facilitator modes as:

--mode {success, invalid_sig, deadline_expired, network_mismatch, insufficient_balance, settle_reverted}

The actual CLI choices in __main__.py and _VALID_MODES in app.py are:

fail_verify_invalid_sig
fail_verify_expired
fail_verify_network_mismatch
fail_settle_insufficient
fail_settle_revert

Every name in the documentation is wrong. A developer following the docs and passing --mode invalid_sig will get an argparse error. The mismatch will confuse anyone adding a new scenario.

Recommendation

Update e2e/CLAUDE.md to reflect the actual mode values, or adopt the shorter documented names as the actual CLI choices (which are more readable). The latter is preferable.

[MN-02] `_resolve_mode` Raises `ValueError` Inside FastAPI Handler — Unhandled 500

Property	Value
Severity	Minor
Category	Correctness
File	`e2e/mock_facilitator/app.py` : Lines 72–77

Description

When an invalid ?mode= query parameter is passed to /verify or /settle, _resolve_mode raises ValueError. FastAPI converts this to an HTTP 500 with a stack trace, rather than a 422 or 400 validation error. For a test tool used interactively via curl or a scenario, this makes debugging harder.

Code

def _resolve_mode(mode_query: str | None) -> str:
    if mode_query:
        if mode_query not in _VALID_MODES:
            raise ValueError(f"invalid mode: {mode_query}")  # becomes HTTP 500

Recommendation

from fastapi import HTTPException

def _resolve_mode(mode_query: str | None) -> str:
    if mode_query:
        if mode_query not in _VALID_MODES:
            raise HTTPException(
                status_code=422,
                detail=f"invalid mode '{mode_query}'. valid: {sorted(_VALID_MODES)}",
            )
    return _default_mode()

[MN-03] Latent Bug: `exact` Scheme Registered with EVM Mechanism for Any Network Prefix

Property	Value
Severity	Minor
Category	Correctness
File	`e2e/clients/test_client.py` : Lines 157–162

Description

The _register_client_mechanisms function registers ExactEvmClientMechanism for the exact scheme regardless of the network prefix:

if scheme == "exact":
    from bankofai.x402.mechanisms.evm.exact import ExactEvmClientMechanism
    x402.register(network, ExactEvmClientMechanism(signer))

If a future scenario sets E2E_NETWORK=tron:nile and E2E_SCHEMES=exact, the EVM mechanism will be silently registered for a TRON network. The payment attempt will fail at signing with an opaque error. The exact_permit branch below it correctly checks the network prefix (startswith("eip155:") vs startswith("tron:")). The exact branch should mirror that guard.

Recommendation

Add a network guard consistent with the exact_permit pattern:

if scheme == "exact":
    if not network.startswith("eip155:"):
        raise SystemExit(f"scheme 'exact' (ERC-3009) requires eip155:* network, got {network}")
    from bankofai.x402.mechanisms.evm.exact import ExactEvmClientMechanism
    x402.register(network, ExactEvmClientMechanism(signer))

[MN-04] `_register_client_mechanisms` Type Annotation Accepts EVM-Only Signer for TRON Paths

Property	Value
Severity	Minor
Category	Code Quality
File	`e2e/clients/test_client.py` : Lines 154–155

Description

The function signature types signer as EvmClientSigner:

def _register_client_mechanisms(
    x402: X402Client, network: str, schemes: list[str], signer: EvmClientSigner
) -> None:

But when network.startswith("tron:"), run() passes an _OfflineTronSigner (subclass of TronClientSigner), not EvmClientSigner. The type annotation is incorrect and will produce a mypy error in strict mode, contradicting the Python conventions rule that the module must stay mypy-green.

Recommendation

Use a union or a common protocol type:

from bankofai.x402.signers.client import EvmClientSigner, TronClientSigner
from typing import Union

def _register_client_mechanisms(
    x402: X402Client,
    network: str,
    schemes: list[str],
    signer: Union[EvmClientSigner, TronClientSigner],
) -> None:

[MN-05] `stop_bg` TOCTOU Race on `os.getpgid` / `os.killpg`

Property	Value
Severity	Minor
Category	Correctness
File	`integration/commands.py` : Lines 229–234

Description

There is a race between reading the PID from the pid file, calling os.getpgid(pid), and calling os.killpg(...). If the process exits and its PID is reused by an unrelated process between these two calls, os.getpgid may succeed and os.killpg may signal the wrong process group. The ProcessLookupError guard catches the case where the PID is gone entirely, but not the reuse case.

This is an inherent limitation of PID-based process management. In a local e2e harness with short-lived services it is unlikely to trigger, but worth acknowledging.

Code

try:
    os.killpg(os.getpgid(pid), signal.SIGTERM)   # TOCTOU: pid may be reused
except ProcessLookupError:
    ...

Recommendation

Since start_bg uses start_new_session=True, the subprocess is its own session leader and its PGID equals its PID. Replace os.getpgid(pid) with pid directly to avoid the extra syscall:

try:
    os.killpg(pid, signal.SIGTERM)   # PGID == PID for session leaders
except ProcessLookupError:
    ...

This does not eliminate the TOCTOU window but removes an unnecessary syscall and makes the code's assumption explicit.

Suggestions

[S-01] Strengthen `TRON_TEST_KEY` Warning Comment

File: e2e/clients/test_client.py : Line 48

Description: The deterministic TRON test key 0x111...1 is all-ones — trivially derivable. While the existing comment ("Not secret, just a fixed 32-byte hex so scenarios are reproducible") is correct, a developer who copies this value into their .env for testnet testing could accidentally use it on mainnet. The Anvil key (ANVIL_KEY_0) is similarly well-known but has strong cultural context as "test-only". Add a more explicit warning.

Suggestion:

# Deterministic test TRON key — all-ones, trivially guessable.
# SAFE FOR LOCAL/MOCK E2E ONLY. Never use on mainnet or any real-value network.
TRON_TEST_KEY = "0x1111111111111111111111111111111111111111111111111111111111111111"

[S-02] `run_all.sh` Does Not Surface Partial Service Leak on Outer Script Crash

File: e2e/scenarios/run_all.sh

Description: The outer loop collects failures without aborting early, which is correct. However, if the script itself is interrupted (Ctrl+C, SIGKILL from CI timeout), background services started by an in-progress scenario may be left running. Each scenario ends with @stop_all_bg, but that step is only reached if the scenario runner itself completes. An EXIT trap on the outer script would ensure cleanup:

Suggestion:

trap 'python3 -m integration.run --config /dev/null 2>/dev/null; pkill -f "e2e.mock_facilitator" 2>/dev/null || true' EXIT

Or more simply, rely on the integration runner's @stop_all_bg already being present in each scenario, and document that users should run @stop_all_bg manually if the suite is killed mid-run.

Positive Observations

Area	Observation
E2E Architecture	JSON-only declarative scenarios with a generic step runner is the right abstraction. Zero custom Python per scenario keeps the harness maintainable and addition of new scenarios trivial.
Tier Separation	Mock tier (offline, fast, always-CI) vs. testnet tier (real-chain, nightly, credential-gated) is exactly correct. The skip-on-missing-env pattern in `run_testnet.sh` prevents CI noise on forks.
Mode-Driven Mock	Single mock facilitator supporting all failure modes via `MOCK_FACILITATOR_MODE` / `?mode=` query param avoids proliferating bespoke mock endpoints. The `_VALID_MODES` set provides a clean invariant.
Secret Hygiene	`.env.example` with blank values, `.env` in `.gitignore`, CI wired to GitHub Secrets — no credentials in the diff.
`asyncRewake` Hook	Using `asyncRewake: true` in `settings.json` so Claude Code wakes when tests fail is a thoughtful productivity improvement over a blocking hook.
`@builtin` Registry	The `@builtin` decorator pattern in `commands.py` is clean — registration, dispatch, and self-documenting `--list-builtins` are all driven by a single decorator call.
GasFree Mock Colocation	Serving the GasFree API from the same mock facilitator process (on `/api/v1/address/{user}` and `/api/v1/config/provider/all`) removes the need for a third mock process per scenario. Elegant.
Fail-Fast with Full Reporting	`integration/run.py` correctly stops at the first failed step (fail-fast) while still writing a full Markdown report, and the outer `run_all.sh` collects all scenario failures before reporting. Both behaviors are appropriate for their layers.
`_coerce_response_payload`	The recursive coercion function in `test_client.py` ensures the output JSON is always serializable without `json.JSONEncoder` subclassing — simple and effective.
Rules Knowledge Base	The `.claude/rules/` tree is well-structured: each rule file has a clear scope trigger, cites the incident that motivated it, and links back to `docs/solutions.md` and `specs/`. The authoring guidelines (`rules/CLAUDE.md`) make the system self-maintaining.

Checklist Results

Category	Items Checked	Pass	Fail	Notes
Correctness	8	4	4	C-01 (hook path), MJ-01 (fd leak), MJ-02 (monkey-patch), MN-03 (latent exact/TRON)
Security	6	6	0	No secrets in diff; no injection vectors; `.env.example` only
Performance	5	5	0	No N+1, no unbounded ops in changed code
Code Quality	7	5	2	MN-04 (type annotation), MJ-02 (import-time side effect)
Testing	5	5	0	8 mock + 3 testnet scenarios covering happy + unhappy paths
Documentation	4	3	1	MN-01 (mode name mismatch in CLAUDE.md)
Compatibility	3	3	0	Specs rename is non-breaking; no wire-format changes
Observability	4	4	0	Request logging in mock facilitator; log artifact upload in CI

Disclaimer

This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between main and feat/x402-ai. Runtime behavior, integration testing, and deployment impact are not covered. No SDK source code (under python/x402/src/ or typescript/packages/) was changed in this PR; all findings pertain exclusively to the new tooling and harness code added by this PR.

Report generated by Code Review Skill v1.0.0
Date: 2026-04-24

Resolves all blocking + major + minor findings from the code review on feat/x402-ai; no SDK source changes, harness + hook + examples only. Critical: - C-01 hook hardcoded developer path → derive REPO_ROOT from ${BASH_SOURCE[0]}; makes .claude/hooks/run-tests-for-change.sh portable across machines and CI. Major: - MJ-01 @start_bg fd leak → wrap log_fd in `with open(...)`; parent closes its copy after Popen, child keeps writing via its inherited fd. - MJ-02 resource_server module-level monkey-patch → move _install_noop_tx_verifier_if_requested() call into build_app() so the side-effect is scoped to app construction, not import time. - MJ-03 unguarded json.loads in _env_base_fee() → catch JSONDecodeError/ValueError/TypeError/AttributeError and exit 2 with a clear stderr message; previously a bad FACILITATOR_BASE_FEE string would crash the facilitator at import time with no actionable error. Minor: - MN-01 stale mode names in e2e/CLAUDE.md and .claude/rules/testing/scenarios.md → updated to authoritative _VALID_MODES set (fail_verify_*, fail_settle_*). - MN-02 _resolve_mode raised ValueError → now raises FastAPI HTTPException(422) with valid-mode list in detail. - MN-03 `exact` scheme in test_client skipped network-prefix guard that `exact_permit` had → add explicit eip155:* check so a stray tron:* scenario fails loud instead of signing with the wrong mechanism. - MN-04 _register_client_mechanisms signer type annotation → widen to `EvmClientSigner | TronClientSigner`; hoist TronClientSigner to module-level import. - MN-05 stop_bg TOCTOU on os.getpgid(pid) → use pid directly; start_bg sets start_new_session=True so pgid == pid for our children. Removes a syscall and makes the invariant explicit. Suggestion: - S-01 TRON_TEST_KEY warning strengthened; now explicitly flags "mainnet-unsafe, address publicly known, funds will be swept". Not addressed: - S-02 (outer run_all.sh EXIT trap) — each scenario already calls @stop_all_bg; adding an outer-script trap creates CI-timeout edge cases without clear benefit. Documented limitation acceptable. Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios still green. Hook portable-path smoke: running against /Users/.../x402/README.md now correctly resolves REL=README.md and skips the docs path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

boboliu-1010 · 2026-04-24T08:45:35Z

Addressed all findings from the review in c4a547c (just pushed).

Finding	Status	Fix
C-01 hook hardcoded path	✅	Derive `REPO_ROOT` from `${BASH_SOURCE[0]}`; portable across machines/CI
MJ-01 fd leak in `@start_bg`	✅	`with open(log_path, ...)` — parent closes its copy after Popen
MJ-02 import-time monkey-patch	✅	Moved `_install_noop_tx_verifier_if_requested()` call into `build_app()`
MJ-03 unguarded `json.loads` in `_env_base_fee`	✅	Catch `JSONDecodeError/ValueError/TypeError/AttributeError`, stderr + exit 2
MN-01 stale mode names in docs	✅	Updated `e2e/CLAUDE.md` + `.claude/rules/testing/scenarios.md` to `_VALID_MODES`
MN-02 `_resolve_mode` → HTTP 500	✅	`HTTPException(422, detail="...valid: [...]")`
MN-03 `exact` missing network guard	✅	Added `eip155:*` check mirroring the `exact_permit` branch
MN-04 signer type annotation	✅	`EvmClientSigner
MN-05 `stop_bg` TOCTOU `os.getpgid`	✅	Use `pid` directly (pgid == pid for session leaders)
S-01 TRON test key warning	✅	Strengthened: "mainnet-unsafe, publicly known, funds will be swept"
S-02 outer `run_all.sh` EXIT trap	⏭ deferred	Each scenario already runs `@stop_all_bg`; outer trap adds CI-timeout edge cases without clear benefit. Documented as known limitation.

Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios still green. Hook portable-path smoke passes.

github-actions · 2026-04-24T10:45:28Z

Code Review Report

Project: x402 (BankofAI SDK)
PR: main -> feat/x402-ai
Review Date: 2026-04-24
Reviewer: AI Code Reviewer (Code Review Skill v1.0.0)

PR Overview

Branch Information

Property	Value
From Branch	`main`
To Branch	`feat/x402-ai`
Commits	12
Files Changed	79
Lines Added	+4,604
Lines Removed	-0

Commit History

Hash	Message
`c4a547c`	fix(review): address PR #69 code review findings
`84d26e6`	docs(gasfree): record testnet win + two new gotchas
`ea04ab2`	feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
`f4c0c34`	refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
`3130e21`	docs(e2e): record real-chain evidence + testnet gotchas
`8045ba1`	fix(e2e): make testnet scenarios runnable
`45be4ca`	chore(ai): gitignore .DS_Store and .claude/hooks/logs/
`e0136ec`	feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
`1983edf`	feat(e2e): add testnet tier with real chain settlement
`3509b70`	feat(e2e): expand harness to 8 scenarios across EVM + TRON
`900e5bd`	feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
`aec6836`	docs(design): draft AI transformation plan

Review Summary

Verdict

Verdict: Request Changes

Findings at a Glance

	Critical	Major	Minor	Suggestion
Count	0	2	5	4

Summary

This PR establishes the AI-native developer infrastructure for the x402 repo. The additions are exclusively additive (zero lines deleted from main): AI scaffolding (.claude/ rules, agents, hooks, settings), a full e2e test harness (e2e/, integration/), a reference facilitator implementation (examples/facilitator/), two CI workflows (mock + nightly testnet), and supporting documentation and configuration.

The overall structure is well-thought-out. The integration runner is clean, the declarative scenario format is correctly implemented, the mock facilitator faithfully implements the x402 wire contract, and the shell scripts use appropriate set -euo pipefail hygiene. Process group management in the background service lifecycle (start_bg/stop_bg) correctly uses start_new_session=True and os.killpg, avoiding orphan-child issues.

Two issues require changes before merge. The reference facilitator (examples/facilitator/server.py) silently drops the request context on malformed /verify and /settle bodies (KeyError → 500), while its own /fee/quote handler uses defensive .get(). Since this file is a primary adoption reference, the inconsistency teaches a bad pattern. Additionally, the CI e2e workflow installs uvicorn, httpx, and fastapi via a bare uv pip install outside the project's dependency manifest — these are test-infrastructure dependencies that should be declared to keep the lockfile authoritative.

Change Summary

1. AI Scaffolding (`.claude/`)

File	Change Type	Description
`.claude/agents/code-reviewer.md`	Added	x402-flavored code review agent with wire-format hotspots
`.claude/agents/security-reviewer.md`	Added	Payment-path / signing review agent
`.claude/agents/scheme-author.md`	Added	Scheme spec + rule authoring agent
`.claude/commands/x402/compound.md`	Added	`/x402:compound` wizard for `docs/solutions.md`
`.claude/commands/x402/create-scenario.md`	Added	`/x402:create-scenario` wizard
`.claude/hooks/run-tests-for-change.sh`	Added	`PostToolUse` hook: run scoped test suite after file edit
`.claude/rules/**`	Added	Language, network, scheme, and testing rule sheets
`.claude/settings.json`	Added	Hook wiring for PostToolUse

Purpose: Provides Claude Code with x402-domain knowledge, behavioral guardrails, and automated test feedback on every code edit.

2. E2E Test Harness (`e2e/` + `integration/`)

File	Change Type	Description
`integration/run.py`	Added	Generic JSON-driven step runner with Markdown report output
`integration/commands.py`	Added	`@builtin` registry: `@start_bg`, `@stop_bg`, `@stop_all_bg`, `@wait_http`, `@json_diff`, `@json_match`
`e2e/mock_facilitator/app.py`	Added	FastAPI mock facilitator with 6 injectable modes + GasFree API stubs
`e2e/mock_facilitator/__main__.py`	Added	CLI entry (`python -m e2e.mock_facilitator`)
`e2e/clients/test_client.py`	Added	Real-signature EVM + TRON client with offline signer variants
`e2e/servers/resource_server.py`	Added	`@x402_protected` FastAPI resource server configurable via `E2E_*` env vars
`e2e/scenarios/*/config.json`	Added	8 mock scenarios (3 happy path + 5 unhappy) + 3 testnet scenarios
`e2e/scenarios/run_all.sh`	Added	Mock suite entry point (CI-triggered per PR)
`e2e/scenarios/run_testnet.sh`	Added	Testnet suite entry point with per-scenario skip logic

Purpose: End-to-end verification of the full EVM + TRON payment flow without (mock tier) or with (testnet tier) real on-chain settlement.

3. Reference Facilitator (`examples/facilitator/`)

File	Change Type	Description
`examples/facilitator/server.py`	Added	FastAPI binding over `X402Facilitator` with env-var-driven EVM + TRON registration
`examples/facilitator/__main__.py`	Added	`python -m examples.facilitator` entry

Purpose: Canonical facilitator deployment pattern used by testnet e2e tier; also serves as adoption reference for SDK users.

4. CI Workflows

File	Change Type	Description
`.github/workflows/check_e2e.yml`	Added	PR-gated mock e2e suite (no secrets required)
`.github/workflows/nightly_testnet.yml`	Added	Nightly testnet run, gated to `BofAI` org, uses GitHub Secrets

5. Documentation and Configuration

File	Change Type	Description
`CLAUDE.md`	Added	Repo-wide AI entrypoint with component map and safety rules
`{python,typescript,e2e,specs,...}/CLAUDE.md`	Added	Component-level AI context (build commands, idioms, non-goals)
`docs/design/ai-transformation.md`	Added	P1–P4 roadmap for Claude-Code-native adoption
`docs/solutions.md`	Added	Hard-won debugging knowledge base (3 entries: TRON hex, deadline bounds, balance source)
`specs/`	Renamed	Spec files moved from `docs/specs/` — no content change
`.env.example`	Added	Template for testnet credentials (git-ignored)
`.gitignore`	Modified	Added `.DS_Store` and `.claude/hooks/logs/`

Detailed Findings

Major

[MJ-01] Missing input validation on `/verify` and `/settle` in the reference facilitator

Property	Value
Severity	Major
Category	Correctness / Code Quality
File	`examples/facilitator/server.py` : Lines 293–303

Description

The /verify and /settle endpoint handlers access body["paymentPayload"] and body["paymentRequirements"] with direct key subscripting. If a caller sends a body that omits either key (malformed client, misconfigured test harness, network truncation), Python raises KeyError, which FastAPI translates to an unhandled HTTP 500 — not the semantically correct 422/400. This is inconsistent with the same file's /fee/quote handler, which uses body.get("accepts", []) defensively.

Since examples/facilitator/server.py is the primary adoption reference for SDK integrators building their own facilitators, this pattern will be copied into production deployments.

Code

# examples/facilitator/server.py — lines 293-303
@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])          # KeyError → 500
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError → 500
    resp = await facilitator.verify(payload, requirements)
    return JSONResponse(resp.model_dump(by_alias=True))

@app.post("/settle")
async def _settle(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])          # KeyError → 500
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError → 500
    resp = await facilitator.settle(payload, requirements)
    return JSONResponse(resp.model_dump(by_alias=True))

Recommendation

from fastapi import HTTPException

@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
    try:
        payload = PaymentPayload(**body["paymentPayload"])
        requirements = PaymentRequirements(**body["paymentRequirements"])
    except KeyError as exc:
        raise HTTPException(status_code=422, detail=f"missing field: {exc}") from exc
    resp = await facilitator.verify(payload, requirements)
    return JSONResponse(resp.model_dump(by_alias=True))

# Same pattern for _settle

[MJ-02] E2E dependency installation bypasses the project lockfile

Property	Value
Severity	Major
Category	Correctness / Compatibility
File	`.github/workflows/check_e2e.yml` : Lines 22–24

Description

uvicorn, httpx, and fastapi are installed via a bare uv pip install after uv sync, rather than being declared in the project's pyproject.toml dependency groups. This means:

Their versions are not pinned in uv.lock — a new release of any of these packages could break CI silently on the next run, with no diff traceable to a PR.
The workflow's caching step (cache-dependency-glob: "python/x402/uv.lock") does not cover these packages, so a stale cache may hold the wrong versions.
Developers running uv sync --group dev locally do not get these packages, causing local/CI divergence when running e2e/scenarios/run_all.sh.

Code

# .github/workflows/check_e2e.yml — lines 22-24
- name: Install SDK in editable mode + e2e deps
  working-directory: ./python/x402
  run: |
    uv sync --all-extras --dev
    uv pip install uvicorn httpx fastapi   # <-- outside the lockfile

Recommendation

Add an e2e optional dependency group to python/x402/pyproject.toml:

[project.optional-dependencies]
e2e = ["uvicorn>=0.29", "httpx>=0.27", "fastapi>=0.111"]

Then replace the workflow step with:

run: uv sync --extra e2e --dev

This ensures versions are pinned in uv.lock, the cache is valid, and local developer environments match CI.

Minor

[MN-01] `**` glob in bash `case` patterns is misleading

Property	Value
Severity	Minor
Category	Code Quality
File	`.claude/hooks/run-tests-for-change.sh` : Lines 57–62

Description

The hook dispatches to the right test suite using case "$REL" in patterns. It includes both *.py and **/*.py variants for each directory:

case "$REL" in
  python/x402/src/*.py | python/x402/tests/*.py | \
  python/x402/src/**/*.py | python/x402/tests/**/*.py)

In bash case statements, * already matches any string including / (pattern matching semantics, not pathname expansion). So python/x402/src/*.py already matches python/x402/src/mechanisms/evm/exact.py. The **/*.py patterns are therefore redundant and may mislead future maintainers into believing bash case needs ** for recursive matching. While the script is functionally correct, the extra patterns add noise.

Recommendation

Remove the redundant ** variants to avoid confusion:

case "$REL" in
  python/x402/src/*.py | python/x402/tests/*.py)
    ...
  typescript/packages/*/src/*.ts | typescript/packages/*/src/*.tsx)
    ...

[MN-02] Mock GasFree provider and user address are identical

Property	Value
Severity	Minor
Category	Testing
File	`e2e/mock_facilitator/app.py` : Lines 128–132

Description

MOCK_GASFREE_PROVIDER and MOCK_GASFREE_ADDRESS are set to the same TRON address. Production GasFree flows have distinct provider (relayer) and user custodial wallet addresses. Using identical addresses in the mock means that any code path in the SDK that validates provider != gasFreeAddress is never exercised.

Code

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"
MOCK_GASFREE_ADDRESS  = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"  # same address

Recommendation

Use two distinct valid TRON addresses:

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"   # relayer / service provider
MOCK_GASFREE_ADDRESS  = "TDqe43YvBVq6DQ96DgNJ9sXSPLJHr2Hk2d"   # user's custodial wallet

[MN-03] Module-level monkey-patch in resource server has process-wide scope

Property	Value
Severity	Minor
Category	Code Quality
File	`e2e/servers/resource_server.py` : Lines 43–50

Description

_install_noop_tx_verifier_if_requested() patches tx_verification.get_verifier_for_network at the module level (called unconditionally during build_app()). The comment acknowledges the side effect but describes it as "scoped to this entry point." In practice, it mutates a module-level function reference for the entire Python process lifetime — and in uvicorn's reload mode, build_app() may be called multiple times, re-applying the patch.

This is acceptable for a test fixture, but the scoping claim in the comment is incorrect and could mislead contributors who import this module in other contexts.

Code

# resource_server.py lines 43-50
def _install_noop_tx_verifier_if_requested() -> None:
    if os.environ.get("E2E_SKIP_TX_VERIFICATION") != "1":
        return
    from bankofai.x402.utils import tx_verification

    def _always_raise(network: str, rpc_url: str | None = None):
        raise ValueError(f"e2e: tx verification disabled for {network}")

    tx_verification.get_verifier_for_network = _always_raise  # process-wide mutation

Recommendation

Update the comment to accurately describe the scope:

# Replaces the module-level `get_verifier_for_network` function for the entire process.
# This is intentional for e2e scenarios: the mock facilitator's deterministic tx hash
# is not a real on-chain transaction, so verification must be skipped globally.
# NEVER import this module in production code.

[MN-04] Parallel arrays in `run_testnet.sh` lack a length guard

Property	Value
Severity	Minor
Category	Code Quality
File	`e2e/scenarios/run_testnet.sh` : Lines 20–28

Description

The testnet runner uses two parallel bash arrays — names[] and required[] — indexed together. If a contributor adds a scenario to names[] without adding its required-vars entry to required[] (or vice versa), the wrong required-var list is silently applied to a scenario, potentially running a testnet scenario without credentials and generating a false "PASS" or a misleading error.

Code

declare -a names=(
  "exact_testnet"
  "exact_permit_testnet"
  "exact_gasfree_testnet"
)
declare -a required=(
  "BSC_TESTNET_FACILITATOR_KEY BSC_TESTNET_CLIENT_KEY BSC_TESTNET_PAY_TO"
  "BSC_TESTNET_FACILITATOR_KEY BSC_TESTNET_CLIENT_KEY BSC_TESTNET_PAY_TO"
  "TRON_NILE_FACILITATOR_KEY TRON_NILE_CLIENT_KEY TRON_NILE_PAY_TO GASFREE_NILE_API_URL"
)
# No assertion that ${#names[@]} == ${#required[@]}

Recommendation

Add a guard immediately after the array declarations:

if [[ ${#names[@]} -ne ${#required[@]} ]]; then
  echo "bug: names[] and required[] arrays are out of sync" >&2
  exit 2
fi

[MN-05] `print()` used for startup error in example facilitator (violates library convention)

Property	Value
Severity	Minor
Category	Code Quality
File	`examples/facilitator/server.py` : Lines 81–84

Description

_env_base_fee() uses print(..., file=sys.stderr) to report a parse error and then calls sys.exit(2). The Python conventions in .claude/rules/python/conventions.md state "No print in library code." While examples/facilitator/server.py is application code (not library code), _env_base_fee() is called during build_app(), which is also invoked at module import time (line 309: app = build_app()). Any downstream import of this module (e.g., for testing or inspection) would trigger sys.exit(2) on a bad env var, making it difficult to unit-test build_app() in isolation.

Code

# examples/facilitator/server.py lines 79-84
    except (json.JSONDecodeError, ValueError, TypeError, AttributeError) as exc:
        print(
            f"error: FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}",
            file=sys.stderr,
        )
        sys.exit(2)

Recommendation

Raise a ValueError from _env_base_fee() and let main() (or build_app()) catch it and call sys.exit(2):

    except (json.JSONDecodeError, ValueError, TypeError, AttributeError) as exc:
        raise ValueError(
            f"FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}"
        ) from exc

Then in build_app() (or main()):

try:
    base_fee = _env_base_fee()
except ValueError as exc:
    logging.getLogger("x402.example_facilitator").error("%s", exc)
    sys.exit(2)

Suggestions

[S-01] Add `trap`-based cleanup to `run_all.sh`

File: e2e/scenarios/run_all.sh

Description: When a scenario fails partway (e.g., the client step), integration/run.py's fail-fast exits before the final @stop_all_bg step runs. The next scenario's @start_bg does kill the previous service via the pid file, but if the two ports bind at exactly the same time during the handoff window, the new service fails to start. Adding a trap guarantees cleanup on any exit path.

Suggestion:

# At the top of run_all.sh, after ROOT_DIR is defined:
trap 'python3 -m integration.run --config /dev/stdin <<< '"'"'{"steps":[{"command":"@stop_all_bg","name":"trap cleanup"}]}'"'"'' EXIT

Or, simpler — call @stop_all_bg directly:

cleanup() { python3 -c "from integration.commands import stop_all_bg; stop_all_bg([], '.')"; }
trap cleanup EXIT

[S-02] Use distinct mock nonce value in GasFree address response

File: e2e/mock_facilitator/app.py : Line 202

Description: The mock GasFree /api/v1/address/{user} response returns "nonce": 0. A nonce of 0 is valid for the first ever GasFree transaction but could mask bugs in replay-protection logic that compares nonces. Using a non-zero nonce (e.g., 42) would ensure that nonce-handling code paths are exercised in mock scenarios.

Suggestion: Change "nonce": 0 to "nonce": 42 in the mock response.

[S-03] Validate `url` scheme in `wait_http` builtin

File: integration/commands.py : Lines 196–213

Description: wait_http passes the url argument directly to urllib.request.urlopen with no scheme validation. Passing a file:// URL or a non-HTTP scheme would silently open a local file (for read) or raise an obscure error. Since scenario configs are version-controlled and the risk is low, this is purely a defensive suggestion.

Suggestion:

from urllib.parse import urlparse

def wait_http(args, workdir):
    ...
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return (2, "", f"@wait_http: url must be http(s), got scheme: {parsed.scheme!r}")
    ...

[S-04] Surface GasFree API mock mode in `supported` response

File: e2e/mock_facilitator/app.py : Lines 138–141

Description: The /supported endpoint returns a hard-coded list of DEFAULT_SUPPORTED_KINDS regardless of the running mode. In a fail_verify_* mode, a client seeing exact_gasfree listed as supported may be surprised when the subsequent /verify returns invalid. Emitting the current mode as an X-Mock-Mode response header (or a meta field in the response) would make debugging mock-tier failures faster.

Suggestion:

@app.get("/supported")
async def supported() -> dict[str, Any]:
    _log({"endpoint": "supported"})
    return {
        "kinds": DEFAULT_SUPPORTED_KINDS,
        "_mockMode": _default_mode(),   # debug aid, ignored by real clients
    }

Positive Observations

Area	Observation
Process lifecycle	`start_bg` uses `start_new_session=True` + `os.killpg` for correct process-group cleanup, avoiding orphaned uvicorn worker processes.
Fail-fast + recovery	`start_bg` proactively kills the previous process with the same tag before starting a new one, giving scenarios graceful recovery from a prior run's leftover services.
No real keys in fixtures	`ANVIL_KEY_0` and `TRON_TEST_KEY` are both well-known, widely-published test vectors. The TRON key carries a prominent warning comment. All real credentials flow through env vars, and `.env` is `.gitignore`d.
CI fork protection	`nightly_testnet.yml` guards with `if: github.repository_owner == 'BofAI'`, correctly preventing fork PRs from accessing secrets.
Testnet self-skip logic	`run_testnet.sh` skips scenarios whose required env vars are blank (rather than failing them). This allows running the full script locally with a partial `.env` without false CI failures.
Wire format fidelity	Mock facilitator `/verify` and `/settle` responses match the x402 v2 wire contract (`isValid`, `invalidReason`, `success`, `errorReason`, `transaction`, `network`) exactly as defined in the protocol spec.
Declarative scenarios	All 8 mock and 3 testnet scenarios are pure JSON — no bespoke Python test logic. The `@builtin` system is the right abstraction for keeping test configs readable.
`json_match` for testnet	Testnet scenarios correctly use `@json_match` (partial field matching with regex for the tx hash) while mock scenarios use `@json_diff` (exact equality). The distinction is principled and correctly implemented.
Offline signer pattern	`_AllowanceSkippingEvmSigner` and `_OfflineTronSigner` cleanly override only the chain-RPC methods, keeping the real signing path intact for mock tests. This correctly validates x402 payment mechanics without a live node.
`FACILITATOR_BASE_FEE` design	Centralizing the base-fee configuration as a single JSON env var (instead of per-token env vars) is ergonomic and correctly propagated to both EVM and TRON mechanisms.

Checklist Results

Category	Items Checked	Pass	Fail	N/A	Notes
Correctness	8	7	1	0	MJ-01: missing key validation in facilitator endpoints
Security	10	10	0	0	No secrets in code; proper fork/secret guard in CI; test keys clearly labeled
Performance	7	7	0	0	No N+1, no unbounded loops, no blocking-in-async
Code Quality	8	4	4	0	MN-01 through MN-05: redundant glob, identical mock addresses, module-scope mutation, parallel arrays, print in startup path
Testing	7	6	1	0	MJ-02: e2e deps not in lockfile; 8 mock + 3 testnet scenarios otherwise well-structured
Documentation	6	6	0	0	Every new public entry point has a docstring and CLAUDE.md
Compatibility	5	5	0	5	No existing APIs modified; all additions are backward-compatible
Observability	4	4	0	0	Request log in mock facilitator; structured Markdown reports from runner

Disclaimer

This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between the specified branches (main...feat/x402-ai). Runtime behavior, integration testing, and deployment impact are not covered. No changes to the SDK's core payment path (python/x402/src/, typescript/packages/) were present in this PR — review of payment-path logic should follow the security-reviewer agent protocol for any subsequent PR touching those paths.

Report generated by Code Review Skill v1.0.0
Date: 2026-04-24

The design doc was a draft v0 internal planning document with heavy organization-specific context (BofAI, SUN.io, sun-protocol/apollo-arena, internal branch names like feat/x402-ai-mcp/budget/matrix that never materialized) and an "Open questions" section. From a community contributor's perspective it read as either (a) required process to follow — which it never was — or (b) out-of-date planning that kept confusing the reader. P1 is already executed and demonstrable in the .claude/ tree + CLAUDE.md + specs/; P2..P4 are aspirational and belong in issues, not checked-in docs. Also removes the now-empty docs/design/ directory. Rewired 7 dangling references: - CLAUDE.md root: drop the "see design doc" line; shorten the AI-native section to one sentence; drop the docs/design/ row from the table. - docs/CLAUDE.md: drop the design/ row and the "Design docs go here" rule. - python/x402/CLAUDE.md: drop "P2" breadcrumb on the mcp stub line. - typescript/CLAUDE.md: drop "P2 in [design]" + "see design doc" on mcp and http rows respectively. - integration/CLAUDE.md: rewrite the matrix-testing non-goal without the dead link. - e2e/scenarios/exact_permit_happy_path/README.md: the "Facilitator binary with --mock" prereq now points to e2e/mock_facilitator/ that ships. - .claude/agents/scheme-author.md: suggest an in-flight feature spec under specs/<NNN>-<slug>/ instead of a design doc. Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-24T11:21:57Z

Code Review Report

Project: x402 (BankofAI SDK)
PR: main -> feat/x402-ai
Review Date: 2026-04-24
Reviewer: AI Code Reviewer (Code Review Skill v1.0.0)

PR Overview

Branch Information

Property	Value
From Branch	`main`
To Branch	`feat/x402-ai`
Commits	13
Files Changed	78
Lines Added	+4339
Lines Removed	-0

Commit History

Hash	Message
`8c8ab6f`	docs: delete ai-transformation design doc; clean dangling refs
`c4a547c`	fix(review): address PR #69 code review findings
`84d26e6`	docs(gasfree): record testnet win + two new gotchas
`ea04ab2`	feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
`f4c0c34`	refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
`3130e21`	docs(e2e): record real-chain evidence + testnet gotchas
`8045ba1`	fix(e2e): make testnet scenarios runnable
`45be4ca`	chore(ai): gitignore .DS_Store and .claude/hooks/logs/
`e0136ec`	feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
`1983edf`	feat(e2e): add testnet tier with real chain settlement
`3509b70`	feat(e2e): expand harness to 8 scenarios across EVM + TRON
`900e5bd`	feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
`aec6836`	docs(design): draft AI transformation plan

Review Summary

Verdict

Verdict: Request Changes

Findings at a Glance

	Critical	Major	Minor	Suggestion
Count	0	2	5	4

Summary

This PR delivers a comprehensive AI-native development scaffold: the complete .claude/ rule and agent tree, a two-tier e2e test harness (8 mock scenarios + 3 testnet scenarios), a generic integration step runner, the reference facilitator server used for testnet validation, CI workflows, and an extensive docs/solutions.md knowledge base. The scope is infra/tooling only — no production payment-path code is changed, so no signing or settlement bugs are introduced.

The overall quality is high. The design is clean, well-structured, and thoughtfully documented. However, two Major issues require fixes before merge: the jq dependency in the PostToolUse hook silently disables all automated test runs when jq is absent, and the reference facilitator server has unhandled KeyErrors on its /verify and /settle endpoints that expose raw 500s on malformed input. Five Minor issues and four Suggestions round out the review.

Change Summary

1. AI-native scaffolding (`.claude/`)

File	Change Type	Description
`.claude/agents/code-reviewer.md`	Added	x402-flavored code reviewer agent definition
`.claude/agents/security-reviewer.md`	Added	Security reviewer with x402 signing checklist
`.claude/agents/scheme-author.md`	Added	Scheme authoring agent
`.claude/commands/x402/compound.md`	Added	`/x402:compound` slash-command wizard
`.claude/commands/x402/create-scenario.md`	Added	`/x402:create-scenario` slash-command wizard
`.claude/hooks/run-tests-for-change.sh`	Added	PostToolUse hook: dispatches test suite after edits
`.claude/rules/**`	Added	Rule sheets for common, schemes, networks, python, typescript, testing
`.claude/settings.json`	Added	Hook wiring (`PostToolUse` on Edit/Write/MultiEdit/NotebookEdit)

Purpose: Encode x402 conventions, domain knowledge, and code-review checklists as machine-readable rules so Claude can contribute safely without reading every file first.

2. E2E test harness (`e2e/`, `integration/`)

File	Change Type	Description
`e2e/clients/test_client.py`	Added	EVM + TRON wallet, 402→retry flow, JSON output
`e2e/mock_facilitator/app.py`	Added	FastAPI mock: facilitator + GasFree API (mode-driven)
`e2e/mock_facilitator/__main__.py`	Added	CLI entry for the mock facilitator
`e2e/servers/resource_server.py`	Added	FastAPI resource server using `@x402_protected`
`e2e/scenarios/*/config.json`	Added	8 mock scenarios (3 happy path, 5 unhappy)
`e2e/scenarios/testnet/*/config.json`	Added	3 testnet scenarios with real chain settlement
`e2e/scenarios/run_all.sh`	Added	Mock suite entry point (CI)
`e2e/scenarios/run_testnet.sh`	Added	Testnet suite entry point (nightly)
`integration/run.py`	Added	Generic sequential step runner with Markdown reporting
`integration/commands.py`	Added	`@json_diff`, `@json_match`, `@start_bg`, `@stop_bg`, `@wait_http` built-ins

Purpose: Provide a no-secrets, deterministic CI test suite for all three schemes across EVM and TRON, plus a nightly testnet tier with real on-chain settlement.

3. Reference facilitator (`examples/facilitator/`)

File	Change Type	Description
`examples/facilitator/server.py`	Added	FastAPI wrapper for `X402Facilitator`; EVM + TRON registration from env vars
`examples/facilitator/__main__.py`	Added	CLI entry

Purpose: Standalone reference implementation used by the testnet e2e tier and as documentation for integrators.

4. CI workflows (`.github/workflows/`)

File	Change Type	Description
`.github/workflows/check_e2e.yml`	Added	PR check: runs the 8 mock scenarios
`.github/workflows/nightly_testnet.yml`	Added	Daily: testnet scenarios with secrets; skipped on forks

Purpose: Wire the mock suite into PR CI and establish a nightly testnet gate.

5. Documentation and miscellaneous

File	Change Type	Description
`docs/solutions.md`	Added	10 documented bugs with symptom/root-cause/fix/rule format
`CLAUDE.md` (root + components)	Added	Project orientation for contributors and AI agents
`specs/` reorganization	Renamed	`docs/specs/` → `specs/` (upstream layout mirror)
`.env.example`	Added	Testnet credential template (no secrets)
`.gitignore`	Modified	Adds `.DS_Store`, `.claude/hooks/logs/`

Detailed Findings

Major

[MJ-01] jq absence silently disables the PostToolUse test hook

Property	Value
Severity	Major
Category	Correctness / Testing
File	`.claude/hooks/run-tests-for-change.sh` : Lines 29–38

Description

The hook extracts FILE_PATH by piping stdin through jq, with 2>/dev/null swallowing all errors:

FILE_PATH="$(printf '%s' "$INPUT" | jq -r '.tool_input.file_path // .tool_response.filePath // empty' 2>/dev/null)"

If jq is not installed on the developer's machine (or a CI runner), the subshell returns an empty string. The very next guard exits 0:

if [[ -z "$FILE_PATH" ]]; then
  exit 0
fi

The result is that every hook invocation silently succeeds without running any test. The developer or CI sees no error — tests are simply never executed. Because jq is not declared as a dependency anywhere in the repository, this creates a silent test-bypass risk that is hard to diagnose.

Recommendation

Add an explicit jq availability check at the top of the script, before any stdin reading:

if ! command -v jq >/dev/null 2>&1; then
  echo "run-tests-for-change: 'jq' not found — install jq to enable test autorun" >&2
  exit 0   # still exit 0 so Claude isn't blocked, but at least it's visible
fi

Also document jq as a requirement in the root CLAUDE.md setup section.

[MJ-02] Unhandled KeyError on malformed bodies in reference facilitator `/verify` and `/settle`

Property	Value
Severity	Major
Category	Correctness / Robustness
File	`examples/facilitator/server.py` : Lines 293–303

Description

The /verify and /settle endpoints access required body keys with plain dict subscript:

@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])       # KeyError if missing
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError if missing

@app.post("/settle")
async def _settle(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])       # KeyError if missing
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError if missing

A missing key propagates as an unhandled exception through FastAPI, returning an opaque HTTP 500 instead of a descriptive 422. Because this file is explicitly labeled "reference implementation" in examples/CLAUDE.md, integrators may copy this pattern into production facilitators. The testnet e2e tier calls these endpoints under real-network conditions; any client sending a subtly malformed body during debugging would receive a 500 with no actionable error message.

Recommendation

Use .get() with a meaningful 422 response, or switch to a typed Pydantic request model so FastAPI validates automatically:

from pydantic import BaseModel

class VerifyRequest(BaseModel):
    paymentPayload: dict
    paymentRequirements: dict

@app.post("/verify")
async def _verify(body: VerifyRequest) -> JSONResponse:
    payload = PaymentPayload(**body.paymentPayload)
    requirements = PaymentRequirements(**body.paymentRequirements)
    ...

FastAPI then returns a 422 with field-level detail on missing/malformed keys at no extra cost.

Minor

[MN-01] `eval` used for test command execution in hook script

Property	Value
Severity	Minor
Category	Code Quality / Security
File	`.claude/hooks/run-tests-for-change.sh` : Line 84

Description

OUTPUT="$(cd "$TEST_CWD" && eval "$TEST_CMD" 2>&1)"

$TEST_CMD is currently populated only from hardcoded case-statement branches (e.g. "uv run pytest -x -q --tb=short"), so the immediate security risk is low. However, the TypeScript branch constructs $TEST_CWD from the user-controlled $PKG_REL (derived from $FILE_PATH), which comes from hook stdin. If $PKG_REL ever propagated into $TEST_CMD (e.g. via a future path-specific flag), the eval would be exploitable. More practically: eval makes the script harder to reason about and harder to test statically.

Recommendation

Replace with direct invocation or bash -c:

# Instead of eval "$TEST_CMD":
cd "$TEST_CWD" || exit 2
OUTPUT="$(bash -c "$TEST_CMD" 2>&1)"

Or, for fully safe invocation, restructure TEST_CMD as an array and use "${TEST_CMD_ARRAY[@]}".

[MN-02] `MOCK_GASFREE_PROVIDER` and `MOCK_GASFREE_ADDRESS` are the same address

Property	Value
Severity	Minor
Category	Testing / Correctness
File	`e2e/mock_facilitator/app.py` : Lines 128–132

Description

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"
MOCK_GASFREE_ADDRESS  = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"

In the real GasFree protocol these are semantically distinct: the provider (serviceProvider) is the relayer that submits on-chain transactions, while the GasFree address (gasFreeAddress) is the user's custodial wallet that holds the tokens. solutions.md entry #9 specifically documents how these two addresses are different levels of an address mapping chain.

Using the same value for both means the mock cannot catch bugs where client or facilitator code inadvertently swaps serviceProvider for the custodial wallet address (or vice versa). It also understates the distinction to developers reading the mock code for protocol guidance.

Recommendation

Use two distinct, clearly labeled addresses:

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"   # relayer/service-provider
MOCK_GASFREE_ADDRESS  = "TEkxiTehnzSmSe2XqrBj4w32RUN966rdz8"    # user custodial wallet

Add a comment citing solutions.md #9 so future maintainers understand why they differ.

[MN-03] Module-level `build_app()` call with conditional side effects at import time

Property	Value
Severity	Minor
Category	Correctness / Code Quality
File	`e2e/servers/resource_server.py` : Lines 119, 141

Description

app = build_app() executes unconditionally at module import time. Inside build_app(), _install_noop_tx_verifier_if_requested() may monkey-patch a global function in bankofai.x402.utils.tx_verification:

def _install_noop_tx_verifier_if_requested() -> None:
    if os.environ.get("E2E_SKIP_TX_VERIFICATION") != "1":
        return
    from bankofai.x402.utils import tx_verification
    tx_verification.get_verifier_for_network = _always_raise  # module-level mutation

Any tool (test runner, IDE import analyzer, python -c "import e2e.servers.resource_server") that imports this module while E2E_SKIP_TX_VERIFICATION=1 is set will silently corrupt the tx_verification module for the rest of the Python process. Because both resource_server.py and examples/facilitator/server.py execute their build_app() at import time, the patch applies before the caller can prevent it.

Recommendation

Guard the module-level app initialization behind if __name__ == "__main__": for the monkey-patch case, or restructure so app is created lazily (e.g., via an application factory pattern where the caller invokes build_app() explicitly). At minimum, add a module docstring note warning that importing this module with E2E_SKIP_TX_VERIFICATION=1 set will mutate a shared utility module.

[MN-04] `run_all.sh` does not continue running remaining scenarios after a mid-run failure

Property	Value
Severity	Minor
Category	Testing
File	`e2e/scenarios/run_all.sh` : Lines 21–28

Description

for s in "${scenarios[@]}"; do
  cfg="e2e/scenarios/$s/config.json"
  if ! python3 -m integration.run --config "$cfg"; then
    failed+=("$s")
  fi
done

Each individual integration/run.py invocation is fail-fast (stops at the first failing step). If a scenario fails before its @stop_all_bg step (e.g. the client step fails), the mock facilitator and resource server processes remain running on ports 4020/4021. run_all.sh then moves on to the next scenario, which tries to start new services on the same ports.

The @start_bg builtin does attempt to kill the previous process via its PID file:

if pid_file.exists():
    old_pid = int(pid_file.read_text().strip())
    os.kill(old_pid, signal.SIGTERM)

This mitigates the issue in most cases, but there is no explicit wait for the port to be freed by the OS after SIGTERM. Under load on CI, the subsequent @start_bg may launch the new process before the TCP socket clears, causing the @wait_http readiness poll to eventually time out and fail the next scenario with a confusing error. Uvicorn's default graceful shutdown takes up to 5 seconds.

Recommendation

Add a brief @stop_all_bg step to the beginning of each scenario config (not just the end) as a "clean start" guard, OR add an explicit sleep/port-wait after @stop_all_bg in each scenario. The simplest fix is a "cleanup" first step:

{
  "name": "ensure no stale services from previous run",
  "command": "@stop_all_bg",
  "timeout": 10
}

This would make each scenario self-sufficient regardless of what the previous one left behind.

[MN-05] Missing per-scenario `README.md` files — convention violated for 10 of 11 directories

Property	Value
Severity	Minor
Category	Documentation
File	`e2e/scenarios/*/`

Description

.claude/rules/testing/scenarios.md explicitly lists README.md as a required file for every scenario directory. Only e2e/scenarios/exact_permit_happy_path/README.md was added. The remaining directories are missing READMEs:

exact_happy_path/
exact_gasfree_happy_path/
unhappy_verify_invalid_sig/
unhappy_verify_expired/
unhappy_verify_network_mismatch/
unhappy_settle_insufficient/
unhappy_settle_revert/
testnet/exact_testnet/
testnet/exact_permit_testnet/
testnet/exact_gasfree_testnet/

This matters for maintainability: a developer reading an unfamiliar scenario directory has no immediate explanation of purpose, preconditions, or run command.

Recommendation

Add minimal READMEs for each missing directory. The exact_permit_happy_path/README.md can serve as the template.

Suggestions

[S-01] Pin GitHub Actions to SHA commits for supply-chain security

File: .github/workflows/check_e2e.yml, nightly_testnet.yml
Description: Both workflows reference actions at mutable major-version tags (actions/checkout@v4, astral-sh/setup-uv@v5, actions/upload-artifact@v4). A compromised action tag could execute arbitrary code in CI with access to all GitHub Secrets.
Suggestion: Pin each action to a specific commit SHA, e.g. actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2. Tools like Dependabot or pin-github-action automate this.

[S-02] `@json_match` spec format cannot express `~` in a plain `=` expected value

File: integration/commands.py : Lines 313–336
Description: The spec parser uses ~ as the regex-operator token. The disambiguation logic spec.index("~") < spec.index("=") means that any =-type assertion whose expected value contains ~ (e.g. body.hash=0x1234~abcd) would be misrouted to the regex branch, causing a confusing test failure. The usage string does not document this limitation.
Suggestion: Document the limitation in the usage= string for @json_match. Alternatively, require explicit ~= (not ~) as the regex separator to eliminate ambiguity entirely.

[S-03] `shell=True` in subprocess calls throughout the integration runner

File: integration/run.py : Line 83; integration/commands.py : Line 167
Description: subprocess.run(command, shell=True, ...) and subprocess.Popen(command, shell=True, ...) are used to execute shell commands. While the commands originate from JSON config files and hardcoded strings (not direct user input), shell=True is a latent risk if the input sources ever expand. Additionally, it requires the shell to parse quoting/escaping, which is where solutions.md entry #8 bug came from.
Suggestion: Consider wrapping known fixed commands (e.g., pnpm test, uv run pytest) as argument lists and using shell=False. For variable commands from config files, document the shell injection risk explicitly in integration/README.md.

[S-04] Verify that `unhappy_settle_revert` uses a unique output path

File: e2e/scenarios/unhappy_settle_revert/config.json
Description: unhappy_settle_insufficient writes to /tmp/x402-e2e-unhappy-settle-response.json. This reviewer did not read unhappy_settle_revert/config.json — if it also writes to the same path, a failure of one scenario could leave stale data that causes the other to produce a false pass or a misleading diff. The verify-scenario group already uses unique paths (-verify-response.json, -verify-expired-response.json, etc.), so this may be already handled.
Suggestion: Verify that unhappy_settle_revert uses a distinct output path (e.g., /tmp/x402-e2e-unhappy-settle-revert-response.json) separate from unhappy_settle_insufficient's /tmp/x402-e2e-unhappy-settle-response.json.

Positive Observations

Area	Observation
`docs/solutions.md`	Exceptional knowledge base. 10 entries, each with consistent symptom/root-cause/fix/rule structure. Entries 1–3 directly map to documented production bugs with commit references — this is exactly the kind of tribal knowledge that prevents regressions.
Two-tier test design	Mock tier (no creds, ~30s) + testnet tier (nightly, real chain) is the right split. Testnet scenarios self-skip when secrets are absent rather than failing, which is the correct design for CI with optional credentials.
`@start_bg` robustness	The kill-before-restart logic (PID file + SIGTERM → SIGKILL escalation) and `start_new_session=True` process group isolation are solid process management. Most custom test harnesses get this wrong.
Declarative scenario format	Pure JSON `config.json` + `expected_*.json` with a generic runner is exactly right. Adding a scenario requires zero Python — copy a directory, edit JSON. This is a significant maintainability win.
`E2E_SKIP_ALLOWANCE` / `E2E_SKIP_TX_VERIFICATION` flags	Explicit environment flags for the two "can't run this offline" checks (allowance and tx verification) are well-designed. The overrides are narrowly scoped (no-chain RPC path only) and clearly documented.
Security defaults	`TRON_TEST_KEY` carries a prominent warning comment. `.env` is gitignored. GitHub Secrets are used for testnet credentials. The nightly workflow adds `github.repository_owner == 'BofAI'` guard to prevent fork-secret leakage.
`asyncRewake` hook integration	The Claude Code hook uses `asyncRewake: true` + `rewakeMessage` to surface test failures back to the model. This closes the feedback loop for AI-driven edits cleanly.
`_AllowanceSkippingEvmSigner` / `_OfflineTronSigner` subclass pattern	Keeping offline overrides as subclasses of the real signers (rather than duck-typed mocks) ensures the e2e client exercises the full SDK code path for everything except the chain RPC calls.
`run_all.sh` continues past failures	The outer script collects failed scenarios and reports them all rather than aborting at the first one. This gives complete CI feedback in a single run.
`integration/run.py` is pure	`run_step` and `format_report` are side-effect-free functions that take inputs and return outputs. State flows through return values. This makes the runner easy to test in isolation.

Checklist Results

Category	Items Checked	Pass	Fail	N/A	Notes
Correctness	8	6	2	0	MJ-02 (KeyError), MN-04 (port reuse)
Security	8	6	1	1	MJ-01 (silent hook bypass), S-03 (shell=True)
Performance	5	4	0	1	No DB/unbounded ops in changed code
Code Quality	8	6	2	0	MN-01 (eval), MN-03 (module-level side effects)
Testing	6	4	1	1	MN-05 (missing READMEs)
Documentation	5	4	1	0	MN-05 (per-scenario READMEs)
Compatibility	4	4	0	0	No breaking changes; specs/ rename is additive
Observability	4	4	0	0	JSONL log in mock facilitator; hook logs in `.claude/hooks/logs/`

Disclaimer

This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between main and feat/x402-ai. Runtime behavior, integration testing, and deployment impact are not covered. No production payment-path code (signing, verification, settlement) was changed in this PR; security findings are scoped to the tooling and test infrastructure introduced here.

Report generated by Code Review Skill v1.0.0
Date: 2026-04-24

bobo and others added 11 commits April 22, 2026 12:56

Will-Guan approved these changes Apr 24, 2026

View reviewed changes

Will-Guan merged commit f2518cc into main Apr 24, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ai): AI-native development scaffolding + e2e testnet verification#69

feat(ai): AI-native development scaffolding + e2e testnet verification#69
Will-Guan merged 13 commits intomainfrom
feat/x402-ai

boboliu-1010 commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

boboliu-1010 commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

boboliu-1010 commented Apr 24, 2026

Summary

Verified on real chain (2026-04-24)

Regression

Scope boundary

Migration notes for reviewers

Known follow-ups (not in this PR)

Commits (13, conventional)

Test plan

Uh oh!

github-actions Bot commented Apr 24, 2026

Code Review Report

PR Overview

Branch Information

Commit History

Review Summary

Verdict

Findings at a Glance

Summary

Change Summary

1. AI-Native Scaffolding (.claude/)

2. Integration Step Runner (integration/)

3. E2E Test Harness (e2e/)

4. Example Facilitator (examples/facilitator/)

5. Docs, Specs Reorganisation, CI

Detailed Findings

Critical

[C-01] Hardcoded Developer Absolute Path in PostToolUse Hook

Major

[MJ-01] File Descriptor Leak in @start_bg Builtin

[MJ-02] Module-Level Monkey-Patch Applied at Import Time in Resource Server

[MJ-03] Unhandled Exceptions in _env_base_fee() Crash the Facilitator at Startup

Minor

[MN-01] PostToolUse Hook Mode Names Documented Incorrectly in e2e/CLAUDE.md

[MN-02] _resolve_mode Raises ValueError Inside FastAPI Handler — Unhandled 500

[MN-03] Latent Bug: exact Scheme Registered with EVM Mechanism for Any Network Prefix

[MN-04] _register_client_mechanisms Type Annotation Accepts EVM-Only Signer for TRON Paths

[MN-05] stop_bg TOCTOU Race on os.getpgid / os.killpg

Suggestions

[S-01] Strengthen TRON_TEST_KEY Warning Comment

[S-02] run_all.sh Does Not Surface Partial Service Leak on Outer Script Crash

Positive Observations

Checklist Results

Disclaimer

Uh oh!

boboliu-1010 commented Apr 24, 2026

Uh oh!

github-actions Bot commented Apr 24, 2026

Code Review Report

PR Overview

Branch Information

Commit History

Review Summary

Verdict

Findings at a Glance

Summary

Change Summary

1. AI Scaffolding (.claude/)

2. E2E Test Harness (e2e/ + integration/)

3. Reference Facilitator (examples/facilitator/)

4. CI Workflows

5. Documentation and Configuration

Detailed Findings

Major

[MJ-01] Missing input validation on /verify and /settle in the reference facilitator

[MJ-02] E2E dependency installation bypasses the project lockfile

Minor

[MN-01] ** glob in bash case patterns is misleading

[MN-02] Mock GasFree provider and user address are identical

[MN-03] Module-level monkey-patch in resource server has process-wide scope

[MN-04] Parallel arrays in run_testnet.sh lack a length guard

[MN-05] print() used for startup error in example facilitator (violates library convention)

Suggestions

[S-01] Add trap-based cleanup to run_all.sh

[S-02] Use distinct mock nonce value in GasFree address response

[S-03] Validate url scheme in wait_http builtin

[S-04] Surface GasFree API mock mode in supported response

Positive Observations

Checklist Results

1. AI-Native Scaffolding (`.claude/`)

2. Integration Step Runner (`integration/`)

3. E2E Test Harness (`e2e/`)

4. Example Facilitator (`examples/facilitator/`)

[MJ-01] File Descriptor Leak in `@start_bg` Builtin

[MJ-03] Unhandled Exceptions in `_env_base_fee()` Crash the Facilitator at Startup

[MN-01] PostToolUse Hook Mode Names Documented Incorrectly in `e2e/CLAUDE.md`

[MN-02] `_resolve_mode` Raises `ValueError` Inside FastAPI Handler — Unhandled 500

[MN-03] Latent Bug: `exact` Scheme Registered with EVM Mechanism for Any Network Prefix

[MN-04] `_register_client_mechanisms` Type Annotation Accepts EVM-Only Signer for TRON Paths

[MN-05] `stop_bg` TOCTOU Race on `os.getpgid` / `os.killpg`

[S-01] Strengthen `TRON_TEST_KEY` Warning Comment

[S-02] `run_all.sh` Does Not Surface Partial Service Leak on Outer Script Crash

1. AI Scaffolding (`.claude/`)

2. E2E Test Harness (`e2e/` + `integration/`)

3. Reference Facilitator (`examples/facilitator/`)

[MJ-01] Missing input validation on `/verify` and `/settle` in the reference facilitator

[MN-01] `**` glob in bash `case` patterns is misleading

[MN-04] Parallel arrays in `run_testnet.sh` lack a length guard

[MN-05] `print()` used for startup error in example facilitator (violates library convention)

[S-01] Add `trap`-based cleanup to `run_all.sh`

[S-03] Validate `url` scheme in `wait_http` builtin

[S-04] Surface GasFree API mock mode in `supported` response

1. AI-native scaffolding (`.claude/`)

2. E2E test harness (`e2e/`, `integration/`)

3. Reference facilitator (`examples/facilitator/`)

4. CI workflows (`.github/workflows/`)

[MJ-02] Unhandled KeyError on malformed bodies in reference facilitator `/verify` and `/settle`

[MN-01] `eval` used for test command execution in hook script

[MN-02] `MOCK_GASFREE_PROVIDER` and `MOCK_GASFREE_ADDRESS` are the same address

[MN-03] Module-level `build_app()` call with conditional side effects at import time

[MN-04] `run_all.sh` does not continue running remaining scenarios after a mid-run failure

[MN-05] Missing per-scenario `README.md` files — convention violated for 10 of 11 directories

[S-02] `@json_match` spec format cannot express `~` in a plain `=` expected value

[S-03] `shell=True` in subprocess calls throughout the integration runner

[S-04] Verify that `unhappy_settle_revert` uses a unique output path