feat(ai): AI-native development scaffolding + e2e testnet verification#69
feat(ai): AI-native development scaffolding + e2e testnet verification#69
Conversation
Consolidates the P1-P4 roadmap for turning x402 into a Claude-Code- native, agent-native payment platform. References Coinbase x402, sun-protocol/apollo-arena, and BofAI/skills as prior art; enumerates what already exists on main, what already exists in sibling repos, and what actually needs to be built (.claude/ scaffolding, MCP transport, declarative e2e scenarios).
…runner
Land the first phase of the AI transformation (see
docs/design/ai-transformation.md).
What this adds:
- CLAUDE.md (root): component index + key reading order + conventions
+ AI-native layout pointers + safety rules
- .claude/rules/common/conventions.md: repo-wide invariants (addressing,
encoding, payment ID, pipeline step ordering, commit message format)
- .claude/rules/schemes/{exact,exact-permit,exact-gasfree}.md: per-scheme
rules of thumb with gotchas cross-referenced to solutions.md
- .claude/rules/networks/{tron,evm}.md: network identifiers, signing rules,
contract registry pointers
- .claude/commands/x402/{compound,create-scenario}.md: two wizards
(compound implements the /x402:compound referenced by solutions.md;
create-scenario scaffolds e2e scenarios)
- .claude/agents/{code-reviewer,security-reviewer,scheme-author}.md:
x402-flavored subagent definitions
- integration/: generic Python step runner (no external deps)
with @json_diff built-in, JSON config format, Markdown report output.
Clean-room implementation; pattern inspired by sun-protocol/apollo-arena.
- integration/configs/smoke.json: self-test verified passing
- e2e/scenarios/exact_permit_happy_path/: canonical scenario template;
runnable once facilitator --mock mode lands in P2
Smoke test:
python3 integration/run.py --config integration/configs/smoke.json
→ 2/2 ✓, exit 0
Non-goals for this phase (deferred to P2+):
- MCP transport packages (mcp-ts, mcp-py)
- Matrix e2e harness
- mock-chainrpc and mock-facilitator
- Per-package CLAUDE.md files
Adds a full e2e test harness — mock facilitator, fixture resource
server, test client, and 8 declarative scenarios covering happy +
unhappy paths for all three schemes.
Scenarios (all green, ~30s total, fully deterministic, no chain RPC):
- exact_happy_path ERC-3009 on BSC testnet
- exact_permit_happy_path EIP-2612 on BSC testnet (allowance skipped)
- exact_gasfree_happy_path TIP-712 on TRON Nile (GasFree API mocked)
- unhappy_verify_invalid_sig / _expired / _network_mismatch
- unhappy_settle_insufficient / _revert
Key design choices:
- Live signing, mock verify — client signs real payloads; the mock
facilitator echoes success/failure based on its mode.
- One mock process serves both facilitator and GasFree API roles on
port 4020 — no second service needed.
- Test-only signer subclasses (_AllowanceSkippingEvmSigner,
_OfflineTronSigner) bypass on-chain balance/allowance checks
without touching SDK code.
- Resource server monkey-patches get_verifier_for_network when
E2E_SKIP_TX_VERIFICATION=1 so the TRON scenario doesn't try to
look up the fake tx hash on a live node.
Also:
- integration/commands.py: add @start_bg / @wait_http / @stop_bg /
@stop_all_bg builtins so scenarios can manage background services.
- .github/workflows/check_e2e.yml: runs run_all.sh on every PR.
- docs/design/ai-transformation.md: refresh §6.2 scenario table with
status column; mark P1 immediate-steps items done.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extends the e2e harness with a second tier that settles real on-chain
transactions on BSC testnet and TRON Nile, alongside the existing mock
suite. Scenarios self-skip when their env vars are blank, so the mock
path remains the default (and the CI default on PR).
New pieces:
examples/facilitator/ FastAPI server wrapping X402Facilitator.
Registers EVM/TRON mechanisms based on
which *_PRIVATE_KEY env vars are set.
Run with: python -m examples.facilitator
.env.example Template listing every testnet variable
(RPC URLs, signing keys, pay-to addresses,
GasFree API URL, price specs). .env is
already gitignored.
e2e/scenarios/testnet/ exact_testnet, exact_permit_testnet,
exact_gasfree_testnet — each uses the
example facilitator + real chain, with
@json_match assertions (tx hash is
dynamic so @json_diff won't work).
e2e/scenarios/run_testnet.sh Driver that checks required env per
scenario and skips when blank.
.github/workflows/nightly_testnet.yml Scheduled + workflow_dispatch
runner consuming GitHub Secrets.
Harness additions:
integration/commands.py @json_match New fuzzy assertion builtin:
@json_match <file> key.path=literal key.path~regex
Supports JSON literals (true/false/null/numbers) and regex for
dynamic fields like tx hashes.
integration/commands.py @start_bg Now expands ${VAR} before quoting so
scenario configs can template from env (needed to thread secrets
through nested `env KEY=${SECRET} python -m ...` invocations).
Mock suite (8 scenarios) remains the default and all stay green. The
testnet suite is a superset — same test_client.py and resource_server.py
drive both. Developers fill in .env to run locally; CI uses Secrets.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + test hook - Component CLAUDE.md in python/x402, typescript/, typescript/packages/x402, e2e/, examples/, docs/, specs/ — local build/test commands and conventions. - .claude/rules/CLAUDE.md index + python/testing/typescript rule files, so Claude loads the right rule layer per touched path. - .claude/settings.json + hooks/run-tests-for-change.sh: PostToolUse hook dispatches to pytest / vitest / e2e run_all.sh by path prefix; docs/config edits skip. Dropped --timeout=60 (pytest-timeout not a dep; outer 300s hook timeout already covers). - Root CLAUDE.md: refresh the rules table and add component-CLAUDE.md pointer row. Scope: docs + .claude/ only, per the AI-transformation scope. MCP package and tron-contribution/ stay out of this commit. Verified: 217 pytest + 51 vitest + 8 e2e scenarios all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevent macOS artefacts and per-session hook run logs from showing up in status / PRs. Logs are regenerated by run-tests-for-change.sh each time Claude edits code; they have no cross-session value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs surfaced when actually running ./e2e/scenarios/run_testnet.sh with
populated .env:
1. ${VAR:-default} shell-style default expansion isn't handled by the
integration runner — commands.py:151 uses os.path.expandvars(), which
only recognizes $VAR / ${VAR} and leaves ${VAR:-x} untouched. The
literal string then gets shlex-split on the space, breaking env
parsing. Drop the :-default fallback; run_testnet.sh already checks
required vars before invoking each scenario.
2. Transaction-hash assertion regex required a 0x prefix, but the SDK
returns bare 64-hex hashes on BSC. Loosen to ^(0x)?[0-9a-fA-F]{64}$,
matching the gasfree scenario's existing pattern.
Verified: exact_testnet now settles a real BSC Testnet tx and passes end
to end (7 steps, ~11s). exact_permit_testnet progresses past config to
reveal a separate ecosystem issue (BSC testnet USDT doesn't implement
EIP-2612 permit — tracked as a follow-up).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- solutions.md #7: BSC testnet has no EIP-2612 permit-capable token. Captures why exact_permit_testnet returns 404 and lists the options (Sepolia / custom test token / rely on unit suites), so the next person doesn't rediscover it. - solutions.md #8: integration runner doesn't expand ${VAR:-default}. Captures the os.path.expandvars limitation that caused the testnet scenarios to silently break on a literal fallback string. - testnet/README.md: add "Current status" table with today's verified BSC tx hash as evidence, and a "Gotchas" section linking back to solutions #7 / #8, the tx-hash regex, and the log locations. No code change — just freezing today's testnet run state so future attempts start from known ground. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pecs/
Upstream x402-foundation/x402 keeps all protocol specs under a single
top-level specs/ dir. We had a redundant docs/specs/ shadowing what
belongs at specs/, which made "which one is authoritative" a coin flip
and drifted us further from upstream every time we edited.
Moves:
- docs/specs/{protocol,roles,config}.md → specs/*.md
- docs/specs/{exact,exact-permit,exact-gasfree}.md → specs/schemes/*.md
specs/ now hosts both the authoritative protocol specs (top-level .md
+ schemes/) and the in-flight feature dirs (001-exact-v2-compat/, ...),
matching upstream's schemes/ + transports-v*/ shape.
Updated 46 cross-references across CLAUDE.md tree, .claude/rules/,
.claude/agents/, docs/, and typescript/packages/x402/.
docs/ is now just user-facing: solutions.md (hard-won knowledge) and
design/ (in-flight design docs). Not authoritative; defers to specs/.
Verified: 217 pytest + 51 vitest + 8 e2e scenarios still green after
the rename — nothing in the code referenced the old paths.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anisms Permit-family facilitator mechanisms (EVM exact_permit, TRON exact_permit, TRON exact_gasfree) all inherit BaseExactPermitFacilitatorMechanism which requires a base_fee: dict[str, int] allow-list at construction time. examples/facilitator/server.py was constructing them without base_fee, so every token in every registry was reported Unsupported and every testnet run of the permit/gasfree tiers silently returned HTTP 404. - server.py now reads FACILITATOR_BASE_FEE as a JSON dict and passes it to all three permit/gasfree mechanism constructors; logs a clear warning when a permit/gasfree scheme is registered without it. - testnet scenario configs now forward FACILITATOR_BASE_FEE into the facilitator subprocess. - .env.example / README updated. - docs/solutions.md #7 corrected — the earlier entry blaming BSC testnet USDT for not implementing EIP-2612 was a misdiagnosis; USDT does implement permit. The real cause was the missing base_fee config. Verified live 2026-04-24: - exact_permit_testnet: BSC tx 0xfc8b32decb99d02cdfc684d3f6f1c7c0a91c8b0ff1f632f98d86d9f334198a23 (USDT permit + transferFrom, block 103475005) - exact_gasfree_testnet: fee_quote / verify / settle all traverse; final settle reports insufficient balance in the derived GasFree custodial wallet when it isn't funded — structural path end-to-end works. Regression: 217 pytest + 51 vitest + 8 mock e2e all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three testnet scenarios now verified end-to-end on real chain: - exact_testnet — BSC tx 461d17b...159a0 - exact_permit_testnet — BSC tx 0xfc8b32d...98a23 - exact_gasfree_testnet — TRON Nile tx 1d77f242b72293116e65c46b5ad756dd2f8355ebc625078aec0eb4ea54d148d2 Two new solutions entries from closing the gasfree loop: #9 GasFree gasFreeAddress is per-query, not absolute. The API returns a fresh mapping for every queried address; our SDK queries with the main wallet and writes that layer's gasFreeAddress into the payload. If you deposit to a gasFreeAddress obtained by querying any *other* address (e.g. one resolved from a second hop), the balance won't register. Always deposit to api_client.get_address_info(main_wallet)["gasFreeAddress"]. #10 TRON multisig / witness accounts reject single-key signatures under the default permission_id=0 (owner). If owner_permission.threshold > 1, sign under an active permission via .permission_id(2) — confirmed on the Nile witness account used in our testnet .env. tronpy default path would otherwise return "Validate signature error: sig error" with no hint at the permission cause. Testnet README status table updated: all three rows ✅ verified. Regression: 217 pytest + 51 vitest + 8 mock e2e still green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review ReportProject: x402 (BankofAI SDK) PR OverviewBranch Information
Commit History
Review SummaryVerdict
Findings at a Glance
SummaryThis PR introduces the full P1 AI-native scaffolding for the x402 repository: a The overall architecture is well-considered. The JSON-driven scenario runner, the dual mock/testnet tier split, the GasFree API mock colocation, and the However, one defect is blocking: the PostToolUse hook contains a hardcoded absolute developer path ( Change Summary1. AI-Native Scaffolding (
|
| File | Change Type | Description |
|---|---|---|
CLAUDE.md |
Added | Root-level repo guide for Claude Code |
.claude/rules/CLAUDE.md |
Added | Rules knowledge-base index and authoring guide |
.claude/rules/common/conventions.md |
Added | Repo-wide conventions (addressing, amounts, headers, pipeline) |
.claude/rules/networks/evm.md |
Added | EVM network conventions (signing, contracts, smoke tests) |
.claude/rules/networks/tron.md |
Added | TRON network conventions (TIP-712, GasFree, RPC defaults) |
.claude/rules/schemes/exact.md |
Added | exact scheme invariants and gotchas |
.claude/rules/schemes/exact-permit.md |
Added | exact_permit scheme invariants |
.claude/rules/schemes/exact-gasfree.md |
Added | exact_gasfree scheme invariants (TRON-only) |
.claude/rules/python/conventions.md |
Added | Python 3.11+ tooling and idiom rules |
.claude/rules/typescript/conventions.md |
Added | Node 20+/pnpm/ESM/viem conventions |
.claude/rules/testing/scenarios.md |
Added | e2e scenario authoring conventions |
.claude/agents/code-reviewer.md |
Added | x402-flavored code-review agent definition |
.claude/agents/security-reviewer.md |
Added | Security-focused review agent definition |
.claude/agents/scheme-author.md |
Added | Scheme-authoring wizard agent definition |
.claude/commands/x402/compound.md |
Added | /x402:compound slash-command wizard |
.claude/commands/x402/create-scenario.md |
Added | /x402:create-scenario wizard |
.claude/hooks/run-tests-for-change.sh |
Added | PostToolUse bash hook dispatching pytest/vitest/e2e |
.claude/settings.json |
Added | Hook wiring with asyncRewake |
Purpose: Establishes Claude Code as a first-class participant in the development workflow by providing scoped rule sheets, agent personas, slash-command wizards, and an automated test hook.
2. Integration Step Runner (integration/)
| File | Change Type | Description |
|---|---|---|
integration/run.py |
Added | Generic JSON-driven sequential step executor with Markdown reporter |
integration/commands.py |
Added | @builtin registry: @json_diff, @json_match, @start_bg, @stop_bg, @stop_all_bg, @wait_http |
integration/README.md |
Added | CLI reference |
integration/CLAUDE.md |
Added | Component guide |
integration/configs/smoke.json |
Added | Self-test config |
Purpose: Provides the infrastructure that e2e scenario configs (config.json) are executed against — a generic, dependency-free Python 3.9+ runner.
3. E2E Test Harness (e2e/)
| File | Change Type | Description |
|---|---|---|
e2e/mock_facilitator/app.py |
Added | FastAPI mock serving facilitator + GasFree API with mode-driven behavior |
e2e/mock_facilitator/__main__.py |
Added | CLI entry point for mock facilitator |
e2e/clients/test_client.py |
Added | Real-signature EVM + TRON e2e client |
e2e/servers/resource_server.py |
Added | FastAPI resource server using @x402_protected |
e2e/scenarios/*/config.json |
Added | 8 mock scenarios + 3 testnet scenarios |
e2e/scenarios/*/expected_*.json |
Added | Diff-asserted expected responses |
e2e/scenarios/run_all.sh |
Added | Mock suite CI entry point |
e2e/scenarios/run_testnet.sh |
Added | Testnet suite with credential-gating |
e2e/README.md, e2e/CLAUDE.md |
Added | Documentation |
Purpose: End-to-end validation across EVM (BSC) and TRON (Nile) for all three schemes and five failure modes, runnable offline (mock tier) and on real chains (testnet tier).
4. Example Facilitator (examples/facilitator/)
| File | Change Type | Description |
|---|---|---|
examples/facilitator/server.py |
Added | Reference FastAPI facilitator wiring EVM + TRON mechanisms |
examples/facilitator/__main__.py |
Added | CLI entry point |
examples/CLAUDE.md |
Added | Component guide |
Purpose: Provides a runnable reference facilitator used by the testnet e2e tier and as a canonical implementation example.
5. Docs, Specs Reorganisation, CI
| File | Change Type | Description |
|---|---|---|
specs/ (renamed from docs/specs/) |
Renamed | Mirrors upstream x402-foundation/x402/specs/ layout |
docs/design/ai-transformation.md |
Added | AI-native development plan |
docs/solutions.md |
Added | Incident-driven bug-avoidance checklist |
.env.example |
Added | Template for testnet credentials |
.github/workflows/check_e2e.yml |
Added | PR CI: mock e2e suite |
.github/workflows/nightly_testnet.yml |
Added | Nightly: real-chain testnet scenarios |
.gitignore |
Modified | Adds .DS_Store, .claude/hooks/logs/ |
Detailed Findings
Critical
[C-01] Hardcoded Developer Absolute Path in PostToolUse Hook
| Property | Value |
|---|---|
| Severity | Critical |
| Category | Correctness |
| File | .claude/hooks/run-tests-for-change.sh : Line 20 |
Description
The hook hard-codes an absolute path to the original developer's local checkout:
REPO_ROOT="/Users/bobo/code/x402/x402"All subsequent path logic depends on this variable. On any other machine, FILE_PATH#$REPO_ROOT/ will not strip the repo prefix, leaving REL equal to the full absolute path. Every case match will fail, and the hook will silently exit 0 (the "*" catch-all logs "skip (no rule for …)" and exits 0). The asyncRewake mechanism will therefore never fire for anyone other than the original author. The entire test automation promised by the hook is non-functional on CI and all other developer machines.
Code
REPO_ROOT="/Users/bobo/code/x402/x402"
LOG_DIR="$REPO_ROOT/.claude/hooks/logs"
...
REL="${FILE_PATH#$REPO_ROOT/}"Recommendation
Derive REPO_ROOT dynamically from the script's own location (which Claude Code passes as a resolved absolute path via the hook input):
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"This makes the hook portable across machines, containers, and CI workers. The LOG_DIR derivation then follows naturally.
Major
[MJ-01] File Descriptor Leak in @start_bg Builtin
| Property | Value |
|---|---|
| Severity | Major |
| Category | Correctness |
| File | integration/commands.py : Lines 164–173 |
Description
log_fd is opened for writing and passed as the stdout of the background subprocess, but the parent process never closes it after Popen returns. The parent retains an open file descriptor for the duration of the step runner process. For suites with many scenarios and frequent restarts, this leaks one file descriptor per @start_bg call. On Linux the default RLIMIT_NOFILE is 1024; large test suites could exhaust this.
Code
log_fd = open(log_path, "w", encoding="utf-8") # opened, never closed in parent
proc = subprocess.Popen(
command,
shell=True,
cwd=workdir,
stdout=log_fd,
stderr=subprocess.STDOUT,
start_new_session=True,
)
pid_file.write_text(str(proc.pid))
return (0, f"started tag={tag} pid={proc.pid} log={log_path}", "")
# log_fd is never closed hereRecommendation
with open(log_path, "w", encoding="utf-8") as log_fd:
proc = subprocess.Popen(
command,
shell=True,
cwd=workdir,
stdout=log_fd,
stderr=subprocess.STDOUT,
start_new_session=True,
)
# After 'with' exits, the parent's copy of the fd is closed.
# The child inherits its own copy and keeps writing.
pid_file.write_text(str(proc.pid))
return (0, f"started tag={tag} pid={proc.pid} log={log_path}", "")[MJ-02] Module-Level Monkey-Patch Applied at Import Time in Resource Server
| Property | Value |
|---|---|
| Severity | Major |
| Category | Correctness / Quality |
| File | e2e/servers/resource_server.py : Lines 35–53 |
Description
_install_noop_tx_verifier_if_requested() is called unconditionally at module-import time (line 53), before build_app(). It replaces tx_verification.get_verifier_for_network with a function that always raises ValueError, globally disabling on-chain transaction verification for the entire Python process whenever E2E_SKIP_TX_VERIFICATION=1 is set.
Two concrete problems:
- Silent global mutation: any other code co-located in the same
uvicornworker (including future middleware) that callsget_verifier_for_networkwill also see the monkey-patched version, even if it wasn't intended to skip verification. - Import-time side effects: if this module is ever imported in a test context (unit tests, introspection tools) with the env var set, verification is silently disabled for the entire test run.
app = build_app() on line 140 also runs all mechanism registrations and environment variable reads at import time — any misconfiguration raises an exception that surfaces as an import error rather than a startup error, which is harder to diagnose.
Code
# line 53 — runs unconditionally at import time
_install_noop_tx_verifier_if_requested()
# line 140 — also runs at import time
app = build_app()Recommendation
Move the monkey-patch inside build_app() (or inside main()), and gate it explicitly:
def build_app() -> FastAPI:
# Apply e2e overrides only when explicitly requested, and only during app construction
if os.environ.get("E2E_SKIP_TX_VERIFICATION") == "1":
_install_noop_tx_verifier_if_requested()
...The app = build_app() module-level call is unavoidable for uvicorn's "module:app" string import — that pattern is standard — but the monkey-patch should be deferred to inside build_app() rather than before it.
[MJ-03] Unhandled Exceptions in _env_base_fee() Crash the Facilitator at Startup
| Property | Value |
|---|---|
| Severity | Major |
| Category | Correctness |
| File | examples/facilitator/server.py : Lines 69–77 |
Description
_env_base_fee() parses FACILITATOR_BASE_FEE with json.loads() and casts values via int(). Neither call is guarded:
- A malformed JSON string raises
json.JSONDecodeError. - A non-integer value raises
ValueError.
Both propagate through _register_evm() and _register_tron() into build_app(), which runs at module import time. The crash surfaces as an ImportError / traceback from uvicorn with no actionable error message, making misconfiguration very hard to diagnose in production.
Code
def _env_base_fee() -> dict[str, int] | None:
raw = os.environ.get("FACILITATOR_BASE_FEE")
if not raw:
return None
import json
parsed = json.loads(raw) # unguarded — JSONDecodeError
return {str(k).upper(): int(v) for k, v in parsed.items()} # unguarded — ValueErrorRecommendation
def _env_base_fee() -> dict[str, int] | None:
raw = os.environ.get("FACILITATOR_BASE_FEE")
if not raw:
return None
import json
try:
parsed = json.loads(raw)
return {str(k).upper(): int(v) for k, v in parsed.items()}
except (json.JSONDecodeError, ValueError, TypeError) as exc:
print(
f"error: FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}",
file=sys.stderr,
)
sys.exit(2)Minor
[MN-01] PostToolUse Hook Mode Names Documented Incorrectly in e2e/CLAUDE.md
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Documentation |
| File | e2e/CLAUDE.md : Gotchas section; e2e/mock_facilitator/__main__.py : Lines 20–28 |
Description
e2e/CLAUDE.md documents the mock facilitator modes as:
--mode {success, invalid_sig, deadline_expired, network_mismatch, insufficient_balance, settle_reverted}
The actual CLI choices in __main__.py and _VALID_MODES in app.py are:
fail_verify_invalid_sigfail_verify_expiredfail_verify_network_mismatchfail_settle_insufficientfail_settle_revert
Every name in the documentation is wrong. A developer following the docs and passing --mode invalid_sig will get an argparse error. The mismatch will confuse anyone adding a new scenario.
Recommendation
Update e2e/CLAUDE.md to reflect the actual mode values, or adopt the shorter documented names as the actual CLI choices (which are more readable). The latter is preferable.
[MN-02] _resolve_mode Raises ValueError Inside FastAPI Handler — Unhandled 500
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Correctness |
| File | e2e/mock_facilitator/app.py : Lines 72–77 |
Description
When an invalid ?mode= query parameter is passed to /verify or /settle, _resolve_mode raises ValueError. FastAPI converts this to an HTTP 500 with a stack trace, rather than a 422 or 400 validation error. For a test tool used interactively via curl or a scenario, this makes debugging harder.
Code
def _resolve_mode(mode_query: str | None) -> str:
if mode_query:
if mode_query not in _VALID_MODES:
raise ValueError(f"invalid mode: {mode_query}") # becomes HTTP 500Recommendation
from fastapi import HTTPException
def _resolve_mode(mode_query: str | None) -> str:
if mode_query:
if mode_query not in _VALID_MODES:
raise HTTPException(
status_code=422,
detail=f"invalid mode '{mode_query}'. valid: {sorted(_VALID_MODES)}",
)
return _default_mode()[MN-03] Latent Bug: exact Scheme Registered with EVM Mechanism for Any Network Prefix
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Correctness |
| File | e2e/clients/test_client.py : Lines 157–162 |
Description
The _register_client_mechanisms function registers ExactEvmClientMechanism for the exact scheme regardless of the network prefix:
if scheme == "exact":
from bankofai.x402.mechanisms.evm.exact import ExactEvmClientMechanism
x402.register(network, ExactEvmClientMechanism(signer))If a future scenario sets E2E_NETWORK=tron:nile and E2E_SCHEMES=exact, the EVM mechanism will be silently registered for a TRON network. The payment attempt will fail at signing with an opaque error. The exact_permit branch below it correctly checks the network prefix (startswith("eip155:") vs startswith("tron:")). The exact branch should mirror that guard.
Recommendation
Add a network guard consistent with the exact_permit pattern:
if scheme == "exact":
if not network.startswith("eip155:"):
raise SystemExit(f"scheme 'exact' (ERC-3009) requires eip155:* network, got {network}")
from bankofai.x402.mechanisms.evm.exact import ExactEvmClientMechanism
x402.register(network, ExactEvmClientMechanism(signer))[MN-04] _register_client_mechanisms Type Annotation Accepts EVM-Only Signer for TRON Paths
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Code Quality |
| File | e2e/clients/test_client.py : Lines 154–155 |
Description
The function signature types signer as EvmClientSigner:
def _register_client_mechanisms(
x402: X402Client, network: str, schemes: list[str], signer: EvmClientSigner
) -> None:But when network.startswith("tron:"), run() passes an _OfflineTronSigner (subclass of TronClientSigner), not EvmClientSigner. The type annotation is incorrect and will produce a mypy error in strict mode, contradicting the Python conventions rule that the module must stay mypy-green.
Recommendation
Use a union or a common protocol type:
from bankofai.x402.signers.client import EvmClientSigner, TronClientSigner
from typing import Union
def _register_client_mechanisms(
x402: X402Client,
network: str,
schemes: list[str],
signer: Union[EvmClientSigner, TronClientSigner],
) -> None:[MN-05] stop_bg TOCTOU Race on os.getpgid / os.killpg
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Correctness |
| File | integration/commands.py : Lines 229–234 |
Description
There is a race between reading the PID from the pid file, calling os.getpgid(pid), and calling os.killpg(...). If the process exits and its PID is reused by an unrelated process between these two calls, os.getpgid may succeed and os.killpg may signal the wrong process group. The ProcessLookupError guard catches the case where the PID is gone entirely, but not the reuse case.
This is an inherent limitation of PID-based process management. In a local e2e harness with short-lived services it is unlikely to trigger, but worth acknowledging.
Code
try:
os.killpg(os.getpgid(pid), signal.SIGTERM) # TOCTOU: pid may be reused
except ProcessLookupError:
...Recommendation
Since start_bg uses start_new_session=True, the subprocess is its own session leader and its PGID equals its PID. Replace os.getpgid(pid) with pid directly to avoid the extra syscall:
try:
os.killpg(pid, signal.SIGTERM) # PGID == PID for session leaders
except ProcessLookupError:
...This does not eliminate the TOCTOU window but removes an unnecessary syscall and makes the code's assumption explicit.
Suggestions
[S-01] Strengthen TRON_TEST_KEY Warning Comment
File: e2e/clients/test_client.py : Line 48
Description: The deterministic TRON test key 0x111...1 is all-ones — trivially derivable. While the existing comment ("Not secret, just a fixed 32-byte hex so scenarios are reproducible") is correct, a developer who copies this value into their .env for testnet testing could accidentally use it on mainnet. The Anvil key (ANVIL_KEY_0) is similarly well-known but has strong cultural context as "test-only". Add a more explicit warning.
Suggestion:
# Deterministic test TRON key — all-ones, trivially guessable.
# SAFE FOR LOCAL/MOCK E2E ONLY. Never use on mainnet or any real-value network.
TRON_TEST_KEY = "0x1111111111111111111111111111111111111111111111111111111111111111"[S-02] run_all.sh Does Not Surface Partial Service Leak on Outer Script Crash
File: e2e/scenarios/run_all.sh
Description: The outer loop collects failures without aborting early, which is correct. However, if the script itself is interrupted (Ctrl+C, SIGKILL from CI timeout), background services started by an in-progress scenario may be left running. Each scenario ends with @stop_all_bg, but that step is only reached if the scenario runner itself completes. An EXIT trap on the outer script would ensure cleanup:
Suggestion:
trap 'python3 -m integration.run --config /dev/null 2>/dev/null; pkill -f "e2e.mock_facilitator" 2>/dev/null || true' EXITOr more simply, rely on the integration runner's @stop_all_bg already being present in each scenario, and document that users should run @stop_all_bg manually if the suite is killed mid-run.
Positive Observations
| Area | Observation |
|---|---|
| E2E Architecture | JSON-only declarative scenarios with a generic step runner is the right abstraction. Zero custom Python per scenario keeps the harness maintainable and addition of new scenarios trivial. |
| Tier Separation | Mock tier (offline, fast, always-CI) vs. testnet tier (real-chain, nightly, credential-gated) is exactly correct. The skip-on-missing-env pattern in run_testnet.sh prevents CI noise on forks. |
| Mode-Driven Mock | Single mock facilitator supporting all failure modes via MOCK_FACILITATOR_MODE / ?mode= query param avoids proliferating bespoke mock endpoints. The _VALID_MODES set provides a clean invariant. |
| Secret Hygiene | .env.example with blank values, .env in .gitignore, CI wired to GitHub Secrets — no credentials in the diff. |
asyncRewake Hook |
Using asyncRewake: true in settings.json so Claude Code wakes when tests fail is a thoughtful productivity improvement over a blocking hook. |
@builtin Registry |
The @builtin decorator pattern in commands.py is clean — registration, dispatch, and self-documenting --list-builtins are all driven by a single decorator call. |
| GasFree Mock Colocation | Serving the GasFree API from the same mock facilitator process (on /api/v1/address/{user} and /api/v1/config/provider/all) removes the need for a third mock process per scenario. Elegant. |
| Fail-Fast with Full Reporting | integration/run.py correctly stops at the first failed step (fail-fast) while still writing a full Markdown report, and the outer run_all.sh collects all scenario failures before reporting. Both behaviors are appropriate for their layers. |
_coerce_response_payload |
The recursive coercion function in test_client.py ensures the output JSON is always serializable without json.JSONEncoder subclassing — simple and effective. |
| Rules Knowledge Base | The .claude/rules/ tree is well-structured: each rule file has a clear scope trigger, cites the incident that motivated it, and links back to docs/solutions.md and specs/. The authoring guidelines (rules/CLAUDE.md) make the system self-maintaining. |
Checklist Results
| Category | Items Checked | Pass | Fail | N/A | Notes |
|---|---|---|---|---|---|
| Correctness | 8 | 4 | 4 | 0 | C-01 (hook path), MJ-01 (fd leak), MJ-02 (monkey-patch), MN-03 (latent exact/TRON) |
| Security | 6 | 6 | 0 | 0 | No secrets in diff; no injection vectors; .env.example only |
| Performance | 5 | 5 | 0 | 0 | No N+1, no unbounded ops in changed code |
| Code Quality | 7 | 5 | 2 | 0 | MN-04 (type annotation), MJ-02 (import-time side effect) |
| Testing | 5 | 5 | 0 | 0 | 8 mock + 3 testnet scenarios covering happy + unhappy paths |
| Documentation | 4 | 3 | 1 | 0 | MN-01 (mode name mismatch in CLAUDE.md) |
| Compatibility | 3 | 3 | 0 | 0 | Specs rename is non-breaking; no wire-format changes |
| Observability | 4 | 4 | 0 | 0 | Request logging in mock facilitator; log artifact upload in CI |
Disclaimer
This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between main and feat/x402-ai. Runtime behavior, integration testing, and deployment impact are not covered. No SDK source code (under python/x402/src/ or typescript/packages/) was changed in this PR; all findings pertain exclusively to the new tooling and harness code added by this PR.
Report generated by Code Review Skill v1.0.0
Date: 2026-04-24
Resolves all blocking + major + minor findings from the code review on
feat/x402-ai; no SDK source changes, harness + hook + examples only.
Critical:
- C-01 hook hardcoded developer path → derive REPO_ROOT from
${BASH_SOURCE[0]}; makes .claude/hooks/run-tests-for-change.sh
portable across machines and CI.
Major:
- MJ-01 @start_bg fd leak → wrap log_fd in `with open(...)`; parent
closes its copy after Popen, child keeps writing via its inherited fd.
- MJ-02 resource_server module-level monkey-patch → move
_install_noop_tx_verifier_if_requested() call into build_app() so the
side-effect is scoped to app construction, not import time.
- MJ-03 unguarded json.loads in _env_base_fee() → catch
JSONDecodeError/ValueError/TypeError/AttributeError and exit 2 with a
clear stderr message; previously a bad FACILITATOR_BASE_FEE string
would crash the facilitator at import time with no actionable error.
Minor:
- MN-01 stale mode names in e2e/CLAUDE.md and
.claude/rules/testing/scenarios.md → updated to authoritative
_VALID_MODES set (fail_verify_*, fail_settle_*).
- MN-02 _resolve_mode raised ValueError → now raises FastAPI
HTTPException(422) with valid-mode list in detail.
- MN-03 `exact` scheme in test_client skipped network-prefix guard that
`exact_permit` had → add explicit eip155:* check so a stray tron:*
scenario fails loud instead of signing with the wrong mechanism.
- MN-04 _register_client_mechanisms signer type annotation → widen to
`EvmClientSigner | TronClientSigner`; hoist TronClientSigner to
module-level import.
- MN-05 stop_bg TOCTOU on os.getpgid(pid) → use pid directly; start_bg
sets start_new_session=True so pgid == pid for our children. Removes
a syscall and makes the invariant explicit.
Suggestion:
- S-01 TRON_TEST_KEY warning strengthened; now explicitly flags
"mainnet-unsafe, address publicly known, funds will be swept".
Not addressed:
- S-02 (outer run_all.sh EXIT trap) — each scenario already calls
@stop_all_bg; adding an outer-script trap creates CI-timeout edge
cases without clear benefit. Documented limitation acceptable.
Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios still green.
Hook portable-path smoke: running against /Users/.../x402/README.md now
correctly resolves REL=README.md and skips the docs path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Addressed all findings from the review in c4a547c (just pushed).
Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios still green. Hook portable-path smoke passes. |
Code Review ReportProject: x402 (BankofAI SDK) PR OverviewBranch Information
Commit History
Review SummaryVerdict
Findings at a Glance
SummaryThis PR establishes the AI-native developer infrastructure for the x402 repo. The additions are exclusively additive (zero lines deleted from The overall structure is well-thought-out. The integration runner is clean, the declarative scenario format is correctly implemented, the mock facilitator faithfully implements the x402 wire contract, and the shell scripts use appropriate Two issues require changes before merge. The reference facilitator ( Change Summary1. AI Scaffolding (
|
| File | Change Type | Description |
|---|---|---|
.claude/agents/code-reviewer.md |
Added | x402-flavored code review agent with wire-format hotspots |
.claude/agents/security-reviewer.md |
Added | Payment-path / signing review agent |
.claude/agents/scheme-author.md |
Added | Scheme spec + rule authoring agent |
.claude/commands/x402/compound.md |
Added | /x402:compound wizard for docs/solutions.md |
.claude/commands/x402/create-scenario.md |
Added | /x402:create-scenario wizard |
.claude/hooks/run-tests-for-change.sh |
Added | PostToolUse hook: run scoped test suite after file edit |
.claude/rules/** |
Added | Language, network, scheme, and testing rule sheets |
.claude/settings.json |
Added | Hook wiring for PostToolUse |
Purpose: Provides Claude Code with x402-domain knowledge, behavioral guardrails, and automated test feedback on every code edit.
2. E2E Test Harness (e2e/ + integration/)
| File | Change Type | Description |
|---|---|---|
integration/run.py |
Added | Generic JSON-driven step runner with Markdown report output |
integration/commands.py |
Added | @builtin registry: @start_bg, @stop_bg, @stop_all_bg, @wait_http, @json_diff, @json_match |
e2e/mock_facilitator/app.py |
Added | FastAPI mock facilitator with 6 injectable modes + GasFree API stubs |
e2e/mock_facilitator/__main__.py |
Added | CLI entry (python -m e2e.mock_facilitator) |
e2e/clients/test_client.py |
Added | Real-signature EVM + TRON client with offline signer variants |
e2e/servers/resource_server.py |
Added | @x402_protected FastAPI resource server configurable via E2E_* env vars |
e2e/scenarios/*/config.json |
Added | 8 mock scenarios (3 happy path + 5 unhappy) + 3 testnet scenarios |
e2e/scenarios/run_all.sh |
Added | Mock suite entry point (CI-triggered per PR) |
e2e/scenarios/run_testnet.sh |
Added | Testnet suite entry point with per-scenario skip logic |
Purpose: End-to-end verification of the full EVM + TRON payment flow without (mock tier) or with (testnet tier) real on-chain settlement.
3. Reference Facilitator (examples/facilitator/)
| File | Change Type | Description |
|---|---|---|
examples/facilitator/server.py |
Added | FastAPI binding over X402Facilitator with env-var-driven EVM + TRON registration |
examples/facilitator/__main__.py |
Added | python -m examples.facilitator entry |
Purpose: Canonical facilitator deployment pattern used by testnet e2e tier; also serves as adoption reference for SDK users.
4. CI Workflows
| File | Change Type | Description |
|---|---|---|
.github/workflows/check_e2e.yml |
Added | PR-gated mock e2e suite (no secrets required) |
.github/workflows/nightly_testnet.yml |
Added | Nightly testnet run, gated to BofAI org, uses GitHub Secrets |
5. Documentation and Configuration
| File | Change Type | Description |
|---|---|---|
CLAUDE.md |
Added | Repo-wide AI entrypoint with component map and safety rules |
{python,typescript,e2e,specs,...}/CLAUDE.md |
Added | Component-level AI context (build commands, idioms, non-goals) |
docs/design/ai-transformation.md |
Added | P1–P4 roadmap for Claude-Code-native adoption |
docs/solutions.md |
Added | Hard-won debugging knowledge base (3 entries: TRON hex, deadline bounds, balance source) |
specs/ |
Renamed | Spec files moved from docs/specs/ — no content change |
.env.example |
Added | Template for testnet credentials (git-ignored) |
.gitignore |
Modified | Added .DS_Store and .claude/hooks/logs/ |
Detailed Findings
Major
[MJ-01] Missing input validation on /verify and /settle in the reference facilitator
| Property | Value |
|---|---|
| Severity | Major |
| Category | Correctness / Code Quality |
| File | examples/facilitator/server.py : Lines 293–303 |
Description
The /verify and /settle endpoint handlers access body["paymentPayload"] and body["paymentRequirements"] with direct key subscripting. If a caller sends a body that omits either key (malformed client, misconfigured test harness, network truncation), Python raises KeyError, which FastAPI translates to an unhandled HTTP 500 — not the semantically correct 422/400. This is inconsistent with the same file's /fee/quote handler, which uses body.get("accepts", []) defensively.
Since examples/facilitator/server.py is the primary adoption reference for SDK integrators building their own facilitators, this pattern will be copied into production deployments.
Code
# examples/facilitator/server.py — lines 293-303
@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
payload = PaymentPayload(**body["paymentPayload"]) # KeyError → 500
requirements = PaymentRequirements(**body["paymentRequirements"]) # KeyError → 500
resp = await facilitator.verify(payload, requirements)
return JSONResponse(resp.model_dump(by_alias=True))
@app.post("/settle")
async def _settle(body: dict[str, Any]) -> JSONResponse:
payload = PaymentPayload(**body["paymentPayload"]) # KeyError → 500
requirements = PaymentRequirements(**body["paymentRequirements"]) # KeyError → 500
resp = await facilitator.settle(payload, requirements)
return JSONResponse(resp.model_dump(by_alias=True))Recommendation
from fastapi import HTTPException
@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
try:
payload = PaymentPayload(**body["paymentPayload"])
requirements = PaymentRequirements(**body["paymentRequirements"])
except KeyError as exc:
raise HTTPException(status_code=422, detail=f"missing field: {exc}") from exc
resp = await facilitator.verify(payload, requirements)
return JSONResponse(resp.model_dump(by_alias=True))
# Same pattern for _settle[MJ-02] E2E dependency installation bypasses the project lockfile
| Property | Value |
|---|---|
| Severity | Major |
| Category | Correctness / Compatibility |
| File | .github/workflows/check_e2e.yml : Lines 22–24 |
Description
uvicorn, httpx, and fastapi are installed via a bare uv pip install after uv sync, rather than being declared in the project's pyproject.toml dependency groups. This means:
- Their versions are not pinned in
uv.lock— a new release of any of these packages could break CI silently on the next run, with no diff traceable to a PR. - The workflow's caching step (
cache-dependency-glob: "python/x402/uv.lock") does not cover these packages, so a stale cache may hold the wrong versions. - Developers running
uv sync --group devlocally do not get these packages, causing local/CI divergence when runninge2e/scenarios/run_all.sh.
Code
# .github/workflows/check_e2e.yml — lines 22-24
- name: Install SDK in editable mode + e2e deps
working-directory: ./python/x402
run: |
uv sync --all-extras --dev
uv pip install uvicorn httpx fastapi # <-- outside the lockfileRecommendation
Add an e2e optional dependency group to python/x402/pyproject.toml:
[project.optional-dependencies]
e2e = ["uvicorn>=0.29", "httpx>=0.27", "fastapi>=0.111"]Then replace the workflow step with:
run: uv sync --extra e2e --devThis ensures versions are pinned in uv.lock, the cache is valid, and local developer environments match CI.
Minor
[MN-01] ** glob in bash case patterns is misleading
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Code Quality |
| File | .claude/hooks/run-tests-for-change.sh : Lines 57–62 |
Description
The hook dispatches to the right test suite using case "$REL" in patterns. It includes both *.py and **/*.py variants for each directory:
case "$REL" in
python/x402/src/*.py | python/x402/tests/*.py | \
python/x402/src/**/*.py | python/x402/tests/**/*.py)In bash case statements, * already matches any string including / (pattern matching semantics, not pathname expansion). So python/x402/src/*.py already matches python/x402/src/mechanisms/evm/exact.py. The **/*.py patterns are therefore redundant and may mislead future maintainers into believing bash case needs ** for recursive matching. While the script is functionally correct, the extra patterns add noise.
Recommendation
Remove the redundant ** variants to avoid confusion:
case "$REL" in
python/x402/src/*.py | python/x402/tests/*.py)
...
typescript/packages/*/src/*.ts | typescript/packages/*/src/*.tsx)
...[MN-02] Mock GasFree provider and user address are identical
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Testing |
| File | e2e/mock_facilitator/app.py : Lines 128–132 |
Description
MOCK_GASFREE_PROVIDER and MOCK_GASFREE_ADDRESS are set to the same TRON address. Production GasFree flows have distinct provider (relayer) and user custodial wallet addresses. Using identical addresses in the mock means that any code path in the SDK that validates provider != gasFreeAddress is never exercised.
Code
MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"
MOCK_GASFREE_ADDRESS = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC" # same addressRecommendation
Use two distinct valid TRON addresses:
MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC" # relayer / service provider
MOCK_GASFREE_ADDRESS = "TDqe43YvBVq6DQ96DgNJ9sXSPLJHr2Hk2d" # user's custodial wallet[MN-03] Module-level monkey-patch in resource server has process-wide scope
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Code Quality |
| File | e2e/servers/resource_server.py : Lines 43–50 |
Description
_install_noop_tx_verifier_if_requested() patches tx_verification.get_verifier_for_network at the module level (called unconditionally during build_app()). The comment acknowledges the side effect but describes it as "scoped to this entry point." In practice, it mutates a module-level function reference for the entire Python process lifetime — and in uvicorn's reload mode, build_app() may be called multiple times, re-applying the patch.
This is acceptable for a test fixture, but the scoping claim in the comment is incorrect and could mislead contributors who import this module in other contexts.
Code
# resource_server.py lines 43-50
def _install_noop_tx_verifier_if_requested() -> None:
if os.environ.get("E2E_SKIP_TX_VERIFICATION") != "1":
return
from bankofai.x402.utils import tx_verification
def _always_raise(network: str, rpc_url: str | None = None):
raise ValueError(f"e2e: tx verification disabled for {network}")
tx_verification.get_verifier_for_network = _always_raise # process-wide mutationRecommendation
Update the comment to accurately describe the scope:
# Replaces the module-level `get_verifier_for_network` function for the entire process.
# This is intentional for e2e scenarios: the mock facilitator's deterministic tx hash
# is not a real on-chain transaction, so verification must be skipped globally.
# NEVER import this module in production code.[MN-04] Parallel arrays in run_testnet.sh lack a length guard
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Code Quality |
| File | e2e/scenarios/run_testnet.sh : Lines 20–28 |
Description
The testnet runner uses two parallel bash arrays — names[] and required[] — indexed together. If a contributor adds a scenario to names[] without adding its required-vars entry to required[] (or vice versa), the wrong required-var list is silently applied to a scenario, potentially running a testnet scenario without credentials and generating a false "PASS" or a misleading error.
Code
declare -a names=(
"exact_testnet"
"exact_permit_testnet"
"exact_gasfree_testnet"
)
declare -a required=(
"BSC_TESTNET_FACILITATOR_KEY BSC_TESTNET_CLIENT_KEY BSC_TESTNET_PAY_TO"
"BSC_TESTNET_FACILITATOR_KEY BSC_TESTNET_CLIENT_KEY BSC_TESTNET_PAY_TO"
"TRON_NILE_FACILITATOR_KEY TRON_NILE_CLIENT_KEY TRON_NILE_PAY_TO GASFREE_NILE_API_URL"
)
# No assertion that ${#names[@]} == ${#required[@]}Recommendation
Add a guard immediately after the array declarations:
if [[ ${#names[@]} -ne ${#required[@]} ]]; then
echo "bug: names[] and required[] arrays are out of sync" >&2
exit 2
fi[MN-05] print() used for startup error in example facilitator (violates library convention)
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Code Quality |
| File | examples/facilitator/server.py : Lines 81–84 |
Description
_env_base_fee() uses print(..., file=sys.stderr) to report a parse error and then calls sys.exit(2). The Python conventions in .claude/rules/python/conventions.md state "No print in library code." While examples/facilitator/server.py is application code (not library code), _env_base_fee() is called during build_app(), which is also invoked at module import time (line 309: app = build_app()). Any downstream import of this module (e.g., for testing or inspection) would trigger sys.exit(2) on a bad env var, making it difficult to unit-test build_app() in isolation.
Code
# examples/facilitator/server.py lines 79-84
except (json.JSONDecodeError, ValueError, TypeError, AttributeError) as exc:
print(
f"error: FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}",
file=sys.stderr,
)
sys.exit(2)Recommendation
Raise a ValueError from _env_base_fee() and let main() (or build_app()) catch it and call sys.exit(2):
except (json.JSONDecodeError, ValueError, TypeError, AttributeError) as exc:
raise ValueError(
f"FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}"
) from excThen in build_app() (or main()):
try:
base_fee = _env_base_fee()
except ValueError as exc:
logging.getLogger("x402.example_facilitator").error("%s", exc)
sys.exit(2)Suggestions
[S-01] Add trap-based cleanup to run_all.sh
File: e2e/scenarios/run_all.sh
Description: When a scenario fails partway (e.g., the client step), integration/run.py's fail-fast exits before the final @stop_all_bg step runs. The next scenario's @start_bg does kill the previous service via the pid file, but if the two ports bind at exactly the same time during the handoff window, the new service fails to start. Adding a trap guarantees cleanup on any exit path.
Suggestion:
# At the top of run_all.sh, after ROOT_DIR is defined:
trap 'python3 -m integration.run --config /dev/stdin <<< '"'"'{"steps":[{"command":"@stop_all_bg","name":"trap cleanup"}]}'"'"'' EXITOr, simpler — call @stop_all_bg directly:
cleanup() { python3 -c "from integration.commands import stop_all_bg; stop_all_bg([], '.')"; }
trap cleanup EXIT[S-02] Use distinct mock nonce value in GasFree address response
File: e2e/mock_facilitator/app.py : Line 202
Description: The mock GasFree /api/v1/address/{user} response returns "nonce": 0. A nonce of 0 is valid for the first ever GasFree transaction but could mask bugs in replay-protection logic that compares nonces. Using a non-zero nonce (e.g., 42) would ensure that nonce-handling code paths are exercised in mock scenarios.
Suggestion: Change "nonce": 0 to "nonce": 42 in the mock response.
[S-03] Validate url scheme in wait_http builtin
File: integration/commands.py : Lines 196–213
Description: wait_http passes the url argument directly to urllib.request.urlopen with no scheme validation. Passing a file:// URL or a non-HTTP scheme would silently open a local file (for read) or raise an obscure error. Since scenario configs are version-controlled and the risk is low, this is purely a defensive suggestion.
Suggestion:
from urllib.parse import urlparse
def wait_http(args, workdir):
...
parsed = urlparse(url)
if parsed.scheme not in ("http", "https"):
return (2, "", f"@wait_http: url must be http(s), got scheme: {parsed.scheme!r}")
...[S-04] Surface GasFree API mock mode in supported response
File: e2e/mock_facilitator/app.py : Lines 138–141
Description: The /supported endpoint returns a hard-coded list of DEFAULT_SUPPORTED_KINDS regardless of the running mode. In a fail_verify_* mode, a client seeing exact_gasfree listed as supported may be surprised when the subsequent /verify returns invalid. Emitting the current mode as an X-Mock-Mode response header (or a meta field in the response) would make debugging mock-tier failures faster.
Suggestion:
@app.get("/supported")
async def supported() -> dict[str, Any]:
_log({"endpoint": "supported"})
return {
"kinds": DEFAULT_SUPPORTED_KINDS,
"_mockMode": _default_mode(), # debug aid, ignored by real clients
}Positive Observations
| Area | Observation |
|---|---|
| Process lifecycle | start_bg uses start_new_session=True + os.killpg for correct process-group cleanup, avoiding orphaned uvicorn worker processes. |
| Fail-fast + recovery | start_bg proactively kills the previous process with the same tag before starting a new one, giving scenarios graceful recovery from a prior run's leftover services. |
| No real keys in fixtures | ANVIL_KEY_0 and TRON_TEST_KEY are both well-known, widely-published test vectors. The TRON key carries a prominent warning comment. All real credentials flow through env vars, and .env is .gitignored. |
| CI fork protection | nightly_testnet.yml guards with if: github.repository_owner == 'BofAI', correctly preventing fork PRs from accessing secrets. |
| Testnet self-skip logic | run_testnet.sh skips scenarios whose required env vars are blank (rather than failing them). This allows running the full script locally with a partial .env without false CI failures. |
| Wire format fidelity | Mock facilitator /verify and /settle responses match the x402 v2 wire contract (isValid, invalidReason, success, errorReason, transaction, network) exactly as defined in the protocol spec. |
| Declarative scenarios | All 8 mock and 3 testnet scenarios are pure JSON — no bespoke Python test logic. The @builtin system is the right abstraction for keeping test configs readable. |
json_match for testnet |
Testnet scenarios correctly use @json_match (partial field matching with regex for the tx hash) while mock scenarios use @json_diff (exact equality). The distinction is principled and correctly implemented. |
| Offline signer pattern | _AllowanceSkippingEvmSigner and _OfflineTronSigner cleanly override only the chain-RPC methods, keeping the real signing path intact for mock tests. This correctly validates x402 payment mechanics without a live node. |
FACILITATOR_BASE_FEE design |
Centralizing the base-fee configuration as a single JSON env var (instead of per-token env vars) is ergonomic and correctly propagated to both EVM and TRON mechanisms. |
Checklist Results
| Category | Items Checked | Pass | Fail | N/A | Notes |
|---|---|---|---|---|---|
| Correctness | 8 | 7 | 1 | 0 | MJ-01: missing key validation in facilitator endpoints |
| Security | 10 | 10 | 0 | 0 | No secrets in code; proper fork/secret guard in CI; test keys clearly labeled |
| Performance | 7 | 7 | 0 | 0 | No N+1, no unbounded loops, no blocking-in-async |
| Code Quality | 8 | 4 | 4 | 0 | MN-01 through MN-05: redundant glob, identical mock addresses, module-scope mutation, parallel arrays, print in startup path |
| Testing | 7 | 6 | 1 | 0 | MJ-02: e2e deps not in lockfile; 8 mock + 3 testnet scenarios otherwise well-structured |
| Documentation | 6 | 6 | 0 | 0 | Every new public entry point has a docstring and CLAUDE.md |
| Compatibility | 5 | 5 | 0 | 5 | No existing APIs modified; all additions are backward-compatible |
| Observability | 4 | 4 | 0 | 0 | Request log in mock facilitator; structured Markdown reports from runner |
Disclaimer
This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between the specified branches (main...feat/x402-ai). Runtime behavior, integration testing, and deployment impact are not covered. No changes to the SDK's core payment path (python/x402/src/, typescript/packages/) were present in this PR — review of payment-path logic should follow the security-reviewer agent protocol for any subsequent PR touching those paths.
Report generated by Code Review Skill v1.0.0
Date: 2026-04-24
The design doc was a draft v0 internal planning document with heavy organization-specific context (BofAI, SUN.io, sun-protocol/apollo-arena, internal branch names like feat/x402-ai-mcp/budget/matrix that never materialized) and an "Open questions" section. From a community contributor's perspective it read as either (a) required process to follow — which it never was — or (b) out-of-date planning that kept confusing the reader. P1 is already executed and demonstrable in the .claude/ tree + CLAUDE.md + specs/; P2..P4 are aspirational and belong in issues, not checked-in docs. Also removes the now-empty docs/design/ directory. Rewired 7 dangling references: - CLAUDE.md root: drop the "see design doc" line; shorten the AI-native section to one sentence; drop the docs/design/ row from the table. - docs/CLAUDE.md: drop the design/ row and the "Design docs go here" rule. - python/x402/CLAUDE.md: drop "P2" breadcrumb on the mcp stub line. - typescript/CLAUDE.md: drop "P2 in [design]" + "see design doc" on mcp and http rows respectively. - integration/CLAUDE.md: rewrite the matrix-testing non-goal without the dead link. - e2e/scenarios/exact_permit_happy_path/README.md: the "Facilitator binary with --mock" prereq now points to e2e/mock_facilitator/ that ships. - .claude/agents/scheme-author.md: suggest an in-flight feature spec under specs/<NNN>-<slug>/ instead of a design doc. Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios all green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review ReportProject: x402 (BankofAI SDK) PR OverviewBranch Information
Commit History
Review SummaryVerdict
Findings at a Glance
SummaryThis PR delivers a comprehensive AI-native development scaffold: the complete The overall quality is high. The design is clean, well-structured, and thoughtfully documented. However, two Major issues require fixes before merge: the Change Summary1. AI-native scaffolding (
|
| File | Change Type | Description |
|---|---|---|
.claude/agents/code-reviewer.md |
Added | x402-flavored code reviewer agent definition |
.claude/agents/security-reviewer.md |
Added | Security reviewer with x402 signing checklist |
.claude/agents/scheme-author.md |
Added | Scheme authoring agent |
.claude/commands/x402/compound.md |
Added | /x402:compound slash-command wizard |
.claude/commands/x402/create-scenario.md |
Added | /x402:create-scenario slash-command wizard |
.claude/hooks/run-tests-for-change.sh |
Added | PostToolUse hook: dispatches test suite after edits |
.claude/rules/** |
Added | Rule sheets for common, schemes, networks, python, typescript, testing |
.claude/settings.json |
Added | Hook wiring (PostToolUse on Edit/Write/MultiEdit/NotebookEdit) |
Purpose: Encode x402 conventions, domain knowledge, and code-review checklists as machine-readable rules so Claude can contribute safely without reading every file first.
2. E2E test harness (e2e/, integration/)
| File | Change Type | Description |
|---|---|---|
e2e/clients/test_client.py |
Added | EVM + TRON wallet, 402→retry flow, JSON output |
e2e/mock_facilitator/app.py |
Added | FastAPI mock: facilitator + GasFree API (mode-driven) |
e2e/mock_facilitator/__main__.py |
Added | CLI entry for the mock facilitator |
e2e/servers/resource_server.py |
Added | FastAPI resource server using @x402_protected |
e2e/scenarios/*/config.json |
Added | 8 mock scenarios (3 happy path, 5 unhappy) |
e2e/scenarios/testnet/*/config.json |
Added | 3 testnet scenarios with real chain settlement |
e2e/scenarios/run_all.sh |
Added | Mock suite entry point (CI) |
e2e/scenarios/run_testnet.sh |
Added | Testnet suite entry point (nightly) |
integration/run.py |
Added | Generic sequential step runner with Markdown reporting |
integration/commands.py |
Added | @json_diff, @json_match, @start_bg, @stop_bg, @wait_http built-ins |
Purpose: Provide a no-secrets, deterministic CI test suite for all three schemes across EVM and TRON, plus a nightly testnet tier with real on-chain settlement.
3. Reference facilitator (examples/facilitator/)
| File | Change Type | Description |
|---|---|---|
examples/facilitator/server.py |
Added | FastAPI wrapper for X402Facilitator; EVM + TRON registration from env vars |
examples/facilitator/__main__.py |
Added | CLI entry |
Purpose: Standalone reference implementation used by the testnet e2e tier and as documentation for integrators.
4. CI workflows (.github/workflows/)
| File | Change Type | Description |
|---|---|---|
.github/workflows/check_e2e.yml |
Added | PR check: runs the 8 mock scenarios |
.github/workflows/nightly_testnet.yml |
Added | Daily: testnet scenarios with secrets; skipped on forks |
Purpose: Wire the mock suite into PR CI and establish a nightly testnet gate.
5. Documentation and miscellaneous
| File | Change Type | Description |
|---|---|---|
docs/solutions.md |
Added | 10 documented bugs with symptom/root-cause/fix/rule format |
CLAUDE.md (root + components) |
Added | Project orientation for contributors and AI agents |
specs/ reorganization |
Renamed | docs/specs/ → specs/ (upstream layout mirror) |
.env.example |
Added | Testnet credential template (no secrets) |
.gitignore |
Modified | Adds .DS_Store, .claude/hooks/logs/ |
Detailed Findings
Major
[MJ-01] jq absence silently disables the PostToolUse test hook
| Property | Value |
|---|---|
| Severity | Major |
| Category | Correctness / Testing |
| File | .claude/hooks/run-tests-for-change.sh : Lines 29–38 |
Description
The hook extracts FILE_PATH by piping stdin through jq, with 2>/dev/null swallowing all errors:
FILE_PATH="$(printf '%s' "$INPUT" | jq -r '.tool_input.file_path // .tool_response.filePath // empty' 2>/dev/null)"If jq is not installed on the developer's machine (or a CI runner), the subshell returns an empty string. The very next guard exits 0:
if [[ -z "$FILE_PATH" ]]; then
exit 0
fiThe result is that every hook invocation silently succeeds without running any test. The developer or CI sees no error — tests are simply never executed. Because jq is not declared as a dependency anywhere in the repository, this creates a silent test-bypass risk that is hard to diagnose.
Recommendation
Add an explicit jq availability check at the top of the script, before any stdin reading:
if ! command -v jq >/dev/null 2>&1; then
echo "run-tests-for-change: 'jq' not found — install jq to enable test autorun" >&2
exit 0 # still exit 0 so Claude isn't blocked, but at least it's visible
fiAlso document jq as a requirement in the root CLAUDE.md setup section.
[MJ-02] Unhandled KeyError on malformed bodies in reference facilitator /verify and /settle
| Property | Value |
|---|---|
| Severity | Major |
| Category | Correctness / Robustness |
| File | examples/facilitator/server.py : Lines 293–303 |
Description
The /verify and /settle endpoints access required body keys with plain dict subscript:
@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
payload = PaymentPayload(**body["paymentPayload"]) # KeyError if missing
requirements = PaymentRequirements(**body["paymentRequirements"]) # KeyError if missing
@app.post("/settle")
async def _settle(body: dict[str, Any]) -> JSONResponse:
payload = PaymentPayload(**body["paymentPayload"]) # KeyError if missing
requirements = PaymentRequirements(**body["paymentRequirements"]) # KeyError if missingA missing key propagates as an unhandled exception through FastAPI, returning an opaque HTTP 500 instead of a descriptive 422. Because this file is explicitly labeled "reference implementation" in examples/CLAUDE.md, integrators may copy this pattern into production facilitators. The testnet e2e tier calls these endpoints under real-network conditions; any client sending a subtly malformed body during debugging would receive a 500 with no actionable error message.
Recommendation
Use .get() with a meaningful 422 response, or switch to a typed Pydantic request model so FastAPI validates automatically:
from pydantic import BaseModel
class VerifyRequest(BaseModel):
paymentPayload: dict
paymentRequirements: dict
@app.post("/verify")
async def _verify(body: VerifyRequest) -> JSONResponse:
payload = PaymentPayload(**body.paymentPayload)
requirements = PaymentRequirements(**body.paymentRequirements)
...FastAPI then returns a 422 with field-level detail on missing/malformed keys at no extra cost.
Minor
[MN-01] eval used for test command execution in hook script
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Code Quality / Security |
| File | .claude/hooks/run-tests-for-change.sh : Line 84 |
Description
OUTPUT="$(cd "$TEST_CWD" && eval "$TEST_CMD" 2>&1)"$TEST_CMD is currently populated only from hardcoded case-statement branches (e.g. "uv run pytest -x -q --tb=short"), so the immediate security risk is low. However, the TypeScript branch constructs $TEST_CWD from the user-controlled $PKG_REL (derived from $FILE_PATH), which comes from hook stdin. If $PKG_REL ever propagated into $TEST_CMD (e.g. via a future path-specific flag), the eval would be exploitable. More practically: eval makes the script harder to reason about and harder to test statically.
Recommendation
Replace with direct invocation or bash -c:
# Instead of eval "$TEST_CMD":
cd "$TEST_CWD" || exit 2
OUTPUT="$(bash -c "$TEST_CMD" 2>&1)"Or, for fully safe invocation, restructure TEST_CMD as an array and use "${TEST_CMD_ARRAY[@]}".
[MN-02] MOCK_GASFREE_PROVIDER and MOCK_GASFREE_ADDRESS are the same address
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Testing / Correctness |
| File | e2e/mock_facilitator/app.py : Lines 128–132 |
Description
MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"
MOCK_GASFREE_ADDRESS = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"In the real GasFree protocol these are semantically distinct: the provider (serviceProvider) is the relayer that submits on-chain transactions, while the GasFree address (gasFreeAddress) is the user's custodial wallet that holds the tokens. solutions.md entry #9 specifically documents how these two addresses are different levels of an address mapping chain.
Using the same value for both means the mock cannot catch bugs where client or facilitator code inadvertently swaps serviceProvider for the custodial wallet address (or vice versa). It also understates the distinction to developers reading the mock code for protocol guidance.
Recommendation
Use two distinct, clearly labeled addresses:
MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC" # relayer/service-provider
MOCK_GASFREE_ADDRESS = "TEkxiTehnzSmSe2XqrBj4w32RUN966rdz8" # user custodial walletAdd a comment citing solutions.md #9 so future maintainers understand why they differ.
[MN-03] Module-level build_app() call with conditional side effects at import time
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Correctness / Code Quality |
| File | e2e/servers/resource_server.py : Lines 119, 141 |
Description
app = build_app() executes unconditionally at module import time. Inside build_app(), _install_noop_tx_verifier_if_requested() may monkey-patch a global function in bankofai.x402.utils.tx_verification:
def _install_noop_tx_verifier_if_requested() -> None:
if os.environ.get("E2E_SKIP_TX_VERIFICATION") != "1":
return
from bankofai.x402.utils import tx_verification
tx_verification.get_verifier_for_network = _always_raise # module-level mutationAny tool (test runner, IDE import analyzer, python -c "import e2e.servers.resource_server") that imports this module while E2E_SKIP_TX_VERIFICATION=1 is set will silently corrupt the tx_verification module for the rest of the Python process. Because both resource_server.py and examples/facilitator/server.py execute their build_app() at import time, the patch applies before the caller can prevent it.
Recommendation
Guard the module-level app initialization behind if __name__ == "__main__": for the monkey-patch case, or restructure so app is created lazily (e.g., via an application factory pattern where the caller invokes build_app() explicitly). At minimum, add a module docstring note warning that importing this module with E2E_SKIP_TX_VERIFICATION=1 set will mutate a shared utility module.
[MN-04] run_all.sh does not continue running remaining scenarios after a mid-run failure
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Testing |
| File | e2e/scenarios/run_all.sh : Lines 21–28 |
Description
for s in "${scenarios[@]}"; do
cfg="e2e/scenarios/$s/config.json"
if ! python3 -m integration.run --config "$cfg"; then
failed+=("$s")
fi
doneEach individual integration/run.py invocation is fail-fast (stops at the first failing step). If a scenario fails before its @stop_all_bg step (e.g. the client step fails), the mock facilitator and resource server processes remain running on ports 4020/4021. run_all.sh then moves on to the next scenario, which tries to start new services on the same ports.
The @start_bg builtin does attempt to kill the previous process via its PID file:
if pid_file.exists():
old_pid = int(pid_file.read_text().strip())
os.kill(old_pid, signal.SIGTERM)This mitigates the issue in most cases, but there is no explicit wait for the port to be freed by the OS after SIGTERM. Under load on CI, the subsequent @start_bg may launch the new process before the TCP socket clears, causing the @wait_http readiness poll to eventually time out and fail the next scenario with a confusing error. Uvicorn's default graceful shutdown takes up to 5 seconds.
Recommendation
Add a brief @stop_all_bg step to the beginning of each scenario config (not just the end) as a "clean start" guard, OR add an explicit sleep/port-wait after @stop_all_bg in each scenario. The simplest fix is a "cleanup" first step:
{
"name": "ensure no stale services from previous run",
"command": "@stop_all_bg",
"timeout": 10
}This would make each scenario self-sufficient regardless of what the previous one left behind.
[MN-05] Missing per-scenario README.md files — convention violated for 10 of 11 directories
| Property | Value |
|---|---|
| Severity | Minor |
| Category | Documentation |
| File | e2e/scenarios/*/ |
Description
.claude/rules/testing/scenarios.md explicitly lists README.md as a required file for every scenario directory. Only e2e/scenarios/exact_permit_happy_path/README.md was added. The remaining directories are missing READMEs:
exact_happy_path/exact_gasfree_happy_path/unhappy_verify_invalid_sig/unhappy_verify_expired/unhappy_verify_network_mismatch/unhappy_settle_insufficient/unhappy_settle_revert/testnet/exact_testnet/testnet/exact_permit_testnet/testnet/exact_gasfree_testnet/
This matters for maintainability: a developer reading an unfamiliar scenario directory has no immediate explanation of purpose, preconditions, or run command.
Recommendation
Add minimal READMEs for each missing directory. The exact_permit_happy_path/README.md can serve as the template.
Suggestions
[S-01] Pin GitHub Actions to SHA commits for supply-chain security
File: .github/workflows/check_e2e.yml, nightly_testnet.yml
Description: Both workflows reference actions at mutable major-version tags (actions/checkout@v4, astral-sh/setup-uv@v5, actions/upload-artifact@v4). A compromised action tag could execute arbitrary code in CI with access to all GitHub Secrets.
Suggestion: Pin each action to a specific commit SHA, e.g. actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2. Tools like Dependabot or pin-github-action automate this.
[S-02] @json_match spec format cannot express ~ in a plain = expected value
File: integration/commands.py : Lines 313–336
Description: The spec parser uses ~ as the regex-operator token. The disambiguation logic spec.index("~") < spec.index("=") means that any =-type assertion whose expected value contains ~ (e.g. body.hash=0x1234~abcd) would be misrouted to the regex branch, causing a confusing test failure. The usage string does not document this limitation.
Suggestion: Document the limitation in the usage= string for @json_match. Alternatively, require explicit ~= (not ~) as the regex separator to eliminate ambiguity entirely.
[S-03] shell=True in subprocess calls throughout the integration runner
File: integration/run.py : Line 83; integration/commands.py : Line 167
Description: subprocess.run(command, shell=True, ...) and subprocess.Popen(command, shell=True, ...) are used to execute shell commands. While the commands originate from JSON config files and hardcoded strings (not direct user input), shell=True is a latent risk if the input sources ever expand. Additionally, it requires the shell to parse quoting/escaping, which is where solutions.md entry #8 bug came from.
Suggestion: Consider wrapping known fixed commands (e.g., pnpm test, uv run pytest) as argument lists and using shell=False. For variable commands from config files, document the shell injection risk explicitly in integration/README.md.
[S-04] Verify that unhappy_settle_revert uses a unique output path
File: e2e/scenarios/unhappy_settle_revert/config.json
Description: unhappy_settle_insufficient writes to /tmp/x402-e2e-unhappy-settle-response.json. This reviewer did not read unhappy_settle_revert/config.json — if it also writes to the same path, a failure of one scenario could leave stale data that causes the other to produce a false pass or a misleading diff. The verify-scenario group already uses unique paths (-verify-response.json, -verify-expired-response.json, etc.), so this may be already handled.
Suggestion: Verify that unhappy_settle_revert uses a distinct output path (e.g., /tmp/x402-e2e-unhappy-settle-revert-response.json) separate from unhappy_settle_insufficient's /tmp/x402-e2e-unhappy-settle-response.json.
Positive Observations
| Area | Observation |
|---|---|
docs/solutions.md |
Exceptional knowledge base. 10 entries, each with consistent symptom/root-cause/fix/rule structure. Entries 1–3 directly map to documented production bugs with commit references — this is exactly the kind of tribal knowledge that prevents regressions. |
| Two-tier test design | Mock tier (no creds, ~30s) + testnet tier (nightly, real chain) is the right split. Testnet scenarios self-skip when secrets are absent rather than failing, which is the correct design for CI with optional credentials. |
@start_bg robustness |
The kill-before-restart logic (PID file + SIGTERM → SIGKILL escalation) and start_new_session=True process group isolation are solid process management. Most custom test harnesses get this wrong. |
| Declarative scenario format | Pure JSON config.json + expected_*.json with a generic runner is exactly right. Adding a scenario requires zero Python — copy a directory, edit JSON. This is a significant maintainability win. |
E2E_SKIP_ALLOWANCE / E2E_SKIP_TX_VERIFICATION flags |
Explicit environment flags for the two "can't run this offline" checks (allowance and tx verification) are well-designed. The overrides are narrowly scoped (no-chain RPC path only) and clearly documented. |
| Security defaults | TRON_TEST_KEY carries a prominent warning comment. .env is gitignored. GitHub Secrets are used for testnet credentials. The nightly workflow adds github.repository_owner == 'BofAI' guard to prevent fork-secret leakage. |
asyncRewake hook integration |
The Claude Code hook uses asyncRewake: true + rewakeMessage to surface test failures back to the model. This closes the feedback loop for AI-driven edits cleanly. |
_AllowanceSkippingEvmSigner / _OfflineTronSigner subclass pattern |
Keeping offline overrides as subclasses of the real signers (rather than duck-typed mocks) ensures the e2e client exercises the full SDK code path for everything except the chain RPC calls. |
run_all.sh continues past failures |
The outer script collects failed scenarios and reports them all rather than aborting at the first one. This gives complete CI feedback in a single run. |
integration/run.py is pure |
run_step and format_report are side-effect-free functions that take inputs and return outputs. State flows through return values. This makes the runner easy to test in isolation. |
Checklist Results
| Category | Items Checked | Pass | Fail | N/A | Notes |
|---|---|---|---|---|---|
| Correctness | 8 | 6 | 2 | 0 | MJ-02 (KeyError), MN-04 (port reuse) |
| Security | 8 | 6 | 1 | 1 | MJ-01 (silent hook bypass), S-03 (shell=True) |
| Performance | 5 | 4 | 0 | 1 | No DB/unbounded ops in changed code |
| Code Quality | 8 | 6 | 2 | 0 | MN-01 (eval), MN-03 (module-level side effects) |
| Testing | 6 | 4 | 1 | 1 | MN-05 (missing READMEs) |
| Documentation | 5 | 4 | 1 | 0 | MN-05 (per-scenario READMEs) |
| Compatibility | 4 | 4 | 0 | 0 | No breaking changes; specs/ rename is additive |
| Observability | 4 | 4 | 0 | 0 | JSONL log in mock facilitator; hook logs in .claude/hooks/logs/ |
Disclaimer
This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between main and feat/x402-ai. Runtime behavior, integration testing, and deployment impact are not covered. No production payment-path code (signing, verification, settlement) was changed in this PR; security findings are scoped to the tooling and test infrastructure introduced here.
Report generated by Code Review Skill v1.0.0
Date: 2026-04-24
Summary
Adopts a Claude-Code-native development layout and verifies the three payment schemes on real testnets.
AI scaffolding — root + 7 component
CLAUDE.md, full.claude/rules/knowledge base (common / schemes×3 / networks×2 / python / typescript / testing + index), 3 specialized subagents (code-reviewer,security-reviewer,scheme-author), 2 slash commands (/x402:compound,/x402:create-scenario), and aPostToolUsetest-dispatch hook that routesEdit|Write|MultiEdit|NotebookEditto the right suite by path.Spec layout — collapses redundant
docs/specs/intospecs/to mirror upstreamx402-foundation/x402/specs/. 46 cross-references updated.specs/now holds both protocol-level (top-level +schemes/) and in-flight feature specs (001-exact-v2-compat/).docs/is now solutions + design only.E2E testnet tier — 3 real-chain scenarios (
exact_testnet,exact_permit_testnet,exact_gasfree_testnet) now runnable with a populated.env. Two scenario bugs fixed (${VAR:-default}syntax, tx-hash regex) +examples/facilitator/server.pygainsFACILITATOR_BASE_FEEenv so permit/gasfree mechanisms actually register tokens.Knowledge base — 4 new
docs/solutions.mdentries capturing incidents discovered during real-chain validation (GasFree address-mapping trap, TRON multisigpermission_idrequirement, missing base_fee config, integration runner${VAR:-default}limitation).Verified on real chain (2026-04-24)
exact_testnet461d17baae60b83031de55bb1635bd5f8453d359887e7e82ebee605ab6c159a0exact_permit_testnet0xfc8b32decb99d02cdfc684d3f6f1c7c0a91c8b0ff1f632f98d86d9f334198a23(block 103475005)exact_gasfree_testnet1d77f242b72293116e65c46b5ad756dd2f8355ebc625078aec0eb4ea54d148d2Regression
Scope boundary
This PR is docs +
.claude/scaffolding + reference facilitator enhancements + e2e tier. Does NOT port the MCP package or the matrix harness (P2+ indocs/design/ai-transformation.md— separate initiatives).Migration notes for reviewers
docs/specs/<X>.mdmust update tospecs/<X>.md(protocol) orspecs/schemes/<X>.md(per-scheme). Rungit grep "docs/specs"on your branch before rebasing onto this.FACILITATOR_BASE_FEErequired when runningexact_permitorexact_gasfreevia the reference facilitator. JSON dict of{symbol: smallest-unit fee}. Seeexamples/facilitator/server.pydocstring..claude/hooks/run-tests-for-change.shruns automatically when Claude edits.py/.ts/.tsx/e2e//integration/files. Docs/config edits skip. Failures surface viaasyncRewake.Known follow-ups (not in this PR)
.github/workflows/check_e2e.yml— confirm it's wired as a PR required-check sorun_all.shblocks merges on regression.x402-demo/ other local SDK sessions — document or move scenarios to 8113.exact_permit_testnetis verified on BSC USDT, butFACILITATOR_BASE_FEEtuning will vary per deployment.Commits (13, conventional)
```
84d26e6 docs(gasfree): record testnet win + two new gotchas
ea04ab2 feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
f4c0c34 refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
3130e21 docs(e2e): record real-chain evidence + testnet gotchas
8045ba1 fix(e2e): make testnet scenarios runnable
45be4ca chore(ai): gitignore .DS_Store and .claude/hooks/logs/
e0136ec feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
1983edf feat(e2e): add testnet tier with real chain settlement
3509b70 feat(e2e): expand harness to 8 scenarios across EVM + TRON
900e5bd feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
aec6836 docs(design): draft AI transformation plan
```
Test plan
🤖 Generated with Claude Code