Skip to content

feat(ai): AI-native development scaffolding + e2e testnet verification#69

Merged
Will-Guan merged 13 commits intomainfrom
feat/x402-ai
Apr 24, 2026
Merged

feat(ai): AI-native development scaffolding + e2e testnet verification#69
Will-Guan merged 13 commits intomainfrom
feat/x402-ai

Conversation

@boboliu-1010
Copy link
Copy Markdown
Collaborator

Summary

Adopts a Claude-Code-native development layout and verifies the three payment schemes on real testnets.

AI scaffolding — root + 7 component CLAUDE.md, full .claude/rules/ knowledge base (common / schemes×3 / networks×2 / python / typescript / testing + index), 3 specialized subagents (code-reviewer, security-reviewer, scheme-author), 2 slash commands (/x402:compound, /x402:create-scenario), and a PostToolUse test-dispatch hook that routes Edit|Write|MultiEdit|NotebookEdit to the right suite by path.

Spec layout — collapses redundant docs/specs/ into specs/ to mirror upstream x402-foundation/x402/specs/. 46 cross-references updated. specs/ now holds both protocol-level (top-level + schemes/) and in-flight feature specs (001-exact-v2-compat/). docs/ is now solutions + design only.

E2E testnet tier — 3 real-chain scenarios (exact_testnet, exact_permit_testnet, exact_gasfree_testnet) now runnable with a populated .env. Two scenario bugs fixed (${VAR:-default} syntax, tx-hash regex) + examples/facilitator/server.py gains FACILITATOR_BASE_FEE env so permit/gasfree mechanisms actually register tokens.

Knowledge base — 4 new docs/solutions.md entries capturing incidents discovered during real-chain validation (GasFree address-mapping trap, TRON multisig permission_id requirement, missing base_fee config, integration runner ${VAR:-default} limitation).

Verified on real chain (2026-04-24)

Scenario Network Tx hash
exact_testnet BSC Testnet 461d17baae60b83031de55bb1635bd5f8453d359887e7e82ebee605ab6c159a0
exact_permit_testnet BSC Testnet 0xfc8b32decb99d02cdfc684d3f6f1c7c0a91c8b0ff1f632f98d86d9f334198a23 (block 103475005)
exact_gasfree_testnet TRON Nile 1d77f242b72293116e65c46b5ad756dd2f8355ebc625078aec0eb4ea54d148d2

Regression

pytest:  217 passed
vitest:   51 passed
e2e mock: 8 scenarios green (run_all.sh, ~35s, no secrets)
e2e testnet: 3 scenarios green (run_testnet.sh with .env, ~45s real chain)

Scope boundary

This PR is docs + .claude/ scaffolding + reference facilitator enhancements + e2e tier. Does NOT port the MCP package or the matrix harness (P2+ in docs/design/ai-transformation.md — separate initiatives).

Migration notes for reviewers

  • Spec paths changed: any downstream doc or code referencing docs/specs/<X>.md must update to specs/<X>.md (protocol) or specs/schemes/<X>.md (per-scheme). Run git grep "docs/specs" on your branch before rebasing onto this.
  • New env var FACILITATOR_BASE_FEE required when running exact_permit or exact_gasfree via the reference facilitator. JSON dict of {symbol: smallest-unit fee}. See examples/facilitator/server.py docstring.
  • New hook at .claude/hooks/run-tests-for-change.sh runs automatically when Claude edits .py / .ts / .tsx / e2e/ / integration/ files. Docs/config edits skip. Failures surface via asyncRewake.

Known follow-ups (not in this PR)

  • .github/workflows/check_e2e.yml — confirm it's wired as a PR required-check so run_all.sh blocks merges on regression.
  • Port 8013 collision with x402-demo / other local SDK sessions — document or move scenarios to 8113.
  • exact_permit_testnet is verified on BSC USDT, but FACILITATOR_BASE_FEE tuning will vary per deployment.
  • Pre-existing Dependabot alerts (4 moderate on default branch) — orthogonal.

Commits (13, conventional)

```
84d26e6 docs(gasfree): record testnet win + two new gotchas
ea04ab2 feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
f4c0c34 refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
3130e21 docs(e2e): record real-chain evidence + testnet gotchas
8045ba1 fix(e2e): make testnet scenarios runnable
45be4ca chore(ai): gitignore .DS_Store and .claude/hooks/logs/
e0136ec feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
1983edf feat(e2e): add testnet tier with real chain settlement
3509b70 feat(e2e): expand harness to 8 scenarios across EVM + TRON
900e5bd feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
aec6836 docs(design): draft AI transformation plan
```

Test plan

  • Fresh clone + `uv run pytest` in `python/x402` — expect 217 passed
  • `pnpm test --run` in `typescript/packages/x402` — expect 51 passed
  • `./e2e/scenarios/run_all.sh` — expect "ALL GREEN (8 scenarios)"
  • Inspect `specs/` layout vs upstream `x402-foundation/x402/specs/` for mirror correctness
  • Verify `.claude/hooks/run-tests-for-change.sh` is executable and `.claude/settings.json` loads without error
  • (optional) Fill `.env` from `.env.example` and `./e2e/scenarios/run_testnet.sh` for real-chain sanity

🤖 Generated with Claude Code

bobo and others added 11 commits April 22, 2026 12:56
Consolidates the P1-P4 roadmap for turning x402 into a Claude-Code-
native, agent-native payment platform. References Coinbase x402,
sun-protocol/apollo-arena, and BofAI/skills as prior art; enumerates
what already exists on main, what already exists in sibling repos,
and what actually needs to be built (.claude/ scaffolding, MCP
transport, declarative e2e scenarios).
…runner

Land the first phase of the AI transformation (see
docs/design/ai-transformation.md).

What this adds:
- CLAUDE.md (root): component index + key reading order + conventions
  + AI-native layout pointers + safety rules
- .claude/rules/common/conventions.md: repo-wide invariants (addressing,
  encoding, payment ID, pipeline step ordering, commit message format)
- .claude/rules/schemes/{exact,exact-permit,exact-gasfree}.md: per-scheme
  rules of thumb with gotchas cross-referenced to solutions.md
- .claude/rules/networks/{tron,evm}.md: network identifiers, signing rules,
  contract registry pointers
- .claude/commands/x402/{compound,create-scenario}.md: two wizards
  (compound implements the /x402:compound referenced by solutions.md;
  create-scenario scaffolds e2e scenarios)
- .claude/agents/{code-reviewer,security-reviewer,scheme-author}.md:
  x402-flavored subagent definitions
- integration/: generic Python step runner (no external deps)
  with @json_diff built-in, JSON config format, Markdown report output.
  Clean-room implementation; pattern inspired by sun-protocol/apollo-arena.
- integration/configs/smoke.json: self-test verified passing
- e2e/scenarios/exact_permit_happy_path/: canonical scenario template;
  runnable once facilitator --mock mode lands in P2

Smoke test:
  python3 integration/run.py --config integration/configs/smoke.json
  → 2/2 ✓, exit 0

Non-goals for this phase (deferred to P2+):
- MCP transport packages (mcp-ts, mcp-py)
- Matrix e2e harness
- mock-chainrpc and mock-facilitator
- Per-package CLAUDE.md files
Adds a full e2e test harness — mock facilitator, fixture resource
server, test client, and 8 declarative scenarios covering happy +
unhappy paths for all three schemes.

Scenarios (all green, ~30s total, fully deterministic, no chain RPC):
  - exact_happy_path          ERC-3009 on BSC testnet
  - exact_permit_happy_path   EIP-2612 on BSC testnet (allowance skipped)
  - exact_gasfree_happy_path  TIP-712 on TRON Nile (GasFree API mocked)
  - unhappy_verify_invalid_sig / _expired / _network_mismatch
  - unhappy_settle_insufficient / _revert

Key design choices:
  - Live signing, mock verify — client signs real payloads; the mock
    facilitator echoes success/failure based on its mode.
  - One mock process serves both facilitator and GasFree API roles on
    port 4020 — no second service needed.
  - Test-only signer subclasses (_AllowanceSkippingEvmSigner,
    _OfflineTronSigner) bypass on-chain balance/allowance checks
    without touching SDK code.
  - Resource server monkey-patches get_verifier_for_network when
    E2E_SKIP_TX_VERIFICATION=1 so the TRON scenario doesn't try to
    look up the fake tx hash on a live node.

Also:
  - integration/commands.py: add @start_bg / @wait_http / @stop_bg /
    @stop_all_bg builtins so scenarios can manage background services.
  - .github/workflows/check_e2e.yml: runs run_all.sh on every PR.
  - docs/design/ai-transformation.md: refresh §6.2 scenario table with
    status column; mark P1 immediate-steps items done.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extends the e2e harness with a second tier that settles real on-chain
transactions on BSC testnet and TRON Nile, alongside the existing mock
suite. Scenarios self-skip when their env vars are blank, so the mock
path remains the default (and the CI default on PR).

New pieces:

  examples/facilitator/            FastAPI server wrapping X402Facilitator.
                                   Registers EVM/TRON mechanisms based on
                                   which *_PRIVATE_KEY env vars are set.
                                   Run with: python -m examples.facilitator

  .env.example                     Template listing every testnet variable
                                   (RPC URLs, signing keys, pay-to addresses,
                                   GasFree API URL, price specs). .env is
                                   already gitignored.

  e2e/scenarios/testnet/           exact_testnet, exact_permit_testnet,
                                   exact_gasfree_testnet — each uses the
                                   example facilitator + real chain, with
                                   @json_match assertions (tx hash is
                                   dynamic so @json_diff won't work).

  e2e/scenarios/run_testnet.sh     Driver that checks required env per
                                   scenario and skips when blank.

  .github/workflows/nightly_testnet.yml   Scheduled + workflow_dispatch
                                          runner consuming GitHub Secrets.

Harness additions:

  integration/commands.py @json_match  New fuzzy assertion builtin:
    @json_match <file> key.path=literal key.path~regex
    Supports JSON literals (true/false/null/numbers) and regex for
    dynamic fields like tx hashes.

  integration/commands.py @start_bg  Now expands ${VAR} before quoting so
    scenario configs can template from env (needed to thread secrets
    through nested `env KEY=${SECRET} python -m ...` invocations).

Mock suite (8 scenarios) remains the default and all stay green. The
testnet suite is a superset — same test_client.py and resource_server.py
drive both. Developers fill in .env to run locally; CI uses Secrets.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… + test hook

- Component CLAUDE.md in python/x402, typescript/, typescript/packages/x402,
  e2e/, examples/, docs/, specs/ — local build/test commands and conventions.
- .claude/rules/CLAUDE.md index + python/testing/typescript rule files, so
  Claude loads the right rule layer per touched path.
- .claude/settings.json + hooks/run-tests-for-change.sh: PostToolUse hook
  dispatches to pytest / vitest / e2e run_all.sh by path prefix; docs/config
  edits skip. Dropped --timeout=60 (pytest-timeout not a dep; outer 300s
  hook timeout already covers).
- Root CLAUDE.md: refresh the rules table and add component-CLAUDE.md
  pointer row.

Scope: docs + .claude/ only, per the AI-transformation scope. MCP package
and tron-contribution/ stay out of this commit.

Verified: 217 pytest + 51 vitest + 8 e2e scenarios all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prevent macOS artefacts and per-session hook run logs from showing up in
status / PRs. Logs are regenerated by run-tests-for-change.sh each time
Claude edits code; they have no cross-session value.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs surfaced when actually running ./e2e/scenarios/run_testnet.sh with
populated .env:

1. ${VAR:-default} shell-style default expansion isn't handled by the
   integration runner — commands.py:151 uses os.path.expandvars(), which
   only recognizes $VAR / ${VAR} and leaves ${VAR:-x} untouched. The
   literal string then gets shlex-split on the space, breaking env
   parsing. Drop the :-default fallback; run_testnet.sh already checks
   required vars before invoking each scenario.

2. Transaction-hash assertion regex required a 0x prefix, but the SDK
   returns bare 64-hex hashes on BSC. Loosen to ^(0x)?[0-9a-fA-F]{64}$,
   matching the gasfree scenario's existing pattern.

Verified: exact_testnet now settles a real BSC Testnet tx and passes end
to end (7 steps, ~11s). exact_permit_testnet progresses past config to
reveal a separate ecosystem issue (BSC testnet USDT doesn't implement
EIP-2612 permit — tracked as a follow-up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- solutions.md #7: BSC testnet has no EIP-2612 permit-capable token.
  Captures why exact_permit_testnet returns 404 and lists the options
  (Sepolia / custom test token / rely on unit suites), so the next
  person doesn't rediscover it.
- solutions.md #8: integration runner doesn't expand ${VAR:-default}.
  Captures the os.path.expandvars limitation that caused the testnet
  scenarios to silently break on a literal fallback string.
- testnet/README.md: add "Current status" table with today's verified
  BSC tx hash as evidence, and a "Gotchas" section linking back to
  solutions #7 / #8, the tx-hash regex, and the log locations.

No code change — just freezing today's testnet run state so future
attempts start from known ground.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pecs/

Upstream x402-foundation/x402 keeps all protocol specs under a single
top-level specs/ dir. We had a redundant docs/specs/ shadowing what
belongs at specs/, which made "which one is authoritative" a coin flip
and drifted us further from upstream every time we edited.

Moves:
- docs/specs/{protocol,roles,config}.md → specs/*.md
- docs/specs/{exact,exact-permit,exact-gasfree}.md → specs/schemes/*.md

specs/ now hosts both the authoritative protocol specs (top-level .md
+ schemes/) and the in-flight feature dirs (001-exact-v2-compat/, ...),
matching upstream's schemes/ + transports-v*/ shape.

Updated 46 cross-references across CLAUDE.md tree, .claude/rules/,
.claude/agents/, docs/, and typescript/packages/x402/.

docs/ is now just user-facing: solutions.md (hard-won knowledge) and
design/ (in-flight design docs). Not authoritative; defers to specs/.

Verified: 217 pytest + 51 vitest + 8 e2e scenarios still green after
the rename — nothing in the code referenced the old paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…anisms

Permit-family facilitator mechanisms (EVM exact_permit, TRON exact_permit,
TRON exact_gasfree) all inherit BaseExactPermitFacilitatorMechanism which
requires a base_fee: dict[str, int] allow-list at construction time.
examples/facilitator/server.py was constructing them without base_fee, so
every token in every registry was reported Unsupported and every testnet
run of the permit/gasfree tiers silently returned HTTP 404.

- server.py now reads FACILITATOR_BASE_FEE as a JSON dict and passes it
  to all three permit/gasfree mechanism constructors; logs a clear warning
  when a permit/gasfree scheme is registered without it.
- testnet scenario configs now forward FACILITATOR_BASE_FEE into the
  facilitator subprocess.
- .env.example / README updated.
- docs/solutions.md #7 corrected — the earlier entry blaming BSC testnet
  USDT for not implementing EIP-2612 was a misdiagnosis; USDT does
  implement permit. The real cause was the missing base_fee config.

Verified live 2026-04-24:
- exact_permit_testnet: BSC tx 0xfc8b32decb99d02cdfc684d3f6f1c7c0a91c8b0ff1f632f98d86d9f334198a23 (USDT permit + transferFrom, block 103475005)
- exact_gasfree_testnet: fee_quote / verify / settle all traverse; final
  settle reports insufficient balance in the derived GasFree custodial
  wallet when it isn't funded — structural path end-to-end works.

Regression: 217 pytest + 51 vitest + 8 mock e2e all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
All three testnet scenarios now verified end-to-end on real chain:
- exact_testnet         — BSC tx 461d17b...159a0
- exact_permit_testnet  — BSC tx 0xfc8b32d...98a23
- exact_gasfree_testnet — TRON Nile tx 1d77f242b72293116e65c46b5ad756dd2f8355ebc625078aec0eb4ea54d148d2

Two new solutions entries from closing the gasfree loop:

#9 GasFree gasFreeAddress is per-query, not absolute. The API returns a
   fresh mapping for every queried address; our SDK queries with the main
   wallet and writes that layer's gasFreeAddress into the payload. If you
   deposit to a gasFreeAddress obtained by querying any *other* address
   (e.g. one resolved from a second hop), the balance won't register.
   Always deposit to api_client.get_address_info(main_wallet)["gasFreeAddress"].

#10 TRON multisig / witness accounts reject single-key signatures under
    the default permission_id=0 (owner). If owner_permission.threshold > 1,
    sign under an active permission via .permission_id(2) — confirmed on
    the Nile witness account used in our testnet .env. tronpy default path
    would otherwise return "Validate signature error: sig error" with no
    hint at the permission cause.

Testnet README status table updated: all three rows ✅ verified.

Regression: 217 pytest + 51 vitest + 8 mock e2e still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Code Review Report

Project: x402 (BankofAI SDK)
PR: mainfeat/x402-ai
Review Date: 2026-04-24
Reviewer: AI Code Reviewer (Code Review Skill v1.0.0)


PR Overview

Branch Information

Property Value
From Branch main
To Branch feat/x402-ai
Commits 11
Files Changed 79
Lines Added +4579
Lines Removed -0 (pure addition)

Commit History

Hash Message
84d26e6 docs(gasfree): record testnet win + two new gotchas
ea04ab2 feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
f4c0c34 refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
3130e21 docs(e2e): record real-chain evidence + testnet gotchas
8045ba1 fix(e2e): make testnet scenarios runnable
45be4ca chore(ai): gitignore .DS_Store and .claude/hooks/logs/
e0136ec feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
1983edf feat(e2e): add testnet tier with real chain settlement
3509b70 feat(e2e): expand harness to 8 scenarios across EVM + TRON
900e5bd feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
aec6836 docs(design): draft AI transformation plan

Review Summary

Verdict

Verdict: Request Changes

One Critical defect (hardcoded developer path) must be fixed before merge. Three Major issues (file-descriptor leak, module-level side-effect monkey-patch, unhandled startup crash) should be addressed. Five Minor issues and two Suggestions are noted below.

Findings at a Glance

Critical Major Minor Suggestion
Count 1 3 5 2

Summary

This PR introduces the full P1 AI-native scaffolding for the x402 repository: a .claude/ rules and agent knowledge base, a component-level CLAUDE.md tree, a generic declarative integration step-runner (integration/), and an eight-scenario e2e harness (e2e/) covering EVM and TRON happy paths plus five failure modes. It also reorganises docs/specs/specs/ and adds a testnet tier wired into a nightly GitHub Actions workflow.

The overall architecture is well-considered. The JSON-driven scenario runner, the dual mock/testnet tier split, the GasFree API mock colocation, and the asyncRewake hook pattern all reflect careful design. Secret hygiene is solid: .env.example carries no real values, .env is git-ignored, and CI consumes GitHub Secrets.

However, one defect is blocking: the PostToolUse hook contains a hardcoded absolute developer path (/Users/bobo/code/x402/x402), which silently renders the entire hook infrastructure non-functional on every other machine. Three additional Major issues — a file-descriptor leak in @start_bg, an import-time monkey-patch in the resource server, and an unhandled json.loads crash path in the example facilitator — should be resolved before the testnet tier goes to production.


Change Summary

1. AI-Native Scaffolding (.claude/)

File Change Type Description
CLAUDE.md Added Root-level repo guide for Claude Code
.claude/rules/CLAUDE.md Added Rules knowledge-base index and authoring guide
.claude/rules/common/conventions.md Added Repo-wide conventions (addressing, amounts, headers, pipeline)
.claude/rules/networks/evm.md Added EVM network conventions (signing, contracts, smoke tests)
.claude/rules/networks/tron.md Added TRON network conventions (TIP-712, GasFree, RPC defaults)
.claude/rules/schemes/exact.md Added exact scheme invariants and gotchas
.claude/rules/schemes/exact-permit.md Added exact_permit scheme invariants
.claude/rules/schemes/exact-gasfree.md Added exact_gasfree scheme invariants (TRON-only)
.claude/rules/python/conventions.md Added Python 3.11+ tooling and idiom rules
.claude/rules/typescript/conventions.md Added Node 20+/pnpm/ESM/viem conventions
.claude/rules/testing/scenarios.md Added e2e scenario authoring conventions
.claude/agents/code-reviewer.md Added x402-flavored code-review agent definition
.claude/agents/security-reviewer.md Added Security-focused review agent definition
.claude/agents/scheme-author.md Added Scheme-authoring wizard agent definition
.claude/commands/x402/compound.md Added /x402:compound slash-command wizard
.claude/commands/x402/create-scenario.md Added /x402:create-scenario wizard
.claude/hooks/run-tests-for-change.sh Added PostToolUse bash hook dispatching pytest/vitest/e2e
.claude/settings.json Added Hook wiring with asyncRewake

Purpose: Establishes Claude Code as a first-class participant in the development workflow by providing scoped rule sheets, agent personas, slash-command wizards, and an automated test hook.


2. Integration Step Runner (integration/)

File Change Type Description
integration/run.py Added Generic JSON-driven sequential step executor with Markdown reporter
integration/commands.py Added @builtin registry: @json_diff, @json_match, @start_bg, @stop_bg, @stop_all_bg, @wait_http
integration/README.md Added CLI reference
integration/CLAUDE.md Added Component guide
integration/configs/smoke.json Added Self-test config

Purpose: Provides the infrastructure that e2e scenario configs (config.json) are executed against — a generic, dependency-free Python 3.9+ runner.


3. E2E Test Harness (e2e/)

File Change Type Description
e2e/mock_facilitator/app.py Added FastAPI mock serving facilitator + GasFree API with mode-driven behavior
e2e/mock_facilitator/__main__.py Added CLI entry point for mock facilitator
e2e/clients/test_client.py Added Real-signature EVM + TRON e2e client
e2e/servers/resource_server.py Added FastAPI resource server using @x402_protected
e2e/scenarios/*/config.json Added 8 mock scenarios + 3 testnet scenarios
e2e/scenarios/*/expected_*.json Added Diff-asserted expected responses
e2e/scenarios/run_all.sh Added Mock suite CI entry point
e2e/scenarios/run_testnet.sh Added Testnet suite with credential-gating
e2e/README.md, e2e/CLAUDE.md Added Documentation

Purpose: End-to-end validation across EVM (BSC) and TRON (Nile) for all three schemes and five failure modes, runnable offline (mock tier) and on real chains (testnet tier).


4. Example Facilitator (examples/facilitator/)

File Change Type Description
examples/facilitator/server.py Added Reference FastAPI facilitator wiring EVM + TRON mechanisms
examples/facilitator/__main__.py Added CLI entry point
examples/CLAUDE.md Added Component guide

Purpose: Provides a runnable reference facilitator used by the testnet e2e tier and as a canonical implementation example.


5. Docs, Specs Reorganisation, CI

File Change Type Description
specs/ (renamed from docs/specs/) Renamed Mirrors upstream x402-foundation/x402/specs/ layout
docs/design/ai-transformation.md Added AI-native development plan
docs/solutions.md Added Incident-driven bug-avoidance checklist
.env.example Added Template for testnet credentials
.github/workflows/check_e2e.yml Added PR CI: mock e2e suite
.github/workflows/nightly_testnet.yml Added Nightly: real-chain testnet scenarios
.gitignore Modified Adds .DS_Store, .claude/hooks/logs/

Detailed Findings


Critical

[C-01] Hardcoded Developer Absolute Path in PostToolUse Hook

Property Value
Severity Critical
Category Correctness
File .claude/hooks/run-tests-for-change.sh : Line 20

Description

The hook hard-codes an absolute path to the original developer's local checkout:

REPO_ROOT="/Users/bobo/code/x402/x402"

All subsequent path logic depends on this variable. On any other machine, FILE_PATH#$REPO_ROOT/ will not strip the repo prefix, leaving REL equal to the full absolute path. Every case match will fail, and the hook will silently exit 0 (the "*" catch-all logs "skip (no rule for …)" and exits 0). The asyncRewake mechanism will therefore never fire for anyone other than the original author. The entire test automation promised by the hook is non-functional on CI and all other developer machines.

Code

REPO_ROOT="/Users/bobo/code/x402/x402"
LOG_DIR="$REPO_ROOT/.claude/hooks/logs"
...
REL="${FILE_PATH#$REPO_ROOT/}"

Recommendation

Derive REPO_ROOT dynamically from the script's own location (which Claude Code passes as a resolved absolute path via the hook input):

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"

This makes the hook portable across machines, containers, and CI workers. The LOG_DIR derivation then follows naturally.


Major

[MJ-01] File Descriptor Leak in @start_bg Builtin

Property Value
Severity Major
Category Correctness
File integration/commands.py : Lines 164–173

Description

log_fd is opened for writing and passed as the stdout of the background subprocess, but the parent process never closes it after Popen returns. The parent retains an open file descriptor for the duration of the step runner process. For suites with many scenarios and frequent restarts, this leaks one file descriptor per @start_bg call. On Linux the default RLIMIT_NOFILE is 1024; large test suites could exhaust this.

Code

log_fd = open(log_path, "w", encoding="utf-8")   # opened, never closed in parent
proc = subprocess.Popen(
    command,
    shell=True,
    cwd=workdir,
    stdout=log_fd,
    stderr=subprocess.STDOUT,
    start_new_session=True,
)
pid_file.write_text(str(proc.pid))
return (0, f"started tag={tag} pid={proc.pid} log={log_path}", "")
# log_fd is never closed here

Recommendation

with open(log_path, "w", encoding="utf-8") as log_fd:
    proc = subprocess.Popen(
        command,
        shell=True,
        cwd=workdir,
        stdout=log_fd,
        stderr=subprocess.STDOUT,
        start_new_session=True,
    )
# After 'with' exits, the parent's copy of the fd is closed.
# The child inherits its own copy and keeps writing.
pid_file.write_text(str(proc.pid))
return (0, f"started tag={tag} pid={proc.pid} log={log_path}", "")

[MJ-02] Module-Level Monkey-Patch Applied at Import Time in Resource Server

Property Value
Severity Major
Category Correctness / Quality
File e2e/servers/resource_server.py : Lines 35–53

Description

_install_noop_tx_verifier_if_requested() is called unconditionally at module-import time (line 53), before build_app(). It replaces tx_verification.get_verifier_for_network with a function that always raises ValueError, globally disabling on-chain transaction verification for the entire Python process whenever E2E_SKIP_TX_VERIFICATION=1 is set.

Two concrete problems:

  1. Silent global mutation: any other code co-located in the same uvicorn worker (including future middleware) that calls get_verifier_for_network will also see the monkey-patched version, even if it wasn't intended to skip verification.
  2. Import-time side effects: if this module is ever imported in a test context (unit tests, introspection tools) with the env var set, verification is silently disabled for the entire test run.

app = build_app() on line 140 also runs all mechanism registrations and environment variable reads at import time — any misconfiguration raises an exception that surfaces as an import error rather than a startup error, which is harder to diagnose.

Code

# line 53 — runs unconditionally at import time
_install_noop_tx_verifier_if_requested()

# line 140 — also runs at import time
app = build_app()

Recommendation

Move the monkey-patch inside build_app() (or inside main()), and gate it explicitly:

def build_app() -> FastAPI:
    # Apply e2e overrides only when explicitly requested, and only during app construction
    if os.environ.get("E2E_SKIP_TX_VERIFICATION") == "1":
        _install_noop_tx_verifier_if_requested()
    ...

The app = build_app() module-level call is unavoidable for uvicorn's "module:app" string import — that pattern is standard — but the monkey-patch should be deferred to inside build_app() rather than before it.


[MJ-03] Unhandled Exceptions in _env_base_fee() Crash the Facilitator at Startup

Property Value
Severity Major
Category Correctness
File examples/facilitator/server.py : Lines 69–77

Description

_env_base_fee() parses FACILITATOR_BASE_FEE with json.loads() and casts values via int(). Neither call is guarded:

  • A malformed JSON string raises json.JSONDecodeError.
  • A non-integer value raises ValueError.

Both propagate through _register_evm() and _register_tron() into build_app(), which runs at module import time. The crash surfaces as an ImportError / traceback from uvicorn with no actionable error message, making misconfiguration very hard to diagnose in production.

Code

def _env_base_fee() -> dict[str, int] | None:
    raw = os.environ.get("FACILITATOR_BASE_FEE")
    if not raw:
        return None
    import json
    parsed = json.loads(raw)                        # unguarded — JSONDecodeError
    return {str(k).upper(): int(v) for k, v in parsed.items()}  # unguarded — ValueError

Recommendation

def _env_base_fee() -> dict[str, int] | None:
    raw = os.environ.get("FACILITATOR_BASE_FEE")
    if not raw:
        return None
    import json
    try:
        parsed = json.loads(raw)
        return {str(k).upper(): int(v) for k, v in parsed.items()}
    except (json.JSONDecodeError, ValueError, TypeError) as exc:
        print(
            f"error: FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}",
            file=sys.stderr,
        )
        sys.exit(2)

Minor

[MN-01] PostToolUse Hook Mode Names Documented Incorrectly in e2e/CLAUDE.md

Property Value
Severity Minor
Category Documentation
File e2e/CLAUDE.md : Gotchas section; e2e/mock_facilitator/__main__.py : Lines 20–28

Description

e2e/CLAUDE.md documents the mock facilitator modes as:

--mode {success, invalid_sig, deadline_expired, network_mismatch, insufficient_balance, settle_reverted}

The actual CLI choices in __main__.py and _VALID_MODES in app.py are:

  • fail_verify_invalid_sig
  • fail_verify_expired
  • fail_verify_network_mismatch
  • fail_settle_insufficient
  • fail_settle_revert

Every name in the documentation is wrong. A developer following the docs and passing --mode invalid_sig will get an argparse error. The mismatch will confuse anyone adding a new scenario.

Recommendation

Update e2e/CLAUDE.md to reflect the actual mode values, or adopt the shorter documented names as the actual CLI choices (which are more readable). The latter is preferable.


[MN-02] _resolve_mode Raises ValueError Inside FastAPI Handler — Unhandled 500

Property Value
Severity Minor
Category Correctness
File e2e/mock_facilitator/app.py : Lines 72–77

Description

When an invalid ?mode= query parameter is passed to /verify or /settle, _resolve_mode raises ValueError. FastAPI converts this to an HTTP 500 with a stack trace, rather than a 422 or 400 validation error. For a test tool used interactively via curl or a scenario, this makes debugging harder.

Code

def _resolve_mode(mode_query: str | None) -> str:
    if mode_query:
        if mode_query not in _VALID_MODES:
            raise ValueError(f"invalid mode: {mode_query}")  # becomes HTTP 500

Recommendation

from fastapi import HTTPException

def _resolve_mode(mode_query: str | None) -> str:
    if mode_query:
        if mode_query not in _VALID_MODES:
            raise HTTPException(
                status_code=422,
                detail=f"invalid mode '{mode_query}'. valid: {sorted(_VALID_MODES)}",
            )
    return _default_mode()

[MN-03] Latent Bug: exact Scheme Registered with EVM Mechanism for Any Network Prefix

Property Value
Severity Minor
Category Correctness
File e2e/clients/test_client.py : Lines 157–162

Description

The _register_client_mechanisms function registers ExactEvmClientMechanism for the exact scheme regardless of the network prefix:

if scheme == "exact":
    from bankofai.x402.mechanisms.evm.exact import ExactEvmClientMechanism
    x402.register(network, ExactEvmClientMechanism(signer))

If a future scenario sets E2E_NETWORK=tron:nile and E2E_SCHEMES=exact, the EVM mechanism will be silently registered for a TRON network. The payment attempt will fail at signing with an opaque error. The exact_permit branch below it correctly checks the network prefix (startswith("eip155:") vs startswith("tron:")). The exact branch should mirror that guard.

Recommendation

Add a network guard consistent with the exact_permit pattern:

if scheme == "exact":
    if not network.startswith("eip155:"):
        raise SystemExit(f"scheme 'exact' (ERC-3009) requires eip155:* network, got {network}")
    from bankofai.x402.mechanisms.evm.exact import ExactEvmClientMechanism
    x402.register(network, ExactEvmClientMechanism(signer))

[MN-04] _register_client_mechanisms Type Annotation Accepts EVM-Only Signer for TRON Paths

Property Value
Severity Minor
Category Code Quality
File e2e/clients/test_client.py : Lines 154–155

Description

The function signature types signer as EvmClientSigner:

def _register_client_mechanisms(
    x402: X402Client, network: str, schemes: list[str], signer: EvmClientSigner
) -> None:

But when network.startswith("tron:"), run() passes an _OfflineTronSigner (subclass of TronClientSigner), not EvmClientSigner. The type annotation is incorrect and will produce a mypy error in strict mode, contradicting the Python conventions rule that the module must stay mypy-green.

Recommendation

Use a union or a common protocol type:

from bankofai.x402.signers.client import EvmClientSigner, TronClientSigner
from typing import Union

def _register_client_mechanisms(
    x402: X402Client,
    network: str,
    schemes: list[str],
    signer: Union[EvmClientSigner, TronClientSigner],
) -> None:

[MN-05] stop_bg TOCTOU Race on os.getpgid / os.killpg

Property Value
Severity Minor
Category Correctness
File integration/commands.py : Lines 229–234

Description

There is a race between reading the PID from the pid file, calling os.getpgid(pid), and calling os.killpg(...). If the process exits and its PID is reused by an unrelated process between these two calls, os.getpgid may succeed and os.killpg may signal the wrong process group. The ProcessLookupError guard catches the case where the PID is gone entirely, but not the reuse case.

This is an inherent limitation of PID-based process management. In a local e2e harness with short-lived services it is unlikely to trigger, but worth acknowledging.

Code

try:
    os.killpg(os.getpgid(pid), signal.SIGTERM)   # TOCTOU: pid may be reused
except ProcessLookupError:
    ...

Recommendation

Since start_bg uses start_new_session=True, the subprocess is its own session leader and its PGID equals its PID. Replace os.getpgid(pid) with pid directly to avoid the extra syscall:

try:
    os.killpg(pid, signal.SIGTERM)   # PGID == PID for session leaders
except ProcessLookupError:
    ...

This does not eliminate the TOCTOU window but removes an unnecessary syscall and makes the code's assumption explicit.


Suggestions

[S-01] Strengthen TRON_TEST_KEY Warning Comment

File: e2e/clients/test_client.py : Line 48

Description: The deterministic TRON test key 0x111...1 is all-ones — trivially derivable. While the existing comment ("Not secret, just a fixed 32-byte hex so scenarios are reproducible") is correct, a developer who copies this value into their .env for testnet testing could accidentally use it on mainnet. The Anvil key (ANVIL_KEY_0) is similarly well-known but has strong cultural context as "test-only". Add a more explicit warning.

Suggestion:

# Deterministic test TRON key — all-ones, trivially guessable.
# SAFE FOR LOCAL/MOCK E2E ONLY. Never use on mainnet or any real-value network.
TRON_TEST_KEY = "0x1111111111111111111111111111111111111111111111111111111111111111"

[S-02] run_all.sh Does Not Surface Partial Service Leak on Outer Script Crash

File: e2e/scenarios/run_all.sh

Description: The outer loop collects failures without aborting early, which is correct. However, if the script itself is interrupted (Ctrl+C, SIGKILL from CI timeout), background services started by an in-progress scenario may be left running. Each scenario ends with @stop_all_bg, but that step is only reached if the scenario runner itself completes. An EXIT trap on the outer script would ensure cleanup:

Suggestion:

trap 'python3 -m integration.run --config /dev/null 2>/dev/null; pkill -f "e2e.mock_facilitator" 2>/dev/null || true' EXIT

Or more simply, rely on the integration runner's @stop_all_bg already being present in each scenario, and document that users should run @stop_all_bg manually if the suite is killed mid-run.


Positive Observations

Area Observation
E2E Architecture JSON-only declarative scenarios with a generic step runner is the right abstraction. Zero custom Python per scenario keeps the harness maintainable and addition of new scenarios trivial.
Tier Separation Mock tier (offline, fast, always-CI) vs. testnet tier (real-chain, nightly, credential-gated) is exactly correct. The skip-on-missing-env pattern in run_testnet.sh prevents CI noise on forks.
Mode-Driven Mock Single mock facilitator supporting all failure modes via MOCK_FACILITATOR_MODE / ?mode= query param avoids proliferating bespoke mock endpoints. The _VALID_MODES set provides a clean invariant.
Secret Hygiene .env.example with blank values, .env in .gitignore, CI wired to GitHub Secrets — no credentials in the diff.
asyncRewake Hook Using asyncRewake: true in settings.json so Claude Code wakes when tests fail is a thoughtful productivity improvement over a blocking hook.
@builtin Registry The @builtin decorator pattern in commands.py is clean — registration, dispatch, and self-documenting --list-builtins are all driven by a single decorator call.
GasFree Mock Colocation Serving the GasFree API from the same mock facilitator process (on /api/v1/address/{user} and /api/v1/config/provider/all) removes the need for a third mock process per scenario. Elegant.
Fail-Fast with Full Reporting integration/run.py correctly stops at the first failed step (fail-fast) while still writing a full Markdown report, and the outer run_all.sh collects all scenario failures before reporting. Both behaviors are appropriate for their layers.
_coerce_response_payload The recursive coercion function in test_client.py ensures the output JSON is always serializable without json.JSONEncoder subclassing — simple and effective.
Rules Knowledge Base The .claude/rules/ tree is well-structured: each rule file has a clear scope trigger, cites the incident that motivated it, and links back to docs/solutions.md and specs/. The authoring guidelines (rules/CLAUDE.md) make the system self-maintaining.

Checklist Results

Category Items Checked Pass Fail N/A Notes
Correctness 8 4 4 0 C-01 (hook path), MJ-01 (fd leak), MJ-02 (monkey-patch), MN-03 (latent exact/TRON)
Security 6 6 0 0 No secrets in diff; no injection vectors; .env.example only
Performance 5 5 0 0 No N+1, no unbounded ops in changed code
Code Quality 7 5 2 0 MN-04 (type annotation), MJ-02 (import-time side effect)
Testing 5 5 0 0 8 mock + 3 testnet scenarios covering happy + unhappy paths
Documentation 4 3 1 0 MN-01 (mode name mismatch in CLAUDE.md)
Compatibility 3 3 0 0 Specs rename is non-breaking; no wire-format changes
Observability 4 4 0 0 Request logging in mock facilitator; log artifact upload in CI

Disclaimer

This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between main and feat/x402-ai. Runtime behavior, integration testing, and deployment impact are not covered. No SDK source code (under python/x402/src/ or typescript/packages/) was changed in this PR; all findings pertain exclusively to the new tooling and harness code added by this PR.


Report generated by Code Review Skill v1.0.0
Date: 2026-04-24

Resolves all blocking + major + minor findings from the code review on
feat/x402-ai; no SDK source changes, harness + hook + examples only.

Critical:
- C-01 hook hardcoded developer path → derive REPO_ROOT from
  ${BASH_SOURCE[0]}; makes .claude/hooks/run-tests-for-change.sh
  portable across machines and CI.

Major:
- MJ-01 @start_bg fd leak → wrap log_fd in `with open(...)`; parent
  closes its copy after Popen, child keeps writing via its inherited fd.
- MJ-02 resource_server module-level monkey-patch → move
  _install_noop_tx_verifier_if_requested() call into build_app() so the
  side-effect is scoped to app construction, not import time.
- MJ-03 unguarded json.loads in _env_base_fee() → catch
  JSONDecodeError/ValueError/TypeError/AttributeError and exit 2 with a
  clear stderr message; previously a bad FACILITATOR_BASE_FEE string
  would crash the facilitator at import time with no actionable error.

Minor:
- MN-01 stale mode names in e2e/CLAUDE.md and
  .claude/rules/testing/scenarios.md → updated to authoritative
  _VALID_MODES set (fail_verify_*, fail_settle_*).
- MN-02 _resolve_mode raised ValueError → now raises FastAPI
  HTTPException(422) with valid-mode list in detail.
- MN-03 `exact` scheme in test_client skipped network-prefix guard that
  `exact_permit` had → add explicit eip155:* check so a stray tron:*
  scenario fails loud instead of signing with the wrong mechanism.
- MN-04 _register_client_mechanisms signer type annotation → widen to
  `EvmClientSigner | TronClientSigner`; hoist TronClientSigner to
  module-level import.
- MN-05 stop_bg TOCTOU on os.getpgid(pid) → use pid directly; start_bg
  sets start_new_session=True so pgid == pid for our children. Removes
  a syscall and makes the invariant explicit.

Suggestion:
- S-01 TRON_TEST_KEY warning strengthened; now explicitly flags
  "mainnet-unsafe, address publicly known, funds will be swept".

Not addressed:
- S-02 (outer run_all.sh EXIT trap) — each scenario already calls
  @stop_all_bg; adding an outer-script trap creates CI-timeout edge
  cases without clear benefit. Documented limitation acceptable.

Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios still green.
Hook portable-path smoke: running against /Users/.../x402/README.md now
correctly resolves REL=README.md and skips the docs path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@boboliu-1010
Copy link
Copy Markdown
Collaborator Author

Addressed all findings from the review in c4a547c (just pushed).

Finding Status Fix
C-01 hook hardcoded path Derive REPO_ROOT from ${BASH_SOURCE[0]}; portable across machines/CI
MJ-01 fd leak in @start_bg with open(log_path, ...) — parent closes its copy after Popen
MJ-02 import-time monkey-patch Moved _install_noop_tx_verifier_if_requested() call into build_app()
MJ-03 unguarded json.loads in _env_base_fee Catch JSONDecodeError/ValueError/TypeError/AttributeError, stderr + exit 2
MN-01 stale mode names in docs Updated e2e/CLAUDE.md + .claude/rules/testing/scenarios.md to _VALID_MODES
MN-02 _resolve_mode → HTTP 500 HTTPException(422, detail="...valid: [...]")
MN-03 exact missing network guard Added eip155:* check mirroring the exact_permit branch
MN-04 signer type annotation `EvmClientSigner
MN-05 stop_bg TOCTOU os.getpgid Use pid directly (pgid == pid for session leaders)
S-01 TRON test key warning Strengthened: "mainnet-unsafe, publicly known, funds will be swept"
S-02 outer run_all.sh EXIT trap ⏭ deferred Each scenario already runs @stop_all_bg; outer trap adds CI-timeout edge cases without clear benefit. Documented as known limitation.

Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios still green. Hook portable-path smoke passes.

@github-actions
Copy link
Copy Markdown

Code Review Report

Project: x402 (BankofAI SDK)
PR: main -> feat/x402-ai
Review Date: 2026-04-24
Reviewer: AI Code Reviewer (Code Review Skill v1.0.0)


PR Overview

Branch Information

Property Value
From Branch main
To Branch feat/x402-ai
Commits 12
Files Changed 79
Lines Added +4,604
Lines Removed -0

Commit History

Hash Message
c4a547c fix(review): address PR #69 code review findings
84d26e6 docs(gasfree): record testnet win + two new gotchas
ea04ab2 feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
f4c0c34 refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
3130e21 docs(e2e): record real-chain evidence + testnet gotchas
8045ba1 fix(e2e): make testnet scenarios runnable
45be4ca chore(ai): gitignore .DS_Store and .claude/hooks/logs/
e0136ec feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
1983edf feat(e2e): add testnet tier with real chain settlement
3509b70 feat(e2e): expand harness to 8 scenarios across EVM + TRON
900e5bd feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
aec6836 docs(design): draft AI transformation plan

Review Summary

Verdict

Verdict: Request Changes

Findings at a Glance

Critical Major Minor Suggestion
Count 0 2 5 4

Summary

This PR establishes the AI-native developer infrastructure for the x402 repo. The additions are exclusively additive (zero lines deleted from main): AI scaffolding (.claude/ rules, agents, hooks, settings), a full e2e test harness (e2e/, integration/), a reference facilitator implementation (examples/facilitator/), two CI workflows (mock + nightly testnet), and supporting documentation and configuration.

The overall structure is well-thought-out. The integration runner is clean, the declarative scenario format is correctly implemented, the mock facilitator faithfully implements the x402 wire contract, and the shell scripts use appropriate set -euo pipefail hygiene. Process group management in the background service lifecycle (start_bg/stop_bg) correctly uses start_new_session=True and os.killpg, avoiding orphan-child issues.

Two issues require changes before merge. The reference facilitator (examples/facilitator/server.py) silently drops the request context on malformed /verify and /settle bodies (KeyError → 500), while its own /fee/quote handler uses defensive .get(). Since this file is a primary adoption reference, the inconsistency teaches a bad pattern. Additionally, the CI e2e workflow installs uvicorn, httpx, and fastapi via a bare uv pip install outside the project's dependency manifest — these are test-infrastructure dependencies that should be declared to keep the lockfile authoritative.


Change Summary

1. AI Scaffolding (.claude/)

File Change Type Description
.claude/agents/code-reviewer.md Added x402-flavored code review agent with wire-format hotspots
.claude/agents/security-reviewer.md Added Payment-path / signing review agent
.claude/agents/scheme-author.md Added Scheme spec + rule authoring agent
.claude/commands/x402/compound.md Added /x402:compound wizard for docs/solutions.md
.claude/commands/x402/create-scenario.md Added /x402:create-scenario wizard
.claude/hooks/run-tests-for-change.sh Added PostToolUse hook: run scoped test suite after file edit
.claude/rules/** Added Language, network, scheme, and testing rule sheets
.claude/settings.json Added Hook wiring for PostToolUse

Purpose: Provides Claude Code with x402-domain knowledge, behavioral guardrails, and automated test feedback on every code edit.


2. E2E Test Harness (e2e/ + integration/)

File Change Type Description
integration/run.py Added Generic JSON-driven step runner with Markdown report output
integration/commands.py Added @builtin registry: @start_bg, @stop_bg, @stop_all_bg, @wait_http, @json_diff, @json_match
e2e/mock_facilitator/app.py Added FastAPI mock facilitator with 6 injectable modes + GasFree API stubs
e2e/mock_facilitator/__main__.py Added CLI entry (python -m e2e.mock_facilitator)
e2e/clients/test_client.py Added Real-signature EVM + TRON client with offline signer variants
e2e/servers/resource_server.py Added @x402_protected FastAPI resource server configurable via E2E_* env vars
e2e/scenarios/*/config.json Added 8 mock scenarios (3 happy path + 5 unhappy) + 3 testnet scenarios
e2e/scenarios/run_all.sh Added Mock suite entry point (CI-triggered per PR)
e2e/scenarios/run_testnet.sh Added Testnet suite entry point with per-scenario skip logic

Purpose: End-to-end verification of the full EVM + TRON payment flow without (mock tier) or with (testnet tier) real on-chain settlement.


3. Reference Facilitator (examples/facilitator/)

File Change Type Description
examples/facilitator/server.py Added FastAPI binding over X402Facilitator with env-var-driven EVM + TRON registration
examples/facilitator/__main__.py Added python -m examples.facilitator entry

Purpose: Canonical facilitator deployment pattern used by testnet e2e tier; also serves as adoption reference for SDK users.


4. CI Workflows

File Change Type Description
.github/workflows/check_e2e.yml Added PR-gated mock e2e suite (no secrets required)
.github/workflows/nightly_testnet.yml Added Nightly testnet run, gated to BofAI org, uses GitHub Secrets

5. Documentation and Configuration

File Change Type Description
CLAUDE.md Added Repo-wide AI entrypoint with component map and safety rules
{python,typescript,e2e,specs,...}/CLAUDE.md Added Component-level AI context (build commands, idioms, non-goals)
docs/design/ai-transformation.md Added P1–P4 roadmap for Claude-Code-native adoption
docs/solutions.md Added Hard-won debugging knowledge base (3 entries: TRON hex, deadline bounds, balance source)
specs/ Renamed Spec files moved from docs/specs/ — no content change
.env.example Added Template for testnet credentials (git-ignored)
.gitignore Modified Added .DS_Store and .claude/hooks/logs/

Detailed Findings


Major

[MJ-01] Missing input validation on /verify and /settle in the reference facilitator

Property Value
Severity Major
Category Correctness / Code Quality
File examples/facilitator/server.py : Lines 293–303

Description

The /verify and /settle endpoint handlers access body["paymentPayload"] and body["paymentRequirements"] with direct key subscripting. If a caller sends a body that omits either key (malformed client, misconfigured test harness, network truncation), Python raises KeyError, which FastAPI translates to an unhandled HTTP 500 — not the semantically correct 422/400. This is inconsistent with the same file's /fee/quote handler, which uses body.get("accepts", []) defensively.

Since examples/facilitator/server.py is the primary adoption reference for SDK integrators building their own facilitators, this pattern will be copied into production deployments.

Code

# examples/facilitator/server.py — lines 293-303
@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])          # KeyError → 500
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError → 500
    resp = await facilitator.verify(payload, requirements)
    return JSONResponse(resp.model_dump(by_alias=True))

@app.post("/settle")
async def _settle(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])          # KeyError → 500
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError → 500
    resp = await facilitator.settle(payload, requirements)
    return JSONResponse(resp.model_dump(by_alias=True))

Recommendation

from fastapi import HTTPException

@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
    try:
        payload = PaymentPayload(**body["paymentPayload"])
        requirements = PaymentRequirements(**body["paymentRequirements"])
    except KeyError as exc:
        raise HTTPException(status_code=422, detail=f"missing field: {exc}") from exc
    resp = await facilitator.verify(payload, requirements)
    return JSONResponse(resp.model_dump(by_alias=True))

# Same pattern for _settle

[MJ-02] E2E dependency installation bypasses the project lockfile

Property Value
Severity Major
Category Correctness / Compatibility
File .github/workflows/check_e2e.yml : Lines 22–24

Description

uvicorn, httpx, and fastapi are installed via a bare uv pip install after uv sync, rather than being declared in the project's pyproject.toml dependency groups. This means:

  1. Their versions are not pinned in uv.lock — a new release of any of these packages could break CI silently on the next run, with no diff traceable to a PR.
  2. The workflow's caching step (cache-dependency-glob: "python/x402/uv.lock") does not cover these packages, so a stale cache may hold the wrong versions.
  3. Developers running uv sync --group dev locally do not get these packages, causing local/CI divergence when running e2e/scenarios/run_all.sh.

Code

# .github/workflows/check_e2e.yml — lines 22-24
- name: Install SDK in editable mode + e2e deps
  working-directory: ./python/x402
  run: |
    uv sync --all-extras --dev
    uv pip install uvicorn httpx fastapi   # <-- outside the lockfile

Recommendation

Add an e2e optional dependency group to python/x402/pyproject.toml:

[project.optional-dependencies]
e2e = ["uvicorn>=0.29", "httpx>=0.27", "fastapi>=0.111"]

Then replace the workflow step with:

run: uv sync --extra e2e --dev

This ensures versions are pinned in uv.lock, the cache is valid, and local developer environments match CI.


Minor

[MN-01] ** glob in bash case patterns is misleading

Property Value
Severity Minor
Category Code Quality
File .claude/hooks/run-tests-for-change.sh : Lines 57–62

Description

The hook dispatches to the right test suite using case "$REL" in patterns. It includes both *.py and **/*.py variants for each directory:

case "$REL" in
  python/x402/src/*.py | python/x402/tests/*.py | \
  python/x402/src/**/*.py | python/x402/tests/**/*.py)

In bash case statements, * already matches any string including / (pattern matching semantics, not pathname expansion). So python/x402/src/*.py already matches python/x402/src/mechanisms/evm/exact.py. The **/*.py patterns are therefore redundant and may mislead future maintainers into believing bash case needs ** for recursive matching. While the script is functionally correct, the extra patterns add noise.

Recommendation

Remove the redundant ** variants to avoid confusion:

case "$REL" in
  python/x402/src/*.py | python/x402/tests/*.py)
    ...
  typescript/packages/*/src/*.ts | typescript/packages/*/src/*.tsx)
    ...

[MN-02] Mock GasFree provider and user address are identical

Property Value
Severity Minor
Category Testing
File e2e/mock_facilitator/app.py : Lines 128–132

Description

MOCK_GASFREE_PROVIDER and MOCK_GASFREE_ADDRESS are set to the same TRON address. Production GasFree flows have distinct provider (relayer) and user custodial wallet addresses. Using identical addresses in the mock means that any code path in the SDK that validates provider != gasFreeAddress is never exercised.

Code

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"
MOCK_GASFREE_ADDRESS  = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"  # same address

Recommendation

Use two distinct valid TRON addresses:

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"   # relayer / service provider
MOCK_GASFREE_ADDRESS  = "TDqe43YvBVq6DQ96DgNJ9sXSPLJHr2Hk2d"   # user's custodial wallet

[MN-03] Module-level monkey-patch in resource server has process-wide scope

Property Value
Severity Minor
Category Code Quality
File e2e/servers/resource_server.py : Lines 43–50

Description

_install_noop_tx_verifier_if_requested() patches tx_verification.get_verifier_for_network at the module level (called unconditionally during build_app()). The comment acknowledges the side effect but describes it as "scoped to this entry point." In practice, it mutates a module-level function reference for the entire Python process lifetime — and in uvicorn's reload mode, build_app() may be called multiple times, re-applying the patch.

This is acceptable for a test fixture, but the scoping claim in the comment is incorrect and could mislead contributors who import this module in other contexts.

Code

# resource_server.py lines 43-50
def _install_noop_tx_verifier_if_requested() -> None:
    if os.environ.get("E2E_SKIP_TX_VERIFICATION") != "1":
        return
    from bankofai.x402.utils import tx_verification

    def _always_raise(network: str, rpc_url: str | None = None):
        raise ValueError(f"e2e: tx verification disabled for {network}")

    tx_verification.get_verifier_for_network = _always_raise  # process-wide mutation

Recommendation

Update the comment to accurately describe the scope:

# Replaces the module-level `get_verifier_for_network` function for the entire process.
# This is intentional for e2e scenarios: the mock facilitator's deterministic tx hash
# is not a real on-chain transaction, so verification must be skipped globally.
# NEVER import this module in production code.

[MN-04] Parallel arrays in run_testnet.sh lack a length guard

Property Value
Severity Minor
Category Code Quality
File e2e/scenarios/run_testnet.sh : Lines 20–28

Description

The testnet runner uses two parallel bash arrays — names[] and required[] — indexed together. If a contributor adds a scenario to names[] without adding its required-vars entry to required[] (or vice versa), the wrong required-var list is silently applied to a scenario, potentially running a testnet scenario without credentials and generating a false "PASS" or a misleading error.

Code

declare -a names=(
  "exact_testnet"
  "exact_permit_testnet"
  "exact_gasfree_testnet"
)
declare -a required=(
  "BSC_TESTNET_FACILITATOR_KEY BSC_TESTNET_CLIENT_KEY BSC_TESTNET_PAY_TO"
  "BSC_TESTNET_FACILITATOR_KEY BSC_TESTNET_CLIENT_KEY BSC_TESTNET_PAY_TO"
  "TRON_NILE_FACILITATOR_KEY TRON_NILE_CLIENT_KEY TRON_NILE_PAY_TO GASFREE_NILE_API_URL"
)
# No assertion that ${#names[@]} == ${#required[@]}

Recommendation

Add a guard immediately after the array declarations:

if [[ ${#names[@]} -ne ${#required[@]} ]]; then
  echo "bug: names[] and required[] arrays are out of sync" >&2
  exit 2
fi

[MN-05] print() used for startup error in example facilitator (violates library convention)

Property Value
Severity Minor
Category Code Quality
File examples/facilitator/server.py : Lines 81–84

Description

_env_base_fee() uses print(..., file=sys.stderr) to report a parse error and then calls sys.exit(2). The Python conventions in .claude/rules/python/conventions.md state "No print in library code." While examples/facilitator/server.py is application code (not library code), _env_base_fee() is called during build_app(), which is also invoked at module import time (line 309: app = build_app()). Any downstream import of this module (e.g., for testing or inspection) would trigger sys.exit(2) on a bad env var, making it difficult to unit-test build_app() in isolation.

Code

# examples/facilitator/server.py lines 79-84
    except (json.JSONDecodeError, ValueError, TypeError, AttributeError) as exc:
        print(
            f"error: FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}",
            file=sys.stderr,
        )
        sys.exit(2)

Recommendation

Raise a ValueError from _env_base_fee() and let main() (or build_app()) catch it and call sys.exit(2):

    except (json.JSONDecodeError, ValueError, TypeError, AttributeError) as exc:
        raise ValueError(
            f"FACILITATOR_BASE_FEE is not valid JSON or contains non-integer values: {exc}"
        ) from exc

Then in build_app() (or main()):

try:
    base_fee = _env_base_fee()
except ValueError as exc:
    logging.getLogger("x402.example_facilitator").error("%s", exc)
    sys.exit(2)

Suggestions

[S-01] Add trap-based cleanup to run_all.sh

File: e2e/scenarios/run_all.sh

Description: When a scenario fails partway (e.g., the client step), integration/run.py's fail-fast exits before the final @stop_all_bg step runs. The next scenario's @start_bg does kill the previous service via the pid file, but if the two ports bind at exactly the same time during the handoff window, the new service fails to start. Adding a trap guarantees cleanup on any exit path.

Suggestion:

# At the top of run_all.sh, after ROOT_DIR is defined:
trap 'python3 -m integration.run --config /dev/stdin <<< '"'"'{"steps":[{"command":"@stop_all_bg","name":"trap cleanup"}]}'"'"'' EXIT

Or, simpler — call @stop_all_bg directly:

cleanup() { python3 -c "from integration.commands import stop_all_bg; stop_all_bg([], '.')"; }
trap cleanup EXIT

[S-02] Use distinct mock nonce value in GasFree address response

File: e2e/mock_facilitator/app.py : Line 202

Description: The mock GasFree /api/v1/address/{user} response returns "nonce": 0. A nonce of 0 is valid for the first ever GasFree transaction but could mask bugs in replay-protection logic that compares nonces. Using a non-zero nonce (e.g., 42) would ensure that nonce-handling code paths are exercised in mock scenarios.

Suggestion: Change "nonce": 0 to "nonce": 42 in the mock response.


[S-03] Validate url scheme in wait_http builtin

File: integration/commands.py : Lines 196–213

Description: wait_http passes the url argument directly to urllib.request.urlopen with no scheme validation. Passing a file:// URL or a non-HTTP scheme would silently open a local file (for read) or raise an obscure error. Since scenario configs are version-controlled and the risk is low, this is purely a defensive suggestion.

Suggestion:

from urllib.parse import urlparse

def wait_http(args, workdir):
    ...
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return (2, "", f"@wait_http: url must be http(s), got scheme: {parsed.scheme!r}")
    ...

[S-04] Surface GasFree API mock mode in supported response

File: e2e/mock_facilitator/app.py : Lines 138–141

Description: The /supported endpoint returns a hard-coded list of DEFAULT_SUPPORTED_KINDS regardless of the running mode. In a fail_verify_* mode, a client seeing exact_gasfree listed as supported may be surprised when the subsequent /verify returns invalid. Emitting the current mode as an X-Mock-Mode response header (or a meta field in the response) would make debugging mock-tier failures faster.

Suggestion:

@app.get("/supported")
async def supported() -> dict[str, Any]:
    _log({"endpoint": "supported"})
    return {
        "kinds": DEFAULT_SUPPORTED_KINDS,
        "_mockMode": _default_mode(),   # debug aid, ignored by real clients
    }

Positive Observations

Area Observation
Process lifecycle start_bg uses start_new_session=True + os.killpg for correct process-group cleanup, avoiding orphaned uvicorn worker processes.
Fail-fast + recovery start_bg proactively kills the previous process with the same tag before starting a new one, giving scenarios graceful recovery from a prior run's leftover services.
No real keys in fixtures ANVIL_KEY_0 and TRON_TEST_KEY are both well-known, widely-published test vectors. The TRON key carries a prominent warning comment. All real credentials flow through env vars, and .env is .gitignored.
CI fork protection nightly_testnet.yml guards with if: github.repository_owner == 'BofAI', correctly preventing fork PRs from accessing secrets.
Testnet self-skip logic run_testnet.sh skips scenarios whose required env vars are blank (rather than failing them). This allows running the full script locally with a partial .env without false CI failures.
Wire format fidelity Mock facilitator /verify and /settle responses match the x402 v2 wire contract (isValid, invalidReason, success, errorReason, transaction, network) exactly as defined in the protocol spec.
Declarative scenarios All 8 mock and 3 testnet scenarios are pure JSON — no bespoke Python test logic. The @builtin system is the right abstraction for keeping test configs readable.
json_match for testnet Testnet scenarios correctly use @json_match (partial field matching with regex for the tx hash) while mock scenarios use @json_diff (exact equality). The distinction is principled and correctly implemented.
Offline signer pattern _AllowanceSkippingEvmSigner and _OfflineTronSigner cleanly override only the chain-RPC methods, keeping the real signing path intact for mock tests. This correctly validates x402 payment mechanics without a live node.
FACILITATOR_BASE_FEE design Centralizing the base-fee configuration as a single JSON env var (instead of per-token env vars) is ergonomic and correctly propagated to both EVM and TRON mechanisms.

Checklist Results

Category Items Checked Pass Fail N/A Notes
Correctness 8 7 1 0 MJ-01: missing key validation in facilitator endpoints
Security 10 10 0 0 No secrets in code; proper fork/secret guard in CI; test keys clearly labeled
Performance 7 7 0 0 No N+1, no unbounded loops, no blocking-in-async
Code Quality 8 4 4 0 MN-01 through MN-05: redundant glob, identical mock addresses, module-scope mutation, parallel arrays, print in startup path
Testing 7 6 1 0 MJ-02: e2e deps not in lockfile; 8 mock + 3 testnet scenarios otherwise well-structured
Documentation 6 6 0 0 Every new public entry point has a docstring and CLAUDE.md
Compatibility 5 5 0 5 No existing APIs modified; all additions are backward-compatible
Observability 4 4 0 0 Request log in mock facilitator; structured Markdown reports from runner

Disclaimer

This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between the specified branches (main...feat/x402-ai). Runtime behavior, integration testing, and deployment impact are not covered. No changes to the SDK's core payment path (python/x402/src/, typescript/packages/) were present in this PR — review of payment-path logic should follow the security-reviewer agent protocol for any subsequent PR touching those paths.


Report generated by Code Review Skill v1.0.0
Date: 2026-04-24

The design doc was a draft v0 internal planning document with heavy
organization-specific context (BofAI, SUN.io, sun-protocol/apollo-arena,
internal branch names like feat/x402-ai-mcp/budget/matrix that never
materialized) and an "Open questions" section. From a community
contributor's perspective it read as either (a) required process to
follow — which it never was — or (b) out-of-date planning that kept
confusing the reader. P1 is already executed and demonstrable in the
.claude/ tree + CLAUDE.md + specs/; P2..P4 are aspirational and belong
in issues, not checked-in docs.

Also removes the now-empty docs/design/ directory.

Rewired 7 dangling references:
- CLAUDE.md root: drop the "see design doc" line; shorten the AI-native
  section to one sentence; drop the docs/design/ row from the table.
- docs/CLAUDE.md: drop the design/ row and the "Design docs go here" rule.
- python/x402/CLAUDE.md: drop "P2" breadcrumb on the mcp stub line.
- typescript/CLAUDE.md: drop "P2 in [design]" + "see design doc" on mcp
  and http rows respectively.
- integration/CLAUDE.md: rewrite the matrix-testing non-goal without the
  dead link.
- e2e/scenarios/exact_permit_happy_path/README.md: the "Facilitator binary
  with --mock" prereq now points to e2e/mock_facilitator/ that ships.
- .claude/agents/scheme-author.md: suggest an in-flight feature spec under
  specs/<NNN>-<slug>/ instead of a design doc.

Regression: 217 pytest + 51 vitest + 8 e2e mock scenarios all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Code Review Report

Project: x402 (BankofAI SDK)
PR: main -> feat/x402-ai
Review Date: 2026-04-24
Reviewer: AI Code Reviewer (Code Review Skill v1.0.0)


PR Overview

Branch Information

Property Value
From Branch main
To Branch feat/x402-ai
Commits 13
Files Changed 78
Lines Added +4339
Lines Removed -0

Commit History

Hash Message
8c8ab6f docs: delete ai-transformation design doc; clean dangling refs
c4a547c fix(review): address PR #69 code review findings
84d26e6 docs(gasfree): record testnet win + two new gotchas
ea04ab2 feat(facilitator): FACILITATOR_BASE_FEE env for permit / gasfree mechanisms
f4c0c34 refactor(specs): mirror upstream layout — collapse docs/specs/ into specs/
3130e21 docs(e2e): record real-chain evidence + testnet gotchas
8045ba1 fix(e2e): make testnet scenarios runnable
45be4ca chore(ai): gitignore .DS_Store and .claude/hooks/logs/
e0136ec feat(ai): complete P1 scaffolding — component CLAUDE.md + rules index + test hook
1983edf feat(e2e): add testnet tier with real chain settlement
3509b70 feat(e2e): expand harness to 8 scenarios across EVM + TRON
900e5bd feat(ai): P1 scaffolding — .claude tree, root CLAUDE.md, integration runner
aec6836 docs(design): draft AI transformation plan

Review Summary

Verdict

Verdict: Request Changes

Findings at a Glance

Critical Major Minor Suggestion
Count 0 2 5 4

Summary

This PR delivers a comprehensive AI-native development scaffold: the complete .claude/ rule and agent tree, a two-tier e2e test harness (8 mock scenarios + 3 testnet scenarios), a generic integration step runner, the reference facilitator server used for testnet validation, CI workflows, and an extensive docs/solutions.md knowledge base. The scope is infra/tooling only — no production payment-path code is changed, so no signing or settlement bugs are introduced.

The overall quality is high. The design is clean, well-structured, and thoughtfully documented. However, two Major issues require fixes before merge: the jq dependency in the PostToolUse hook silently disables all automated test runs when jq is absent, and the reference facilitator server has unhandled KeyErrors on its /verify and /settle endpoints that expose raw 500s on malformed input. Five Minor issues and four Suggestions round out the review.


Change Summary

1. AI-native scaffolding (.claude/)

File Change Type Description
.claude/agents/code-reviewer.md Added x402-flavored code reviewer agent definition
.claude/agents/security-reviewer.md Added Security reviewer with x402 signing checklist
.claude/agents/scheme-author.md Added Scheme authoring agent
.claude/commands/x402/compound.md Added /x402:compound slash-command wizard
.claude/commands/x402/create-scenario.md Added /x402:create-scenario slash-command wizard
.claude/hooks/run-tests-for-change.sh Added PostToolUse hook: dispatches test suite after edits
.claude/rules/** Added Rule sheets for common, schemes, networks, python, typescript, testing
.claude/settings.json Added Hook wiring (PostToolUse on Edit/Write/MultiEdit/NotebookEdit)

Purpose: Encode x402 conventions, domain knowledge, and code-review checklists as machine-readable rules so Claude can contribute safely without reading every file first.


2. E2E test harness (e2e/, integration/)

File Change Type Description
e2e/clients/test_client.py Added EVM + TRON wallet, 402→retry flow, JSON output
e2e/mock_facilitator/app.py Added FastAPI mock: facilitator + GasFree API (mode-driven)
e2e/mock_facilitator/__main__.py Added CLI entry for the mock facilitator
e2e/servers/resource_server.py Added FastAPI resource server using @x402_protected
e2e/scenarios/*/config.json Added 8 mock scenarios (3 happy path, 5 unhappy)
e2e/scenarios/testnet/*/config.json Added 3 testnet scenarios with real chain settlement
e2e/scenarios/run_all.sh Added Mock suite entry point (CI)
e2e/scenarios/run_testnet.sh Added Testnet suite entry point (nightly)
integration/run.py Added Generic sequential step runner with Markdown reporting
integration/commands.py Added @json_diff, @json_match, @start_bg, @stop_bg, @wait_http built-ins

Purpose: Provide a no-secrets, deterministic CI test suite for all three schemes across EVM and TRON, plus a nightly testnet tier with real on-chain settlement.


3. Reference facilitator (examples/facilitator/)

File Change Type Description
examples/facilitator/server.py Added FastAPI wrapper for X402Facilitator; EVM + TRON registration from env vars
examples/facilitator/__main__.py Added CLI entry

Purpose: Standalone reference implementation used by the testnet e2e tier and as documentation for integrators.


4. CI workflows (.github/workflows/)

File Change Type Description
.github/workflows/check_e2e.yml Added PR check: runs the 8 mock scenarios
.github/workflows/nightly_testnet.yml Added Daily: testnet scenarios with secrets; skipped on forks

Purpose: Wire the mock suite into PR CI and establish a nightly testnet gate.


5. Documentation and miscellaneous

File Change Type Description
docs/solutions.md Added 10 documented bugs with symptom/root-cause/fix/rule format
CLAUDE.md (root + components) Added Project orientation for contributors and AI agents
specs/ reorganization Renamed docs/specs/specs/ (upstream layout mirror)
.env.example Added Testnet credential template (no secrets)
.gitignore Modified Adds .DS_Store, .claude/hooks/logs/

Detailed Findings


Major

[MJ-01] jq absence silently disables the PostToolUse test hook

Property Value
Severity Major
Category Correctness / Testing
File .claude/hooks/run-tests-for-change.sh : Lines 29–38

Description

The hook extracts FILE_PATH by piping stdin through jq, with 2>/dev/null swallowing all errors:

FILE_PATH="$(printf '%s' "$INPUT" | jq -r '.tool_input.file_path // .tool_response.filePath // empty' 2>/dev/null)"

If jq is not installed on the developer's machine (or a CI runner), the subshell returns an empty string. The very next guard exits 0:

if [[ -z "$FILE_PATH" ]]; then
  exit 0
fi

The result is that every hook invocation silently succeeds without running any test. The developer or CI sees no error — tests are simply never executed. Because jq is not declared as a dependency anywhere in the repository, this creates a silent test-bypass risk that is hard to diagnose.

Recommendation

Add an explicit jq availability check at the top of the script, before any stdin reading:

if ! command -v jq >/dev/null 2>&1; then
  echo "run-tests-for-change: 'jq' not found — install jq to enable test autorun" >&2
  exit 0   # still exit 0 so Claude isn't blocked, but at least it's visible
fi

Also document jq as a requirement in the root CLAUDE.md setup section.


[MJ-02] Unhandled KeyError on malformed bodies in reference facilitator /verify and /settle

Property Value
Severity Major
Category Correctness / Robustness
File examples/facilitator/server.py : Lines 293–303

Description

The /verify and /settle endpoints access required body keys with plain dict subscript:

@app.post("/verify")
async def _verify(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])       # KeyError if missing
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError if missing

@app.post("/settle")
async def _settle(body: dict[str, Any]) -> JSONResponse:
    payload = PaymentPayload(**body["paymentPayload"])       # KeyError if missing
    requirements = PaymentRequirements(**body["paymentRequirements"])  # KeyError if missing

A missing key propagates as an unhandled exception through FastAPI, returning an opaque HTTP 500 instead of a descriptive 422. Because this file is explicitly labeled "reference implementation" in examples/CLAUDE.md, integrators may copy this pattern into production facilitators. The testnet e2e tier calls these endpoints under real-network conditions; any client sending a subtly malformed body during debugging would receive a 500 with no actionable error message.

Recommendation

Use .get() with a meaningful 422 response, or switch to a typed Pydantic request model so FastAPI validates automatically:

from pydantic import BaseModel

class VerifyRequest(BaseModel):
    paymentPayload: dict
    paymentRequirements: dict

@app.post("/verify")
async def _verify(body: VerifyRequest) -> JSONResponse:
    payload = PaymentPayload(**body.paymentPayload)
    requirements = PaymentRequirements(**body.paymentRequirements)
    ...

FastAPI then returns a 422 with field-level detail on missing/malformed keys at no extra cost.


Minor

[MN-01] eval used for test command execution in hook script

Property Value
Severity Minor
Category Code Quality / Security
File .claude/hooks/run-tests-for-change.sh : Line 84

Description

OUTPUT="$(cd "$TEST_CWD" && eval "$TEST_CMD" 2>&1)"

$TEST_CMD is currently populated only from hardcoded case-statement branches (e.g. "uv run pytest -x -q --tb=short"), so the immediate security risk is low. However, the TypeScript branch constructs $TEST_CWD from the user-controlled $PKG_REL (derived from $FILE_PATH), which comes from hook stdin. If $PKG_REL ever propagated into $TEST_CMD (e.g. via a future path-specific flag), the eval would be exploitable. More practically: eval makes the script harder to reason about and harder to test statically.

Recommendation

Replace with direct invocation or bash -c:

# Instead of eval "$TEST_CMD":
cd "$TEST_CWD" || exit 2
OUTPUT="$(bash -c "$TEST_CMD" 2>&1)"

Or, for fully safe invocation, restructure TEST_CMD as an array and use "${TEST_CMD_ARRAY[@]}".


[MN-02] MOCK_GASFREE_PROVIDER and MOCK_GASFREE_ADDRESS are the same address

Property Value
Severity Minor
Category Testing / Correctness
File e2e/mock_facilitator/app.py : Lines 128–132

Description

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"
MOCK_GASFREE_ADDRESS  = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"

In the real GasFree protocol these are semantically distinct: the provider (serviceProvider) is the relayer that submits on-chain transactions, while the GasFree address (gasFreeAddress) is the user's custodial wallet that holds the tokens. solutions.md entry #9 specifically documents how these two addresses are different levels of an address mapping chain.

Using the same value for both means the mock cannot catch bugs where client or facilitator code inadvertently swaps serviceProvider for the custodial wallet address (or vice versa). It also understates the distinction to developers reading the mock code for protocol guidance.

Recommendation

Use two distinct, clearly labeled addresses:

MOCK_GASFREE_PROVIDER = "TLCvf7MktLG7XkbJRyUwnvCeDnaEXYkcbC"   # relayer/service-provider
MOCK_GASFREE_ADDRESS  = "TEkxiTehnzSmSe2XqrBj4w32RUN966rdz8"    # user custodial wallet

Add a comment citing solutions.md #9 so future maintainers understand why they differ.


[MN-03] Module-level build_app() call with conditional side effects at import time

Property Value
Severity Minor
Category Correctness / Code Quality
File e2e/servers/resource_server.py : Lines 119, 141

Description

app = build_app() executes unconditionally at module import time. Inside build_app(), _install_noop_tx_verifier_if_requested() may monkey-patch a global function in bankofai.x402.utils.tx_verification:

def _install_noop_tx_verifier_if_requested() -> None:
    if os.environ.get("E2E_SKIP_TX_VERIFICATION") != "1":
        return
    from bankofai.x402.utils import tx_verification
    tx_verification.get_verifier_for_network = _always_raise  # module-level mutation

Any tool (test runner, IDE import analyzer, python -c "import e2e.servers.resource_server") that imports this module while E2E_SKIP_TX_VERIFICATION=1 is set will silently corrupt the tx_verification module for the rest of the Python process. Because both resource_server.py and examples/facilitator/server.py execute their build_app() at import time, the patch applies before the caller can prevent it.

Recommendation

Guard the module-level app initialization behind if __name__ == "__main__": for the monkey-patch case, or restructure so app is created lazily (e.g., via an application factory pattern where the caller invokes build_app() explicitly). At minimum, add a module docstring note warning that importing this module with E2E_SKIP_TX_VERIFICATION=1 set will mutate a shared utility module.


[MN-04] run_all.sh does not continue running remaining scenarios after a mid-run failure

Property Value
Severity Minor
Category Testing
File e2e/scenarios/run_all.sh : Lines 21–28

Description

for s in "${scenarios[@]}"; do
  cfg="e2e/scenarios/$s/config.json"
  if ! python3 -m integration.run --config "$cfg"; then
    failed+=("$s")
  fi
done

Each individual integration/run.py invocation is fail-fast (stops at the first failing step). If a scenario fails before its @stop_all_bg step (e.g. the client step fails), the mock facilitator and resource server processes remain running on ports 4020/4021. run_all.sh then moves on to the next scenario, which tries to start new services on the same ports.

The @start_bg builtin does attempt to kill the previous process via its PID file:

if pid_file.exists():
    old_pid = int(pid_file.read_text().strip())
    os.kill(old_pid, signal.SIGTERM)

This mitigates the issue in most cases, but there is no explicit wait for the port to be freed by the OS after SIGTERM. Under load on CI, the subsequent @start_bg may launch the new process before the TCP socket clears, causing the @wait_http readiness poll to eventually time out and fail the next scenario with a confusing error. Uvicorn's default graceful shutdown takes up to 5 seconds.

Recommendation

Add a brief @stop_all_bg step to the beginning of each scenario config (not just the end) as a "clean start" guard, OR add an explicit sleep/port-wait after @stop_all_bg in each scenario. The simplest fix is a "cleanup" first step:

{
  "name": "ensure no stale services from previous run",
  "command": "@stop_all_bg",
  "timeout": 10
}

This would make each scenario self-sufficient regardless of what the previous one left behind.


[MN-05] Missing per-scenario README.md files — convention violated for 10 of 11 directories

Property Value
Severity Minor
Category Documentation
File e2e/scenarios/*/

Description

.claude/rules/testing/scenarios.md explicitly lists README.md as a required file for every scenario directory. Only e2e/scenarios/exact_permit_happy_path/README.md was added. The remaining directories are missing READMEs:

  • exact_happy_path/
  • exact_gasfree_happy_path/
  • unhappy_verify_invalid_sig/
  • unhappy_verify_expired/
  • unhappy_verify_network_mismatch/
  • unhappy_settle_insufficient/
  • unhappy_settle_revert/
  • testnet/exact_testnet/
  • testnet/exact_permit_testnet/
  • testnet/exact_gasfree_testnet/

This matters for maintainability: a developer reading an unfamiliar scenario directory has no immediate explanation of purpose, preconditions, or run command.

Recommendation

Add minimal READMEs for each missing directory. The exact_permit_happy_path/README.md can serve as the template.


Suggestions

[S-01] Pin GitHub Actions to SHA commits for supply-chain security

File: .github/workflows/check_e2e.yml, nightly_testnet.yml
Description: Both workflows reference actions at mutable major-version tags (actions/checkout@v4, astral-sh/setup-uv@v5, actions/upload-artifact@v4). A compromised action tag could execute arbitrary code in CI with access to all GitHub Secrets.
Suggestion: Pin each action to a specific commit SHA, e.g. actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2. Tools like Dependabot or pin-github-action automate this.


[S-02] @json_match spec format cannot express ~ in a plain = expected value

File: integration/commands.py : Lines 313–336
Description: The spec parser uses ~ as the regex-operator token. The disambiguation logic spec.index("~") < spec.index("=") means that any =-type assertion whose expected value contains ~ (e.g. body.hash=0x1234~abcd) would be misrouted to the regex branch, causing a confusing test failure. The usage string does not document this limitation.
Suggestion: Document the limitation in the usage= string for @json_match. Alternatively, require explicit ~= (not ~) as the regex separator to eliminate ambiguity entirely.


[S-03] shell=True in subprocess calls throughout the integration runner

File: integration/run.py : Line 83; integration/commands.py : Line 167
Description: subprocess.run(command, shell=True, ...) and subprocess.Popen(command, shell=True, ...) are used to execute shell commands. While the commands originate from JSON config files and hardcoded strings (not direct user input), shell=True is a latent risk if the input sources ever expand. Additionally, it requires the shell to parse quoting/escaping, which is where solutions.md entry #8 bug came from.
Suggestion: Consider wrapping known fixed commands (e.g., pnpm test, uv run pytest) as argument lists and using shell=False. For variable commands from config files, document the shell injection risk explicitly in integration/README.md.


[S-04] Verify that unhappy_settle_revert uses a unique output path

File: e2e/scenarios/unhappy_settle_revert/config.json
Description: unhappy_settle_insufficient writes to /tmp/x402-e2e-unhappy-settle-response.json. This reviewer did not read unhappy_settle_revert/config.json — if it also writes to the same path, a failure of one scenario could leave stale data that causes the other to produce a false pass or a misleading diff. The verify-scenario group already uses unique paths (-verify-response.json, -verify-expired-response.json, etc.), so this may be already handled.
Suggestion: Verify that unhappy_settle_revert uses a distinct output path (e.g., /tmp/x402-e2e-unhappy-settle-revert-response.json) separate from unhappy_settle_insufficient's /tmp/x402-e2e-unhappy-settle-response.json.


Positive Observations

Area Observation
docs/solutions.md Exceptional knowledge base. 10 entries, each with consistent symptom/root-cause/fix/rule structure. Entries 1–3 directly map to documented production bugs with commit references — this is exactly the kind of tribal knowledge that prevents regressions.
Two-tier test design Mock tier (no creds, ~30s) + testnet tier (nightly, real chain) is the right split. Testnet scenarios self-skip when secrets are absent rather than failing, which is the correct design for CI with optional credentials.
@start_bg robustness The kill-before-restart logic (PID file + SIGTERM → SIGKILL escalation) and start_new_session=True process group isolation are solid process management. Most custom test harnesses get this wrong.
Declarative scenario format Pure JSON config.json + expected_*.json with a generic runner is exactly right. Adding a scenario requires zero Python — copy a directory, edit JSON. This is a significant maintainability win.
E2E_SKIP_ALLOWANCE / E2E_SKIP_TX_VERIFICATION flags Explicit environment flags for the two "can't run this offline" checks (allowance and tx verification) are well-designed. The overrides are narrowly scoped (no-chain RPC path only) and clearly documented.
Security defaults TRON_TEST_KEY carries a prominent warning comment. .env is gitignored. GitHub Secrets are used for testnet credentials. The nightly workflow adds github.repository_owner == 'BofAI' guard to prevent fork-secret leakage.
asyncRewake hook integration The Claude Code hook uses asyncRewake: true + rewakeMessage to surface test failures back to the model. This closes the feedback loop for AI-driven edits cleanly.
_AllowanceSkippingEvmSigner / _OfflineTronSigner subclass pattern Keeping offline overrides as subclasses of the real signers (rather than duck-typed mocks) ensures the e2e client exercises the full SDK code path for everything except the chain RPC calls.
run_all.sh continues past failures The outer script collects failed scenarios and reports them all rather than aborting at the first one. This gives complete CI feedback in a single run.
integration/run.py is pure run_step and format_report are side-effect-free functions that take inputs and return outputs. State flows through return values. This makes the runner easy to test in isolation.

Checklist Results

Category Items Checked Pass Fail N/A Notes
Correctness 8 6 2 0 MJ-02 (KeyError), MN-04 (port reuse)
Security 8 6 1 1 MJ-01 (silent hook bypass), S-03 (shell=True)
Performance 5 4 0 1 No DB/unbounded ops in changed code
Code Quality 8 6 2 0 MN-01 (eval), MN-03 (module-level side effects)
Testing 6 4 1 1 MN-05 (missing READMEs)
Documentation 5 4 1 0 MN-05 (per-scenario READMEs)
Compatibility 4 4 0 0 No breaking changes; specs/ rename is additive
Observability 4 4 0 0 JSONL log in mock facilitator; hook logs in .claude/hooks/logs/

Disclaimer

This is an automated code review. It supplements but does not replace human review. The reviewer analyzed only the diff between main and feat/x402-ai. Runtime behavior, integration testing, and deployment impact are not covered. No production payment-path code (signing, verification, settlement) was changed in this PR; security findings are scoped to the tooling and test infrastructure introduced here.


Report generated by Code Review Skill v1.0.0
Date: 2026-04-24

@Will-Guan Will-Guan merged commit f2518cc into main Apr 24, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants