MCP E2E tests and shared mode live gameplay #6

Closed
ArturSkowronski wants to merge 262 commits into bfirsh:master from ArturSkowronski:mcp-e2e-shared-mode

@ArturSkowronski

Summary

  • 14 E2E tests in McpApiE2ETest: real Netty server + RestApiClient, the same HTTP path MCP uses in production. Covers SMB and FF1 workflows.
  • Fix MCP stdio transport: the server now stays alive via Job.join() instead of exiting immediately
  • Enable /step and /reset in shared mode, so MCP can advance frames while Compose UI renders live
  • Wire API input into Compose UI: ComposeInputHandler merges keyboard + gamepad + API controller
  • Safe shared-mode stepping: advanceFrames waits for UI-produced frames instead of racing the CPU
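
The shared-mode stepping described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `SharedFrameClock`, `onFrameProduced`, and the timeout parameter are hypothetical names, and only the `advanceFrames` name comes from the description above. The idea is that the API thread never steps the CPU itself; it waits until the UI loop has produced the requested number of frames.

```kotlin
import java.util.concurrent.TimeUnit
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.thread
import kotlin.concurrent.withLock

// Hypothetical stand-in for the shared session; not the actual kNES class.
class SharedFrameClock {
    private val lock = ReentrantLock()
    private val frameProduced = lock.newCondition()
    private var frameCount = 0L

    // Called by the UI emulation loop after each frame it renders.
    fun onFrameProduced() = lock.withLock {
        frameCount++
        frameProduced.signalAll()
    }

    // Called by the API thread. Blocks until at least `frames` new frames
    // exist or the timeout expires, and returns how many frames were observed.
    fun advanceFrames(frames: Int, timeoutMs: Long = 5_000): Long = lock.withLock {
        val start = frameCount
        val target = start + frames
        var remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMs)
        while (frameCount < target && remaining > 0) {
            // awaitNanos returns the time left, so spurious wakeups just loop again.
            remaining = frameProduced.awaitNanos(remaining)
        }
        frameCount - start
    }
}

fun main() {
    val clock = SharedFrameClock()
    // Simulated UI loop producing a frame every couple of milliseconds.
    thread(isDaemon = true) {
        while (true) {
            Thread.sleep(2)
            clock.onFrameProduced()
        }
    }
    val advanced = clock.advanceFrames(10)
    println("advanced at least 10 frames: ${advanced >= 10}")
}
```

The returned count can exceed the request if the UI produces another frame before the waiting thread reacquires the lock, so callers in this sketch should treat it as a lower bound.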

Test plan

  • ./gradlew test — 464 tests, 0 failures
  • MCP server responds to initialize protocol message
  • Compose UI + API Server + MCP: Claude plays FF1 through MCP tools with live visualization

Add CI configuration, basic smoke test, and update dependencies
Removing unnecessary mappers
Add kNES API README with usage, endpoint reference, and design rationale
- Remove LaunchedEffect polling requestFocus() every 1s (unnecessary recomposition)
- Remove dead isMacOS scale*1 branch
- Remove non-observable gamepadController.statusMessage read
- Simplify ComposeUI renderer (fixed 512x480, no dead scale logic)
- Clean up formatting and indentation
Clean up Compose UI: remove polling hack, dead code, simplify renderer
Tests exercise the full API flow with a real ROM (skipped if unavailable):
- Load ROM → watch RAM → press Start → walk right → verify position changed
- Screenshot returns valid PNG with correct magic bytes
- FM2 input playback executes correct number of frames
- Batch step sequence advances exact frame count
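
As an illustration of the magic-byte check mentioned above, a PNG payload can be validated against the fixed 8-byte signature from the PNG specification. The helper name `looksLikePng` is hypothetical, and the image here is produced by the JDK encoder rather than by the emulator's screenshot endpoint.

```kotlin
import java.awt.image.BufferedImage
import java.io.ByteArrayOutputStream
import javax.imageio.ImageIO

// The 8-byte PNG signature: 0x89 'P' 'N' 'G' CR LF 0x1A LF.
val PNG_SIGNATURE = byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)

// Hypothetical helper: true if the payload starts with the PNG signature.
fun looksLikePng(bytes: ByteArray): Boolean =
    bytes.size >= PNG_SIGNATURE.size &&
        bytes.copyOfRange(0, PNG_SIGNATURE.size).contentEquals(PNG_SIGNATURE)

fun main() {
    // Encode a blank 256x240 image (the NES output resolution) as PNG.
    val out = ByteArrayOutputStream()
    ImageIO.write(BufferedImage(256, 240, BufferedImage.TYPE_INT_RGB), "png", out)

    println(looksLikePng(out.toByteArray()))         // true
    println(looksLikePng("not a png".toByteArray())) // false
}
```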
Add REST API E2E tests: game session, screenshot, FM2, batch step
Game-specific memory maps loaded from JSON profiles at runtime.
Built-in profiles for Super Mario Bros and Final Fantasy.
Custom profiles can be registered via POST /profiles.

Endpoints:
  GET  /profiles         — list available profiles
  GET  /profiles/{id}    — get profile details with all addresses
  POST /profiles/{id}/apply — apply profile to watch list
  POST /profiles         — register custom profile at runtime

Built-in profiles:
  smb  — Super Mario Bros (17 addresses: position, lives, world, score, etc)
  ff1  — Final Fantasy (30 addresses: party HP/stats, gold, position, battle state)
Add game profiles: dynamic RAM watch for SMB and Final Fantasy
…ship

Architecture change: game profiles moved from knes-api to new knes-debug
module. knes-emulator stays pure NES, knes-debug provides tooling shared
by both the REST API and Compose UI.

New module: knes-debug
- GameProfile: loads game-specific RAM maps from JSON profiles
- MemoryMonitor: reads NES memory using active profile
- Built-in profiles: SMB (17 addresses), FF1 (30 addresses)
- 5 unit tests

Compose UI: Profile Monitor window
- New "Monitor" button opens a second window
- Dropdown to select game profile (SMB / Final Fantasy)
- Auto-refreshing table of RAM values (~10fps)
- Shows variable name, decimal value, hex value, address

knes-api: delegates to knes-debug for profile registry
- ApiGameProfile wrapper for JSON serialization
- Profile JSON files moved to knes-debug/src/main/resources/profiles/
Add knes-debug module with Profile Monitor window
New "API Server" button in UI starts/stops an embedded Ktor server on
port 6502 sharing the same NES instance. Enables observing the running
game via REST while playing with keyboard/gamepad.

Shared mode:
- /state, /screen, /watch, /profiles, /health — work normally
- /press, /release — work (merges with keyboard input)
- /step, /rom, /reset — return 400 (UI drives emulation)

Double buffer in EmulatorSession prevents torn frames: writeBuffer
receives new data from the PPU, then swaps atomically with readyBuffer,
which the API reads from. @Volatile ensures visibility across threads.
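
A rough sketch of the double-buffering idea follows. It is not the EmulatorSession code: `FrameDoubleBuffer`, `publish`, and `snapshot` are hypothetical names, and this version deliberately uses a short lock around the swap and the read copy instead of a volatile field, so that a reader in the middle of a copy can never overlap with the writer reusing that buffer.

```kotlin
// Hypothetical double buffer for one frame of pixels; not the actual kNES class.
class FrameDoubleBuffer(size: Int) {
    private val lock = Any()
    private var writeBuffer = IntArray(size) // touched only by the producer thread
    private var readyBuffer = IntArray(size) // read by consumers under the lock

    // Producer side (e.g. the PPU callback): fill the back buffer, then swap.
    fun publish(pixels: IntArray) {
        require(pixels.size <= writeBuffer.size) { "frame larger than buffer" }
        pixels.copyInto(writeBuffer)
        synchronized(lock) {
            val previous = readyBuffer
            readyBuffer = writeBuffer
            writeBuffer = previous
        }
    }

    // Consumer side (e.g. an API handler): copy the last complete frame.
    fun snapshot(): IntArray = synchronized(lock) { readyBuffer.copyOf() }
}

fun main() {
    val buffer = FrameDoubleBuffer(4)
    buffer.publish(intArrayOf(1, 1, 1, 1))
    println(buffer.snapshot().toList()) // [1, 1, 1, 1]
    buffer.publish(intArrayOf(2, 2, 2, 2))
    println(buffer.snapshot().toList()) // [2, 2, 2, 2]
}
```

The lock is held only for a reference swap or a single array copy, so contention between the emulation loop and API readers stays small in this sketch.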
Add embedded API server in Compose UI with double-buffered frames
New addresses: screen state, menu cursor, battle details, enemy data,
full stats for all 4 characters.

Addresses that give information a human player cannot see on screen are
marked with "hidden": true in the JSON profile:
- encounterCounter, nextEnemyType (upcoming encounter knowledge)
- enemy HP (no HP bars in FF1)
- preemptiveAmbush, hitCount, criticalHit, targetDamage (pre-animation)

GameProfile.toFairWatchMap() excludes hidden addresses for fair play.
GameProfile.toWatchMap() includes everything for debug/TAS use.
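
The split between the two watch maps can be illustrated like this. Only the method names `toWatchMap` and `toFairWatchMap` and the `hidden` flag come from the description above; the class shapes, return types, and the address values are assumptions made up for the example, not the real FF1 memory map.

```kotlin
// Hypothetical model of one profile entry; the real JSON schema may differ.
data class WatchAddress(val name: String, val address: Int, val hidden: Boolean = false)

// Hypothetical stand-in for GameProfile.
class GameProfileSketch(val id: String, val addresses: List<WatchAddress>) {
    // Everything, for debug/TAS use.
    fun toWatchMap(): Map<String, Int> =
        addresses.associate { it.name to it.address }

    // Only what a human player could also see on screen.
    fun toFairWatchMap(): Map<String, Int> =
        addresses.filterNot { it.hidden }.associate { it.name to it.address }
}

fun main() {
    // Placeholder addresses, chosen only for illustration.
    val profile = GameProfileSketch(
        id = "ff1",
        addresses = listOf(
            WatchAddress("gold", 0x0010),
            WatchAddress("encounterCounter", 0x0020, hidden = true),
        ),
    )
    println(profile.toWatchMap().keys)     // [gold, encounterCounter]
    println(profile.toFairWatchMap().keys) // [gold]
}
```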
Expand FF1 profile to 70 addresses, add hidden flag for fair play
Add MCP server: control emulator from Claude as native tools
MCP tools now call the Compose UI's embedded REST API (localhost:6502)
instead of running a standalone headless NES. This means:
- The LLM controls the game through MCP tools
- The user watches gameplay live in the Compose UI window
- Same NES instance, zero duplication

Flow: Claude → MCP (stdio) → REST API (:6502) → NES ← Compose UI

Usage:
1. Launch Compose UI, click "API Server"
2. Configure MCP server in Claude Desktop/Code
3. Ask Claude to play the game
MCP server bridges to REST API for live visualization
Add MCP module tests (17 tests)
- Add 14 E2E tests (McpApiE2ETest): real Netty server + RestApiClient,
  same HTTP path as MCP production. 6 infra tests (no ROM), 5 SMB tests,
  3 FF1 tests including full intro navigation.
- Fix MCP server shutdown: add Job.join() so stdio transport stays alive
- Enable /step and /reset in shared mode so MCP can control emulation
  while Compose UI renders live
- Wire API controller into ComposeInputHandler so button presses from
  MCP reach the NES (keyboard + gamepad + API merged)
- In shared mode, advanceFrames waits for UI-produced frames instead of
  stepping CPU directly (prevents race condition)
- Add .mcp.json for Claude Code integration
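
The input merging mentioned in the list above (keyboard + gamepad + API) can be pictured as a logical OR over several input sources. This is a sketch under assumptions: `InputSource`, `MergedInput`, and `ApiController` as written here are hypothetical and do not reflect the actual ComposeInputHandler API.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// The eight buttons of a standard NES controller.
enum class Button { A, B, SELECT, START, UP, DOWN, LEFT, RIGHT }

// Hypothetical abstraction over keyboard, gamepad, and API input.
fun interface InputSource {
    fun isPressed(button: Button): Boolean
}

// A button counts as pressed if any source reports it pressed.
class MergedInput(private val sources: List<InputSource>) : InputSource {
    override fun isPressed(button: Button): Boolean = sources.any { it.isPressed(button) }
}

// Hypothetical API-driven controller; the set is thread-safe because
// HTTP handlers and the emulation loop run on different threads.
class ApiController : InputSource {
    private val held = ConcurrentHashMap.newKeySet<Button>()
    fun press(button: Button) { held.add(button) }
    fun release(button: Button) { held.remove(button) }
    override fun isPressed(button: Button): Boolean = button in held
}

fun main() {
    val keyboard = InputSource { it == Button.RIGHT } // user holds Right
    val api = ApiController().apply { press(Button.START) }
    val merged = MergedInput(listOf(keyboard, api))

    println(merged.isPressed(Button.RIGHT)) // true
    println(merged.isPressed(Button.START)) // true
    println(merged.isPressed(Button.A))     // false
}
```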

464 tests, 0 failures.