feat(ff1): Coneria post-buy exit + grind pipeline (Spec 5 cont-4/5) #16
ArturSkowronski wants to merge 648 commits into
V2.5.7 evidence: pathfinder returned found=false from (140,152) → (140,145) in live runs, but offline against the same ROM-decoded map it finds a 7-step path. Cause: FogOfWar was poisoned. Earlier walk attempts where worldX/Y didn't change (because party transitioned into a town/castle map, or because a modal was up) marked the destination tile blocked — but those tiles are passable; only the engine prevented the move because of state, not terrain. Fix: in WalkOverworldTo, only call fog.markBlocked when worldX/Y unchanged AND no RAM transition signal (locationType/localX/localY/screenState all identical to pre-step). Belt-and-suspenders with V2.5.2 abort. Also adds OverworldDumpTest (disabled-by-default diagnostic) which dumps the ROM-decoded overworld grid + runs the pathfinder on canonical from→to pairs. Used to verify the live failure was fog-poisoning, not pathfinder bug.
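A minimal sketch of the fog-marking guard described above. `RamSnapshot` is a hypothetical holder for the fields the fix compares before and after a step; the actual WalkOverworldTo code is not shown in this PR.

```kotlin
// Hypothetical RAM snapshot: only the fields this fix compares before/after a step.
data class RamSnapshot(
    val worldX: Int,
    val worldY: Int,
    val locationType: Int,
    val localX: Int,
    val localY: Int,
    val screenState: Int,
)

// Mark the destination tile blocked only when the step failed on terrain:
// world coords unchanged AND no RAM transition signal (map change, modal, etc.).
fun shouldMarkBlocked(before: RamSnapshot, after: RamSnapshot): Boolean {
    val worldUnchanged = before.worldX == after.worldX && before.worldY == after.worldY
    val noTransitionSignal = before.locationType == after.locationType &&
        before.localX == after.localX &&
        before.localY == after.localY &&
        before.screenState == after.screenState
    return worldUnchanged && noTransitionSignal
}
```

A step that lands the party in a town changes `localX`/`localY`, and a modal changes `screenState`, so neither poisons the fog map.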
…ompts Three connected fixes informed by V2.5.7/V2.5.9 evidence + offline ROM inspection:
1) Garland is the BOSS of Chaos Shrine (Temple of Fiends) — an interior dungeon — NOT a scripted bridge encounter. Removed the false claim that walking north triggers Garland; replaced with the actual workflow (overworld → shrine entry → exitInterior loops → boss room).
2) Spawn after pressStartUntilOverworld is Overworld(146, 158), not Indoors. V2.5.7 RAM-override classifies this correctly; the prompt now tells the LLM not to expect Indoors at the start.
3) V2.5.4 hard-impassable rule for TOWN/CASTLE: tells the LLM that to enter a town/castle the EXACT tile must be the walkOverworldTo target.
Also clarifies that battleFightAll handles BATTLE+POSTBATTLE in one call.
…p BFS V2.6.1 — overworld classifier coverage gap. ROM byte probe around Coneria castle/town entries identified eight tile bytes missing from OverworldTileClassifier:
- 0x2C, 0x2E, 0x3B, 0x3C, 0x3E, 0x3F → CASTLE (castle border/decoration)
- 0x3D → TOWN (town center variant)
- 0x4F → TOWN (continuation of 0x49..0x4E)
Previously these were UNKNOWN (impassable, but the pathfinder treated them as walls rather than enter-on-arrival town/castle tiles). Now the V2.5.4 hard-impassable rule blocks the pathfinder from routing across the castle perimeter unless the destination is the castle/town tile itself.
V2.6.2 — interior pathfinder full-map BFS (originally V2.4.6-A in STATE). V2.6.0 evidence: agent reached mapId=24 sub-map at (3, 2) and ExitInterior looped 7 turns with `findPathToExit` returning BLOCKED — exits were outside the 16×16 viewport window. New InteriorMap.readFullMapView(64×64) plus MapSession.readFullMapView() drive the same InteriorPathfinder over the whole map; ExitInterior and the findPathToExit @tool both switched. Advisor's 16×16 ASCII rendering unchanged (keeps prompt size bounded).
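The V2.6.1 byte additions and the V2.5.4 hard-impassable rule can be sketched as two small functions. Only the bytes named in this commit plus 0x00 (GRASS, per the V2.6.2 follow-up) are mapped; the real OverworldTileClassifier covers many more ranges, and `TileKind` is a hypothetical simplification of its output.

```kotlin
enum class TileKind { GRASS, TOWN, CASTLE, UNKNOWN }

// Illustrative subset of the classifier: the V2.6.1 additions plus 0x00 grass.
fun classify(byte: Int): TileKind = when (byte) {
    0x2C, 0x2E, 0x3B, 0x3C, 0x3E, 0x3F -> TileKind.CASTLE // castle border/decoration
    0x3D -> TileKind.TOWN                                  // town center variant
    in 0x49..0x4F -> TileKind.TOWN                         // 0x4F newly added
    0x00 -> TileKind.GRASS
    else -> TileKind.UNKNOWN
}

// V2.5.4 rule: TOWN/CASTLE tiles behave as walls unless they are the exact destination.
fun isTraversable(kind: TileKind, isDestination: Boolean): Boolean = when (kind) {
    TileKind.GRASS -> true
    TileKind.TOWN, TileKind.CASTLE -> isDestination
    TileKind.UNKNOWN -> false
}
```

Because the perimeter bytes now classify as CASTLE rather than UNKNOWN, the same rule that forbids routing through them also lets the party enter when one of them is the target.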
…ection V2.6.2 evidence: ExitInterior returned BLOCKED for mapId=8 with party at (5, 28) despite the full 64×64 BFS. V2.4.4 with the old 16×16 viewport had found a 17-step exit path from the same position. Cause: isSouthEdgeExit required EVERY column tile from ly+1 to viewport.height to be impassable. With 16×16 viewport this naturally bounded the probe to ~7 tiles south of party — a reasonable proxy for "edge of playable area". With 64×64 full-map the probe extended into outside-playable rows that default-fill to 0x00 (GRASS, passable), so the predicate always returned false → no south-edge exit ever detected. Fix: SOUTH_EDGE_PROBE_DEPTH = 8. Restores 16×16-equivalent semantics while keeping V2.6.2 global BFS reachability.
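A minimal sketch of the depth-capped probe, assuming the map is an `Array<IntArray>` indexed `grid[y][x]` and walkability is supplied as a predicate; both are assumptions about shapes the commit does not show.

```kotlin
const val SOUTH_EDGE_PROBE_DEPTH = 8

// Returns true when every tile in column lx from ly + 1 down to
// ly + SOUTH_EDGE_PROBE_DEPTH (clamped to the map) is impassable.
// An empty probe range (party already on the bottom row) also counts as an edge.
fun isSouthEdgeExit(grid: Array<IntArray>, lx: Int, ly: Int, passable: (Int) -> Boolean): Boolean {
    val lastRow = minOf(ly + SOUTH_EDGE_PROBE_DEPTH, grid.size - 1)
    return (ly + 1..lastRow).all { y -> !passable(grid[y][lx]) }
}
```

Without the cap, the probe on a 64×64 map reaches the 0x00 fill rows below the playable area, finds "passable" grass, and the predicate can never return true.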
…int bug Diagnostic test confirming why ExitInterior fails in mapId=24 with party at (3, 2): the byte value at that position in our InteriorMap-decoded data is 0x3c (padding/UNKNOWN/impassable), as are all 4-neighbours. BFS starts on an impassable tile and immediately exhausts. Either (a) RAM currentMapId=24 doesn't directly index our InteriorMap loader, or (b) FF1 sub-map localX/localY are scrolled / offset. Out of V2.x scope; documented for the next milestone.
…blocker State doc supersedes STATE-2026-05-03.md (V2.4.5 era). Captures the session's architectural progress: vision classifier, full-map BFS (overworld + interior), hard-impassable TOWN/CASTLE, RAM override, fog defensive marking, classifier coverage, prompt corrections, south-edge probe depth. Includes accumulated trace/SUMMARY artifacts for each V2.5.x/V2.6.x live run. Next blocker (out of V2.6 scope): RAM currentMapId=24 doesn't index InteriorMapLoader.load(24) cleanly — party lands on 0x3c padding in our decoded data. Likely scrolled-coords or transformed mapId; needs RE work next milestone.
…oll + (8, 7) V2.6.3 evidence: party reached Indoors(mapId=24, localX=3, localY=2). The ROM-decoded InteriorMap[24] has byte 0x3c (padding) at (3, 2). All neighbours also padding. BFS exhausted → ExitInterior BLOCKED. Hypothesis verified by full 64×64 dump: mapId=24 is Coneria Castle interior with throne room at y=2-5, vertical corridor at x=9-15 y=8-26, entrance hall at y=27-31. Party RAM (3, 2) + (8, 7) = (11, 9) — the throne-room corridor floor (byte 0x31, passable). Same offset applied to RAM (3, 11) → (11, 18) — also corridor floor. Pattern matches a party walking 9 tiles north through the main hall. Conclusion: localX/localY in RAM (0x0029/0x002A) is the scroll offset (top-left of the 16×15 NES viewport), not the party's tile. Party stays at screen center, so its actual map tile = scroll + (8, 7). ExitInterior + findPathToExit @tool now translate scroll → party tile before BFS. Marks-blocked / stairs A-tap also operate on party tile. Overworld code path unchanged (worldX/worldY are absolute party tiles).
…cation V2.6.4 evidence: party rotated through mapId=8 from RAM (5, 28) to (4, 11) — first interior progression in V2.x — but then stalled. findPathToExit returned a 17-step path to (15, 32); ExitInterior couldn't execute it. Each step pressed a direction but party didn't move. Possible causes: FRAMES_PER_TILE too short, NPC blocking unobserved tile, input controller stuck, or coord-arithmetic drift. Per-step trace records (from, dir, after, mapId, pathLen) so the next post-mortem can see exactly which steps moved party and which didn't, without guessing from the executor's prose.
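One possible shape for the per-step trace and the step-success metric derived from it. The record layout is hypothetical (field names follow the commit's list), and "moved" is taken here to mean the party tile changed.

```kotlin
// Hypothetical shape of one per-step trace record; field names follow the commit.
data class StepTrace(
    val from: Pair<Int, Int>,
    val dir: Char,
    val after: Pair<Int, Int>,
    val mapId: Int,
    val pathLen: Int,
) {
    val moved: Boolean get() = from != after
}

// Fraction of tapped steps that actually changed the party's tile.
fun stepSuccessRate(trace: List<StepTrace>): Double =
    if (trace.isEmpty()) 0.0 else trace.count { it.moved }.toDouble() / trace.size
```

For the V2.6.5 run cited later, 76 moved steps out of 583 works out to roughly 0.13.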
Saves a focused handoff for the next session. Key open question: is RAM localX/localY (0x0029/0x002A) the party tile or the scroll offset? V2.6.4 assumed scroll offset, V2.6.5 trace shows partial movement (76/583 moves) plus W-direction → Y-coord-change inconsistencies. Could be NPC push (mapId=8 is Coneria Town outdoor with random NPCs) or wrong hypothesis. Disabled mapId=8 dump test added; enable to verify.
… decoded) Ran the dump test before session end. Result: NEITHER raw nor scroll-offset interpretation consistently matches both observed party positions in mapId=8.
- RAM (5, 28): raw → 0x31 floor passable ✓ but scroll (13, 35) → wall ✗
- RAM (4, 11): raw → 0x30 wall ✗ but scroll (12, 18) → 0x44 STAIRS ✓
Each interpretation works for exactly one position. Theory C now leading: our InteriorMapLoader.load(8) decodes a map that may not be the same one the game is playing. RAM 0x0048 (currentMapId) might not be sole indicator — FF1 may have a sub-map ID byte we don't track. 13% step-success rate (76/583 in V2.6.5 trace) is consistent with "BFS through partially-correct data, party walks by coincidence". Updated handoff doc with new task list: find sub-map ID, capture screenshot at first Indoors frame, verify loader pointer table.
… doc Task 0 of V3.0 vision-first interior pivot. Three FF1 interior screenshots (Coneria Castle throne room, mapId=24) sent to claude-sonnet-4-6 with the draft navigator prompt. Vision identified specific FF1 elements (throne, king, pillars, stairs) and produced well-formed JSON. 2/3 directional picks strategically correct; one mistook the throne dais for an exit door — addressable by enriching the prompt for Task 1. Verdict: GO. ~$0.015 per probe. Includes:
- DECISION doc explaining pivot rationale (datacrystal verification, CPP reference architecture).
- Full V3.0 plan (Task 0–7, TDD-style with acceptance criteria).
- Feasibility test source (disabled by default).
…nthropic API) Sonnet 4.6 picks one cardinal direction per step from an FF1 interior screenshot. Bypasses Koog (mirroring AnthropicVisionPhaseClassifier) so the outer agent loop stays unchanged. Prompt design draws on Task 0 feasibility findings: explicitly excludes thrones/treasure/dais/shop counters as exits, biases toward south pillar-corridors typical of FF1 castle layouts, and supports an optional entryDirection hint to steer the navigator toward the way the party came in. 8 parser tests (cardinal letters, EXIT/STUCK, code-fenced JSON, malformed envelopes) — all pass.
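A minimal parser in the spirit of the one described above, assuming the navigator replies with a JSON object containing a `direction` field. That field name and the accepted spellings are assumptions; the real prompt's schema is not quoted here.

```kotlin
enum class NavDecision { NORTH, SOUTH, EAST, WEST, EXIT, STUCK, UNCLEAR }

// Assumed reply shape: {"direction": "<value>"}. find() scans the whole reply,
// so surrounding prose or a Markdown code fence around the JSON is tolerated.
private val DIRECTION_FIELD = Regex("\"direction\"\\s*:\\s*\"([A-Za-z]+)\"")

fun parseNavigatorReply(raw: String): NavDecision {
    val value = DIRECTION_FIELD.find(raw)?.groupValues?.get(1) ?: return NavDecision.UNCLEAR
    return when (value.uppercase()) {
        "N", "NORTH" -> NavDecision.NORTH
        "S", "SOUTH" -> NavDecision.SOUTH
        "E", "EAST" -> NavDecision.EAST
        "W", "WEST" -> NavDecision.WEST
        "EXIT" -> NavDecision.EXIT
        "STUCK" -> NavDecision.STUCK
        else -> NavDecision.UNCLEAR // malformed or unexpected value
    }
}
```

Mapping every malformed envelope to UNCLEAR keeps the caller's termination logic in one place instead of scattering exception handling.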
…erify) Loop: screenshot → navigator.nextDirection → tap button → check RAM moved or transitioned. Emits per-step trace via ToolCallLog so we can compute step-success rate from a single run, the way V2.6.5 did for the decoder. Termination: encounter, transition to overworld, vision STUCK/UNCLEAR, or maxSteps. lastBlocked direction is fed back to the navigator so it won't repeat a physically failed move. Skipping a dedicated unit test for slice 1 — EmulatorToolset is a concrete class, mocking would require new test infrastructure. Live run in Task 7 will validate end-to-end.
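The loop can be sketched as below. The callbacks are hypothetical stand-ins for the navigator, the input controller and the RAM reader; a mapId change is treated as a transition, and encounter detection is omitted for brevity.

```kotlin
enum class Dir { N, S, E, W }
enum class Termination { TRANSITIONED, VISION_STOP, MAX_STEPS }

// nextDirection returns null when vision says STUCK / EXIT / UNCLEAR.
// readPos returns (mapId, localX, localY) from RAM.
fun walkInteriorLoop(
    maxSteps: Int,
    nextDirection: (lastBlocked: Dir?) -> Dir?,
    tap: (Dir) -> Unit,
    readPos: () -> Triple<Int, Int, Int>,
): Pair<Termination, Int> {
    var lastBlocked: Dir? = null
    var moves = 0
    repeat(maxSteps) {
        val dir = nextDirection(lastBlocked) ?: return Termination.VISION_STOP to moves
        val before = readPos()
        tap(dir)
        val after = readPos()
        when {
            after.first != before.first -> return Termination.TRANSITIONED to moves + 1
            after != before -> { moves++; lastBlocked = null }
            else -> { lastBlocked = dir } // fed back so vision won't repeat a failed move
        }
    }
    return Termination.MAX_STEPS to moves
}
```

Returning the move count alongside the termination reason is what makes a per-run step-success rate computable from the trace alone.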
…Interior on towns Tool list now leads with walkInteriorVision in the Indoors block. exitInterior + findPathToExit kept callable but marked DEPRECATED in their @LLMDescription so the executor's prompt construction (Koog auto-includes tool descriptions) carries the steer. Navigator parameter is nullable so existing tests that build SkillRegistry without ANTHROPIC_API_KEY keep compiling. When unconfigured, the tool returns a clear failure message rather than crashing.
Indoors block in both prompts now leads with walkInteriorVision and frames exitInterior/findPathToExit as deprecated fallbacks. Advisor explicitly told NOT to propose target localX/localY in interiors (decoder unreliable); overworld waypoints unchanged.
…orAgent Main constructs the navigator from ANTHROPIC_API_KEY (already required) and threads it through ExecutorAgent → SkillRegistry → WalkInteriorVision. Optional in the type system so unit tests that build a partial ExecutorAgent without a key still compile. Slice 1 implementation now complete; one live run (Task 7) verifies the ≥50% step success criterion.
…ase-classifier bug exposed OutOfBudget after 12m 42s, ~26 walkInteriorVision invocations. Skill + navigator wiring confirmed correct: 30 navigator responses parsed without UNCLEAR; skill loop terminated cleanly each time; mechanical taps moved party 3/4 times when fired. Run never reached a real interior because of a V2.5 phase-classifier bug: when locType=0 && currentMapId=0 (overworld) but localX/Y carry stale values from a prior abort, vision phase classifier (Haiku) misclassifies the frame as INDOORS. Vision navigator (Sonnet) correctly disagrees with EXIT — but classifier wins upstream and agent oscillates. DoD criteria 4 + 5 NOT MEASURED (no real interior reached). Open V3.1 to extend RAM hard-override (locType==0 && mapId==0 → Overworld) before re-running V3.0 evidence.
…verworld) V3.0 slice 1 live run revealed a phase-classifier oscillation: after walkOverworldTo's interior-abort signal, FF1 leaves RAM localX/Y non-zero on the overworld (datacrystal RAM map: 0x0029/0x002A is 'Non-world map position' and FF1 doesn't zero it on overworld entry). The strict V2.5.7 override (locType=0 && lx=0 && ly=0) missed this case so the Haiku vision phase classifier was consulted, mistook the frame for an indoor space and the agent looped between false-Indoors and Overworld until OutOfBudget — never reaching a real interior. currentMapId=0 means 'no interior loaded'. Combined with locType=0 and live world coords, party is unambiguously on the world map. Adding this secondary override prevents the oscillation without weakening the existing strict path. Sets up V3.0 re-run that can actually measure interior step success.
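The two RAM overrides, as described, reduce to a small decision function. `ClassifierPhase` is a hypothetical simplification; in the real agent anything not overridden falls through to the Haiku vision classifier.

```kotlin
enum class ClassifierPhase { OVERWORLD, NEEDS_VISION }

fun ramOverride(locType: Int, currentMapId: Int, localX: Int, localY: Int): ClassifierPhase =
    when {
        // V2.5.7 strict path: overworld with zeroed local coords.
        locType == 0 && localX == 0 && localY == 0 -> ClassifierPhase.OVERWORLD
        // V3.1 secondary override: no interior loaded, even if localX/Y are stale.
        locType == 0 && currentMapId == 0 -> ClassifierPhase.OVERWORLD
        else -> ClassifierPhase.NEEDS_VISION
    }
```

The stale-coordinate case that caused the oscillation, `(locType=0, mapId=0, localX=5, localY=28)`, now resolves to OVERWORLD without consulting vision.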
…avigator over-caution exposed V3.1 RAM hard-override unblocked phase classification: agent reached Indoors(mapId=8, localX=5, localY=28) for the first time in the V2.4–V3.0 lineage without oscillation. But navigator returned STUCK on 15 of 23 (65%) direction queries inside Coneria Town — only 4 mechanical taps fired, 1 moved (25% step success). DoD criteria not met (need ≥50% step success on more than n=4 taps; need at least one transitioned=true). Diagnosis: navigator prompt was tuned in Task 0 for castle interiors (throne dais ≠ exit) but Coneria Town is an OUTDOOR town map with paths between shops + NPCs. The same prompt over-classifies winding paths as STUCK when they are walkable. Open V3.2: town-aware prompt tuning + STUCK threshold (don't honor on step=0; default to a cardinal until at least 2 failed taps). Combined session spend: ~$0.95.
…fallback V3.1 verification revealed navigator returned STUCK on 65% of queries in Coneria Town (mapId=8). Two fixes:
1. Navigator prompt: explicitly distinguish TOWN (open path-network with shops/NPCs walkable AROUND), CASTLE (corridor toward south), DUNGEON (stairs/warps). Emphasises 'STUCK ONLY if all 4 cardinals impassable — if even one direction shows walkable terrain, pick that'.
2. WalkInteriorVision: do not honor the FIRST STUCK return when there is no movement evidence yet. Default to a perpendicular fallback (SOUTH unless lastBlocked, then rotate). Honor STUCK only after 2 consecutive returns. ToolCallLog records when fallback fires.
These two levers are aimed at the V3.1 measurement gap: 4 mechanical taps was too few. With STUCK-on-step-0 disarmed, we expect ~3-4× more taps per skill invocation, giving statistically meaningful step-success data on the next run.
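A minimal sketch of the STUCK gate in fix 2. Two details are assumptions not stated in the commit: the rotation is clockwise from `lastBlocked`, and "no movement evidence yet" is simplified to counting consecutive STUCK replies.

```kotlin
enum class Cardinal { NORTH, EAST, SOUTH, WEST }

class StuckGate(private val honorAfter: Int = 2) {
    private var consecutiveStuck = 0

    // reply == null means the navigator answered STUCK. Returns the cardinal to
    // tap, or null when STUCK should finally be honored.
    fun resolve(reply: Cardinal?, lastBlocked: Cardinal?): Cardinal? {
        if (reply != null) {
            consecutiveStuck = 0
            return reply
        }
        consecutiveStuck++
        if (consecutiveStuck >= honorAfter) return null
        return fallback(lastBlocked)
    }

    // SOUTH by default; otherwise rotate clockwise away from the last blocked move.
    private fun fallback(lastBlocked: Cardinal?): Cardinal {
        if (lastBlocked == null) return Cardinal.SOUTH
        val all = Cardinal.values()
        return all[(lastBlocked.ordinal + 1) % all.size]
    }
}
```

Any non-STUCK reply resets the counter, so an isolated STUCK between real moves never terminates the skill.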
…rmed for FF1 towns 20 invocations, 70 vision direction calls, 51 mechanical taps. Step success **7.8%** — WORSE than V2.6.5 decoder baseline (13%) and below slice-1 DoD (50%). Trade was: V3.2 prompt eliminated 65% STUCK rate (V3.1) but introduced 27% false-EXIT rate and oscillation between two adjacent tiles in Coneria Town for the entire skill phase. Honest diagnosis: single-frame zero-context vision is the wrong tool for tile-precise FF1 navigation. Failure modes documented in SUMMARY: no movement memory across calls, pixel-tile collision mismatch, two-cardinal traps, and a single prompt knob that swings between cautious-STUCK and eager-wrong loops. V3.0 hypothesis (vision-first interior) DISCONFIRMED as architected. Cumulative session spend: ~$1.50 across feasibility + 3 live runs. Recommended next: ship slice 1 + V3.1 as PR (clean stop), then iterate with hybrid C (decoder + advisor screenshots) or multimodal executor. No more vision-prompt epicycles.
…ndoors stuck)
After V3.2 disconfirmed single-frame vision-first interior nav, return
to decoder as primary baseline (13% step success on Coneria Town) and
use vision selectively via the advisor.
Four small changes:
1. ScreenshotPolicy: attach screenshot to advisor on every Indoors call,
not just phase change. Lets the advisor look at the actual frame when
the decoder is making no progress.
2. AgentSession: lower advisor consult threshold for Indoors phase from
idleTurns>=20 to >=5. The decoder either makes early progress or gets
pinned — early visual hint is cheaper than 20 wasted executor turns.
3. Executor prompt: exitInterior is PRIMARY again. walkInteriorVision is
ESCALATION ('only after exitInterior fails twice AND advisor explicitly
recommends'). V3.0 evidence cited (~8% on towns vs 13% decoder).
4. Advisor prompt: when called with 'stuck in interior' reason, INSPECT
the screenshot and emit a single cardinal hint as a plan step. Only
escalate to walkInteriorVision after two such hints fail.
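Changes 1 and 2 amount to two small policy predicates. The phase enum and the threshold of 20 for non-Indoors phases are assumptions (the commit only states that Indoors drops from 20 to 5).

```kotlin
enum class GamePhase { OVERWORLD, INDOORS, BATTLE }

// Change 2: Indoors consults the advisor after 5 idle turns; 20 elsewhere (assumed).
fun advisorIdleThreshold(phase: GamePhase): Int = if (phase == GamePhase.INDOORS) 5 else 20

fun shouldConsultAdvisor(phase: GamePhase, idleTurns: Int): Boolean =
    idleTurns >= advisorIdleThreshold(phase)

// Change 1: Indoors calls always carry a screenshot; other phases only on phase change.
fun attachScreenshot(phase: GamePhase, phaseChanged: Boolean): Boolean =
    phase == GamePhase.INDOORS || phaseChanged
```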
~50 LOC total. Tests stay green; live run will measure whether the
hybrid lifts step success above the 13% decoder floor.
…ow skill layer Decoder (primary) + vision (advisor-driven escalation) tested on Coneria Town (mapId=8). Combined step success 4.9% on n=793 — below V2.6.5 baseline (13%) and below V3.2 vision-only (7.8%). Two surprises in the data:
- Decoder degraded vs V2.6.5 baseline (4.6% on n=764 in V4 vs 13% on n=583 in V2.6.5). V2.6.5 may have been a favourable-condition fluke.
- Vision-with-advisor outperformed pure vision (13.8% vs 7.8%) but on small N=29 — encouraging but not conclusive.
Two negative results in a row (V3.0/V3.2 vision-first AND V4 hybrid) with N now decisive (793 steps). The hypothesis that better skill orchestration unlocks town traversal is disconfirmed. Bottleneck is likely below the skill layer:
- NPC blocks marked permanently in fog (no clear-on-revisit)
- Animation/walk-state timing during the 48-frame hold
- Diagonal corners requiring multi-tap navigation
- Possible RAM coord interpretation still wrong (V2.6.4 hypothesis)
Pushing back to PR #99 with two evidence runs (V3.2 + V4) showing the search/architecture lever does not produce >15% step success on Coneria Town. Future work moves to V5: movement primitive audit before any more strategy iteration. Cumulative session spend: ~$2.00 (4 live runs).
Per-frame RAM capture during 200-frame DOWN/UP holds inside Coneria peninsula. Key findings:
1. **localY is party tile, NOT scroll offset.** DOWN hold in mapId=24 (Coneria Castle) produced clean monotonic increments: locY 0→1→2→...→11 in 200 frames, ~16 frames per tile. V2.6.4 scroll-offset hypothesis is now decisively disconfirmed by ground-truth telemetry.
2. **Movement primitive is clean.** When the decoded map is correct (mapId=24), the underlying NES input + collision works fine. Our 48-frame-per-tile skill setting is 3× more than needed for castles (~16 frames suffices).
3. **Sub-map transitions = mid-frame mapId flips + 4-frame localY-spaghetti.** Walking off a map edge changes mapId and re-centres the camera. RAM during the transition is not stable.
4. **The real bottleneck is mapId=8 (Coneria Town) decoder.** Re-confirms V2.6.5 Theory C for the third time, now with positive counter-evidence: castles work, towns don't, primitive is the same. The bug is in InteriorMapLoader.load(8) decoding a different ROM section than the game plays.
Implication: V3/V4 search/architecture lever is fully exhausted. The only remaining productive direction is fixing the decoder for mapId=8 — exactly what V2.6.6 was about to investigate before the V3 pivot. Three candidate next steps documented in the notes; recommend B (visual diff between decoded mapId=8 ASCII and a rendered Coneria Town screenshot) as cheapest research move. Test disabled by default; CSV artefacts retained as evidence.
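One way to turn a per-frame localY capture into a frames-per-tile figure, assuming the CSV column has already been loaded into a list with one sample per frame. This is an illustrative helper, not the test's actual analysis; it measures spacing between increments so the start-up delay before the first step is excluded, and it is only meaningful on clean holds without sub-map transitions.

```kotlin
// Mean number of frames between successive localY changes,
// or null if fewer than two changes were observed.
fun framesPerTile(localY: List<Int>): Double? {
    val changeFrames = (1 until localY.size).filter { localY[it] != localY[it - 1] }
    if (changeFrames.size < 2) return null
    return (changeFrames.last() - changeFrames.first()).toDouble() / (changeFrames.size - 1)
}
```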
…led, navigation TODO) Test scaffold for the V5.1 visual diff between live mapId=8 frame and the offline decoder's ASCII glyph dump. Goal: identify the fingerprint of the InteriorMapLoader.load(8) bug (off-by-one? wrong bank? wrong table?). Disabled by default — the navigation logic to reliably reach mapId=8 from spawn is not yet solved. The agent enters mapId=24 (Coneria Castle) easily but mapId=8 (Coneria Town) requires more deliberate overworld pathing (V2.6.5 evidence: walk to (145,152) on overworld, step S to transition). Logic for the diff itself is complete (live screenshot + ASCII viewport + full 64x64 dump centred on party). Once the navigation step lands, flip enabled=false to enabled=canRun and run the test to produce the diff artefacts. Marked as a research TODO. The downstream V5.2 fix will use this diff to decide between (A) hex-audit pointer table, (B) wrong-bank-fix, or (C) replace decoder with screen-derived map.
… handling CI was failing on two pre-existing knes-debug tests that predate this PR:
- 'canExecute checks screenState' expected 0x63 → false, but V2.4.3 made the skill dismiss PostBattle so 0x63 → true is the correct behaviour.
- 'BattleFightAll executes correctly' mock returned 0x63 forever after state call 4, but the post-battle dismissal loop now requires the state to eventually clear (0x00) — otherwise the action correctly reports failure.
Updates both expectations to reflect V2.4.3 semantics. Pre-existing test/code mismatch from May 2 — not introduced by V3-V5 work. Full suite green: ./gradlew test BUILD SUCCESSFUL.
FF1 agent V2.4→V3.2 — interior decoder + vision-first nav (negative result, infra ships)
… empirical OW probe
Five-component scaffold to unblock the V5.2 visual-diff workstream and the
broader town/castle navigation problem. Each piece is independently usable.
1. EmulatorSession.saveState/loadState — wraps vNES NES.stateSave/stateLoad
into an in-memory ByteArray API. Round-trip RAM-perfect (verified by
SaveStateRoundTripTest). Enables fixture-based test starts that skip 10s
of boot per run.
2. OverworldMemory — persistent (~/.knes/ff1-ow-memory.json) per-tile
observation store: ENTRY/DECOR/UNREACHABLE + enteredMapId + confirmCount.
Accumulates across sessions so empirical walkability data survives ROM
resets, like a player's mental map.
3. ExecutorAgent.goalOverride + AdvisorAgent.goalOverride — string-replace
the canonical "Goal: AtGarlandBattle" paragraph in each agent's system
prompt without forking. Tests inject Coneria-Town goal for fixture
capture without touching production prompts. GoalOverrideTest verifies
the swap is clean (no leftover Garland references).
4. AgentSession.onTurnEnd callback — non-breaking optional hook firing
after each executor turn with current phase + RAM. Returning a non-null
Outcome short-circuits the agent loop. Used by fixture-builder tests
to stop the moment a target state is reached.
5. ConeriaTownEmpiricalDiscoveryTest — H2 raw-step DFS exploration. From
spawn, taps cardinals one tile at a time, observes RAM (worldX/Y change
= walkable, locType change = entry). Bypasses BFS and OverworldTileClassifier
entirely; engine's own walkability check is source of truth.
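A minimal sketch of the goalOverride string-replace from item 3. The `GOAL_PARAGRAPH` text below is a hypothetical stand-in since the real constant is not quoted in this PR, and failing loudly when the paragraph is missing is an assumption rather than documented behaviour.

```kotlin
// Hypothetical stand-in for the canonical goal paragraph in the system prompt.
const val GOAL_PARAGRAPH = "Goal: AtGarlandBattle. Reach Garland in the Chaos Shrine."

fun buildSystemPrompt(basePrompt: String, goalOverride: String?): String {
    if (goalOverride == null) return basePrompt
    // Assumed safeguard: a silent no-op would leave the Garland goal in place.
    require(GOAL_PARAGRAPH in basePrompt) { "canonical goal paragraph not found" }
    return basePrompt.replace(GOAL_PARAGRAPH, goalOverride)
}
```

Replacing a single shared constant keeps production prompts untouched while letting a test assert that no Garland reference survives the swap.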
Empirical finding from H2 run: 81 walkable OW tiles around spawn covering
x=137-152, y=150-167 — no tile triggered locationType != 0 anywhere in
range. Strongly suggests one of:
(a) OverworldTileClassifier byte-id ranges for TOWN are wrong on this ROM
(b) currentMapId/locationType RAM addresses (V2.4 heuristic) read from
non-canonical locations and miss real interior transitions
(c) Coneria Town entry is at world coords beyond explored range (y > 167)
Next session focus: verify FF1 RAM mapping for $000D (locationType per
datacrystal) and currentMapId. Once RAM is trustworthy, H2 should
auto-discover the entry tile within minutes of re-running on broader radius.
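The raw-step DFS from item 5 can be sketched as below, assuming a hypothetical `tryStep` callback that puts the party on `from` (e.g. by reloading a savestate or walking back), taps one cardinal, and reports what RAM showed. How the real ConeriaTownEmpiricalDiscoveryTest backtracks is not described here.

```kotlin
data class Tile(val x: Int, val y: Int)

sealed interface StepOutcome {
    data class Moved(val to: Tile) : StepOutcome // worldX/Y changed
    object Blocked : StepOutcome                 // worldX/Y unchanged
    object Entered : StepOutcome                 // locationType changed
}

// Returns (walkable tiles, entry tiles) discovered from start.
fun exploreRawSteps(
    start: Tile,
    maxTiles: Int,
    tryStep: (from: Tile, dx: Int, dy: Int) -> StepOutcome,
): Pair<Set<Tile>, Set<Tile>> {
    val walkable = mutableSetOf(start)
    val entries = mutableSetOf<Tile>()
    val stack = ArrayDeque(listOf(start))
    val dirs = listOf(0 to -1, 0 to 1, -1 to 0, 1 to 0)
    while (stack.isNotEmpty() && walkable.size < maxTiles) {
        val here = stack.removeLast()
        for ((dx, dy) in dirs) {
            val target = Tile(here.x + dx, here.y + dy)
            if (target in walkable || target in entries) continue
            when (val out = tryStep(here, dx, dy)) {
                is StepOutcome.Moved -> if (walkable.add(out.to)) stack.addLast(out.to)
                StepOutcome.Entered -> entries.add(target)
                StepOutcome.Blocked -> Unit
            }
        }
    }
    return walkable to entries
}
```

With correct RAM addresses, a town tile inside the explored radius would show up in the second set; the H2 run's empty result is what points at the RAM mapping.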
Coneria8VisualDiffTest now loads the post-boot fixture (saves ~10s) but
still fails on the navigation step pending root-cause fix above.
Files:
- knes-emulator-session: saveState/loadState API
- knes-agent runtime: AgentSession.onTurnEnd
- knes-agent executor/advisor: goalOverride + GOAL_PARAGRAPH constants
- knes-agent perception: OverworldMemory persistent store
- knes-agent tests: SaveStateRoundTrip, PostBootFixtureBuilder,
ConeriaTownFixtureBuilder (agent-driven), ConeriaTownEmpiricalDiscovery
(raw-step DFS), GoalOverrideTest
- fixtures: ff1-post-boot.{savestate,json,png}
Refs: V3.0 status memory ("13% step success on towns"), Entroper FF1
disasm bank_0F.asm:1633 (tileset_prop teleport mechanism).
… finds TOWN mode MapIdDiscoveryTest dumps zero page on overworld vs after entering Coneria Town (via raw N×6 / W×1 / UP from spawn 146,158). Diff reveals:
- $0048 IS canonical map-id byte (was V2.4 heuristic — confirmed correct). Coneria Town = 8.
- FF1 has THREE location modes, not two:
  * Overworld: locType=0x00, world=valid, local=(0,0)
  * Town: locType=0x00, world=frozen, local=non-zero, $48=town-id
  * Castle/Dgn: locType=0xD1, world=frozen, local=non-zero, $48=interior-id
- $0029/$002A are canonical local tile coords (+1 per tap).
- $0068/$0069 = local + 7 (scroll-offset display coords) — explains V2.6.4 scroll-offset hypothesis.
- $0049/$004A are paired with $48 (sub-floor / metadata).
Interior savestate captured for cross-reference. Report and 20 candidate bytes documented at docs/superpowers/notes/2026-05-04-mapid-discovery/report.md. Smoking gun for V2.6.x stuck-in-Coneria evidence: vision said "castle courtyard not town huts" because InteriorMapLoader.POINTER_TABLE_OFFSET=0x10010 is the CASTLE/DUNGEON pointer table — towns need a different table.
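The three-mode table above translates into a small classifier. Branch order and the UNKNOWN fallback are assumptions; the `mapId == 0` branch reuses the V3.1 observation that stale local coords can persist on the overworld.

```kotlin
enum class LocationMode { OVERWORLD, TOWN, CASTLE_OR_DUNGEON, UNKNOWN }

// locType, mapId ($0048) and localX/localY ($0029/$002A) are raw RAM bytes.
fun locationMode(locType: Int, mapId: Int, localX: Int, localY: Int): LocationMode = when {
    locType == 0xD1 -> LocationMode.CASTLE_OR_DUNGEON
    locType == 0x00 && mapId == 0 -> LocationMode.OVERWORLD // no interior loaded
    locType == 0x00 && localX == 0 && localY == 0 -> LocationMode.OVERWORLD
    locType == 0x00 -> LocationMode.TOWN                    // mapId holds the town id
    else -> LocationMode.UNKNOWN
}

// $0068/$0069 were observed to equal local + 7 (scroll-offset display coords).
fun displayCoord(local: Int): Int = local + 7
```

The key point is that locType alone cannot separate Overworld from Town; the map-id and local-coordinate bytes have to be read together.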
Run #10 (2026-05-09): post_enter_detect at turn 33 said open=true kind=weapon, but the new batch-pre probe at turn 35 saw open=false (Gemini classifyShopMenu stochastic flip on identical screen). The condition `!initialProbe.open || initialProbe.kind != "weapon"` fired reEngageKeeper, which walked dx=-1 dy=-2 — cardinals which, with the shop menu OPEN, navigate the menu cursor instead of the party. Subsequent invokeMany Up×2 + A landed on a wrong row, and all 4 pairs failed WrongClass. Fix: when the post-enter run{} block already set menuAlreadyOpen=true, trust it and skip the redundant batch probe. Only reengage when the post-enter said the shop UI was closed.
Run #11 (2026-05-09): char1 successfully bought via invokeMany, but char2/3/4 all WrongClass. Trace + analysis: between-pairs B × 3 was unwinding past BUY/SELL/EXIT and closing the shop dialog entirely. FF1 NES weapon shop B-tap semantics:
- From item list: B → BUY/SELL/EXIT, B again → close shop
- From "another?" prompt: B → item list, B → BUY/SELL/EXIT, B → close
Single B from either dismiss-loop end state lands in or above BUY/SELL/EXIT, where Up × 2 + A reliably re-selects BUY for the next pair. Bumped 3 → 1.
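The B-tap semantics listed above form a simple back-out chain. This simplified model treats BUY/SELL/EXIT as `MAIN_MENU` and ignores the other shop sub-screens; it shows why three B taps always close the shop from either dismiss-loop end state.

```kotlin
enum class ShopScreen { ANOTHER_PROMPT, ITEM_LIST, MAIN_MENU, CLOSED }

// One B tap backs out one level, per the observed semantics.
fun afterB(screen: ShopScreen): ShopScreen = when (screen) {
    ShopScreen.ANOTHER_PROMPT -> ShopScreen.ITEM_LIST
    ShopScreen.ITEM_LIST -> ShopScreen.MAIN_MENU
    ShopScreen.MAIN_MENU, ShopScreen.CLOSED -> ShopScreen.CLOSED
}

fun afterBTaps(start: ShopScreen, taps: Int): ShopScreen =
    (1..taps).fold(start) { screen, _ -> afterB(screen) }
```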
Run #12 (2026-05-09): post_enter saw open=false (Gemini stochastic misread on what was actually an open shop dialog). run{} block fired keeper_approach walk; cardinals navigated the menu cursor (not party, because dialog was actually open). Then batch-pre reengage walked again, further scrambling the cursor. invokeMany then ran with menuAlreadyOpen=false but state was corrupt → 4/4 WrongClass. Fix: drop the "trust post-enter" shortcut and instead probe shop UI state ONCE after the run{} block completes (whatever it did — walk or skip). Trust THAT probe. If open → skip reengage. If closed → one reengage attempt, probe again, accept either result. Hard cap at 1 reengage walk avoids the cumulative cursor drift.
Runs #10-#13 (2026-05-09) showed the deterministic "guess by tap count" approach is brittle — Gemini's classifyShopMenu is stochastic (post-enter says open, fresh-probe says closed on identical screen), which led to double-walks corrupting menu cursor; B-spam counts between pairs were unreliable across FF1 sub-menus; and char1 typically succeeded but char2/3/4 hit WrongClass / DismissCapExhausted because state assumptions drifted. New approach: classify the FF1 shop sub-screen (MAIN_MENU / ITEM_LIST / FOR_WHOM / BUY_CONFIRM / ANOTHER / WELCOME / CLOSED / UNKNOWN) via vision after each major tap, and dispatch the next action from observed UI rather than guessed sequence. Changes:
- HaikuConsult: new ShopMenuPhase enum + classifyShopMenuPhase() abstract method. Implemented in GeminiVisionConsult with a focused prompt distinguishing the eight phases via JSON {"phase":"..."}. AnthropicHaikuConsult stubs to UNKNOWN (delegated to Gemini).
- BuyAtShop: new invokeManyStateful method.
  * Per pair: drive into MAIN_MENU (B-spam to back out, A to advance Welcome) → Up×2+A → ITEM_LIST → Down×itemSlot+A → FOR_WHOM → Down×(charSlot-1)+A → BUY_CONFIRM → A on YES → dismiss watch.
  * On CLOSED mid-batch: aborts remaining pairs, leaves caller to re-engage keeper between rounds.
- runOutfitBootPhase: switched from invokeMany to invokeManyStateful.
Cost: ~5-8 vision calls per pair (~$0.025-0.04). Worthwhile when the previous deterministic version was buying only 1/4 chars per run.
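The per-pair dispatch can be sketched as two pure functions over the observed phase: one that backs out to MAIN_MENU and one that drives forward toward a purchase. The `Drive` type and button strings are hypothetical; `itemSlot` is 0-based and `charSlot` 1-based, matching the tap counts in the commit.

```kotlin
enum class ShopMenuPhase { MAIN_MENU, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER, WELCOME, CLOSED, UNKNOWN }

sealed interface Drive {
    object Arrived : Drive
    object Aborted : Drive
    data class Tap(val button: String) : Drive
}

// Step 1 of each pair: get to MAIN_MENU (A advances Welcome, B backs out).
fun driveToMainMenu(phase: ShopMenuPhase): Drive = when (phase) {
    ShopMenuPhase.MAIN_MENU -> Drive.Arrived
    ShopMenuPhase.WELCOME -> Drive.Tap("A")
    ShopMenuPhase.ITEM_LIST, ShopMenuPhase.FOR_WHOM,
    ShopMenuPhase.BUY_CONFIRM, ShopMenuPhase.ANOTHER -> Drive.Tap("B")
    ShopMenuPhase.CLOSED -> Drive.Aborted // abort remaining pairs
    ShopMenuPhase.UNKNOWN -> Drive.Tap("B") // handler as first shipped
}

// Forward path once MAIN_MENU is reached.
fun forwardTaps(phase: ShopMenuPhase, itemSlot: Int, charSlot: Int): List<String> = when (phase) {
    ShopMenuPhase.MAIN_MENU -> listOf("Up", "Up", "A")
    ShopMenuPhase.ITEM_LIST -> List(itemSlot) { "Down" } + "A"
    ShopMenuPhase.FOR_WHOM -> List(charSlot - 1) { "Down" } + "A"
    ShopMenuPhase.BUY_CONFIRM -> listOf("A")
    else -> emptyList()
}
```

Keeping the mapping pure makes it unit-testable without a live classifier; the cost is that every decision depends on a single stochastic phase read.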
…at post-enter Run #15 (2026-05-09) trace: post_enter_detect said open=false (kind-classifier stochastic flip), keeper_approach walked cardinals which — because the menu was actually open — navigated the menu cursor instead of the party. State machine first probe returned UNKNOWN, the UNKNOWN handler tapped B which closed the shop entirely, and all 4 pairs got ShopClosed. Two-part fix:
- driveToMainMenu UNKNOWN handler: step+retry for first 2 attempts (give renderer a chance to settle, classifier a chance to recover). Tap B only after 3 consecutive UNKNOWN. Avoids accidentally closing the shop on a transient misclassification.
- runOutfitBootPhase post-enter: in addition to classifyShopMenu (kind detection — stochastic), also call classifyShopMenuPhase. Treat any non-CLOSED, non-UNKNOWN phase as "shop UI on screen" and skip the keeper_approach cardinals walk. Two independent classifiers must both flip simultaneously to mistakenly trigger the walk.
Run #16 (2026-05-09) screenshot inspection: post-enter frame shows WEAPON banner + empty Welcome dialog + no BUY/SELL/EXIT yet — a genuine transition frame between Welcome dismiss and item-list draw. Phase classifier honestly returned CLOSED for that ambiguous state. The previous UNKNOWN handler tapped B (closed shop), then state machine bailed ShopClosed for all 4 pairs. UNKNOWN handler upgrade:
- 1st UNKNOWN → tap A (advance any stuck dialog page; if at MAIN_MENU misclassified, A → ITEM_LIST which the next probe sees)
- 2nd UNKNOWN → step+retry only (let renderer settle)
- 3rd+ UNKNOWN → tap B (last resort, may close shop)
Also: runOutfitBootPhase now trusts menuAlreadyOpen flag from the run{} block (which already uses dual-classifier kind+phase check). The redundant freshProbe was triggering reengage walks even when menuAlreadyOpen=true via stochastic kind-classifier flip.
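The upgraded UNKNOWN escalation is a three-step ladder keyed on how many UNKNOWN probes arrived in a row. The enum and counter convention are a hypothetical sketch of that ladder.

```kotlin
enum class UnknownAction { TAP_A, STEP_AND_RETRY, TAP_B }

// consecutiveUnknown counts UNKNOWN probes in a row, starting at 1.
fun onUnknown(consecutiveUnknown: Int): UnknownAction = when {
    consecutiveUnknown <= 1 -> UnknownAction.TAP_A          // advance a stuck dialog page
    consecutiveUnknown == 2 -> UnknownAction.STEP_AND_RETRY // let the renderer settle
    else -> UnknownAction.TAP_B                             // last resort; may close the shop
}
```

The first rung is the risky one: if the cursor actually rests on EXIT, an A tap confirms it, which is the failure run #18 later hit.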
…ine) User feedback after runs #15-#18: the deterministic state-machine / phase-classifier approach is over-engineered and brittle. Cursor on EXIT (run #18 evidence) was misclassified as UNKNOWN, the UNKNOWN handler tapped A which confirmed Exit and closed the shop. Pivot to vision-advisor that reads the actual screen state per step and decides the next tap. Changes:
- profiles/ff1.json: char1_class .. char4_class registers at 0x6100/0x6140/0x6180/0x61C0 (FF1 disasm). Values 0..5 map to Fighter / Thief / BlackBelt / RedMage / WhiteMage / BlackMage.
- HaikuConsult: new ShopPurchaseAdvice + adviseShopPurchase() interface method. SYSTEM_SHOP_PURCHASE companion prompt teaches the advisor: the eight FF1 shop sub-screens (WELCOME, MAIN_MENU cursor on Buy/Sell/Exit, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER, CLOSED), per-class equip rules, the typical Coneria weapon shop inventory, and what action drives toward the goal at each sub-screen.
- GeminiVisionConsult: implementation using Gemini 2.5 Pro thinking mode (1500 tokens) with maxOutputTokens=4000. Anthropic stub.
- BuyAtShop.invokeWithAdvisor: replaces the deterministic state machine. Loop: screenshot → adviseShopPurchase(context) → apply tap → repeat. Per-iter context lists each char's class + served status + remaining gold so the advisor can pick class-compatible items in any order. Per-char "bought" tracked via weapon-slot sum delta (any non-zero weapon ID in any of the 4 slots flips bought=true).
- runOutfitBootPhase: invokeWithAdvisor is the primary path. Legacy invokeManyStateful kept as fallback if advisor served zero chars.
Cost: ~30 advisor calls × $0.01-0.02 = $0.3-0.6 per run. More expensive than tap-counts but expected to actually serve all 4 chars per run instead of 1/4.
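Decoding the class registers and the per-char "bought" check might look like this. The addresses and value order come from the commit; `readByte` is a hypothetical RAM accessor, and the weapon-slot lists stand in for addresses the commit does not give.

```kotlin
// Class-byte addresses and value order as listed in the commit (FF1 disasm).
val CHAR_CLASS_ADDRESSES = listOf(0x6100, 0x6140, 0x6180, 0x61C0)
val CLASS_NAMES = listOf("Fighter", "Thief", "BlackBelt", "RedMage", "WhiteMage", "BlackMage")

// readByte is a hypothetical RAM accessor (e.g. a wrapper over an emulator memory read).
fun partyClasses(readByte: (Int) -> Int): List<String> =
    CHAR_CLASS_ADDRESSES.map { addr ->
        val value = readByte(addr)
        CLASS_NAMES.getOrElse(value) { "Unknown($value)" }
    }

// Per-char "bought": the sum of that char's weapon-slot IDs changed.
fun boughtSince(slotsBefore: List<Int>, slotsAfter: List<Int>): Boolean =
    slotsAfter.sum() != slotsBefore.sum()
```

Feeding the decoded class names into the advisor context lets it choose class-compatible items without inferring the class from sprites.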
Run #19 (2026-05-09) trace: nav advisor reached BUY/SELL/EXIT at iter 79 ("WEAPON shop menu with Buy/Sell/Exit is visible on screen"), but the kind-classifier on the SAME screen returned null. The Done verify rejected, advisor loop hit max-iters cap, $1.40 of advisor cost wasted on a successful shop entry that the runtime threw away. Phase classifier and kind classifier are independent Gemini calls on the same screenshot. Either may stochastically miss. Done verify now accepts the entry if EITHER says shop UI is on screen — kind=weapon OR phase ∈ {MAIN_MENU, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER, WELCOME}. The wrong-shop kind=armor rejection still applies (that branch only fires when kind is non-weapon non-null AND open=true).
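The relaxed Done verify reduces to one predicate. The argument shapes (a nullable kind string, an open flag, and a phase enum) are assumptions about how the two classifier results are passed around.

```kotlin
enum class ShopPhase { MAIN_MENU, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER, WELCOME, CLOSED, UNKNOWN }

val SHOP_UI_PHASES = setOf(
    ShopPhase.MAIN_MENU, ShopPhase.ITEM_LIST, ShopPhase.FOR_WHOM,
    ShopPhase.BUY_CONFIRM, ShopPhase.ANOTHER, ShopPhase.WELCOME,
)

fun acceptShopEntry(kind: String?, kindOpen: Boolean, phase: ShopPhase): Boolean {
    // Wrong-shop rejection only fires when the kind classifier positively saw another open shop.
    if (kindOpen && kind != null && kind != "weapon") return false
    // Otherwise either classifier is enough to accept the entry.
    return kind == "weapon" || phase in SHOP_UI_PHASES
}
```

In the run #19 case (kind=null, phase=MAIN_MENU) this accepts the entry instead of discarding a successful navigation.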
Today's empirical session has burned multiple $1+ runs on Gemini API flaps during nav (api-error, EOFException, request timeout). Code changes downstream of nav are unverified because nav rarely completes. Two-part savestate feature for dev iteration:
- AgentSession: when entered=true at the advisor Done verify, dump the emulator savestate to /tmp/spec5-shop-entered.savestate. Subsequent dev runs can load it and skip the pre-boot pressStart + advisor navigation, jumping straight to the in-shop purchase loop.
- Main.kt: KNES_FF1_LOAD_SAVESTATE=<path> env var. After ROM load, restore emulator state from the file. Logs success/failure to stderr.
- EmulatorToolset.session: was private, now public so AgentSession can call session.saveState() at the dump point.
Per autonomy_principle.md the agent plays the game (no manual gameplay), but this is a dev tool: the savestate is captured ONLY by a successful agent-driven nav, never hand-recorded.
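A sketch of the dump/load pair described above, assuming a hypothetical `SavestateCapable` interface in place of the real `EmulatorToolset.session` type (whose exact API is not shown in this PR). The env var name and dump path are taken from the commit message; the environment is passed in as a map so the logic can be exercised without touching the real process environment.

```kotlin
import java.io.File

// Hypothetical emulator surface; the real session API may differ.
interface SavestateCapable {
    fun saveState(): ByteArray
    fun loadState(bytes: ByteArray)
}

val SHOP_ENTERED_PATH = "/tmp/spec5-shop-entered.savestate"

/** Dump a savestate only when the advisor Done verify reported entered=true. */
fun dumpOnShopEntered(entered: Boolean, emu: SavestateCapable, path: String = SHOP_ENTERED_PATH): Boolean {
    if (!entered) return false
    File(path).writeBytes(emu.saveState())
    return true
}

/** Restore from KNES_FF1_LOAD_SAVESTATE if set; logs success/failure to stderr. */
fun maybeLoadSavestate(emu: SavestateCapable, env: Map<String, String> = System.getenv()): Boolean {
    val path = env["KNES_FF1_LOAD_SAVESTATE"] ?: return false
    return try {
        emu.loadState(File(path).readBytes())
        System.err.println("savestate loaded from $path")
        true
    } catch (e: Exception) {
        System.err.println("savestate load failed for $path: ${e.message}")
        false
    }
}
```

Failing soft on a bad path keeps a stale or missing fixture from aborting the run; the agent simply falls back to full navigation.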
Run #21 (2026-05-09) breakthrough: the vision-advisor purchase logic works end-to-end. char1 (Fighter) and char2 (Thief) both got a Small Knife (5G each). char3 (BlackBelt) was mid-purchase of a Wooden Nunchuck when the iter cap of 30 hit. Class mapping, the advisor's class→item logic, and FOR_WHOM cursor management are all correct. Bumped the cap to 50 to cover all 4 chars, with headroom for the advisor's state-recovery taps (e.g. when MAIN_MENU is classified as ITEM_LIST mid-transition, the advisor occasionally needs an extra A or B).
Two bugs preventing savestate-based dev iteration on FF1 (MMC1):
1. MapperMMC1 inherited stateSave/stateLoad from MapperDefault, which only handles joypad strobe state. The MMC1 internal registers (shiftRegister, shiftCount, regControl, regCHR0/1, regPRG) and the bank-selection bookkeeping were NEVER serialized. cpuMemory bytes at $8000-$FFFF restored fine (they capture whichever PRG bank was live at save time), but the moment the restored CPU executed any bank-switch register write, the mapper computed the new bank from power-on defaults instead of the saved register values, corrupting PRG-ROM mapping mid-frame. Empirically this manifested as savestate loads "drifting" back to the title screen / reset state for FF1 / Zelda / Metroid / Mega Man 2 (all MMC1). Added MMC1.stateSave: 6 ints after the base joypad header. Added MMC1.stateLoad: read 6 ints + re-run updateMirroring, updatePRGBanks, updateCHRBanks so bank pointers / CHR window / PPU mirroring match the restored register values.
2. MapperDefault.mapperInternalStateLoad and mapperInternalStateSave had their bodies SWAPPED: Load wrote to the buffer, Save read from it. This corrupted the joypad strobe portion of every savestate (and the MMC1 fix sits on top of this fix).
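A sketch of the shape of the MMC1 fix: six registers written in a fixed order and read back in the same order, followed by recomputation of derived state. `Mmc1State`, `refreshDerivedState`, and the `derivedRefreshes` counter are hypothetical stand-ins; the real mapper's field names (regCHR0/1 and so on) and its updateMirroring/updatePRGBanks/updateCHRBanks bodies are not reproduced, and the base joypad header is omitted.

```kotlin
import java.nio.ByteBuffer

// Hypothetical stand-in for MapperMMC1's serializable register file.
class Mmc1State {
    var shiftRegister = 0
    var shiftCount = 0
    var regControl = 0
    var regChr0 = 0
    var regChr1 = 0
    var regPrg = 0
    var derivedRefreshes = 0 // illustration only: counts derived-state recomputes

    // Save WRITES to the buffer (the MapperDefault bug had save/load swapped).
    fun stateSave(buf: ByteBuffer) {
        for (v in intArrayOf(shiftRegister, shiftCount, regControl, regChr0, regChr1, regPrg)) buf.putInt(v)
    }

    // Load READS the same six ints in the same order, then rebuilds derived state.
    fun stateLoad(buf: ByteBuffer) {
        shiftRegister = buf.getInt(); shiftCount = buf.getInt(); regControl = buf.getInt()
        regChr0 = buf.getInt(); regChr1 = buf.getInt(); regPrg = buf.getInt()
        refreshDerivedState()
    }

    // Placeholder for re-running updateMirroring / updatePRGBanks / updateCHRBanks.
    private fun refreshDerivedState() { derivedRefreshes++ }
}
```

The recompute step matters as much as the serialization: restoring register values without re-deriving bank pointers would leave the mapping stale until the next bank-switch write.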
…rs; mapper fix pending validation
Today's continuation: 26 runs, ~$10-12 API spend, 8+ commits.
Architectural wins:
- Vision-advisor purchase replaces brittle state machine (1ed46ba)
- Char classes added to ff1.json profile (1ed46ba)
- Class-aware item picking proven (run #21: Fighter→Knife, Thief→Knife)
- Savestate dump+load infrastructure (7ce67e6)
- MMC1 + MapperDefault savestate bugs identified and fixed (fb46d3c)
Empirically validated:
- 2/4 chars Bought reliably (runs #21, #24)
- Nav success ~30% (Gemini stochastic; lots of api-error / oscillation today)
- Class-aware item selection visible in trace
Open / next session:
- Bump cap 50→80 + improve FOR_WHOM cursor prompt → expect 4/4 buys
- Empirically validate MMC1 savestate fix (need post-fix successful nav)
- Replace EquipWeapon state machine with adviseEquip vision advisor
…hots A leftover debug-print guard `if (tile[i] > 255)` had nested the per-tile putByte loop body inside an effectively never-true condition, so saveState wrote ~0 bytes per nametable while loadState unconditionally read width*height bytes, consuming bytes from the following PPU fields and cascading corruption through the rest of the snapshot. Symptom: for a savestate dumped mid-shop dialog, RAM restored fine on reload (currentMapId/mapflags/smPlayer all matched the dump point) but the framebuffer rendered overworld tiles or grayscale. Affected MMC1 and non-MMC1 games alike: a bug inherited from vNES that predates the Kotlin port. The fix removes the guard. A round-trip identity test added to knes-agent/test/SavestateRoundtripDebug.kt verifies that save → load → save produces byte-identical output. Also includes a Main.kt-flow regression test that loads the persisted /tmp/spec5-shop-entered savestate and asserts that character RAM survives.
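A sketch of the symmetry the fix restores, plus the save → load → save identity check. `NameTableSketch` and `roundTripIdentical` are hypothetical simplifications: the real NameTable also serializes attribute data, and the real test lives in SavestateRoundtripDebug.kt.

```kotlin
import java.nio.ByteBuffer

// Illustrative nametable: save writes exactly width*height bytes, which is what
// load consumes. The removed guard made save write ~0 bytes here instead.
class NameTableSketch(val width: Int, val height: Int) {
    val tile = IntArray(width * height)

    fun stateSave(buf: ByteBuffer) {
        for (i in tile.indices) buf.put(tile[i].toByte())
    }

    fun stateLoad(buf: ByteBuffer) {
        for (i in tile.indices) tile[i] = buf.get().toInt() and 0xFF
    }
}

/** save -> load -> save must produce byte-identical output. */
fun roundTripIdentical(src: NameTableSketch): Boolean {
    val first = ByteBuffer.allocate(src.tile.size)
    src.stateSave(first)
    first.flip()
    val copy = NameTableSketch(src.width, src.height)
    copy.stateLoad(first)
    val second = ByteBuffer.allocate(copy.tile.size)
    copy.stateSave(second)
    return first.array().contentEquals(second.array())
}
```

Comparing two serializations rather than inspecting fields catches any save/load asymmetry, including one that silently shifts every later field in the snapshot.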
First end-to-end successful 4-character weapon purchase in project history. Run B2-v3 (post-fixes): char1+2+3+4 BOUGHT in 77 advisor iterations, ~$1.0 advisor spend, 45G total in-game.
Stack of fixes that enabled 4/4:
1. Main.kt savestate handling: pre-warm 120 frames before loadState (the PPU rendering pipeline must engage before state restore) and pump 120 frames after (so getScreen returns the restored scene to the post-enter detector instead of a stale buffer).
2. AgentSession.runOutfitBootPhase: when KNES_FF1_LOAD_SAVESTATE was honoured, skip the walk-to-coneria + nav-advisor block. Walk pressed cardinals on the active shop dialog, and the accumulated dismissals eventually landed on the title menu; the savestate already places us inside the shop, so the entire pre-shop nav phase is redundant.
3. Bumped maxAdvisorCalls 50 → 80 → 120. Run A2 trace confirmed 80 was just short of char4 nav after FOR_WHOM cursor recovery.
4. Improved SYSTEM_SHOP_PURCHASE prompt: dedicated POST-PURCHASE FLOW section teaching cursor reset to char1 between buys (counted-Down recipe), explicit ERROR_DIALOG handling, and a "do NOT output Done mid-purchase-subflow" guardrail.
5. Per-iter screenshot dump in BuyAtShop.invokeWithAdvisor (/tmp/spec5-buy-advisor-iter-NN-served-XXXX.png) for post-mortem when the advisor misreads a sub-screen.
Run-by-run progression toward 4/4:
- Run #21 / #24 (pre-MMC1 fix): 2/4
- Run A2 (post-NameTable fix, fresh nav): 3/4
- Run B2-v3 (post-NameTable fix + savestate runtime): 4/4
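The ordering in fix 1 can be sketched as a small helper. `FrameEmulator`, `advanceFrame`, and `loadWithWarmup` are hypothetical names; the 120-frame counts on either side of the restore are the values from the commit message.

```kotlin
// Hypothetical emulator surface for the warm-up ordering.
interface FrameEmulator {
    fun advanceFrame()
    fun loadState(bytes: ByteArray)
}

fun loadWithWarmup(emu: FrameEmulator, state: ByteArray, preFrames: Int = 120, postFrames: Int = 120) {
    // Let the PPU rendering pipeline engage before restoring state.
    repeat(preFrames) { emu.advanceFrame() }
    emu.loadState(state)
    // Pump frames so the next screen capture shows the restored scene, not a stale buffer.
    repeat(postFrames) { emu.advanceFrame() }
}
```

Keeping the frame counts as parameters makes it cheap to test shorter warm-ups later without touching the call site.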
… grind PR #122 merged (361e88e on master). Updates handoff to reflect:
- Run B2-v3 milestone (4/4 chars BOUGHT, 77 advisor calls, ~$1.0).
- Two-commit stack landed: NameTable.stateSave fix + spec5 4/4 buy.
- Architecture diagram updated with savestate-skip path + 120f pre/post warm.
- Next goal user-defined: post-purchase exit + grind. Skipping equip (chars grind bare-handed). Subgoals: (1) ExitInterior validation, (2) walk to grind tile, (3) battleFightAll, (4) track XP → level-up against strategic GRIND target.
…Grind → Battle) Approach A — vision-LLM with hint-encoded prompts. Drops tree-detour / empirical exploration from hot path; encodes 3 user gameplay hints (DOWN-from-shop, green=grass=grind-target, overworld-trees-walkable) into VisionInteriorNavigator prompt + GrindLoop anchor selection.
…s baseline User-flagged invariant: 4/4 weapon purchase (Run B2-v3) must not regress when Phase 3 (exit) and Phase 4 (grind) are reworked. Spec now lists pinned files / prompts / contracts plus two smoke checks (fresh-run + savestate-load) the implementation plan must run before declaring done.
Implements the 2026-05-09-coneria-pipeline-design.md spec. Task 1 builds GrindAnchorSelector with 5 unit tests; Task 2 appends POST-SHOP-TOWN-EXIT hints to VisionInteriorNavigator prompt; Task 3 adds GrindLoop encounter delta log; Task 4 replaces runOutfitBootPhase exit block with vision-only + anchor + reanchor orchestration; Tasks 5–6 are the two smoke checks (savestate + fresh-run) with explicit weaponsBought=4 regression guard; Task 7 captures cont-4 results in HANDOFF.md.
Captures the uncommitted cont-3 work as a single checkpoint so the cont-4 plan tasks (per docs/superpowers/plans/2026-05-09-coneria-pipeline.md) land as clean per-task diffs on top of a known base. Includes:
- AgentScratchpad: coding-agent-style action notebook persisted alongside the savestate (sister *.actions.json file).
- WalkInteriorVision historyHint param + VisionInteriorNavigator API.
- ExitTownEmpirical + ExitInterior.walkOutOfTownOverlay (defensive fallback / offline-test path; the cont-4 plan drops these from the hot path but keeps them callable when outfitNavigator==null).
- GrindLoop per-step screenshots + encounterCounter log.
- runOutfitBootPhase: skip-equip default, post-buy savestate dump, vision historyHint wiring, boot_exit_interior_result trace.
- HANDOFF.md cont-3 final notes.
Compiles clean (./gradlew :knes-agent:compileKotlin BUILD SUCCESSFUL).
Per-step delta makes encounter-byte staleness visible in stdout; a one-shot WARN at loop end (grind_encounter_byte_dead) signals when all 12 steps saw zero delta, distinguishing a wrong RAM byte from a genuine encounter dead zone on the peninsula.
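A sketch of the per-step delta log with the one-shot WARN. `encounterDeltaLog` is a hypothetical pure function, and the log line format is an assumption; it takes the encounter-byte reading before the loop followed by one reading per step (so 12 steps means 13 readings) and returns the lines that would go to stdout.

```kotlin
// Hypothetical sketch of GrindLoop's encounter-byte delta logging.
fun encounterDeltaLog(readings: List<Int>): List<String> {
    require(readings.isNotEmpty()) { "need at least the pre-loop reading" }
    val lines = mutableListOf<String>()
    var sawDelta = false
    for (step in 1 until readings.size) {
        val delta = readings[step] - readings[step - 1]
        if (delta != 0) sawDelta = true
        lines += "grind_step step=$step encounterByte=${readings[step]} delta=$delta"
    }
    // One-shot WARN at loop end when no step ever moved the byte.
    if (!sawDelta) lines += "WARN grind_encounter_byte_dead steps=${readings.size - 1}"
    return lines
}
```

A byte that never moves across every step points at the wrong address; a byte that moves without triggering a battle points at terrain with no encounters.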
…chor Drops ExitTownEmpirical / tree-detour from the hot path (kept as an offline fallback when outfitNavigator==null). After exit it settles for 120 frames, picks an anchor via GrindAnchorSelector (reading OverworldMap GRASS/FOREST classifications), and runs GrindLoop with up to 2 re-anchors on NoEncounter. New trace markers: boot_phase3_exit_result, boot_phase4_grind_anchor, boot_phase4_grind_result, boot_phase4_grind_reanchor, boot_pipeline_end.
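A sketch of anchor selection plus the re-anchor loop. Several details are assumptions: distance is Manhattan, "south+east tie-break" is read as preferring larger y and then larger x, previously tried anchors are excluded on re-anchor, and the `Terrain`, `Tile`, and `GrindResult` types are hypothetical stand-ins for the real classifier and GrindLoop types.

```kotlin
import kotlin.math.abs

enum class Terrain { GRASS, FOREST, WATER, MOUNTAIN, TOWN, CASTLE, UNKNOWN }
data class Tile(val x: Int, val y: Int)

// Closest GRASS/FOREST candidate; ties go south (larger y), then east (larger x).
fun selectAnchor(from: Tile, terrainAt: (Tile) -> Terrain, candidates: List<Tile>, exclude: Set<Tile> = emptySet()): Tile? =
    candidates
        .filter { it !in exclude && terrainAt(it).let { t -> t == Terrain.GRASS || t == Terrain.FOREST } }
        .minWithOrNull(
            compareBy<Tile> { abs(it.x - from.x) + abs(it.y - from.y) }
                .thenByDescending { it.y }
                .thenByDescending { it.x }
        )

sealed class GrindResult {
    object Encountered : GrindResult()
    object NoEncounter : GrindResult()
}

/** Initial anchor plus up to [maxReanchors] retries on NoEncounter; returns anchors tried. */
fun grindWithReanchor(
    from: Tile,
    terrainAt: (Tile) -> Terrain,
    candidates: List<Tile>,
    runGrind: (Tile) -> GrindResult,
    maxReanchors: Int = 2
): List<Tile> {
    val tried = mutableListOf<Tile>()
    repeat(maxReanchors + 1) {
        val anchor = selectAnchor(from, terrainAt, candidates, tried.toSet()) ?: return tried
        tried += anchor
        if (runGrind(anchor) == GrindResult.Encountered) return tried
    }
    return tried
}
```

Returning the list of tried anchors gives the orchestration layer exactly what it needs to emit boot_phase4_grind_reanchor markers.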
VisionInteriorNavigator defaulted to claude-opus-4-5-20251101 which is not a current Anthropic model ID — every API call 404'd, parser returned UNCLEAR, and WalkInteriorVision bailed after 2 consecutive UNCLEARs. Smoke #1 (savestate-load, post-cont-4 plan Tasks 1-4) hit this exact failure mode at boot_phase3_exit_result. Default switched to claude-sonnet-4-6, matching the surrounding KDoc ("Uses Sonnet 4.6"). Caller in Main.kt does not override the default, so this restores the in-comment intent.
… Coneria exit Tasks 1-4 of the cont-4 plan committed clean (12f4e10, 1e04d3a, 62f118f, b4d7fb5) plus model-ID hotfix (598df2d). All unit tests green; OutfitBootPhaseTest regression baseline preserved. Smoke #1 empirical: vision-LLM (Sonnet 4.6, post-fix) walked 60 steps in town overlay without transitioning to overworld — exactly the spec's documented Risk #1. Phase 4 grind never exercised. Smoke #2 (fresh-run) deferred to next session per user gate. Next session: Approach B (PPU nametable read for town overlay tile data). Do NOT re-enable ExitTownEmpirical tree-detour in hot path; offline fallback path remains.
Pre-step screenshot to /tmp/spec5-vision-exit-iter-NN-smX_Y-mfM.png on every iteration. Without these, a 60-step vision drift leaves zero visible artefacts (cont-4 Smoke #1 evidence). Generalises the BuyAtShop per-iter pattern per feedback_per_iter_screenshots.md. Try/catch wraps the dump so emulator-screen capture failure never fails the skill itself — purely diagnostic.
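A sketch of the filename pattern and the never-fail dump. Both helper names are hypothetical, as are the assumptions that NN is a zero-padded two-digit iteration, that X_Y is the sm position, and that M is the mapflags value; the try/catch mirrors the "capture failure never fails the skill" rule.

```kotlin
// Hypothetical helper for /tmp/spec5-vision-exit-iter-NN-smX_Y-mfM.png.
fun visionExitShotPath(iter: Int, smX: Int, smY: Int, mapFlags: Int, dir: String = "/tmp"): String =
    "%s/spec5-vision-exit-iter-%02d-sm%d_%d-mf%d.png".format(dir, iter, smX, smY, mapFlags)

/** Purely diagnostic dump: any exception is reported to stderr and swallowed. */
fun dumpSafely(path: String, capture: () -> ByteArray): Boolean = try {
    java.io.File(path).writeBytes(capture())
    true
} catch (e: Exception) {
    System.err.println("vision-exit screenshot dump failed: ${e.message}")
    false
}
```

Encoding the iteration and position in the filename makes a directory listing alone enough to spot a stuck trajectory, before opening any image.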
…oops Cont-4 Smoke #1 evidence: PNG iter 30 + iter 50 show party stuck at sm(2,14) for 8+ iters because vision picks LEFT each call without seeing that the previous LEFT was a no-op (lastBlocked is only the SINGLE most-recent failure). Adds rolling 8-entry buffer of "step=N dir=X smPre=(a,b) smPost=(c,d) moved=Y/N" to historyHint, so each navigator call sees its own recent decision history. System prompt updated: model is told to detect repeating-with-moved=false patterns and pick a perpendicular cardinal, plus to treat RECENT MOVES as stronger evidence than still-image inference (image cannot show that the last 5 cardinals were no-ops).
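A sketch of the rolling 8-entry buffer and its rendering into historyHint. `MoveRecord`, `RecentMoves`, the "RECENT MOVES:" header line, and the `stuckRepeating` check are hypothetical; in the PR the repeat detection is delegated to the model via the system prompt, and the helper here only illustrates the pattern being described. Each entry renders in the documented `step=N dir=X smPre=(a,b) smPost=(c,d) moved=Y/N` format.

```kotlin
// Hypothetical record of one navigator decision and its observed effect.
data class MoveRecord(val step: Int, val dir: String, val pre: Pair<Int, Int>, val post: Pair<Int, Int>) {
    val moved: Boolean get() = pre != post
    fun render() = "step=$step dir=$dir smPre=(${pre.first},${pre.second}) " +
        "smPost=(${post.first},${post.second}) moved=${if (moved) "Y" else "N"}"
}

class RecentMoves(private val capacity: Int = 8) {
    private val buf = ArrayDeque<MoveRecord>()

    fun add(r: MoveRecord) {
        if (buf.size == capacity) buf.removeFirst() // drop the oldest entry
        buf.addLast(r)
    }

    fun historyHint(): String = buf.joinToString("\n", prefix = "RECENT MOVES:\n") { it.render() }

    /** True when the last [n] moves all used the same direction and none moved. */
    fun stuckRepeating(n: Int = 3): Boolean {
        val tail = buf.toList().takeLast(n)
        return tail.size == n && tail.all { !it.moved && it.dir == tail[0].dir }
    }
}
```

A bounded buffer keeps the prompt size fixed while still showing the model a run of no-op moves that a single lastBlocked field cannot convey.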
…till hits cont-3 deadlock Two new diagnostic commits: 2412b58 (per-iter PNG dump) and 2371e6e (recent-moves history in WalkInteriorVision historyHint). Both came out of user feedback ("why didn't I see any screenshots?" and "it goes left and just circles around there; maybe it doesn't know it has to go down because it has no history"). Smoke #1 retry: vision now reaches the sm(14,22) south edge in 9 steps (previously a 60-step drift west) but hits UNCLEAR x2 at sm(9,22) after drifting west past the warp tile. Recent-moves history broke the LEFT loop; the cont-3 sm(14,22) deadlock still defeats vision-only navigation. Savestate corruption flagged: /tmp/spec5-post-buy.savestate gets overwritten on every run regardless of buy success, so the 4/4 fixture is lost. Quick fix proposed for cont-6. Recommended next-session direction: Approach B (PPU nametable read), fix the savestate-dump regression, and run Smoke #2 (fresh run) as the 4/4 buy baseline regression check.
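A sketch of the proposed cont-6 quick fix, which this PR's test plan describes as gating on `charsBought.size == expected`. The function name and its parameters are hypothetical; the point is that a partial buy returns early and leaves the existing fixture untouched.

```kotlin
import java.io.File

// Hypothetical guard: only overwrite the post-buy fixture on a complete buy.
fun dumpPostBuySavestate(charsBought: Set<Int>, expected: Int, path: File, snapshot: () -> ByteArray): Boolean {
    // A partial buy (e.g. 2/4) must not clobber the 4/4 fixture.
    if (charsBought.size != expected) return false
    path.writeBytes(snapshot())
    return true
}
```

Taking the snapshot lazily through a lambda also avoids the cost of serializing emulator state on runs that will not be saved.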
Summary
docs/superpowers/specs/2026-05-09-coneria-pipeline-design.md. Vision-LLM (`WalkInteriorVision`, Sonnet 4.6) is the sole hot-path exit; `ExitTownEmpirical`/tree-detour kept only as offline fallback when `outfitNavigator==null`. Phase 4 picks a `GrindAnchorSelector` anchor on the closest `GRASS`/`FOREST` overworld tile (south+east tie-break) and runs `GrindLoop` with up to 2 re-anchors on `NoEncounter`.
`weaponsBought=4 totalGoldSpent=25` at `boot_outfit_summary`). The spec § Regression protection invariants (`BuyAtShop` cap=120, `SYSTEM_SHOP_PURCHASE` prompt, savestate warm-up, `OutfitBootPhase` entry guards, `NameTable.stateSave`) are all untouched. `OutfitBootPhaseTest` green throughout.
`sm(14,22)` Coneria south-edge deadlock with `UNCLEAR x2` after 12–14 steps. Recent-moves history (cont-5) broke the LEFT-LEFT-LEFT loop from cont-4 (60-step drift west → 9-step reach to south edge), but the model still drifts west past the warp tile. Approach B (PPU nametable read for town overlay tile data) is the cont-6 direction.

Files changed
- `docs/superpowers/specs/2026-05-09-coneria-pipeline-design.md`
- `docs/superpowers/plans/2026-05-09-coneria-pipeline.md`
- `HANDOFF.md`
- `knes-agent/src/main/kotlin/knes/agent/runtime/GrindAnchorSelector.kt` (+ test)
- `knes-agent/src/main/kotlin/knes/agent/runtime/AgentSession.kt`: `runOutfitBootPhase` Phase 3+4 rework, 5 new trace markers
- `knes-agent/src/main/kotlin/knes/agent/perception/VisionInteriorNavigator.kt`
- `knes-agent/src/main/kotlin/knes/agent/skills/WalkInteriorVision.kt`
- `knes-agent/src/main/kotlin/knes/agent/skills/GrindLoop.kt`: `grind_encounter_byte_dead` WARN
- `knes-agent/src/main/kotlin/knes/agent/runtime/AgentScratchpad.kt` (new) + `Main.kt`
- `knes-agent/src/main/kotlin/knes/agent/skills/ExitInterior.kt` + `ExitTownEmpirical.kt` (new)

Test plan
- `./gradlew :knes-agent:test`: 353 unit + 3 in `SavestateRoundtripDebug`. 5 new tests in `GrindAnchorSelectorTest`, all green. 3 pre-existing failures (`Coneria8VisualDiffTest`, `ConeriaTownEmpiricalDiscoveryTest`, `ExploreOverworldFrontierTest`) unchanged from baseline; out of scope.
- `./gradlew :knes-agent:compileKotlin`: BUILD SUCCESSFUL.
- `OutfitBootPhaseTest` green: Phase 2 entry-guard regression baseline preserved.
- 3311a4b, full pipeline): `KNES_VISION=gemini-pro ./gradlew :knes-agent:run --args="--rom=$PWD/roms/ff.nes --wall-clock-cap-seconds=900 --cost-cap-usd=3.0"`. Result: `weaponsBought=4 totalGoldSpent=25`, char1 BOUGHT iter=8, char2 BOUGHT iter=20, char3 BOUGHT iter=33, char4 BOUGHT iter=45. `boot_post_buy_savestate_dumped` 401665 bytes. Phase 1+2 baseline ✅.
- 2371e6e and 3311a4b, both runs): `boot_phase3_exit_result: ok=false msg=vision returned UNCLEAR 2x in a row` after 12–14 steps. Phase 3 fails per documented Risk #1; reproducible, deterministic. Per-iter PNGs at `/tmp/spec5-vision-exit-iter-NN-smX_Y-mfM.png` show the trajectory: shop counter → `sm(14,22)` south edge → drift west past warp → bail. Phase 4 not exercised.
- `runOutfitBootPhase` overwrites `/tmp/spec5-post-buy.savestate` regardless of `bought=N`. Quick fix proposed for cont-6 (gate on `charsBought.size == expected`).

Honest assessment
This PR is mergeable for the regression-baseline reason: 4/4 buy is empirically preserved and all unit tests are green. It does NOT deliver a working end-to-end Coneria pipeline — that requires cont-6 (Approach B). The architectural work (vision-only hot path, anchor selection, re-anchor, trace markers, per-iter diagnostics, recent-moves history) is the foundation cont-6 will build on. The cont-3 sm(14,22) deadlock is genuinely beyond pure vision-LLM reach without map-aware navigation.