
feat(ff1): Coneria post-buy exit + grind pipeline (Spec 5 cont-4/5)#16

Closed
ArturSkowronski wants to merge 648 commits into bfirsh:master from ArturSkowronski:ff1-buy-and-equip-coneria

Conversation

@ArturSkowronski

Summary

  • Phase 3 (exit) + Phase 4 (grind) pipeline rework post-buy in Coneria, per docs/superpowers/specs/2026-05-09-coneria-pipeline-design.md. Vision-LLM (WalkInteriorVision, Sonnet 4.6) is the sole hot-path exit; ExitTownEmpirical/tree-detour kept only as offline fallback when outfitNavigator==null. Phase 4 picks a GrindAnchorSelector anchor on the closest GRASS/FOREST overworld tile (south+east tie-break) and runs GrindLoop with up to 2 re-anchors on NoEncounter.
  • Phase 1+2 (4/4 weapon buy) regression baseline preserved — empirically validated this PR's HEAD with a fresh-run smoke (weaponsBought=4 totalGoldSpent=25 at boot_outfit_summary). The spec § Regression protection invariants (BuyAtShop cap=120, SYSTEM_SHOP_PURCHASE prompt, savestate warm-up, OutfitBootPhase entry guards, NameTable.stateSave) are all untouched. OutfitBootPhaseTest green throughout.
  • Phase 3 still fails empirically per documented Risk #1: the vision-LLM consistently bails on the cont-3 sm(14,22) Coneria south-edge deadlock with UNCLEAR x2 after 12–14 steps. Recent-moves history (cont-5) broke the LEFT-LEFT-LEFT loop from cont-4 (60-step drift west → 9-step reach to south edge), but the model still drifts west past the warp tile. Approach B (PPU nametable read for town overlay tile data) is the cont-6 direction.
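
The Phase 4 anchor rule from the first bullet (closest GRASS/FOREST tile, south+east tie-break) could look roughly like the sketch below. This is not the actual GrindAnchorSelector: the `Terrain`, `Tile` and `selectAnchor` names, the Manhattan distance metric, and the reading of "south+east tie-break" as "prefer larger y, then larger x" are all assumptions made for illustration.

```kotlin
import kotlin.math.abs

// Hypothetical types; the real GrindAnchorSelector may model tiles differently.
enum class Terrain { GRASS, FOREST, WATER, MOUNTAIN, TOWN }

data class Tile(val x: Int, val y: Int, val terrain: Terrain)

// Picks the GRASS/FOREST tile closest to (px, py) by Manhattan distance.
// Ties go to the south (larger y), then to the east (larger x); that reading
// of "south+east tie-break" is an assumption.
fun selectAnchor(px: Int, py: Int, tiles: List<Tile>): Tile? =
    tiles
        .filter { it.terrain == Terrain.GRASS || it.terrain == Terrain.FOREST }
        .minWithOrNull(
            compareBy<Tile> { abs(it.x - px) + abs(it.y - py) }
                .thenByDescending { it.y }
                .thenByDescending { it.x }
        )
```

Keeping the selector a pure function of the party position and a tile list is what makes it cheap to unit test without an emulator.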

Files changed

| File | What |
| --- | --- |
| docs/superpowers/specs/2026-05-09-coneria-pipeline-design.md | New design doc + § Regression protection |
| docs/superpowers/plans/2026-05-09-coneria-pipeline.md | 7-task TDD-shaped implementation plan |
| HANDOFF.md | cont-4 + cont-5 progress notes |
| knes-agent/src/main/kotlin/knes/agent/runtime/GrindAnchorSelector.kt (+test) | Pure helper, 5 unit tests |
| knes-agent/src/main/kotlin/knes/agent/runtime/AgentSession.kt | runOutfitBootPhase Phase 3+4 rework, 5 new trace markers |
| knes-agent/src/main/kotlin/knes/agent/perception/VisionInteriorNavigator.kt | POST-SHOP TOWN EXIT prompt, RECENT MOVES instruction, model-ID fix (claude-sonnet-4-6) |
| knes-agent/src/main/kotlin/knes/agent/skills/WalkInteriorVision.kt | Per-iter PNG dump + 8-entry recent-moves rolling buffer |
| knes-agent/src/main/kotlin/knes/agent/skills/GrindLoop.kt | encounterCounter delta log + grind_encounter_byte_dead WARN |
| knes-agent/src/main/kotlin/knes/agent/runtime/AgentScratchpad.kt (new) + Main.kt | Sister-file scratchpad load/persist alongside savestate (cont-3 checkpoint) |
| knes-agent/src/main/kotlin/knes/agent/skills/ExitInterior.kt + ExitTownEmpirical.kt (new) | Offline fallback path (cont-3 checkpoint), no longer on hot path |

Test plan

  • ./gradlew :knes-agent:test — 353 unit + 3 in SavestateRoundtripDebug. 5 new tests in GrindAnchorSelectorTest, all green. 3 pre-existing failures (Coneria8VisualDiffTest, ConeriaTownEmpiricalDiscoveryTest, ExploreOverworldFrontierTest) unchanged from baseline — out of scope.
  • ./gradlew :knes-agent:compileKotlin — BUILD SUCCESSFUL.
  • OutfitBootPhaseTest green — Phase 2 entry-guard regression baseline preserved.
  • Smoke #2 fresh-run (this PR's HEAD 3311a4b, full pipeline): KNES_VISION=gemini-pro ./gradlew :knes-agent:run --args="--rom=$PWD/roms/ff.nes --wall-clock-cap-seconds=900 --cost-cap-usd=3.0". Result: weaponsBought=4 totalGoldSpent=25, char1 BOUGHT iter=8, char2 BOUGHT iter=20, char3 BOUGHT iter=33, char4 BOUGHT iter=45. boot_post_buy_savestate_dumped 401665 bytes. Phase 1+2 baseline ✅.
  • Smoke #1 savestate-load (HEAD 2371e6e and 3311a4b, both runs): boot_phase3_exit_result: ok=false msg=vision returned UNCLEAR 2x in a row after 12–14 steps. Phase 3 fails per documented Risk #1; the failure is reproducible and deterministic. Per-iter PNGs at /tmp/spec5-vision-exit-iter-NN-smX_Y-mfM.png show trajectory: shop counter → sm(14,22) south edge → drift west past warp → bail. Phase 4 not exercised.
  • Phase 3 reliable exit — out of scope, cont-6 (Approach B: PPU nametable read for town overlay).
  • Phase 4 grind battle empirically validated — blocked on Phase 3.
  • Savestate-dump regression: runOutfitBootPhase overwrites /tmp/spec5-post-buy.savestate regardless of bought=N. Quick fix proposed for cont-6 (gate on charsBought.size == expected).

Honest assessment

This PR is mergeable for the regression-baseline reason: 4/4 buy is empirically preserved and all unit tests are green. It does NOT deliver a working end-to-end Coneria pipeline — that requires cont-6 (Approach B). The architectural work (vision-only hot path, anchor selection, re-anchor, trace markers, per-iter diagnostics, recent-moves history) is the foundation cont-6 will build on. The cont-3 sm(14,22) deadlock is genuinely beyond pure vision-LLM reach without map-aware navigation.

V2.5.7 evidence: pathfinder returned found=false from (140,152) → (140,145)
in live runs, but offline against the same ROM-decoded map it finds a 7-step
path. Cause: FogOfWar was poisoned. Earlier walk attempts where worldX/Y
didn't change (because party transitioned into a town/castle map, or because
a modal was up) marked the destination tile blocked — but those tiles are
passable; only the engine prevented the move because of state, not terrain.

Fix: in WalkOverworldTo, only call fog.markBlocked when worldX/Y unchanged
AND no RAM transition signal (locationType/localX/localY/screenState all
identical to pre-step). Belt-and-suspenders with V2.5.2 abort.
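
The guard described in this fix can be sketched as a pure predicate over a before/after RAM snapshot. The `RamSnapshot` type and `shouldMarkBlocked` name are hypothetical; only the field names come from the commit message, and the real WalkOverworldTo may read them differently.

```kotlin
// Hypothetical snapshot of the RAM fields this commit message names.
data class RamSnapshot(
    val worldX: Int,
    val worldY: Int,
    val locationType: Int,
    val localX: Int,
    val localY: Int,
    val screenState: Int,
)

// True only when the step changed nothing at all: the party did not move on
// the overworld AND no transition signal fired. Only then is the target tile
// plausibly blocked by terrain rather than by engine state.
fun shouldMarkBlocked(before: RamSnapshot, after: RamSnapshot): Boolean {
    val worldUnchanged = before.worldX == after.worldX && before.worldY == after.worldY
    val noTransition = before.locationType == after.locationType &&
        before.localX == after.localX &&
        before.localY == after.localY &&
        before.screenState == after.screenState
    return worldUnchanged && noTransition
}
```

A step that enters a town leaves worldX/Y frozen but changes the local coordinates, so this predicate returns false and the destination tile stays unmarked.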

Also adds OverworldDumpTest (disabled-by-default diagnostic) which dumps the
ROM-decoded overworld grid + runs the pathfinder on canonical from→to pairs.
Used to verify the live failure was fog-poisoning, not pathfinder bug.
…ompts

Three connected fixes informed by V2.5.7/V2.5.9 evidence + offline ROM
inspection:

1) Garland is the BOSS of Chaos Shrine (Temple of Fiends) — an interior
   dungeon — NOT a scripted bridge encounter. Removed the false claim that
   walking north triggers Garland; replaced with the actual workflow
   (overworld → shrine entry → exitInterior loops → boss room).

2) Spawn after pressStartUntilOverworld is Overworld(146, 158), not
   Indoors. V2.5.7 RAM-override classifies this correctly; the prompt now
   tells the LLM not to expect Indoors at the start.

3) V2.5.4 hard-impassable rule for TOWN/CASTLE: tells LLM that to enter a
   town/castle the EXACT tile must be the walkOverworldTo target.

Also clarifies that battleFightAll handles BATTLE+POSTBATTLE in one call.
…p BFS

V2.6.1 — overworld classifier coverage gap.
ROM byte probe around Coneria castle/town entries identified eight tile
bytes missing from OverworldTileClassifier:
  - 0x2C, 0x2E, 0x3B, 0x3C, 0x3E, 0x3F  → CASTLE (castle border/decoration)
  - 0x3D                                 → TOWN (town center variant)
  - 0x4F                                 → TOWN (continuation of 0x49..0x4E)
Previously these were UNKNOWN (impassable, but pathfinder treated them as
walls rather than enter-on-arrival town/castle tiles). Now V2.5.4 hard-
impassable rule blocks pathfinder from routing across the castle perimeter
unless the destination is the castle/town tile itself.
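
As an illustration of the mapping above combined with the V2.5.4 rule, a minimal sketch follows. It covers only the bytes named in this message plus the 0x49..0x4E TOWN range it refers to; `TileKind`, `classifyEntryTile` and `canRouteThrough` are hypothetical names, and treating every other byte as UNKNOWN is a simplification that the real OverworldTileClassifier does not share.

```kotlin
enum class TileKind { TOWN, CASTLE, UNKNOWN }

// Covers only the bytes listed in this commit message; all other bytes are
// UNKNOWN here, which is a simplification for the sketch.
fun classifyEntryTile(byte: Int): TileKind = when (byte) {
    0x2C, 0x2E, 0x3B, 0x3C, 0x3E, 0x3F -> TileKind.CASTLE
    0x3D -> TileKind.TOWN
    in 0x49..0x4F -> TileKind.TOWN
    else -> TileKind.UNKNOWN
}

// V2.5.4-style rule: a TOWN/CASTLE tile may be stepped on only when it is
// the pathfinder's destination itself.
fun canRouteThrough(byte: Int, isDestination: Boolean): Boolean =
    when (classifyEntryTile(byte)) {
        TileKind.TOWN, TileKind.CASTLE -> isDestination
        TileKind.UNKNOWN -> false
    }
```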

V2.6.2 — interior pathfinder full-map BFS (originally V2.4.6-A in STATE).
V2.6.0 evidence: agent reached mapId=24 sub-map at (3, 2) and ExitInterior
looped 7 turns with `findPathToExit` returning BLOCKED — exits were
outside the 16×16 viewport window. New InteriorMap.readFullMapView(64×64)
plus MapSession.readFullMapView() drive the same InteriorPathfinder over
the whole map; ExitInterior and the findPathToExit @tool both switched.
Advisor's 16×16 ASCII rendering unchanged (keeps prompt size bounded).
…ection

V2.6.2 evidence: ExitInterior returned BLOCKED for mapId=8 with party at
(5, 28) despite the full 64×64 BFS. V2.4.4 with the old 16×16 viewport
had found a 17-step exit path from the same position.

Cause: isSouthEdgeExit required EVERY column tile from ly+1 to viewport.height
to be impassable. With 16×16 viewport this naturally bounded the probe to
~7 tiles south of party — a reasonable proxy for "edge of playable area".
With 64×64 full-map the probe extended into outside-playable rows that
default-fill to 0x00 (GRASS, passable), so the predicate always returned
false → no south-edge exit ever detected.

Fix: SOUTH_EDGE_PROBE_DEPTH = 8. Restores 16×16-equivalent semantics
while keeping V2.6.2 global BFS reachability.
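
A bounded probe of this kind might look like the sketch below. The `passable[row][column]` grid is a hypothetical representation of the decoded map, and the real isSouthEdgeExit may carry extra conditions; the point is only that the column scan stops after SOUTH_EDGE_PROBE_DEPTH rows instead of running to the bottom of the 64×64 map.

```kotlin
val SOUTH_EDGE_PROBE_DEPTH = 8

// A tile counts as a south-edge exit when every tile in the same column,
// within the probe depth below it, is impassable. Rows beyond the probe depth
// (for example default-filled passable rows outside the playable area) are
// never inspected.
fun isSouthEdgeExit(passable: List<BooleanArray>, x: Int, y: Int): Boolean {
    val lastRow = minOf(y + SOUTH_EDGE_PROBE_DEPTH, passable.size - 1)
    return (y + 1..lastRow).all { row -> !passable[row][x] }
}
```

Note that in this sketch a tile on the very last row trivially qualifies, because the probed range is empty.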
…int bug

Diagnostic test confirming why ExitInterior fails in mapId=24 with party
at (3, 2): the byte value at that position in our InteriorMap-decoded
data is 0x3c (padding/UNKNOWN/impassable), as are all 4-neighbours. BFS
starts on an impassable tile and immediately exhausts.

Either (a) RAM currentMapId=24 doesn't directly index our InteriorMap
loader, or (b) FF1 sub-map localX/localY are scrolled / offset. Out of
V2.x scope; documented for the next milestone.
…blocker

State doc supersedes STATE-2026-05-03.md (V2.4.5 era). Captures the
session's architectural progress: vision classifier, full-map BFS
(overworld + interior), hard-impassable TOWN/CASTLE, RAM override,
fog defensive marking, classifier coverage, prompt corrections,
south-edge probe depth.

Includes accumulated trace/SUMMARY artifacts for each V2.5.x/V2.6.x
live run.

Next blocker (out of V2.6 scope): RAM currentMapId=24 doesn't index
InteriorMapLoader.load(24) cleanly — party lands on 0x3c padding in
our decoded data. Likely scrolled-coords or transformed mapId; needs
RE work next milestone.
…oll + (8, 7)

V2.6.3 evidence: party reached Indoors(mapId=24, localX=3, localY=2). The
ROM-decoded InteriorMap[24] has byte 0x3c (padding) at (3, 2). All
neighbours also padding. BFS exhausted → ExitInterior BLOCKED.

Hypothesis verified by full 64×64 dump: mapId=24 is Coneria Castle
interior with throne room at y=2-5, vertical corridor at x=9-15 y=8-26,
entrance hall at y=27-31. Party RAM (3, 2) + (8, 7) = (11, 9) — the
throne-room corridor floor (byte 0x31, passable). Same offset applied to
RAM (3, 11) → (11, 18) — also corridor floor. Pattern matches a party
walking 9 tiles north through the main hall.

Conclusion: localX/localY in RAM (0x0029/0x002A) is the scroll offset
(top-left of the 16×15 NES viewport), not the party's tile. Party stays
at screen center, so its actual map tile = scroll + (8, 7).

ExitInterior + findPathToExit @tool now translate scroll → party tile
before BFS. Marks-blocked / stairs A-tap also operate on party tile.
Overworld code path unchanged (worldX/worldY are absolute party tiles).
…cation

V2.6.4 evidence: party rotated through mapId=8 from RAM (5, 28) to (4, 11)
— first interior progression in V2.x — but then stalled. findPathToExit
returned a 17-step path to (15, 32); ExitInterior couldn't execute it.
Each step pressed a direction but party didn't move. Possible causes:
FRAMES_PER_TILE too short, NPC blocking unobserved tile, input controller
stuck, or coord-arithmetic drift.

Per-step trace records (from, dir, after, mapId, pathLen) so the next
post-mortem can see exactly which steps moved party and which didn't,
without guessing from the executor's prose.

Saves a focused handoff for the next session. Key open question: is RAM
localX/localY (0x0029/0x002A) the party tile or the scroll offset?
V2.6.4 assumed scroll offset, V2.6.5 trace shows partial movement (76/583
moves) plus W-direction → Y-coord-change inconsistencies. Could be NPC
push (mapId=8 is Coneria Town outdoor with random NPCs) or wrong
hypothesis. Disabled mapId=8 dump test added; enable to verify.
… decoded)

Ran the dump test before session end. Result: NEITHER raw nor scroll-
offset interpretation consistently matches both observed party positions
in mapId=8.

  RAM (5, 28) raw → 0x31 floor passable ✓ but scroll (13, 35) → wall ✗
  RAM (4, 11) raw → 0x30 wall ✗ but scroll (12, 18) → 0x44 STAIRS ✓

Each interpretation works for exactly one position. Theory C now leading:
our InteriorMapLoader.load(8) decodes a map that may not be the same one
the game is playing. RAM 0x0048 (currentMapId) might not be sole indicator
— FF1 may have a sub-map ID byte we don't track. 13% step-success rate
(76/583 in V2.6.5 trace) is consistent with "BFS through partially-correct
data, party walks by coincidence".

Updated handoff doc with new task list: find sub-map ID, capture screenshot
at first Indoors frame, verify loader pointer table.
… doc

Task 0 of V3.0 vision-first interior pivot. Three FF1 interior screenshots
(Coneria Castle throne room, mapId=24) sent to claude-sonnet-4-6 with the
draft navigator prompt. Vision identified specific FF1 elements (throne,
king, pillars, stairs) and produced well-formed JSON. 2/3 directional
picks strategically correct; one mistook the throne dais for an exit door
— addressable by enriching the prompt for Task 1.

Verdict: GO. ~$0.015 per probe.

Includes:
- DECISION doc explaining pivot rationale (datacrystal verification, CPP
  reference architecture).
- Full V3.0 plan (Task 0–7, TDD-style with acceptance criteria).
- Feasibility test source (disabled by default).
…nthropic API)

Sonnet 4.6 picks one cardinal direction per step from an FF1 interior
screenshot. Bypasses Koog (mirroring AnthropicVisionPhaseClassifier) so
the outer agent loop stays unchanged.

Prompt design draws on Task 0 feasibility findings: explicitly excludes
thrones/treasure/dais/shop counters as exits, biases toward south
pillar-corridors typical of FF1 castle layouts, and supports an
optional entryDirection hint to steer the navigator toward the way the
party came in.

8 parser tests (cardinal letters, EXIT/STUCK, code-fenced JSON, malformed
envelopes) — all pass.
…erify)

Loop: screenshot → navigator.nextDirection → tap button → check RAM moved
or transitioned. Emits per-step trace via ToolCallLog so we can compute
step-success rate from a single run, the way V2.6.5 did for the decoder.

Termination: encounter, transition to overworld, vision STUCK/UNCLEAR,
or maxSteps. lastBlocked direction is fed back to the navigator so it
won't repeat a physically failed move.
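
The loop shape can be sketched as follows, under stated assumptions: `nextDirection` stands in for the vision navigator and `tap` for the emulator button press plus RAM check, and both are hypothetical callbacks rather than the real WalkInteriorVision API. Encounter and overworld-transition termination are left out to keep the sketch short.

```kotlin
enum class Direction { UP, DOWN, LEFT, RIGHT }

// Hypothetical navigator reply: a direction to tap, or a terminal verdict.
sealed interface NavReply {
    data class Move(val dir: Direction) : NavReply
    object Exit : NavReply
    object Stuck : NavReply
    object Unclear : NavReply
}

data class StepTrace(val step: Int, val dir: Direction, val moved: Boolean)

// `tap` returns true when RAM shows the party moved after the button press.
fun walkLoop(
    maxSteps: Int,
    nextDirection: (lastBlocked: Direction?) -> NavReply,
    tap: (Direction) -> Boolean,
): List<StepTrace> {
    val trace = mutableListOf<StepTrace>()
    var lastBlocked: Direction? = null
    for (step in 0 until maxSteps) {
        val reply = nextDirection(lastBlocked)
        if (reply !is NavReply.Move) break // EXIT, STUCK or UNCLEAR ends the loop
        val moved = tap(reply.dir)
        trace += StepTrace(step, reply.dir, moved)
        // Feed a physically failed move back to the navigator on the next call.
        lastBlocked = if (moved) null else reply.dir
    }
    return trace
}
```

Step-success rate for a run is then `trace.count { it.moved }` divided by `trace.size`.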

Skipping a dedicated unit test for slice 1 — EmulatorToolset is a
concrete class, mocking would require new test infrastructure. Live
run in Task 7 will validate end-to-end.
…Interior on towns

Tool list now leads with walkInteriorVision in the Indoors block.
exitInterior + findPathToExit kept callable but marked DEPRECATED in
their @LLMDescription so the executor's prompt construction (Koog
auto-includes tool descriptions) carries the steer.

Navigator parameter is nullable so existing tests that build SkillRegistry
without ANTHROPIC_API_KEY keep compiling. When unconfigured, the tool
returns a clear failure message rather than crashing.

Indoors block in both prompts now leads with walkInteriorVision and frames
exitInterior/findPathToExit as deprecated fallbacks. Advisor explicitly
told NOT to propose target localX/localY in interiors (decoder unreliable);
overworld waypoints unchanged.
…orAgent

Main constructs the navigator from ANTHROPIC_API_KEY (already required)
and threads it through ExecutorAgent → SkillRegistry → WalkInteriorVision.
Optional in the type system so unit tests that build a partial
ExecutorAgent without a key still compile.

Slice 1 implementation now complete; one live run (Task 7) verifies the
≥50% step success criterion.
…ase-classifier bug exposed

OutOfBudget after 12m 42s, ~26 walkInteriorVision invocations. Skill +
navigator wiring confirmed correct: 30 navigator responses parsed without
UNCLEAR; skill loop terminated cleanly each time; mechanical taps moved
party 3/4 times when fired.

Run never reached a real interior because of a V2.5 phase-classifier bug:
when locType=0 && currentMapId=0 (overworld) but localX/Y carry stale
values from a prior abort, vision phase classifier (Haiku) misclassifies
the frame as INDOORS. Vision navigator (Sonnet) correctly disagrees with
EXIT — but classifier wins upstream and agent oscillates.

DoD criteria 4 + 5 NOT MEASURED (no real interior reached). Open V3.1
to extend RAM hard-override (locType==0 && mapId==0 → Overworld)
before re-running V3.0 evidence.
…verworld)

V3.0 slice 1 live run revealed a phase-classifier oscillation: after
walkOverworldTo's interior-abort signal, FF1 leaves RAM localX/Y non-zero
on the overworld (datacrystal RAM map: 0x0029/0x002A is 'Non-world map
position' and FF1 doesn't zero it on overworld entry). The strict
V2.5.7 override (locType=0 && lx=0 && ly=0) missed this case so the
Haiku vision phase classifier was consulted, mistook the frame for an
indoor space and the agent looped between false-Indoors and Overworld
until OutOfBudget — never reaching a real interior.

currentMapId=0 means 'no interior loaded'. Combined with locType=0 and
live world coords, party is unambiguously on the world map. Adding this
secondary override prevents the oscillation without weakening the
existing strict path.
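
The two override paths can be written as one small decision function. This is a sketch only: `Phase`, `ramPhaseOverride` and the parameter names are illustrative, and the real classifier presumably also checks that world coordinates are live before trusting the secondary path.

```kotlin
enum class Phase { OVERWORLD, NEEDS_VISION }

// Returns OVERWORLD when RAM alone settles the question, otherwise defers to
// the vision phase classifier.
fun ramPhaseOverride(locType: Int, currentMapId: Int, localX: Int, localY: Int): Phase = when {
    // Strict V2.5.7 path: no interior flag and zeroed local coords.
    locType == 0 && localX == 0 && localY == 0 -> Phase.OVERWORLD
    // Secondary V3.1 path: stale local coords are ignored when no interior is loaded.
    locType == 0 && currentMapId == 0 -> Phase.OVERWORLD
    else -> Phase.NEEDS_VISION
}
```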

Sets up V3.0 re-run that can actually measure interior step success.
…avigator over-caution exposed

V3.1 RAM hard-override unblocked phase classification: agent reached
Indoors(mapId=8, localX=5, localY=28) for the first time in the V2.4–V3.0
lineage without oscillation.

But navigator returned STUCK on 15 of 23 (65%) direction queries inside
Coneria Town — only 4 mechanical taps fired, 1 moved (25% step success).
DoD criteria not met (need ≥50% step success on n>4; need at least one transitioned=true).

Diagnosis: navigator prompt was tuned in Task 0 for castle interiors
(throne dais ≠ exit) but Coneria Town is an OUTDOOR town map with paths
between shops + NPCs. The same prompt over-classifies winding paths as
STUCK when they are walkable.

Open V3.2: town-aware prompt tuning + STUCK threshold (don't honor on
step=0; default to a cardinal until at least 2 failed taps).

Combined session spend: ~$0.95.
…fallback

V3.1 verification revealed navigator returned STUCK on 65% of queries in
Coneria Town (mapId=8). Two fixes:

1. Navigator prompt: explicitly distinguish TOWN (open path-network with
   shops/NPCs walkable AROUND), CASTLE (corridor toward south), DUNGEON
   (stairs/warps). Emphasises 'STUCK ONLY if all 4 cardinals impassable —
   if even one direction shows walkable terrain, pick that'.

2. WalkInteriorVision: do not honor the FIRST STUCK return when there
   is no movement evidence yet. Default to a perpendicular fallback
   (SOUTH unless lastBlocked, then rotate). Honor STUCK only after 2
   consecutive returns. ToolCallLog records when fallback fires.

These two levers are aimed at the V3.1 measurement gap: 4 mechanical
taps was too few. With STUCK-on-step-0 disarmed, we expect ~3-4× more
taps per skill invocation, giving statistically meaningful step-success
data on the next run.
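
A minimal sketch of the second lever follows. `StuckGate`, `fallbackDirection` and the rotation order SOUTH → WEST → NORTH → EAST are assumptions; the message only says "SOUTH unless lastBlocked, then rotate", which is read here as "SOUTH unless SOUTH was the last blocked direction".

```kotlin
enum class Cardinal { NORTH, EAST, SOUTH, WEST }

// Fallback used while a STUCK verdict is not yet honored. The rotation order
// is an assumption made for this sketch.
fun fallbackDirection(lastBlocked: Cardinal?): Cardinal {
    val order = listOf(Cardinal.SOUTH, Cardinal.WEST, Cardinal.NORTH, Cardinal.EAST)
    return order.first { it != lastBlocked }
}

// Tracks consecutive STUCK verdicts; STUCK is honored only on the second one.
class StuckGate {
    private var consecutiveStuck = 0

    // Returns null when the walk should stop, else the direction to tap.
    fun onStuck(lastBlocked: Cardinal?): Cardinal? {
        consecutiveStuck++
        return if (consecutiveStuck >= 2) null else fallbackDirection(lastBlocked)
    }

    // Any real direction from the navigator resets the streak.
    fun onDirection() {
        consecutiveStuck = 0
    }
}
```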
…rmed for FF1 towns

20 invocations, 70 vision direction calls, 51 mechanical taps. Step
success **7.8%** — WORSE than V2.6.5 decoder baseline (13%) and below
slice-1 DoD (50%). Trade was: V3.2 prompt eliminated 65% STUCK rate
(V3.1) but introduced 27% false-EXIT rate and oscillation between two
adjacent tiles in Coneria Town for the entire skill phase.

Honest diagnosis: single-frame zero-context vision is the wrong tool
for tile-precise FF1 navigation. Failure modes documented in SUMMARY:
no movement memory across calls, pixel-tile collision mismatch,
two-cardinal traps, and a single prompt knob that swings between
cautious-STUCK and eager-wrong loops.

V3.0 hypothesis (vision-first interior) DISCONFIRMED as architected.

Cumulative session spend: ~$1.50 across feasibility + 3 live runs.

Recommended next: ship slice 1 + V3.1 as PR (clean stop), then iterate
with hybrid C (decoder + advisor screenshots) or multimodal executor.
No more vision-prompt epicycles.
…ndoors stuck)

After V3.2 disconfirmed single-frame vision-first interior nav, return
to decoder as primary baseline (13% step success on Coneria Town) and
use vision selectively via the advisor.

Four small changes:

1. ScreenshotPolicy: attach screenshot to advisor on every Indoors call,
   not just phase change. Lets the advisor look at the actual frame when
   the decoder is making no progress.

2. AgentSession: lower advisor consult threshold for Indoors phase from
   idleTurns>=20 to >=5. The decoder either makes early progress or gets
   pinned — early visual hint is cheaper than 20 wasted executor turns.

3. Executor prompt: exitInterior is PRIMARY again. walkInteriorVision is
   ESCALATION ('only after exitInterior fails twice AND advisor explicitly
   recommends'). V3.0 evidence cited (~8% on towns vs 13% decoder).

4. Advisor prompt: when called with 'stuck in interior' reason, INSPECT
   the screenshot and emit a single cardinal hint as a plan step. Only
   escalate to walkInteriorVision after two such hints fail.

~50 LOC total. Tests stay green; live run will measure whether the
hybrid lifts step success above the 13% decoder floor.
…ow skill layer

Decoder (primary) + vision (advisor-driven escalation) tested on Coneria
Town (mapId=8). Combined step success 4.9% on n=793 — below V2.6.5
baseline (13%) and below V3.2 vision-only (7.8%).

Two surprises in the data:
- Decoder degraded vs V2.6.5 baseline (4.6% on n=764 in V4 vs 13% on
  n=583 in V2.6.5). V2.6.5 may have been a favourable-condition fluke.
- Vision-with-advisor outperformed pure vision (13.8% vs 7.8%) but on
  small N=29 — encouraging but not conclusive.

Two negative results in a row (V3.0/V3.2 vision-first AND V4 hybrid)
with N now decisive (793 steps). The hypothesis that better skill
orchestration unlocks town traversal is disconfirmed. Bottleneck is
likely below the skill layer:
- NPC blocks marked permanently in fog (no clear-on-revisit)
- Animation/walk-state timing during the 48-frame hold
- Diagonal corners requiring multi-tap navigation
- Possible RAM coord interpretation still wrong (V2.6.4 hypothesis)

Pushing back to PR #99 with two evidence runs (V3.2 + V4) showing the
search/architecture lever does not produce >15% step success on Coneria
Town. Future work moves to V5: movement primitive audit before any
more strategy iteration.

Cumulative session spend: ~$2.00 (4 live runs).

Per-frame RAM capture during 200-frame DOWN/UP holds inside Coneria
peninsula. Key findings:

1. **localY is party tile, NOT scroll offset.** DOWN hold in
   mapId=24 (Coneria Castle) produced clean monotonic increments:
   locY 0→1→2→...→11 in 200 frames, ~16 frames per tile. V2.6.4
   scroll-offset hypothesis is now decisively disconfirmed by
   ground-truth telemetry.

2. **Movement primitive is clean.** When the decoded map is correct
   (mapId=24), the underlying NES input + collision works fine.
   Our 48-frame-per-tile skill setting is 3× more than needed for
   castles (~16 frames suffices).

3. **Sub-map transitions = mid-frame mapId flips + 4-frame
   localY-spaghetti.** Walking off a map edge changes mapId and
   re-centres the camera. RAM during the transition is not stable.

4. **The real bottleneck is mapId=8 (Coneria Town) decoder.**
   Re-confirms V2.6.5 Theory C for the third time, now with positive
   counter-evidence: castles work, towns don't, primitive is the
   same. The bug is in InteriorMapLoader.load(8) decoding a
   different ROM section than the game plays.

Implication: V3/V4 search/architecture lever is fully exhausted. The
only remaining productive direction is fixing the decoder for mapId=8
— exactly what V2.6.6 was about to investigate before the V3 pivot.

Three candidate next steps documented in the notes; recommend B
(visual diff between decoded mapId=8 ASCII and a rendered Coneria Town
screenshot) as cheapest research move.

Test disabled by default; CSV artefacts retained as evidence.
…led, navigation TODO)

Test scaffold for the V5.1 visual diff between live mapId=8 frame and
the offline decoder's ASCII glyph dump. Goal: identify the fingerprint
of the InteriorMapLoader.load(8) bug (off-by-one? wrong bank? wrong
table?).

Disabled by default — the navigation logic to reliably reach mapId=8
from spawn is not yet solved. The agent enters mapId=24 (Coneria
Castle) easily but mapId=8 (Coneria Town) requires more deliberate
overworld pathing (V2.6.5 evidence: walk to (145,152) on overworld,
step S to transition).

Logic for the diff itself is complete (live screenshot + ASCII viewport
+ full 64x64 dump centred on party). Once the navigation step lands,
flip enabled=false to enabled=canRun and run the test to produce the
diff artefacts.

Marked as a research TODO. The downstream V5.2 fix will use this diff
to decide between (A) hex-audit pointer table, (B) wrong-bank-fix, or
(C) replace decoder with screen-derived map.
… handling

CI was failing on two pre-existing knes-debug tests that predate this PR:
- 'canExecute checks screenState' expected 0x63 → false, but V2.4.3 made
  the skill dismiss PostBattle so 0x63 → true is the correct behaviour.
- 'BattleFightAll executes correctly' mock returned 0x63 forever after
  state call 4, but the post-battle dismissal loop now requires the
  state to eventually clear (0x00) — otherwise the action correctly
  reports failure.

Updates both expectations to reflect V2.4.3 semantics. Pre-existing
test/code mismatch from May 2 — not introduced by V3-V5 work.

Full suite green: ./gradlew test BUILD SUCCESSFUL.
FF1 agent V2.4→V3.2 — interior decoder + vision-first nav (negative result, infra ships)
… empirical OW probe

Five-component scaffold to unblock the V5.2 visual-diff workstream and the
broader town/castle navigation problem. Each piece is independently usable.

1. EmulatorSession.saveState/loadState — wraps vNES NES.stateSave/stateLoad
   into an in-memory ByteArray API. Round-trip RAM-perfect (verified by
   SaveStateRoundTripTest). Enables fixture-based test starts that skip 10s
   of boot per run.

2. OverworldMemory — persistent (~/.knes/ff1-ow-memory.json) per-tile
   observation store: ENTRY/DECOR/UNREACHABLE + enteredMapId + confirmCount.
   Accumulates across sessions so empirical walkability data survives ROM
   resets, like a player's mental map.

3. ExecutorAgent.goalOverride + AdvisorAgent.goalOverride — string-replace
   the canonical "Goal: AtGarlandBattle" paragraph in each agent's system
   prompt without forking. Tests inject Coneria-Town goal for fixture
   capture without touching production prompts. GoalOverrideTest verifies
   the swap is clean (no leftover Garland references).

4. AgentSession.onTurnEnd callback — non-breaking optional hook firing
   after each executor turn with current phase + RAM. Returning a non-null
   Outcome short-circuits the agent loop. Used by fixture-builder tests
   to stop the moment a target state is reached.

5. ConeriaTownEmpiricalDiscoveryTest — H2 raw-step DFS exploration. From
   spawn, taps cardinals one tile at a time, observes RAM (worldX/Y change
   = walkable, locType change = entry). Bypasses BFS and OverworldTileClassifier
   entirely; engine's own walkability check is source of truth.

Empirical finding from H2 run: 81 walkable OW tiles around spawn covering
x=137-152, y=150-167 — no tile triggered locationType != 0 anywhere in
range. Strongly suggests one of:
  (a) OverworldTileClassifier byte-id ranges for TOWN are wrong on this ROM
  (b) currentMapId/locationType RAM addresses (V2.4 heuristic) read from
      non-canonical locations and miss real interior transitions
  (c) Coneria Town entry is at world coords beyond explored range (y > 167)

Next session focus: verify FF1 RAM mapping for $000D (locationType per
datacrystal) and currentMapId. Once RAM is trustworthy, H2 should
auto-discover the entry tile within minutes of re-running on broader radius.

Coneria8VisualDiffTest now loads the post-boot fixture (saves ~10s) but
still fails on the navigation step pending root-cause fix above.

Files:
  - knes-emulator-session: saveState/loadState API
  - knes-agent runtime: AgentSession.onTurnEnd
  - knes-agent executor/advisor: goalOverride + GOAL_PARAGRAPH constants
  - knes-agent perception: OverworldMemory persistent store
  - knes-agent tests: SaveStateRoundTrip, PostBootFixtureBuilder,
    ConeriaTownFixtureBuilder (agent-driven), ConeriaTownEmpiricalDiscovery
    (raw-step DFS), GoalOverrideTest
  - fixtures: ff1-post-boot.{savestate,json,png}

Refs: V3.0 status memory ("13% step success on towns"), Entroper FF1
disasm bank_0F.asm:1633 (tileset_prop teleport mechanism).
… finds TOWN mode

MapIdDiscoveryTest dumps zero page on overworld vs after entering Coneria Town
(via raw N×6 / W×1 / UP from spawn 146,158). Diff reveals:

- $0048 IS canonical map-id byte (was V2.4 heuristic — confirmed correct).
  Coneria Town = 8.
- FF1 has THREE location modes, not two:
  * Overworld:  locType=0x00, world=valid, local=(0,0)
  * Town:       locType=0x00, world=frozen, local=non-zero, $48=town-id
  * Castle/Dgn: locType=0xD1, world=frozen, local=non-zero, $48=interior-id
- $0029/$002A are canonical local tile coords (+1 per tap).
- $0068/$0069 = local + 7 (scroll-offset display coords) — explains V2.6.4
  scroll-offset hypothesis.
- $0049/$004A are paired with $48 (sub-floor / metadata).
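
The three modes in the list above can be turned into a small classifier. This is a sketch under the assumption that the table is complete: `LocationMode`, `locationMode` and `displayCoords` are hypothetical names, and it deliberately ignores the stale-local-coords case on the overworld that the earlier V3.1 override handles via $0048.

```kotlin
enum class LocationMode { OVERWORLD, TOWN, CASTLE_OR_DUNGEON, UNKNOWN }

// Inputs are the raw location-type byte and the local coords at $0029/$002A.
fun locationMode(locType: Int, localX: Int, localY: Int): LocationMode = when {
    locType == 0xD1 -> LocationMode.CASTLE_OR_DUNGEON
    locType == 0x00 && localX == 0 && localY == 0 -> LocationMode.OVERWORLD
    locType == 0x00 -> LocationMode.TOWN
    else -> LocationMode.UNKNOWN
}

// $0068/$0069 as described above: local tile coords plus 7.
fun displayCoords(localX: Int, localY: Int): Pair<Int, Int> =
    Pair(localX + 7, localY + 7)
```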

Interior savestate captured for cross-reference. Report and 20 candidate bytes
documented at docs/superpowers/notes/2026-05-04-mapid-discovery/report.md.

Smoking gun for V2.6.x stuck-in-Coneria evidence: vision said "castle courtyard
not town huts" because InteriorMapLoader.POINTER_TABLE_OFFSET=0x10010 is the
CASTLE/DUNGEON pointer table — towns need a different table.

Run #10 (2026-05-09): post_enter_detect at turn 33 said open=true
kind=weapon, but the new batch-pre probe at turn 35 saw open=false
(Gemini classifyShopMenu stochastic flip on identical screen). The
condition `!initialProbe.open || initialProbe.kind != "weapon"`
fired reEngageKeeper, which walked dx=-1 dy=-2 — cardinals which,
with the shop menu OPEN, navigate the menu cursor instead of the
party. Subsequent invokeMany Up×2 + A landed on a wrong row, and
all 4 pairs failed WrongClass.

Fix: when the post-enter run{} block already set menuAlreadyOpen=true,
trust it and skip the redundant batch probe. Only reengage when the
post-enter said the shop UI was closed.

Run #11 (2026-05-09): char1 successfully bought via invokeMany, but
char2/3/4 all WrongClass. Trace + analysis: between-pairs B × 3 was
unwinding past BUY/SELL/EXIT and closing the shop dialog entirely.

FF1 NES weapon shop B-tap semantics:
  - From item list:           B → BUY/SELL/EXIT, B again → close shop
  - From "another?" prompt:   B → item list,    B → BUY/SELL/EXIT, B → close

Single B from either dismiss-loop end state lands in or above
BUY/SELL/EXIT, where Up × 2 + A reliably re-selects BUY for the
next pair. Bumped 3 → 1.
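
The B-tap table above is small enough to model directly, which makes the 3 → 1 change easy to check. The `ShopScreen` enum and `pressB` helpers are hypothetical, and treating B on an already closed shop as a no-op is an assumption of this model, not an observed game behaviour.

```kotlin
enum class ShopScreen { ANOTHER_PROMPT, ITEM_LIST, MAIN_MENU, CLOSED }

// One B tap, following the table in this commit message.
// MAIN_MENU is the BUY/SELL/EXIT screen.
fun pressB(s: ShopScreen): ShopScreen = when (s) {
    ShopScreen.ANOTHER_PROMPT -> ShopScreen.ITEM_LIST
    ShopScreen.ITEM_LIST -> ShopScreen.MAIN_MENU
    ShopScreen.MAIN_MENU -> ShopScreen.CLOSED
    ShopScreen.CLOSED -> ShopScreen.CLOSED // assumption: no-op in this model
}

// Applies `times` consecutive B taps.
fun pressB(s: ShopScreen, times: Int): ShopScreen =
    (1..times).fold(s) { acc, _ -> pressB(acc) }
```

In this model three taps close the shop from either dismiss-loop end state, while a single tap never does.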
Run #12 (2026-05-09): post_enter saw open=false (Gemini stochastic
misread on what was actually an open shop dialog). run{} block fired
keeper_approach walk; cardinals navigated the menu cursor (not party,
because dialog was actually open). Then batch-pre reengage walked
again, further scrambling the cursor. invokeMany then ran with
menuAlreadyOpen=false but state was corrupt → 4/4 WrongClass.

Fix: drop the "trust post-enter" shortcut and instead probe shop UI
state ONCE after the run{} block completes (whatever it did — walk
or skip). Trust THAT probe. If open → skip reengage. If closed →
one reengage attempt, probe again, accept either result. Hard cap
at 1 reengage walk avoids the cumulative cursor drift.
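
The probe-once, reengage-at-most-once policy can be sketched as below. `probeOpen` stands in for the vision probe of the shop UI and `reengage` for the keeper-approach walk; both are hypothetical callbacks, not the actual runOutfitBootPhase code.

```kotlin
// Returns the final "menu open" belief after at most one reengage walk.
fun settleShopState(probeOpen: () -> Boolean, reengage: () -> Unit): Boolean {
    if (probeOpen()) return true // open: skip reengage entirely
    reengage()                   // closed: exactly one reengage attempt
    return probeOpen()           // accept whatever the second probe says
}
```

Because the function never loops, a stochastic classifier can cause at most one extra walk per shop entry.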
Runs #10-#13 (2026-05-09) showed the deterministic "guess by tap
count" approach is brittle — Gemini's classifyShopMenu is stochastic
(post-enter says open, fresh-probe says closed on identical screen),
which led to double-walks corrupting menu cursor; B-spam counts
between pairs were unreliable across FF1 sub-menus; and char1 typically
succeeded but char2/3/4 hit WrongClass / DismissCapExhausted because
state assumptions drifted.

New approach: classify the FF1 shop sub-screen (MAIN_MENU /
ITEM_LIST / FOR_WHOM / BUY_CONFIRM / ANOTHER / WELCOME / CLOSED /
UNKNOWN) via vision after each major tap, and dispatch the next
action from observed UI rather than guessed sequence.

Changes:
- HaikuConsult: new ShopMenuPhase enum + classifyShopMenuPhase()
  abstract method. Implemented in GeminiVisionConsult with a focused
  prompt distinguishing the eight phases via JSON {"phase":"..."}.
  AnthropicHaikuConsult stubs to UNKNOWN (delegated to Gemini).
- BuyAtShop: new invokeManyStateful method.
  * Per pair: drive into MAIN_MENU (B-spam to back out, A to advance
    Welcome) → Up×2+A → ITEM_LIST → Down×itemSlot+A → FOR_WHOM →
    Down×(charSlot-1)+A → BUY_CONFIRM → A on YES → dismiss watch.
  * On CLOSED mid-batch: aborts remaining pairs, leaves caller to
    re-engage keeper between rounds.
- runOutfitBootPhase: switched from invokeMany to invokeManyStateful.

Cost: ~5-8 vision calls per pair (~$0.025-0.04). Worthwhile when
the previous deterministic version was buying only 1/4 chars per run.
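
The per-pair dispatch can be sketched as a function from the observed phase to the next taps. This is an illustration under assumptions: `nextTaps` is a hypothetical name, `itemSlot` is taken as 0-based and `charSlot` as 1-based (matching Down×itemSlot and Down×(charSlot-1)), and the ANOTHER and UNKNOWN branches are guesses at the back-out behaviour.

```kotlin
enum class ShopMenuPhase { MAIN_MENU, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER, WELCOME, CLOSED, UNKNOWN }

// Hypothetical tap plan for one (itemSlot, charSlot) pair, keyed on the observed phase.
fun nextTaps(phase: ShopMenuPhase, itemSlot: Int, charSlot: Int): List<String> = when (phase) {
    ShopMenuPhase.WELCOME -> listOf("A")                       // advance the Welcome dialog
    ShopMenuPhase.MAIN_MENU -> listOf("Up", "Up", "A")         // re-select BUY
    ShopMenuPhase.ITEM_LIST -> List(itemSlot) { "Down" } + "A"
    ShopMenuPhase.FOR_WHOM -> List(charSlot - 1) { "Down" } + "A"
    ShopMenuPhase.BUY_CONFIRM -> listOf("A")                   // A on YES
    ShopMenuPhase.ANOTHER -> listOf("B")                       // assumption: back out toward MAIN_MENU
    ShopMenuPhase.CLOSED -> emptyList()                        // caller aborts the remaining pairs
    ShopMenuPhase.UNKNOWN -> listOf("B")                       // assumption: same back-out as B-spam
}
```

The point of the design is that each tap sequence is chosen from what the classifier reports, so a drifted assumption costs one extra probe rather than a whole batch.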
…at post-enter

Run #15 (2026-05-09) trace: post_enter_detect said open=false (kind-
classifier stochastic flip), keeper_approach walked cardinals which —
because the menu was actually open — navigated the menu cursor instead
of the party. State machine first probe returned UNKNOWN, the UNKNOWN
handler tapped B which closed the shop entirely, and all 4 pairs got
ShopClosed.

Two-part fix:
- driveToMainMenu UNKNOWN handler: step+retry for first 2 attempts
  (give renderer a chance to settle, classifier a chance to recover).
  Tap B only after 3 consecutive UNKNOWN. Avoids accidentally closing
  the shop on a transient misclassification.
- runOutfitBootPhase post-enter: in addition to classifyShopMenu (kind
  detection — stochastic), also call classifyShopMenuPhase. Treat any
  non-CLOSED, non-UNKNOWN phase as "shop UI on screen" and skip the
  keeper_approach cardinals walk. Two independent classifiers must
  both flip simultaneously to mistakenly trigger the walk.

Run #16 (2026-05-09) screenshot inspection: post-enter frame shows
WEAPON banner + empty Welcome dialog + no BUY/SELL/EXIT yet — a
genuine transition frame between Welcome dismiss and item-list draw.
Phase classifier honestly returned CLOSED for that ambiguous state.
The previous UNKNOWN handler tapped B (closed shop), then state
machine bailed ShopClosed for all 4 pairs.

UNKNOWN handler upgrade:
  1st UNKNOWN → tap A (advance any stuck dialog page; if at MAIN_MENU
                misclassified, A → ITEM_LIST which the next probe sees)
  2nd UNKNOWN → step+retry only (let renderer settle)
  3rd+ UNKNOWN → tap B (last resort, may close shop)
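
The escalation above can be written as a small lookup. A sketch, assuming the caller tracks how many UNKNOWN probes have occurred in a row; `UnknownAction` and `onUnknown` are hypothetical names.

```kotlin
// Hypothetical sketch of the UNKNOWN escalation; the count starts at 1.
enum class UnknownAction { TAP_A, STEP_AND_RETRY, TAP_B }

fun onUnknown(consecutiveUnknown: Int): UnknownAction {
    require(consecutiveUnknown >= 1) { "count starts at 1" }
    return when (consecutiveUnknown) {
        1 -> UnknownAction.TAP_A            // advance any stuck dialog page
        2 -> UnknownAction.STEP_AND_RETRY   // let the renderer settle
        else -> UnknownAction.TAP_B         // last resort, may close the shop
    }
}
```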

Also: runOutfitBootPhase now trusts menuAlreadyOpen flag from the
run{} block (which already uses dual-classifier kind+phase check).
The redundant freshProbe was triggering reengage walks even when
menuAlreadyOpen=true via stochastic kind-classifier flip.
…ine)

User feedback after runs #15-#18: the deterministic state-machine /
phase-classifier approach is over-engineered and brittle. Cursor on
EXIT (run #18 evidence) was misclassified as UNKNOWN, the UNKNOWN
handler tapped A which confirmed Exit and closed the shop. Pivot to
vision-advisor that reads the actual screen state per step and decides
the next tap.

Changes:
- profiles/ff1.json: char1_class .. char4_class registers at
  0x6100/0x6140/0x6180/0x61C0 (FF1 disasm). Values 0..5 map to
  Fighter / Thief / BlackBelt / RedMage / WhiteMage / BlackMage.

- HaikuConsult: new ShopPurchaseAdvice + adviseShopPurchase() interface
  method. SYSTEM_SHOP_PURCHASE companion prompt teaches the advisor:
  the eight FF1 shop sub-screens (WELCOME, MAIN_MENU cursor on
  Buy/Sell/Exit, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER, CLOSED),
  per-class equip rules, the typical Coneria weapon shop inventory,
  and what action drives toward the goal at each sub-screen.

- GeminiVisionConsult: implementation using Gemini 2.5 Pro thinking
  mode (1500 tokens) with maxOutputTokens=4000. Anthropic stub.

- BuyAtShop.invokeWithAdvisor: replaces the deterministic state machine.
  Loop: screenshot → adviseShopPurchase(context) → apply tap → repeat.
  Per-iter context lists each char's class + served status + remaining
  gold so the advisor can pick class-compatible items in any order.
  Per-char "bought" tracked via weapon-slot sum delta (any non-zero
  weapon ID in any of the 4 slots flips bought=true).

- runOutfitBootPhase: invokeWithAdvisor is the primary path. Legacy
  invokeManyStateful kept as fallback if advisor served zero chars.
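
For reference, the class registers listed above sit 0x40 bytes apart, so the address for a character slot can be computed rather than hard-coded. A sketch, assuming a RAM accessor is available as a lambda; `Ff1Class`, `classRegister`, `readClass` and `readByte` are hypothetical names.

```kotlin
// Class IDs 0..5 in the order given in the commit message.
enum class Ff1Class { Fighter, Thief, BlackBelt, RedMage, WhiteMage, BlackMage }

// Address of charN_class for charSlot in 1..4.
// The 0x40 stride is inferred from the four listed addresses.
fun classRegister(charSlot: Int): Int {
    require(charSlot in 1..4) { "charSlot must be 1..4" }
    return 0x6100 + (charSlot - 1) * 0x40
}

// `readByte` is a hypothetical RAM accessor; IDs outside 0..5 yield null.
fun readClass(charSlot: Int, readByte: (Int) -> Int): Ff1Class? =
    Ff1Class.values().getOrNull(readByte(classRegister(charSlot)))
```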

Cost: ~30 advisor calls × $0.01-0.02 = $0.3-0.6 per run. More expensive
than tap-counts but expected to actually serve all 4 chars per run
instead of 1/4.
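
The advisor loop and the weapon-slot "bought" check can be sketched as follows. This is a simplified illustration, not the BuyAtShop code: `boughtFlags`, `runAdvisorLoop` and the lambda parameters are hypothetical, and the advisor is reduced to returning a button name or "Done".

```kotlin
// Hypothetical: each inner list holds one character's four weapon-slot IDs (0 = empty).
fun boughtFlags(weaponSlots: List<List<Int>>): List<Boolean> =
    weaponSlots.map { slots -> slots.any { it != 0 } }

// Hypothetical loop skeleton: `advise` returns a button name or "Done",
// `tap` applies it, `readSlots` re-reads the weapon slots from RAM.
fun runAdvisorLoop(
    maxCalls: Int,
    advise: (served: List<Boolean>) -> String,
    tap: (String) -> Unit,
    readSlots: () -> List<List<Int>>
): List<Boolean> {
    var served = boughtFlags(readSlots())
    repeat(maxCalls) {
        if (served.all { it }) return served   // everyone served: stop early
        val action = advise(served)
        if (action == "Done") return served
        tap(action)
        served = boughtFlags(readSlots())      // re-derive bought flags after every tap
    }
    return served                              // iteration cap reached
}
```

Deriving the served flags from RAM after each tap, rather than from the advisor's own report, keeps the loop's notion of progress independent of any misread screen.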
Run #19 (2026-05-09) trace: nav advisor reached BUY/SELL/EXIT at
iter 79 ("WEAPON shop menu with Buy/Sell/Exit is visible on screen"),
but the kind-classifier on the SAME screen returned null. The Done
verify rejected, advisor loop hit max-iters cap, $1.40 of advisor
cost wasted on a successful shop entry that the runtime threw away.

Phase classifier and kind classifier are independent Gemini calls on
the same screenshot. Either may stochastically miss. Done verify now
accepts the entry if EITHER says shop UI is on screen — kind=weapon
OR phase ∈ {MAIN_MENU, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER,
WELCOME}. The wrong-shop kind=armor rejection still applies (that
branch only fires when kind is non-weapon non-null AND open=true).
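
The relaxed Done verify amounts to a boolean predicate over the two classifier results. A sketch under assumptions: `acceptShopEntry` is a hypothetical name, and the kind classifier is modelled as a nullable string plus an open flag.

```kotlin
enum class ShopMenuPhase { MAIN_MENU, ITEM_LIST, FOR_WHOM, BUY_CONFIRM, ANOTHER, WELCOME, CLOSED, UNKNOWN }

// Phases that count as "shop UI is on screen".
val SHOP_UI_PHASES = setOf(
    ShopMenuPhase.MAIN_MENU, ShopMenuPhase.ITEM_LIST, ShopMenuPhase.FOR_WHOM,
    ShopMenuPhase.BUY_CONFIRM, ShopMenuPhase.ANOTHER, ShopMenuPhase.WELCOME
)

// `kind` is the kind-classifier result ("weapon", "armor", ... or null on a miss);
// `open` is that classifier's open flag.
fun acceptShopEntry(kind: String?, open: Boolean, phase: ShopMenuPhase): Boolean {
    // Wrong-shop rejection: fires only for a non-null, non-weapon kind with open=true.
    if (kind != null && kind != "weapon" && open) return false
    return kind == "weapon" || phase in SHOP_UI_PHASES
}
```

With this predicate the run #19 case (kind null, phase MAIN_MENU) is accepted instead of running the advisor loop to its cap.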
Today's empirical session has burned multiple $1+ runs on Gemini API
flaps during nav (api-error, EOFException, request timeout). Code
changes downstream of nav are unverified because nav rarely completes.

Two-part savestate feature for dev iteration:
- AgentSession: when entered=true at advisor Done verify, dump emulator
  savestate to /tmp/spec5-shop-entered.savestate. Subsequent dev runs
  can load it and skip pre-boot pressStart + advisor navigation,
  jumping straight to in-shop purchase loop.
- Main.kt: KNES_FF1_LOAD_SAVESTATE=<path> env var. After ROM load,
  restore emulator state from the file. Logs success/failure to stderr.
- EmulatorToolset.session: was private, now public so AgentSession can
  call session.saveState() at the dump point.

Per autonomy_principle.md the agent plays the game (no manual
gameplay), but this is a dev-tool: the savestate is captured ONLY by
a successful agent-driven nav, not hand-recorded.
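
A rough sketch of the env-var load path, assuming the emulator's restore call can be passed in as a lambda. `loadSavestateFromEnv` and `restore` are hypothetical names, and the log wording is illustrative rather than the actual Main.kt output.

```kotlin
import java.io.File

// Hypothetical sketch: `restore` stands in for the emulator's state-restore call.
// Returns true when a savestate was applied, false when the variable is unset
// or the load failed.
fun loadSavestateFromEnv(
    env: Map<String, String> = System.getenv(),
    restore: (ByteArray) -> Unit
): Boolean {
    val path = env["KNES_FF1_LOAD_SAVESTATE"] ?: return false
    return try {
        restore(File(path).readBytes())
        System.err.println("savestate loaded: $path")   // illustrative log line
        true
    } catch (e: Exception) {
        System.err.println("savestate load failed: $path (${e.message})")
        false
    }
}
```

Passing the environment as a map keeps the function testable without touching the real process environment.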
Run #21 (2026-05-09) breakthrough: vision-advisor purchase logic
works end-to-end. char1 (Fighter) and char2 (Thief) both got Small
Knife (5G each). char3 (BlackBelt) was mid-purchase of Wooden
Nunchuck when iter cap hit at 30. Class mapping correct, advisor's
class→item logic correct, FOR_WHOM cursor management correct.

Bumped the iteration cap from 30 to 50 to cover all 4 chars with
headroom for the advisor's state-recovery taps (e.g. when MAIN_MENU is
classified as ITEM_LIST mid-transition, the advisor occasionally needs
an extra A or B).

Two bugs preventing savestate-based dev iteration on FF1 (MMC1):

1. MapperMMC1 inherited stateSave/stateLoad from MapperDefault, which
   only handles joypad strobe state. The MMC1 internal registers
   (shiftRegister, shiftCount, regControl, regCHR0/1, regPRG) and the
   bank-selection bookkeeping were NEVER serialized. cpuMemory bytes
   at $8000-$FFFF restored fine (they capture whichever PRG bank was
   live at save time), but the moment the restored CPU executed any
   bank-switch register write, the mapper computed the new bank from
   power-on defaults instead of the saved register values — corrupting
   PRG-ROM mapping mid-frame. Empirically this manifested as savestate
   loads "drifting" back to title screen / reset state for FF1 / Zelda
   / Metroid / Mega Man 2 (all MMC1).

   Added MMC1.stateSave: 6 ints after the base joypad header.
   Added MMC1.stateLoad: read 6 ints + re-run updateMirroring,
   updatePRGBanks, updateCHRBanks so bank pointers / CHR window / PPU
   mirroring match the restored register values.

2. MapperDefault.mapperInternalStateLoad and mapperInternalStateSave
   had their bodies SWAPPED — Load wrote to the buffer, Save read
   from it. This corrupted the joypad strobe portion of every
   savestate (and the MMC1 fix sits on top of this fix).
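
The shape of the MMC1 fix can be sketched with a plain ByteBuffer. This is a stand-in, not the MapperMMC1 class: the six field names follow the commit message, while `Mmc1Registers`, `refreshDerivedState` and `bankRefreshes` are hypothetical and exist only to make the save/load symmetry visible.

```kotlin
import java.nio.ByteBuffer

// Hypothetical stand-in for the mapper's register state.
class Mmc1Registers {
    var shiftRegister = 0; var shiftCount = 0; var regControl = 0
    var regCHR0 = 0; var regCHR1 = 0; var regPRG = 0
    var bankRefreshes = 0   // counts derived-state recomputations, for illustration only

    // Save writes the 6 ints in a fixed order.
    fun stateSave(buf: ByteBuffer) {
        for (v in intArrayOf(shiftRegister, shiftCount, regControl, regCHR0, regCHR1, regPRG)) {
            buf.putInt(v)
        }
    }

    // Load reads the same 6 ints in the same order, then rebuilds derived state.
    fun stateLoad(buf: ByteBuffer) {
        shiftRegister = buf.getInt(); shiftCount = buf.getInt(); regControl = buf.getInt()
        regCHR0 = buf.getInt(); regCHR1 = buf.getInt(); regPRG = buf.getInt()
        refreshDerivedState()
    }

    // Placeholder for updateMirroring / updatePRGBanks / updateCHRBanks.
    private fun refreshDerivedState() { bankRefreshes++ }
}
```

The essential property is that load consumes exactly what save produced and then recomputes anything derived from the registers, so the first bank-switch write after a restore starts from the saved values.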
…rs; mapper fix pending validation

Today's continuation: 26 runs, ~$10-12 API spend, 8+ commits.

Architectural wins:
- Vision-advisor purchase replaces brittle state machine (1ed46ba)
- Char classes added to ff1.json profile (1ed46ba)
- Class-aware item picking proven (run #21: Fighter→Knife, Thief→Knife)
- Savestate dump+load infrastructure (7ce67e6)
- MMC1 + MapperDefault savestate bugs identified and fixed (fb46d3c)

Empirically validated:
- 2/4 chars Bought reliably (runs #21, #24)
- Nav success ~30% (Gemini stochastic; lots of api-error / oscillation today)
- Class-aware item selection visible in trace

Open / next session:
- Bump cap 50→80 + improve FOR_WHOM cursor prompt → expect 4/4 buys
- Empirically validate MMC1 savestate fix (need post-fix successful nav)
- Replace EquipWeapon state machine with adviseEquip vision advisor
…hots

A leftover debug-print guard `if (tile[i] > 255)` had nested the
per-tile putByte loop body inside an effectively-never-true condition
— so saveState wrote ~0 bytes per nametable while loadState
unconditionally read width*height bytes, consuming bytes from the
following PPU fields and cascading corruption through the rest of the
snapshot.

Manifested as: savestate dumped mid-shop dialog, on reload RAM
restored fine (currentMapId/mapflags/smPlayer all matched the dump
point) but the framebuffer rendered overworld tiles or grayscale.
Affected MMC1 and non-MMC1 games alike: a bug inherited from vNES,
predating the Kotlin port.

Fix removes the guard. Round-trip identity test added to
knes-agent/test/SavestateRoundtripDebug.kt verifies save → load →
save produces byte-identical output. Also includes a Main.kt-flow
regression test that loads the persisted /tmp/spec5-shop-entered
savestate and asserts character RAM survives.
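
The save/load asymmetry and the round-trip check can be illustrated with a toy nametable. This is not the project's NameTable class; `NameTableSketch` and its methods are hypothetical, and tiles are assumed to be byte-sized (0..255).

```kotlin
import java.io.ByteArrayOutputStream

// Illustrative nametable with byte-sized tile indices.
class NameTableSketch(val width: Int, val height: Int) {
    val tile = IntArray(width * height)

    // Fixed save: every tile is written unconditionally, mirroring what load reads.
    fun save(out: ByteArrayOutputStream) {
        for (t in tile) out.write(t)
    }

    // Buggy save for comparison: the guard never fires for byte-sized values,
    // so nothing is written.
    fun saveBuggy(out: ByteArrayOutputStream) {
        for (t in tile) if (t > 255) out.write(t)
    }

    // Load always consumes width*height bytes from `offset`; returns the next offset.
    fun load(data: ByteArray, offset: Int): Int {
        for (i in tile.indices) tile[i] = data[offset + i].toInt() and 0xFF
        return offset + tile.size
    }
}
```

A save → load → save comparison on such a structure fails immediately with the guarded writer, since the first save is shorter than what load expects to read.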
First end-to-end successful 4-character weapon purchase in project
history. Run B2-v3 (post-fixes): char1+2+3+4 BOUGHT in 77 advisor
iterations, ~$1.0 advisor spend, 45G total in-game.

Stack of fixes that enabled 4/4:

1. Main.kt savestate handling: pre-warm 120 frames before loadState
   (PPU rendering pipeline must engage before state restore) and
   pump 120 frames after (so getScreen returns the restored scene
   to the post-enter detector instead of a stale buffer).

2. AgentSession.runOutfitBootPhase: when KNES_FF1_LOAD_SAVESTATE
   was honoured, skip walk-to-coneria + nav-advisor block. Walk
   pressed cardinals on the active shop dialog and the
   accumulated dismissals eventually landed on the title menu;
   the savestate already places us inside the shop, so the entire
   pre-shop nav phase is redundant.

3. Bumped maxAdvisorCalls 50 → 80 → 120. Run A2 trace confirmed
   80 was just short of char4 nav after FOR_WHOM cursor recovery.

4. Improved SYSTEM_SHOP_PURCHASE prompt: dedicated POST-PURCHASE
   FLOW section teaching cursor reset to char1 between buys
   (counted-Down recipe), explicit ERROR_DIALOG handling, and a
   "do NOT output Done mid-purchase-subflow" guardrail.

5. Per-iter screenshot dump in BuyAtShop.invokeWithAdvisor
   (/tmp/spec5-buy-advisor-iter-NN-served-XXXX.png) for
   post-mortem when the advisor mis-reads sub-screen.
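
The ordering in fix 1 (warm, load, pump) can be sketched as below. `loadWithWarmup`, `stepFrame` and `loadState` are hypothetical stand-ins for the emulator calls; only the 120-frame counts come from the commit message.

```kotlin
// Frame count taken from the commit message.
const val WARM_FRAMES = 120

// Hypothetical sketch: `stepFrame` advances the emulator one frame,
// `loadState` restores the savestate.
fun loadWithWarmup(stepFrame: () -> Unit, loadState: () -> Unit) {
    repeat(WARM_FRAMES) { stepFrame() }   // pre-warm: let the PPU pipeline engage
    loadState()
    repeat(WARM_FRAMES) { stepFrame() }   // post-pump: redraw the screen from restored state
}
```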

Run-by-run progression toward 4/4:
  Run #21 / #24 (pre-MMC1 fix): 2/4
  Run A2 (post-NameTable fix, fresh nav): 3/4
  Run B2-v3 (post-NameTable fix + savestate runtime): 4/4
… grind

PR #122 merged (361e88e on master). Updates handoff to reflect:
- Run B2-v3 milestone (4/4 chars BOUGHT, 77 advisor calls, ~$1.0).
- Two-commit stack landed: NameTable.stateSave fix + spec5 4/4 buy.
- Architecture diagram updated with savestate-skip path + 120f
  pre/post warm.
- Next goal user-defined: post-purchase exit + grind. Skipping equip
  (chars grind bare-handed). Subgoals: (1) ExitInterior validation,
  (2) walk to grind tile, (3) battleFightAll, (4) track XP →
  level-up against strategic GRIND target.
…Grind → Battle)

Approach A — vision-LLM with hint-encoded prompts. Drops tree-detour /
empirical exploration from hot path; encodes 3 user gameplay hints
(DOWN-from-shop, green=grass=grind-target, overworld-trees-walkable)
into VisionInteriorNavigator prompt + GrindLoop anchor selection.
…s baseline

User-flagged invariant: 4/4 weapon purchase (Run B2-v3) must not regress when
Phase 3 (exit) and Phase 4 (grind) are reworked. Spec now lists pinned files /
prompts / contracts plus two smoke checks (fresh-run + savestate-load) the
implementation plan must run before declaring done.

Implements the 2026-05-09-coneria-pipeline-design.md spec. Task 1 builds
GrindAnchorSelector with 5 unit tests; Task 2 appends POST-SHOP-TOWN-EXIT
hints to VisionInteriorNavigator prompt; Task 3 adds GrindLoop encounter
delta log; Task 4 replaces runOutfitBootPhase exit block with vision-only
+ anchor + reanchor orchestration; Tasks 5–6 are the two smoke checks
(savestate + fresh-run) with explicit weaponsBought=4 regression guard;
Task 7 captures cont-4 results in HANDOFF.md.

Captures the uncommitted cont-3 work as a single checkpoint so the cont-4
plan tasks (per docs/superpowers/plans/2026-05-09-coneria-pipeline.md)
land as clean per-task diffs on top of a known base.

Includes:
- AgentScratchpad — coding-agent-style action notebook persisted alongside
  savestate (sister *.actions.json file).
- WalkInteriorVision historyHint param + VisionInteriorNavigator API.
- ExitTownEmpirical + ExitInterior.walkOutOfTownOverlay (defensive
  fallback / offline-test path; cont-4 plan drops these from the hot path
  but keeps them callable when outfitNavigator==null).
- GrindLoop per-step screenshots + encounterCounter log.
- runOutfitBootPhase: skip-equip default, post-buy savestate dump, vision
  historyHint wiring, boot_exit_interior_result trace.
- HANDOFF.md cont-3 final notes.

Compiles clean (./gradlew :knes-agent:compileKotlin BUILD SUCCESSFUL).

Encode user hints #1 (DOWN-from-shop) and #3 (overworld trees walkable) so
the model walks SOUTH out of the shop building, avoids LEFT/RIGHT building
doorways, and doesn't return STUCK when it sees trees on the south horizon.

Per-step delta makes encounter-byte staleness visible in stdout; one-shot
WARN at loop end (grind_encounter_byte_dead) signals when all 12 steps
saw zero delta, distinguishing wrong RAM byte from true peninsula dead-zone.
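
The per-step delta and the dead-byte condition can be sketched as follows. `encounterDeltas` and `encounterByteDead` are hypothetical names, and treating the byte as a wrapping 8-bit counter is an assumption.

```kotlin
// Hypothetical sketch: `readings` holds the encounter byte sampled before the
// first step and after each step.
fun encounterDeltas(readings: List<Int>): List<Int> =
    readings.zipWithNext { prev, cur -> (cur - prev) and 0xFF }   // assumption: 8-bit wrap

// True when every step saw zero delta, i.e. the grind_encounter_byte_dead condition.
fun encounterByteDead(readings: List<Int>): Boolean {
    val deltas = encounterDeltas(readings)
    return deltas.isNotEmpty() && deltas.all { it == 0 }
}
```

For a 12-step loop this takes 13 readings; a single non-zero delta is enough to rule out the wrong-RAM-byte explanation.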
…chor

Drops ExitTownEmpirical / tree-detour from hot path (kept as offline fallback
when outfitNavigator==null). Post-exit settles 120 frames, picks anchor via
GrindAnchorSelector reading OverworldMap GRASS/FOREST classifications, runs
GrindLoop with up to 2 re-anchors on NoEncounter. New trace markers:
boot_phase3_exit_result, boot_phase4_grind_anchor, boot_phase4_grind_result,
boot_phase4_grind_reanchor, boot_pipeline_end.
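
A minimal sketch of closest-tile anchor selection with a south+east tie-break. This is not the GrindAnchorSelector source: `selectAnchor`, `Tile` and `Terrain` are hypothetical, Manhattan distance is an assumption, and "south" is taken to mean larger y.

```kotlin
import kotlin.math.abs

enum class Terrain { GRASS, FOREST, OTHER }
data class Tile(val x: Int, val y: Int)

// Hypothetical selector: nearest GRASS/FOREST tile by Manhattan distance (assumption);
// ties prefer larger y (south), then larger x (east).
// `exclude` lets a caller skip anchors that already produced NoEncounter.
fun selectAnchor(map: Map<Tile, Terrain>, from: Tile, exclude: Set<Tile> = emptySet()): Tile? =
    map.filter { (tile, terrain) -> terrain != Terrain.OTHER && tile !in exclude }
        .keys
        .minWithOrNull(
            compareBy<Tile> { abs(it.x - from.x) + abs(it.y - from.y) }
                .thenByDescending { it.y }
                .thenByDescending { it.x }
        )
```

A re-anchor can then be expressed as calling `selectAnchor` again with the failed anchor added to `exclude`, up to the two-retry cap.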
VisionInteriorNavigator defaulted to claude-opus-4-5-20251101 which is
not a current Anthropic model ID — every API call 404'd, parser returned
UNCLEAR, and WalkInteriorVision bailed after 2 consecutive UNCLEARs.
Smoke #1 (savestate-load, post-cont-4 plan Tasks 1-4) hit this exact
failure mode at boot_phase3_exit_result.

Default switched to claude-sonnet-4-6, matching the surrounding KDoc
("Uses Sonnet 4.6"). Caller in Main.kt does not override the default,
so this restores the in-comment intent.
… Coneria exit

Tasks 1-4 of the cont-4 plan committed clean (12f4e10, 1e04d3a, 62f118f,
b4d7fb5) plus model-ID hotfix (598df2d). All unit tests green;
OutfitBootPhaseTest regression baseline preserved.

Smoke #1 empirical: vision-LLM (Sonnet 4.6, post-fix) walked 60 steps
in town overlay without transitioning to overworld — exactly the spec's
documented Risk #1. Phase 4 grind never exercised. Smoke #2 (fresh-run)
deferred to next session per user gate.

Next session: Approach B (PPU nametable read for town overlay tile data).
Do NOT re-enable ExitTownEmpirical tree-detour in hot path; offline
fallback path remains.
Pre-step screenshot to /tmp/spec5-vision-exit-iter-NN-smX_Y-mfM.png on
every iteration. Without these, a 60-step vision drift leaves zero
visible artefacts (cont-4 Smoke #1 evidence). Generalises the
BuyAtShop per-iter pattern per feedback_per_iter_screenshots.md.

Try/catch wraps the dump so emulator-screen capture failure never
fails the skill itself — purely diagnostic.
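
A sketch of the non-fatal dump, assuming the screen grab can be passed in as a lambda. `exitIterPath` and `dumpDiagnostic` are hypothetical names; two-digit zero padding for NN and reading "mf" as the mapflags value are both assumptions.

```kotlin
import java.io.File

// Builds the per-iteration path. Assumptions: NN is zero-padded to two digits,
// and mf carries the mapflags value.
fun exitIterPath(iter: Int, smX: Int, smY: Int, mapflags: Int): String =
    "/tmp/spec5-vision-exit-iter-%02d-sm%d_%d-mf%d.png".format(iter, smX, smY, mapflags)

// Hypothetical dump helper: `capturePng` stands in for the screen capture.
// Any failure is swallowed and reported as false, so the skill keeps running.
fun dumpDiagnostic(path: String, capturePng: () -> ByteArray): Boolean =
    runCatching { File(path).writeBytes(capturePng()) }.isSuccess
```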
…oops

Cont-4 Smoke #1 evidence: PNG iter 30 + iter 50 show party stuck at
sm(2,14) for 8+ iters because vision picks LEFT each call without
seeing that the previous LEFT was a no-op (lastBlocked is only the
SINGLE most-recent failure). Adds rolling 8-entry buffer of
"step=N dir=X smPre=(a,b) smPost=(c,d) moved=Y/N" to historyHint, so
each navigator call sees its own recent decision history.

System prompt updated: model is told to detect repeating-with-moved=false
patterns and pick a perpendicular cardinal, plus to treat RECENT MOVES as
stronger evidence than still-image inference (image cannot show that the
last 5 cardinals were no-ops).
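
The rolling buffer can be sketched as below. `RecentMoves` is a hypothetical class, not the WalkInteriorVision code; the line format follows the commit message, and `stuckOn` is an illustrative code-side version of the pattern the prompt asks the model to spot.

```kotlin
// Hypothetical rolling buffer of recent moves, capped at 8 entries by default.
class RecentMoves(private val capacity: Int = 8) {
    private val entries = ArrayDeque<String>()

    fun record(step: Int, dir: String, pre: Pair<Int, Int>, post: Pair<Int, Int>) {
        val moved = if (pre != post) "Y" else "N"
        if (entries.size == capacity) entries.removeFirst()   // drop the oldest entry
        entries.addLast(
            "step=$step dir=$dir smPre=(${pre.first},${pre.second}) " +
                "smPost=(${post.first},${post.second}) moved=$moved"
        )
    }

    // Text appended to historyHint, one entry per line.
    fun hint(): String = entries.joinToString("\n")

    // True when the last n entries all used `dir` and none of them moved the party.
    fun stuckOn(dir: String, n: Int): Boolean =
        entries.size >= n &&
            entries.takeLast(n).all { " dir=$dir " in it && it.endsWith("moved=N") }
}
```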
…till hits cont-3 deadlock

Two new diagnostic commits: 2412b58 (per-iter PNG dump) and
2371e6e (recent-moves history in WalkInteriorVision historyHint).
Both came out of user feedback ("why didn't I see the screenshots" +
"it goes left and just circles there; maybe it doesn't know it should
go down because it has no history").

Smoke #1 retry: vision now reaches sm(14,22) south edge in 9 steps
(was 60-step drift west) but UNCLEAR x2 at sm(9,22) after westward
drift past the warp tile. Recent-moves history broke the LEFT-loop;
the cont-3 sm(14,22) deadlock still defeats vision-only.

Savestate corruption flagged: /tmp/spec5-post-buy.savestate gets
overwritten on every run regardless of buy success — the 4/4
fixture is lost. Quick fix proposed for cont-6.

Recommended next-session direction: Approach B (PPU nametable
read), fix savestate-dump regression, run Smoke #2 fresh-run for
4/4 buy baseline regression check.
@ArturSkowronski
Author

Wrong target repo: `gh` auto-resolved to upstream. Will recreate on ArturSkowronski/kNES.
