gameplay_capture: carry known_good action as supervised label by dp-web4 · Pull Request #23 · dp-web4/SAGE

dp-web4 · 2026-04-18T20:28:16Z

Summary

Per Dennis's observation: the gameplay records are supervised training triples, but we were only capturing what our baseline PROPOSED, not what the right move actually was. The winning trace's per-step action is by definition a good next action; this PR encodes it in metadata so downstream training can use it as the teacher signal.

What this ships

Three new fields on every gameplay record's metadata:

- known_good_action: int (GameAction value the winning trace took)
- known_good_data: Dict|None (click coords for CLICK actions, None otherwise)
- known_good_level: int (game level at this step)
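As a rough illustration of how these fields could be attached to a record, here is a minimal sketch. It is not the code in this PR: the helper `known_good_labels`, the plain-dict metadata, and the `baseline_dispatch` value are hypothetical, and the action codes are taken only from the 1/3/6 (UP/LEFT/CLICK) values mentioned under Tests.

```python
from typing import Any, Dict, Optional

# Assumed action codes: the PR text only states 1/3/6 for UP/LEFT/CLICK.
UP, LEFT, CLICK = 1, 3, 6


def known_good_labels(
    action: int, data: Optional[Dict[str, Any]], level: int
) -> Dict[str, Any]:
    """Build the three label fields for one step of a winning trace (hypothetical helper)."""
    return {
        "known_good_action": int(action),
        # Only CLICK steps carry coordinates; every other action stores None.
        "known_good_data": dict(data) if action == CLICK and data else None,
        "known_good_level": int(level),
    }


# Existing metadata is an open dict; its contents here are made up for the example.
metadata: Dict[str, Any] = {"baseline_dispatch": "some_route"}
metadata.update(known_good_labels(CLICK, {"x": 12, "y": 34}, level=2))
```

Because the labels are merged into the open metadata dict, nothing else about the record needs to change in this sketch.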

What this unlocks

| Training task | Target | State |
| --- | --- | --- |
| Router BC | `baseline_dispatch` | already worked |
| Action prediction | `known_good_action` | NEW: direct action-level supervision |
| Motor-skill BC by demonstration | `known_good_action` given `(state, skill_params)` | NEW: when motor-skills land |
| Outcome-weighted shaping | `sample_weight ∝ game_outcome.won` | NEW |
| Backprop through chained components | terminal-loss on winning action | NEW: whole-stack gradient |
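To make the action-prediction and outcome-weighting rows concrete, the following sketch turns records into `(state, target, weight)` triples. The record layout is an assumption: records are modeled as plain dicts with a `state` key, and `game_outcome` is assumed to live inside `metadata`; the PR does not specify either. The function name is hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Example = Tuple[Any, int, float]  # (state, known_good_action, sample_weight)


def action_prediction_examples(records: Iterable[Dict[str, Any]]) -> List[Example]:
    """Collect supervised examples from records that carry the new label."""
    examples: List[Example] = []
    for rec in records:
        meta = rec.get("metadata") or {}
        if "known_good_action" not in meta:
            continue  # records captured before this PR have no label
        outcome = meta.get("game_outcome") or {}
        # sample_weight proportional to game_outcome.won: 1.0 for a win, else 0.0.
        weight = 1.0 if outcome.get("won") else 0.0
        examples.append((rec.get("state"), meta["known_good_action"], weight))
    return examples


records = [
    {"state": "s0", "metadata": {"known_good_action": 1, "game_outcome": {"won": True}}},
    {"state": "s1", "metadata": {"baseline_dispatch": "some_route"}},
]
examples = action_prediction_examples(records)
```

A real trainer would likely scale the weight rather than use a hard 0/1 value; the binary choice here is only the simplest form of the proportionality in the table.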

"This is what SAGE should do next to evaluate what it proposes next" — the proposal and the ground truth are now both in every record.

Backward compatibility

RouterRecord schema unchanged (metadata is an open dict). Old consumers that don't look at the new fields are unaffected. The PR #21 records emitted before this merge lack the new labels, but that is only 148 records on CBP, which are easy to re-capture via fleet_gameplay_capture.sh.
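A consumer that wants the new labels still has to tolerate older records without them. One way to do that, sketched under the assumption that records are plain dicts with a `metadata` dict (the function name is hypothetical), is to split a batch by whether the label is present:

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_by_label(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate records carrying known_good_action from pre-merge records without it."""
    labeled: List[Record] = []
    unlabeled: List[Record] = []
    for rec in records:
        meta = rec.get("metadata") or {}
        if "known_good_action" in meta:
            labeled.append(rec)
        else:
            unlabeled.append(rec)
    return labeled, unlabeled


batch = [
    {"metadata": {"known_good_action": 6, "known_good_data": {"x": 3, "y": 4}}},
    {"metadata": {}},  # shape of a record emitted before this merge
]
labeled, unlabeled = split_by_label(batch)
```

The `unlabeled` list is then a count of how many records a re-capture would need to replace.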

Tests

2 new unit tests (18 total in the module). They verify:

- known_good_action exactly matches the trace action (1/3/6 for UP/LEFT/CLICK)
- Click-step known_good_data carries {x, y} coords
- known_good_level passes through correctly

Recommendation post-merge

Re-run fleet_gameplay_capture.sh on machines that already ran it (currently CBP only) so their records get upgraded with the new labels. Machines that haven't run it yet just get the labels natively on first run.

🤖 Generated with Claude Code

Per Dennis's observation: the gameplay records ARE supervised training triples, but we were only capturing what our baseline PROPOSED, not what actually was the right move. The winning trace's per-step action is by definition a good next action; encode it in metadata so downstream training can use it as the teacher signal.

New metadata fields on every gameplay record:
- known_good_action: int (GameAction value that the winning trace took)
- known_good_data: Dict|None (click coords for action=6, else None)
- known_good_level: int (game level at this step)

What this unlocks for training:
- Router BC: (state → baseline_dispatch) [already worked]
- Action prediction: (state → known_good_action) [NEW]
- Motor-skill BC by demonstration: (state × skill_params → known_good_action) [NEW]
- Outcome-weighted shaping: sample_weight ∝ game_outcome.won [NEW]
- Backprop through chained components using the winning action as the terminal-loss target [NEW]

'This is what SAGE should do next to evaluate what it proposes next': the proposal and the ground truth are now both in every record.

Tests: 2 new (18 total). Verify known_good_action matches trace action exactly, click-step known_good_data carries coords, levels pass through.

Backward compat: old consumers that don't look at the new fields are unaffected. RouterRecord schema unchanged (metadata is open dict).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…subtraction. Phi4 register-substitution discovered (Δpol -3.36, Δbiz +1.08 same trajectory). Hardware register quantified — Thor Δhw +2.46 largest single Δ, positive across all 8 raised instances. CBP basin = TED+gov+marketing combo. Lexicon substring FP bug fixed (recurrence #9 of S110 pattern at analysis layer). S119 #18/#19/#20 executed; #21/#22/#23/#24 held. Machine: localhost.localdomain Date: 2026-04-28 01:13:05 UTC Changes committed automatically at session end. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

dp-web4 merged commit 3537a9e into main Apr 18, 2026

dp-web4 deleted the router/gameplay-known-good-labels branch April 18, 2026 20:28

dp-web4 merged 1 commit into main from router/gameplay-known-good-labels
