
Add Q-Learning AI to control Pac-Man autonomously#11

Open
vck77 wants to merge 3 commits into greyblue9:master from vck77:claude/pacman-q-learning-ai-KSBr8

Conversation


@vck77 vck77 commented Mar 2, 2026

Replaces keyboard-driven Pac-Man with a tabular Q-Learning agent that learns to navigate the maze, eat pellets, and avoid ghosts through self-play.

New file – pacman/q_learning_ai.py:
• QLearningAgent class with epsilon-greedy policy, Q(s,a) update rule,
and JSON persistence (q_table.json survives between runs).
• 13-feature binary state: walls × 4, dangerous-ghost × 4, ghost
vulnerable flag, pellet visible × 4.
• Wall-aware action selection (never picks an immediately blocked move
during exploitation).
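
The agent described above can be sketched as follows. This is an illustrative reduction, not the PR's exact code: the class name, the string-keyed `q_table`, and the `legal_actions` parameter are assumptions; the PR's `choose_action` takes the player and level objects instead and filters walls via the game's collision check.

```python
import json
import random

class QLearningAgentSketch:
    """Minimal tabular Q-learning core (illustrative, not the PR's exact code)."""
    ACTIONS = [0, 1, 2, 3]  # up, down, left, right (assumed ordering)

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=1.0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_table = {}  # "state|action" -> float (string keys are JSON-friendly)

    def _q(self, state, action):
        return self.q_table.get(f"{state}|{action}", 0.0)

    def choose_action(self, state, legal_actions):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        # Passing only non-blocked moves makes exploitation wall-aware.
        if random.random() < self.epsilon:
            return random.choice(legal_actions)
        return max(legal_actions, key=lambda a: self._q(state, a))

    def update(self, state, action, reward, next_state):
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self._q(next_state, a) for a in self.ACTIONS)
        current = self._q(state, action)
        self.q_table[f"{state}|{action}"] = current + self.alpha * (
            reward + self.gamma * best_next - current)

    def save(self, path):
        # Persist the table so learning survives between runs.
        with open(path, "w") as f:
            json.dump(self.q_table, f)
```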

Changes to pacman/pacman.pyw:
• AI_ENABLED = True flag at the top (set False to play manually).
• AIStep() called every frame: makes a movement decision every
AI_DECISION_INTERVAL (8) frames, computes rewards from score deltas
(+10 pellet, +100 power pellet, etc.), applies –500 death penalty
and +1000 level-win bonus, auto-restarts on game over, saves the
Q-table every 5 episodes.
• DrawAIStats() overlays episode count, epsilon, state count, and
total steps in the top-left corner during play.
• Manual keyboard input is preserved when AI_ENABLED = False.
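
The reward computation can be sketched like this. The -500 and +1000 terminal values come from the description above; the per-step penalty value and the function shape are assumptions for illustration (the PR folds this logic into AIStep() rather than a standalone helper).

```python
# Illustrative reward shaping; terminal values follow the PR description.
DEATH_PENALTY = -500
LEVEL_WIN_BONUS = 1000
STEP_PENALTY = -1  # small per-decision penalty (assumed value)

def compute_reward(prev_score, curr_score, died=False, won_level=False):
    """Reward = score gained since the last decision, plus terminal bonuses.

    Score deltas already encode pellet (+10) and power pellet (+100) events,
    so the agent needs no separate event hooks for them.
    """
    reward = (curr_score - prev_score) + STEP_PENALTY
    if died:
        reward += DEATH_PENALTY
    if won_level:
        reward += LEVEL_WIN_BONUS
    return reward
```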

https://claude.ai/code/session_01EKGJKXQ5ahXkGuXTyAVZsA

Summary by Sourcery

Integrate a tabular Q-learning agent to autonomously control Pac-Man, with optional keyboard control preserved via a feature flag.

New Features:

  • Add a configurable Q-learning AI controller that drives Pac-Man using an epsilon-greedy policy and a compact binary state representation.
  • Display an on-screen HUD showing Q-learning statistics such as episode count, epsilon, state count, and total steps during play.

Enhancements:

  • Wire the main game loop to delegate movement and reward handling to the AI when enabled, including automatic episode progression, Q-table persistence, and auto-restart behavior.


sourcery-ai bot commented Mar 2, 2026

Reviewer's Guide

Introduces a tabular Q-learning agent that can autonomously control Pac-Man, wires it into the main game loop behind an AI_ENABLED flag, adds per-mode reward handling and auto-restart, and overlays basic AI training stats while persisting the learned Q-table across runs.

Sequence diagram for AIStep Q-learning control loop and game integration

sequenceDiagram
    participant GameLoop
    participant AIStep
    participant QLearningAgent as Agent
    participant Player
    participant Level as LevelObj
    participant Game as GameObj

    GameLoop->>AIStep: AIStep()
    alt mode == 1 (normal gameplay)
        AIStep->>AIStep: ai_frame_counter += 1
        alt ai_frame_counter >= AI_DECISION_INTERVAL
            AIStep->>Agent: get_state(Player, Ghosts, LevelObj, GameObj)
            Agent-->>AIStep: curr_state
            alt Agent.prev_state is not None
                AIStep->>Agent: update(prev_state, prev_action, reward, curr_state)
                AIStep->>Agent: decay_epsilon()
            end
            AIStep->>Agent: choose_action(curr_state, Player, LevelObj)
            Agent-->>AIStep: action
            AIStep->>AIStep: compute dx, dy from ACTION_VELS[action] * Player.speed
            AIStep->>LevelObj: CheckIfHitWall(Player.x+dx, Player.y+dy, nearestRow, nearestCol)
            alt no wall hit
                AIStep->>Player: set velX = dx, velY = dy
            end
            AIStep->>Agent: set prev_state, prev_action, prev_score
        end
    else mode == 2 (death)
        alt ai_prev_mode == 1 and Agent.prev_state is not None
            AIStep->>Agent: update(prev_state, prev_action, -500, terminal_state)
            AIStep->>Agent: decay_epsilon()
            AIStep->>Agent: prev_state = None
        end
    else mode == 3 (game over)
        AIStep->>Agent: episode += 1
        alt Agent.episode % 5 == 0
            AIStep->>Agent: save(AI_QTABLE_PATH)
        end
        AIStep->>Agent: prev_state = None
        AIStep->>GameObj: StartNewGame()
    else mode == 6 (level complete)
        alt ai_prev_mode == 1 and Agent.prev_state is not None
            AIStep->>Agent: update(prev_state, prev_action, 1000, terminal_state)
            AIStep->>Agent: prev_state = None
        end
    end
    AIStep->>AIStep: ai_prev_mode = GameObj.mode
    AIStep->>GameLoop: return

    GameLoop->>GameLoop: update entities and render
    alt AI_ENABLED
        GameLoop->>Agent: DrawAIStats() via HUD overlay
    end

    alt ESC pressed
        GameLoop->>Agent: save(AI_QTABLE_PATH)
        GameLoop->>GameLoop: sys.exit(0)
    end

Class diagram for the new QLearningAgent AI controller

classDiagram
    class QLearningAgent {
        <<class>>
        +float alpha
        +float gamma
        +float epsilon
        +float epsilon_min
        +float epsilon_decay
        +string qtable_path
        +dict q_table
        +int episode
        +float total_reward
        +int steps
        +tuple prev_state
        +int prev_action
        +int prev_score
        +list ACTIONS
        +dict ACTION_VELS
        +dict ACTION_NAMES
        +QLearningAgent __init__(float alpha, float gamma, float epsilon, float epsilon_min, float epsilon_decay, string qtable_path)
        +float _q(tuple state, int action)
        +void _set_q(tuple state, int action, float value)
        +tuple get_state(object player, dict ghosts, object level_obj, object game_obj)
        +int choose_action(tuple state, object player, object level_obj)
        +void update(tuple state, int action, float reward, tuple next_state)
        +void decay_epsilon()
        +void save(string path)
        +void load(string path)
    }

File-Level Changes

Change Details Files
Add a reusable tabular Q-learning agent with compact binary state representation, epsilon-greedy, and JSON persistence.
  • Define QLearningAgent with configurable alpha, gamma, epsilon schedule, and internal Q-table structure.
  • Implement 13-bit binary state extraction from the current game situation, including walls, nearby dangerous ghosts, ghost vulnerability, and pellets in each direction.
  • Provide epsilon-greedy, wall-aware action selection using the existing collision checks to filter invalid moves.
  • Implement the Q-learning update rule, epsilon decay, and basic training statistics tracking (episodes, total reward, steps).
  • Add save/load methods that serialize and deserialize the Q-table and metadata to/from JSON so learning persists between runs.
pacman/q_learning_ai.py
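
One wrinkle in JSON persistence is that tuple states are not valid JSON keys. A common approach (assumed here; the PR may encode keys differently) is to stringify state/action pairs on save and parse them back on load, alongside the training metadata:

```python
import json

# Sketch of Q-table JSON persistence with metadata. Key format
# "s0,s1,...,s12|action" is an assumption for illustration.
def save_qtable(q_table, episode, epsilon, path):
    data = {
        "episode": episode,
        "epsilon": epsilon,
        "q": {f"{','.join(map(str, s))}|{a}": v
              for (s, a), v in q_table.items()},
    }
    with open(path, "w") as f:
        json.dump(data, f)

def load_qtable(path):
    with open(path) as f:
        data = json.load(f)
    q_table = {}
    for key, value in data["q"].items():
        state_str, action_str = key.rsplit("|", 1)
        state = tuple(int(x) for x in state_str.split(","))
        q_table[(state, int(action_str))] = value
    return q_table, data["episode"], data["epsilon"]
```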
Wire the Q-learning agent into the main Pac-Man loop behind a configuration flag, translating game events into rewards and delegating movement when enabled.
  • Introduce AI_ENABLED and AI_QTABLE_PATH configuration at the top of the script and conditionally import QLearningAgent when AI is active.
  • Initialize the Q-learning agent, tracking previous game mode and frame counter, and create a small font for AI HUD rendering.
  • Add AIStep() to run each frame: every AI_DECISION_INTERVAL frames compute state, apply reward from score deltas with a step penalty, update Q-values, decay epsilon, and set Pac-Man velocity based on the chosen action if the move is non-blocking.
  • In AIStep(), handle special modes with terminal rewards: apply a large negative reward on death, large positive reward on level completion, increment episode and periodically save the Q-table on game over, and auto-restart new games.
  • Allow ESC to save the Q-table and exit immediately when AI is enabled.
  • Modify the main loop so that when AI_ENABLED is true, AIStep() replaces CheckInputs() in gameplay modes; when false, preserve existing keyboard handling.
  • Add DrawAIStats() and call it each frame when AI is enabled to overlay current episode, epsilon, state count, and total step count.
  • Ensure manual keyboard-driven play remains unchanged when AI_ENABLED is False, including in non-gameplay modes like game over and waiting-to-start.
pacman/pacman.pyw


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location path="pacman/pacman.pyw" line_range="1432-1433" />
<code_context>
+    # --- Mode 2: Pac-Man just died -- apply death penalty ---
+    elif thisGame.mode == 2:
+        if ai_prev_mode == 1 and ai_agent.prev_state is not None:
+            terminal = (0,) * 13
+            ai_agent.update(ai_agent.prev_state, ai_agent.prev_action, -500, terminal)
+            ai_agent.decay_epsilon()
+            ai_agent.prev_state = None
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider handling terminal transitions without bootstrapping off a dummy next state.

Using a synthetic `terminal = (0,) * 13` still allows the update to bootstrap from whatever Q-values get learned for that dummy state, which can distort the intended -500 terminal penalty. Instead, handle terminal transitions without a next-state value (e.g., `new_q = current + alpha * (reward - current)` with no `gamma * max Q(s')`, or by allowing `next_state=None` and skipping the `best_next` term) so the terminal reward isn’t coupled to an arbitrary placeholder state.

Suggested implementation:

```
    # --- Mode 2: Pac-Man just died -- apply death penalty ---
    elif thisGame.mode == 2:
        if ai_prev_mode == 1 and ai_agent.prev_state is not None:
            # Terminal transition: no next state, so apply pure terminal penalty
            ai_agent.update(ai_agent.prev_state, ai_agent.prev_action, -500, None)
            ai_agent.decay_epsilon()
            ai_agent.prev_state = None

```

To fully implement the suggested behavior, you should also adjust the `ai_agent.update` method (likely in the AI agent class) so that:
1. Its signature allows `next_state` to be `None`.
2. When `next_state is None`, it performs a non-bootstrapping terminal update, e.g.:
   - `new_q = current_q + alpha * (reward - current_q)`
   - i.e., do **not** add `gamma * max_a' Q(next_state, a')` in this branch.
3. When `next_state` is not `None`, keep the existing Q-learning update with the bootstrap term.
This ensures the `-500` death penalty is not coupled to any arbitrary placeholder state.
</issue_to_address>
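
A minimal, self-contained version of the non-bootstrapping update the reviewer suggests (a sketch under the review's assumptions; the real agent also tracks reward and step statistics):

```python
class TerminalAwareAgent:
    """Sketch of the suggested fix: skip the bootstrap term on terminal steps."""
    ACTIONS = [0, 1, 2, 3]

    def __init__(self, alpha=0.1, gamma=0.9):
        self.alpha, self.gamma = alpha, gamma
        self.q = {}

    def _q(self, state, action):
        return self.q.get((state, action), 0.0)

    def update(self, state, action, reward, next_state):
        current = self._q(state, action)
        if next_state is None:
            # Terminal: the target is the bare reward, with no
            # gamma * max_a' Q(s', a') term to distort the -500 penalty.
            target = reward
        else:
            target = reward + self.gamma * max(
                self._q(next_state, a) for a in self.ACTIONS)
        self.q[(state, action)] = current + self.alpha * (target - current)
```

With this shape, the death handler passes `next_state=None` and the penalty converges toward -500 regardless of what any placeholder state has learned.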

claude added 2 commits March 2, 2026 09:34
Pressing + speeds up the AI training simulation by running multiple
game-update iterations per rendered frame (1x/2x/4x/8x/16x). Pressing
- slows it back down. Current speed is shown in the AI HUD overlay.

https://claude.ai/code/session_01EKGJKXQ5ahXkGuXTyAVZsA
FollowNextPathWay() recursed after finding a new path, but if
FindPath returned an empty string (ghost already at destination),
it would recurse infinitely. Guard both recursive calls with
`if self.currentPath:` so they only fire when the new path is
non-empty. This was latent in the original code but became
reliably triggered at 2x+ sim speed.

https://claude.ai/code/session_01EKGJKXQ5ahXkGuXTyAVZsA
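
The guard described in the commit message can be reproduced in miniature like so. Everything here is simplified for illustration: `GhostSketch`, the `find_path` callable, and the path-consumption step are stand-ins, not the game's actual code.

```python
class GhostSketch:
    """Illustrative reproduction of the FollowNextPathWay recursion guard."""

    def __init__(self, find_path):
        self.find_path = find_path  # stand-in for the game's FindPath
        self.currentPath = ""
        self.calls = 0

    def FollowNextPathWay(self):
        self.calls += 1
        if self.currentPath:
            self.currentPath = self.currentPath[1:]  # consume one step
        if not self.currentPath:
            self.currentPath = self.find_path()
            # Guard: FindPath may return "" when the ghost is already at
            # its destination; recursing unconditionally here loops forever.
            if self.currentPath:
                self.FollowNextPathWay()
```

Without the `if self.currentPath:` guard, an empty new path would immediately re-enter `FollowNextPathWay()`, find another empty path, and recurse until the stack overflows, exactly the failure mode that higher simulation speeds made reliable.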