Drop a few files into an existing repo, describe your problem, and let an AI agent work through it autonomously — running experiments, reasoning about results, and keeping a diary of what it tried.
Inspired by Karpathy's autoresearch and Donald Knuth's "Claude's Cycles": the idea of an idealized researcher cycling through hypotheses, experiments, and conclusions.
The agent follows a simple loop:
- Read `problem.md` and `diary.md`
- Choose a step: either an experiment (write and run code) or a reasoning (think something through in writing)
- Create a folder for that step under `steps/`
- Update `diary.md` before moving on
- Repeat until it reaches a conclusion or hits the step budget
The agent maintains a live hypothesis list in diary.md, kills approaches that fail, and prefers understanding over brute-force search.
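The bookkeeping half of that loop (create a step folder, then update the diary) can be sketched in a few lines of Python. This is only an illustration: the function name `record_step`, the `NNN-kind` folder naming, and the diary entry layout are assumptions, not something `CLAUDE.md` is known to prescribe.

```python
from pathlib import Path


def record_step(repo: Path, kind: str, title: str, finding: str) -> Path:
    """Create the next numbered folder under steps/ and log it in diary.md."""
    if kind not in ("experiment", "reasoning"):
        raise ValueError(f"unknown step kind: {kind}")

    steps_dir = repo / "steps"
    steps_dir.mkdir(exist_ok=True)

    # Assumed numbering scheme: count the step folders that already exist.
    step_no = sum(1 for p in steps_dir.iterdir() if p.is_dir()) + 1
    step_dir = steps_dir / f"{step_no:03d}-{kind}"
    step_dir.mkdir()

    # Append a short entry so a later session can pick up from the diary.
    with (repo / "diary.md").open("a", encoding="utf-8") as diary:
        diary.write(f"\n## Step {step_no} ({kind}): {title}\n{finding}\n")
    return step_dir
```

Calling `record_step(Path("."), "experiment", "baseline", "...")` in an empty repo would create `steps/001-experiment/` and append one entry to `diary.md`. Keeping the diary append next to the folder creation mirrors the rule that the diary is updated before the next step starts.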
| File | Purpose |
|---|---|
| `CLAUDE.md` | Instructions the agent reads automatically |
| `problem.md` | Your problem; fill this in before starting |
| `diary.md` | Running log of steps, findings, and hypotheses |
| `pyproject.toml` | Python dependencies (uv) |
| `Dockerfile` | Defines the Docker image for the agent |
| `.dockerignore` | Files to ignore when building the Docker image |
| `run_agent.sh` | Helper script to launch the Docker container and start the agent |
I find that this approach works best using frontier models in "thinking mode". As of March 2026 this is Claude Opus with the Max effort setting.
Long context windows can consume credits quickly. It is usually better to restart sessions intermittently (and at the latest when you hit the Claude usage cap), rather than keeping one very long-running thread.
Open the repo you want to work on in VSCode with the official Claude Code extension installed, and log into your Claude Pro account. Edit `problem.md` to describe your problem, then start the agent in the dedicated chat UI. The agent will get to work but will ask for approval on most actions. This is the safest and most supervised option, good for shorter sessions where you want to stay in the loop. You cannot leave it running unattended, because it will pause and wait for your input.
You will need to change the model to "Claude Opus" and the effort to "Max" manually in the extension settings before starting the session, to get the best results.
Running Claude with `--dangerously-skip-permissions` lets the agent run without any interruption. It must be run inside a Docker container, never directly on your machine. When permissions are skipped, the agent can do anything a normal process can: delete files, overwrite code, install packages, and make network requests. Without a container, that means your entire home directory, SSH keys, credentials, and any mounted drives are in scope. A container limits the blast radius to just the repo folder, while still allowing the internet access the agent needs.
Security disclaimer: this container setup reduces risk, but it is not 100% airtight. Container escapes and misconfiguration risks exist. Do your own research, validate the setup for your environment, and do not rely on this implementation alone as a complete safety boundary.
Linux:

```bash
sudo apt install docker.io
sudo usermod -aG docker $USER
# Log out and back in
```

Mac:

```bash
# Install Docker Desktop from https://www.docker.com/products/docker-desktop/
# Or via Homebrew:
brew install --cask docker
# Then launch Docker Desktop from Applications; it must be running before using docker commands
```

To check if Docker is installed and working, run:

```bash
docker run hello-world
```

That's it for setup. You only need to do this once.
Clone a fresh copy specifically for the agent to work in — do not use your normal working copy. This way the agent has full access to your codebase, but any mistakes are isolated to the clone.
Next, add the provided files to your repo. Copy `CLAUDE.md`, `problem.md`, `diary.md`, and `pyproject.toml` into the root of your repo. These govern how the agent behaves. Edit `problem.md` to describe your problem in as much detail as possible. Then add `Dockerfile`, `.dockerignore`, and `run_agent.sh` as well. These define the Docker image (the environment the agent runs in) and a helper script to launch the container and start the agent.
```bash
# Clone a fresh copy of your repo; the agent will work here, not on your original
git clone <your-repo> your-repo-agent
cd your-repo-agent
rm -f .env # remove any secrets before running

# Copy the necessary files into your repo
cp /path/to/CLAUDE.md .
cp /path/to/problem.md .
cp /path/to/diary.md .
cp /path/to/pyproject.toml .
cp /path/to/Dockerfile .
cp /path/to/.dockerignore .
cp /path/to/run_agent.sh .

# Make sure the helper script is executable
chmod +x run_agent.sh

# First run only: log into Claude inside the container
./run_agent.sh --login

# After login completes, exit that Claude session, then re-run normally
./run_agent.sh
```

On first use, `--login` is only for authentication. The actual autonomous run should be started without `--login`, because Claude should run with the normal flags from the script.
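The `rm -f .env` step above only covers one file name. As a rough pre-flight check, the clone can be scanned for other files that often hold credentials before the container is started. The sketch below is an illustration only: the function name `find_secret_candidates` and the pattern list are assumptions and will not catch every secret, so treat the result as a prompt to look, not as proof the clone is clean.

```python
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical patterns for files that commonly hold secrets; extend as needed.
SECRET_PATTERNS = (".env", ".env.*", "*.pem", "*.key", "id_rsa", "id_ed25519")


def find_secret_candidates(repo: Path) -> list[Path]:
    """Return repo-relative paths whose file names match a likely-secret pattern."""
    hits = []
    for path in repo.rglob("*"):
        rel = path.relative_to(repo)
        if ".git" in rel.parts:
            continue  # skip git internals
        if path.is_file() and any(fnmatch(path.name, pat) for pat in SECRET_PATTERNS):
            hits.append(rel)
    return sorted(hits)
```

Running `find_secret_candidates(Path("."))` from the root of the clone returns a sorted list of matching paths; an empty list means none of the assumed patterns matched.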
What happens when you run `./run_agent.sh`:
- Docker builds the image defined in `Dockerfile`
- Docker starts a container with your repo mounted at `/workspace`
- Inside the container, the script runs `uv sync`
- Inside the container, the script launches `claude --model opus --effort max --dangerously-skip-permissions`
- The agent reads the instructions in `CLAUDE.md`, the problem in `problem.md`, and the diary in `diary.md`, then gets to work autonomously without asking for permission on any actions.
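To make the container steps above concrete, here is a minimal Python sketch that assembles an equivalent `docker run` command as an argument list without executing it. It is not the contents of `run_agent.sh`: the image name `research-agent`, the use of `bash -lc`, and the `--rm`/`-it`/`-w` flags are assumptions, and the real script may differ (for example, to persist login credentials between runs).

```python
from pathlib import Path

# Command taken from the steps above.
CLAUDE_CMD = "claude --model opus --effort max --dangerously-skip-permissions"


def docker_run_argv(repo: Path, image: str = "research-agent") -> list[str]:
    """Build a `docker run` command that bind-mounts only the repo at /workspace."""
    # Assumed: the image provides bash, uv, and the claude CLI.
    inner = f"uv sync && {CLAUDE_CMD}"
    return [
        "docker", "run",
        "--rm",  # remove the container when the session ends
        "-it",   # interactive terminal for the Claude session
        "-v", f"{repo.resolve()}:/workspace",  # the only host folder mounted
        "-w", "/workspace",  # start in the mounted repo
        image,
        "bash", "-lc", inner,
    ]
```

The single `-v` bind mount is what keeps the rest of the host filesystem out of the container's view. Passing the resulting list to `subprocess.run` after a `docker build -t research-agent .` would start such a session.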
If you want to stop the session, quit Claude as normal. The container will then exit. Your repo files remain on your machine because the repo folder is mounted into the container.
Why this is safe: the container can only see the repo folder you mounted into /workspace. It does not automatically have access to your home directory, SSH keys, or anything outside that folder. If the agent does something unexpected, the damage is limited to the clone.
- Clone a fresh copy of the repo you want to work on for the agent
- Copy `CLAUDE.md`, `problem.md`, `diary.md`, and `pyproject.toml` into your repo
- Add `Dockerfile`, `.dockerignore`, and `run_agent.sh`
- Fill in `problem.md`
- Optionally set a step budget by editing the `BUDGET:` line at the top of `diary.md`
- Run `./run_agent.sh`
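If you want to check the step budget from a script before launching, something like the following sketch could read it back. The exact format of the `BUDGET:` line is not specified here, so the code assumes it looks like `BUDGET: 20` (the keyword followed by an integer); `read_budget` is a hypothetical helper, not part of the provided files.

```python
import re
from pathlib import Path
from typing import Optional


def read_budget(diary: Path) -> Optional[int]:
    """Return the integer on the first `BUDGET:` line, or None if there is none."""
    for line in diary.read_text(encoding="utf-8").splitlines():
        # Assumed format: "BUDGET: <integer>", optionally indented.
        match = re.match(r"\s*BUDGET:\s*(\d+)", line)
        if match:
            return int(match.group(1))
    return None
```

For a `diary.md` whose first line is `BUDGET: 20`, `read_budget(Path("diary.md"))` returns `20`; a diary without a numeric budget line yields `None`.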