A field report from 283 experiments in autonomous ML optimization
Anthropic's recent engineering post on effective harnesses for long-running agents identifies the core challenge precisely: each new agent session begins with no memory of what came before. Their solution — a claude-progress.txt file and git history — works well for software engineering tasks where progress is linear and additive.
But autonomous experimentation is a different problem. In ML optimization, the agent isn't building toward a single goal. It's exploring a search space. And in a search space, what you've already tried is just as important as what worked.
A flat progress file can tell the next agent "we tried gradient boosting." It can't answer:
- Which of the 47 feature engineering approaches worked on which data distributions?
- What's the failure mode of target encoding on this specific dataset?
- Which hyperparameter regions have already been exhausted?
When the search space is large, a text file becomes noise. The agent drowns in history rather than learning from it.
Persistent Agent is an autonomous ML experimentation system built on a different premise: persistent structured memory, queryable by the agent itself.
Instead of a progress file, every experiment writes a structured document to MongoDB:
```json
{
  "experiment_id": "exp_0283",
  "hypothesis": "Log-transform skewed features before gradient boosting",
  "cv_score": 0.12891,
  "lb_score": 0.12634,
  "features_used": ["GrLivArea_log", "LotArea_log"],
  "model": "XGBRegressor",
  "failed": false,
  "failure_reason": null,
  "parent_experiment": "exp_0231",
  "created_at": "2026-03-15T09:23:11Z"
}
```

When a new agent session starts, it doesn't read a flat file. It queries:
```javascript
// What has already been tried?
db.experiments.find({ competition: "house-prices" }).sort({ cv_score: 1 }).limit(20)

// What's the best approach so far?
db.experiments.find({ lb_score: { $exists: true } }).sort({ lb_score: 1 }).limit(1)

// What approaches failed and why?
db.experiments.find({ failed: true }).project({ hypothesis: 1, failure_reason: 1 })
```

The agent proposes the next experiment with full awareness of everything that came before. Not as a text summary — as structured, queryable data.
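The session-start logic those queries support can be sketched in plain Python over dicts shaped like the schema above (pure Python so the logic is visible and testable; in the running system these would be MongoDB queries, and `session_context` is an illustrative name, not from the codebase):

```python
def session_context(experiments, top_k=20):
    """Gather what a fresh agent session needs before proposing anything."""
    ok = [e for e in experiments if not e["failed"]]
    # House Prices is scored on RMSE, so lower is better: sort ascending.
    top_by_cv = sorted(ok, key=lambda e: e["cv_score"])[:top_k]
    with_lb = [e for e in ok if e.get("lb_score") is not None]
    best_lb = min(with_lb, key=lambda e: e["lb_score"], default=None)
    failures = [(e["hypothesis"], e["failure_reason"])
                for e in experiments if e["failed"]]
    return {"top_by_cv": top_by_cv, "best_lb": best_lb, "failures": failures}
```

The point is that all three questions are answered from the same structured records, not from whatever a previous session chose to summarize.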
Running Persistent Agent autonomously on the Kaggle House Prices competition produced some observations that flat-file harnesses can't easily surface:
Failure modes are structured. Target encoding without proper cross-validation leaks systematically. Once that failure is stored in MongoDB, no future agent session proposes target encoding without cross-validation again. A progress file would bury this in prose.
The search space has topology. Some experiments are parents of others. MongoDB preserves this lineage. The agent can query "what experiments branched from exp_0089 and what happened to them" — essential for understanding why a promising direction dead-ended.
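That lineage query amounts to a graph traversal over the `parent_experiment` field. A pure-Python sketch of the traversal (a live deployment could express the same thing with MongoDB's `$graphLookup` aggregation stage; the function name here is illustrative):

```python
def descendants(experiments, root_id):
    """All experiments that branched, directly or transitively, from root_id."""
    by_parent = {}
    for e in experiments:
        by_parent.setdefault(e.get("parent_experiment"), []).append(e)
    found, stack = [], [root_id]
    while stack:
        parent = stack.pop()
        for child in by_parent.get(parent, []):
            found.append(child)
            stack.append(child["experiment_id"])  # follow the branch further
    return found
```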
CV/LB divergence is detectable. By storing both cross-validation scores and leaderboard scores, Persistent Agent can detect when a model overfits to the validation set. This pattern — invisible in a text log — becomes a queryable signal.
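One way that signal could be computed from the stored score pairs (the 0.01 gap threshold is an arbitrary illustration, not a value from the system):

```python
def overfit_suspects(experiments, gap_threshold=0.01):
    """Experiments whose leaderboard RMSE is notably worse than their CV RMSE."""
    suspects = []
    for e in experiments:
        if e["failed"] or e.get("lb_score") is None:
            continue
        gap = e["lb_score"] - e["cv_score"]  # positive gap: worse on the leaderboard
        if gap > gap_threshold:
            suspects.append((e["experiment_id"], round(gap, 5)))
    return suspects
```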
Plateau detection requires history. After 283 experiments, the AnalyzePlateau module queries the last N experiments and detects when marginal improvement has stalled. This drives the decision to explore vs. exploit — a decision that requires structured history, not a summary.
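The explore-vs-exploit decision can be sketched as a check over the CV score history. Window size and minimum-gain threshold below are illustrative; the actual AnalyzePlateau parameters aren't shown in this post:

```python
def plateaued(cv_scores, window=20, min_gain=0.001):
    """True when the last `window` experiments failed to beat the prior best
    by at least `min_gain` (scores are RMSE, so lower is better)."""
    if len(cv_scores) <= window:
        return False  # not enough history to judge
    best_before = min(cv_scores[:-window])
    best_recent = min(cv_scores[-window:])
    return best_before - best_recent < min_gain
```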
```
┌───────────────────────────────────────────┐
│             Persistent Agent              │
│                                           │
│  Sidekiq Worker                           │
│        │                                  │
│        ▼                                  │
│  Claude Code (Proposer) ◄────────────┐    │
│        │  reads experiment history   │    │
│        │  from MongoDB via tool      │    │
│        ▼                             │    │
│  DSL Experiment Spec                 │    │
│        │                             │    │
│        ▼                             │    │
│  Python Runner                       │    │
│        │  executes ML pipeline       │    │
│        ▼                             │    │
│  MongoDB (Persistent Memory) ────────┘    │
│    stores result, scores,                 │
│    features, failure reasons              │
└───────────────────────────────────────────┘
```
The key design decision: Claude Code has a MongoDB query tool. It's not summarized for the agent — the agent queries it directly. This means the agent's awareness of history is limited only by what it thinks to ask, not by what a previous agent thought to write down.
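What such a tool could look like declared through Anthropic's tool-use API. The tool name and parameter set below are assumptions for illustration; only the pattern (the agent issues find-style queries directly) comes from the post:

```python
# Hypothetical tool definition passed in the `tools` parameter of a
# Messages API call; the harness dispatches tool_use blocks to MongoDB.
query_tool = {
    "name": "query_experiments",
    "description": (
        "Run a read-only find against the experiments collection. "
        "Returns matching documents as JSON."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "filter": {"type": "object", "description": "MongoDB query filter"},
            "sort": {"type": "object", "description": 'e.g. {"cv_score": 1}'},
            "limit": {"type": "integer", "default": 20},
        },
        "required": ["filter"],
    },
}
```

Keeping the tool read-only matters here: the proposer explores history freely, but only the runner writes results back.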
| | Anthropic's Harness | Persistent Agent / MongoDB |
|---|---|---|
| Memory format | Flat text file | Structured documents |
| Queryable | No | Yes |
| Scales with history | Degrades as the file grows | Indexed queries stay fast |
| Failure analysis | Prose | Structured queries |
| Search space topology | Not captured | Parent/child lineage |
| Best for | Linear build tasks | Iterative search tasks |
Neither approach is universally better. For building a web app toward a known goal, a progress file is sufficient and simpler. For open-ended search across a large experiment space, structured memory becomes essential.
Anthropic's post notes that it's unclear whether a single general-purpose agent or specialized agents perform better across contexts. Persistent Agent uses a single proposer (Claude Code) but the MongoDB schema implicitly creates specialization — the proposer behaves differently when querying feature engineering history vs. model selection history.
A natural extension: specialized query agents that pre-process history into focused context before the proposer runs. A "what feature engineering has been tried" agent that summarizes the relevant subset, rather than exposing raw MongoDB queries to the proposer.
```shell
git clone https://github.com/georgeu2000/persistent-agent
cd persistent-agent
```