
Persistent Agent: MongoDB as Persistent Memory for Long-Running Autonomous Agents

A field report from 283 experiments in autonomous ML optimization

Housing Prices Experiment Results


The Problem Anthropic's Harness Solves — And One It Doesn't

Anthropic's recent engineering post on effective harnesses for long-running agents identifies the core challenge precisely: each new agent session begins with no memory of what came before. Their solution — a claude-progress.txt file and git history — works well for software engineering tasks where progress is linear and additive.

But autonomous experimentation is a different problem. In ML optimization, the agent isn't building toward a single goal. It's exploring a search space. And in a search space, what you've already tried is just as important as what worked.

A flat progress file can tell the next agent "we tried gradient boosting." It can't answer:

  • Which of the 47 feature engineering approaches worked on which data distributions?
  • What's the failure mode of target encoding on this specific dataset?
  • Which hyperparameter regions have already been exhausted?

When the search space is large, a text file becomes noise. The agent drowns in history rather than learning from it.


The Persistent Agent Approach: MongoDB as Agent Memory

Persistent Agent is an autonomous ML experimentation system built on a different premise: persistent structured memory, queryable by the agent itself.

Instead of a progress file, every experiment writes a structured document to MongoDB:

{
  "experiment_id": "exp_0283",
  "hypothesis": "Log-transform skewed features before gradient boosting",
  "cv_score": 0.12891,
  "lb_score": 0.12634,
  "features_used": ["GrLivArea_log", "LotArea_log"],
  "model": "XGBRegressor",
  "failed": false,
  "failure_reason": null,
  "parent_experiment": "exp_0231",
  "created_at": "2026-03-15T09:23:11Z"
}
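A minimal sketch of building such a record in Python. The field names follow the sample document above, but `make_experiment_doc` is a hypothetical helper, not code from the repo; persisting it would be a single `insert_one` call with pymongo.

```python
from datetime import datetime, timezone

def make_experiment_doc(experiment_id, hypothesis, cv_score, lb_score=None,
                        features_used=None, model=None, failed=False,
                        failure_reason=None, parent_experiment=None):
    """Build one experiment record in the shape of the sample document.

    Hypothetical helper: field names mirror the example above; the real
    schema lives in the repo, not here.
    """
    return {
        "experiment_id": experiment_id,
        "hypothesis": hypothesis,
        "cv_score": cv_score,
        "lb_score": lb_score,
        "features_used": features_used or [],
        "model": model,
        "failed": failed,
        "failure_reason": failure_reason,
        "parent_experiment": parent_experiment,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

doc = make_experiment_doc(
    "exp_0283",
    "Log-transform skewed features before gradient boosting",
    cv_score=0.12891, lb_score=0.12634,
    features_used=["GrLivArea_log", "LotArea_log"],
    model="XGBRegressor", parent_experiment="exp_0231",
)
# With pymongo, persisting would look like:
#   MongoClient()["agent"]["experiments"].insert_one(doc)
```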

When a new agent session starts, it doesn't read a flat file. It queries:

// What has already been tried?
db.experiments.find({ competition: "house-prices" }).sort({ cv_score: 1 }).limit(20)

// What's the best approach so far?
db.experiments.find({ lb_score: { $exists: true } }).sort({ lb_score: 1 }).limit(1)

// What approaches failed, and why?
db.experiments.find({ failed: true }, { hypothesis: 1, failure_reason: 1 })

The agent proposes the next experiment with full awareness of everything that came before. Not as a text summary — as structured, queryable data.
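For illustration, the lookups above can be mirrored against an in-memory list of documents. The records here are toy, hypothetical data in the same shape as the schema; the real system issues the MongoDB queries directly.

```python
# Toy experiment records (hypothetical data, same shape as the schema).
experiments = [
    {"experiment_id": "exp_A", "hypothesis": "baseline GBM",
     "cv_score": 0.131, "lb_score": 0.129,
     "failed": False, "failure_reason": None},
    {"experiment_id": "exp_B", "hypothesis": "target encoding, no CV",
     "cv_score": 0.118, "lb_score": None,
     "failed": True, "failure_reason": "target leakage inflated CV score"},
]

# Best leaderboard score so far (scores are errors, so lower is better).
scored = [e for e in experiments if e["lb_score"] is not None]
best = min(scored, key=lambda e: e["lb_score"])

# Failed approaches and why -- the projected fields from the third query.
failures = [(e["hypothesis"], e["failure_reason"])
            for e in experiments if e["failed"]]
```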


What 283 Experiments Taught Us

Running Persistent Agent autonomously on the Kaggle House Prices competition produced some observations that flat-file harnesses can't easily surface:

Failure modes are structured. Target encoding without proper cross-validation leaks systematically. Once that failure is stored in MongoDB, no future agent session proposes target encoding without cross-validation again. A progress file would bury this in prose.

The search space has topology. Some experiments are parents of others. MongoDB preserves this lineage. The agent can query "what experiments branched from exp_0089 and what happened to them" — essential for understanding why a promising direction dead-ended.
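The lineage walk can be sketched as a traversal over the `parent_experiment` links. This is an illustrative in-memory version with hypothetical IDs; in the repo this would be a MongoDB query instead (repeated `find()` calls, or an aggregation with `$graphLookup`).

```python
def descendants(experiments, root_id):
    """All experiments reachable from root_id via parent_experiment links.

    Illustrative in-memory traversal; the real system would query MongoDB.
    """
    children = {}
    for e in experiments:
        children.setdefault(e.get("parent_experiment"), []).append(e["experiment_id"])
    found, stack = [], [root_id]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            found.append(child)
            stack.append(child)
    return found

# Toy lineage: exp_0089 -> exp_0090 -> exp_0091 (hypothetical records).
lineage = [
    {"experiment_id": "exp_0089", "parent_experiment": None},
    {"experiment_id": "exp_0090", "parent_experiment": "exp_0089"},
    {"experiment_id": "exp_0091", "parent_experiment": "exp_0090"},
]
```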

CV/LB divergence is detectable. By storing both cross-validation scores and leaderboard scores, Persistent Agent can detect when a model overfits to the validation set. This pattern — invisible in a text log — becomes a queryable signal.
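The check itself is one comparison once both scores are stored. A minimal sketch, assuming error-style scores where lower is better; the 0.01 tolerance is an illustrative default, not a value taken from the repo.

```python
def overfits_validation(cv_score, lb_score, tolerance=0.01):
    """True when the leaderboard score is notably worse than cross-validation
    predicted (scores are errors, so lower is better).

    The tolerance is an illustrative default, not the repo's value.
    """
    return (lb_score - cv_score) > tolerance
```

Applied to the sample document above (`cv_score` 0.12891, `lb_score` 0.12634), the gap is negative, so no overfitting is flagged.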

Plateau detection requires history. After 283 experiments, the AnalyzePlateau module queries the last N experiments and detects when marginal improvement has stalled. This drives the decision to explore vs. exploit — a decision that requires structured history, not a summary.
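The shape of that check can be sketched as follows. This is a hypothetical analog of the AnalyzePlateau decision, not its actual implementation: it flags a plateau when the best (lowest) score in the last `window` runs barely improves on the best score before that window. Both thresholds are illustrative.

```python
def plateau_detected(best_scores, window=10, min_improvement=1e-4):
    """Hypothetical analog of an AnalyzePlateau check.

    best_scores: per-experiment scores in chronological order (lower is
    better). Returns True when the last `window` runs improved the best
    score by less than `min_improvement`. Thresholds are illustrative.
    """
    if len(best_scores) <= window:
        return False  # not enough history to judge
    before = min(best_scores[:-window])
    recent = min(best_scores[-window:])
    return (before - recent) < min_improvement
```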


Architecture

┌─────────────────────────────────────────┐
│          Persistent Agent               │
│                                         │
│  Sidekiq Worker                         │
│       │                                 │
│       ▼                                 │
│  Claude Code (Proposer)                 │
│       │  reads experiment history       │
│       │  from MongoDB via tool          │
│       ▼                                 │
│  DSL Experiment Spec                    │
│       │                                 │
│       ▼                                 │
│  Python Runner                          │
│       │  executes ML pipeline           │
│       ▼                                 │
│  MongoDB (Persistent Memory)  ◄─────────┘
│       stores result, scores,            │
│       features, failure reasons         │
└─────────────────────────────────────────┘

The key design decision: Claude Code has a MongoDB query tool. It's not summarized for the agent — the agent queries it directly. This means the agent's awareness of history is limited only by what it thinks to ask, not by what a previous agent thought to write down.
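Because the agent composes its own queries, the set of askable questions isn't fixed in advance. One ad-hoc question it might pose — "which models fail most often?" — sketched in memory with hypothetical records; the real system would express it as a MongoDB aggregation (`$match` on `failed`, `$group` by model).

```python
from collections import Counter

def failure_counts(experiments):
    """Count failures per model -- an in-memory analog of a MongoDB
    aggregation ($match on failed, $group by model). Illustrative only."""
    return Counter(e["model"] for e in experiments if e["failed"])

# Hypothetical records:
runs = [
    {"model": "XGBRegressor", "failed": False},
    {"model": "LGBMRegressor", "failed": True},
    {"model": "LGBMRegressor", "failed": True},
]
```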


Comparison to Anthropic's Progress File Approach

                        Anthropic's Harness    Persistent Agent / MongoDB
Memory format           Flat text file         Structured documents
Queryable               No                     Yes
Scales with history     Degrades               Indexed queries stay fast
Failure analysis        Prose                  Structured queries
Search space topology   Not captured           Parent/child lineage
Best for                Linear build tasks     Iterative search tasks

Neither approach is universally better. For building a web app toward a known goal, a progress file is sufficient and simpler. For open-ended search across a large experiment space, structured memory becomes essential.


Open Questions

Anthropic's post notes that it's unclear whether a single general-purpose agent or specialized agents perform better across contexts. Persistent Agent uses a single proposer (Claude Code) but the MongoDB schema implicitly creates specialization — the proposer behaves differently when querying feature engineering history vs. model selection history.

A natural extension: specialized query agents that pre-process history into focused context before the proposer runs. A "what feature engineering has been tried" agent that summarizes the relevant subset, rather than exposing raw MongoDB queries to the proposer.


Running Persistent Agent

git clone https://github.com/georgeu2000/persistent-agent
cd persistent-agent
