chore(migrations): phase out PROFILE.md from disk on schema_version=1#1734
Conversation
Adds a one-shot startup migration that retires the legacy PROFILE.md
system-prompt injection. On the first launch of the new build it:
1. Deletes {workspace}/PROFILE.md so the renderer's
inject_workspace_file_capped silently short-circuits even for agents
that still have omit_profile = false.
2. Strips ### PROFILE.md blocks from every JSONL transcript under
session_raw/ (flat + legacy DDMMYYYY/), rewriting via the existing
write_transcript so the on-disk shape stays byte-compatible.
3. Alters in place any .md companion / legacy HTML-comment transcript
under sessions/**/*.md that still carries the block — preserving the
conversation messages around it (proofs intact).
The migration is gated by a new Config::schema_version (default 0,
bumped to CURRENT_SCHEMA_VERSION=1 on success) and wired into
Config::load_or_init so the first call from run_server_inner fires it
before any agent turn can resume a session. It is idempotent
end-to-end: clean transcripts are byte-identical, fresh installs still
bump the version, second invocations no-op via the version gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (5)
📝 WalkthroughWalkthroughThe PR adds startup schema-versioning (Config.schema_version), runs pending migrations during Config::load_or_init, and implements a 0→1 migration that removes legacy ChangesStartup Schema Versioning and Phase-Out Migration
Sequence Diagram(s)sequenceDiagram
participant App as Application
participant Load as Config::load_or_init
participant RunPending as run_pending
participant PhaseOut as phase_out_profile_md::run
participant Persist as config.save()
App->>Load: load or create config
Load->>RunPending: check schema_version
alt version < CURRENT
RunPending->>PhaseOut: run migration for 0→1
PhaseOut->>PhaseOut: delete PROFILE.md
PhaseOut->>PhaseOut: rewrite session JSONL transcripts
PhaseOut->>PhaseOut: strip PROFILE.md from markdown companions
PhaseOut-->>RunPending: return PhaseOutStats
RunPending->>RunPending: update config.schema_version
RunPending->>Persist: persist bumped version
Persist-->>RunPending: success/fail logged
end
RunPending-->>Load: complete (no-op on errors)
Load-->>App: return Config
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
graycyrus
left a comment
There was a problem hiding this comment.
Walkthrough
This PR introduces a one-shot startup migration framework gated by Config::schema_version. The first migration (0 → 1) retires the legacy PROFILE.md injection by: (1) deleting {workspace}/PROFILE.md, (2) stripping ### PROFILE.md blocks from JSONL transcripts under session_raw/, and (3) sweeping .md companions under sessions/. The runner is wired into Config::load_or_init so it fires before any agent turn can resume a session.
Changes
| File | Change |
|---|---|
types.rs |
Add schema_version: u32 to Config with #[serde(default)] |
load.rs |
Call migrations::run_pending() in both config-load branches |
migrations/mod.rs |
Migration runner: version gate, dispatch, config save, retry |
migrations/phase_out_profile_md.rs |
The migration: file deletion, JSONL walk+strip, .md sweep |
migrations/mod_tests.rs |
4 integration tests for the runner |
migrations/phase_out_profile_md_tests.rs |
18 unit tests for the migration |
mod.rs |
Add pub mod migrations; |
Strengths
- Well-structured module: follows CLAUDE.md dedicated-subdirectory rule, light mod.rs, impl in own file
- 22 tests covering happy paths, edge cases (truncation footer, legacy HTML format, fresh install), idempotency
- Excellent
[migration:phase-out-profile-md]prefixed logging — easy to diagnose in production - Best-effort: per-file errors fold into stats without aborting; runner failures don't block startup
- No
.unwrap()in production code, no hardcoded PII in tests
Issues
See inline comments — 1 major, 4 minor, 1 nit.
| initialized = true, | ||
| "Config loaded" | ||
| ); | ||
| crate::openhuman::migrations::run_pending(&mut config).await; |
There was a problem hiding this comment.
[minor] Fresh installs run the migration over an empty workspace. The new-config branch creates Config with schema_version: 0 (from Default), so run_pending walks an empty workspace. It short-circuits fast, but the intent is wrong — fresh installs have nothing to migrate.
Suggestion: set schema_version: CURRENT_SCHEMA_VERSION in the fresh-install branch so new workspaces skip all legacy migrations by definition.
There was a problem hiding this comment.
Done in c8f5c55 — fresh-install branch now sets schema_version: CURRENT_SCHEMA_VERSION directly so new workspaces skip the migration by definition. The defensive run_pending call stays for symmetry (short-circuits on the gate).
| ); | ||
| } | ||
| Err(err) => { | ||
| stats.errors += 1; |
There was a problem hiding this comment.
[minor] sweep_tainted_md_companions only walks one level of subdirectories under sessions/. If deeper nesting exists (e.g. sessions/2026/05/14/*.md), those files are missed. The JSONL walk has the same single-level assumption. If the layout is guaranteed at most one level deep, worth documenting the assumption explicitly so future contributors know.
There was a problem hiding this comment.
Documented the one-level-deep assumption in c8f5c55 — added a "Layout assumption" paragraph to sweep_tainted_md_companions rustdoc referencing the matching assumption in collect_jsonl_transcripts. Both walkers descend exactly into sessions/<date>/*.md and session_raw/<legacy-date>/*.jsonl, matching the layout produced by agent/harness/session/transcript.
| //! | ||
| //! When `session_raw/` does not exist or contains no transcripts the | ||
| //! migration short-circuits without scanning. The caller still bumps | ||
| //! `schema_version` so future launches don't re-check. |
There was a problem hiding this comment.
[minor] run() is synchronous blocking I/O called from async context. run_pending is an async fn, but phase_out_profile_md::run() does synchronous filesystem I/O (read_dir, read_to_string, write, remove_file). On a workspace with hundreds of transcripts this blocks the tokio runtime thread.
Suggestion: wrap in tokio::task::spawn_blocking in run_pending.
There was a problem hiding this comment.
Fixed in c8f5c55 — run_pending now wraps the migration in tokio::task::spawn_blocking(move || phase_out_profile_md::run(&workspace_dir)) so the fs walk runs on a blocking thread instead of stalling the executor.
| let Some(first) = session.messages.first_mut() else { | ||
| return false; | ||
| }; | ||
| if first.role != "system" { |
There was a problem hiding this comment.
[major] strip_profile_md_block always returns Some when the heading exists, even if the reconstructed output is byte-identical to the input. This means process_transcript will rewrite files unnecessarily, and the skipped counter under-counts.
Add a guard before returning:
if out == prompt {
return None;
}
Some(out)There was a problem hiding this comment.
Good catch, fixed in c8f5c55 — strip_profile_md_block now returns None when out == prompt even after the heading was located, so degenerate-whitespace cases no longer churn the file. Added strip_block_returns_none_when_reconstruction_is_byte_identical to lock the behavior.
| if let Err(err) = config.save().await { | ||
| log::warn!( | ||
| "[migrations] phase_out_profile_md ran but config.save failed: \ | ||
| {err:#} — will retry on next launch" |
There was a problem hiding this comment.
[minor] In-memory schema_version not rolled back on save failure. Line 63 sets config.schema_version = 1, but if config.save() fails (line 64), the early return at line 68 leaves the in-memory config at version 1 while disk says 0. Benign with one migration, but when migration 2 is added the caller sees version 1 in memory and might skip re-running migration 1 on the next code path.
Suggestion: roll back on save failure:
Err(err) => {
config.schema_version = 0;
log::warn!(...);
return;
}There was a problem hiding this comment.
Fixed in c8f5c55 — added the rollback path. If config.save() fails we now reset config.schema_version to its pre-migration value before logging+returning, so the in-memory state never diverges ahead of disk. Added run_pending_rolls_back_schema_version_when_save_fails (forces save() to fail by pointing config_path through a non-directory) to lock the semantics.
| /// the field) load as version `0` and pick up pending migrations on | ||
| /// the first launch of the new build. | ||
| #[serde(default)] | ||
| pub schema_version: u32, |
There was a problem hiding this comment.
[nit] PR title says phase out old PROFILE.md from on schema_version=1 — the from on looks like a leftover edit. Should be on schema_version=1 or from schema_version=0.
There was a problem hiding this comment.
Fixed the PR title — was phase out old PROFILE.md from on schema_version=1, now phase out PROFILE.md from disk on schema_version=1 (matches the commit message).
- load.rs: fresh-install branch stamps `schema_version: CURRENT_SCHEMA_VERSION` up front so new workspaces skip the migration walk by definition. The defensive `run_pending` call stays for symmetry — it short-circuits on the version gate. - mod.rs: wrap `phase_out_profile_md::run` in `tokio::task::spawn_blocking` so the synchronous fs walk doesn't stall the tokio executor when a workspace carries many transcripts. Roll the in-memory `schema_version` back to the pre-migration value if `config.save()` fails, so a later call doesn't see a higher in-memory version than what's on disk. - phase_out_profile_md.rs: `strip_profile_md_block` now returns `None` when the reconstructed string is byte-identical to the input. Document the one-level-deep walk assumption in `sweep_tainted_md_companions` rustdoc. - tests: add `strip_block_returns_none_when_reconstruction_is_byte_identical` and `run_pending_rolls_back_schema_version_when_save_fails`. 24 tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@graycyrus thanks for the review. All six points addressed in
24 tests pass (up from 22), Mind taking another look when you have a minute? |
…tinyhumansai#1734) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
PROFILE.mdinjection across all on-disk artifacts.Config::schema_version(default 0) as the gate; bumped to 1 after the migration completes.Config::load_or_initso it fires before any agent turn can resume a session.Problem
Earlier builds wrote
PROFILE.mdinto the workspace and rendered### PROFILE.mdinto the system prompt for any agent withomit_profile = false(orchestrator, welcome, trigger_*, help). Once injected, the runtime loader (agent/harness/session/turn.rs:1295 try_load_session_transcript) replays those bytes verbatim on every resume for KV-cache stability. A renderer-only fix leaves existing JSONL transcripts and their.mdcompanions replaying the leaked content forever. The producer side (learning/profile_md_renderer.rs,composio/providers/traits.rs:115 merge_provider_into_profile_md) is being retired separately; this PR handles every consumer surface so the leak cannot replay even if the file briefly resurfaces.Solution
New
src/openhuman/migrations/module (sibling to the existing user-triggeredmigration/OpenClaw import) with a runner gated byschema_version:{workspace}/PROFILE.md—inject_workspace_file_cappedskips silently when the file is missing, so the renderer becomes a no-op for the 5omit_profile = falseagents without touching theiragent.toml.session_raw/**/*.jsonl(flat + legacyDDMMYYYY/), read viatranscript::read_transcript, strip### PROFILE.mdfrom the first system message viastrip_profile_md_block, rewrite viatranscript::write_transcript(also re-renders the companion.md).sessions/**/*.mdand alter in place — same stripper, with a broader boundary set (###,---,## [,<!--/MSG-->) so it cleanly bounds both the new JSONL-companion format and the legacy HTML-comment format. The rest of each transcript (all messages, all metadata) is preserved.Hook lives in
Config::load_or_init(config/schema/load.rs) at the end of the existing-config and first-launch branches; the pre-login in-memory branch is deliberately skipped. Idempotent in two ways: the version gate is a fast check on every subsequent call, and each step is self-idempotent (file already gone → no-op, transcript already clean →skipped, .md already clean → byte-identical). Best-effort: any per-file failure folds intostats.errorsand the walk continues; runner-level failures log and never block startup.Live verified against my real workspace (24 JSONLs, 100 .md companions):
scanned=24 cleaned=1 skipped=23 errors=0 profile_md_removed=true md_companions_altered=1, second launch instant no-op. Tests in tempdirs cover the stripper for new-format.md(---boundary), legacy HTML-comment.md(<!--/MSG-->boundary), JSONL with next###heading, JSONL terminating at EOF, JSONL with the truncation footer, plus the version-gate idempotency onrun_pending.Submission Checklist
migrations/module; every public path has direct tests viacargo test openhuman::migrations::(15/15 lib tests + 7 stripper tests pass).TEST-COVERAGE-MATRIX.mdrows track product features, not internal startup migrations.## Related.Closes #NNN.Impact
openhuman-core serve). Fires once per workspace on first launch of the new build.schema_version >= 1early-return and exits the runner in microseconds.schema_version: 0 → 1). No rollback path; if a user re-installs an older build, the field is silently ignored (nodeny_unknown_fields).config.tomlfiles written by older builds load unchanged —#[serde(default)]defaults missingschema_versionto 0 so they pick up the migration on the next launch.merge_provider_into_profile_md(composio/providers/traits.rs:115) andProfileMdRenderer(learning/profile_md_renderer.rs) still exist. If they ever fire, the file reappears and the 5omit_profile = falseagents will re-inject. That cleanup is the explicit v2 follow-up.Related
merge_provider_into_profile_md,ProfileMdRenderer) and flip the 5omit_profile = falseflags in welcome / trigger_triage / trigger_reactor / orchestrator / helpagent.tomlfiles.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
Commit & Branch
chore/phase-out-profile-md09265034313af9f42dd27c96a6af9aff62e90f3bValidation Run
pnpm --filter openhuman-app format:checkpnpm typecheckcargo test --lib openhuman::migrations::→ 22 passed; 0 failedcargo fmt --check(clean) +cargo check(only pre-existing warnings)Validation Blocked
command:N/Aerror:N/Aimpact:N/ABehavior Changes
PROFILE.mdand no tainted persisted transcripts.Parity Contract
### PROFILE.mdblock; the renderer code path is unchanged, only the file it would read is gone.load_or_initstill falls back toDefault::default()on config-load failure (line 798); migration simply doesn't run that launch and retries on the next.🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests