fix: reduce memory overflow from checkpoint reads and writes (#344) #571
- Make `append_checkpoint` truly append-only (O(1) instead of O(N) read-write-all)
- Use `BufReader` for streaming JSONL reads instead of `read_to_string`
- Eliminate 3 redundant `read_all_checkpoints()` calls in `checkpoint::run()`
- Pass pre-loaded checkpoints to `get_all_tracked_files`
- Defer char-level attribution pruning to `write_all_checkpoints`
- Use `BufWriter` for efficient checkpoint serialization

Addresses #344

Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
Summary
Addresses the runaway memory usage (30-60GB) reported in #344 by fixing the highest-impact patterns in checkpoint I/O:
1. Streaming prune-and-rewrite on append (`repo_storage.rs`): `append_checkpoint` previously read ALL checkpoints into memory, appended one, pruned, then wrote ALL back, costing O(N) memory per append. Now it streams existing checkpoints one at a time through a `BufReader`, prunes char-level attributions for files superseded by the new checkpoint, writes to a temp file via `BufWriter`, appends the new checkpoint, then atomically renames. Peak memory is one checkpoint plus the new checkpoint, rather than all checkpoints. The file stays small throughout long agent loops because pruning happens on every append.
2. Eliminate redundant full reads (`checkpoint.rs`): a single `checkpoint::run()` call previously triggered 4+ independent `read_all_checkpoints()` deserializations of the entire JSONL file. Now checkpoints are read once at the top of `run()` and passed through to `get_all_tracked_files` via a new `preloaded_checkpoints` parameter.
3. Streaming reads (`repo_storage.rs`): `read_all_checkpoints` now uses `BufReader` line by line instead of `fs::read_to_string`, avoiding holding the full file string and the parsed structs in memory simultaneously.
4. `BufWriter` for writes (`repo_storage.rs`): `write_all_checkpoints` now streams serialization through a `BufWriter` instead of building a full string in memory. An explicit `flush()` call ensures write errors are propagated rather than silently dropped on `BufWriter::drop`.

All 38 checkpoint-related unit tests pass (31 checkpoint tests + 7 repo_storage tests). No new dependencies added.
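A minimal stdlib-only sketch of the streaming append-and-prune pattern described in item 1. Plain tab-separated lines stand in for the real JSONL checkpoint schema, and the function name and pruning rule are illustrative, not the actual `repo_storage.rs` code:

```rust
use std::fs::{self, File};
use std::io::{BufRead, BufReader, BufWriter, Write};
use std::path::Path;

// Stream the existing log through a BufReader, pruning entries superseded by
// the incoming checkpoint, into a BufWriter over a temp file; then append the
// new checkpoint and atomically rename. Peak memory: one line at a time.
fn append_checkpoint_streaming(log: &Path, file: &str, payload: &str) -> std::io::Result<()> {
    let tmp = log.with_extension("jsonl.tmp");
    {
        let mut writer = BufWriter::new(File::create(&tmp)?);
        if log.exists() {
            for line in BufReader::new(File::open(log)?).lines() {
                let line = line?;
                let entry_file = line.split('\t').next().unwrap_or("");
                if entry_file == file {
                    // Superseded by the new checkpoint: keep the row, drop the
                    // payload (stands in for char-level attribution pruning).
                    writeln!(writer, "{}\t", entry_file)?;
                } else {
                    writeln!(writer, "{}", line)?;
                }
            }
        }
        writeln!(writer, "{}\t{}", file, payload)?;
        // Explicit flush so write errors are propagated, not dropped on Drop.
        writer.flush()?;
    }
    // Atomic replace: readers never observe a half-written log.
    fs::rename(&tmp, log)
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("ckpt_demo");
    fs::create_dir_all(&dir)?;
    let log = dir.join("checkpoints.jsonl");
    let _ = fs::remove_file(&log);

    append_checkpoint_streaming(&log, "a.rs", "attr1")?;
    append_checkpoint_streaming(&log, "b.rs", "attr2")?;
    append_checkpoint_streaming(&log, "a.rs", "attr3")?; // supersedes the first a.rs row

    // Prints the log: the older a.rs row survives but its payload is pruned.
    print!("{}", fs::read_to_string(&log)?);
    Ok(())
}
```

The rename at the end is what keeps the operation crash-safe: either the old log or the fully written new log is on disk, never a partial file.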
Updates since last revision
Major rework: the initial approach deferred char-level attribution pruning to `write_all_checkpoints`, but this left un-pruned data in the file during the intra-commit loop, where memory issues are worst. The new approach prunes on every `append_checkpoint` call using a streaming read-modify-write pattern that keeps only one checkpoint in memory at a time. The `prune_old_char_attributions` method was removed entirely (its logic is inlined into `append_checkpoint`).

Reset ordering fix (ef4a4d7): fixed a data-loss bug where `reset=true` + `is_pre_commit=true` would destroy the working log before the early-exit check, causing the check to always pass (empty checkpoints → no AI edits) and skip checkpoint creation. Now checkpoints are always read first for the early-exit check, and the reset happens afterward if needed.

Other changes:
- `write_all_checkpoints` signature reverted to `&[Checkpoint]` (no longer needs `&mut` since pruning moved to append)
- Added `writer.flush()?;` in both `append_checkpoint` and `write_all_checkpoints` to ensure I/O errors are propagated

Review & Testing Checklist for Human
- With `reset=true` and `is_pre_commit=true`, verify that the early-exit check correctly inspects the old data before the reset, and that the subsequent reset properly clears both disk state and in-memory checkpoints. Test with a repo that has AI checkpoints, run `checkpoint --reset` during pre-commit, and verify a fresh checkpoint is created (not skipped).
- `append_checkpoint` assumes the checkpoint being appended is the newest for its files (it clears attributions from older entries with matching files). This should always be true since we append chronologically, but verify no code path appends out of order or writes un-pruned checkpoints via `write_all_checkpoints` directly.
- Exercise a real `git commit` and verify that attributions are correctly preserved end-to-end. Unit tests validate correctness but not the memory improvement.
- `append_checkpoint` writes to `checkpoints.jsonl.tmp` then renames. If the process crashes mid-write, the temp file is left behind (harmless, but it clutters `.git/ai/`). Consider whether cleanup is needed.

Notes
- `get_all_tracked_files` gained an optional `preloaded_checkpoints` parameter. Existing callers that don't pass it still work (they read from disk as before).
- `write_all_checkpoints` no longer prunes. If any code path writes un-pruned data via `write_all_checkpoints`, it won't be pruned until the next `append_checkpoint`.
- The `has_no_ai_edits` logic was rewritten from `all_ai_touched_files().is_empty()` to `checkpoints.iter().all(|cp| cp.entries.is_empty() || cp.kind != AiAgent/AiTab)`. These are logically equivalent, but worth a careful trace through both code paths.

Link to Devin run: https://app.devin.ai/sessions/2a46b6eaa71f4f46913488bef2ff52a1
Requested by: @svarlamov
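For reference, the reset-ordering fix described under "Updates since last revision" can be sketched as follows. All names and types here are illustrative stand-ins (a `Vec` plays the role of the on-disk working log), not the actual `checkpoint::run()` code:

```rust
// Hypothetical stand-in for the real checkpoint type: `ai_edit` marks a
// checkpoint as AI-authored.
#[derive(Clone, Debug)]
struct Checkpoint {
    ai_edit: bool,
}

// Read first, early-exit check second, reset last. With the old ordering
// (reset before the check), `reset=true` + `is_pre_commit=true` emptied the
// log, the "no AI edits" check always passed, and checkpoint creation was
// skipped. Returns None on early exit, else the new log length.
fn run_checkpoint(log: &mut Vec<Checkpoint>, reset: bool, is_pre_commit: bool) -> Option<usize> {
    // 1. Inspect the existing checkpoints FIRST, so the check sees real data.
    let has_no_ai_edits = log.iter().all(|cp| !cp.ai_edit);
    if is_pre_commit && has_no_ai_edits {
        return None; // nothing AI-authored: skip checkpoint creation
    }
    // 2. Only now is it safe to destroy the working log.
    if reset {
        log.clear();
    }
    log.push(Checkpoint { ai_edit: true }); // record the new checkpoint
    Some(log.len())
}

fn main() {
    let mut log = vec![Checkpoint { ai_edit: true }];
    // reset=true + is_pre_commit=true no longer skips: a fresh checkpoint is made.
    assert_eq!(run_checkpoint(&mut log, true, true), Some(1));

    let mut empty: Vec<Checkpoint> = Vec::new();
    // With genuinely no AI edits, the early exit still fires.
    assert_eq!(run_checkpoint(&mut empty, true, true), None);
    println!("reset ordering behaves as described");
}
```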