Add foundational support for durable storage #135
Draft
krisztianfekete wants to merge 1 commit into main
This PR is opt-in (AGENTEVALS_STORAGE_BACKEND=postgres), so the existing in-memory developer experience is unchanged: agentevals run trace.json keeps working, the React UI behaves identically, OTLP streaming is untouched.
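The opt-in switch can be pictured as a small lookup on the environment. The following is a minimal sketch, not the PR's actual code: only the variable name AGENTEVALS_STORAGE_BACKEND and the value postgres come from the description above, while the function name select_backend and the default value "memory" are assumptions made for illustration.

```python
import os

# Backend names accepted by this sketch. "postgres" is from the PR text;
# "memory" as the default's spelling is an assumption.
KNOWN_BACKENDS = {"memory", "postgres"}


def select_backend(env=None):
    """Return the storage backend name, defaulting to the in-memory one.

    `select_backend` is a hypothetical helper, not an agentevals API.
    """
    env = os.environ if env is None else env
    value = env.get("AGENTEVALS_STORAGE_BACKEND", "memory").strip().lower()
    if value not in KNOWN_BACKENDS:
        raise ValueError(f"unknown storage backend: {value!r}")
    return value
```

With no variable set, the sketch falls through to the in-memory backend, which is the property that keeps the zero-config flow unchanged.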
Setup
Look for these log lines on startup:
The async run pipeline (POST /api/runs)
Submit a run, watch the worker pick it up, read the persisted results back:
Idempotency, 409, and cancel
Existing /api/evaluate flows persist when backend=postgres
UI uploads, multipart curl, SSE stream, and the JSON variant all now write a Run row plus Result rows. The response carries an extra runId field that wasn't there before. No UI changes required.
Each call yields a new run row with target.kind = "uploaded". That's the OSS user-facing benefit of this PR: persistent run history for any eval that flows through the existing endpoints.
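To make the "one run row per call" behaviour concrete, here is a rough in-memory model. It is a sketch under assumptions: the class and method names (Run, RunStore, record_evaluation) and the use of a UUID for the identifier are hypothetical, and only the runId response field and the target kind "uploaded" are taken from the description.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Run:
    id: str
    target_kind: str
    results: list = field(default_factory=list)


class RunStore:
    """Hypothetical stand-in for the persistent run history."""

    def __init__(self):
        self.runs = {}

    def record_evaluation(self, results):
        # Every call creates a fresh run; nothing is deduplicated here.
        run = Run(id=str(uuid.uuid4()), target_kind="uploaded", results=list(results))
        self.runs[run.id] = run
        # The response keeps its existing payload and gains a runId field.
        return {"results": run.results, "runId": run.id}
```

Calling record_evaluation twice with the same payload yields two distinct run identifiers, mirroring the behaviour described for repeated uploads.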
Inspecting the data in Postgres
Live tail while exercising the worker:
Crash recovery
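As background for the steps in this section, the lease mechanism can be modelled in a few lines. This is a simplified in-memory sketch, not the PR's implementation: the names RunRow and claim_next, the status strings, and the 30-second lease length are assumptions (the description only implies that one lease window plus slack is about 35 seconds). In Postgres the scan would typically be a SELECT ... FOR UPDATE SKIP LOCKED query, so that concurrent workers skip rows another worker has already locked.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 30.0  # assumed lease window, not taken from the PR


@dataclass
class RunRow:
    id: str
    status: str = "queued"  # assumed status names: "queued", "running"
    attempt: int = 0
    lease_expires_at: Optional[float] = None


def claim_next(rows, now):
    """Claim the first queued row, or a running row whose lease has expired."""
    for row in rows:
        expired = (
            row.status == "running"
            and row.lease_expires_at is not None
            and row.lease_expires_at <= now
        )
        if row.status == "queued" or expired:
            row.status = "running"
            row.attempt += 1  # each claim, including a re-claim, counts as an attempt
            row.lease_expires_at = now + LEASE_SECONDS
            return row
    return None
```

A worker that dies never renews or releases its lease, so once the lease has expired the next call to claim_next picks the row up again and the attempt counter moves from 1 to 2.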
Submit a slow run using a bigger trace, then Ctrl+C the agentevals process. Wait roughly 35 seconds (one lease window plus slack), then restart with make dev-backend-pg. The previously claimed run is re-claimed by a new worker via the SKIP LOCKED predicate and completes; the run row's attempt counter reads 2.
Memory backend regression (zero-config flow unchanged)
Cleanup