
Add per-eval Docker image override via evals.image property #2153

Merged
dgageot merged 1 commit into docker:main from dgageot:board/docker-agent-evals-runs-evals-in-a-docke-fe49578a
Mar 18, 2026

@dgageot dgageot commented Mar 18, 2026

Allow each eval JSON to specify a custom Docker image through the
"image" field in the "evals" object, overriding the global --base-image
flag. The image build cache key now includes both workingDir and image
to correctly handle different images for the same working directory.

Assisted-By: docker-agent
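
As a rough illustration of the override described above, the following Go sketch decodes an eval JSON and picks the image to use. Only the "evals" object and its "image" field come from this PR description. The `working_dir` key, the `evalFile` and `evalCriteria` types, the `parseEval` helper, and the image names are hypothetical, and `resolveBaseImage` borrows only its name from the change; its signature here is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// evalCriteria mirrors only the fields discussed here. The real
// session.EvalCriteria type may differ; the "working_dir" key is an assumption.
type evalCriteria struct {
	WorkingDir string `json:"working_dir,omitempty"`
	Image      string `json:"image,omitempty"`
}

// evalFile is a hypothetical top-level shape for one eval JSON document.
type evalFile struct {
	Evals *evalCriteria `json:"evals,omitempty"`
}

// parseEval decodes one eval JSON document.
func parseEval(data []byte) (*evalFile, error) {
	var f evalFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, fmt.Errorf("decoding eval: %w", err)
	}
	return &f, nil
}

// resolveBaseImage returns the per-eval image when one is set and falls
// back to the global base image (the --base-image value) otherwise.
func resolveBaseImage(f *evalFile, globalBaseImage string) string {
	if f != nil && f.Evals != nil && f.Evals.Image != "" {
		return f.Evals.Image
	}
	return globalBaseImage
}

func main() {
	// Illustrative inputs: one eval overrides the image, one does not.
	withImage := []byte(`{"evals": {"working_dir": "fixtures/app", "image": "python:3.12-slim"}}`)
	withoutImage := []byte(`{"evals": {"working_dir": "fixtures/app"}}`)

	for _, raw := range [][]byte{withImage, withoutImage} {
		f, err := parseEval(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(resolveBaseImage(f, "alpine:latest"))
	}
}
```

With these inputs the first eval resolves to its own image and the second falls back to the global value passed in.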

@docker-agent docker-agent bot left a comment

Assessment: 🟢 APPROVE

This PR adds per-eval Docker image override functionality through the evals.image property. The implementation is sound:

Key Changes:

  • Introduced imageKey struct combining workingDir and image for cache keys
  • Modified getOrBuildImage to accept *session.EvalCriteria instead of just workingDir
  • Added resolveBaseImage to prioritize per-eval image over global --base-image flag
  • Updated preBuildImages to handle the new cache key structure
  • Simplified Dockerfile template (removed docker-in-docker setup)
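
The cache key change listed above might look roughly like the sketch below. It is not the code from this PR: `EvalCriteria` is a two-field stand-in for `session.EvalCriteria`, `keyFor` and `imageCache` are hypothetical helpers, the "build" is a placeholder string rather than a Docker build, and the cache is not safe for concurrent use.

```go
package main

import "fmt"

// EvalCriteria is a stand-in for session.EvalCriteria with only the two
// fields relevant to the cache key.
type EvalCriteria struct {
	WorkingDir string
	Image      string
}

// imageKey combines workingDir and image. A struct of comparable fields
// can be used directly as a Go map key.
type imageKey struct {
	workingDir string
	image      string
}

// keyFor builds the cache key. Nil criteria yield the zero-valued key,
// so all evals without custom config share one entry.
func keyFor(c *EvalCriteria) imageKey {
	if c == nil {
		return imageKey{}
	}
	return imageKey{workingDir: c.WorkingDir, image: c.Image}
}

// imageCache maps a key to a built image ID and counts real builds.
type imageCache struct {
	built  map[imageKey]string
	builds int
}

// getOrBuildImage returns the cached image for the criteria or "builds" one.
func (ic *imageCache) getOrBuildImage(criteria *EvalCriteria) string {
	key := keyFor(criteria)
	if id, ok := ic.built[key]; ok {
		return id
	}
	ic.builds++
	id := fmt.Sprintf("eval-image-%d", ic.builds) // placeholder for a real image build
	ic.built[key] = id
	return id
}

func main() {
	cache := &imageCache{built: map[imageKey]string{}}

	goImg := cache.getOrBuildImage(&EvalCriteria{WorkingDir: "app", Image: "golang:1.24"})
	nodeImg := cache.getOrBuildImage(&EvalCriteria{WorkingDir: "app", Image: "node:22"})
	goAgain := cache.getOrBuildImage(&EvalCriteria{WorkingDir: "app", Image: "golang:1.24"})
	noConfig := cache.getOrBuildImage(nil)
	emptyConfig := cache.getOrBuildImage(&EvalCriteria{})

	// Same working directory with different images gives distinct entries;
	// repeated and zero-valued keys reuse earlier builds.
	fmt.Println(goImg != nodeImg, goImg == goAgain, noConfig == emptyConfig, cache.builds)
}
```

Keying on the working directory alone would have returned the first image built for "app" to both evals, which is the collision the combined key avoids.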

Verification Results:
All potential issues were investigated and dismissed:

  • Zero-valued imageKey when eval.Evals is nil is intentional and correct (all evals without custom config should share the same base image)
  • Singleflight correctly deduplicates concurrent builds and doesn't cache errors indefinitely
  • All nil pointer access paths are guarded by callers

The cache key design correctly handles the case where multiple evals need the same image configuration, and the singleflight integration prevents redundant concurrent builds.

No issues found in the changed code.

@dgageot merged commit bd1f0bf into docker:main on Mar 18, 2026
8 checks passed