dev-fleet/evals

eval-cc

A CLI tool that runs Claude Code against a GitHub PR inside a Docker sandbox.

Prerequisites

  • Node.js v20 or later
  • Docker installed and running
  • A GitHub personal access token (for private repos)
  • An Anthropic API key

Setup

# Install dependencies
npm install

Configuration

Create a .eval-cc.json file in your working directory:

cp .eval-cc.json.example .eval-cc.json

Edit the file with your settings:

{
  "repository": "owner/repo",
  "githubToken": "ghp_your_github_token",
  "anthropicApiKey": "sk-ant-your_anthropic_key"
}

Environment Variables

You can also provide tokens via environment variables (they take precedence over the config file):

export GITHUB_TOKEN="ghp_your_github_token"
export ANTHROPIC_API_KEY="sk-ant-your_anthropic_key"
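The precedence rule above can be sketched as a small merge function. This is only an illustration, not the tool's actual source: the `EvalConfig` interface and the `resolveConfig` name are hypothetical, and the field names are taken from the example config file.

```typescript
// Hypothetical shape of .eval-cc.json, based on the example config above.
interface EvalConfig {
  repository: string;
  githubToken?: string;
  anthropicApiKey?: string;
}

// Merge the file config with environment variables.
// A non-empty environment variable wins over the value from the file.
function resolveConfig(
  fileConfig: EvalConfig,
  env: Record<string, string | undefined>,
): EvalConfig {
  return {
    repository: fileConfig.repository,
    githubToken: env["GITHUB_TOKEN"] || fileConfig.githubToken,
    anthropicApiKey: env["ANTHROPIC_API_KEY"] || fileConfig.anthropicApiKey,
  };
}

// Example: only GITHUB_TOKEN is set in the environment.
const resolved = resolveConfig(
  {
    repository: "owner/repo",
    githubToken: "ghp_from_file",
    anthropicApiKey: "sk-ant-from_file",
  },
  { GITHUB_TOKEN: "ghp_from_env" },
);
console.log(resolved.githubToken); // prints ghp_from_env
console.log(resolved.anthropicApiKey); // prints sk-ant-from_file
```

In a real Node.js program the second argument would typically be `process.env`.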

Usage

npm run eval -- --prompt <path-to-prompt.md> --pr <pr-number>

Options

Option      Alias   Required   Description
--prompt    -p      Yes        Path to the prompt file
--pr        -       Yes        PR number to check out
--config    -c      No         Path to config file (default: .eval-cc.json)
--rebuild   -r      No         Force rebuild of Docker image
--help      -h      No         Display help
--version   -V      No         Display version
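To make the option semantics concrete, here is a minimal hand-rolled parser for the options listed above. It is a sketch under stated assumptions, not how eval-cc parses its arguments: the `CliOptions` interface and `parseCliArgs` function are hypothetical, and `--help` and `--version` are left out for brevity.

```typescript
// Hypothetical typed view of the documented options.
interface CliOptions {
  prompt: string;
  pr: number;
  config: string;
  rebuild: boolean;
}

// Short aliases from the options list, mapped to their long names.
const ALIASES: Record<string, string> = {
  "-p": "--prompt",
  "-c": "--config",
  "-r": "--rebuild",
};

function parseCliArgs(argv: string[]): CliOptions {
  let prompt: string | undefined;
  let pr: number | undefined;
  let config = ".eval-cc.json"; // documented default
  let rebuild = false;

  for (let i = 0; i < argv.length; i++) {
    const raw = argv[i];
    const flag = ALIASES[raw] ?? raw;
    switch (flag) {
      case "--prompt":
        prompt = argv[++i]; // next token is the value
        break;
      case "--pr":
        pr = Number(argv[++i]);
        break;
      case "--config":
        config = argv[++i] ?? config;
        break;
      case "--rebuild":
        rebuild = true; // boolean flag, takes no value
        break;
      default:
        throw new Error(`Unknown option: ${raw}`);
    }
  }

  // Both --prompt and --pr are marked as required.
  if (!prompt) throw new Error("--prompt is required");
  if (pr === undefined || !Number.isInteger(pr) || pr <= 0) {
    throw new Error("--pr must be a positive integer");
  }
  return { prompt, pr, config, rebuild };
}

console.log(parseCliArgs(["-p", "prompts/review.md", "--pr", "5", "--rebuild"]));
```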

Examples

# Run with a prompt file against PR #5
npm run eval -- --prompt prompts/review.md --pr 5

# Force rebuild the Docker image (after changing Dockerfile or entrypoint)
npm run eval -- --prompt prompts/review.md --pr 5 --rebuild

# Use a custom config file
npm run eval -- --prompt prompts/review.md --pr 5 --config ./my-config.json

# Show help
npm run eval -- --help

How It Works

  1. Loads configuration from .eval-cc.json (or specified config file)
  2. Reads the prompt file into memory
  3. Builds a Docker image with Claude Code installed (on the first run, or whenever --rebuild is passed)
  4. Runs a Docker container that:
    • Clones the GitHub repository
    • Fetches and checks out the specified PR
    • Runs Claude Code with the prompt and restricted tool permissions
  5. Streams all output to your terminal
  6. Exits when Claude Code completes
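The container steps above can be sketched as plain command lists. Everything named here is an assumption made for illustration: the `checkoutCommands` and `dockerRunArgs` helpers, the `repo` directory, the `pr-<number>` branch name, and the image name and environment variable names in the example are not taken from the eval-cc source. The `pull/<number>/head` ref is the standard GitHub ref for a pull request's head commit.

```typescript
interface RunParams {
  repository: string; // "owner/repo"
  pr: number;
  githubToken?: string; // only needed for private repos
}

// Git commands a container entrypoint could run to clone the
// repository and check out the PR (steps 4a and 4b).
function checkoutCommands(p: RunParams): string[][] {
  const auth = p.githubToken ? `x-access-token:${p.githubToken}@` : "";
  const url = `https://${auth}github.com/${p.repository}.git`;
  const branch = `pr-${p.pr}`;
  return [
    ["git", "clone", url, "repo"],
    ["git", "-C", "repo", "fetch", "origin", `pull/${p.pr}/head:${branch}`],
    ["git", "-C", "repo", "checkout", branch],
  ];
}

// Arguments for `docker run`, passing settings as environment variables.
function dockerRunArgs(image: string, env: Record<string, string>): string[] {
  const args = ["run", "--rm"];
  for (const [key, value] of Object.entries(env)) {
    args.push("-e", `${key}=${value}`);
  }
  args.push(image);
  return args;
}

// Example with a hypothetical image name and variable names.
console.log(checkoutCommands({ repository: "owner/repo", pr: 5 }));
console.log(dockerRunArgs("eval-cc", { REPOSITORY: "owner/repo", PR_NUMBER: "5" }));
```

Returning argument arrays rather than a single shell string avoids quoting problems when the values are later passed to a process-spawning API.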

Allowed Tools

Claude Code runs with the following tools enabled:

  • Bash(git diff:*) - View git diffs
  • Bash(git status:*) - Check git status
  • Bash(git log:*) - View git history
  • Bash(git show:*) - Show git objects
  • Bash(git remote show:*) - Show remote info
  • Read - Read files
  • Glob - Find files by pattern
  • Grep - Search file contents
  • LS - List directories
  • Task - Create subtasks
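As a rough illustration of how this allowlist could reach Claude Code, the sketch below builds an argument list for the `claude` CLI using its `-p` (print mode) and `--allowedTools` flags. The `claudeArgs` helper is hypothetical, and passing the list as one comma-separated value is an assumption; the actual entrypoint may invoke the CLI differently.

```typescript
// The allowlist from the section above, as permission rule strings.
const ALLOWED_TOOLS: readonly string[] = [
  "Bash(git diff:*)",
  "Bash(git status:*)",
  "Bash(git log:*)",
  "Bash(git show:*)",
  "Bash(git remote show:*)",
  "Read",
  "Glob",
  "Grep",
  "LS",
  "Task",
];

// Build arguments for a non-interactive Claude Code run.
// Assumption: the tools are passed as a single comma-separated value.
function claudeArgs(prompt: string): string[] {
  return ["-p", prompt, "--allowedTools", ALLOWED_TOOLS.join(",")];
}

console.log(claudeArgs("Review this PR."));
```

Because only read-only git subcommands and file-inspection tools are listed, a run configured this way can inspect the checked-out PR but has no allowlisted tool for editing files or running arbitrary shell commands.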

Development

# Type check
npm run lint

License

MIT
