test: add prime env eval E2E tests #224

Merged
JannikSt merged 7 commits into main from test/add-env-eval-tests on Dec 2, 2025

JannikSt (Member) commented on Dec 2, 2025

Summary

  • Adds E2E tests for prime env eval command
  • Tests run against Prime Inference with the single_turn_math environment
  • Tests include:
    • Successful evaluation with 1 example/1 rollout
    • Error handling for invalid model names
    • Error handling for missing environments

Test plan

  • Tests require PRIME_API_KEY environment secret (already configured in CI)
  • Run with: uv run pytest packages/prime/tests/test_env_eval.py -v
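The test plan says the tests need PRIME_API_KEY but does not say what happens when it is absent. A minimal sketch of one way to guard for that, assuming the suite should skip rather than fail without the key. The names has_api_key and EnvEvalSmokeTest are hypothetical and do not come from this PR:

```python
import os
import unittest

API_KEY_VAR = "PRIME_API_KEY"


def has_api_key(env=None):
    """Return True if the API key variable is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get(API_KEY_VAR, "").strip())


# Hypothetical test class: skipped as a whole when the key is missing,
# so a local run without the secret reports "skipped" instead of failing.
@unittest.skipUnless(has_api_key(), f"{API_KEY_VAR} is not set")
class EnvEvalSmokeTest(unittest.TestCase):
    def test_key_is_present(self):
        self.assertTrue(has_api_key())
```

pytest collects unittest.TestCase classes, so a guard like this would also apply under the uv run pytest command above.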

Note

Adds E2E tests for prime env eval covering a successful run and graceful failures for an invalid model name and a missing environment.

  • Tests:
    • E2E for prime env eval in packages/prime/tests/test_env_eval.py:
      • Installs single_turn_math test environment via uv pip (with cleanup).
      • Verifies successful eval against single_turn_math using model deepseek/deepseek-chat (1 example, 1 rollout).
      • Asserts failure with informative output for invalid model names.
      • Asserts failure when the specified environment is missing.
      • Runs in isolated temp directories and forwards PRIME_API_KEY from environment.
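The isolation and failure checks listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in test_env_eval.py: run_isolated is a hypothetical helper, and building a minimal environment from PATH plus the forwarded variables is one possible reading of "forwards PRIME_API_KEY".

```python
import os
import subprocess
import sys
import tempfile


def run_isolated(cmd, forward=("PRIME_API_KEY",), timeout=300):
    """Run cmd in a fresh temporary directory with a minimal environment."""
    # Assumption: start from PATH only, then forward the named variables
    # if they are set in the calling process.
    env = {"PATH": os.environ.get("PATH", "")}
    for name in forward:
        if name in os.environ:
            env[name] = os.environ[name]
    # The directory is deleted when the block exits, so runs cannot
    # see each other's files.
    with tempfile.TemporaryDirectory() as workdir:
        return subprocess.run(
            cmd,
            cwd=workdir,
            env=env,
            capture_output=True,
            text=True,
            timeout=timeout,
        )


if __name__ == "__main__":
    # Stand-in command; a real test would pass the CLI invocation instead.
    result = run_isolated([sys.executable, "-c", "import os; print(os.getcwd())"])
    print(result.returncode, result.stdout.strip())
```

In an actual test the command list would begin with prime env eval. The remaining arguments (environment name, model, example and rollout counts) are not shown on this page, so they are left out here. A failure case would then assert that returncode is non-zero and that stdout or stderr contains an informative message.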

Written by Cursor Bugbot for commit 8d353d7. This will update automatically on new commits.

JannikSt force-pushed the test/add-env-eval-tests branch from 7834e32 to 8d353d7 on December 2, 2025 03:45
JannikSt merged commit d416604 into main on Dec 2, 2025 (11 checks passed)
JannikSt deleted the test/add-env-eval-tests branch on December 2, 2025 04:01
JannikSt mentioned this pull request on Dec 4, 2025 (4 tasks)