feat: promptfoo config export (EVAL.yaml → promptfooconfig.yaml) #272

@christso

Summary

Add an agentv export promptfoo command that converts EVAL.yaml into promptfooconfig.yaml, so that users can run AgentV test suites in promptfoo for red-teaming and broader assertion coverage.

Motivation

promptfoo has 50+ vulnerability plugins for red-teaming and security testing that AgentV doesn't replicate. By exporting EVAL.yaml → promptfooconfig.yaml, users can define test cases once in AgentV and run security scans in promptfoo.

Research reference: integration-assessment-promptfoo-braintrust.md

Key Translation Challenges

  • AgentV's tool_trajectory evaluator (structured match modes) has no direct promptfoo equivalent — decompose into javascript assertion with custom logic
  • AgentV's workspace setup/teardown has no promptfoo equivalent — export as test.options.setup/test.options.teardown custom scripts
  • AgentV's structured provider output (OutputMessage[]) is richer than promptfoo's text output — information loss is expected and should be documented
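As a rough illustration of the first challenge, the sketch below generates a promptfoo javascript assertion from a list of expected tool names. The match mode names (exact, in_order, any_order), the helper name toolTrajectoryToAssertion, and the assumed output shape ({ tool_calls: [{ name }] }) are all hypothetical and do not come from the AgentV schema. The only promptfoo behaviour relied on is that a multi-line javascript assertion body receives output and context and returns a result. Because only tool names are compared, the generated comment records the information loss.

```typescript
// Match modes are an ASSUMPTION for this sketch; the issue only says "structured match modes".
type MatchMode = "exact" | "in_order" | "any_order";

interface JavascriptAssertion {
  type: "javascript";
  value: string; // JS function body; promptfoo exposes `output` and `context` to it
}

// Per-mode comparison logic, emitted as JavaScript source text.
const MODE_LOGIC: Record<MatchMode, string[]> = {
  exact: [
    "return actual.length === expected.length && expected.every((n, i) => actual[i] === n);",
  ],
  in_order: [
    "let i = 0;",
    "for (const name of actual) { if (i < expected.length && name === expected[i]) i++; }",
    "return i === expected.length;",
  ],
  any_order: ["return expected.every((n) => actual.includes(n));"],
};

function toolTrajectoryToAssertion(expected: string[], mode: MatchMode): JavascriptAssertion {
  const lines = [
    `// Exported from an AgentV tool_trajectory evaluator (mode: ${mode}).`,
    "// Compares tool names only; arguments and results are not checked.",
    "// ASSUMPTION: output is JSON shaped like { tool_calls: [{ name }] }.",
    `const expected = ${JSON.stringify(expected)};`,
    "const parsed = typeof output === 'string' ? JSON.parse(output) : output;",
    "const actual = (parsed.tool_calls || []).map((c) => c.name);",
    ...MODE_LOGIC[mode],
  ];
  return { type: "javascript", value: lines.join("\n") };
}

// Usage: evaluate the generated body roughly the way an assertion runner would.
const assertion = toolTrajectoryToAssertion(["search", "fetch"], "in_order");
const check = new Function("output", "context", assertion.value);
const sample = JSON.stringify({
  tool_calls: [{ name: "search" }, { name: "log" }, { name: "fetch" }],
});
console.log(check(sample, {})); // prints true
```

Emitting the logic as source text keeps the exported config self-contained, at the cost of duplicating the matching rules outside AgentV.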

CLI Interface

# Convert EVAL.yaml to promptfoo config
agentv export promptfoo ./EVAL.yaml

# Convert with output path
agentv export promptfoo ./EVAL.yaml -o ./promptfooconfig.yaml

# Dry run
agentv export promptfoo ./EVAL.yaml --dry-run

Evaluator → Assertion Mapping (Reverse of Import)

| AgentEvals evaluator | promptfoo assertion | Notes |
| --- | --- | --- |
| llm_judge (freeform) | llm-rubric | Direct |
| llm_judge (rubric) | g-eval or model-graded-closedqa | Depends on rubric structure |
| field_accuracy (exact) | equals | Direct |
| field_accuracy (contains) | contains | Direct |
| field_accuracy (regex) | regex | Direct |
| field_accuracy (json_valid) | is-json | Direct |
| tool_trajectory | javascript (custom) | No native equivalent |
| execution_metrics | cost / latency | Split into individual assertions |
| code_judge | javascript or python | Based on language |
| composite | assert-set (if threshold) or inline | Depends on aggregation strategy |
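The mapping could be implemented as a single dispatch function that returns both assertions and warnings. The following is a minimal sketch under stated assumptions: the Evaluator interface (fields mode, value, language, maxCostUsd, maxLatencyMs) and the function name mapEvaluator are hypothetical simplifications, not the real EVAL.yaml schema. Only the promptfoo assertion type names are taken from the table. The composite case is left out and reported as a warning.

```typescript
// Hypothetical, simplified evaluator shape; the real EVAL.yaml schema may differ.
interface Evaluator {
  type: string;
  mode?: string; // e.g. "freeform", "rubric", "exact", "contains", "regex", "json_valid"
  value?: string;
  language?: "javascript" | "python";
  maxCostUsd?: number;
  maxLatencyMs?: number;
}

interface Assertion {
  type: string;
  value?: string;
  threshold?: number;
}

interface MapResult {
  assertions: Assertion[];
  warnings: string[];
}

// field_accuracy modes that map one-to-one onto promptfoo assertion types.
const FIELD_ACCURACY: Record<string, string> = {
  exact: "equals",
  contains: "contains",
  regex: "regex",
  json_valid: "is-json",
};

// Build an assertion, attaching `value` only when one was supplied.
function withValue(type: string, value?: string): Assertion {
  const a: Assertion = { type };
  if (value !== undefined) a.value = value;
  return a;
}

function mapEvaluator(ev: Evaluator): MapResult {
  const result: MapResult = { assertions: [], warnings: [] };
  switch (ev.type) {
    case "llm_judge":
      if (ev.mode === "rubric") {
        result.assertions.push(withValue("g-eval", ev.value));
        result.warnings.push("llm_judge (rubric): review whether model-graded-closedqa fits better");
      } else {
        result.assertions.push(withValue("llm-rubric", ev.value));
      }
      break;
    case "field_accuracy": {
      const target = FIELD_ACCURACY[ev.mode ?? ""];
      if (target) result.assertions.push(withValue(target, ev.value));
      else result.warnings.push(`field_accuracy: unknown mode "${ev.mode}"`);
      break;
    }
    case "execution_metrics":
      // Split into one assertion per metric, as the table describes.
      if (ev.maxCostUsd !== undefined) result.assertions.push({ type: "cost", threshold: ev.maxCostUsd });
      if (ev.maxLatencyMs !== undefined) result.assertions.push({ type: "latency", threshold: ev.maxLatencyMs });
      break;
    case "code_judge":
      result.assertions.push(withValue(ev.language === "python" ? "python" : "javascript", ev.value));
      break;
    case "tool_trajectory":
      result.assertions.push(withValue("javascript", ev.value));
      result.warnings.push("tool_trajectory: no native equivalent, exported as custom javascript");
      break;
    default:
      result.warnings.push(`${ev.type}: not handled by this sketch`);
  }
  return result;
}

// Usage: one execution_metrics evaluator becomes two assertions.
console.log(mapEvaluator({ type: "execution_metrics", maxCostUsd: 0.05, maxLatencyMs: 3000 }).assertions);
```

Returning warnings next to the assertions gives the --dry-run output and the information-loss notices a single source to read from.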

Acceptance Criteria

  • Converts core evaluator types to promptfoo assertions
  • Generates valid promptfooconfig.yaml that passes promptfoo eval --dry-run
  • Handles tool_trajectory → custom JavaScript assertion with clear comments
  • Preserves test case descriptions and metadata
  • Warns about information loss (structured outputs → text)
  • --dry-run flag shows the mapping without writing

Effort Estimate

3-5 days (after import is complete, reuses mapping logic)
