Summary
Add an agentv export promptfoo command that converts EVAL.yaml into promptfooconfig.yaml, so users can run AgentV test suites in promptfoo for red-teaming and broader assertion coverage.
Motivation
promptfoo has 50+ vulnerability plugins for red-teaming and security testing that AgentV doesn't replicate. By exporting EVAL.yaml → promptfooconfig.yaml, users can define test cases once in AgentV and run security scans in promptfoo.
Research reference: integration-assessment-promptfoo-braintrust.md
Key Translation Challenges
- AgentV's tool_trajectory evaluator (structured match modes) has no direct promptfoo equivalent; decompose it into a javascript assertion with custom logic
- AgentV's workspace setup/teardown has no promptfoo equivalent; export it as test.options.setup/test.options.teardown custom scripts
- AgentV's structured provider output (OutputMessage[]) is richer than promptfoo's text output; information loss is expected and should be documented
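The comparison a custom tool_trajectory assertion would need to perform can be sketched as follows. This is a minimal illustration in Python, not the exporter's actual output: the match mode names (exact, in_order, any_order) and the reduction of a trajectory to a list of tool names are assumptions made for the example, and the real export would emit equivalent logic in a javascript assertion.

```python
from collections import Counter
from typing import List


def trajectory_matches(expected: List[str], actual: List[str], mode: str = "exact") -> bool:
    """Compare expected and actual tool-call names under a match mode.

    The mode names here are hypothetical stand-ins for AgentV's
    structured match modes.
    """
    if mode == "exact":
        # Same tools, same order, nothing extra.
        return actual == expected
    if mode == "in_order":
        # Expected calls must appear as a subsequence of the actual calls.
        remaining = iter(actual)
        return all(name in remaining for name in expected)
    if mode == "any_order":
        # Each expected call must occur at least as often as required.
        needed, seen = Counter(expected), Counter(actual)
        return all(seen[name] >= count for name, count in needed.items())
    raise ValueError(f"unknown match mode: {mode}")
```

Keeping the comparison in one small function like this would make it straightforward to inline into the generated assertion along with comments explaining which mode was exported.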
CLI Interface
# Convert EVAL.yaml to promptfoo config
agentv export promptfoo ./EVAL.yaml
# Convert with output path
agentv export promptfoo ./EVAL.yaml -o ./promptfooconfig.yaml
# Dry run
agentv export promptfoo ./EVAL.yaml --dry-run
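To make the argument shape of the three invocations concrete, here is a small parsing sketch using Python's argparse. It is illustrative only and says nothing about how agentv is actually implemented; the long option name --output and the default output path are assumptions not stated in this proposal.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring: agentv export promptfoo <eval_path> [-o PATH] [--dry-run]."""
    parser = argparse.ArgumentParser(prog="agentv")
    commands = parser.add_subparsers(dest="command", required=True)

    export = commands.add_parser("export")
    targets = export.add_subparsers(dest="target", required=True)

    promptfoo = targets.add_parser("promptfoo")
    promptfoo.add_argument("eval_path", help="path to the EVAL.yaml to convert")
    # Assumption: when -o is omitted, write promptfooconfig.yaml in the current directory.
    promptfoo.add_argument("-o", "--output", default="promptfooconfig.yaml")
    # With --dry-run, the mapping is shown and no file is written.
    promptfoo.add_argument("--dry-run", action="store_true")
    return parser


args = build_parser().parse_args(["export", "promptfoo", "./EVAL.yaml", "--dry-run"])
```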
Evaluator → Assertion Mapping (Reverse of Import)
| AgentEvals evaluator | promptfoo assertion | Notes |
|---|---|---|
| llm_judge (freeform) | llm-rubric | Direct |
| llm_judge (rubric) | g-eval or model-graded-closedqa | Depends on rubric structure |
| field_accuracy (exact) | equals | Direct |
| field_accuracy (contains) | contains | Direct |
| field_accuracy (regex) | regex | Direct |
| field_accuracy (json_valid) | is-json | Direct |
| tool_trajectory | javascript (custom) | No native equivalent |
| execution_metrics | cost / latency | Split into individual assertions |
| code_judge | javascript or python | Based on language |
| composite | assert-set (if threshold) or inline | Depends on aggregation strategy |
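The direct rows of the mapping, plus the execution_metrics split, could be expressed roughly as below. This is a sketch under stated assumptions: the input field names (type, prompt, match, expected, max_cost, max_latency_ms) are hypothetical and do not come from the EVAL.yaml schema. The output dictionaries follow promptfoo's type/value/threshold assertion shape.

```python
from typing import Any, Dict, List

# field_accuracy match kinds and their promptfoo assertion types, per the table.
FIELD_ACCURACY_TYPES = {
    "exact": "equals",
    "contains": "contains",
    "regex": "regex",
    "json_valid": "is-json",
}


def to_promptfoo_assertions(evaluator: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Translate one evaluator (hypothetical field names) into promptfoo assertions."""
    kind = evaluator["type"]
    if kind == "llm_judge":
        # Freeform case only; the rubric case depends on rubric structure.
        return [{"type": "llm-rubric", "value": evaluator["prompt"]}]
    if kind == "field_accuracy":
        assertion: Dict[str, Any] = {"type": FIELD_ACCURACY_TYPES[evaluator["match"]]}
        if "expected" in evaluator:
            assertion["value"] = evaluator["expected"]
        return [assertion]
    if kind == "execution_metrics":
        # One evaluator is split into individual cost and latency assertions.
        assertions: List[Dict[str, Any]] = []
        if "max_cost" in evaluator:
            assertions.append({"type": "cost", "threshold": evaluator["max_cost"]})
        if "max_latency_ms" in evaluator:
            assertions.append({"type": "latency", "threshold": evaluator["max_latency_ms"]})
        return assertions
    # tool_trajectory, code_judge, and composite need custom handling.
    raise NotImplementedError(f"no direct mapping for evaluator type: {kind}")
```

Returning a list even for one-to-one rows keeps the caller uniform, since execution_metrics expands into more than one assertion.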
Acceptance Criteria
- promptfooconfig.yaml that passes promptfoo eval --dry-run
- tool_trajectory → custom JavaScript assertion with clear comments
- --dry-run flag shows the mapping without writing
Effort Estimate
3-5 days (after import is complete, reuses mapping logic)