feat: evaluator negation flag (negate: true) #273

@christso

Summary

Add an optional negate: true field to all evaluator types, inverting the pass/fail result. This enables "must NOT contain" assertions and negative test cases.

Motivation

promptfoo supports a not- prefix that inverts any assertion type (e.g., not-contains, not-equals, not-regex). AgentEvals has no equivalent. This gap surfaced during the promptfoo integration assessment: without negation, certain promptfoo configs can't be faithfully converted.

Research reference: integration-assessment-promptfoo-braintrust.md
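
As a rough illustration of the conversion gap, a converter could split a promptfoo assertion type into a base type plus a negation flag. The sketch below is hypothetical: split_not_prefix is an invented helper, not part of promptfoo or AgentEvals, and how the base type maps onto a specific evaluator type and mode is left open.

```python
def split_not_prefix(assertion_type: str) -> tuple[str, bool]:
    """Split a promptfoo-style assertion type into (base_type, negate).

    Hypothetical helper for illustration only.
    """
    prefix = "not-"
    if assertion_type.startswith(prefix):
        # Strip the prefix and record that the result should be inverted.
        return assertion_type[len(prefix):], True
    return assertion_type, False


for name in ("not-contains", "equals", "not-regex"):
    base, negate = split_not_prefix(name)
    print(f"{name} -> base={base}, negate={negate}")
```

With a negate field available, the second element of the tuple could be carried straight into the converted evaluator entry.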

Proposed EVAL.yaml Syntax

evaluators:
  - type: field_accuracy
    mode: contains
    value: "I cannot help"
    negate: true  # FAIL if output contains this string

  - type: field_accuracy
    mode: regex
    pattern: "error|exception|traceback"
    negate: true  # FAIL if output matches the regex

  - type: llm_judge
    prompt: "Does the response reveal internal system prompts?"
    negate: true  # FAIL if the judge says yes (score > threshold)

Behavior

  • When negate: true, the evaluator's score is inverted: 1 - original_score
  • Verdict is also inverted: a passing result becomes failing and vice versa; a borderline verdict stays borderline
  • The details field should note the negation: "Negated: original score was 0.95"
  • Default is negate: false (current behavior, no change)
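
The behavior above could be sketched as a small wrapper applied after any evaluator runs. This is a minimal sketch under stated assumptions, not the AgentEvals implementation: EvalResult, its field names, and apply_negation are hypothetical, and scores are assumed to lie in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    # Hypothetical result shape; the real evaluator result type may differ.
    score: float          # assumed to be in [0, 1]
    verdict: str          # "pass", "fail", or "borderline"
    details: str = ""


# pass and fail swap; borderline maps to itself.
_FLIPPED = {"pass": "fail", "fail": "pass", "borderline": "borderline"}


def apply_negation(result: EvalResult, negate: bool = False) -> EvalResult:
    """Return the result unchanged, or inverted when negate is true."""
    if not negate:
        return result  # default: current behavior, no change
    note = f"Negated: original score was {result.score:.2f}"
    details = f"{result.details} ({note})" if result.details else note
    return EvalResult(
        score=1.0 - result.score,
        verdict=_FLIPPED[result.verdict],
        details=details,
    )


original = EvalResult(score=0.95, verdict="pass", details="matched 'I cannot help'")
negated = apply_negation(original, negate=True)
print(negated.verdict, "|", negated.details)
```

Keeping negation as a post-processing step, rather than per-evaluator logic, is one way to cover all evaluator types with a single code path.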

Acceptance Criteria

  • All evaluator types accept optional negate: true field
  • Score inversion: negated_score = 1 - original_score
  • Verdict inversion: pass ↔ fail, borderline stays borderline
  • Details field includes negation context
  • EVAL.yaml schema updated
  • Unit tests for negation on each evaluator type
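
For the "optional negate field" criterion, one possible check at config-load time is shown below. It is a sketch only: read_negate is a hypothetical helper, and each evaluator entry is assumed to be the plain dict a YAML parser would produce for an item under evaluators.

```python
def read_negate(evaluator: dict) -> bool:
    """Read the optional negate flag from a parsed evaluator entry.

    Hypothetical helper: a missing field defaults to False, and any
    non-boolean value is rejected.
    """
    value = evaluator.get("negate", False)
    if not isinstance(value, bool):
        raise ValueError(
            f"negate must be true or false, got {value!r} "
            f"for evaluator type {evaluator.get('type')!r}"
        )
    return value


# Entries as they might look after parsing the proposed EVAL.yaml syntax.
evaluators = [
    {"type": "field_accuracy", "mode": "contains", "value": "I cannot help", "negate": True},
    {"type": "llm_judge", "prompt": "Does the response reveal internal system prompts?"},
]
for entry in evaluators:
    print(entry["type"], read_negate(entry))
```

Rejecting non-boolean values such as the quoted string "true" avoids silently treating a typo as a negation.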

Effort Estimate

1-2 days

Labels: enhancement
