Add adversarial/edge-case trap tasks #335

@ScuttleBot

Goal

Add tasks where the obvious approach is wrong, testing genuine reasoning over pattern matching.

Task Types

Red Herring Tasks

  • Provide irrelevant but distracting information
  • Include an "obvious" solution that fails on edge cases
  • Supply context that suggests the wrong approach
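As a rough illustration of an "obvious" solution that fails on edge cases, here is a minimal sketch built around leap-year detection. The function names, `TRAP_CASES`, and `passes_trap` are hypothetical and not part of any existing harness; the point is only that the hidden cases include inputs the common shortcut gets wrong.

```python
def is_leap_obvious(year: int) -> bool:
    # The shortcut many snippets show; wrong for century years such as 1900.
    return year % 4 == 0


def is_leap_correct(year: int) -> bool:
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Hypothetical hidden cases: (input, expected). 1900 is the trap.
TRAP_CASES = [(2024, True), (2023, False), (1900, False), (2000, True)]


def passes_trap(candidate) -> bool:
    # A candidate avoids the trap only if it is right on every hidden case.
    return all(candidate(year) == expected for year, expected in TRAP_CASES)
```

A submission equivalent to `is_leap_obvious` passes the common cases but fails `passes_trap`, which is the behavior a red herring task is meant to surface.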

Edge Case Gauntlets

  • Off-by-one scenarios in dates/times/counting
  • Boundary conditions (empty lists, single items, max values)
  • Unicode/encoding edge cases
  • Timezone handling across DST boundaries
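A few of these edge cases can be sketched with the standard library alone. The helpers below (`days_inclusive`, `safe_mean`, `same_text`) are hypothetical names chosen for illustration, and returning `None` for an empty list is one possible convention, not a requirement from this issue.

```python
import unicodedata
from datetime import date
from typing import Optional, Sequence


def days_inclusive(start: date, end: date) -> int:
    # Counting both endpoints: plain subtraction gives one fewer day.
    return (end - start).days + 1


def safe_mean(values: Sequence[float]) -> Optional[float]:
    # Empty input is a boundary condition; avoid dividing by zero.
    return sum(values) / len(values) if values else None


def same_text(a: str, b: str) -> bool:
    # Normalize to NFC so composed and decomposed forms compare equal.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

For example, `days_inclusive(date(2024, 2, 28), date(2024, 3, 1))` has to account for both the leap day and the inclusive endpoint, and `same_text("caf\u00e9", "cafe\u0301")` compares two strings that look identical but differ at the code point level.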

Inherited Mess Tasks (Recovery-Bench style)

  • Workspace containing prior failed attempts that need cleanup
  • Broken state that the agent must diagnose before fixing
  • Conflicting partial solutions left behind

Specific Task Ideas

  1. The Misleading Log: The error message points to the wrong root cause
  2. Off-by-One Gauntlet: Five date/time operations where the boundaries matter
  3. The Cleanup Job: A previous agent left half-done work with bugs
  4. The Obvious Trap: A task where the copy-paste solution from the docs fails

Grading

  • Binary: did they avoid the trap?
  • Bonus: did they explain why the obvious approach fails?
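One way the binary-plus-bonus scheme might be encoded is sketched below. `TrapResult`, `score`, and the `bonus` weight of 0.5 are assumptions made for illustration; the issue does not specify weights, nor whether the bonus applies when the trap was not avoided (this sketch assumes it does not).

```python
from dataclasses import dataclass


@dataclass
class TrapResult:
    avoided_trap: bool       # binary: final answer passes the hidden edge cases
    explained_failure: bool  # bonus: response says why the obvious approach fails


def score(result: TrapResult, bonus: float = 0.5) -> float:
    # Assumption: the bonus only counts when the trap was actually avoided.
    if not result.avoided_trap:
        return 0.0
    return 1.0 + (bonus if result.explained_failure else 0.0)
```

Keeping the two signals as separate fields means the binary outcome can still be reported on its own, even if the bonus weighting changes later.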

Success Criteria

  • Traps should catch more than 30% of evaluated models
  • Tasks should differentiate reasoning from pattern matching
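The 30% criterion could be checked per trap with something like the sketch below. `catch_rate` and `meets_target` are hypothetical helpers, the input is assumed to map a model name to whether that model avoided the trap, and the threshold is read as strictly greater than 30%.

```python
from typing import Dict


def catch_rate(avoided_by_model: Dict[str, bool]) -> float:
    # Fraction of models that fell for the trap (did not avoid it).
    if not avoided_by_model:
        raise ValueError("need at least one model result")
    caught = sum(1 for avoided in avoided_by_model.values() if not avoided)
    return caught / len(avoided_by_model)


def meets_target(rate: float, threshold: float = 0.30) -> bool:
    # Assumption: ">30%" is strict, so exactly 30% does not qualify.
    return rate > threshold
```

With four models of which two avoid the trap, `catch_rate` returns 0.5, which clears the threshold; a trap that every model avoids yields 0.0 and would be a candidate for redesign.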

References

  • ARC-AGI design philosophy
  • Recovery-Bench: evaluating error recovery
