
Add graduated rubric scoring (partial credit) #331

@ScuttleBot


Goal

Convert binary (0/1) rubric criteria to a graduated five-level scale (0.0 / 0.25 / 0.5 / 0.75 / 1.0) so that scores better differentiate model quality.

Background

Current tasks often score each criterion as either 0 or 1, which misses subtle differences in solution quality. For example, a model that produces a correct but inefficient solution scores the same as one with an elegant approach.
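A minimal illustration of the problem, using made-up criteria and scores (not taken from any real task) and assuming a task's score is the unweighted mean of its criteria: both solutions clear every binary bar, so they tie, while a graduated rubric separates them.

```python
from statistics import mean

# Hypothetical per-criterion scores for two correct solutions to one task.
# Under a binary rubric, both clear every pass/fail bar.
binary = {
    "correct_but_inefficient": {"correctness": 1, "efficiency": 1, "clarity": 1},
    "elegant": {"correctness": 1, "efficiency": 1, "clarity": 1},
}

# Under a graduated rubric, the grader records how well each bar was cleared.
graduated = {
    "correct_but_inefficient": {"correctness": 1.0, "efficiency": 0.5, "clarity": 0.75},
    "elegant": {"correctness": 1.0, "efficiency": 1.0, "clarity": 1.0},
}


def task_score(scores: dict[str, float]) -> float:
    """Unweighted mean over criteria (an assumed aggregation rule)."""
    return mean(scores.values())


for name in binary:
    print(f"{name}: binary={task_score(binary[name]):.2f}, "
          f"graduated={task_score(graduated[name]):.2f}")
```

Here the binary rubric gives both solutions 1.00, while the graduated rubric gives 0.75 and 1.00 respectively.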

Implementation

  1. Update grading infrastructure to support graduated scores
  2. For each criterion, define quality levels:
    • 0.0: Missing/wrong
    • 0.25: Partially correct, major issues
    • 0.5: Mostly correct, some issues
    • 0.75: Correct with minor issues
    • 1.0: Fully correct/excellent
  3. Update existing tasks to use graduated criteria where appropriate
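One possible shape for steps 1 to 3, sketched under assumptions: the names `Level`, `Criterion`, and `aggregate` are hypothetical (the issue does not specify the grading code), per-criterion scores are combined by a weighted mean, and a `graduated=False` flag lets a criterion keep the legacy 0/1 behaviour where graduated levels are not appropriate.

```python
from dataclasses import dataclass
from enum import Enum


class Level(float, Enum):
    """The five quality levels from the issue (member names are hypothetical)."""
    MISSING = 0.0        # Missing/wrong
    MAJOR_ISSUES = 0.25  # Partially correct, major issues
    SOME_ISSUES = 0.5    # Mostly correct, some issues
    MINOR_ISSUES = 0.75  # Correct with minor issues
    FULL = 1.0           # Fully correct/excellent


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float = 1.0
    graduated: bool = True  # False keeps the legacy binary (0/1) scale

    def validate(self, score: float) -> float:
        """Reject scores that are not on this criterion's allowed scale."""
        allowed = {lvl.value for lvl in Level} if self.graduated else {0.0, 1.0}
        if score not in allowed:
            raise ValueError(f"{self.name}: {score} not in {sorted(allowed)}")
        return float(score)


def aggregate(criteria: list[Criterion], scores: dict[str, float]) -> float:
    """Weighted mean of validated per-criterion scores, in [0, 1]."""
    total_weight = sum(c.weight for c in criteria)
    weighted = sum(c.weight * c.validate(scores[c.name]) for c in criteria)
    return weighted / total_weight


# Usage: a mixed rubric with one binary criterion kept as-is.
rubric = [
    Criterion("correctness", weight=2.0),
    Criterion("efficiency"),
    Criterion("compiles", graduated=False),
]
print(aggregate(rubric, {"correctness": 1.0, "efficiency": 0.5, "compiles": 1.0}))
```

The example prints 0.875, i.e. (2 × 1.0 + 0.5 + 1.0) / 4. Validating against a fixed set of levels, rather than accepting any float, keeps graders on the agreed scale; whether to enforce this strictly is a design choice.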

Success Criteria

  • Score distributions should be less concentrated at 0 and 1, with more distinct values in between
  • Model rankings should be more stable (less sensitive to where binary pass/fail cutoffs fall)
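The success criteria could be checked with simple statistics. This is a sketch of one possible operationalization, not a metric the issue prescribes: the share of scores sitting exactly at 0 or 1, the number of distinct score values, and a split-half check that compares model rankings computed on even- versus odd-indexed tasks. All function names and data below are hypothetical.

```python
from itertools import combinations
from statistics import mean


def extreme_fraction(scores: list[float]) -> float:
    """Share of scores sitting exactly at 0 or 1."""
    return sum(s in (0.0, 1.0) for s in scores) / len(scores)


def pairwise_rank_agreement(a: dict[str, float], b: dict[str, float]) -> float:
    """Fraction of model pairs ordered the same way by two score sets
    (a simple Kendall-tau-style concordance; ties count as disagreement)."""
    pairs = list(combinations(sorted(a), 2))
    agree = sum((a[x] - a[y]) * (b[x] - b[y]) > 0 for x, y in pairs)
    return agree / len(pairs)


def split_half_stability(task_scores: dict[str, list[float]]) -> float:
    """Agreement between rankings built from even- and odd-indexed tasks."""
    even = {m: mean(s[0::2]) for m, s in task_scores.items()}
    odd = {m: mean(s[1::2]) for m, s in task_scores.items()}
    return pairwise_rank_agreement(even, odd)


# Hypothetical per-criterion scores before and after the change.
binary_scores = [0, 1, 1, 0, 1, 1, 1, 0]
graded_scores = [0.25, 1.0, 0.75, 0.5, 1.0, 0.75, 1.0, 0.0]
print(extreme_fraction(binary_scores), len(set(binary_scores)))
print(extreme_fraction(graded_scores), len(set(graded_scores)))

# Hypothetical per-task scores for three models.
task_scores = {
    "A": [1.0, 0.75, 0.75, 1.0],
    "B": [0.5, 0.5, 0.75, 0.25],
    "C": [0.25, 0.0, 0.0, 0.25],
}
print(split_half_stability(task_scores))
```

In practice the split-half comparison would be repeated over many random task splits and compared between the binary and graduated versions of the same tasks.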

References

  • RACE Benchmark multi-dimensional scoring
  • Item Response Theory discrimination parameters
