Add graduated rubric scoring (partial credit)

## Goal
Replace binary (0/1) rubric criteria with a graduated five-level scale (0.0 / 0.25 / 0.5 / 0.75 / 1.0) so that scores better differentiate model quality.

## Background
Current tasks often score each criterion as 0 or 1, which misses meaningful differences in solution quality. For example, a model that produces a correct but inefficient solution receives the same score as one with an elegant approach.

## Implementation
1. Update the grading infrastructure to accept fractional per-criterion scores, not only 0/1
2. For each criterion, define quality levels:
   - 0.0: Missing/wrong
   - 0.25: Partially correct, major issues
   - 0.5: Mostly correct, some issues
   - 0.75: Correct with minor issues
   - 1.0: Fully correct/excellent
3. Update existing tasks to use graduated criteria where appropriate
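A minimal sketch of steps 1 to 3, assuming a Python grader where each task is a list of per-criterion scores averaged into a task score. `Level`, `CriterionScore`, `from_binary`, and `task_score` are hypothetical names, not existing grading APIs; unweighted averaging is also an assumption (per-criterion weights could be added later).

```python
from dataclasses import dataclass
from enum import Enum


class Level(float, Enum):
    """Hypothetical encoding of the five quality levels listed above."""
    MISSING_OR_WRONG = 0.0
    PARTIAL_MAJOR_ISSUES = 0.25
    MOSTLY_CORRECT = 0.5
    MINOR_ISSUES = 0.75
    FULLY_CORRECT = 1.0


# The only scores a criterion may take.
ALLOWED = {level.value for level in Level}


@dataclass(frozen=True)
class CriterionScore:
    name: str
    score: float

    def __post_init__(self):
        # Reject anything off the graduated scale (e.g. 0.3).
        if self.score not in ALLOWED:
            raise ValueError(f"{self.name}: {self.score} is not one of {sorted(ALLOWED)}")


def from_binary(name: str, passed: bool) -> CriterionScore:
    """Legacy 0/1 criteria map onto the endpoints of the new scale."""
    level = Level.FULLY_CORRECT if passed else Level.MISSING_OR_WRONG
    return CriterionScore(name, level.value)


def task_score(criteria: list[CriterionScore]) -> float:
    """Unweighted mean of per-criterion scores (an assumed aggregation rule)."""
    if not criteria:
        raise ValueError("a task needs at least one criterion")
    return sum(c.score for c in criteria) / len(criteria)


# Correct-but-inefficient vs. elegant, graded on two hypothetical criteria.
inefficient = [CriterionScore("correctness", 1.0), CriterionScore("efficiency", 0.25)]
elegant = [CriterionScore("correctness", 1.0), CriterionScore("efficiency", 1.0)]
print(task_score(inefficient), task_score(elegant))  # 0.625 1.0
```

Mapping legacy pass/fail onto 0.0 and 1.0 keeps unconverted tasks valid, so step 3 can proceed task by task.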

## Success Criteria
- Score distribution should show more variance (not clustering at 0 and 1)
- Model rankings should be more stable (less sensitivity to binary cutoffs)
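One way the two success criteria could be checked, sketched under assumptions: "clustering" is measured as the share of criterion scores sitting exactly at 0.0 or 1.0, and ranking stability is approximated by Kendall's tau-a between model rankings computed on two disjoint halves of the tasks (split-half). The function names and the split-half proxy are hypothetical choices, not an agreed-upon protocol.

```python
from itertools import combinations
from statistics import mean, pvariance


def extreme_fraction(scores: list[float]) -> float:
    """Share of criterion scores sitting exactly at 0.0 or 1.0."""
    if not scores:
        raise ValueError("no scores")
    return sum(1 for s in scores if s in (0.0, 1.0)) / len(scores)


def distribution_summary(scores: list[float]) -> dict[str, float]:
    """Population variance plus clustering at the extremes."""
    return {"variance": pvariance(scores), "extreme_fraction": extreme_fraction(scores)}


def kendall_tau(a: dict[str, float], b: dict[str, float]) -> float:
    """Kendall's tau-a between two {model: score} dicts over the same models."""
    if set(a) != set(b):
        raise ValueError("both rankings must cover the same models")
    models = sorted(a)
    n = len(models)
    if n < 2:
        raise ValueError("need at least two models")
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        product = (a[m1] - a[m2]) * (b[m1] - b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1  # tied pairs count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


def split_half_stability(task_scores: dict[str, list[float]]) -> float:
    """task_scores maps model -> per-task scores (same task order, >= 2 tasks).
    Compares rankings built from even-indexed vs. odd-indexed tasks."""
    even = {m: mean(s[0::2]) for m, s in task_scores.items()}
    odd = {m: mean(s[1::2]) for m, s in task_scores.items()}
    return kendall_tau(even, odd)
```

Running both functions on the same task set, once with binary and once with graduated grading, would let the two criteria be compared directly: a lower `extreme_fraction` and a higher split-half tau under graduated scoring.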

## References
- RACE Benchmark multi-dimensional scoring
- Item Response Theory discrimination parameters

Add graduated rubric scoring (partial credit) #331

Goal

Background

Implementation

Success Criteria

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add graduated rubric scoring (partial credit) #331

Description

Goal

Background

Implementation

Success Criteria

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions