Goal
Test the ability to manage, navigate, and selectively retrieve information from large contexts. Many current tasks provide all of their information upfront in digestible chunks.
Capabilities to Test
- Selective retrieval: Find relevant info in large documents
- State maintenance: Track information across many turns
- Multi-source synthesis: Combine info from multiple documents
Task Ideas
Long Document Tasks
- 50-page meeting transcript → extract action items for ONE specific attendee
- Large codebase (10+ files) → find and fix a bug described only by its symptoms
- Legal document → answer specific questions requiring cross-referencing sections
Multi-Document Synthesis
- Combine info from 5+ related documents into coherent analysis
- Reconcile conflicting information across sources
- Build timeline from scattered references
State Tracking
- Multi-turn conversation requiring recall of earlier details
- Incremental updates to a complex data structure
- Long debugging session requiring memory of what was tried
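One way to make the "incremental updates" idea gradable: generate a sequence of update operations, drip them into the conversation over many turns, and compare the model's final answer against a reference state computed programmatically. A minimal sketch (the `apply_updates` helper and its dotted-path update format are hypothetical, not part of any existing harness):

```python
def apply_updates(state: dict, updates: list[tuple[str, str, object]]) -> dict:
    """Apply (op, dotted_path, value) updates in order to a nested dict.

    The grader runs this to produce the ground-truth final state, then
    compares it against the state the model reports after all turns.
    """
    for op, path, value in updates:
        *parents, key = path.split(".")
        node = state
        for p in parents:
            node = node.setdefault(p, {})  # create intermediate levels as needed
        if op == "set":
            node[key] = value
        elif op == "delete":
            node.pop(key, None)
    return state
```

Each turn of the conversation would deliver a few of these updates in prose; only a model that tracked every earlier turn can reproduce the final structure.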
Implementation Notes
- Use workspace_files to provide large documents
- Ensure the relevant info is buried mid-document, never at the start
- Include plausible distractors
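The burying and distractor requirements can be enforced at generation time. A minimal sketch, assuming a simple paragraph-based document builder (`build_haystack` and its parameters are illustrative names, not an existing tool):

```python
import random

def build_haystack(needle: str, distractors: list[str], filler: list[str],
                   n_paragraphs: int = 200, seed: int = 0) -> str:
    """Assemble a long document with the needle buried mid-document,
    distractors scattered elsewhere, and filler everywhere else."""
    rng = random.Random(seed)  # seeded for reproducible task generation
    paragraphs = [rng.choice(filler) for _ in range(n_paragraphs)]
    # Place the needle in the middle 60% of the document, never near the start.
    needle_pos = rng.randrange(int(n_paragraphs * 0.2), int(n_paragraphs * 0.8))
    paragraphs[needle_pos] = needle
    # Scatter plausible distractors at distinct positions away from the needle.
    open_slots = [i for i in range(n_paragraphs) if i != needle_pos]
    for d, pos in zip(distractors, rng.sample(open_slots, len(distractors))):
        paragraphs[pos] = d
    return "\n\n".join(paragraphs)
```

The resulting text can then be written out via workspace_files; the seed makes each task instance reproducible for grading.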
Success Criteria
- Tasks should require >10K tokens of context navigation
- Models should fail if they can't selectively attend
- Synthesis tasks should differentiate summarization quality
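The >10K-token floor can be checked automatically when assembling a task. A minimal sketch using a rough chars-per-token heuristic (the ~4-characters-per-token ratio is a common rule of thumb for English prose, not exact; a real tokenizer should replace it in production):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def meets_context_floor(documents: list[str], floor: int = 10_000) -> bool:
    """Check that the combined task context clears the token floor."""
    return sum(estimate_tokens(d) for d in documents) > floor
```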
References
- Long-context benchmarks (RULER, LongBench)
- Real-world analogue: analysts routinely work with large document sets