@ammar-agent
Collaborator

Add safety guidance to the tbench skill for HuggingFace leaderboard uploads:

  • Always check for existing open PRs before creating new ones: uploads often time out but still succeed server-side, leaving orphaned duplicate PRs
  • Push corrections to the existing PR via the revision param; never re-create it
  • Close accidental duplicates immediately
  • Do not coalesce multiple runs into a single job folder: the validator checks that each trial's config.job_id matches its parent job's id
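
The PR-reuse rules above can be sketched as a small planning step that runs before any upload. This is a minimal sketch, not the tbench skill's actual code: the OpenPR and UploadPlan types and plan_upload are hypothetical, and it assumes a submission's PRs can be identified by an exact title match. With huggingface_hub, the open PRs would come from HfApi.get_repo_discussions() (its Discussion objects expose num, title, status and is_pull_request), corrections would be pushed with upload_folder(..., revision="refs/pr/<num>"), and duplicates would be closed with HfApi.change_discussion_status(...).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenPR:
    """Hypothetical stand-in for an open leaderboard PR (num + title only)."""
    num: int
    title: str


@dataclass
class UploadPlan:
    reuse_pr: Optional[int]  # PR number to push to, or None to create a new PR
    close_prs: list[int]     # accidental duplicates that should be closed

    @property
    def revision(self) -> Optional[str]:
        # Hugging Face exposes PR heads as refs/pr/<num>; this string is what
        # would be passed as `revision` when pushing corrections to that PR.
        return None if self.reuse_pr is None else f"refs/pr/{self.reuse_pr}"


def plan_upload(open_prs: list[OpenPR], submission_title: str) -> UploadPlan:
    """Decide whether to reuse an existing open PR instead of creating one.

    Assumption: PRs for a submission share an exact title. The oldest match
    is kept; any later matches are treated as orphaned duplicates left
    behind by uploads that timed out client-side but succeeded on the server.
    """
    matches = sorted(
        (pr for pr in open_prs if pr.title == submission_title),
        key=lambda pr: pr.num,
    )
    if not matches:
        return UploadPlan(reuse_pr=None, close_prs=[])
    keep, *dupes = matches
    return UploadPlan(reuse_pr=keep.num, close_prs=[pr.num for pr in dupes])


if __name__ == "__main__":
    prs = [OpenPR(7, "mux tbench run"), OpenPR(9, "mux tbench run"), OpenPR(8, "other")]
    plan = plan_upload(prs, "mux tbench run")
    print(plan.reuse_pr, plan.close_prs, plan.revision)  # prints: 7 [9] refs/pr/7
```

Keeping the oldest PR is a choice made in this sketch: it is the one most likely to already carry reviewer comments, so the later copies are the ones closed.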

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $22.40

- Always check for existing open PRs before creating new ones (uploads
  often time out but succeed server-side)
- Push corrections to existing PRs via revision param, never re-create
- Close accidental duplicates immediately
- Do not coalesce runs into one job folder (validator checks job_id)
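
The job_id rule can be illustrated with a toy consistency check. This is a hypothetical sketch: the dict shape below is an assumption standing in for the real job and trial config files, and find_job_id_mismatches only mirrors the check described above (each trial's config.job_id must equal its parent job's id). It shows why coalescing two runs into one folder fails: trials copied from the second run still carry that run's job_id.

```python
from typing import Any


def find_job_id_mismatches(job: dict[str, Any]) -> list[str]:
    """Return the names of trials whose config.job_id differs from job["id"].

    Hypothetical shape (not the real leaderboard schema):
        {"id": "...", "trials": [{"name": "...", "config": {"job_id": "..."}}]}
    """
    job_id = job["id"]
    return [
        trial["name"]
        for trial in job["trials"]
        if trial.get("config", {}).get("job_id") != job_id
    ]


def coalesce(job_a: dict[str, Any], job_b: dict[str, Any]) -> dict[str, Any]:
    """Naively merge two runs into one job folder (the pattern to avoid)."""
    return {"id": job_a["id"], "trials": job_a["trials"] + job_b["trials"]}


if __name__ == "__main__":
    run_a = {"id": "job-a", "trials": [
        {"name": "t1", "config": {"job_id": "job-a"}},
        {"name": "t2", "config": {"job_id": "job-a"}},
    ]}
    run_b = {"id": "job-b", "trials": [
        {"name": "t3", "config": {"job_id": "job-b"}},
    ]}
    # Each run passes on its own; the merged folder does not.
    print(find_job_id_mismatches(coalesce(run_a, run_b)))  # prints: ['t3']
```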
@ammar-agent force-pushed the bench/multi-run-leaderboard-submission branch from f3f8874 to 6b5f056 on February 12, 2026 15:20