
Add OpenAI benchmark sweep and analysis tooling #99

Closed
Muhtasham wants to merge 1 commit into CodeClash-ai:main from Muhtasham:codex/feat-openai-sweep-tooling


@Muhtasham
Contributor

Summary

  • add OpenAI benchmark runner scripts, progress watching, and eval pipeline helpers for sweep workflows
  • add generated configs, reporting scripts, and feedback docs for GPT-5.4 / GPT-5.3-Codex analysis
  • preserve model aliases in Elo and win-rate analysis, plus fix RoboCode tie handling

Changes

  • sweep tooling: add run_openai_sweep.sh, run_openai_model_benchmarks.sh, run helpers, watcher updates, and eval/report scripts
  • analysis: update Elo/win-rate/heatmap display logic to keep per-alias model names and generate shareable comparison plots
  • arenas: fix RoboCode round winner selection on tied scores and add a regression test
  • configs/docs: add generated OpenAI matchup configs, ablation scaffold files, and OpenAI feedback notes
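The PR describes the alias-preserving analysis change but does not show the code. As a rough illustration only, here is a minimal sketch assuming a plain sequential Elo update with K = 32 and a base rating of 1200 (both assumptions, not CodeClash's actual parameters) and hypothetical alias strings `alias_a` / `alias_b`. It shows why ratings should be keyed by the per-alias name: if two aliases of the same underlying model were collapsed to one base name, a match between them would turn into a self-play entry and their separate ratings would be lost.

```python
from collections import defaultdict


def update_elo(ratings, player_a, player_b, score_a, k=32.0):
    """Apply one Elo update in place.

    score_a is 1.0 if player_a won, 0.5 for a tie, 0.0 if player_a lost.
    """
    ra = ratings[player_a]
    rb = ratings[player_b]
    # Expected score for player_a under the standard logistic Elo model.
    expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
    delta = k * (score_a - expected_a)
    ratings[player_a] = ra + delta
    ratings[player_b] = rb - delta


def compute_elo(matches, base=1200.0, k=32.0):
    """Compute ratings keyed by the exact alias string in each match.

    matches: iterable of (alias_a, alias_b, score_a) tuples. Aliases are
    used verbatim as keys, so two configurations of the same model keep
    separate ratings instead of being merged under one base model name.
    """
    ratings = defaultdict(lambda: base)
    for a, b, score_a in matches:
        update_elo(ratings, a, b, score_a, k)
    return dict(ratings)
```

For example, `compute_elo([("alias_a", "alias_b", 1.0)])` starts both aliases at 1200 with an expected score of 0.5, so the winner moves up by 16 points and the loser down by 16.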

Test plan

  • reran the eval pipeline on completed OpenAI sweep logs and generated leaderboard/plot artifacts
  • verified RoboCode tie handling with a direct `uv run python` harness, which returns `Tie` for 0-0 scores
  • not yet run: full automated pytest coverage, pending installation of pytest in the project environment
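A tie-aware round-winner rule of the kind the RoboCode fix describes can be sketched as follows. The function names are hypothetical and this is not the actual RoboCode arena code; it contrasts a naive `max(...)` selection, which silently awards a tied round to whichever player comes first in the dict, with a version that returns `Tie` whenever more than one player shares the top score.

```python
from __future__ import annotations


def naive_round_winner(scores: dict[str, float]) -> str:
    # max() returns the first maximal key in iteration order, so a tied
    # round (e.g. 0-0) is awarded to whichever player was inserted first.
    return max(scores, key=scores.get)


def round_winner(scores: dict[str, float]) -> str:
    """Return the single player with the highest score, or "Tie"."""
    if not scores:
        # No scores recorded: treat as no winner (an assumption of this sketch).
        return "Tie"
    top = max(scores.values())
    leaders = [name for name, score in scores.items() if score == top]
    # Only a strict, unique leader wins the round.
    return leaders[0] if len(leaders) == 1 else "Tie"
```

The 0-0 case from the test plan is exactly the input the naive rule gets wrong: `naive_round_winner({"p1": 0, "p2": 0})` picks `p1`, while `round_winner` reports `Tie`.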

- add benchmark runners, watcher, and eval pipeline scripts for OpenAI sweeps
- add generated configs, reporting utilities, and OpenAI feedback notes
- preserve model aliases in analysis and fix RoboCode tie handling
Muhtasham closed this on Apr 26, 2026
Muhtasham deleted the codex/feat-openai-sweep-tooling branch on Apr 26, 2026 at 07:03