Measuring what Matters is a systematic review of construct validity practices in benchmarks for LLMs. This repository contains the code and data from the review.
am-bean/benchmark_review