This parent issue covers the implementation work needed to run baseline zero-shot evaluations on selected LLMs. Sub-issues under this parent should define the shared API interface, response logging, scoring, and execution of the zero-shot runs.