Distinguish the models used in the executor and evaluator

Suggestion from @Qian-Cheng-nju: It might be beneficial to extend the env.toml format to distinguish between the LLM used for workflow/judging and the LLM used as the model under evaluation. With that structure, the code could check whether the required evaluation-model API keys are provided, which would make it clear to users which API keys are optional and which are necessary for the benchmark to run correctly.

This also seems helpful for future integrations: when multiple benchmarks coexist in the same installation, users need to know that some API keys are mandatory, because a benchmark will not run correctly without them. Having this distinction also allows us to check the required API keys before execution.
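One way the pre-execution check could work when several benchmarks share an installation is sketched below. The benchmark names, the role names, the `BENCHMARK_REQUIREMENTS` registry, and `preflight_check` are hypothetical and only illustrate the idea of failing early with a clear message.

```python
# Hypothetical registry: the model roles each benchmark needs API keys for.
BENCHMARK_REQUIREMENTS = {
    "benchmark_a": ["evaluated"],
    "benchmark_b": ["evaluated", "workflow"],
}


class MissingApiKeyError(RuntimeError):
    """Raised before execution when a mandatory API key is not configured."""


def preflight_check(benchmark, config, requirements=BENCHMARK_REQUIREMENTS):
    """Raise MissingApiKeyError if any role the benchmark requires has no key.

    `config` is the parsed env.toml as a dict of role -> settings (assumed shape).
    """
    missing = [
        role
        for role in requirements[benchmark]
        if not str(config.get(role, {}).get("api_key", "")).strip()
    ]
    if missing:
        raise MissingApiKeyError(
            f"{benchmark}: missing API key for role(s): {', '.join(missing)}"
        )
```

Running the check once per selected benchmark before any work starts would surface a missing mandatory key immediately instead of partway through a run.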


Distinguish the models used in the executor and evaluator #1
