Description
Suggestion from @Qian-Cheng-nju: it might be beneficial to extend the env.toml format to distinguish between the LLM used for workflow/judging and the LLM used as the model under evaluation. With that structure, the code could check whether the required evaluation-model API keys are provided, making it clear to users which API keys are optional and which are necessary for the benchmark to run correctly.
This also seems helpful for future integrations. When multiple benchmarks coexist in the same installation, users need to know which API keys are mandatory, because a benchmark will not run correctly without them. Having this distinction also lets us check those required API keys before execution.
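A pre-execution check for several coexisting benchmarks could be sketched roughly as follows. The benchmark names, the key names, and the `REQUIRED_KEYS` registry are assumptions made for illustration; how benchmarks would actually declare their requirements is an open design question.

```python
# Hypothetical registry: benchmark name -> API keys it cannot run without.
# Names are placeholders, not real benchmarks or real config keys.
REQUIRED_KEYS: dict[str, list[str]] = {
    "benchmark_a": ["EVAL_MODEL_API_KEY"],
    "benchmark_b": ["EVAL_MODEL_API_KEY", "JUDGE_API_KEY"],
}


def preflight(benchmark: str, provided: dict[str, str]) -> None:
    """Raise before execution if a required key is absent or empty."""
    missing = [key for key in REQUIRED_KEYS[benchmark] if not provided.get(key)]
    if missing:
        raise RuntimeError(
            f"{benchmark}: missing required API keys: {', '.join(missing)}"
        )


if __name__ == "__main__":
    keys = {"EVAL_MODEL_API_KEY": "sk-example"}
    preflight("benchmark_a", keys)  # passes silently
    try:
        preflight("benchmark_b", keys)
    except RuntimeError as err:
        print(err)
```

Failing fast with the benchmark name and the exact missing keys tells users which keys are mandatory for the run they requested, while keys needed only by other benchmarks stay optional.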