
Distinguish the models used in the executor and evaluator #1

@xuafeng

Description


Suggestion from @Qian-Cheng-nju: I think it might be beneficial to extend the env.toml format to distinguish the LLM used for the workflow and for judging from the LLM under evaluation. With that structure, the code could check whether the required evaluation-model API keys are provided, making it clearer to users which API keys are optional and which are necessary for the benchmark to run correctly.

This also seems helpful for future integrations: when multiple benchmarks coexist in the same installation, we need to make it clear to users which API keys are mandatory, because without them the benchmark will not run correctly. Having this distinction also lets us check those required API keys before execution.

Metadata

Assignees: No one assigned
Labels: No labels
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
