Skip to content

feat: HuggingFace dataset import command #978

@christso

Description

@christso

Objective

Add an agentv import huggingface CLI command to import datasets from HuggingFace Hub directly into agentv eval format.

Motivation

SWE-bench and similar benchmarks publish their datasets on HuggingFace (e.g., SWE-bench/SWE-bench_Verified, princeton-nlp/SWE-bench). Currently users must manually convert these to agentv YAML/JSONL format. A built-in importer would remove this friction and make agentv a first-class tool for benchmark evaluation.

Design

This should be a CLI wrapper (not a core feature), following the "Lightweight Core, Plugin Extensibility" principle.

Proposed interface

agentv import huggingface --repo SWE-bench/SWE-bench_Verified --split test --limit 10 --output evals/swebench/

Mapping (SWE-bench → agentv)

SWE-bench Field AgentV Field
instance_id tests[].id
problem_statement tests[].input[0].content
repo + base_commit workspace.docker.image (convention-based)
FAIL_TO_PASS tests[].assertions[].command (code-grader)
difficulty tests[].metadata.difficulty

Implementation approach

  • Python script using uv run (per repo convention for Python scripts)
  • Uses datasets library to load from HuggingFace
  • Outputs .EVAL.yaml files with Docker workspace configs
  • Template-based: support different dataset schemas via mapping configs

Acceptance Criteria

  • agentv import huggingface --repo <name> produces valid eval YAML files
  • Works with SWE-bench Verified as the primary test case
  • Supports --limit, --split, --output flags
  • Generated evals pass agentv validate
  • Documentation updated

Non-goals

  • Not adding HuggingFace as a core dependency
  • Not supporting all possible HuggingFace dataset formats (start with SWE-bench)

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions