Enable loading data sets from files for custom tasks #1083

Merged
NathanHB merged 11 commits into huggingface:main from davebiagioni:enable-data-files
Jan 6, 2026

Conversation

@davebiagioni
Contributor

Purpose: Allow custom tasks to load datasets from local files, not only from the Hugging Face Hub. Useful for offline, air‑gapped, or otherwise restricted environments.

Changes

  • Config: Add optional hf_data_files to LightevalTaskConfig.
  • Loader: Forward hf_data_files as data_files to datasets.load_dataset in LightevalTask.download_dataset_worker.
  • Examples: Add examples/custom_tasks_templates/custom_yourbench_task_from_files.py.
  • Docs: Update docs/source/adding-a-custom-task.mdx with file-based usage.
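
As a rough illustration of the Config and Loader changes listed above, here is a minimal sketch of how an optional `hf_data_files` value could be forwarded as `data_files` to `datasets.load_dataset`. This is not the actual lighteval code: `TaskConfigSketch` and `build_load_dataset_kwargs` are hypothetical stand-ins, the `hf_repo` and `hf_subset` fields and the file paths are assumptions, and the use of the `json` builder is only one plausible way to read local JSONL files.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union

# Shapes that datasets.load_dataset accepts for its data_files argument.
DataFiles = Union[str, list[str], dict[str, Union[str, list[str]]]]


@dataclass
class TaskConfigSketch:
    """Hypothetical stand-in for LightevalTaskConfig (not the real class)."""

    name: str
    hf_repo: str  # assumed field: Hub repo id, or a builder name such as "json"
    hf_subset: Optional[str] = None  # assumed field
    hf_data_files: Optional[DataFiles] = None  # the option this PR adds


def build_load_dataset_kwargs(config: TaskConfigSketch) -> dict[str, Any]:
    """Assemble keyword arguments for datasets.load_dataset (hypothetical helper)."""
    kwargs: dict[str, Any] = {"path": config.hf_repo, "name": config.hf_subset}
    # Only forward data_files when the task sets it, so Hub-only tasks are unchanged.
    if config.hf_data_files is not None:
        kwargs["data_files"] = config.hf_data_files
    return kwargs


# A task reading local JSONL files (paths are placeholders).
config = TaskConfigSketch(
    name="my_local_task",
    hf_repo="json",
    hf_data_files={"train": "data/train.jsonl", "test": "data/test.jsonl"},
)
kwargs = build_load_dataset_kwargs(config)
print(kwargs)

# With the `datasets` package installed and the files present, one would then call:
# dataset = datasets.load_dataset(**kwargs)
```

Keeping the field optional means existing tasks that load from the Hub need no changes; only tasks that set `hf_data_files` take the file-based path.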

Checklist

  • Tests pass locally
  • Pre-commit hooks pass locally
  • Added/updated documentation

@HuggingFaceDocBuilderDev
Collaborator

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@NathanHB NathanHB left a comment

Looking great! A few nits, and good to merge if tests pass :)

Comment thread docs/source/adding-a-custom-task.mdx Outdated
Comment thread docs/source/adding-a-custom-task.mdx
Comment thread src/lighteval/tasks/lighteval_task.py Outdated
@davebiagioni
Contributor Author

@NathanHB review comments addressed. thanks!

Comment thread docs/source/offline-evaluation.md Outdated
@davebiagioni
Contributor Author

@NathanHB wondering about your preferred way to keep approved but unmerged branches up to date with main. should i just hold off until we're closer to merging, or update as we go? i don't want to trigger your CI/CD pipeline more than is needed 😄 thanks again!

@NathanHB
Member

hey @davebiagioni thanks for asking :)
I usually just merge with main, run tests, and merge as soon as they are green. We had low bandwidth recently so forgot to merge yours, but merging today!

@NathanHB NathanHB merged commit 03d8c4e into huggingface:main Jan 6, 2026
4 checks passed
@davebiagioni
Contributor Author

Thanks @NathanHB !🙏

@NathanHB
Member

NathanHB commented Jan 7, 2026

thank you, and happy new year :)

rolshoven pushed a commit to rolshoven/lighteval that referenced this pull request Mar 17, 2026
* enable use of data files for custom tasks

* addressing PR comments, create new doc file, update docstring with types

* Update docs/source/offline-evaluation.md

Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>

* add new doc to toctree

* Add offline evaluation section to documentation

---------

Co-authored-by: David Biagioni <dbiagioni@proofpoint.com>
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>
Co-authored-by: Dave Biagioni <7086434+davebiagioni@users.noreply.github.com>

4 participants