Datasets/lm evaluation library #724

Merged: ArshaanNazir merged 24 commits into release/1.4.0 from dataset-lm-evaluation-library on Aug 31, 2023
RakshitKhajuria (Contributor) commented Aug 24, 2023

Description

This PR adds benchmark datasets for the question-answering (QA) task.

Datasets Added:

  • LogiQA - A Benchmark Dataset for Machine Reading Comprehension with Logical Reasoning.

  • ASDiv - A diverse dataset (in terms of both language patterns and problem types) for evaluating and developing math word problem (MWP) solvers. It contains 2,305 English MWPs and was published in the paper "A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers".

  • Google/Bigbench - The Beyond the Imitation Game Benchmark (BIG-bench) is a collaborative benchmark intended to probe large language models and extrapolate their future capabilities. The BIG-bench repository summarizes the included tasks both by keyword and by task name.

    We added the following subsets to our library:
      1. AbstractUnderstanding
      2. DisambiguationQA
      3. DisflQA
      4. Causal Judgment
    

➤ Fixes: Explore lm evaluation library for good datasets #556

➤ Notebook Links:

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Usage

Checklist:

  • I've added Google style docstrings to my code.
  • I've used pydantic for typing when/where necessary.
  • I have linted my code
  • I have added tests to cover my changes.

Screenshots (if appropriate):

LogiQA

image

ASDiv

image

BigBench

image

@RakshitKhajuria RakshitKhajuria added the v2.1.0 label (Issue or request to be done in v2.1.0 release) Aug 24, 2023
@Prikshit7766 Prikshit7766 linked an issue Aug 24, 2023 that may be closed by this pull request
@ArshaanNazir ArshaanNazir self-requested a review August 31, 2023 05:15
@ArshaanNazir ArshaanNazir merged commit 5f0e3fb into release/1.4.0 Aug 31, 2023
ArshaanNazir (Contributor) commented Aug 31, 2023

@RakshitKhajuria @Prikshit7766 have you prepared a notebook (NB) for it? Please add a link to the NB in the PR description, along with some screenshots of the generated results.

RakshitKhajuria (Contributor, Author) commented

In reply to "NB for it. Give link to NB in PR description with some screenshots of generated results.":

Updated

@ArshaanNazir ArshaanNazir deleted the dataset-lm-evaluation-library branch September 6, 2023 04:56
@chakravarthik27 chakravarthik27 removed the v2.1.0 label (Issue or request to be done in v2.1.0 release) Sep 6, 2023
Development

Successfully merging this pull request may close these issues.

Explore lm-evaluation library

4 participants