llama squad by jeff3071 · Pull Request #5 · open-evals/evals

jeff3071 · 2023-03-20T10:14:42Z

Eval details 📑

Eval name

squad

Eval description

We evaluate LLaMA on 100 examples from the SQuAD dataset using the Open-evals framework, which extends OpenAI's Evals to support different language models. We take the sentence immediately following the prompt as LLaMA's output and use include accuracy as the metric to measure its performance.

For a model completion a and a reference list of correct answers B:
include: any([(a in b) for b in B])
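
A minimal sketch of how the metric above could be computed, following the formula exactly as written. The function names `include` and `include_accuracy` are hypothetical and are not taken from the Open-evals code. Note the direction of the test as stated: the completion must appear inside a reference answer, not the other way round.

```python
from typing import Iterable, Sequence


def include(completion: str, references: Iterable[str]) -> bool:
    """Hypothetical helper: True if the completion `a` is a substring of
    any reference answer `b`, i.e. any([(a in b) for b in B])."""
    # Caveat: under this definition an empty completion matches every reference.
    return any(completion in ref for ref in references)


def include_accuracy(
    completions: Sequence[str],
    references: Sequence[Sequence[str]],
) -> float:
    """Hypothetical helper: fraction of samples whose completion passes `include`."""
    if len(completions) != len(references):
        raise ValueError("completions and references must have the same length")
    if not completions:
        return 0.0
    hits = sum(include(a, B) for a, B in zip(completions, references))
    return hits / len(completions)


if __name__ == "__main__":
    # Toy data for illustration only; these are not SQuAD examples.
    completions = ["Paris", "Paris is the capital"]
    references = [["Paris, France"], ["Paris"]]
    print(include_accuracy(completions, references))
```

In the toy data, the second completion fails because a full sentence is not a substring of the short reference, which shows how sensitive the score is to the direction of the containment test.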

model	squad(100)
llama	0.63
gpt-3.5-turbo	0.9
text-davinci-003	0.87
text-davinci-002	0.66
text-davinci-001	0.58
ada	0.35

jeff3071 added 5 commits March 20, 2023 18:02

llama squad

3bde09a

Delete tmp.md

570128d

llama squad

a4da843

fix llama.py

007a228

add bigrams, born-first evals

d6227a2

jeff3071 marked this pull request as ready for review March 23, 2023 02:44

jeff3071 wants to merge 5 commits into open-evals:main from jeff3071:llama-evaluate-squad
