Note: This issue was copied from ggml-org#231
Original Author: @ggerganov
Original Issue Number: #231
Created: 2023-03-17T08:32:33Z
Update 10 Apr 2024: ggml-org#231 (comment)
It would be great to start doing this kind of quantitative analysis of ggml-based inference:
https://bellard.org/ts_server/
It looks like Fabrice evaluates the models using something called LM Evaluation Harness:
https://github.com/EleutherAI/lm-evaluation-harness
I have no idea what this is yet, but it would be nice to study it and try to integrate it here and in other ggml-based projects.
This will be a very important step toward estimating the quality of the generated output and seeing whether we are on the right track.
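One standard quantitative metric such evaluation harnesses report is perplexity over a held-out text. A minimal sketch of how it is computed (plain Python; the per-token log-probabilities below are hypothetical values standing in for a real model run):

```python
import math

def perplexity(logprobs):
    """Perplexity = exp(-mean natural-log probability) over the evaluated tokens.

    Lower is better: a model that assigns high probability to the reference
    text gets a perplexity close to 1.
    """
    return math.exp(-sum(logprobs) / len(logprobs))

# Hypothetical per-token log-probabilities, as a model might emit
# for a short reference sequence.
token_logprobs = [-2.1, -0.4, -1.3, -0.9]
print(perplexity(token_logprobs))
```

Comparing this number across quantization levels (e.g. f16 vs 4-bit) on the same text would give a concrete signal of how much quality is lost by the ggml quantized formats.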