Note: This issue was copied from ggml-org#231
Original Author: @ggerganov
Original Issue Number: #231
Created: 2023-03-17T08:32:33Z
Update 10 Apr 2024: ggml-org#231 (comment)
It would be great to start doing this kind of quantitative analysis of ggml-based inference:
https://bellard.org/ts_server/
It looks like Fabrice evaluates the models using something called LM Evaluation Harness:
https://github.com/EleutherAI/lm-evaluation-harness
I have no idea what this is yet, but it would be nice to study it and try to integrate it here and in other ggml-based projects.
This will be a very important step toward estimating the quality of the generated output and seeing whether we are on the right track.
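One standard quantitative metric such evaluation harnesses report is perplexity over a held-out text. A minimal sketch of how it is computed (plain Python; the per-token log-probabilities below are hypothetical values standing in for a real model run):

```python
import math

def perplexity(logprobs):
    """Perplexity = exp(-mean natural-log probability) over the evaluated tokens.

    Lower is better: a model that assigns high probability to the reference
    text gets a perplexity close to 1.
    """
    return math.exp(-sum(logprobs) / len(logprobs))

# Hypothetical per-token log-probabilities, as a model might emit
# for a short reference sequence.
token_logprobs = [-2.1, -0.4, -1.3, -0.9]
print(perplexity(token_logprobs))
```

Comparing this number across quantization levels (e.g. f16 vs 4-bit) on the same text would give a concrete signal of how much quality is lost by the ggml quantized formats.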