ci: bench: add more ftype, fix triggers and bot comment #6466

Merged
ggerganov merged 7 commits into ggml-org:master from phymbert:hp/server/bench/add-quants
Apr 4, 2024

@phymbert phymbert commented Apr 3, 2024

Motivation

  • The PR comment pops up in all PRs, even those unrelated to speed, which is a little distracting.
  • Benchmark results vary, probably because a fixed seed is not set.
  • Add more model ftypes.

Proposition

  • Reduce the workflow file path trigger to only llama.cpp, ggml.c and CUDA files.
  • Add a seed param in the k6 script.js.
  • Reduce the comment to the first line so it does not take up too much space in the PR, and add a warning notice.
  • Add q8_0 and f16 phi-2 quants.
  • Add more metrics to the commit status, to later show performance history.
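
For illustration only, here is a minimal sketch of how a fixed seed could be passed into a request payload from a k6 script. It is not the code from this PR: the `BENCH_SEED` variable name, the default value of 42, and the `prompt`/`seed` field names are all assumptions. In k6 itself, environment variables are read from the global `__ENV` object; the helper below takes the environment as a plain argument so it can also run outside k6.

```javascript
// Hypothetical sketch, not the actual script.js from this PR.
// BENCH_SEED, DEFAULT_SEED and the payload field names are assumptions.

const DEFAULT_SEED = 42; // assumed default, chosen for illustration only

// Read the seed from an environment-like object, falling back to the
// default when the variable is missing or is not an integer.
function resolveSeed(env) {
  const parsed = Number.parseInt(env.BENCH_SEED, 10);
  return Number.isNaN(parsed) ? DEFAULT_SEED : parsed;
}

// Build the request body with a fixed seed so that repeated benchmark
// runs sample tokens the same way.
function buildPayload(prompt, env) {
  return {
    prompt: prompt,
    seed: resolveSeed(env),
  };
}
```

Inside a k6 script this would presumably be called as `buildPayload(prompt, __ENV)`, with the result serialized through `JSON.stringify` before being posted to the server.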

Tests

Tested here on a self-hosted GCP L4 ( sic ^) ) :

References

@phymbert phymbert added the performance (Speed related topics) and server/webui labels Apr 3, 2024
@phymbert phymbert requested a review from ggerganov April 3, 2024 19:53

phymbert commented Apr 3, 2024

@ggerganov Georgi, as more and more models are MoE-based, I suggest adding mixtral8x7b later on. Thoughts?

@github-actions

This comment was marked as off-topic.

@ggerganov ggerganov left a comment

Yes, we can continue to scale with more models in the future. For now, let's give it some more time to see how the existing benchmarks perform and gather feedback about what's useful.

phymbert commented Apr 4, 2024

Understood, please merge once you have restarted the GitHub manager.

@ggerganov

Updated to master of https://github.com/ggml-org/ci and restarted

@ggerganov ggerganov merged commit 7a2c926 into ggml-org:master Apr 4, 2024
@phymbert phymbert deleted the hp/server/bench/add-quants branch April 4, 2024 09:58
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* ci: bench: change trigger path to not spawn on each PR

* ci: bench: add more file type for phi-2: q8_0 and f16.
- do not show the comment by default

* ci: bench: add seed parameter in k6 script

* ci: bench: artefact name perf job

* Add iteration in the commit status, reduce again the autocomment

* ci: bench: add per slot metric in the commit status

* Fix trailing spaces
phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026