fix crash on non-AVX systems dynamically loading GGML CPU backends #11780
Merged

slaren merged 1 commit into ggml-org:master from jmorganca:jmorganca/sgemm-initialization on Feb 13, 2025
Conversation
slaren (Member) approved these changes on Feb 10, 2025 and left a comment:
Thanks, I missed this global. The fix looks ok, but if the code is not inlined it may add some overhead to the other types. I will leave this open for a while in case someone knowledgeable about llamafile/tinyblas wants to propose a better solution.
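For illustration, a hypothetical sketch of that trade-off (none of these names come from sgemm.cpp): moving the value from a file-scope constant into a member means each use reads it through `this`, which is free when inlined, but an out-of-line accessor would cost every template instantiation a call, including element types that never touch the AVX path.

```cpp
// Hypothetical sketch of the inlining concern; not the actual sgemm.cpp code.
static float make_table() { return 0.5f; }  // stand-in for an AVX initializer

template <typename T>
struct kernel_sketch {
    kernel_sketch() : lut(make_table()) {}  // built per object, not at load
    // If this accessor is inlined, the member read is essentially free;
    // if not, every instantiation (every element type) pays a call per use,
    // versus a file-scope constant the compiler could fold outright.
    T scale(T x) const { return x * lut; }
    const float lut;
};
```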
jmorganca (Contributor, Author):

Thanks for merging @slaren. I'm running some performance tests after noticing ollama/ollama#9087. I'm not sure if this PR is the root cause, but I haven't ruled it out yet. In any case, I'll keep you posted and wanted to give you a heads up just in case.
slaren (Member):

Llamafile tinyblas should only be used for prompt processing, so if you are also observing a decrease in performance during generation, it is not very likely that it was caused by this change.
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request on Feb 26, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request on Feb 26, 2025
V6ser pushed a commit to V6ser/llama.cpp that referenced this pull request on Mar 15, 2026
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request on Apr 26, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request on May 6, 2026
Thanks for the awesome work by @slaren in #10469 (and a few follow-up PRs) to enable dynamic GGML backend loading. This made supporting different CPU instructions in GGML much, much easier.
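As a rough usage sketch (assuming the `ggml_backend_load_all` and device-enumeration functions declared in ggml-backend.h; check the header for the exact signatures), loading backends at runtime looks roughly like this:

```cpp
// Sketch of runtime backend loading, assuming the API from ggml-org#10469
// as declared in ggml-backend.h; not a verbatim excerpt from llama.cpp.
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Scans the search path for ggml backend shared libraries and dlopens
    // them. A CPU variant built for instructions the host lacks (e.g. AVX
    // on a non-AVX machine) must survive this dlopen -- which is exactly
    // what this PR fixes for GGML_LLAMAFILE=ON builds.
    ggml_backend_load_all();

    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        std::printf("device %zu: %s\n", i, ggml_backend_dev_name(dev));
    }
    return 0;
}
```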
I noticed a small hitch with the llamafile code where a machine with a non-AVX CPU would crash when trying to dlopen CPU libraries built with GGML_LLAMAFILE=ON. This moves the AVX-dependent code into a member variable, fixing the crash on dlopen. I'm not sure how sgemm.cpp is vendored, so let me know the best way/place to suggest a change.
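For readers hitting the same class of bug, a minimal sketch of the failure mode (hypothetical names; the real change is in sgemm.cpp):

```cpp
#include <immintrin.h>

// Broken pattern (hypothetical, mirroring the bug): a namespace-scope object
// whose initializer uses AVX intrinsics. Its dynamic initializer runs from
// the shared object's init array the moment the library is dlopen'ed --
// before any CPU-feature check -- so a non-AVX host gets SIGILL.
//
//   static const __m256 kHalf = _mm256_set1_ps(0.5f);  // executes at dlopen
//
// Fixed pattern: keep the value as a class member built in the constructor,
// so the AVX instructions only run once this kernel is actually selected
// on an AVX-capable CPU.
class avx_kernel_sketch {     // hypothetical; not the real tinyBLAS class
  public:
    avx_kernel_sketch() : half(_mm256_set1_ps(0.5f)) {}
  private:
    const __m256 half;        // initialized at construction, not at load
};
```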