upstream#20
Merged
visorcraft merged 22 commits intovisorcraft:fix/hybrid-cache-reusefrom Mar 9, 2026
Merged
Conversation
* support flash-attention for fp32/fp16/Q4/Q5/Q8 * rm warining * update for JIT
* Revert to OAI-compatible args * Apply workaround::func_args_not_string
* tests: add end-to-end tests per model architecture * fixup for rebase * fix use-after-free in llama-model-loader.cpp * fix CI * fix WebGPU * fix CI * disable CI for macOS-latest-cmake-arm64 * use expert_weights_scale only if != 0.0f * comments
* vulkan: Fix data races in coopmat1 mul_mat(_id) Add barriers between coopmat store and regular loads. We sort of got away with this because it was the same subgroup accessing the values, but it's still a race and may not work. * switch to subgroup control barriers
* ggml-Vulkan: add ELU support * ggml-Vulkan: remove extra spaces and variables * ggml-Vulkan: fix format issue * ggml-Vulkan: fix format issue * fix whitespace issue * Update Vulkan.csv and ops.md
* Fix structured outputs * Update common/chat-auto-parser-generator.cpp Co-authored-by: Aldehir Rojas <hello@alde.dev> --------- Co-authored-by: Aldehir Rojas <hello@alde.dev>
* Fix compile bug * Update common/chat-auto-parser-helpers.cpp Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com> --------- Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* common : handle incomplete UTF-8 at end of input in PEG parser * cont : if reached end prematurely, emit needs_more_input to propagate partial output * cont: refactor peg parse context to add lenient flag * cont : remove partial flag, keep lenient flag
* PEG parser for LFM2 * Simplify using python_value()
visorcraft
added a commit
that referenced
this pull request
Mar 9, 2026
Merge pull request #20 from ggml-org/master
visorcraft
pushed a commit
that referenced
this pull request
Mar 12, 2026
…better shader parameter handling (ggml-org#20173) * K quant speedup (#20) * Basic JIT compilation for mul_mat, get_rows, and scale (#17) * scale jit working * preliminary working jit for getrows and mulmat, needs refining * simplified mul_mat preprocessing switch statement * get_rows fixes, mul_mat refinement * formatted + last edits * removed some extraneous prints * fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish * small fix * some changes, working * get_rows and mul_mat jit fixed and working * Update formatting * formatting * Add header --------- Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local> Co-authored-by: Reese Levine <reeselevine1@gmail.com> * Start work on all-encompassing shader library * refactor argmax, set_rows * Refactor all but flashattention, mat mul * no gibberish, all k quants added, merged * vec memory fix * q6_k matching metal on my machine, tests passing * Set tile size for q6_k separately * Separate out fast shaders --------- Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com> * Move towards writeBuffer for params * Move away from multiple buffers for set_rows errors, remove host buffer for parameter buffers, minor cleanups * Remove extra file * Formatting --------- Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Make sure to read the contributing guidelines before submitting a PR