Releases: srogmann/llama.cpp
b6791
b6745
metal : add opt_step_adamw and op_sum (#16529)
* scaffold to support opt step adamw on metal (not written so far)
* add opt-step-adamw kernel for metal
* pass op->src[4] as a separate buffer to the pipeline
* add bounds check to opt-step-adamw kernel
* complete scaffold for GGML_OP_SUM
* naive GGML_OP_SUM kernel
* remove unwanted comment
* change OP_SUM capability gate
* Add has_simdgroup_reduction to both ops to pass CI
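For readers unfamiliar with the op, the per-element update of an AdamW step can be sketched on the CPU as follows. This is the textbook AdamW formula, not the Metal kernel from the PR; the struct `adamw_params`, the function `adamw_step`, and the parameter layout are illustrative assumptions. The loop bound stands in for the bounds check a GPU kernel needs, where a thread whose index is past `n` must return early.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical parameter bundle; field names are illustrative, not ggml's layout.
struct adamw_params {
    float alpha;  // learning rate
    float beta1;  // first-moment decay
    float beta2;  // second-moment decay
    float eps;    // numerical stabilizer
    float wd;     // decoupled weight decay
};

// Textbook AdamW update for step t (1-based) over n elements.
// w: weights, g: gradients, m/v: first and second moment buffers.
void adamw_step(float * w, const float * g, float * m, float * v,
                std::size_t n, int t, const adamw_params & p) {
    // Bias-correction terms for the moment estimates.
    const float bc1 = 1.0f - std::pow(p.beta1, static_cast<float>(t));
    const float bc2 = 1.0f - std::pow(p.beta2, static_cast<float>(t));
    for (std::size_t i = 0; i < n; ++i) {
        m[i] = p.beta1 * m[i] + (1.0f - p.beta1) * g[i];
        v[i] = p.beta2 * v[i] + (1.0f - p.beta2) * g[i] * g[i];
        const float mh = m[i] / bc1;
        const float vh = v[i] / bc2;
        // Weight decay is applied to the weight itself, not folded into the gradient.
        w[i] = w[i] * (1.0f - p.alpha * p.wd) - p.alpha * mh / (std::sqrt(vh) + p.eps);
    }
}
```

A reference like this is mainly useful for comparing a backend kernel's output against a known-good result on small inputs.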
b6729
vocab : mark EOT token for Granite models (#16499)
* vocab : mark EOT token for Granite models
* sampling : fallback to EOS when EOT is not found
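The fallback described in the second bullet can be illustrated with a small sketch. The type `vocab_specials`, the sentinel `TOKEN_NULL`, and the function `end_of_turn_or_eos` are hypothetical names chosen for this example and are not the llama.cpp API.

```cpp
#include <cstdint>

using token_id = std::int32_t;

// Assumed sentinel meaning "this special token is not defined".
constexpr token_id TOKEN_NULL = -1;

// Hypothetical holder for the two special tokens involved.
struct vocab_specials {
    token_id eot = TOKEN_NULL;  // end-of-turn
    token_id eos = TOKEN_NULL;  // end-of-sequence
};

// Prefer EOT; fall back to EOS when the vocabulary does not define EOT.
token_id end_of_turn_or_eos(const vocab_specials & v) {
    return v.eot != TOKEN_NULL ? v.eot : v.eos;
}
```

If neither token is defined, this sketch returns `TOKEN_NULL`, so a caller still has to handle that case.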
b6692
model : Granite docling + Idefics3 preprocessing (SmolVLM) (#16206)
* feat: Add granite-docling conversion using trillion pretokenizer
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add granite-docling vocab pre enum
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Use granite-docling pre
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add clip_is_idefics3
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Allow multi-token boundary sequences for image templating
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Add tiling support for idefics3 in clip.cpp
This should likely be moved into llava_uhd::get_slice_instructions, but for now this avoids disrupting the logic there.
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Partial support for full templating for idefics3 in mtmd
There are still errors encoding some of the image chunks, but the token sequence now matches transformers _almost_ perfectly, except for the double newline before the global image, which shows up as two consecutive newline tokens instead of a single double-newline token. I think this is happening because the blocks are tokenized separately and then concatenated.
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Fully working image preprocessing for idefics3 w/ resize and slicing
Branch: gabe-l-hart/GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* feat: Parse the preprocessor config's longest side and add it to the mmproj hparams
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Use the longest side instead of size * scale_factor
For Granite Docling, these come out to the same value, but that was just a coincidence.
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* fix: Allow batch encoding and remove clip_is_idefics3
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Remove unnecessary conditionals for empty token vectors
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* refactor: Use image_manipulation util
Branch: GraniteDocling
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
* add test model
---------
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
b6686
chat : support Magistral thinking (#16413)
* feat: added a dedicated Magistral chat format that preserves [THINK] spans, parses reasoning before tool calls
* feat: new flow in the chat template test suite for Magistral
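As a rough illustration of what separating a reasoning span from the rest of a reply involves, the sketch below splits a string on a `[THINK]` opening tag and an assumed `[/THINK]` closing tag. The function `split_think` and the closing tag are assumptions made for this example; the actual parser in the PR may behave differently, in particular around tool calls and unterminated spans.

```cpp
#include <string>
#include <utility>

// Returns {reasoning, remaining_content}.
// The closing tag "[/THINK]" is an assumption for this sketch.
std::pair<std::string, std::string> split_think(const std::string & text) {
    static const std::string open_tag  = "[THINK]";
    static const std::string close_tag = "[/THINK]";

    const std::string::size_type begin = text.find(open_tag);
    if (begin == std::string::npos) {
        // No reasoning span: everything is regular content.
        return std::make_pair(std::string(), text);
    }
    const std::string::size_type start = begin + open_tag.size();
    const std::string::size_type end   = text.find(close_tag, start);
    if (end == std::string::npos) {
        // Unterminated span: treat the remainder as reasoning.
        return std::make_pair(text.substr(start), text.substr(0, begin));
    }
    return std::make_pair(text.substr(start, end - start),
                          text.substr(0, begin) + text.substr(end + close_tag.size()));
}
```

Only the first span is handled here; a real chat parser would also need to cope with streaming input and partial tags.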
b6651
common : disable progress bar without a tty (#16352)
* common : disable progress bar without a tty
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
* Add missing headers
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
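The usual way to implement this kind of check is to ask whether the output stream is attached to a terminal before drawing anything. The sketch below uses POSIX `isatty`/`fileno` (and `_isatty`/`_fileno` on Windows); the helper names `stream_is_tty` and `progress_line` are hypothetical and not taken from the PR.

```cpp
#include <cstdio>
#include <string>

#if defined(_WIN32)
#include <io.h>
#else
#include <unistd.h>
#endif

// True when the given stream is attached to a terminal.
bool stream_is_tty(FILE * f) {
#if defined(_WIN32)
    return _isatty(_fileno(f)) != 0;
#else
    return isatty(fileno(f)) != 0;
#endif
}

// Builds one progress-bar line, or an empty string when output is not interactive,
// so that logs and pipes are not filled with carriage-return redraws.
std::string progress_line(int percent, bool interactive) {
    if (!interactive) {
        return std::string();
    }
    if (percent < 0)   percent = 0;
    if (percent > 100) percent = 100;
    const int width  = 10;
    const int filled = percent * width / 100;
    return "\r[" + std::string(filled, '#') + std::string(width - filled, ' ') + "] "
         + std::to_string(percent) + "%";
}
```

A caller would typically evaluate `stream_is_tty(stderr)` once and pass the result to `progress_line` on each update.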
b6601
devops: add s390x & ppc64le CI (#15925)
* devops: move s390x and ppc64le ci build
we have access to ubuntu-24.04-s390x and ppc64le images now
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: disable ppc64le for now since they have compiler errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: stop warnings as errors
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: switch to non-macro flag
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: going the llama macro route
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add big-endian gguf test models
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: disable ppc64le to test s390x, check test build
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: dup .gguf.inp files for big-endian tests
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: dup .gguf.out files for big-endian too
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add python setup and endian byteswap
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: poor thing does not have s390x python3
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add missing rust compiler for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: try rust actions runner
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Revert "devops: try rust actions runner"
This reverts commit 3f8db04356033d6c1d7eccc75ca396bc5298250c.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: try a different path for rust
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: dump home directory and user info
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: install gguf-py only
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: missed relative path
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: remove big-endian files since local swapping is working
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: revert test-tokenizer-0 cmakelists
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Fix unicode flags conversion from and to uint16_t
Bitfields are allocated in different order on s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Simplify byteswap command
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Add byteswapping and git-lfs for test-tokenizers-ggml-vocabs
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Fix endianness detection in vocab loader
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Disable test-thread-safety on s390x
In this test a model is downloaded,
then immediately loaded to check if more downloads are needed,
and then used for test.
There is no clean way to separate all those steps
to add byteswapping between them, so just skip this test.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Fix q8_0 test in test-quantize-fns
vec_signed uses unexpected rounding mode.
Explicitly use different rounding function.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add big-endian stories260K
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: add s390x test-eval-callback
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix test does not exist
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: fix model not found llama-eval-callback
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Fix q3_K dot product error in test-quantize-fns on s390x
Array q8bytes had only 4 elements allocated, but 8 elements accessed.
This led to an out-of-bounds write, a later out-of-bounds read of the overwritten values,
and an incorrect result.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: re-enable ppc64le for testing
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: activate test-thread-safety for s390x
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: disable ppc64le tests
for some reason it keeps failing test-thread-safety tests and I do not
have a machine that is able to replicate the tests.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* devops: LLAMA_FATAL_WARNINGS=ON
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Correct repository URL for s390x for test-thread-safety model
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Fix fs_get_cache_directory
Ensure it works even if both XDG_CACHE_HOME and HOME are unset.
This might happen in containers.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Re-enable CI for ppc64le
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Fortify ggml_rope_impl
Only memcpy data from sections argument if it's non-NULL.
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
* Add TODO in struct unicode_cpt_flags to reimplement it in endian-independent way
* Update URL for big-endian model
* Update .github/workflows/build.yml
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
* Update remaining mentions of BE models to ggml-org/models repo
---------
Signed-off-by: Aaron Teo <aaron.teo1@ibm.com>
Co-authored-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Co-authored-by: Aleksei Nikiforov <103434461+AlekseiNikiforovIBM@users.noreply.github.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
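The unicode flags fix and the accompanying TODO stem from a general C++ property: the order in which bit-fields are allocated within a storage unit is implementation-defined, so reinterpreting a bit-field struct as a `uint16_t` can yield different values on different ABIs. One endian-independent alternative is to build the integer with explicit masks, sketched below. The struct `cpt_flags`, its fields, and the bit positions are illustrative assumptions, not the layout used in llama.cpp.

```cpp
#include <cstdint>

// Hypothetical flag set; names and bit positions are illustrative only.
struct cpt_flags {
    bool is_letter;
    bool is_number;
    bool is_whitespace;
};

constexpr std::uint16_t FLAG_LETTER     = 1u << 0;
constexpr std::uint16_t FLAG_NUMBER     = 1u << 1;
constexpr std::uint16_t FLAG_WHITESPACE = 1u << 2;

// Pack with explicit masks: the result does not depend on bit-field
// allocation order or on the byte order of the host.
std::uint16_t flags_to_u16(const cpt_flags & f) {
    std::uint16_t out = 0;
    if (f.is_letter)     out |= FLAG_LETTER;
    if (f.is_number)     out |= FLAG_NUMBER;
    if (f.is_whitespace) out |= FLAG_WHITESPACE;
    return out;
}

// Unpack using the same masks.
cpt_flags flags_from_u16(std::uint16_t bits) {
    cpt_flags f;
    f.is_letter     = (bits & FLAG_LETTER) != 0;
    f.is_number     = (bits & FLAG_NUMBER) != 0;
    f.is_whitespace = (bits & FLAG_WHITESPACE) != 0;
    return f;
}
```

With this approach the numeric value of a given flag combination is the same on little-endian and big-endian hosts, at the cost of a few extra instructions per conversion.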