B1565 by Nexesenex · Pull Request #16 · Nexesenex/croco.cpp

Nexesenex · 2023-11-26T01:56:35Z

No description provided.

* Update README.md to use PATH for Windows ROCm * Update README.md * Update README.md

llama_token_eos(const struct llama_model *) is currently getting struct llama_context type variable context as a parameter.

* ggml-cuda : support stablelm rope * remove unused freq_base kernel parameter * add n_dims parameter to llm_build_k_shift, default to n_rot via overload * llama : fix llm_build_k_shift args --------- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* Add openai-compatible POST /v1/chat/completions API endpoint to server example * fix code style * Update server README.md * Improve server README.md * Fix server.cpp code style according to review * server : some style changes * server : indentation * server : enable special tokens during tokenization by default * server : minor code style * server : change random string generator * straightforward /v1/models endpoint --------- Co-authored-by: kir-gadjello <111190790+kir-gadjello@users.noreply.github.com> Co-authored-by: Tobi Lütke <tobi@Tobis-MacBook-Pro.local>

…4189)

* reserve space for codepoints * improvement for the appended 0

* Use mmap in torch load, prefer .bin files when loading * Revert .bin > .safetensors preference

Concedo experimental

) * ggml: backend-agnostic tensor parallelism * support for GPT-OSS, Qwen 3 MoE * partial Vulkan fix * add support for 4/8 GPUs * unconditional peer access * re-use buffers + ggml contexts * fix output pattern * NCCL support * GGML: HIP: add RCCL support * Remove shfl and AllReduce from backend interface * move allocation workaround out of ggml-alloc.c * 2d tensor set/get support * Fix the seg fault without NCCL * Apply suggestion from JohannesGaessler * support for tensor dims % n_devs != 0 * fix view_offs scaling * arbitrary num. of GPUs/tensor split * fix compilation * better granularity estimate * Support device-specific host buffer types if all underlying backends expose the same type. This allows using pinned memory instead of pageable memory for CUDA. Fix compilation errors. * partial Qwen 3 Next support * Fix qwen3 30b (#8) * Fix crash with Qwen-30B-A3B Q4_0 Qwen-30B-A3B Q4_0 has an intermediate dimension of 768. Using a granularity of 256 forces an uneven split between GPUs, which is not supported by the current implementation. * Decide block size based on tensor quantization type * Fix crashes due to KV cache serialization (#9) KV cache serialization requires non-zero offsets on the tensor. Add support in the meta backend to set/get a tensor with a non-zero offset. * metal : fix build (#7) * static memory allocations, fix usage count * fix tensor granularity * more even memory distribution * use BF16 for allreduce * rebase fixup * better error message for unsupported architectures * Fix device mismatch during scatter of allReduce. (#11) There is a mismatch between the dst buffer device and the backend device, causing the use of sync copies * Enable the previous allreduce implementation. It is better in both perf and stability (#12) * delay AllReduce for Moe for less I/O * build : clean-up compile warnings * backend : move most of the meta backend API to ggml-backend-impl.h * cont : hide unused public API in the implementation * llama : use llama_device + remove ggml_backend_dev_is_meta() * ggml-backend : remove unused alloc include * minor : remove regex include * ggml : introduce ggml-ext.h for staging new APIs * rebase fixup * fix tests * llama : more robust logic for determining Meta devices (#16) * llama : more robust logic for determining Meta devices * cont : fix devs size check Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * cont : fix log type Co-authored-by: Johannes Gäßler <johannesg@5d6.de> --------- Co-authored-by: Johannes Gäßler <johannesg@5d6.de> * disable roundtrip for meta backend * fix arch selection * Qwen 3.5 support * fix Gemma 4 MoE * fix OpenVino, SYCL * fix test-llama-archs for CPU-only builds * Fix Qwen 3.5 MoE * disable meta backend tests for WebGPU * tests : filter CPU-based devices from the Meta backend tests (#17) * meta : formatting, naming, indentation (#18) * formatting : llama-model.cpp * formatting : ggml-ext.h * formatting : ggml-backend-meta.cpp * meta : add TODO * add documentation * better error messages * fix GPT-OSS --------- Co-authored-by: Carl Philipp Klemm <carl@uvos.xyz> Co-authored-by: Gaurav Garg <gaugarg@nvidia.com> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

jammm and others added 10 commits November 24, 2023 09:52

readme : use PATH for Windows ROCm (#4195)

b35f3d0

* Update README.md to use PATH for Windows ROCm * Update README.md * Update README.md

main.swift : fix eos checking (#4197)

2568a4b

llama_token_eos(const struct llama_model *) is currently getting struct llama_context type variable context as a parameter.

convert : fix tensors using grad in some models (#4173)

189d684

llama : set metal log callback correctly (#4204)

e9c13ff

readme : update hot topics

04814e7

Update docs for yarn_ext_factor <0.0 as unspecified instead of NaN (#…

3014b54

…4189)

llama : grammar reserve space in decode_utf8 (#4210)

f837c3a

* reserve space for codepoints * improvement for the appended 0

scripts : Use mmap in torch load (#4202)

1ddb52e

* Use mmap in torch load, prefer .bin files when loading * Revert .bin > .safetensors preference

Nexesenex merged this pull request into Nexesenex:master_experimental Nov 26, 2023

Nexesenex pushed a commit that referenced this pull request May 23, 2025

Merge pull request #16 from esolithe/concedo_experimental

5688720

Concedo experimental

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

B1565#16

B1565#16
Nexesenex merged 10 commits intoNexesenex:master_experimentalfrom
ggml-org:master

Nexesenex commented Nov 26, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

Nexesenex commented Nov 26, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants