Add support for ArcticForCausalLM (#7020) #135
Merged
Nexesenex merged 1 commit into Nexesenex:downstream on May 24, 2024
Conversation
* common : increase max number of experts to 128
* common : add tensor LLM_TENSOR_FFN_NORM_EXPS for normalization before the MoE that runs in parallel to attention + ffn
* gguf-py : add architecture-specific block mappings that override selected general block mappings
* convert-hf : add model conversion support for ArcticForCausalLM
* convert-hf : use added_tokens_decoder from tokenizer_config.json to redefine tokens from the SentencePiece model (only for ArcticForCausalLM)
* llama : add inference support for LLM_ARCH_ARCTIC

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
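The gguf-py change above can be pictured as a layered lookup. The following is a minimal Python sketch of the idea, not the actual gguf-py API: the general tensor/block mapping table is used for every architecture, and an architecture-specific table overrides selected entries (the mapping names and HF tensor paths below are hypothetical placeholders).

```python
# General block mappings shared by all architectures
# (names and paths are illustrative, not gguf-py's real tables).
GENERAL_BLOCK_MAPPINGS = {
    "ffn_norm": "model.layers.{bid}.post_attention_layernorm",
    "ffn_up": "model.layers.{bid}.mlp.up_proj",
}

# Hypothetical per-architecture overrides. Arctic needs an extra FFN norm
# (LLM_TENSOR_FFN_NORM_EXPS) for the MoE branch that runs in parallel to
# attention + ffn, so it remaps selected general entries.
ARCH_BLOCK_MAPPINGS = {
    "arctic": {
        "ffn_norm": "model.layers.{bid}.residual_layernorm",
        "ffn_norm_exps": "model.layers.{bid}.post_attention_layernorm",
    },
}

def mappings_for(arch: str) -> dict:
    """General mappings, with architecture-specific entries taking precedence."""
    merged = dict(GENERAL_BLOCK_MAPPINGS)
    merged.update(ARCH_BLOCK_MAPPINGS.get(arch, {}))
    return merged
```

With this scheme, architectures without overrides fall through to the general table unchanged, while Arctic gets both its remapped `ffn_norm` and the new `ffn_norm_exps` entry.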
Nexesenex pushed a commit that referenced this pull request on Dec 22, 2024
* q6_k_r4: Better ARM implementation. PP-512(LLaMA-3.1-8B) is now 104.2 t/s, up from 83.2 t/s, i.e. q6_k_r4 now beats q6_0_r4.
* q5_k_r4: Better ARM implementation. PP-512(LLaMA-3.1-8B) is now 107.8 t/s, up from 96.9 t/s, i.e. q5_k_r4 now beats q5_0_r4.
* q4_k_r4: Better ARM implementation. PP-512(LLaMA-3.1-8B) is now 122.1 t/s, up from 110 t/s, i.e. q4_k_r4 is now (nearly) on par with q4_0_r4.
* iq4_xs_r4: Better ARM implementation. PP-512(LLaMA-3.1-8B) is now 131.3 t/s, up from 115.8 t/s. iq4_xs_r4 is now the prompt processing champion on ARM.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
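The relative gains behind the throughput figures above can be computed directly. A small sketch (the `speedup_pct` helper is ours, not part of the project) using the t/s numbers quoted in the commit message:

```python
def speedup_pct(new_ts: float, old_ts: float) -> float:
    """Relative speedup in percent, rounded to one decimal place."""
    return round((new_ts / old_ts - 1.0) * 100.0, 1)

# PP-512(LLaMA-3.1-8B) figures from the commit message (new vs. old t/s).
gains = {
    "q6_k_r4": speedup_pct(104.2, 83.2),    # ~25.2 %
    "q5_k_r4": speedup_pct(107.8, 96.9),    # ~11.2 %
    "q4_k_r4": speedup_pct(122.1, 110.0),   # ~11.0 %
    "iq4_xs_r4": speedup_pct(131.3, 115.8), # ~13.4 %
}
```

So the q6_k_r4 rework is by far the largest jump (about a quarter faster), while the other quants each gain roughly 11-13%.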