llama: Add support for RWKV v7 architecture #11452
MollySophia wants to merge 22 commits into ggml-org:master
Conversation
Update: added support for fla-hub's rwkv7 HF model format. (https://huggingface.co/fla-hub/rwkv7-1.5B-world)
Just a heads up, this will likely take some time to merge - I want to finish #11213 first and then figure out how to fit RWKV in the new code, likely with its own implementation of
That’s great! I can help with that too.
Great, keep an eye on the #11213 PR. It's still very messy, but I hope it will soon start to make sense.
MollySophia force-pushed from 97c31bb to e6ee7e9
There isn't much performance gain though; it's just for more op coverage.
They pass on my M2 and M4 devices :|
I hope that we can merge this one and test new RWKV v7 models. 🤗
Sure! I'm rebasing the branch of this PR today. |
Superseded by #12412, I think.
@BlinkDL's explanation of RWKV v7:
RWKV-7 as a meta-in-context learner
There are also plenty of test results on trained models (currently 0.1B and 0.4B) posted on his X account. Larger models are coming in the next several days, too.
Currently available RWKV v7 model repos in HF format:
https://huggingface.co/SmerkyG/RWKV7-Goose-0.1B-World2.8-HF (not an officially published one; tensor names are expected to change in the future)
https://huggingface.co/mollysama/rwkv-7-world-0b4-hf
https://huggingface.co/mollysama/rwkv-7-world-1b5-hf
https://huggingface.co/RWKV-Red-Team/ARWKV-7B-Preview-0.1 (hybrid distilled model with RWKV v7 "attn" and Qwen2.5 7B's MLP, distilled from Qwen2.5; it's not really appropriate to call these "hybrid" models, because they don't actually have transformer attention)

Distilled DS-R1 models:
https://huggingface.co/RWKV-Red-Team/ARWKV-R1-7B
https://huggingface.co/RWKV-Red-Team/ARWKV-R1-1B5
This PR contains:
- `GGML_OP_L2_NORM`, which applies PyTorch-style L2 normalization along the rows. Tested with the CPU, CUDA, SYCL, Vulkan and Metal backends.
- `GGML_OP_RWKV_WKV7`, which is the core of the RWKV v7 architecture. Implemented the naive recurrent wkv7 kernel in CPU, CUDA, SYCL, Vulkan and Metal. (Illustrative sketches of both ops follow the benchmark note below.)

TODO:
- [ ] (within this PR or in the future) Implement chunkwise wkv7 (and possibly wkv6 as well) as per flash-linear-attention's implementation.

Note: current benchmarks of ARWKV7-7B f16 show it is way faster than RWKV v6 7B when prefilling (still a bit slower than Qwen2.5 7B).
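For reference, here is a minimal C sketch of what the `GGML_OP_L2_NORM` description above amounts to, assuming the PyTorch `torch.nn.functional.normalize(x, p=2, dim=-1)` convention it names. This is illustrative only, not the actual kernel from this PR; the function name and the eps handling are assumptions.

```c
// Illustrative sketch only (not the ggml kernel): PyTorch-style L2
// normalization along each row, i.e. row / max(||row||_2, eps), as in
// torch.nn.functional.normalize(x, p=2, dim=-1).
#include <math.h>
#include <stddef.h>

static void l2_norm_rows(float * x, size_t nrows, size_t ncols, float eps) {
    for (size_t r = 0; r < nrows; ++r) {
        float * row = x + r*ncols;
        float sum = 0.0f;
        for (size_t c = 0; c < ncols; ++c) {
            sum += row[c]*row[c];
        }
        // eps guards against all-zero rows; the exact eps convention in the
        // PR's backend kernels may differ (assumption).
        const float scale = 1.0f/fmaxf(sqrtf(sum), eps);
        for (size_t c = 0; c < ncols; ++c) {
            row[c] *= scale;
        }
    }
}
```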
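Similarly, a hedged sketch of one time step of the naive recurrent wkv7 for a single head, written the way flash-linear-attention's naive reference expresses the recurrence. Operand names mirror the `GGML_OP_RWKV_WKV7` description, but the state layout and loop order here are assumptions, not code from this PR.

```c
#include <stddef.h>

// One naive wkv7 step for a single head (sketch, assumptions as noted):
//   sa_i    = sum_j a_j * S[i][j]                (project old state onto a)
//   S[i][j] = S[i][j]*w_j + sa_i*b_j + v_i*k_j   (decay + rank-1 updates)
//   y_i     = sum_j S[i][j] * r_j                (read out with r)
static void wkv7_step(const float * r, const float * w, const float * k,
                      const float * v, const float * a, const float * b,
                      float * S,  // state: head_dim x head_dim (value index major)
                      float * y,  // output: head_dim
                      int head_dim) {
    for (int i = 0; i < head_dim; ++i) {
        float * Si = S + (size_t) i*head_dim;
        float sa = 0.0f;
        for (int j = 0; j < head_dim; ++j) {
            sa += a[j]*Si[j];
        }
        float yi = 0.0f;
        for (int j = 0; j < head_dim; ++j) {
            Si[j] = Si[j]*w[j] + sa*b[j] + v[i]*k[j];
            yi   += Si[j]*r[j];
        }
        y[i] = yi;
    }
}
```

The strictly sequential per-token state update above is also why the chunkwise TODO matters: a chunkwise formulation replaces many small per-token updates with larger matrix multiplies, which is where most of the prefill speedup would come from.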