llama: Add support for RWKV v7 architecture (v2) #12412

Merged: MollySophia merged 8 commits into ggml-org:master from MollySophia:rwkv-v7-new on Mar 17, 2025

@MollySophia MollySophia commented Mar 16, 2025

@BlinkDL's explanation of RWKV v7:
RWKV-7 as a meta-in-context learner
He has also posted plenty of tests on trained models on his X account.

Current available RWKV v7 model repos in HF format:

Base models:

https://huggingface.co/fla-hub/rwkv7-191M-world
https://huggingface.co/fla-hub/rwkv7-0.4B-world
https://huggingface.co/fla-hub/rwkv7-1.5B-world
https://huggingface.co/fla-hub/rwkv7-2.9B-world
https://huggingface.co/fla-hub/rwkv7-0.1B-g1 (The option to enable its capability hasn't been added yet.)

Distilled models:

https://huggingface.co/RWKV-Red-Team/ARWKV-R1-1B5
https://huggingface.co/RWKV-Red-Team/ARWKV-R1-7B
https://huggingface.co/RWKV-Red-Team/ARWKV_7B_R1_16K

This PR contains:

  • GGML_OP_L2_NORM, which applies PyTorch-style L2 normalization along the rows. Tested with the CPU, CUDA, SYCL, Vulkan, and Metal backends.
  • GGML_OP_RWKV_WKV7, which is the core of the RWKV v7 architecture. The naive recurrent wkv7 kernel is implemented for CPU, CUDA, SYCL, Vulkan, and Metal.
  • Inference support for RWKV7 and ARWKV7 models.
  • A simple Metal kernel for the old WKV6.
  • Skipping unused tokens in the last-layer FFN computation for RWKV models.
  • A fix for inference with RWKV6Qwen2.
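
For readers unfamiliar with the two new ops, here is a minimal pure-Python sketch of what they compute, as I understand them. It is not the code from this PR. The function names `l2_norm_rows` and `wkv7_step`, the operand names `r, w, k, v, a, b`, the `eps` default, and the single-head, single-token layout are all assumptions made for illustration. The real kernels operate on batched ggml tensors across many heads and tokens.

```python
import math


def l2_norm_rows(rows, eps=1e-12):
    """Scale each row to unit L2 norm, clamping the norm at eps.

    This mirrors the row-wise behaviour of PyTorch-style normalization
    (x / max(||x||_2, eps)); the eps default here is an assumption.
    """
    out = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        scale = 1.0 / max(norm, eps)
        out.append([x * scale for x in row])
    return out


def wkv7_step(state, r, w, k, v, a, b):
    """One naive recurrent step for a single head and a single token.

    state is a head_size x head_size list of lists and is updated in place.
    All other arguments are vectors of length head_size (hypothetical names).
    Returns the output vector for this token.
    """
    n = len(r)
    out = [0.0] * n
    for i in range(n):
        row = state[i]
        # Projection of the previous state row onto a.
        sa = sum(a[j] * row[j] for j in range(n))
        for j in range(n):
            # Decay the old state, add the new key/value outer product,
            # then add the correction term driven by sa and b.
            row[j] = row[j] * w[j] + v[i] * k[j] + sa * b[j]
        # Read the updated state out through the receptance vector.
        out[i] = sum(row[j] * r[j] for j in range(n))
    return out
```

Because the state is carried from one token to the next, a full sequence is processed by calling the step once per token in order, which is what makes the kernel "recurrent" and "naive" rather than chunked or parallel.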

TODO:

  • llama-parallel seems broken with all RWKV models. I will check what's wrong and try to fix it tomorrow. (Inference is fixed, but the output seems to be mixed between the parallel sequences. I haven't figured out what's wrong yet.)
  • Why is the MUSA build failing? (There seem to be some bugs in their vectorization code. Removing a #pragma unroll in wkv.cu fixes the build.)

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
@github-actions (bot) added the labels testing, Nvidia GPU, Vulkan, python, ggml, SYCL, and Apple Metal on Mar 16, 2025
@MollySophia MollySophia requested a review from ggerganov March 17, 2025 07:02

@Rbiessy Rbiessy left a comment

No concern with the SYCL changes, thanks

@MollySophia MollySophia merged commit 7dfad38 into ggml-org:master Mar 17, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
* ggml: Add op l2_norm

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* ggml: Add op rwkv_wkv7

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: Add support for RWKV7 and ARWKV7 models

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: fix inference with RWKV6Qwen2

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: add more (a)rwkv7 variants in size

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* Apply code-format changes

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* fix MUSA build

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

* llama: fix shape error with rwkv using llama-parallel

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

---------

Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
@heredos heredos mentioned this pull request Mar 26, 2025
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026