
llama : add --n-cpu-moe option#15077

Merged
slaren merged 2 commits into master from sl/ncmoe on Aug 4, 2025

Conversation

@slaren
Member

@slaren commented Aug 4, 2025

Following @jacekpoplawski's suggestion in #14992, this adds an option to keep the MoE weights of the first N layers in the CPU. You can use:

  • --cpu-moe to keep all MoE weights in the CPU
  • --n-cpu-moe N to keep the MoE weights of the first N layers in the CPU

The goal is to avoid having to write complex regular expressions when trying to optimize the number of MoE layers to keep in the CPU.

These options work by adding the necessary tensor overrides. If you use --override-tensor before these options, your overrides will take priority.
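For illustration, here is a rough sketch of the kind of `--override-tensor` patterns these options replace. The expert tensor names (`ffn_up_exps`, `ffn_down_exps`, `ffn_gate_exps`) follow the usual GGUF naming for MoE models, and `model.gguf` is a hypothetical path; the exact patterns generated internally may differ:

```shell
# Equivalent of --cpu-moe: keep all MoE expert weights in the CPU
llama-server -m model.gguf --n-gpu-layers 99 \
  --override-tensor "\.ffn_(up|down|gate)_exps\.=CPU"

# Equivalent of --n-cpu-moe 3: keep only the first 3 layers' experts in the CPU
llama-server -m model.gguf --n-gpu-layers 99 \
  --override-tensor "blk\.[0-2]\.ffn_(up|down|gate)_exps\.=CPU"
```

The new flags simply spare you from writing and adjusting these regular expressions by hand when tuning N.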

slaren added 2 commits August 4, 2025 23:41
Keeps the MoE weights of the first N layers in the CPU
adding a destructor to common_params would cause issues when the object is copied
@slaren slaren merged commit ec428b0 into master Aug 4, 2025
45 of 47 checks passed
@slaren slaren deleted the sl/ncmoe branch August 4, 2025 23:05
@jacekpoplawski
Contributor

Thank you :)

@SlavikCA

SlavikCA commented Aug 5, 2025

Should these options be added to this page, too:
https://github.com/ggml-org/llama.cpp/tree/master/tools/server
?

thad0ctor added a commit to thad0ctor/llama-server-launcher that referenced this pull request Aug 6, 2025
--cpu-moe to keep all MoE weights in the CPU
--n-cpu-moe N to keep the MoE weights of the first N layers in the CPU

ggml-org/llama.cpp#15077
@g0t4

g0t4 commented Aug 7, 2025

thank you! just got 108 T/s with gpt-oss:120b on my dual 5090s with --n-cpu-moe 3... so awesome I haven't had time to see if I should tweak it further :)

llama-server -hf ggml-org/gpt-oss-120b-GGUF --ctx-size 0 --jinja --flash-attn --n-gpu-layers 99 --reasoning-format none --n-cpu-moe 3

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* llama : add --n-cpu-moe option

Keeps the MoE weights of the first N layers in the CPU
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
* llama : add --n-cpu-moe option

Keeps the MoE weights of the first N layers in the CPU
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment


4 participants