
llama : add simple option to enable CPU for MoE weights (--cpu-moe) #14992

Merged: slaren merged 1 commit into master from sl/moe-switch on Jul 31, 2025
Conversation

@slaren (Member) commented Jul 31, 2025

This is intended to be a simple, curated way to keep the MoE weights on the CPU. Internally, it just sets up the appropriate tensor buffer overrides, but this flag should be easier to use than writing them by hand.
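A minimal usage sketch, assuming a GGUF MoE model (the model path, prompt, and -ngl value are placeholders):

```shell
# Offload all layers to the GPU, but keep the MoE expert weights on the CPU.
llama-cli -m ./model.gguf -ngl 99 --cpu-moe -p "Hello"

# Assumed equivalent using the existing generic option: override the buffer
# type of the expert tensors by regex (the pattern is an assumption based on
# the usual GGUF naming for expert tensors).
llama-cli -m ./model.gguf -ngl 99 \
  --override-tensor "\.ffn_(up|down|gate)_exps=CPU" -p "Hello"
```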

slaren merged commit a06ed5f into master on Jul 31, 2025 (47 checks passed)
slaren deleted the sl/moe-switch branch on July 31, 2025 at 18:15
@jacekpoplawski (Contributor) commented:
Am I correct that this is on/off? It would be better to have an option for the number of layers (similar to -ngl).

@slaren (Member, Author) commented Jul 31, 2025

I am not convinced that it would be worth it. The goal here is to have a very simple option that works well enough for most people. If you want to min-max, you can still use the --override-tensor option to customize it in any way you want.
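For reference, --override-tensor matches each tensor name against a regex and pins the matches to a buffer type. A small sketch of how such a pattern selects the expert weights, emulated with grep (the tensor names are illustrative, following the usual GGUF naming for MoE experts):

```shell
# Feed some hypothetical tensor names through the kind of regex that
# "--override-tensor REGEX=BUFFER" accepts; only the matches would be
# pinned to the CPU buffer.
printf '%s\n' \
  'blk.0.attn_q.weight' \
  'blk.0.ffn_gate_exps.weight' \
  'blk.7.ffn_up_exps.weight' \
  'blk.7.ffn_down.weight' \
  | grep -E '\.ffn_(up|down|gate)_exps'
# Matches only the two *_exps tensors, i.e. the expert weights.
```

A per-layer split in the spirit of -ngl could then be expressed by anchoring the regex on the block index, e.g. `blk\.(1[0-9])\.ffn_.*_exps=CPU` to keep only the experts of layers 10–19 on the CPU.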

@jacekpoplawski (Contributor) commented:

Yes, I understand. And now I have an idea for my experiments :)

