Windows fixes by etra0 · Pull Request #31 · ggml-org/llama.cpp

etra0 · 2023-03-12T03:27:55Z

This is the initial PR to get the project compiling on Windows.

In particular, MSVC is strict about which language features you can and cannot use.

With C++11:

You cannot use designated initializers (when initializing a struct, you cannot specify the field names).
You cannot use VLAs, so I replaced them with a vector.
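
For illustration, here is a minimal sketch of the two kinds of change described above. The struct and function names (`init_params`, `make_params`, `sum_bytes`) are hypothetical and not taken from this PR; they only show the general pattern, assuming a C++11 compiler.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical struct, used only for illustration (not from the PR).
struct init_params {
    std::size_t mem_size;
    void *      mem_buffer;
};

// Designated initializers such as `{ .mem_size = n, .mem_buffer = nullptr }`
// are C99 / C++20 syntax, so MSVC rejects them in C++11 mode.
// Portable alternative: value-initialize, then assign each field by name.
init_params make_params(std::size_t n) {
    init_params p = {};       // all fields start zero-initialized
    p.mem_size   = n;
    p.mem_buffer = nullptr;
    return p;
}

// A VLA such as `std::uint8_t buf[n];` is a compiler extension in C++
// (accepted by GCC and Clang, not by MSVC). A std::vector gives the same
// runtime-sized buffer at the cost of a heap allocation.
std::size_t sum_bytes(std::size_t n) {
    std::vector<std::uint8_t> buf(n);
    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<std::uint8_t>(i & 0xFF);
    }
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += buf[i];
    }
    return total;
}
```

The `= {}` form also keeps any fields that are not assigned afterwards zeroed, which is the behavior a designated initializer would have given.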

A PR for the CMake build system (as agreed in #22) will be submitted separately.

These changes were tested with MSVC 19.34.31937.0 (Visual Studio 2022) and on macOS 12.6 with Apple clang version 13.1.6.

Issue: ggml-org#22

…31
Block 128: PPL=165.6 (same as block 32)
Disabled Q rotation: PPL=165.6 (same)
Root cause: dynamic_cast fails for MoE hybrid memory context. Q rotation and V inverse rotation never execute.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…#30
ROOT CAUSE: pre-rotate-queries never executed because:
1. Q ne[0]=256 (GQA concatenated heads), rotation matrix ne[0]=128
2. mctx dynamic_cast failed for MoE hybrid memory

FIX: put inverse WHT rotation back in dequantize_full_block. This is slower (10.7 tok/s vs 77.7) but produces CORRECT results.

PERPLEXITY RESULTS:
- f16: 6.121
- q8_0: 6.111
- q4_0: 6.142
- turbo3: 6.194 (+1.2% vs q8_0) ✅

The speed optimization (pre-rotate-queries) needs to be reimplemented to work with GQA head layout and hybrid memory types.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- turbo4 K+V results on Qwen3.5-27B (-0.32% vs q8_0) and Qwen3-14B (+6.3%)
- Sparse V dequant benchmarks: MoE native dequant +10.9% at 8K
- Gemma-3 turbo3 results post-iSWA fix (+3.3%)
- KVLinC no-K-rotation negative result
- Speculative decoding negative result
- CUDA 13.2 compatibility verified
- Experiments #31, TheTom#39, TheTom#42, TheTom#45, TheTom#49, TheTom#50, TheTom#51 status updates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

etra0 added 3 commits March 12, 2023 00:20

Apply fixes suggested to build on windows

2d29d4b

Issue: ggml-org#22

Remove unsupported VLAs

718bb98

MSVC: Remove features that are only available on MSVC C++20.

ce0d5fb

NotNite mentioned this pull request Mar 12, 2023

ggml_new_tensor_impl: not enough space in the context's memory pool #29

Closed

etra0 marked this pull request as draft March 12, 2023 04:50

Fix zero initialization of the other fields.

c505a2d

etra0 marked this pull request as ready for review March 12, 2023 05:18

0xbitches mentioned this pull request Mar 12, 2023

Windows 64-bit, Microsoft Visual Studio - it works like a charm after those fixes! #22

Closed

ggerganov requested changes Mar 12, 2023

Comment thread utils.cpp Outdated

ggerganov mentioned this pull request Mar 12, 2023

Port to Visual C++. #36

Closed

etra0 requested a review from ggerganov March 12, 2023 15:47

Change the use of vector for stack allocations.

3b20e78

etra0 force-pushed the windows-fixes branch from 6be7fda to 3b20e78 March 12, 2023 16:07

ggerganov approved these changes Mar 12, 2023

ggerganov merged commit eb062bb into ggml-org:master Mar 12, 2023

NouamaneTazi mentioned this pull request Mar 16, 2023

[Open-to-community] Benchmark bloomz.cpp on different hardware NouamaneTazi/bloomz.cpp#4

Open

Hades32 pushed a commit to Hades32/llama.cpp that referenced this pull request Mar 21, 2023

Merge pull request ggml-org#31 from anzz1/ci_test

7b24407

CI fine tuning

NouamaneTazi mentioned this pull request Mar 22, 2023

May you release a windows build binary? NouamaneTazi/bloomz.cpp#17

Open

Bearsaerker mentioned this pull request Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Closed

graehl mentioned this pull request Jul 8, 2025

finetune: SGD optimizer, more CLI args #13873

Merged

theo77186 pushed a commit to theo77186/llama.cpp that referenced this pull request Oct 28, 2025

Merge pull request ggml-org#31 from rujialiu/mrope-fix

ef1af68

Mrope fix

jesusmb1995 pushed a commit to jesusmb1995/llama.cpp that referenced this pull request Oct 30, 2025

Merge pull request ggml-org#31 from ogad-tether/temp-latest

87f40dd

implement logging redirection

sainnhe mentioned this pull request Jan 25, 2026

Eval bug: coredump due to ops of discontinuous tensor memory #19078

Closed

rururush pushed a commit to USTC-ADSL/llama.cpp that referenced this pull request Mar 16, 2026

bug: fix benchmark debug warning (ggml-org#31)

8b652dd

* print build type * wip * print compiling flags * wip * wip

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

Fix build when iqk_mul_mat is disabled (ggml-org#31)

5ff9970

Ref ggml-org#29 Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>

ggerganov merged 5 commits into ggml-org:master from etra0:windows-fixes

2 participants
