common/gemma4 : handle parsing edge cases (#21760) #27

Merged
InfernalDread merged 1 commit into InfernalDread:turboquant_kv_cache_updated_v5 from ggml-org:master on Apr 14, 2026

@InfernalDread
Owner

No description provided.

@InfernalDread InfernalDread merged commit 9af22fc into InfernalDread:turboquant_kv_cache_updated_v5 Apr 14, 2026
84 of 144 checks passed
InfernalDread pushed a commit that referenced this pull request Apr 23, 2026
Full investigation log with all tests, results, and the root cause.
Upstream TurboQuant activity tracked in #27.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
InfernalDread pushed a commit that referenced this pull request Apr 23, 2026
Add full asymmetric K/V quantization support for Metal flash attention:

- Pipeline naming uses k{type}_v{type} format for all FA kernels (335 total),
  eliminating underscore ambiguity in type names
- 90 turbo × turbo asymmetric instantiations (turbo2/3/4 all combinations)
- 150 q8_0 × turbo asymmetric instantiations (both directions, all head dims)
- Gatekeeper and assertion updated to allow turbo × turbo and q8_0 × turbo pairs
- Zero regression on existing symmetric paths (validated across 4 models, 2 machines)
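The naming point above can be illustrated with a minimal sketch. It assumes only what the commit message states, namely that names follow a k{type}_v{type} format. The type set, helper names, and regular expression below are hypothetical and are not taken from the Metal sources.

```python
import re
from typing import Tuple

# Hypothetical subset of KV cache type names. Note that "q8_0" itself
# contains an underscore, which is what makes a plain "{k}_{v}" join ambiguous.
KV_TYPES = {"f16", "q8_0", "turbo2", "turbo3", "turbo4"}

# Greedy match: the K type runs up to the last "_v" in the string.
_SUFFIX_RE = re.compile(r"^k(?P<k>.+)_v(?P<v>.+)$")


def pipeline_suffix(k_type: str, v_type: str) -> str:
    """Build a suffix in the k{type}_v{type} format (illustrative helper)."""
    return f"k{k_type}_v{v_type}"


def parse_suffix(suffix: str) -> Tuple[str, str]:
    """Recover (k_type, v_type) from a suffix, validating against KV_TYPES."""
    m = _SUFFIX_RE.match(suffix)
    if m is None or m["k"] not in KV_TYPES or m["v"] not in KV_TYPES:
        raise ValueError(f"unrecognized pipeline suffix: {suffix!r}")
    return m["k"], m["v"]


# A naive underscore join splits into three pieces, so the K/V boundary
# cannot be found without consulting a type table.
naive_parts = "_".join(("q8_0", "turbo3")).split("_")

# The prefixed form round-trips without guessing.
round_trip = parse_suffix(pipeline_suffix("q8_0", "turbo3"))
```

With the prefixed form, the K and V types are recoverable from the name alone, which is presumably why the commit applies it to all FA kernels rather than only the new asymmetric ones.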

The q8_0 × turbo kernels fix a silent dispatch failure in which mixed q8_0-K + turbo-V
configs would produce NaN (turbo4-V) or fall through to undefined paths (turbo3-V). This enables
the asymmetric quality rescue: q8_0-K + turbo-V recovers near-baseline PPL on
low-bit models where symmetric turbo-K degrades quality.
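A rough sketch of the gatekeeper idea follows: reject unsupported K/V pairs up front so they fail loudly instead of dispatching silently to a missing kernel. The function names and the rule that every symmetric pair is accepted are assumptions made for illustration. The commit message only states that turbo × turbo and q8_0 × turbo pairs are now allowed.

```python
TURBO_TYPES = {"turbo2", "turbo3", "turbo4"}


def is_supported_pair(k_type: str, v_type: str) -> bool:
    """Return True if a flash-attention kernel is assumed to exist for (K, V)."""
    if k_type == v_type:
        return True  # assumption: existing symmetric paths stay supported
    if k_type in TURBO_TYPES and v_type in TURBO_TYPES:
        return True  # turbo x turbo asymmetric pairs
    if k_type == "q8_0" and v_type in TURBO_TYPES:
        return True  # q8_0-K + turbo-V
    if v_type == "q8_0" and k_type in TURBO_TYPES:
        return True  # turbo-K + q8_0-V (the other direction)
    return False


def select_kernel_suffix(k_type: str, v_type: str) -> str:
    """Pick a kernel suffix, raising instead of falling to an undefined path."""
    if not is_supported_pair(k_type, v_type):
        raise ValueError(f"no flash-attention kernel for K={k_type}, V={v_type}")
    return f"k{k_type}_v{v_type}"
```

The design choice illustrated here is that the check and the dispatch share one source of truth, so a pair that passes the gate always maps to a named kernel.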

Validated on Metal (M2 Pro + M5 Max):
- phi-4-Q8_0: symmetric turbo3 +4.2%, turbo4 +1.7% (no regression)
- Qwen2.5-7B Q4_K_M: q8_0-K + turbo4-V +1.0%, q8_0-K + turbo3-V +2.0% (rescued)
- Qwen3.5-35B MoE, 27B Dense, Mistral-24B: all healthy (no regression)
- Cross-hardware M2/M5 parity confirmed on all tested configs
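For reading the percentages above, here is a small sketch of the arithmetic, assuming they are relative perplexity increases over a baseline run (the commit message does not spell out the baseline). The helper name and the sample values are illustrative only and are not measurements from this PR.

```python
def ppl_delta_percent(baseline_ppl: float, test_ppl: float) -> float:
    """Relative perplexity change of a test config versus a baseline, in percent."""
    if baseline_ppl <= 0:
        raise ValueError("baseline perplexity must be positive")
    return (test_ppl - baseline_ppl) / baseline_ppl * 100.0


# Illustrative values, not taken from the PR: a baseline PPL of 10.0 and a
# quantized-KV PPL of 10.42 would be reported as roughly "+4.2%".
example = f"{ppl_delta_percent(10.0, 10.42):+.1f}%"
```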

Closes #27

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai

2 participants