
mtmd: use causal attn for gemma 4 audio (+ small breaking change to mtmd) #21824

Merged
ngxson merged 1 commit into ggml-org:master from ngxson:xsn/g4_causal_audio
Apr 13, 2026

ngxson commented Apr 12, 2026

Overview

Continue #21421

Fix #21820

Fix #21816

For Gemma 4, the text model uses non-causal (i.e. bidirectional) attention only for vision input. Audio input uses causal attention, the same as text.

Breaking change

mtmd_decode_use_non_causal now takes a second parameter, the current chunk.

The argument must be supplied but may be nullptr, in which case the current chunk is assumed to be an image.

In the case of gemma 4:

  • Vision chunk requires non-causal
  • Audio chunk requires causal (same as text input)
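
The per-chunk rule above could be sketched roughly as follows. This is an illustration only, not the code from the PR: `ChunkType`, `Chunk`, `AttnState`, `wants_non_causal` and `decode_chunk` are hypothetical stand-ins, and the real types and the actual signature of mtmd_decode_use_non_causal are the ones declared in mtmd.h.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; the real chunk type and
// accessors are declared in mtmd.h and are not reproduced here.
enum class ChunkType { Text, Image, Audio };

struct Chunk {
    ChunkType type;
};

// Sketch of the Gemma 4 rule: only vision input uses non-causal attention.
// A null chunk is treated as an image, mirroring the documented default.
bool wants_non_causal(const Chunk * chunk) {
    if (chunk == nullptr) {
        return true;
    }
    return chunk->type == ChunkType::Image;
}

// Hypothetical caller state: whether the text model currently uses a causal mask.
struct AttnState {
    bool causal = true;
};

// Caller-side pattern: pick the mask for this chunk, decode, then restore
// causal attention so later text tokens are unaffected.
// Returns true if the chunk was decoded with a causal mask.
bool decode_chunk(AttnState & state, const Chunk * chunk) {
    state.causal = !wants_non_causal(chunk);
    const bool used_causal = state.causal;
    // ... the actual decode call for the chunk would go here ...
    state.causal = true;
    assert(state.causal);
    return used_causal;
}
```

For callers, the practical consequence is that the causal/non-causal decision should be made per chunk rather than once per context, since a single prompt may mix image and audio chunks.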

@ngxson ngxson merged commit 920b3e7 into ggml-org:master Apr 13, 2026
47 checks passed

Development

Successfully merging this pull request may close these issues.

Eval bug: Gemma4 E2B does not produce correct transcripts from audio
Misc. bug: Gemma 4 E4B audio assert error
