
Eval bug: mtmd prompt format is missing audio begin tag for Voxtral #17868

@steampunque

Description

Name and Version

master

Operating systems

Linux

GGML backends

CUDA

Hardware

NA

Models

Voxtral

Problem description & steps to reproduce

While tracing flaky Voxtral behavior during audio processing, I found that the [BEGIN_AUDIO] tag is not emitted by the mtmd prompt template for Voxtral. Audio input partly works without the tag, but the results are unreliable. Recommended fix:

--- mtmd.cpp	2025-12-08 13:13:44.202285955 -0500
+++ mtmd.cpp.new	2025-12-08 13:13:29.850285270 -0500
@@ -330,10 +330,10 @@
             aud_beg = "<|audio_bos|>";
             aud_end = "<|audio_eos|>";
 
-        } else if (proj == PROJECTOR_TYPE_ULTRAVOX) {
+        } else if (proj == PROJECTOR_TYPE_ULTRAVOX ||
+                   proj == PROJECTOR_TYPE_VOXTRAL) {
             // [BEGIN_AUDIO] ... (embeddings) ...
             aud_beg = "[BEGIN_AUDIO]";
-
         }
     }

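To illustrate what the patch changes, here is a minimal C++ sketch of the audio-marker selection. It is not the mtmd.cpp code. `projector_type`, `audio_markers`, `markers_for`, and `wrap_audio` are hypothetical names. `AUDIO_BOS_EOS` is an assumed placeholder for whichever projector the branch above the Ultravox case handles, because the diff context does not show its name. The sketch only models the branch structure: Voxtral shares the Ultravox "[BEGIN_AUDIO]" begin marker and, like Ultravox, has no end marker.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the projector enum used by mtmd/clip.
// AUDIO_BOS_EOS is an assumed placeholder for the projector that uses
// <|audio_bos|> ... <|audio_eos|>; its real name is not shown in the diff.
enum class projector_type { AUDIO_BOS_EOS, ULTRAVOX, VOXTRAL, OTHER };

// Begin/end strings placed around the audio embeddings in the prompt.
struct audio_markers {
    std::string beg;
    std::string end;
};

// Mirrors the branch structure of the proposed patch: Voxtral now takes the
// same "[BEGIN_AUDIO]" begin marker as Ultravox, with no end marker.
audio_markers markers_for(projector_type proj) {
    audio_markers m;
    if (proj == projector_type::AUDIO_BOS_EOS) {
        m.beg = "<|audio_bos|>";
        m.end = "<|audio_eos|>";
    } else if (proj == projector_type::ULTRAVOX ||
               proj == projector_type::VOXTRAL) {
        m.beg = "[BEGIN_AUDIO]";
    }
    return m;  // OTHER: both markers stay empty
}

// Builds the prompt fragment for one audio chunk; `embd_placeholder` stands in
// for the position where the audio embeddings would be injected.
std::string wrap_audio(projector_type proj, const std::string & embd_placeholder) {
    const audio_markers m = markers_for(proj);
    return m.beg + embd_placeholder + m.end;
}
```

With this selection, wrap_audio(projector_type::VOXTRAL, "<audio>") returns "[BEGIN_AUDIO]<audio>". Without the VOXTRAL case, Voxtral falls through to the empty-marker path and the result is just "<audio>", which matches the missing-tag symptom described above.
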
First Bad Commit

NA

Relevant log output

NA
