
PoC: add chat template heuristics #1283

Merged: LostRuins merged 15 commits into LostRuins:concedo_experimental from kallewoof:202412-ct-heuristics on Dec 28, 2024

Conversation

@kallewoof commented Dec 24, 2024

The fallback chat template adapter of Vicuna is not ideal in some cases (e.g. a test against a sub-portion of the BBC news classification task on Kaggle gave 82% accuracy with Vicuna and 88% with the official ChatML format for a q4_k_m Qwen 2.5 3B-Instruct GGUF).

This PR adds a simple proof-of-concept heuristic that inspects the chat template and upgrades the adapter when it is able to.

  • Determine where this should go. koboldcpp.py seems big enough as it is, but the project does not seem to split into .py files often/at all.
  • Extend the cases to cover more chat templates, e.g. Llama 3.1, Mistral versions, etc.

Alternative approach: expose the llama chat template mechanism and use that.
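The marker-matching idea can be sketched roughly as below. The marker strings, adapter fields, and function name are illustrative assumptions for this sketch, not the PR's actual code:

```python
# Rough sketch of the heuristic: look for distinctive markers in the
# GGUF's embedded chat template and upgrade the fallback adapter on a match.
KNOWN_TEMPLATES = {
    "<|im_start|>": {"name": "ChatML",
                     "user_start": "<|im_start|>user\n",
                     "user_end": "<|im_end|>\n"},
    "[/INST]": {"name": "Mistral",
                "user_start": "[INST] ",
                "user_end": " [/INST]"},
}

FALLBACK = {"name": "Alpaca",
            "user_start": "\n### Instruction:\n",
            "user_end": "\n"}

def pick_adapter(chat_template: str) -> dict:
    """Return the first known adapter whose marker appears in the template."""
    for marker, adapter in KNOWN_TEMPLATES.items():
        if marker in chat_template:
            return adapter
    return FALLBACK
```

The real implementation would read the template from the GGUF metadata; here it is just a string argument.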

@LostRuins (Owner)

The problem is the chat template cannot be trusted. It is set by unknown third parties, and very often straight up incorrect or misleading - that's the whole reason why I didn't use the jinja template to begin with. I'm alright with frontends using the /props endpoint to make their own pick, but I'm not sure overwriting the default is a good idea.

The reason why the default is Alpaca (not Vicuna, in fact) is that \n### Instruction:\n is a tokenizer-resistant format. With ChatML, for example, if the vocab doesn't have <|im_start|> and <|im_end|> as added tokens, the sequence gets split into an awful mess of tokens like [<,|,im,_,start,|,>], which severely degrades output quality. Meanwhile Alpaca, being plain English, is relatively unaffected, and the newline ensures that tokenizers with huge vocabs like Llama 3's don't end up merging part of the instruct sequence into the input text tokens.
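The splitting behaviour described above can be demonstrated with a toy greedy longest-match tokenizer (a deliberate simplification of real BPE, purely for illustration; the toy vocabulary is an assumption):

```python
def greedy_tokenize(text: str, vocab: set) -> list:
    """Naive longest-match tokenizer over a toy vocabulary; unknown
    sequences degrade into single characters, mimicking how a real
    tokenizer shreds special-token strings it has never seen."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(len(text) - i, 16), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

base_vocab = {"im", "start", "_", "|", "<", ">"}
greedy_tokenize("<|im_start|>", base_vocab)
# -> ['<', '|', 'im', '_', 'start', '|', '>']  (seven fragments)
greedy_tokenize("<|im_start|>", base_vocab | {"<|im_start|>"})
# -> ['<|im_start|>']  (one token, as intended)
```

With the special token absent from the vocab, the marker shatters into the fragment soup described above; adding it as a single vocab entry restores the intended one-token encoding.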

@kallewoof (Author) commented Dec 25, 2024

I see. When I was evaluating a bunch of different models against a test, I was completely unaware that they were all using Alpaca (sorry, mixed them up), and when I switched, there was a significant increase in accuracy across the board. I understand not wanting to trust third parties willy-nilly. Perhaps there can be a flag that lets users choose whether or not they want to trust it?

Edit: We can also add built-in guard rails to deal with fuckery like unknown tokens, i.e. for every chat template profile we include a list of tokens that must be present in the tokenizer. If any are missing, the chat template is not adopted.
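That guard rail could look something like the following. The profile contents here are hypothetical examples, not a finalized list:

```python
# Hypothetical guard rail: every chat template profile declares the special
# tokens it depends on; a profile is only adoptable when all of them exist
# as real entries in the model's vocabulary.
PROFILES = [
    {"name": "ChatML", "required": ["<|im_start|>", "<|im_end|>"]},
    {"name": "Llama3", "required": ["<|start_header_id|>", "<|eot_id|>"]},
]

def adoptable_profiles(vocab: set) -> list:
    """Return the names of profiles whose required tokens all exist."""
    return [p["name"] for p in PROFILES
            if all(tok in vocab for tok in p["required"])]
```

A missing token silently drops the profile, so the heuristic would fall back to Alpaca rather than emit a template the tokenizer will shred.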

@LostRuins (Owner)

Well, I think there's a nice way to do it.


What I can suggest instead is that we create a new dummy file called AutoGuess.json as a built-in chat completion adapter. The user can optionally select this when they wish to apply heuristics.


(they can also simply type AutoGuess)

Then, the heuristics will apply.

@kallewoof (Author) commented Dec 25, 2024

Sounds good. I assume this can be selected via the OpenAI API chat completion endpoint as well?

I do think we may want to include an option to default to using it as well, but that's not a blocker (I just have to add it to my requests). Edit: it looks like people can pass --chatcompletionsadapter AutoGuess, which is fine, I suppose.

@kallewoof (Author)

I added two new prints on startup, which come after the text model load:

  1. Notifies the user that the chat completion adapter could not be detected and that Alpaca will be used. This appears when heuristics are enabled but no applicable heuristic was found.
  2. Notifies the user that Alpaca is being used for OAI API chat completions, when no --chatcompletionsadapter was provided.

@kallewoof force-pushed the 202412-ct-heuristics branch from 9de98ca to b45380f on December 25, 2024 at 05:02
@kallewoof (Author)

Giving this some thought, I think this would work well as a JSON file mapping search-string arrays to a chat template name plus params, with a for loop in the code that checks each entry. We could probably even use the AutoGuess.json file itself for this.

@kallewoof (Author)

Better! Any adapter JSON file can now be a list of dicts with search, name, and adapter keys, which will be matched in order. Not sure how useful that is, but hey. This seems like a pretty seamless integration.
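Under that scheme, an adapter file and its matching loop might look roughly like this. Only the search/name/adapter key names come from the comment above; the concrete entries and field values are illustrative:

```python
import json

# Illustrative AutoGuess-style adapter file: a list of entries, each with
# "search" strings to look for in the chat template, a display "name", and
# the "adapter" settings to apply on a match.
AUTOGUESS_JSON = """
[
  {"search": ["<|im_start|>"], "name": "ChatML",
   "adapter": {"user_start": "<|im_start|>user\\n", "user_end": "<|im_end|>\\n"}},
  {"search": ["[INST]", "[/INST]"], "name": "Mistral",
   "adapter": {"user_start": "[INST] ", "user_end": " [/INST]"}}
]
"""

def match_adapter(chat_template: str):
    """Return (name, adapter) for the first entry whose search strings all
    appear in the template, or None so the caller falls back to Alpaca."""
    for entry in json.loads(AUTOGUESS_JSON):
        if all(s in chat_template for s in entry["search"]):
            return entry["name"], entry["adapter"]
    return None
```

Scanning entries in order means more specific templates can simply be listed before more generic ones.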

@LostRuins (Owner) left a comment
lgtm

@LostRuins LostRuins merged commit 23ec550 into LostRuins:concedo_experimental Dec 28, 2024
@kallewoof kallewoof deleted the 202412-ct-heuristics branch December 28, 2024 14:01
@LostRuins LostRuins mentioned this pull request Jul 25, 2025