Use `llama_chat_apply_template` in `main` (WIP) by ngxson · Pull Request #6810 · ggml-org/llama.cpp

ngxson · 2024-04-21T16:20:01Z

Resolve #6391

The core idea is to use `llama_chat_apply_template` to apply the template twice: once with and once without the last user message. Then we take the diff between the two output strings and feed only that part into inference.

Example:

<start_of_turn>user
You are a helpful assistant

Hello<end_of_turn>
<start_of_turn>model
Hi there<end_of_turn>
<start_of_turn>user
Who are you<end_of_turn>
<start_of_turn>model
I am an assistant<end_of_turn>
<start_of_turn>user
Another question<end_of_turn>
<start_of_turn>model

-----
chat_get_added_part() returns:
<start_of_turn>user
Another question<end_of_turn>
<start_of_turn>model
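The diff step can be sketched in a few lines of C++. This is a minimal sketch, not the PR's code: `ChatMsg` and `format_chat` are hypothetical stand-ins, and the formatter only imitates the Gemma-style output in the example (including folding the system message into the first user turn). `chat_get_added_part` here is an illustrative re-implementation under those assumptions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical message type standing in for llama_chat_message (role + content).
struct ChatMsg {
    std::string role;     // "system", "user" or "assistant"
    std::string content;
};

// Hypothetical Gemma-style formatter that mimics the example output.
// A system message has no turn of its own: it is prepended to the next user turn.
// add_ass appends the opening of the model's turn so generation can start.
std::string format_chat(const std::vector<ChatMsg>& msgs, bool add_ass) {
    std::string out;
    std::string pending_system;  // system text waiting for the next user turn
    for (const auto& m : msgs) {
        if (m.role == "system") {
            pending_system += m.content + "\n\n";
            continue;
        }
        const std::string role = (m.role == "assistant") ? "model" : "user";
        std::string text = m.content;
        if (role == "user" && !pending_system.empty()) {
            text = pending_system + text;
            pending_system.clear();
        }
        out += "<start_of_turn>" + role + "\n" + text + "<end_of_turn>\n";
    }
    if (add_ass) {
        out += "<start_of_turn>model\n";
    }
    return out;
}

// Format the history twice (without and with the new message) and return only
// the suffix contributed by the new message; that suffix is what gets evaluated.
std::string chat_get_added_part(const std::vector<ChatMsg>& history,
                                const ChatMsg& new_msg) {
    const std::string prev = format_chat(history, /*add_ass=*/false);
    std::vector<ChatMsg> extended = history;
    extended.push_back(new_msg);
    const std::string full = format_chat(extended, /*add_ass=*/true);
    // The incremental approach only holds if the old output is a prefix of the new one.
    assert(full.compare(0, prev.size(), prev) == 0);
    return full.substr(prev.size());
}
```

With the history from the example (system prompt plus two user/model exchanges) and the new message "Another question", this returns the same suffix shown above. In llama.cpp the two formatting calls would go through `llama_chat_apply_template`, whose `add_ass` parameter plays the role of the flag used here.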

This approach requires minimal effort to maintain the chat template infrastructure, while using the exact same logic for main and server (reminder: server also has the notion of a "prompt cache", which works the same way).
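The prompt-cache analogy can be illustrated with prefix reuse at the token level. This is a hypothetical illustration, not the server's implementation: `Token`, `common_prefix_len` and `tokens_to_eval` are invented names, and a real implementation would also have to discard cached KV entries beyond the shared prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Token = std::int32_t;  // stand-in for llama_token

// Length of the shared prefix between the tokens already evaluated (cached)
// and the tokens of the newly formatted prompt.
std::size_t common_prefix_len(const std::vector<Token>& cached,
                              const std::vector<Token>& prompt) {
    const std::size_t n = std::min(cached.size(), prompt.size());
    std::size_t i = 0;
    while (i < n && cached[i] == prompt[i]) {
        ++i;
    }
    return i;
}

// Tokens that still need to be evaluated; everything before them can be reused.
std::vector<Token> tokens_to_eval(const std::vector<Token>& cached,
                                  const std::vector<Token>& prompt) {
    const std::size_t keep = common_prefix_len(cached, prompt);
    return std::vector<Token>(prompt.begin() + static_cast<std::ptrdiff_t>(keep),
                              prompt.end());
}
```

The string diff in `main` and the token-prefix match here rest on the same assumption: re-formatting the full history reproduces the earlier output as a prefix, so only the tail needs new work.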

Having to re-format the whole chat history each time seems inefficient at first glance, but it is needed because:

There are some edge cases; see: Implement (properly) different chat templates in main.cpp #6391 (comment)
It is the same logic as in server (which is designed to be stateless)
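One reason per-message formatting is not enough can be shown with a toy template. The sketch does not reproduce the edge cases discussed in #6391; it assumes a hypothetical Gemma-like `render` that, as in the example, merges the system message into the first user turn. Rendering messages one at a time and concatenating the results then silently drops the system prompt, while rendering the whole list keeps it.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Msg {
    std::string role;     // "system", "user" or "assistant"
    std::string content;
};

// Hypothetical Gemma-like template: there is no system role, so a system
// message is prepended to the following user turn instead of getting its own turn.
std::string render(const std::vector<Msg>& msgs) {
    std::string out;
    std::string sys;  // system text waiting for the next user turn
    for (const auto& m : msgs) {
        if (m.role == "system") {
            sys += m.content + "\n\n";
            continue;
        }
        if (m.role == "user") {
            out += "<start_of_turn>user\n" + sys + m.content + "<end_of_turn>\n";
            sys.clear();
        } else {
            out += "<start_of_turn>model\n" + m.content + "<end_of_turn>\n";
        }
    }
    return out;
}

// Naive alternative: render each message on its own and concatenate.
// A lone system message renders to nothing, because this template only
// emits it together with a following user turn.
std::string render_per_message(const std::vector<Msg>& msgs) {
    std::string out;
    for (const auto& m : msgs) {
        out += render({m});
    }
    return out;
}
```

Because the output for one message can depend on its neighbours, the safe option is to render the full list and diff, rather than to format only the newest message.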

Then, we find the diff between the 2 strings.

Implement chat_get_added_part to get the part of the formatted output that differs with / without the last user message
Make main keep track of the list of messages
Update arguments for main: deprecate -cml (but do not remove it) and add a -chat-template argument
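Keeping the message list in `main` could look roughly like this. Everything here is hypothetical: `ChatSession` and `format_chatml` stand in for `main`'s state and for `llama_chat_apply_template`, and the ChatML layout is used only because it is the format `-cml` targets. The sketch also assumes that the text the model generated matches what the template renders for the assistant turn; when it does not (for example around the newline after `<|im_end|>`), the diff no longer lines up exactly with what is already in context.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ChatMsg {
    std::string role;     // "system", "user" or "assistant"
    std::string content;
};

// Hypothetical ChatML-style formatter, a stand-in for llama_chat_apply_template.
std::string format_chatml(const std::vector<ChatMsg>& msgs, bool add_ass) {
    std::string out;
    for (const auto& m : msgs) {
        out += "<|im_start|>" + m.role + "\n" + m.content + "<|im_end|>\n";
    }
    if (add_ass) {
        out += "<|im_start|>assistant\n";
    }
    return out;
}

// Hypothetical session object: it keeps the full message list and, for each
// new user turn, returns only the text that has not been fed to the model yet.
class ChatSession {
public:
    std::string add_user(const std::string& text) {
        const std::string prev = format_chatml(msgs_, /*add_ass=*/false);
        msgs_.push_back({"user", text});
        const std::string full = format_chatml(msgs_, /*add_ass=*/true);
        assert(full.compare(0, prev.size(), prev) == 0);  // prefix must be stable
        return full.substr(prev.size());
    }

    // Record the model's reply so the next diff is computed against it.
    void add_assistant(const std::string& reply) {
        msgs_.push_back({"assistant", reply});
    }

    std::size_t size() const { return msgs_.size(); }

private:
    std::vector<ChatMsg> msgs_;
};
```

Each turn, the string returned by `add_user` would be tokenized and evaluated, and the generated reply passed back through `add_assistant`.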

add chat_get_added_part

eb9a1ff

This was referenced Apr 21, 2024

Implement (properly) different chat templates in main.cpp #6391

Closed

Refactor chat template API #6822

Draft

mofosyne added the enhancement and Review Complexity : Low labels May 9, 2024

ngxson mentioned this pull request Jun 22, 2024

Add chat template support for llama-cli #8068

Merged

4 tasks

ngxson wants to merge 1 commit into ggml-org:master from ngxson:xsn/main_chat_template