Models without Vocabulary by Xarbirus · Pull Request #5798 · ggml-org/llama.cpp

Xarbirus · 2024-02-29T14:37:52Z

I made some changes to the model converter so that it can create a GGUF model without a built-in vocabulary.
This makes it possible to use any custom external vocabulary in an application built with llama.cpp.

cebtenzzre · 2024-02-29T17:11:09Z

Do you have a more specific example of a use case for this feature - e.g., a model with a vocab type not currently supported by llama.cpp, but with weights that are?

ggerganov

This seems like something that could be useful.

Xarbirus · 2024-03-04T15:44:41Z

@cebtenzzre Right now we're using a tokenizer of this kind with a LLaMA model trained by our ML engineers. In our system the vocab lives on the client side, and the server only processes tokens, so there is no need for the vocab to be included in the model.
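The client/server split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from this PR and not the llama.cpp API: `ClientTokenizer`, `serve`, the toy whitespace tokenizer, and the echoing "model" are all hypothetical. It shows that only the client needs the vocabulary to map text to token IDs and back, while a server holding a vocab-less model can still reject out-of-range IDs if it knows the vocabulary size.

```python
from typing import Dict, List


class ClientTokenizer:
    """Hypothetical client-side tokenizer that owns the vocabulary.

    A toy whitespace tokenizer stands in for whatever custom tokenizer
    the client actually uses; only token IDs are sent to the server.
    """

    def __init__(self, vocab: Dict[str, int], unk_id: int) -> None:
        self.vocab = vocab
        self.unk_id = unk_id
        # Reverse mapping so the client can turn returned IDs back into text.
        self.inverse = {i: tok for tok, i in vocab.items()}

    def encode(self, text: str) -> List[int]:
        # Unknown words map to the (hypothetical) unknown-token ID.
        return [self.vocab.get(word, self.unk_id) for word in text.split()]

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.inverse.get(i, "<unk>") for i in ids)


def serve(token_ids: List[int], n_vocab: int) -> List[int]:
    """Hypothetical server step that sees only token IDs.

    Without an embedded vocabulary the server cannot detokenize, but it can
    still reject IDs outside [0, n_vocab). The "model" here just returns the
    IDs in reverse order as a stand-in for real inference.
    """
    for tid in token_ids:
        if not 0 <= tid < n_vocab:
            raise ValueError(f"token id {tid} out of range for n_vocab={n_vocab}")
    return list(reversed(token_ids))


if __name__ == "__main__":
    vocab = {"<unk>": 0, "hello": 1, "world": 2}
    client = ClientTokenizer(vocab, unk_id=0)
    ids = client.encode("hello world")    # [1, 2]
    out = serve(ids, n_vocab=len(vocab))  # [2, 1]
    print(client.decode(out))             # world hello
```

The design choice this illustrates is that the vocabulary size is the only tokenizer-related fact the server side needs; the mapping between text and IDs stays entirely with the client.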

* additional methods to read model and ctx parameters
* vocab size as a part of a model metadata
* models without vocabulary, convert.py part
* models without vocabulary, llama.cpp part
* PR clean up
* converter scrypt fixes
* llama_vocab_type update (renamed the new key)
* pr review fixes
* revert function renaming
* one more NoVocab assert

cebtenzzre reviewed Feb 29, 2024

Comment thread llama.h Outdated

cebtenzzre reviewed Feb 29, 2024

Comment thread convert.py Outdated

ggerganov approved these changes Mar 1, 2024

ggerganov requested a review from cebtenzzre March 1, 2024 08:54

dranger003 mentioned this pull request Mar 2, 2024

Unable to convert Smaug 72B #5807 (closed)

Xarbirus force-pushed the models-without-vocab branch from 735c684 to 2580fe5 on March 4, 2024 16:09

Xarbirus added 5 commits March 7, 2024 14:59

* additional methods to read model and ctx parameters (e700b44)
* vocab size as a part of a model metadata (cc3fe18)
* models without vocabulary, convert.py part (4f4258f)
* models without vocabulary, llama.cpp part (afa9d09)
* PR clean up (e0504d5)

Xarbirus force-pushed the models-without-vocab branch from 2580fe5 to e0504d5 on March 7, 2024 14:11

cebtenzzre reviewed Mar 8, 2024

Comment thread convert.py Outdated

Comment thread convert.py Outdated

Comment thread llama.h Outdated

Xarbirus added 2 commits March 10, 2024 18:28

* converter scrypt fixes (0c69016)
* llama_vocab_type update (renamed the new key) (80f66a8)

cebtenzzre reviewed Mar 12, 2024

Comment thread convert.py Outdated

cebtenzzre reviewed Mar 12, 2024

Comment thread convert.py Outdated

Comment thread convert.py Outdated

Xarbirus added 2 commits March 13, 2024 11:44

* pr review fixes (0a1322a)
* revert function renaming (94a1050)

cebtenzzre reviewed Mar 14, 2024

Comment thread convert.py

* one more NoVocab assert (9cb1554)

cebtenzzre approved these changes Mar 14, 2024

ggerganov merged commit 69ff613 into ggml-org:master Mar 14, 2024

Xarbirus deleted the models-without-vocab branch April 17, 2024 10:11

ggerganov merged 10 commits into ggml-org:master from Xarbirus:models-without-vocab

3 participants

Conversation

Xarbirus commented Feb 29, 2024

Uh oh!

Uh oh!

Uh oh!

cebtenzzre commented Feb 29, 2024

Uh oh!

ggerganov left a comment

Choose a reason for hiding this comment

Uh oh!

Xarbirus commented Mar 4, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants