
Add support for Chameleon #8543

Merged

ggerganov merged 24 commits into ggml-org:master from nopperl:chameleon on Sep 28, 2024

Conversation

@nopperl (Contributor) commented Jul 17, 2024

This PR adds support for the Chameleon model. For now, the implementation only supports text->text inference and serves as a base for the (more interesting) image->text, text->image, and interleaved pipelines. However, those will probably require changes to the CLI and internal architecture, so I suggest doing them in a separate PR.

Chameleon is based on the Llama-2 architecture with the following changes:

  • different (pre-)tokenizer
  • qk-norm
  • swin-norm
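For illustration, qk-norm means applying an RMS normalization to the query and key projections per head before RoPE. The following is a minimal NumPy sketch under assumed shapes and epsilon; it is not the exact llama.cpp implementation, and `rms_norm`/`head_dim` are illustrative names:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # Normalize over the last (head) dimension, then scale by a learned weight.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

head_dim = 4
q = np.random.randn(2, head_dim)           # (n_tokens, head_dim) for one head
q_normed = rms_norm(q, np.ones(head_dim))  # with unit weights, each row has ~unit RMS
```

The point of qk-norm is to keep attention scores numerically stable by bounding the magnitude of each query/key vector before the dot product.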

Note 1: to enable text->text inference, the image token logits are suppressed, similar to the HF implementation. This needs to be removed when support for images is added.
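Suppressing token logits amounts to masking them to negative infinity before sampling, so those tokens get zero probability after softmax. A minimal sketch (the function name and token ids are illustrative, not taken from the PR):

```python
import numpy as np

def suppress_tokens(logits, token_ids):
    # -inf logits become exactly zero probability after softmax.
    masked = logits.copy()
    masked[token_ids] = -np.inf
    return masked

logits = np.array([1.0, 2.0, 3.0, 4.0])
masked = suppress_tokens(logits, [2, 3])  # pretend ids 2 and 3 are image tokens
probs = np.exp(masked - masked.max())
probs /= probs.sum()
```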

Note 2: I implemented swin-norm, but I haven't tested it yet, since it is only used by Chameleon-30B.
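As I understand it, swin-norm moves the normalization from the sublayer input to the sublayer output (res-post-norm, as in Swin Transformer), versus the pre-norm used by Llama. A rough sketch of the difference, with an illustrative `block` helper and a linear stand-in for the attention/FFN sublayer:

```python
import numpy as np

def rms_norm(x, eps=1e-5):
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)

def block(x, sublayer, swin_norm):
    if swin_norm:
        # Swin-norm: run the sublayer first, normalize its output, then add residual.
        return x + rms_norm(sublayer(x))
    # Pre-norm (Llama default): normalize the input to the sublayer.
    return x + sublayer(rms_norm(x))

x = np.array([[1.0, 2.0, 3.0, 4.0]])
sub = lambda v: v * 2.0  # stand-in for the attention/FFN sublayer
pre = block(x, sub, swin_norm=False)
swin = block(x, sub, swin_norm=True)
```

Even with this trivial sublayer the two orderings produce different activations, which is why the flag has to be stored in the GGUF metadata rather than inferred.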

To test it:

```sh
git clone https://huggingface.co/facebook/chameleon-7b
./convert-hf-to-gguf.py chameleon-7b
build/bin/llama-cli -m chameleon-7b/ggml-model-f16.gguf --temp 0.8 -s 1000 -n 50 -p "Language modeling is " -ngl 33
```

Output:

Language modeling is “the task of predicting the next word in a sequence of text, given the previous words.”

To implement a language model, we can use a neural network with a bidirectional LSTM layer and a softmax output layer.

Reference (requires transformers>=4.43.0.dev0):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(1000)
model = AutoModelForCausalLM.from_pretrained("facebook/chameleon-7b", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("facebook/chameleon-7b")
prompt = "Language modeling is "
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=40)
print(tokenizer.decode(out[0]))
```

Reference output:

Language modeling is “the task of predicting the next word in a sequence of text given the previous words.”

In other words, it's a machine learning model that takes a sequence of text as input

Partially addresses #7995.

@github-actions github-actions Bot added the python python script changes label Jul 17, 2024
@nopperl (Contributor, Author) commented Jul 17, 2024

I have uploaded GGUFs for testing this PR here.

@mofosyne mofosyne added the Review Complexity : Medium Generally require more time to grok but manageable by beginner to medium expertise level label Jul 19, 2024
Comment thread gguf-py/gguf/gguf_writer.py Outdated
Comment thread src/llama.cpp Outdated
Comment thread src/llama.cpp Outdated
Comment thread src/llama.cpp Outdated
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 15, 2024
Co-Authored-By: nopperl <54780682+nopperl@users.noreply.github.com>
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Aug 16, 2024
Co-Authored-By: nopperl <54780682+nopperl@users.noreply.github.com>
@nate-lrt commented

will this ever get added :(

@nopperl (Contributor, Author) commented Sep 26, 2024

I think it would still be a good addition. I've resolved all conflicts with master now, so it should be ready to merge.

@ggerganov ggerganov merged commit 9a91311 into ggml-org:master Sep 28, 2024
@arch-btw (Contributor) commented

Thank you @nopperl looks like it got merged!

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
* convert chameleon hf to gguf

* add chameleon tokenizer tests

* fix lint

* implement chameleon graph

* add swin norm param

* return qk norm weights and biases to original format

* implement swin norm

* suppress image token output

* rem tabs

* add comment to conversion

* fix ci

* check for k norm separately

* adapt to new lora implementation

* fix layer input for swin norm

* move swin_norm in gguf writer

* add comment regarding special token regex in chameleon pre-tokenizer

* Update src/llama.cpp

Co-authored-by: compilade <git@compilade.net>

* fix punctuation regex in chameleon pre-tokenizer (@compilade)

Co-authored-by: compilade <git@compilade.net>

* fix lint

* trigger ci

---------

Co-authored-by: compilade <git@compilade.net>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
@MasterScrat commented

@nopperl any plans to tackle image->text and text->image?

@nopperl (Contributor, Author) commented Dec 19, 2024

@MasterScrat currently no plans, sorry for the late reply. AFAIK multimodal support would require a refactor of llama.cpp (#8010 (comment)). I'd love to work on it, but don't have the time right now.

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026