Support StableLM2 12B (#6635)
Conversation
Galunid
left a comment
Is this working, or work in progress?
Since you require a specific branch to convert, perhaps it'd be a good idea to warn the user if they are using …
By then the user would already have downloaded more than 20GB of model files. Ideally, the Q and K layernorms should be stacked during conversion (similarly to how Mixtral's expert tensors are concatenated), if they aren't already and if they're present.
Force-pushed from 38a4de3 to 29d940b
Done, thanks for pointing this out. Makes the conversion a lot simpler.
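The stacking discussed above can be sketched roughly as below. This is illustrative only: the tensor-name pattern and helper name are assumptions, not the exact layout of the StableLM2 checkpoint or the real conversion script (which uses the gguf-py machinery).

```python
import numpy as np

def stack_qk_norms(tensors, n_layers, n_heads, component="q_layernorm"):
    """Stack per-head Q/K layernorm weights into one tensor per layer.

    `tensors` maps checkpoint tensor names to arrays. The name pattern
    below is hypothetical, for illustration only.
    """
    stacked = {}
    for il in range(n_layers):
        heads = [
            tensors[f"model.layers.{il}.self_attn.{component}.norms.{h}.weight"]
            for h in range(n_heads)
        ]
        # write one (n_heads, head_dim) tensor instead of n_heads 1-D tensors
        stacked[f"model.layers.{il}.self_attn.{component}.weight"] = np.stack(heads)
    return stacked

# tiny usage example with fake weights (2 layers, 4 heads, head_dim 8)
fake = {
    f"model.layers.{il}.self_attn.q_layernorm.norms.{h}.weight": np.ones(8)
    for il in range(2) for h in range(4)
}
out = stack_qk_norms(fake, n_layers=2, n_heads=4)
```

Doing this once at conversion time means the inference graph only ever sees one norm tensor per layer, which is what simplifies the loader.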
…evert change to base class `write_tensors()`
compilade
left a comment
Hopefully this helps with correcting the flake8 linter errors
@ggerganov does this look good to merge now?
```cpp
if (model.layers[il].ffn_norm) {
    // non-parallel residual
    cur = ggml_add(ctx0, cur, ffn_inp);
} else {
    // add together residual + FFN + self-attention
    cur = ggml_add(ctx0, cur, inpL);
    cur = ggml_add(ctx0, cur, attn_out);
}
```
Aren't these 2 branches equivalent?
I don't believe so. One branch is doing the parallel residual (e.g. the 12B), and the other, where the FFN norm is present (e.g. StableLM 1.6B and 3B), is not doing the parallel residual. If I am missing something, please let me know, thanks!
Since `ffn_inp = attn_out + inpL`, I think these branches do the same thing and can be replaced simply with:

```cpp
cur = ggml_add(ctx0, cur, ffn_inp);
```

I am looking for ways to avoid the unused `ffn_inp = ggml_add(...)` in the parallel-residual case.
> Since `ffn_inp = attn_out + inpL` I think these branches do the same

Reasoning from the relevant modeling code in transformers, even though they separate them for clarity, I think you're right, these branches do the same thing.
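The equivalence is plain associativity of the adds, and can be checked numerically with stand-in vectors for the graph tensors (the variable names mirror the ones in the snippet above; the values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
inpL     = rng.standard_normal(8)   # residual stream entering the block
attn_out = rng.standard_normal(8)   # self-attention output
cur      = rng.standard_normal(8)   # FFN output, before the residual add

ffn_inp = attn_out + inpL

# branch 1: cur = cur + ffn_inp
a = cur + ffn_inp
# branch 2: cur = (cur + inpL) + attn_out
b = (cur + inpL) + attn_out

assert np.allclose(a, b)  # same result, so the two branches can be merged
```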
@ggerganov Thanks! I removed the branches and re-ran on 1.6B, 3B and 12B with no problems. Please let me know if there is anything else!
Galunid
left a comment
Do you want to do anything else, or can we merge?
All done from my side
* StableLM2 12B support for huggingface -> GGUF
* StableLM12 tensormapping and constants
* StableLM-2-12b model support
* fix
* Added 12B support
* Removed autoformatting; resolved bug where model_arch was not selecting StableLM2
* Formatting
* Do QK norm stacking in model conversion step
* Converge StableLM and StableLM2 code to simplify graph construction
* Fix accidental removal
* Removed warnings
* Revert formatter
* Move QK norm stack to private function so it's easier to read
* refactor stablelm graph builder to support 1.6, 3b and 12b more efficiently
* Proper check for None type for new_name to avoid crash; formatting; revert change to base class `write_tensors()`
* Format
* Formatting
* format

Co-authored-by: compilade <git@compilade.net>

* Fix incorrect check for K norm
* space after commas; Keep indentation multiple of 4 spaces
* Flake8 format
* Removed unnecessary conditional branches
* Removed unused comment
* Fixed incorrect tensor passing
* Format

---------

Co-authored-by: compilade <git@compilade.net>




Support for https://huggingface.co/stabilityai/stablelm-2-12b-chat, resolving #6553