Implement the OLMo architecture #6741
Conversation
Will upload the GGUF conversions here: https://huggingface.co/collections/nopperl/olmo-gguf-66211a0071b6c3d66303fcf1
@phymbert thanks for checking, I've removed the superfluous code now. The generation results are not affected.
As an aside, I confirm that it works with the new allenai/OLMo-1.7-7B-hf as well.
Thank you very much! Do you think that some of them would be useful for coding prompts like in CodeGPT?
If you want to test its performance, I recommend starting from the largest one (f16). However, I did try it on a few coding prompts and I cannot really recommend it for that. For the same weight class (7B), there are better models like deepseek-ai/deepseek-coder-7b-instruct-v1.5 (for QA) or bigcode/starcoder2-7b (for code completion). Bear in mind that this is still a base model, so it will perform worse than instruction-tuned models on these tasks.
* implement olmo architecture
* remove unused variable
* remove unused moe branch
* remove check for weight
* remove superfluous moe, bias and rope tensors
* clarified comment
* fix clamp_kqv setting
* remove obsolete parameter name filter
Implements the recently released open-source OLMo architecture. Tested with allenai/OLMo-1B-hf and allenai/OLMo-7B-hf; it should also work with allenai/OLMo-1.7-7B and the future OLMo-70B. Fixes #5408.
Implementation differences from Llama (as reflected in the commit list above):
* non-parametric LayerNorm without weight or bias tensors, instead of RMSNorm
* optional clamping of the QKV activations (`clamp_kqv`)
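To make the norm difference concrete, here is a minimal PyTorch sketch (not the llama.cpp implementation) contrasting OLMo's non-parametric LayerNorm with Llama's RMSNorm; the clamp limit value is a placeholder:

```python
import torch
import torch.nn.functional as F

def olmo_norm(x: torch.Tensor) -> torch.Tensor:
    # Non-parametric LayerNorm: normalize only, no learned weight or bias.
    return F.layer_norm(x, x.shape[-1:])

def llama_rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # RMSNorm with a learned per-channel scale, as used by Llama.
    return weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

def clamp_kqv(x: torch.Tensor, limit: float = 8.0) -> torch.Tensor:
    # Optional clamping of the QKV projections; the limit here is a placeholder.
    return x.clamp(-limit, limit)
```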
Test:
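A minimal sketch of such a test, assuming the repository's convert-hf-to-gguf.py script and the main example binary (model path, prompt, and token count are placeholders):

```sh
# Convert the HF checkpoint to GGUF, then run a short completion.
python convert-hf-to-gguf.py models/OLMo-7B-hf --outfile olmo-7b-f16.gguf
./main -m olmo-7b-f16.gguf -p "Language modeling is" -n 64
```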
Output:
Reference (requires transformers>=4.40.0.dev0):
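A minimal sketch of such a reference run using the transformers generation API (prompt and generation settings are placeholders):

```python
# Reference generation with HF transformers (>=4.40.0.dev0).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Language modeling is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```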
Output:
Note:
`llm_load_vocab` shows a mismatch in the special tokens definition (31/50304 vs 52/50304). This is due to special tokens 50254 to 50276, which are sequences of `" "` of varying length.