Adding support for Nandi Models #45101
Conversation
Co-authored-by: Vishesht27 <vishesht27@gmail.com>
We are the team at RTA AI Labs, a tiny but passionate startup dedicated to making high-performance language models more accessible through efficient architecture. Today, we are excited (and a little nervous!) to submit this PR to add support for Nandi, our custom "smol" model series.

Why this PR matters

As a very small team, we have poured our hearts, late nights, and limited resources into building Nandi. We believe that the future of AI belongs to efficient, edge-compatible models, and we've designed Nandi to punch significantly above its weight class in terms of reasoning and throughput. Bringing Nandi to the Hugging Face ecosystem is a massive milestone for us. It is the "make or break" step for our upcoming release, as it will allow the community to easily fine-tune, deploy, and experiment with what we've built.

A small note to the maintainers
Heya, super excited to see this 👋 Just as a heads up, we have holidays + the torch conference, so reviews will be delayed for at least a week-ish. Not sure if I will be the one reviewing or someone else, but as a first step it would be best to fully utilize modular: https://huggingface.co/docs/transformers/v5.5.0/en/modular_transformers#implementing-a-modular-file. Appreciate the work, and don't hesitate to ping us 😄
Hi @vasqu, thanks for the heads up! I've updated the PR to fully utilise the modular transformers format as suggested in the documentation. Looking forward to your feedback whenever you're back.
xenova left a comment:
Love seeing smaller models! Just an FYI before the main reviewers get to this... your usage of modular is not correct, as I see many classes which are mostly duplicates of existing implementations. Take a look at how a modular file like https://github.com/huggingface/transformers/blob/def5e6864fe4f2bbd7f056f37366f4dd0d693097/src/transformers/models/apertus/modular_apertus.py is laid out.
e.g.,

    class ApertusRMSNorm(LlamaRMSNorm):
        pass

    class ApertusRotaryEmbedding(LlamaRotaryEmbedding):
        pass

are valid usages of modular.
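Applied to this PR, a minimal sketch of what the modular file could look like (assuming Nandi's norm and rotary embedding really do match Llama's, as the review threads below suggest):

    # modular_nandi.py: a hypothetical sketch, not the final implementation.
    # Assumes NandiRMSNorm and NandiRotaryEmbedding are identical to their
    # Llama counterparts, as the review threads below suggest.
    from transformers.models.llama.modeling_llama import (
        LlamaRMSNorm,
        LlamaRotaryEmbedding,
    )


    class NandiRMSNorm(LlamaRMSNorm):
        pass


    class NandiRotaryEmbedding(LlamaRotaryEmbedding):
        pass

The modular converter then auto-generates the full modeling file from these pass-through subclasses, so the duplicated code never has to live in the PR.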
Review comment on the diff context:

    config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
    )

    @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
No need to deprecate here. v5 shouldn't have these decorators anymore :)
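In concrete terms, the suggestion is to drop the decorator and accept only the new kwarg name. A sketch, assuming the decorated method is the attention forward from the diff context (the real signature may differ):

    from transformers.utils.deprecation import deprecate_kwarg

    # Before (as in the PR): the decorator remaps the old "past_key_value"
    # kwarg to "past_key_values" and emits a deprecation warning.
    @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
    def forward(hidden_states, past_key_values=None, **kwargs):
        ...

    # After (suggested for v5): no decorator; only the new name is accepted.
    def forward(hidden_states, past_key_values=None, **kwargs):
        ...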
Review comment on:

    class NandiRotaryEmbedding(nn.Module):
identical to normal LlamaRotaryEmbedding afaict.
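Following the Apertus example above, the modular fix would presumably collapse this to a pass-through subclass (valid only if the implementations really are identical):

    from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

    # Hypothetical modular rewrite, per the comment above.
    class NandiRotaryEmbedding(LlamaRotaryEmbedding):
        pass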
Further review threads were anchored on the same decorator elsewhere in the diff, e.g. in the decoder layer:

    self.input_layernorm = NandiRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
    self.post_attention_layernorm = NandiRMSNorm(config.hidden_size, eps=config.rms_norm_eps)

    @deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
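If the decoder layer otherwise mirrors Llama's as well, the same treatment would apply there. A sketch with a hypothetical NandiDecoderLayer, assuming no Nandi-specific changes inside the layer:

    from transformers.models.llama.modeling_llama import LlamaDecoderLayer

    # Hypothetical: only valid if the layer adds nothing over Llama's.
    class NandiDecoderLayer(LlamaDecoderLayer):
        pass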
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, nandi
Hi @xenova, I've inherited the Llama modules wherever necessary and followed the modular structure closely. I also removed the deprecate_kwarg decorators. Please let me know if there are any other changes you'd like me to make.
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45101&sha=b384e1
Hi @xenova, we have made all the changes from our side. Could you please review and give us some feedback?
This PR adds support for the upcoming Nandi series models. We also appreciate the valuable feedback and thorough review provided by @vasqu and @ArthurZucker 🤗🙏

Co-authored-by: Vishesht27 <vishesht27@gmail.com>