
Adding support for Nandi Models #45101

Open

HemanthSai7 wants to merge 9 commits into huggingface:main from HemanthSai7:nandi_v1

Conversation

@HemanthSai7 commented Mar 29, 2026

This PR adds support for the upcoming Nandi series of models. We also appreciate the valuable feedback and thorough review provided by @vasqu and @ArthurZucker 🤗🙏

Co-authored-by: Vishesht27 <vishesht27@gmail.com>
@HemanthSai7 (Author) commented Apr 1, 2026

We are the team at RTA AI Labs, a tiny but passionate startup dedicated to making high-performance language models more accessible through efficient architecture. Today, we are excited (and a little nervous!) to submit this PR to add support for Nandi, our custom "smol" model series.

As a very small team, we have poured our hearts, late nights, and limited resources into building Nandi. We believe that the future of AI belongs to efficient, edge-compatible models, and we’ve designed Nandi to punch significantly above its weight class in terms of reasoning and throughput.

Bringing Nandi to the Hugging Face ecosystem is a massive milestone for us. It is the "make or break" step for our upcoming release, as it will allow the community to easily fine-tune, deploy, and experiment with what we've built.

Why this PR matters

  • Architecture: combines factorized embeddings, grouped-query attention (GQA), RoPE, and cross-layer parameter sharing for high efficiency (see the sketch after this list).
  • Community Impact: Enables developers to build and deploy capable models without large-scale compute.
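
To make the architecture bullet concrete, here is a minimal, self-contained sketch of the factorized-embedding and layer-sharing ideas. Every name and shape below is an illustrative assumption for this description, not the actual Nandi implementation:

import torch
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    # Factorizes the vocab_size x hidden_size embedding matrix into
    # vocab_size x embed_dim and embed_dim x hidden_size pieces
    # (ALBERT-style), cutting parameters whenever embed_dim << hidden_size.
    def __init__(self, vocab_size: int, embed_dim: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.up_proj = nn.Linear(embed_dim, hidden_size, bias=False)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.up_proj(self.embed(input_ids))

class SharedLayerStack(nn.Module):
    # Cross-layer parameter sharing: one decoder layer's weights are reused
    # for several depth steps, trading parameter count for repeated compute.
    def __init__(self, layer: nn.Module, num_repeats: int):
        super().__init__()
        self.layer = layer
        self.num_repeats = num_repeats

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_repeats):
            hidden_states = self.layer(hidden_states)
        return hidden_states

GQA and RoPE would live inside the attention module itself, typically configured the same way as in the Llama family (e.g. a num_key_value_heads field smaller than num_attention_heads).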

A small note to the maintainers
We know how incredibly busy the transformers maintainers are, and we have the utmost respect for the work you do in keeping this ecosystem thriving. For a small lab like ours, getting this PR reviewed and merged is more than just a technical update; it is the foundation of our startup’s mission. We have done our best to follow the contribution guidelines strictly to make the review process as smooth as possible for you, and we are standing by to make any requested changes immediately.

@Rocketknight1 @vasqu @ArthurZucker @xenova @zucchini-nlp 🤗

@vasqu (Contributor) commented Apr 2, 2026

Heya, super excited to see this 👋

Just as a heads up, we have holidays + the torch conference, so reviews will be delayed for at least a week-ish. Not sure if I will be the one reviewing or someone else, but as a first step it would be best to fully utilize modular: https://huggingface.co/docs/transformers/v5.5.0/en/modular_transformers#implementing-a-modular-file

Appreciate the work, and don't hesitate to ping us 😄

@HemanthSai7 (Author):

Hi @vasqu, thanks for the heads up! I’ve updated the PR to fully utilise the modular transformer format as suggested in the documentation. Looking forward to your feedback whenever you're back.

@xenova (Contributor) commented Apr 14, 2026

Love seeing smaller models! Just an FYI before the main reviewers get to this... your usage of modular is not correct, as I see many classes which are mostly duplicates of existing implementations. Take a look at how a modular file like https://github.com/huggingface/transformers/blob/def5e6864fe4f2bbd7f056f37366f4dd0d693097/src/transformers/models/apertus/modular_apertus.py is laid out.

e.g.,

class ApertusRMSNorm(LlamaRMSNorm):
    pass


class ApertusRotaryEmbedding(LlamaRotaryEmbedding):
    pass

are valid usages of modular.
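
Applied to this PR, the analogous declarations would presumably be plain pass-through subclasses; this is a sketch inferred from the review comments below, not code taken from the PR itself:

class NandiRMSNorm(LlamaRMSNorm):
    pass


class NandiRotaryEmbedding(LlamaRotaryEmbedding):
    pass

The modular converter then expands these into full standalone classes in the generated modeling file, so no behavior is lost by deleting the duplicated code.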

config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
)

@deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
@xenova (Contributor) commented Apr 14, 2026:

no need to deprecate here. I don't think v5 should have these decorators anymore :)
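
Concretely, the suggested change is to drop the shim entirely. A hypothetical before/after of the forward signature (argument lists abbreviated):

# before (as in the PR): decorator remaps the old kwarg to the new name
@deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
def forward(self, hidden_states, past_key_values=None, **kwargs): ...

# after (suggested): accept only the new name, with no deprecation shim
def forward(self, hidden_states, past_key_values=None, **kwargs): ...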

return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"


class NandiRotaryEmbedding(nn.Module):
@xenova (Contributor):

identical to normal LlamaRotaryEmbedding afaict.

config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
)

@deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
@xenova (Contributor):

same

self.input_layernorm = NandiRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
self.post_attention_layernorm = NandiRMSNorm(config.hidden_size, eps=config.rms_norm_eps)

@deprecate_kwarg("past_key_value", new_name="past_key_values", version="4.58")
@xenova (Contributor):

same

@github-actions:

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, nandi

@HemanthSai7 (Author):

Hi @xenova, I’ve inherited the Llama modules wherever necessary and followed the modular structure closely. I also removed the deprecate_kwarg decorator. Please let me know if there are any other changes you’d like me to make.

@github-actions:

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=45101&sha=b384e1

@Vishesht27:

Hi @xenova, we have made all the changes from our side. Could you review and give us some feedback?
