
Support for BharatGen's Param2MoE model architecture #43888

Open

bhargav-patel-29 wants to merge 10 commits into huggingface:main from Bharatgen-Tech:main

Conversation


@bhargav-patel-29 commented Feb 10, 2026

What does this PR do?

This PR adds support for Param-2-17B-MoE-A2.4B, a large-scale Mixture-of-Experts (MoE) causal language model.

Param-2-17B-MoE-A2.4B uses a Hybrid Dense + MoE architecture with 17B total parameters while activating only 2.4B parameters per token, combining high model capacity with low per-token inference cost.

The model is pretrained from scratch with strong multilingual capabilities and particular emphasis on linguistic diversity and Indian language representation. It is released as a pretrained base model intended for downstream fine-tuning.
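As context for reviewers, here is a minimal sketch of how a top-k MoE feed-forward block keeps the active parameter count low: a learned router picks a small number of experts per token, so only those experts' weights participate in each forward pass. All class names, sizes, and the specific routing shown here are illustrative assumptions, not code from this PR.

```python
import torch
import torch.nn as nn


class ToyTopKMoE(nn.Module):
    """Illustrative top-k MoE feed-forward block; not the Param2MoE implementation."""

    def __init__(self, hidden_size=64, intermediate_size=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size) -> flatten to a list of tokens
        tokens = hidden_states.reshape(-1, hidden_states.shape[-1])
        router_probs = self.router(tokens).softmax(dim=-1)
        weights, selected = torch.topk(router_probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        output = torch.zeros_like(tokens)
        # Each token is processed only by its top-k experts, so only a fraction
        # of the expert parameters is active for any given token.
        for expert_idx, expert in enumerate(self.experts):
            token_idx, slot = torch.where(selected == expert_idx)
            if token_idx.numel() == 0:
                continue
            output[token_idx] += weights[token_idx, slot, None] * expert(tokens[token_idx])

        return output.reshape(hidden_states.shape)
```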

This PR introduces the following components:

  • configuration_param2moe.py

    • Implements Param2MoEConfig
    • Defines model hyperparameters including transformer architecture, MoE routing configuration, number of experts, hidden sizes, and other initialization parameters.
  • modeling_param2moe.py

    • Implements:
      • Param2MoEModel
      • Param2MoEForCausalLM
    • Includes hybrid transformer blocks with MoE routing and expert dispatch logic.
    • Supports causal language modeling and text generation workflows.
  • __init__.py

    • Registers the model configuration and modeling classes (see the registration sketch after this list) to enable loading via:
      • AutoConfig
      • AutoModel
      • AutoModelForCausalLM
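For reference, the registration described above typically looks like the snippet below when done through the public register() APIs. The "param2moe" model_type string, the import paths, and the use of explicit register() calls are assumptions; when a model is merged into the library itself, the same wiring usually goes through the auto mapping tables instead.

```python
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM

# Illustrative sketch only: paths and the "param2moe" model_type are assumed,
# and an in-library integration would normally use the auto mapping tables
# rather than explicit register() calls.
from .configuration_param2moe import Param2MoEConfig
from .modeling_param2moe import Param2MoEForCausalLM, Param2MoEModel

AutoConfig.register("param2moe", Param2MoEConfig)
AutoModel.register(Param2MoEConfig, Param2MoEModel)
AutoModelForCausalLM.register(Param2MoEConfig, Param2MoEForCausalLM)
```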

This integration enables seamless loading, inference, and downstream fine-tuning using standard Transformers APIs.
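For example, loading and generation should then work through the standard APIs; the checkpoint id below is a placeholder and may not match the final Hub repository name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; substitute the actual Hub repository once published.
model_id = "bharatgen/Param-2-17B-MoE-A2.4B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Param-2 is a Mixture-of-Experts model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```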

Fixes # (issue)


Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Text models:
@ArthurZucker
@Cyrilvallez

@Rocketknight1
Member

Hi @bhargav-patel-29, thank you for the PR! Before we review, can you convert the PR to modular form by adding your changes in a modular file instead of a modeling file and then running make fix-repo to autogenerate the rest? Take a look at other recent PRs like Jais2 to see the structure and the other pieces a model PR needs, like the auto mappings: https://github.com/huggingface/transformers/pull/42684/changes
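To illustrate the suggestion, a modular file is essentially a thin subclassing layer over an existing architecture, from which the full configuration and modeling files are regenerated. The sketch below guesses at Mixtral as the closest existing MoE base; the real modular file for this PR may inherit from different classes and will need actual overrides for the hybrid Dense + MoE layout.

```python
# Hypothetical sketch of src/transformers/models/param2moe/modular_param2moe.py.
# The Mixtral base classes are an assumption, not a statement about this PR.
from ..mixtral.configuration_mixtral import MixtralConfig
from ..mixtral.modeling_mixtral import MixtralForCausalLM, MixtralModel


class Param2MoEConfig(MixtralConfig):
    model_type = "param2moe"


class Param2MoEModel(MixtralModel):
    pass


class Param2MoEForCausalLM(MixtralForCausalLM):
    pass
```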

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, param2moe

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43888&sha=d46c70
