
Support for BharatGen's Param2MoE model architecture #43888

Open

bhargav-patel-29 wants to merge 10 commits into huggingface:main from Bharatgen-Tech:main

Conversation


@bhargav-patel-29 commented Feb 10, 2026

What does this PR do?

This PR adds support for Param-2-17B-MoE-A2.4B, a large-scale Mixture-of-Experts (MoE) causal language model.

Param-2-17B-MoE-A2.4B uses a Hybrid Dense + MoE architecture with 17B total parameters while activating only 2.4B parameters per token, combining high model capacity with low per-token inference cost.

The model is pretrained from scratch with strong multilingual capabilities and particular emphasis on linguistic diversity and Indian language representation. It is released as a pretrained base model intended for downstream fine-tuning.
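As context for reviewers, here is a minimal sketch of how a top-k MoE feed-forward block keeps the active parameter count low: a learned router picks a small number of experts per token, so only those experts' weights participate in each forward pass. All class names, sizes, and the specific routing shown here are illustrative assumptions, not code from this PR.

```python
import torch
import torch.nn as nn


class ToyTopKMoE(nn.Module):
    """Illustrative top-k MoE feed-forward block; not the Param2MoE implementation."""

    def __init__(self, hidden_size=64, intermediate_size=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size),
            )
            for _ in range(num_experts)
        )

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_size) -> flatten to a list of tokens
        tokens = hidden_states.reshape(-1, hidden_states.shape[-1])
        router_probs = self.router(tokens).softmax(dim=-1)
        weights, selected = torch.topk(router_probs, self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)

        output = torch.zeros_like(tokens)
        # Each token is processed only by its top-k experts, so only a fraction
        # of the expert parameters is active for any given token.
        for expert_idx, expert in enumerate(self.experts):
            token_idx, slot = torch.where(selected == expert_idx)
            if token_idx.numel() == 0:
                continue
            output[token_idx] += weights[token_idx, slot, None] * expert(tokens[token_idx])

        return output.reshape(hidden_states.shape)
```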

This PR introduces the following components:

  • configuration_param2moe.py

    • Implements Param2MoEConfig
    • Defines model hyperparameters including transformer architecture, MoE routing configuration, number of experts, hidden sizes, and other initialization parameters.
  • modeling_param2moe.py

    • Implements:
      • Param2MoEModel
      • Param2MoEForCausalLM
    • Includes hybrid transformer blocks with MoE routing and expert dispatch logic.
    • Supports causal language modeling and text generation workflows.
  • __init__.py

    • Registers the model configuration and modeling classes (see the registration sketch after this list) to enable loading via:
      • AutoConfig
      • AutoModel
      • AutoModelForCausalLM
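For reference, the registration described above typically looks like the snippet below when done through the public register() APIs. The "param2moe" model_type string, the import paths, and the use of explicit register() calls are assumptions; when a model is merged into the library itself, the same wiring usually goes through the auto mapping tables instead.

```python
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM

# Illustrative sketch only: paths and the "param2moe" model_type are assumed,
# and an in-library integration would normally use the auto mapping tables
# rather than explicit register() calls.
from .configuration_param2moe import Param2MoEConfig
from .modeling_param2moe import Param2MoEForCausalLM, Param2MoEModel

AutoConfig.register("param2moe", Param2MoEConfig)
AutoModel.register(Param2MoEConfig, Param2MoEModel)
AutoModelForCausalLM.register(Param2MoEConfig, Param2MoEForCausalLM)
```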

This integration enables seamless loading, inference, and downstream fine-tuning using standard Transformers APIs.
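For example, loading and generation should then work through the standard APIs; the checkpoint id below is a placeholder and may not match the final Hub repository name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint id; substitute the actual Hub repository once published.
model_id = "bharatgen/Param-2-17B-MoE-A2.4B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

inputs = tokenizer("Param-2 is a Mixture-of-Experts model that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```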

Fixes # (issue)


Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Text models:
@ArthurZucker
@Cyrilvallez

@Rocketknight1
Member

Hi @bhargav-patel-29, thank you for the PR! Before we review, can you convert the PR to modular form by adding your changes in a modular file instead of a modeling file and then running make fix-repo to autogenerate the rest? Take a look at other recent PRs like Jais2 to see the structure and the other pieces a model PR needs, like the auto mappings: https://github.com/huggingface/transformers/pull/42684/changes
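To illustrate the suggestion, a modular file is essentially a thin subclassing layer over an existing architecture, from which the full configuration and modeling files are regenerated. The sketch below guesses at Mixtral as the closest existing MoE base; the real modular file for this PR may inherit from different classes and will need actual overrides for the hybrid Dense + MoE layout.

```python
# Hypothetical sketch of src/transformers/models/param2moe/modular_param2moe.py.
# The Mixtral base classes are an assumption, not a statement about this PR.
from ..mixtral.configuration_mixtral import MixtralConfig
from ..mixtral.modeling_mixtral import MixtralForCausalLM, MixtralModel


class Param2MoEConfig(MixtralConfig):
    model_type = "param2moe"


class Param2MoEModel(MixtralModel):
    pass


class Param2MoEForCausalLM(MixtralForCausalLM):
    pass
```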

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, param2moe

@github-actions
Contributor

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=43888&sha=d46c70
