
Add heterogeneous config support (per-layer configuration)#45333

Open
eladsegal wants to merge 4 commits intohuggingface:mainfrom
eladsegal:heterogeneous-config

Conversation

@eladsegal
Contributor

What does this PR do?

Adds heterogeneous model support - the ability for individual layers to differ from the global config (e.g., different intermediate_size, num_key_value_heads) and to skip sub-modules entirely (MLP, attention, etc.). This enables models where layers are not uniform, as in pruned, distilled, or NAS-derived architectures.

Examples of such models:

Model                                        Derived from
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5     meta-llama/Llama-3.3-70B-Instruct
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1      meta-llama/Llama-3.1-405B-Instruct
nvidia/gpt-oss-puzzle-88B                    openai/gpt-oss-120b

These models previously required trust_remote_code=True to modify the classes of the model they are derived from. With this PR (and the follow-up modeling PR #45332), heterogeneous support requires just a few lines.

This PR contains the configuration layer only. The modeling, cache, and masking changes that consume this config are in #45332.

How it works

Configuration (per_layer_config)

A new per_layer_config parameter on PreTrainedConfig maps layer indices to attribute overrides:

from transformers import LlamaConfig

config = LlamaConfig(
    ...,  # other global arguments
    per_layer_config={
        0: {"intermediate_size": 64},                            # layer 0: smaller MLP
        2: {"intermediate_size": 96, "skip_attention": True},    # layer 2: smaller MLP, attention skipped
    },
)

Under the hood, apply_heterogeneous_config validates the overrides, computes fallback values, and stores a HeterogeneitySpec on the config. When all layers agree on an attribute value, it is promoted back to a global attribute for clarity. The config supports full save_pretrained / from_pretrained round-trips (keys are zero-padded for correct JSON sort order).
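
For example, a save/load round-trip might look like this (an illustrative sketch; the on-disk key format is handled internally, and the directory name is just a placeholder):

from transformers import LlamaConfig

config = LlamaConfig(per_layer_config={0: {"intermediate_size": 64}})
config.save_pretrained("heterogeneous-llama")       # per-layer keys are serialized (zero-padded)

reloaded = LlamaConfig.from_pretrained("heterogeneous-llama")
# The per-layer overrides survive the round-trip (illustrative assertion):
assert reloaded.per_layer_config == config.per_layer_config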

Accessing a per-layer attribute on the global config raises AttributeError with a helpful message directing to config.get_full_layer_config(i).
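
For example (an illustrative sketch; it assumes get_full_layer_config returns a per-layer view with attribute access, which may differ slightly from the final API):

# Layer 0 has an override, layer 1 falls back to the global value:
layer0 = config.get_full_layer_config(0)
layer1 = config.get_full_layer_config(1)
print(layer0.intermediate_size)   # 64 (per-layer override)
print(layer1.intermediate_size)   # global fallback value

# Reading a per-layer attribute on the global config fails loudly:
try:
    config.intermediate_size
except AttributeError as e:
    print(e)   # message points to config.get_full_layer_config(i)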

Key changes

  • New src/transformers/heterogeneity/ package - configuration utilities (LayerConfig, HeterogeneitySpec, validation, serialization)
  • configuration_utils.py - per_layer_config property, is_heterogeneous, get_full_layer_config(), serialization hooks, and __getattribute__ guard for per-layer attributes

Tests

A test suite in tests/heterogeneity/test_configuration_utils.py covers per-layer overrides and fallback, uniform-value promotion, validation, and save/load round-trips.

Who can review?

@ArthurZucker
@hmellor

@eladsegal
Contributor Author

@askliar
Related to vllm-project/vllm#36512

Collaborator

@ArthurZucker left a comment


Reviewed this first: #45332 (review)!

I think we want stuff to be explicit and "simple".

We basically have 2 choices:

  1. per_layer_config: List[PreTrainedConfig] - per-layer configs would be quite simple; we find a way to have a minimal serialization that only serializes the actual per-layer overrides. I like that a bit less, but it's the most convenient for modeling changes / inheritance.

  2. per_layer_hidden_size: List[int] - with this, PreTrainedConfig can just init the per-layer configs based on the lists. This gives easy-to-understand serialization, and you just parse each per_layer_<key> into Config(<key> = value) (see the sketch below).
WDYT?
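
A rough sketch of option 2 (illustrative only; the per_layer_<key> fields and the parsing helper are hypothetical, not part of this PR):

from transformers import LlamaConfig

# Hypothetical serialized form for option 2: one flat list per overridden attribute.
raw = {
    "num_hidden_layers": 3,
    "hidden_size": 128,
    "per_layer_intermediate_size": [64, 256, 96],   # one entry per layer
}

def build_layer_configs(raw):
    # Illustrative helper: parse per_layer_<key> lists into one config per layer.
    base = {k: v for k, v in raw.items() if not k.startswith("per_layer_")}
    configs = []
    for i in range(raw["num_hidden_layers"]):
        overrides = {
            k.removeprefix("per_layer_"): v[i]
            for k, v in raw.items()
            if k.startswith("per_layer_")
        }
        configs.append(LlamaConfig(**base, **overrides))
    return configs

per_layer = build_layer_configs(raw)
print(per_layer[0].intermediate_size)   # 64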

