Add heterogeneous config support (per-layer configuration) #45333
Open · eladsegal wants to merge 4 commits into huggingface:main
Conversation
ArthurZucker (Collaborator) left a comment:
Reviewed this first: #45332 (review)!

I think we want stuff to be explicit and "simple". We basically have 2 choices:

- `per_layer_config: List[PreTrainedConfig]`
- `per_layer_hidden_size: List[int]`

Per-layer configs would be quite simple; we'd find a way to have a minimal serialization that only serializes the actual per-layer changes. I like that a bit less, but it's the most convenient for modeling changes / inheritance.

With the lists, we can have `PreTrainedConfig` just init the per-layer configs based on them. This means serialization that is easy to understand, and you just parse each `per_layer_<key>` into `Config(<key> = value)`.

WDYT?
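A hedged sketch of the two shapes discussed above; the values are illustrative, not the API the PR ends up adopting:

```python
# Option 1: one (partial) config per layer. Convenient for modeling code and
# inheritance, but naively serializes a config object for every layer unless
# only the per-layer diffs are written out.
per_layer_config = [
    {"hidden_size": 4096, "num_key_value_heads": 8},  # layer 0
    {"hidden_size": 2048, "num_key_value_heads": 2},  # layer 1
]

# Option 2: one flat list per overridden attribute. PreTrainedConfig would
# parse each per_layer_<key> list into Config(<key> = value) for layer i.
per_layer_hidden_size = [4096, 2048]
per_layer_num_key_value_heads = [8, 2]
```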
What does this PR do?
Adds heterogeneous model support: the ability for individual layers to differ from the global config (e.g., different `intermediate_size`, `num_key_value_heads`) and to skip sub-modules entirely (MLP, attention, etc.). This enables models where layers are not uniform, as in pruned, distilled, or NAS-derived architectures. Examples of such models:

These models previously required `trust_remote_code=True` to modify the classes of the model they are derived from. With this PR (and the follow-up modeling PR #45332), heterogeneous support requires just a few lines.

This PR contains the configuration layer only. The modeling, cache, and masking changes that consume this config are in #45332.
How it works
Configuration (`per_layer_config`)

A new `per_layer_config` parameter on `PreTrainedConfig` maps layer indices to attribute overrides:
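A minimal usage sketch, assuming the dict-of-dicts shape described above; the config class, attribute values, and the shape of the returned layer view are illustrative, not taken from the PR itself:

```python
# Hedged sketch: exercises the per_layer_config parameter as described in
# this PR; it assumes the PR branch, not a released transformers version.
from transformers import LlamaConfig

config = LlamaConfig(
    num_hidden_layers=4,
    intermediate_size=11008,
    num_key_value_heads=8,
    per_layer_config={
        # layer index -> attribute overrides; unlisted layers keep the globals
        1: {"intermediate_size": 4096},
        3: {"intermediate_size": 4096, "num_key_value_heads": 2},
    },
)

# Per-layer attributes are read through the resolved layer view (assumed here
# to be config-like with attribute access):
layer_cfg = config.get_full_layer_config(3)
assert layer_cfg.num_key_value_heads == 2

# Reading an overridden attribute on the global config instead raises
# AttributeError with a message pointing to get_full_layer_config(i).
```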
Under the hood, `apply_heterogeneous_config` validates the overrides, computes fallback values, and stores a `HeterogeneitySpec` on the config. When all layers agree on an attribute value, it is promoted back to a global attribute for clarity. The config supports full `save_pretrained`/`from_pretrained` round-trips (keys are zero-padded for correct JSON sort order).

Accessing a per-layer attribute on the global config raises `AttributeError` with a helpful message directing to `config.get_full_layer_config(i)`.
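The zero-padding detail can be seen in isolation; a small standalone sketch of why padded keys keep layer indices in numeric order under JSON's lexicographic key sorting (the pad width of 3 is an assumption):

```python
import json

overrides = {2: {"intermediate_size": 1024}, 10: {"intermediate_size": 512}}

# Plain string keys sort lexicographically, so "10" lands before "2".
print(json.dumps({str(i): v for i, v in overrides.items()}, sort_keys=True))

# Zero-padded keys sort lexicographically *and* numerically.
print(json.dumps({f"{i:03d}": v for i, v in overrides.items()}, sort_keys=True))
```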
Key changes

- `src/transformers/heterogeneity/` package - configuration utilities (`LayerConfig`, `HeterogeneitySpec`, validation, serialization)
- `configuration_utils.py` - `per_layer_config` property, `is_heterogeneous`, `get_full_layer_config()`, serialization hooks, and a `__getattribute__` guard for per-layer attributes

Tests

Test suite in `tests/heterogeneity/test_configuration_utils.py` covering per-layer overrides and fallback, uniform value promotion, validation, and save/load round-trips.
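As a flavor of what the suite covers, a hedged sketch of a uniform-promotion check; the test name and config class are illustrative, not copied from the PR's tests:

```python
from transformers import LlamaConfig

def test_uniform_override_is_promoted_to_global():
    # Every layer overrides the attribute to the same value, so per the PR
    # description the value is promoted back to a plain global attribute.
    config = LlamaConfig(
        num_hidden_layers=2,
        per_layer_config={0: {"intermediate_size": 2048},
                          1: {"intermediate_size": 2048}},
    )
    assert config.intermediate_size == 2048
```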
Who can review?

@ArthurZucker
@hmellor