Apply GradientCheckpointingLayer to the whole repo #38913
Merged: Cyrilvallez merged 148 commits into huggingface:main from qubvel:gradient-checkpointing-layer-propagation on Jun 23, 2025
Commits
a97ca9f
first batch (4)
qubvel 2627646
align
qubvel c8926f7
altclip
qubvel ae1b29a
beit
qubvel 4dff076
bert
qubvel 0d0d8c7
yolos
qubvel 8b66428
dino, pvt_v2
qubvel 0d387eb
bark, bart, bert_generation
qubvel 6faee3f
big_bird, biogpt
qubvel 3f34606
blnderbot, bloom
qubvel 3bb70d9
bridgetower
qubvel 5757f3e
camambert, canine, chameleon
qubvel c59a7d5
chinese clip, clap, clip
qubvel d7cb795
codegen, conditional detr, convbert
qubvel 39784f7
dab_detr, data2vec
qubvel 203348d
dbrx, deberta
qubvel b2719f3
deberta, decicion_tranformer, deformable_detr
qubvel 2ed2c5b
deit, deta, mctct
qubvel 87704a7
detr, dinov2, distilbert
qubvel cd69033
donut, dpt, electra
qubvel 9a54ad1
ernie, esm, falcon
qubvel 6855515
flava, fnet, falcon_mamba
qubvel f4f8319
focalnet, git, gpt2
qubvel b8f4ecf
gpt - bigcode, neo, neox
qubvel d844b12
gptj, groupvit
qubvel 700d20d
idefics2, idefics3
qubvel 0b3ffba
ijepa, imagegpt, internvl
qubvel 9ed27ef
jetmoe, kosmos2, layoutlm
qubvel 6d3ecbc
layoutlm2-3, led
qubvel e398d8e
lilt, longformer, longt5, luke
qubvel 4363156
m2m, mamba1-2
qubvel dde58de
marian, markuplm, mask2former
qubvel 69b2cf8
maskformer
qubvel d4ccb79
mbart, megatron_bert, mimi
qubvel ab213da
mixtral, mlcd
qubvel cb90916
mobilevit1-2, modernbert
qubvel c2d3cbc
moshi, mpt, mra
qubvel 80bcd7c
mt5, musicgen
qubvel 825e2b1
mvp, nemotron
qubvel 8f6a8fb
nllb_moe
qubvel 6253d78
nystromformer, omdet_turbo
qubvel ab136ef
opt, owlvit, owlv2
qubvel 3fb64a9
pegasus, pegasus_x, presimmon
qubvel 32b2876
phimoe, pix2struct, pixtral
qubvel 942f7a4
plbart, pop2piano, prophetnet
qubvel b083c86
qwen2*
qubvel 429ba11
qwen2, qwen3 moe, rec gemma
qubvel cec0d32
rembert
qubvel bec1fcd
roberta
qubvel 254882f
roberta prelayernorm
qubvel a1a7fda
roc_bert, roformer, rwkv
qubvel d497df9
sam, sam_hq
qubvel 987a880
seggpt, smolvlm, speech_to_text
qubvel 6ef90e1
splinter, stablelm, swin
qubvel 1b7cc3f
swin2sr, switch_transformer, t5, table_transformer
qubvel 5331bc2
tapas, time_series_tranformer, timesformer
qubvel dfe3d8d
trocr, tvp, umt5
qubvel c001253
videomae, vilt, visual_bert
qubvel 76dd7a5
vit, vit_mae, vit_msn
qubvel 0bc5335
vitpose_backbone, vits, vivit
qubvel 43992d9
whisper. x_clip, xglm
qubvel 461961b
xlm_roberta, xmod
qubvel cf470fd
yoso
qubvel 626dde0
zamba
qubvel 59f8879
vitdet, wav2vec2, wav2vec2_bert
qubvel b89a5db
unispeech, wav2vec_conformer
qubvel db524cb
wavlm
qubvel 96db85e
speecht5
qubvel 279041b
swinv2
qubvel 5a3b571
sew / _d
qubvel b1d78cd
seamless_mt4 / _v2
qubvel 9a6d135
deprecated models update
qubvel a18e257
bros
qubvel 66d0a62
gemma2, gemma3
qubvel c0e5690
got, hiera, hubert, llama4, mllama, oneformer, phi, olmoe, informer
qubvel 0942755
fixup
qubvel fe80395
Add use_cache=False and past_key_value=None to GradientCheckpointing…
qubvel d7963dc
fixup
qubvel 73d5614
fix prophetnet
qubvel cd7a426
fix bigbird_pegasus
qubvel 56cb34b
fix blenderbot
qubvel 0347dde
fix mbart
qubvel e83086d
fix mvp
qubvel afbfd62
fix zamba2
qubvel 68f317c
fix bart
qubvel 98fb670
fix blenderbot_small
qubvel 2fd38a3
fix codegen
qubvel 5347767
Update gradient checkpointing layer to support more past_key_values a…
qubvel 10f5fd1
fix data2vec vision
qubvel 792cd7d
fix deformable_detr
qubvel 36415c3
fix gptj
qubvel ff802fe
fix led
qubvel fc14014
fix m2m_100
qubvel f2cc865
add comment
qubvel eab402d
fix nnlb_moe
qubvel aa1f574
Fix pegasus_x
qubvel 7c9d17d
fix plbart
qubvel 5da2216
udop
qubvel 999584c
fix-copies: beit, wav2vec2
qubvel ff33682
fix gpt_bigcode
qubvel c28e913
fixup
qubvel 8104bfb
fix t5
qubvel f9a2db8
fix switch_transformers
qubvel fe1133e
fix longt5
qubvel e51772a
fix mt5
qubvel 69a6a78
update tapas
qubvel eb20826
fix blip2
qubvel eba9a9a
update blip
qubvel aa71309
fix musicgen
qubvel c0b3084
fix gpt2, trocr
qubvel b6ac147
fix copies
qubvel 481ae6f
!!! Revert zamba, mllama
qubvel 07e0995
update autoformer
qubvel a2c8bd6
update bros
qubvel 3160753
update args / kwargs for BERT and copies
qubvel 1f3d7b0
2nd round of updates
qubvel 32433aa
update conditional detr
qubvel 95365b1
Pass encoder_hidden_states as positional arg
qubvel aad2b9e
Update to pass encoder_decoder_position_bias as positional arg
qubvel d34726d
fixup
qubvel 55011be
biogpt modular
qubvel a71d201
modular gemma2
qubvel 0d72857
modular gemma3
qubvel 522df43
modular gpt_neox
qubvel 30a9a90
modular informer
qubvel 89a2c68
modular internvl
qubvel 633378c
modular mixtral
qubvel 8493bad
modular mlcd
qubvel adf5c60
modular modernbert
qubvel 6270ff7
modular phi
qubvel 3ad1fa9
modular qwen2_5_omni
qubvel 7626b31
modular qwen2_5_vl
qubvel e3c61ce
modular sam_hq
qubvel 01934cf
modular sew
qubvel 62683dc
wav2vec2_bert
qubvel 28bd09c
modular wav2vec2_conformer
qubvel 5bc6525
modular wavlm
qubvel 31dbec4
fixup
qubvel b989ba6
Update by modular instructblipvideo
qubvel cdb4c70
modular data2vec_audio
qubvel d50dd86
nit modular mistral
qubvel 4c5aa0b
apply modular minimax
qubvel 6585288
fix modular moonshine
qubvel 4ac7c96
revert zamba2
qubvel 58847e7
fix mask2former
qubvel 2e4e2b1
Merge branch 'main' into gradient-checkpointing-layer-propagation
qubvel 9b8e965
refactor idefics
qubvel 8a8898c
Merge branch 'main' into gradient-checkpointing-layer-propagation
qubvel
update for GradientCheckpointingLayer