fix: prevent accelerate from splitting vision encoder by setting _no_…#43047
ArthurZucker merged 3 commits into huggingface:main
Can you fix the consistency (`python utils/modular_model_converter.py`)?
…dular files and regenerate

- Update modular_pe_audio_video.py and modular_pe_video.py (source of truth)
- Regenerate modeling_pe_audio_video.py and modeling_pe_video.py via modular_model_converter.py
- Remove @unittest.skip on test_model_parallelism now that the crash is resolved

Fixes huggingface#42918
@ArthurZucker, sorry for the delay. Fixed and updated.
…_audio, pe_video, pe_audio_video
[For maintainers] Suggested jobs to run (before merge)

run-slow: pe_audio, pe_audio_video, pe_video
huggingface#43047)

* fix: add TimmWrapperForImageClassification to _no_split_modules in modular files and regenerate

  - Update modular_pe_audio_video.py and modular_pe_video.py (source of truth)
  - Regenerate modeling_pe_audio_video.py and modeling_pe_video.py via modular_model_converter.py
  - Remove @unittest.skip on test_model_parallelism now that the crash is resolved

  Fixes huggingface#42918

* fix: add TimmWrapperForImageClassification to _no_split_modules in pe_audio, pe_video, pe_audio_video

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This PR resolves the model parallelism crash in the `PeVideo` and `PeAudioVideo` models by adding `_no_split_modules = ["TimmWrapperForImageClassification"]` to their model classes. Currently, `accelerate` naively splits the `timm`-based vision encoder layer by layer across devices, breaking internal residual connections and causing a `RuntimeError` during distributed training. Explicitly marking the wrapper as a non-splittable module keeps the vision encoder atomic on a single device, restoring stability for FSDP and model parallelism workflows, as verified by the now-passing `test_model_parallelism` unit tests.

Fixes #42918
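To illustrate why marking the wrapper as non-splittable matters, here is a toy, pure-Python sketch of a greedy device placement. It is not `accelerate`'s actual algorithm: the `Module` class, the `assign_devices` helper, the sizes, and the capacity are all hypothetical, chosen only to show how a no-split list changes the outcome.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical stand-in for a model submodule (not a real torch class)."""
    name: str
    cls: str                      # class name, matched against the no-split list
    size: int = 0                 # own parameter size, arbitrary units
    children: list["Module"] = field(default_factory=list)

    def total_size(self) -> int:
        return self.size + sum(c.total_size() for c in self.children)


def assign_devices(root, capacity, no_split=()):
    """Toy greedy placement: fill device 0, then device 1, and so on.

    A module whose class name is in `no_split` (or that has no children)
    is placed as one unit; any other container is descended into, so its
    children may end up on different devices.
    """
    placement = {}
    state = {"device": 0, "used": 0}

    def place(mod):
        size = mod.total_size()
        # Move to the next device if this unit does not fit on the current one.
        if state["used"] > 0 and state["used"] + size > capacity:
            state["device"] += 1
            state["used"] = 0
        state["used"] += size
        placement[mod.name] = state["device"]

    def visit(mod):
        if mod.cls in no_split or not mod.children:
            place(mod)
        else:
            for child in mod.children:
                visit(child)

    visit(root)
    return placement


# Hypothetical model layout: an embedding, a three-block vision encoder, a head.
encoder = Module(
    "vision_encoder",
    "TimmWrapperForImageClassification",
    children=[Module(f"vision_encoder.block{i}", "Block", size=4) for i in range(3)],
)
model = Module(
    "model",
    "PeVideoModel",
    children=[Module("embed", "Embedding", size=4), encoder, Module("head", "Linear", size=2)],
)

# Without a no-split list, the encoder's blocks straddle devices 0 and 1.
split = assign_devices(model, capacity=14)

# With the wrapper marked as non-splittable, the encoder is placed as one unit.
atomic = assign_devices(
    model, capacity=14, no_split=["TimmWrapperForImageClassification"]
)
```

In this sketch, `split` places `vision_encoder.block0` and `vision_encoder.block2` on different devices, while `atomic` contains a single `vision_encoder` entry, which mirrors the intent of the `_no_split_modules` change described above.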