Fix DeepSpeed mixed precision precedence over Accelerate defaults #39856

Merged: ArthurZucker merged 2 commits into huggingface:main from the fix-deepspeed-mixed-precision-precedence branch on Sep 11, 2025.
Conversation
Member: cc @SunMarc
Contributor: Hey, can you add some tests for this behaviour?
notkisk added a commit to notkisk/transformers that referenced this pull request on Aug 19, 2025.
Contributor (Author): cc @S1ro1
Commit 1: Fix DeepSpeed mixed precision precedence over Accelerate defaults

Resolves an issue where Accelerate would default to bf16 mixed precision when a DeepSpeed config specifies fp16, causing a ValueError. The fix ensures the DeepSpeed config takes precedence over TrainingArguments defaults while preserving explicit user settings.

Changes:
- Add an override_training_args_from_deepspeed() method to handle config precedence
- Reorder the mixed precision environment variable setting in TrainingArguments
- Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices

Fixes huggingface#39849
Commit 2: Add tests for DeepSpeed mixed precision precedence fix

- Add a TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
- Test the DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
- Test explicit user settings being preserved over the DeepSpeed config
- Test the precedence hierarchy: user settings > DeepSpeed config > defaults
- Replace the massive 934-line test bloat with a concise 50-line test suite
- Tests cover the core functionality of the PR huggingface#39856 mixed precision precedence fix
ArthurZucker (Collaborator) approved these changes on Sep 11, 2025:

LGTM. Thanks for the PR!
SunMarc reviewed on Sep 11, 2025.

Comment on lines +147 to +175:

```python
user_set_fp16 = args.fp16 is True
user_set_bf16 = args.bf16 is True

if self.is_true("fp16.enabled"):
    # DeepSpeed config explicitly enables fp16
    if not user_set_fp16 and not user_set_bf16:
        # User didn't explicitly set either, so apply DeepSpeed config
        args.fp16 = True
        args.bf16 = False
    elif user_set_bf16 and not user_set_fp16:
        # User explicitly chose bf16, but DeepSpeed config wants fp16
        # This is a potential conflict - let user choice win but log a warning
        pass  # Keep user's bf16=True, fp16=False
elif self.is_true("bf16.enabled"):
    # DeepSpeed config explicitly enables bf16
    if not user_set_fp16 and not user_set_bf16:
        # User didn't explicitly set either, so apply DeepSpeed config
        args.bf16 = True
        args.fp16 = False
    elif user_set_fp16 and not user_set_bf16:
        # User explicitly chose fp16, but DeepSpeed config wants bf16
        # This is a potential conflict - let user choice win but log a warning
        pass  # Keep user's fp16=True, bf16=False
elif self.is_false("fp16.enabled") and self.is_false("bf16.enabled"):
    # Both are explicitly disabled in DeepSpeed config
    if not user_set_fp16 and not user_set_bf16:
        # User didn't explicitly set either, so apply DeepSpeed config (fp32)
        args.fp16 = False
        args.bf16 = False
```
SunMarc (Member): I feel like this could have been simpler.
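For illustration, the same precedence rules can be collapsed considerably. Below is a minimal sketch of what a more compact version might look like, assuming the same `is_true`/`is_false` helpers on the config object; this is the reviewer's point restated as code, not code from the PR:

```python
def override_training_args_from_deepspeed(self, args):
    """Sketch: same precedence rules as the diff above, but flattened."""
    if args.fp16 or args.bf16:
        # An explicit user choice always wins; never overwrite it.
        return
    if self.is_true("fp16.enabled"):
        args.fp16, args.bf16 = True, False
    elif self.is_true("bf16.enabled"):
        args.fp16, args.bf16 = False, True
    elif self.is_false("fp16.enabled") and self.is_false("bf16.enabled"):
        # Both explicitly disabled in the DeepSpeed config: train in fp32.
        args.fp16 = args.bf16 = False
```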
Comment on lines +1436 to +1458:

```python
@require_deepspeed
class TestDeepSpeedMixedPrecisionPrecedence(TestCasePlus):
    """Test DeepSpeed mixed precision precedence over Accelerate defaults."""

    def setUp(self):
        super().setUp()
        unset_hf_deepspeed_config()

    def tearDown(self):
        super().tearDown()
        unset_hf_deepspeed_config()

    def test_deepspeed_fp16_overrides_defaults(self):
        """Test that DeepSpeed fp16 config overrides TrainingArguments defaults"""
        from transformers.integrations.deepspeed import HfTrainerDeepSpeedConfig

        args = TrainingArguments(output_dir="./test_output", fp16=False, bf16=False)
        ds_config = {"fp16": {"enabled": True}, "bf16": {"enabled": False}, "zero_optimization": {"stage": 2}}
        hf_ds_config = HfTrainerDeepSpeedConfig(ds_config)
        hf_ds_config.trainer_config_process(args)
        self.assertTrue(args.fp16)
        self.assertFalse(args.bf16)
```
SunMarc (Member): It would be nice to add a test that reproduces the initial error.
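A test along those lines might look like the sketch below. The exact reproduction is an assumption: the original ValueError was raised by Accelerate's mixed precision handling rather than by this method directly, so this sketch only checks the precondition that the flags resolve consistently with the DeepSpeed config when the user passes no explicit choice:

```python
def test_deepspeed_fp16_with_default_args(self):
    """Default TrainingArguments plus a DeepSpeed fp16 config should resolve
    to fp16, not to a conflicting bf16 default (the huggingface#39849 setup)."""
    from transformers.integrations.deepspeed import HfTrainerDeepSpeedConfig

    args = TrainingArguments(output_dir="./test_output")  # no fp16/bf16 passed
    ds_config = {"fp16": {"enabled": True}, "zero_optimization": {"stage": 2}}
    hf_ds_config = HfTrainerDeepSpeedConfig(ds_config)
    hf_ds_config.trainer_config_process(args)
    self.assertTrue(args.fp16)
    self.assertFalse(args.bf16)
```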
Member: Well, the issue is that we shouldn't overwrite the TrainingArguments defaults; hence I'm reverting this PR. The root cause of the original issue was that mixed_precision was somehow set to bf16.
vijayabhaskar-ev pushed a commit to vijayabhaskar-ev/transformers that referenced this pull request on Oct 2, 2025.
yuchenxie4645 pushed a commit to yuchenxie4645/transformers that referenced this pull request on Oct 4, 2025.
Summary

Fixes issue #39849, where Accelerate would default to bf16 mixed precision even when a DeepSpeed config specifies fp16, causing a ValueError. This PR ensures that the DeepSpeed config takes precedence over TrainingArguments defaults while preserving explicit user settings.
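For context, the failure mode corresponds to a setup roughly like the following (a minimal sketch; the exact config values are illustrative):

```python
from transformers import TrainingArguments

# The DeepSpeed config asks for fp16...
ds_config = {"fp16": {"enabled": True}, "zero_optimization": {"stage": 2}}

# ...but no fp16/bf16 flag is passed here, so before this fix Accelerate
# could still default to bf16 mixed precision and raise a ValueError.
args = TrainingArguments(output_dir="out", deepspeed=ds_config)
```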
Root Cause

The issue was caused by the initialization order in TrainingArguments.__post_init__(): the ACCELERATE_MIXED_PRECISION environment variable was set before the DeepSpeed config was processed, which prevented the DeepSpeed config from overriding Accelerate's defaults.
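Schematically, the corrected ordering looks like the sketch below. This is a simplified stand-in for the real __post_init__, with resolve_mixed_precision and hf_deepspeed_config as hypothetical names; only override_training_args_from_deepspeed and the env var come from the PR:

```python
import os

def resolve_mixed_precision(args, hf_deepspeed_config=None):
    # First let the DeepSpeed config adjust args.fp16 / args.bf16 ...
    if hf_deepspeed_config is not None:
        hf_deepspeed_config.override_training_args_from_deepspeed(args)
    # ... and only then derive the Accelerate env var from the final flags.
    mixed_precision = "fp16" if args.fp16 else ("bf16" if args.bf16 else "no")
    os.environ["ACCELERATE_MIXED_PRECISION"] = mixed_precision
```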
Changes Made

1. Added DeepSpeed config override logic
   - New override_training_args_from_deepspeed() method on the HfTrainerDeepSpeedConfig class.
   - It reads the DeepSpeed fp16/bf16 settings and overrides the TrainingArguments defaults accordingly.
2. Fixed the initialization order
   - The mixed precision environment variable setup in TrainingArguments.__post_init__() now occurs after DeepSpeed config processing.

Behavior

The fix enforces the following precedence hierarchy (a concrete example follows the list):
1. Explicit user settings (highest priority), e.g. fp16=True or bf16=True passed by the user.
2. DeepSpeed config settings, e.g. "fp16": {"enabled": true} or "bf16": {"enabled": true} in the config file.
3. TrainingArguments defaults (lowest priority).
Test Plan

- Verified that a DeepSpeed fp16 config overrides the default correctly.
- Verified that a DeepSpeed bf16 config overrides the default correctly.
- Rebased on main and verified the fix still works.
Files Modified

- src/transformers/integrations/deepspeed.py – added the override logic and the method call.
- src/transformers/training_args.py – reordered the mixed precision env var setup.
Branch Info

- Source branch: fix-deepspeed-mixed-precision-precedence (rebased on the latest main)
- Target branch: main