
better grad acc tests #45434

Merged
SunMarc merged 3 commits into main from update-gradient-acc-tests on Apr 15, 2026

Conversation

SunMarc (Member) commented Apr 14, 2026

What does this PR do?

This PR updates the grad acc tests so that we can better catch issues. In addition to the loss, we now also check the grad norm, and we remove some redundant tests. As promised @vasqu. Just read the new tests directly; don't bother checking the old ones.
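For context on what "check the grad norm in addition to the loss" means mechanically: the Trainer logs grad_norm alongside loss in trainer.state.log_history, so a test can assert on both from the same log entries. A minimal sketch of the idea, not the PR's actual test; RegressionModel and RegressionDataset are assumed to be the toy helpers from the Trainer test suite, and all arguments and tolerances here are illustrative:

from transformers import Trainer, TrainingArguments, set_seed

set_seed(42)  # same model init and data order across runs
args = TrainingArguments(
    output_dir="tmp_grad_acc",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    logging_steps=1,
    max_steps=4,
)
trainer = Trainer(model=RegressionModel(), args=args, train_dataset=RegressionDataset())
trainer.train()

# Each logged optimizer step carries both the loss and the gradient norm,
# so a consistency test can compare both against a no-accumulation baseline.
steps = [h for h in trainer.state.log_history if "grad_norm" in h]
losses = [h["loss"] for h in steps]
grad_norms = [h["grad_norm"] for h in steps]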

@SunMarc SunMarc requested a review from vasqu April 14, 2026 15:17
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

vasqu (Contributor) left a comment


Nice, thank you. Just thinking whether we should have a 4th case with a proper check against different gas × bsz combinations per step (4:2 vs 8:1, which is tested for both).


def test_gradient_accumulation_grad_norm_consistency(self):
    trainer = Trainer(model=RegressionModel(), args=args, train_dataset=RegressionDataset())
    self.assertEqual(trainer.accelerator.gradient_accumulation_steps, 1)
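One way vasqu's suggested 4th case could look, as a hedged sketch rather than the PR's code: parametrize over (per-device batch size, gradient accumulation steps) pairs that share the same effective batch size and assert that the logged losses and grad norms agree pairwise. The method name, tmp-dir helper, and tolerances below are assumptions:

def test_gradient_accumulation_pairs_consistency(self):
    # Hypothetical 4th case: all configs with the same effective batch size
    # (bsz x gas = 8) should produce matching losses and grad norms.
    results = {}
    for bsz, gas in [(8, 1), (4, 2), (2, 4)]:
        set_seed(42)  # identical init and data order for every run
        args = TrainingArguments(
            output_dir=self.get_auto_remove_tmp_dir(),  # TestCasePlus helper
            per_device_train_batch_size=bsz,
            gradient_accumulation_steps=gas,
            logging_steps=1,
            max_steps=4,
        )
        trainer = Trainer(model=RegressionModel(), args=args, train_dataset=RegressionDataset())
        trainer.train()
        results[(bsz, gas)] = [
            (h["loss"], h["grad_norm"]) for h in trainer.state.log_history if "grad_norm" in h
        ]
    baseline = results[(8, 1)]
    for history in results.values():
        for (loss_a, norm_a), (loss_b, norm_b) in zip(baseline, history):
            self.assertAlmostEqual(loss_a, loss_b, delta=0.1)  # illustrative tolerance
            self.assertAlmostEqual(norm_a, norm_b, delta=0.1)

With a fixed seed, the sampler permutation depends only on the dataset length, so every config consumes the same samples per optimizer step and the runs are directly comparable.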
vasqu (Contributor)

We can add a reference to the issue imo

SunMarc (Member, Author)

done

Comment on lines +251 to +252
loss_tolerance=0.1,
model_accepts_loss_kwargs=False,
vasqu (Contributor)

Would it make sense to use a tighter tolerance when we accept loss kwargs, as the last test case?

SunMarc (Member, Author)

We do have a tighter tolerance here already: test_gradient_accumulation_grad_norm_with_num_items_in_batch
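For readers wondering why the num_items_in_batch variant supports a tighter tolerance (my reading, not stated in the thread): without it, each micro-batch loss is averaged independently and the per-step loss becomes a mean of means, which only equals the true mean when every micro-batch contributes the same number of items. A small illustration of the discrepancy:

import torch

# Two micro-batches of uneven size within one accumulation window.
micro_losses = [torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0])]

mean_of_means = torch.stack([x.mean() for x in micro_losses]).mean()  # (2.0 + 4.0) / 2 = 3.0
num_items = sum(x.numel() for x in micro_losses)
true_mean = torch.cat(micro_losses).sum() / num_items  # 10.0 / 4 = 2.5

print(mean_of_means.item(), true_mean.item())  # 3.0 vs 2.5

When the model accepts num_items_in_batch, the loss is normalized by the total item count across the whole accumulation window instead, so the accumulated result matches the large-batch one up to floating-point noise, which is what justifies the tighter tolerance.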

@SunMarc SunMarc enabled auto-merge April 15, 2026 13:04
@SunMarc SunMarc added this pull request to the merge queue Apr 15, 2026
Merged via the queue into main with commit 075218d Apr 15, 2026
18 checks passed
@SunMarc SunMarc deleted the update-gradient-acc-tests branch April 15, 2026 13:10