
Fix train_batch_size and eval_batch_size to respect split_batches config#45694

Open
MinuriRajapakse wants to merge 1 commit into huggingface:main from MinuriRajapakse:main

Conversation

@MinuriRajapakse

Fixes #45693

Problem

When split_batches=True is set in accelerator_config, the
train_batch_size and eval_batch_size properties still multiplied
per_device_batch_size by n_gpu, which is incorrect.

When split_batches=True, the batch is split across devices rather
than replicated, so the total batch size equals per_device_batch_size
directly.
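The difference between the two semantics can be shown with a toy calculation (the numbers below are assumed for illustration, not taken from the PR):

```python
# Toy illustration of the two batching semantics; the values here are
# assumed for the example only.
per_device_batch_size = 8
n_gpu = 4

# split_batches=False: each device loads its own per-device-sized batch,
# so the effective total batch size is per-device * number of devices.
total_when_replicated = per_device_batch_size * n_gpu  # 32

# split_batches=True: one batch is split across the devices,
# so the effective total batch size is just the per-device value.
total_when_split = per_device_batch_size  # 8

print(total_when_replicated, total_when_split)
```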

Fix

Added a check for split_batches in both train_batch_size and
eval_batch_size properties in TrainingArguments.
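A minimal sketch of what such a check might look like. The class below is a simplified stand-in that mirrors the names in the PR description (per_device_train_batch_size, n_gpu, split_batches), not the actual transformers implementation:

```python
# Simplified stand-in for TrainingArguments, sketching the described fix;
# the real transformers class has many more fields and properties.
class TrainingArgumentsSketch:
    def __init__(self, per_device_train_batch_size, n_gpu, split_batches=False):
        self.per_device_train_batch_size = per_device_train_batch_size
        self.n_gpu = n_gpu
        self.split_batches = split_batches  # would come from accelerator_config

    @property
    def train_batch_size(self):
        # New check: when the batch is split across devices, the total
        # batch size is the per-device value itself.
        if self.split_batches:
            return self.per_device_train_batch_size
        # Previous behaviour, still correct for replicated batches.
        return self.per_device_train_batch_size * max(1, self.n_gpu)


args = TrainingArgumentsSketch(per_device_train_batch_size=8, n_gpu=4,
                               split_batches=True)
print(args.train_batch_size)  # 8 rather than 32
```

The eval_batch_size property would get the same guard, using per_device_eval_batch_size.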

Testing

  • Added a new test, test_batch_size_respects_split_batches
  • All 26 existing tests plus the new one pass

@Rocketknight1
Member

cc @SunMarc



Development

Successfully merging this pull request may close issue #45693: "Why the calculation of train_batch_size unrelated to split_batches"
