I have a question about the minimal working example below, taken from the DeepSpeed README.
```python
for step, batch in enumerate(data_loader):
    # forward() method
    loss = model_engine(batch)
    # runs backpropagation
    model_engine.backward(loss)
    # weight update
    model_engine.step()
```
Suppose that `deepspeed.initialize()` is called only to wrap my model, and my client script already creates a `data_loader`. What batch size should I use when instantiating my `data_loader` so that the loop above works correctly?
If there were no DeepSpeed, I'd initialize my `data_loader` as follows in the client script:
```python
import torch
from torch.utils.data import DataLoader

train_sampler = (
    torch.utils.data.distributed.DistributedSampler(train_dataset)
    if args.distributed
    else None
)
# Note: DataLoader requires shuffle=False whenever a sampler is supplied.
train_loader = DataLoader(
    train_dataset,
    sampler=train_sampler,
    batch_size=args.train_batch_size,
    shuffle=(not args.distributed),
)
```
where `args.train_batch_size` above corresponds to DeepSpeed's `train_micro_batch_size_per_gpu`, i.e., the batch size each GPU processes per step, independent of gradient accumulation steps.
Should I continue to use the micro batch size when instantiating my `data_loader` in the client script? The documentation is not clear on this.
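For context, my understanding is that DeepSpeed ties the three batch-size settings together as a product with the world size; a minimal sketch of that relationship (the concrete numbers are illustrative assumptions, not taken from my config):

```python
# Hedged sketch of the relationship DeepSpeed enforces among its
# batch-size config keys (illustrative values, 2 GPUs assumed):
train_micro_batch_size_per_gpu = 4  # batch seen by each GPU per forward pass
gradient_accumulation_steps = 4     # micro-steps accumulated before step()
world_size = 2                      # number of data-parallel GPUs

# Effective global batch per optimizer update:
train_batch_size = (
    train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
)
print(train_batch_size)  # 32
```

If that relationship holds, the micro batch size would be the natural choice for the `DataLoader`, since that is what each rank consumes per iteration of the loop, but I'd like confirmation.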