
Which batch size to use with DataLoader #152

@g-karthik

I have a question about the minimal working example below, taken from the DeepSpeed README.

for step, batch in enumerate(data_loader):
    # forward() method
    loss = model_engine(batch)

    # runs backpropagation
    model_engine.backward(loss)

    # weight update
    model_engine.step()

Suppose deepspeed.initialize() is called solely to wrap my model, and my client script already creates a data_loader. What batch size should I use when instantiating data_loader so that the loop above works correctly?

If there was no DeepSpeed, I'd initialize my data_loader as follows in the client script:

    train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset) if args.distributed else None
    train_loader = DataLoader(train_dataset, sampler=train_sampler, batch_size=args.train_batch_size, shuffle=(not args.distributed))

where args.train_batch_size corresponds to DeepSpeed's train_micro_batch_size_per_gpu, i.e., the batch size each GPU processes on each step, independent of gradient accumulation steps.
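To make the relationship between these knobs concrete, here is a small numerical sketch. The values are illustrative assumptions; the product rule itself follows DeepSpeed's documented batch-size semantics, where the effective train_batch_size is the micro batch size times gradient accumulation steps times the number of data-parallel GPUs:

```python
# Illustrative values (assumptions, not from the original question):
train_micro_batch_size_per_gpu = 8   # samples each GPU sees per forward/backward pass
gradient_accumulation_steps = 4      # micro-steps accumulated before each optimizer step
world_size = 2                       # number of data-parallel GPUs

# DeepSpeed's effective train_batch_size is the product of all three:
train_batch_size = (train_micro_batch_size_per_gpu
                    * gradient_accumulation_steps
                    * world_size)
print(train_batch_size)  # 64
```

Under this reading, the DataLoader in the client script would be built with the micro batch size (8 here), since that is what each GPU consumes per step.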

Should I continue to use the micro batch size when instantiating data_loader in the client script? The documentation is not clear on this point.
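For reference, a minimal DeepSpeed config fragment with the three batch-size keys set consistently might look like the following. The values are illustrative (they assume 2 GPUs, since DeepSpeed requires train_batch_size to equal train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs):

```json
{
  "train_batch_size": 64,
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 4
}
```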
