I have a question about the minimal working example below, taken from the DeepSpeed README.
```python
for step, batch in enumerate(data_loader):
    # forward() method
    loss = model_engine(batch)
    # runs backpropagation
    model_engine.backward(loss)
    # weight update
    model_engine.step()
```
Suppose that `deepspeed.initialize()` is called only to wrap my model, and my client script already creates a `data_loader`. What batch size should I use when instantiating my `data_loader` so that the loop above works correctly?
If there were no DeepSpeed, I'd initialize my `data_loader` as follows in the client script:
```python
import torch
from torch.utils.data import DataLoader

train_sampler = (
    torch.utils.data.distributed.DistributedSampler(train_dataset)
    if args.distributed
    else None
)
# Note: DataLoader requires shuffle=False whenever a sampler is supplied.
train_loader = DataLoader(
    train_dataset,
    sampler=train_sampler,
    batch_size=args.train_batch_size,
    shuffle=(not args.distributed),
)
```
where `args.train_batch_size` above corresponds to DeepSpeed's `train_micro_batch_size_per_gpu`, i.e., the batch size each GPU processes per step, independent of gradient accumulation steps.
Should I continue to use the micro batch size when instantiating my `data_loader` in the client script? The documentation is not clear on this.
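For context, my understanding is that DeepSpeed ties the three batch-size settings together as a product with the world size; a minimal sketch of that relationship (the concrete numbers are illustrative assumptions, not taken from my config):

```python
# Hedged sketch of the relationship DeepSpeed enforces among its
# batch-size config keys (illustrative values, 2 GPUs assumed):
train_micro_batch_size_per_gpu = 4  # batch seen by each GPU per forward pass
gradient_accumulation_steps = 4     # micro-steps accumulated before step()
world_size = 2                      # number of data-parallel GPUs

# Effective global batch per optimizer update:
train_batch_size = (
    train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size
)
print(train_batch_size)  # 32
```

If that relationship holds, the micro batch size would be the natural choice for the `DataLoader`, since that is what each rank consumes per iteration of the loop, but I'd like confirmation.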