When using dynamic batching with lr_scaling_method="sqrt", training fails with:
"TypeError: sqrt(): argument 'input' (position 1) must be Tensor, not float"
The error originates from scale_lr() on line 159 in deepspeed/runtime/data_pipeline/data_sampling/variable_batch_size_and_lr.py:
return base_lr * torch.sqrt(batch_size / base_batch_size)
Here, batch_size and base_batch_size are Python integers, so batch_size / base_batch_size is a Python float. torch.sqrt() accepts only tensors, so passing this float raises the TypeError above. Replacing torch.sqrt with math.sqrt here should fix it, since math.sqrt operates on plain Python numbers.
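A minimal sketch of the proposed fix, for illustration only. The function below is a hypothetical stand-in for scale_lr(): its signature, defaults, and the "linear" branch are assumptions, not copied from the DeepSpeed source. Only the "sqrt" branch reflects the change suggested above.

```python
import math


def scale_lr(base_batch_size, batch_size, base_lr=1.0, method="linear"):
    """Hypothetical stand-in for DeepSpeed's scale_lr(); signature is assumed."""
    if method == "linear":
        # Assumed behavior: scale the LR in proportion to the batch size.
        return base_lr * batch_size / base_batch_size
    if method == "sqrt":
        # batch_size / base_batch_size is a plain Python float, which
        # math.sqrt accepts directly (torch.sqrt would require a Tensor).
        return base_lr * math.sqrt(batch_size / base_batch_size)
    raise ValueError(f"Unknown scaling method: {method}")


# A batch 4x larger than the base scales the LR by sqrt(4) = 2.
print(scale_lr(base_batch_size=8, batch_size=32, base_lr=1e-3, method="sqrt"))
```

An alternative would be to wrap the ratio in torch.tensor() before calling torch.sqrt(), but that returns a tensor rather than a float, which may not be what the learning rate scheduler expects.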