fix dataloader len for tqdm and others #168
tomekrut wants to merge 2 commits into deepspeedai:master
Just a few more comments on the above fix:
@tomekrut Thanks for contributing to DeepSpeed, and apologies for the delay in getting back to you. We needed time to investigate and fully understand the underlying issue. You did find a subtle bug in DeepSpeed, and thank you for that. However, this PR does not fully address the underlying issue: it fixes one scenario but breaks others, as I will explain below. First, some context on the origin of DeepSpeedDataLoader. The goal is for it to be a wrapper around the torch DataLoader so that we can conveniently add data loading optimizations in the future. As such, we strive to maintain the same semantics as the torch DataLoader. The torch DataLoader supports two modes: batching, which returns batched samples, and non-batching, which returns individual samples. This PR fixes the bug for batching mode but introduces a new bug for non-batching mode. It seems to me that more extensive code changes and unit tests are required to support both modes and to address other issues in DeepSpeedDataLoader. I have created #176 to track this bug.
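To make the two modes concrete, here is a minimal sketch of the length that tools such as tqdm would expect from `len()` in each case. It mirrors my understanding of the torch DataLoader semantics for a map-style dataset (`batch_size=None` disables batching; `drop_last` discards a trailing partial batch). The helper name `expected_loader_len` is hypothetical and is not part of DeepSpeed or torch, and the sketch ignores distributed sampling.

```python
from typing import Optional


def expected_loader_len(num_samples: int,
                        batch_size: Optional[int],
                        drop_last: bool = False) -> int:
    """Hypothetical helper: how many items one pass over a loader yields."""
    if batch_size is None:
        # Non-batching mode: each iteration returns one individual sample.
        return num_samples
    if drop_last:
        # Batching mode, trailing partial batch is discarded.
        return num_samples // batch_size
    # Batching mode, trailing partial batch is kept (integer ceiling division).
    return (num_samples + batch_size - 1) // batch_size


print(expected_loader_len(10, 4))                  # 3 batches: 4 + 4 + 2
print(expected_loader_len(10, 4, drop_last=True))  # 2 batches: 4 + 4
print(expected_loader_len(10, None))               # 10 individual samples
```

A `__len__` that always divides by the batch size would be right for the first two calls and wrong for the third, which is the kind of regression described above.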
Closing because it is now replaced by #178
Co-authored-by: Shaden Smith <Shaden.Smith@microsoft.com>
Co-authored-by: Reza Yazdani <44502768+RezaYazdaniAminabadi@users.noreply.github.com>
Co-authored-by: Cheng Li <pistasable@gmail.com>
Co-authored-by: Samyam Rajbhandari <samyamr@microsoft.com>
Co-authored-by: Jeff Rasley <jerasley@microsoft.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Co-authored-by: Shaden Smith <ShadenTSmith@gmail.com>
Co-authored-by: eltonzheng <eltonz@microsoft.com>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
I found a small issue (see below) and created a fix for it.
#167