While #896 solves the leak problem, ideally we should also have a new method to free all optimizer/scheduler-related parts to pave the way for inference. In some environments like Google Colab, general RAM is very scarce, so every bit counts.
Here is one way to approach this:
```python
engine, optimizer, scheduler = deepspeed.initialize(...)

# do the training, then before inference do:
engine.free_optimizer_and_scheduler()
optimizer = None
scheduler = None
# It's then the user's responsibility to make sure they have no remaining
# references to the optimizer/scheduler objects for them to be freed.
```
backed by a new DeepSpeed method:
```python
def free_optimizer_and_scheduler(self):
    # break the cross-references first, then drop the engine's own references
    self.lr_scheduler.optimizer = None
    self.optimizer.optimizer = None
    self.lr_scheduler = None
    self.optimizer = None
```
That way, after training is done, the lion's share of the general RAM used by DeepSpeed is reclaimed. There are probably other bits to clean up manually to reclaim even more.
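To illustrate why this works, here is a minimal self-contained sketch using plain Python stand-ins for the engine/optimizer/scheduler (the `Fake*` classes are illustrative, not the actual DeepSpeed API): once the cross-references are broken and the user drops their own references, the optimizer state becomes unreachable and is freed.

```python
import gc
import weakref

class FakeOptimizer:
    def __init__(self):
        # stand-in for the optimizer state that consumes RAM
        self.state = {"momentum_buffers": [0.0] * 1024}

class FakeScheduler:
    def __init__(self, optimizer):
        # the scheduler keeps a back-reference to the optimizer
        self.optimizer = optimizer

class FakeEngine:
    def __init__(self):
        self.optimizer = FakeOptimizer()
        self.lr_scheduler = FakeScheduler(self.optimizer)

    def free_optimizer_and_scheduler(self):
        # break the cross-reference first, then drop the engine's own ones
        self.lr_scheduler.optimizer = None
        self.lr_scheduler = None
        self.optimizer = None

engine = FakeEngine()
opt_ref = weakref.ref(engine.optimizer)  # track liveness without keeping it alive

engine.free_optimizer_and_scheduler()
gc.collect()  # collect any leftover cycles

print(opt_ref() is None)  # True: the optimizer state has been reclaimed
```

If any other reference to the optimizer were still alive (e.g. the `optimizer` variable returned by `deepspeed.initialize`), the weakref would still resolve, which is exactly why the caller must also drop their own references.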
Let me know if this sounds good to you and I will make another PR with this feature. We can extend it in the future, if need be, to support other things that benefit inference.
Thank you.
@jeffra, @RezaYazdaniAminabadi