reclaiming memory for inference #897

@stas00

While #896 solves the leak problem, ideally we should also have a new method that frees all optimizer/scheduler-related parts to pave the way for inference. In some environments, such as Google Colab, general RAM is very scarce, so every bit counts.

Here is one way to approach this:

engine, optimizer, _, scheduler = deepspeed.initialize(...)
# ... do the training ...
# then, before inference:
engine.free_optimizer_and_scheduler()
optimizer = None
scheduler = None
# It's then the user's responsibility to make sure no other references to the
# optimizer/scheduler objects remain, so that they can actually be freed.

with a new deepspeed method:

def free_optimizer_and_scheduler(self):
    # break the scheduler -> optimizer and wrapper -> optimizer links
    self.lr_scheduler.optimizer = None
    self.optimizer.optimizer = None
    # drop the engine's own references so the objects become collectable
    self.lr_scheduler = None
    self.optimizer = None
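
If I read the engine code correctly, the inner references need to be nulled too: the scheduler and the DeepSpeed optimizer wrapper both point at the underlying torch optimizer, and its state (e.g. Adam's moment buffers and any fp32 master copies) is where most of the general RAM sits, so clearing only the engine attributes would leave that state reachable. A gc.collect() afterwards may also help if any of these objects participate in reference cycles.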

That way, after training is done, the lion's share of the general RAM used by DeepSpeed is reclaimed. There are probably other bits to clean up manually to reclaim even more.
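
As a sanity check, here is a minimal sketch of how the reclaimed general RAM could be measured, assuming psutil is available (engine, optimizer and scheduler come from the usage snippet above; free_optimizer_and_scheduler is the method proposed here; everything else is plain gc/psutil):

import gc
import psutil

def rss_mb():
    # resident set size of the current process, in MB
    return psutil.Process().memory_info().rss / 2**20

# assumes `engine`, `optimizer`, `scheduler` from the usage snippet above
before = rss_mb()
engine.free_optimizer_and_scheduler()
optimizer = None
scheduler = None
gc.collect()  # collect anything kept alive only by reference cycles
print(f"reclaimed ~{before - rss_mb():.0f} MB of general RAM")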

Let me know if this sounds good to you and I will make another PR with this feature. In the future we can extend it, if need be, to cover other things that benefit inference.

Thank you.

@jeffra, @RezaYazdaniAminabadi
