Garbage Collector Handler #1940
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Nic-Ma
left a comment
I've noticed a few issues with PyTorch and garbage collection. Python tensor objects have GPU memory allocated to them plus CPU memory in the C++ layer, while the garbage collector is only aware (I think) of the CPU memory the interpreter allocates, so it doesn't know when these need collecting. The collector in Python tries to run only when it needs to because memory is running out; however, it has an incorrect read on how much memory is actually in use. Almost always I would say that invoking collection manually is bad, but this is the only way to do this now. PyTorch also likes to keep cached buffers around internally, so collecting often won't bring down allocated GPU memory as much as you'd expect. You need [...] One suggestion I'd make is to run [...]
Thanks @behxyz, I'll make sure this is merged for v0.5. (I'm cancelling this CI for now as we'll merge the high-priority ones first.)
FYI, I came across this new error (perhaps the first time in 20+ runs).
* Implement garbage collector handler
* Make trigger_event lower case
* Add unittest for garbage collector
* Update docs
* Exclude from min test
* Fix a typo
* Fix a bug

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Neha Srivathsa <nsrivathsa@nvidia.com>
Description
This PR adds a garbage collector handler that can be called at the end of an epoch or an iteration to make sure that garbage is collected and memory is freed. We are using it specifically in smartcache with whole slide images.
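For readers skimming the thread, the idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code merged in this PR: the class name `GarbageCollectorSketch`, the `run` loop, and the string event names are hypothetical stand-ins for the real handler and training engine. Only `trigger_event` (lower case, per the commit list) and the epoch/iteration choice come from the PR itself.

```python
import gc


class GarbageCollectorSketch:
    """Hypothetical stand-in for the handler in this PR (not the merged implementation)."""

    def __init__(self, trigger_event: str = "epoch"):
        trigger_event = trigger_event.lower()
        if trigger_event not in ("epoch", "iteration"):
            raise ValueError(
                f"trigger_event must be 'epoch' or 'iteration', got {trigger_event!r}"
            )
        self.trigger_event = trigger_event
        self.collected = 0  # running total of unreachable objects found by gc.collect()

    def __call__(self, event: str) -> None:
        # Force a full collection only when the configured event fires.
        if event == self.trigger_event:
            self.collected += gc.collect()


def run(handler, epochs: int = 2, iterations: int = 3) -> None:
    """Toy loop standing in for a training engine (an assumption for this sketch)."""
    for _ in range(epochs):
        for _ in range(iterations):
            # Build a reference cycle; reference counting alone cannot free it.
            cycle = []
            cycle.append(cycle)
            del cycle
            handler("iteration")
        handler("epoch")


handler = GarbageCollectorSketch(trigger_event="epoch")
gc.disable()  # keep automatic collection out of the way so the demo is deterministic
try:
    run(handler)
finally:
    gc.enable()
```

With `trigger_event="iteration"` the same object would collect after every step instead, trading extra collection overhead for a lower peak in uncollected garbage.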
Status
Ready
Types of changes
* ./runtests.sh -f -u --net --coverage
* ./runtests.sh --quick --unittests
* make html command in the docs/ folder.