Skip to content

Conversation

@bhashemian
Copy link
Member

Description

It adds a garbage collector handler that can be called at the end of an epoch or an iteration to make sure that garbages are collected and memory is freed. We are using it specifically in smartcache with whole slide images.

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
@bhashemian bhashemian requested review from Nic-Ma and wyli April 4, 2021 23:44
bhashemian and others added 3 commits April 4, 2021 20:04
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Copy link
Contributor

@Nic-Ma Nic-Ma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good to me.
But I am not very sure about the GC logic in python, maybe python will collect the garbage even we don't enable this handler?
@ericspod @wyli could you please help also review this PR?

Thanks.

@ericspod
Copy link
Member

ericspod commented Apr 5, 2021

I've noticed a few issues with Pytorch and garbage collecting. Python tensor objects have GPU memory allocated to them plus CPU memory in the C++ layer, while the garbage collector is only aware (I think) of the CPU memory the interpreter allocates so it doesn't know when these need collecting. The collector in Python tries to run only when it needs to because memory is running out, it has an incorrect read on what this is however. Almost always I would say that invoking collection is bad but this is the only way to do this now.

Pytorch also likes to keep cached buffersaround internally so collecting won't bring down allocated GPU memory as much as you'd expect often. You need torch.cuda.empty_cache() to do that but I would instead leave that up to Pytorch unless you're getting unexpected OOM exceptions.

One suggestion I'd make is to run gc.collect() twice in a row to get a few more objects cleaned up.

@bhashemian bhashemian enabled auto-merge (squash) April 5, 2021 14:10
Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
@wyli
Copy link
Contributor

wyli commented Apr 5, 2021

thanks @behxyz I'll make sure this is merged for v0.5. (I'm cancelling this CI for now as we'll merge the high priority ones first)

@wyli wyli mentioned this pull request Apr 5, 2021
3 tasks
@bhashemian bhashemian merged commit 713490b into Project-MONAI:master Apr 5, 2021
@wyli
Copy link
Contributor

wyli commented Apr 6, 2021

FYI came across this new error (perhaps the first time in 20+ runs)

======================================================================
FAIL: test_content_0 (tests.test_handler_garbage_collector.TestHandlerGarbageCollector)

----------------------------------------------------------------------

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/parameterized/parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "/__w/MONAI/MONAI/tests/test_handler_garbage_collector.py", line 73, in test_content
    self.assertLess(gb_count[0], first_count)
AssertionError: 44 not less than 38

----------------------------------------------------------------------

@bhashemian bhashemian deleted the garbage_handler branch April 12, 2021 20:22
nsrivathsa pushed a commit to nsrivathsa/MONAI that referenced this pull request Apr 12, 2021
* Implement garbage collector handler

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>

* Make trigger_event lower case

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>

* Add unittest for garbage collector

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>

* Update docs

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>

* Exclude from min test

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>

* Fix a typo

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>

* Fix a bug

Signed-off-by: Behrooz <3968947+behxyz@users.noreply.github.com>
Signed-off-by: Neha Srivathsa <nsrivathsa@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants