
Conversation

@fjetter (Member) commented on Oct 7, 2022

The test test_tick_interval recently started flaking. It was accompanied by multiple GC-duration warnings claiming that GC took too long.
While this can certainly be true, I realized that the GC callback also inspects process memory. I believe the process memory check can actually be slow-ish, and we should not include it in the GC duration measurement.

In fact, I'm wondering whether this check should even be there in the first place. I also wonder whether we should apply similar throttling logic to get_process_memory as we did for gc.collect, so that it doesn't run too frequently (see the sketch below).

cc @crusaderky @gjoseph92
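
For illustration, here is a hedged sketch of what such throttling could look like. The class name ThrottledMemoryCheck and the min_interval default are made up for this example and are not distributed's API; it only assumes psutil is available.

```python
# A hedged sketch, not distributed's actual code: cache the RSS reading and
# only refresh it after a minimum interval, analogous to the throttling that
# already exists around gc.collect. Names here are hypothetical.
import time

import psutil


class ThrottledMemoryCheck:
    """Return a cached RSS value unless `min_interval` seconds have passed."""

    def __init__(self, min_interval: float = 0.5):
        self._proc = psutil.Process()
        self._min_interval = min_interval
        self._last_rss = self._proc.memory_info().rss
        self._last_check = time.monotonic()

    def rss(self) -> int:
        now = time.monotonic()
        if now - self._last_check >= self._min_interval:
            self._last_rss = self._proc.memory_info().rss
            self._last_check = now
        return self._last_rss
```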

@fjetter changed the title from "Do not include psutil memory_info in GC duration warning" to "Do not include psutil memory_info in GC duration measurement" on Oct 7, 2022
# measurement
self._fractional_timer.stop_timing()
if self._proc is not None:
    rss = self._proc.memory_info().rss
@fjetter (Member, author) commented on the snippet above:

At the very least, if GC runs frequently, this could be expensive. I'm wondering if we want to add yet another timer specifically for this (a rough sketch below).
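
A hedged, purely illustrative sketch of timing the memory check on its own, so its cost shows up separately from the GC duration measurement (the variable names are made up; this is not distributed's internals):

```python
# Time the memory_info() call on its own so its cost can be reported
# separately from the GC pause itself.
import time

import psutil

proc = psutil.Process()

start = time.perf_counter()
rss = proc.memory_info().rss
rss_check_seconds = time.perf_counter() - start
print(f"RSS check took {rss_check_seconds * 1e6:.1f} µs (rss={rss / 2**20:.1f} MiB)")
```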

github-actions bot (Contributor) commented on Oct 7, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0    15 suites ±0    6h 45m 55s ⏱️ +25m 25s
3 146 tests ±0: 3 060 passed +3, 85 skipped -1, 1 failed -2
23 272 runs ±0: 22 355 passed +1, 916 skipped +1, 1 failed -2

For more details on these failures, see this check.

Results for commit 22b27e2. ± Comparison against base commit 07e2259.

@crusaderky (Collaborator) commented:

psutil.Process.memory_info() takes 7.76 µs ± 47.4 ns on my Linux box. Where did you observe it being slow?
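
A hedged way to reproduce this kind of measurement (results vary by OS and machine; the figure above was reported on Linux):

```python
# Micro-benchmark psutil.Process.memory_info() with timeit.
import timeit

import psutil

proc = psutil.Process()  # current process
n = 100_000
total = timeit.timeit(lambda: proc.memory_info().rss, number=n)
print(f"memory_info(): {total / n * 1e6:.3f} µs per call")
```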

@fjetter (Member, author) commented on Oct 10, 2022

Where did you observe it being slow?

I can't tell whether this specific call is slow or not. I remember interactions with processes via psutil being slow occasionally, but I can't point you to an example.

Where I noticed the GC warning was in #7075
https://github.com/dask/distributed/actions/runs/3184216360/jobs/5192365157

which times out after 30s. I assume the test logic is not rock solid, but I also see the warnings below:

2022-10-04 17:56:04,506 - distributed.utils_perf - WARNING - full garbage collections took 49% CPU time recently (threshold: 10%)
/usr/share/miniconda3/envs/dask-distributed/lib/python3.9/asyncio/base_events.py:647: in run_until_complete
2022-10-04 17:56:05,376 - distributed.utils_perf - WARNING - full garbage collections took 47% CPU time recently (threshold: 10%)
    return future.result()
2022-10-04 17:56:06,295 - distributed.utils_perf - WARNING - full garbage collections took 45% CPU time recently (threshold: 10%)

In this test, nothing is really happening. Specifically, we're not accumulating any objects that should clog the GC (of course, we may be dealing with leaks from another test), so I was wondering how we're measuring this in the first place...
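
For reference on the measurement itself, here is a generic, hedged sketch of how such a percentage can be computed from gc callbacks. This is not distributed.utils_perf's actual code; the class name, window size, and warning text are made up for illustration.

```python
# Time generation-2 collections via gc.callbacks and compare the summed pause
# time against the wall-clock span covered by a small window of recent pauses.
import gc
import time
from collections import deque


class GCFractionMonitor:
    def __init__(self, window: int = 10, threshold: float = 0.10):
        self._events = deque(maxlen=window)  # (end_timestamp, duration) pairs
        self._gc_start = None
        self._threshold = threshold
        gc.callbacks.append(self._callback)

    def _callback(self, phase, info):
        if info["generation"] != 2:
            return  # only track full collections
        if phase == "start":
            self._gc_start = time.perf_counter()
        elif phase == "stop" and self._gc_start is not None:
            now = time.perf_counter()
            self._events.append((now, now - self._gc_start))
            self._maybe_warn(now)

    def _maybe_warn(self, now):
        if len(self._events) < self._events.maxlen:
            return  # wait until the window is full
        # wall-clock span from the start of the oldest tracked collection
        oldest_end, oldest_duration = self._events[0]
        elapsed = max(now - (oldest_end - oldest_duration), 1e-9)
        fraction = sum(d for _, d in self._events) / elapsed
        if fraction > self._threshold:
            print(
                f"full garbage collections took {fraction:.0%} of wall time "
                f"recently (threshold: {self._threshold:.0%})"
            )
```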

I'm also surprised that we're running collection three times within two seconds even if this fraction is high (considering that during runtime we're only running ThrottledGC). I assume this is connected to a couple of explicit gc.collect calls we have in the test suite.

Following that logic, maybe the cause for these warnings is not the callback but rather one of the following two loops

@fjetter closed this on Jul 10, 2025