Make AMM memory measure configurable #7062

crusaderky · 2022-09-23T21:28:02Z

Closes Make AMM memory measure configurable #6577
Supersedes Remove @avoid_ci from stress tests #6271
Works around Stress test crashes with StreamClosedError #5371

Allow the Active Memory Manager to use a measure other than optimistic memory (managed + unmanaged that appeared more than 30s ago) in its heuristics.
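
To make the distinction between measures concrete, here is a minimal Python sketch of a per-worker memory breakdown and of picking one measure by name. It is a simplified, hypothetical model. The class `MemoryStateSketch`, the function `memory_measure` and the set of allowed names are illustrative assumptions, not the `distributed` API, and spilled data is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryStateSketch:
    """Hypothetical per-worker memory breakdown, in bytes (spilling ignored)."""

    managed: int  # sizeof() of the data the worker tracks
    unmanaged_old: int  # unmanaged memory that has persisted for more than ~30s
    unmanaged_recent: int  # unmanaged memory that appeared more recently

    @property
    def unmanaged(self) -> int:
        return self.unmanaged_old + self.unmanaged_recent

    @property
    def process(self) -> int:
        # Everything the OS attributes to the process
        return self.managed + self.unmanaged

    @property
    def optimistic(self) -> int:
        # Managed memory plus only the unmanaged memory that has stuck around
        return self.managed + self.unmanaged_old


# Hypothetical set of names a configuration value could choose from
ALLOWED_MEASURES = frozenset({"managed", "unmanaged", "process", "optimistic"})


def memory_measure(state: MemoryStateSketch, measure: str = "optimistic") -> int:
    """Return the figure the heuristics would compare, selected by name."""
    if measure not in ALLOWED_MEASURES:
        raise ValueError(f"unknown memory measure: {measure!r}")
    return getattr(state, measure)
```

For example, a worker with 100 bytes of managed data, 50 bytes of old and 200 bytes of recent unmanaged memory scores 150 under `optimistic`, 350 under `process` and 100 under `managed`. Selecting the attribute by name leaves the heuristics themselves unchanged; only the number fed to them varies.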

This is particularly useful on macOS, where memory deallocation is less responsive than on Windows or Linux, and on Linux when allocators other than malloc are in use.
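
The "appeared more than 30s ago" part of the definition is presumably why slow deallocation matters: memory that the allocator has not yet returned to the OS keeps showing up as unmanaged and, once it has persisted long enough, counts towards optimistic memory. Below is a hedged sketch of one way to derive such an "old unmanaged" figure, assuming it is the minimum unmanaged memory observed over a sliding 30-second window. `UnmanagedOldTracker` is a hypothetical name and this is not the `distributed` implementation.

```python
from collections import deque


class UnmanagedOldTracker:
    """Hypothetical: unmanaged memory that has persisted for `window` seconds."""

    def __init__(self, window: float = 30.0) -> None:
        self.window = window
        self.history: deque[tuple[float, int]] = deque()  # (timestamp, bytes)

    def update(self, now: float, unmanaged: int) -> int:
        self.history.append((now, unmanaged))
        # Forget samples that fall outside the sliding window
        while self.history[0][0] < now - self.window:
            self.history.popleft()
        # Only memory present in every sample of the window counts as "old"
        return min(sample for _, sample in self.history)
```

A 1000-byte allocation that first appears at t=10s is not counted as old at t=20s, but is at t=41s, once every sample in the window contains it. Under a measure that looks only at managed memory, such lingering allocations would not be counted at all.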

This also makes it possible to write the AMM unit tests using Worker instead of Nanny, making them much more robust and faster.
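
A likely reason in-process workers need a different measure: `Worker` instances started inside the test process (rather than each in its own Nanny-managed process) all observe the same OS process, so any figure derived from process memory is identical for all of them, while managed memory still tells them apart. The sketch below models that assumption; `measure_per_worker` and the byte counts are hypothetical.

```python
def measure_per_worker(
    managed: dict[str, int], shared_process_memory: int, measure: str
) -> dict[str, int]:
    """Hypothetical memory figure reported by each in-process worker."""
    if measure == "managed":
        # Bytes of data each worker tracks itself: differs per worker
        return dict(managed)
    if measure == "process":
        # Every worker reads the memory of the one shared test process
        return {name: shared_process_memory for name in managed}
    raise ValueError(f"unsupported measure: {measure!r}")


managed = {"w1": 10, "w2": 300, "w3": 20}
by_process = measure_per_worker(managed, shared_process_memory=50_000, measure="process")
by_managed = measure_per_worker(managed, shared_process_memory=50_000, measure="managed")
# Under "process" every worker looks the same, so any ranking is arbitrary;
# under "managed" the most loaded worker is unambiguous.
busiest = max(by_managed, key=by_managed.get)
```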

This PR finally enables all AMM stress tests in CI.

Stress test evidence: https://github.com/crusaderky/distributed/actions/runs/3114670981/jobs/5050785452#step:18:1674
There was one failure, which I don't believe is attributable to the AMM. Follow-up: #7063

crusaderky · 2022-09-23T21:44:57Z

distributed/tests/test_active_memory_manager.py

+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        b = (a @ a.T).sum().round(3)
+    assert await c.compute(b) == 245.394


Interestingly, (20, 20) would take forever or hang on my PC with Worker=Worker, but took as little as 5s with Worker=Nanny. On CI, Worker=Nanny hangs because of #5371.
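
The `warnings.catch_warnings()` block in the snippet above silences warnings emitted by the expression without affecting the rest of the test session, because the previous filter state is restored when the block exits. A standalone sketch of the same pattern using only the standard library (`noisy_computation` is a hypothetical stand-in for the dask expression):

```python
import warnings


def noisy_computation(values):
    # Hypothetical stand-in for an expression that emits a warning
    warnings.warn("this computation is noisy", UserWarning)
    return round(sum(values), 3)


def quiet_computation(values):
    # Filters changed inside this block are restored when it exits
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return noisy_computation(values)
```

Using a context manager rather than a module-level `filterwarnings` call keeps the suppression local to the one expression being computed.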

github-actions · 2022-09-23T22:23:44Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

Files: 15 (±0) · Suites: 15 (±0) · Duration: 5h 59m 36s ⏱️ (-7m 26s)
Tests: 3 121 (+6) · passed ✔️ 3 035 (+7) · skipped 💤 85 (±0) · failed ❌ 1 (-1)
Runs: 23 098 (+42) · passed ✔️ 22 190 (+44) · skipped 💤 907 (+1) · failed ❌ 1 (-3)

For more details on these failures, see this check.

Results for commit d8c9586. ± Comparison against base commit b40c03d.

♻️ This comment has been updated with latest results.

gjoseph92

LGTM, looking forward to having all these tests running. I assume the only blocker is #7065?

gjoseph92 · 2022-09-27T21:44:03Z

distributed/tests/test_active_memory_manager.py

+    x = c.submit(lambda: 123, key="x", workers=[w1.address])
+    await wait(x)
+    # Fill w2 with dummy data so that it's got the highest memory usage
+    clutter = await c.scatter(456, workers=[w2.address])


Suggested change

-    clutter = await c.scatter(456, workers=[w2.address])
+    clutter = await c.scatter("c" * 10, workers=[w2.address])

Small note: I would expect 123 and 456 to have the same memory usage, so the comment above is slightly misleading. Something like this would make it definitely larger.
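
The reasoning can be checked with `sys.getsizeof`, assuming the sizes dask reports for plain Python objects like these fall back to `sys.getsizeof` (an assumption about the default handler of `dask.sizeof`). On CPython, small integers such as 123 and 456 occupy the same number of bytes, whereas a 10-character string is larger:

```python
import sys

# Small ints fit in a single internal "digit", so they share one size on CPython
int_a = sys.getsizeof(123)
int_b = sys.getsizeof(456)
# A 10-character ASCII string carries an object header plus one byte per character
text = sys.getsizeof("c" * 10)
print(int_a, int_b, text)
```

The exact byte counts depend on the interpreter build, which is why the comparison, not the absolute number, is what matters here.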

w2 has got the highest memory usage among the workers that aren't being retired, meaning w2 and w3. I updated the comment to clarify.
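
A simplified model of the selection this thread appears to be reasoning about, assuming the AMM prefers the eligible recipient with the lowest memory measure: with w1 (which holds `x`) being retired and w2 filled with clutter, w3 is the natural destination. `pick_recipient` and the byte counts are hypothetical and do not reproduce the AMM's actual policy code.

```python
def pick_recipient(memory: dict[str, int], retiring: set[str]) -> str:
    """Return the non-retiring worker with the lowest memory measure (hypothetical rule)."""
    candidates = [w for w in memory if w not in retiring]
    if not candidates:
        raise ValueError("no eligible recipient")
    # Ties are broken by worker name so the result is deterministic
    return min(candidates, key=lambda w: (memory[w], w))


memory = {"w1": 100, "w2": 500, "w3": 0}
recipient = pick_recipient(memory, retiring={"w1"})
```

Only w2 and w3 are compared, which is why w2 having "the highest memory usage" is meaningful only relative to the workers that are not being retired.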


crusaderky self-assigned this Sep 23, 2022

crusaderky mentioned this pull request Sep 23, 2022

decide_worker_rootish_queuing_disabled assertion fails when retiring worker #7063 (closed)

crusaderky commented Sep 23, 2022


crusaderky marked this pull request as draft September 23, 2022 22:23

crusaderky force-pushed the AMM/measure branch from 6f87b5b to 1955f92 on September 23, 2022 22:33

crusaderky marked this pull request as ready for review September 23, 2022 23:14

This was referenced Sep 24, 2022:

Fix decide_worker_rootish_queuing_disabled assert #7065 (merged)

Remove @avoid_ci from stress tests #6271 (closed)

Commit d8c9586: AMM/measure

crusaderky force-pushed the AMM/measure branch from 6662655 to d8c9586 on September 26, 2022 16:07

hendrikmakait self-requested a review September 27, 2022 13:19

gjoseph92 reviewed Sep 27, 2022

crusaderky commented Sep 28, 2022

distributed/tests/test_active_memory_manager.py (resolved)

Commit e667a9b: Update distributed/tests/test_active_memory_manager.py

crusaderky merged commit 162a7c0 into dask:main Sep 28, 2022

crusaderky deleted the AMM/measure branch September 28, 2022 11:56

gjoseph92 pushed a commit to gjoseph92/distributed that referenced this pull request Oct 31, 2022

Commit 591a880: Make AMM memory measure configurable (dask#7062)
