-
-
Notifications
You must be signed in to change notification settings - Fork 748
Make AMM memory measure configurable #7062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| with warnings.catch_warnings(): | ||
| warnings.simplefilter("ignore") | ||
| b = (a @ a.T).sum().round(3) | ||
| assert await c.compute(b) == 245.394 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, (20, 20) would take forever or hang on my PC with Worker=Worker, but took as little as 5s with Worker=Nanny. On CI, Worker=Nanny hangs because of #5371.
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 5h 59m 36s ⏱️ - 7m 26s For more details on these failures, see this check. Results for commit d8c9586. ± Comparison against base commit b40c03d. ♻️ This comment has been updated with latest results. |
6f87b5b to
1955f92
Compare
6662655 to
d8c9586
Compare
gjoseph92
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, looking forward to having all these tests running. I assume the only blocker is #7065?
| x = c.submit(lambda: 123, key="x", workers=[w1.address]) | ||
| await wait(x) | ||
| # Fill w2 with dummy data so that it's got the highest memory usage | ||
| clutter = await c.scatter(456, workers=[w2.address]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| clutter = await c.scatter(456, workers=[w2.address]) | |
| clutter = await c.scatter("c" * 10, workers=[w2.address]) |
Small note, I would expect 123 and 456 to have the same memory usage. So the comment above is slightly misleading. Something like this would make it definitively larger.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
w2 has got the highest memory usage among the workers that aren't being retired, meaning w2 and w3. I updated the comment to clarify.
Allow the Active Memory Manager to use a measure other than optimistic memory (managed + unmanaged that appeared more than 30s ago) in its heuristics.
This is particularly useful on MacOSX, where memory deallocation is not as responsive as on Windows or Linux, and on Linux when allocators other than malloc are being used.
This also allows to write the AMM unit tests using Worker instead of Nanny and make them much more robust and fast.
This PR finally enables all AMM stress tests in CI.
Stress test evidence: https://github.com/crusaderky/distributed/actions/runs/3114670981/jobs/5050785452#step:18:1674
There was one failure which I don't believe to be attributed to AMM. Follow-up: #7063