[DNM] Reduce memory footprint of P2P shuffle #8128
phofl left a comment
According to our benchmarks, this actually made it worse, i.e. more likely to fail.
Sorry, meant to put this on draft while working on the other half of this.
      return pa.concat_tables(
          (deserialize_table(buffer) for buffer in data), promote=True
    - )
    + ).combine_chunks()
This one isn’t necessary, only the one above
When I'm done, it hopefully will :)
Unit Test Results
See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.
11 files - 10
11 suites - 10
2h 14m 40s ⏱️ - 8h 59m 25s
For more details on these failures and errors, see this check.
Results for commit aa2711f. ± Comparison against base commit 03ea2e1.
This pull request removes 360 and adds 2 tests. Note that renamed tests count towards both.
This pull request removes 13 skipped tests and adds 2 skipped tests. Note that renamed tests count towards both.
This pull request skips 148 tests.
♻️ This comment has been updated with latest results.
Superseded by #8157
Partially addresses #8015.
This PR reduces the final memory footprint, but it makes the memory spike during the unpack phase of the shuffle worse than before. This makes workers more likely to OOM.
cc @phofl
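The trade-off described above (smaller final footprint, larger transient spike) can be illustrated with a standard-library analogy. This is not the shuffle code and does not use Arrow: plain `bytes` chunks stand in for shards, `b"".join` stands in for a copying combine step, and the function name and sizes are hypothetical. While the copy runs, the inputs and the combined output are alive together, so the peak is roughly twice the final size.

```python
import tracemalloc


def measure_combine(n_chunks=8, chunk_size=1_000_000):
    """Return (current, peak, total) in bytes for a copying combine step."""
    tracemalloc.start()
    # Stand-ins for received shards: separate, non-contiguous buffers.
    chunks = [bytes(chunk_size) for _ in range(n_chunks)]
    # Copy into one contiguous buffer; inputs and output coexist here.
    combined = b"".join(chunks)
    # Dropping the inputs leaves only the combined buffer alive.
    del chunks
    current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current, peak, len(combined)


current, peak, total = measure_combine()
print(f"final: {current / total:.2f}x of data, peak: {peak / total:.2f}x of data")
```

Under these assumptions the final usage sits near 1x the data size and the peak near 2x, which matches the shape of the problem reported here even though the real numbers depend on the shuffle implementation.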
pre-commit run --all-files