-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-45860: [C++] Respect CPU affinity in cpu_count and ThreadPool default capacity #47152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
27e794e to
78b2ccb
Compare
cpp/src/arrow/util/cpu_info.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved from thread_pool.cc to standardize the detection of number of CPUs.
cpp/src/arrow/util/cpu_info.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check was and is unused.
|
Just a quick comment here (I'm about to leave on holiday): given that we would like to get rid of |
78b2ccb to
4da1383
Compare
|
Ha I understood from the previous discussion that this would remain the interface wrapping whichever underlying library. |
|
It might indeed, but I'm not sure anything is settled in that regard. Affinity is also not really a CPU property. |
9625f9c to
44e198d
Compare
44e198d to
668895f
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay @AntoinePrv , as I was on vacation.
Changes look good in principle, you'll find a set of minor comments and suggestions below.
|
I have checked locally that it works (Ubuntu 24.04): $ python -c "import pyarrow as pa; print(pa.cpu_count())"
24
$ taskset -c 5,6,7 python -c "import pyarrow as pa; print(pa.cpu_count())"
3
$ OMP_NUM_THREADS=9 taskset -c 5,6,7 python -c "import pyarrow as pa; print(pa.cpu_count())"
9
$ OMP_THREAD_LIMIT=9 taskset -c 5,6,7 python -c "import pyarrow as pa; print(pa.cpu_count())"
3
$ OMP_THREAD_LIMIT=2 taskset -c 5,6,7 python -c "import pyarrow as pa; print(pa.cpu_count())"
2 |
|
@AntoinePrv Please don't mention usernames in the PR description, as they will end up in the final commit message and trigger annoying notifications later on. |
2dbf341 to
d0f31b3
Compare
d0f31b3 to
651d8f2
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, thanks for the update @AntoinePrv !
|
CI failure is unrelated. |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 8509ca4. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details. |
Rationale for this change
We want the ThreadPool default capacity to follow the CPU affinity set by the user, if any.
For example:
What changes are included in this PR?
arrow/io_util.h; on non-Linux platform, it returnsStatus::NotImplemented(note: based on original changes by Zihan Qi in PR #46034)
Are these changes tested?
By unit tests on CI, and by hand locally.
Are there any user-facing changes?
ThreadPool capacity now follows CPU affinity settings on Linux.