-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-45860: [C++] Respect CPU affinity in cpu_count and ThreadPool default capacity #46034
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
176a6fd to
d90b17e
Compare
| inline int GetAffinityCpuCount() { | ||
| #ifdef __linux__ | ||
| cpu_set_t mask; | ||
| if (sched_getaffinity(0, sizeof(mask), &mask) == 0) { | ||
| return CPU_COUNT(&mask); | ||
| } | ||
| #endif | ||
| return std::thread::hardware_concurrency(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving this to cpp/src/arrow/util/cpu_info.cc?
| OsRetrieveCpuInfo(&hardware_flags, &vendor, &model_name); | ||
| original_hardware_flags = hardware_flags; | ||
| num_cores = std::max(static_cast<int>(std::thread::hardware_concurrency()), 1); | ||
| num_cores = std::max(GetAffinityCpuCount(), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about not changing this and adding a new method (num_affinity_cores()?) instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Just to double-ceck: do we want ThreadPool::DefaultCapacity() to continue using hardware_concurrency() even on Linux, or would it make sense to prefer num_affinity_cores() there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. num_cores() as hardware limit. num_affinity_cores() as actual resource available.
| int expected_capacity = hw_capacity; | ||
| #endif | ||
|
|
||
| ASSERT_EQ(ThreadPool::DefaultCapacity(), expected_capacity); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this?
| ASSERT_EQ(ThreadPool::DefaultCapacity(), expected_capacity); | |
| ASSERT_LT(ThreadPool::DefaultCapacity(), hw_capacity); |
| #include <sched.h> | ||
| #endif | ||
|
|
||
| TEST(CpuInfoTest, CpuAffinity) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks this test will never fail? But I don't have better suggestion :)
| #ifdef __linux__ | ||
| ASSERT_EQ(ThreadPool::DefaultCapacity(), std::min(999, arrow::internal::GetAffinityCpuCount())); | ||
| #else | ||
| ASSERT_EQ(ThreadPool::DefaultCapacity(), std::min(999, hw_capacity)); | ||
| #endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify it? Leave only line 1065, and remove hw_capacity?
|
For the record, I don't think this should be in CpuInfo. We generally put various platform abstractions in arrow/util/io_util.h. Also, in the future we ideally want to replace CpuInfo with a third-party library, which is easier if we don't add more functionality to it.
Le 7 avril 2025 09:59:15 GMT+02:00, "Raúl Cumplido" ***@***.***> a écrit :
…
@raulcd requested your review on: apache/arrow#46034 GH-45860: [C++] Respect CPU affinity in cpu_count and ThreadPool default capacity.
--
Reply to this email directly or view it on GitHub:
#46034 (comment)
You are receiving this because your review was requested.
Message ID: ***@***.***>
|
Oh, we have
I think that we can wrap a third-party library and implement missing features by ourselves (or mix multiple third-party libraries). |
That's what we did originally for |
You mean #785 that imports My idea is "wrap" not "import". For example, if we use https://github.com/google/cpu_features and https://github.com/anrieff/libcpuid , our #include <cpuinfo_x86.h>
#include <libcpuid.h>
struct CpuInfo::Impl {
Impl() {
if (cpu_features::GetX86Info().features.avx) {
hardware_flags |= CpuInfo::AVX;
}
struct cpu_raw_data_t raw;
struct cpu_id_t data;
cpuid_get_raw_data(&raw);
cpu_identify(&raw, &data)
num_cores = data.num_cores;
}
}; |
Ah, I see. Yes, I agree with that. |
…ult capacity (#47152) ### Rationale for this change We want the ThreadPool default capacity to follow the CPU affinity set by the user, if any. For example: ```console $ python -c "import pyarrow as pa; print(pa.cpu_count())" 24 $ taskset -c 5,6,7 python -c "import pyarrow as pa; print(pa.cpu_count())" 3 ``` ### What changes are included in this PR? - Implement and expose CPU affinity detection as a utility function in `arrow/io_util.h`; on non-Linux platform, it returns `Status::NotImplemented` - Use CPU affinity count, if available, to choose the default ThreadPool capacity (note: based on original changes by Zihan Qi in PR #46034) ### Are these changes tested? By unit tests on CI, and by hand locally. ### Are there any user-facing changes? ThreadPool capacity now follows CPU affinity settings on Linux. * GitHub Issue: #45860 Lead-authored-by: AntoinePrv <AntoinePrv@users.noreply.github.com> Co-authored-by: Zihan Qi <zihan.qi@tum.de> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
Arrow’s current CPU thread count detection uses
std::thread::hardware_concurrency()which does not take into account the process-level CPU affinity mask (e.g., set viataskset). This can lead to thread oversubscription and performance issues when Arrow runs in constrained environments.This PR improves Arrow's behavior on Linux by making both
CpuInfo::num_cores()andThreadPool::DefaultCapacity()aware of CPU affinity.What changes are included in this PR?
affinity.hto exposeGetAffinityCpuCount()on LinuxCpuInfo::Implincpu_info.ccto useGetAffinityCpuCount()instead of rawstd::thread::hardware_concurrency()ThreadPool::DefaultCapacity()to use affinity-aware core count as the fallback when no environment variables are setTestGlobalThreadPool.Capacity) to match the new behavior across platformsCpuInfoTest.CpuAffinity) to validate thatCpuInforeflectssched_getaffinity()#ifdef __linux__to ensure cross-platform compatibilityAre these changes tested?
CpuInfoTest.CpuAffinitychecks the affinity-aware logic againstsched_getaffinity()on Linux.TestGlobalThreadPool.Capacitywas updated to test the fallback behavior and env var overrides under the new logic.Are there any user-facing changes?
No