@MuteBardTison
Contributor

@MuteBardTison MuteBardTison commented Apr 4, 2025

Rationale for this change

Arrow’s current CPU thread count detection uses std::thread::hardware_concurrency(), which does not take into account the process-level CPU affinity mask (e.g., one set via taskset). This can lead to thread oversubscription and performance issues when Arrow runs in constrained environments.

This PR improves Arrow's behavior on Linux by making both CpuInfo::num_cores() and ThreadPool::DefaultCapacity() aware of CPU affinity.

What changes are included in this PR?

  • Added affinity.h to expose GetAffinityCpuCount() on Linux
  • Updated CpuInfo::Impl in cpu_info.cc to use GetAffinityCpuCount() instead of raw std::thread::hardware_concurrency()
  • Updated ThreadPool::DefaultCapacity() to use affinity-aware core count as the fallback when no environment variables are set
  • Updated the corresponding test (TestGlobalThreadPool.Capacity) to match the new behavior across platforms
  • Added a Linux-only unit test (CpuInfoTest.CpuAffinity) to validate that CpuInfo reflects sched_getaffinity()
  • Used #ifdef __linux__ to ensure cross-platform compatibility

Are these changes tested?

  • CpuInfoTest.CpuAffinity checks the affinity-aware logic against sched_getaffinity() on Linux.
  • TestGlobalThreadPool.Capacity was updated to test the fallback behavior and env var overrides under the new logic.

Are there any user-facing changes?

No

@github-actions

github-actions bot commented Apr 4, 2025

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@MuteBardTison MuteBardTison changed the title ARROW-45860: Make cpu_count respect Linux CPU affinity GH-45860: [C++] Make cpu_count respect Linux CPU affinity Apr 4, 2025
@github-actions

github-actions bot commented Apr 4, 2025

⚠️ GitHub issue #45860 has been automatically assigned in GitHub to PR creator.

@MuteBardTison MuteBardTison changed the title GH-45860: [C++] Make cpu_count respect Linux CPU affinity GH-45860: [C++] Respect CPU affinity in cpu_count and ThreadPool default capacity Apr 4, 2025
@raulcd raulcd requested review from cyb70289 and pitrou April 7, 2025 07:58
Comment on lines +15 to +23
inline int GetAffinityCpuCount() {
#ifdef __linux__
cpu_set_t mask;
if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
return CPU_COUNT(&mask);
}
#endif
return std::thread::hardware_concurrency();
}
Member

How about moving this to cpp/src/arrow/util/cpu_info.cc?

OsRetrieveCpuInfo(&hardware_flags, &vendor, &model_name);
original_hardware_flags = hardware_flags;
num_cores = std::max(static_cast<int>(std::thread::hardware_concurrency()), 1);
num_cores = std::max(GetAffinityCpuCount(), 1);
Member

How about not changing this and adding a new method (num_affinity_cores()?) instead?

Contributor Author

Got it. Just to double-check: do we want ThreadPool::DefaultCapacity() to continue using hardware_concurrency() even on Linux, or would it make sense to prefer num_affinity_cores() there?

Member

My suggestion:

  • CpuInfo::num_cores() uses hardware_concurrency()
  • ThreadPool::DefaultCapacity() uses CpuInfo::num_affinity_cores() not CpuInfo::num_cores()

@pitrou @cyb70289 Do you have any opinion?

Contributor

Looks good to me. num_cores() as hardware limit. num_affinity_cores() as actual resource available.
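
A minimal sketch of the split discussed in this thread, assuming Linux with glibc for the affinity path. `HardwareCores()` and `AffinityCores()` are hypothetical free functions standing in for `CpuInfo::num_cores()` and the proposed `CpuInfo::num_affinity_cores()`, and `DefaultCapacitySketch()` is only an illustration, not Arrow's actual `ThreadPool::DefaultCapacity()`:

```cpp
#include <algorithm>
#include <thread>

#ifdef __linux__
#include <sched.h>  // cpu_set_t, sched_getaffinity, CPU_COUNT (needs _GNU_SOURCE, which g++ defines)
#endif

// Hardware limit: logical CPUs reported by the standard library.
// hardware_concurrency() may return 0 when the value is unknown, so clamp to 1.
inline int HardwareCores() {
  return std::max(static_cast<int>(std::thread::hardware_concurrency()), 1);
}

// Resource actually available: CPUs in the calling thread's affinity mask.
// Falls back to the hardware count off Linux or if the call fails.
inline int AffinityCores() {
#ifdef __linux__
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    return std::max(CPU_COUNT(&mask), 1);
  }
#endif
  return HardwareCores();
}

// Hypothetical default pool size: follow the affinity-aware count.
inline int DefaultCapacitySketch() { return AffinityCores(); }
```

Keeping the two accessors separate means callers that care about the machine itself can still ask for the hardware count, while thread-pool sizing follows what the process is actually allowed to use.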

int expected_capacity = hw_capacity;
#endif

ASSERT_EQ(ThreadPool::DefaultCapacity(), expected_capacity);
Member

How about this?

Suggested change
ASSERT_EQ(ThreadPool::DefaultCapacity(), expected_capacity);
ASSERT_LT(ThreadPool::DefaultCapacity(), hw_capacity);

#include <sched.h>
#endif

TEST(CpuInfoTest, CpuAffinity) {
Contributor

Looks like this test will never fail? But I don't have a better suggestion :)
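
One possible way to make such a check able to fail, sketched under the assumption that a Linux-only test may temporarily change its own affinity: pin the calling thread to a single allowed CPU with `sched_setaffinity()`, read the count, then restore the original mask. `CountAffinityCpus()` and `CountWhenPinnedToOneCpu()` are hypothetical helpers and are not part of this PR.

```cpp
#include <sched.h>  // Linux-only; g++ defines _GNU_SOURCE, which exposes the CPU_* macros

// Number of CPUs in the calling thread's affinity mask, or -1 on failure.
inline int CountAffinityCpus() {
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
  return CPU_COUNT(&mask);
}

// Pins the calling thread to the first CPU it is allowed to run on, reads the
// affinity count, then restores the original mask. Returns -1 if any step fails.
inline int CountWhenPinnedToOneCpu() {
  cpu_set_t original;
  CPU_ZERO(&original);
  if (sched_getaffinity(0, sizeof(original), &original) != 0) return -1;

  // Find the first CPU present in the current mask.
  int first = -1;
  for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
    if (CPU_ISSET(cpu, &original)) {
      first = cpu;
      break;
    }
  }
  if (first < 0) return -1;

  cpu_set_t single;
  CPU_ZERO(&single);
  CPU_SET(first, &single);
  if (sched_setaffinity(0, sizeof(single), &single) != 0) return -1;

  const int pinned_count = CountAffinityCpus();

  // Restore the original mask so later code is unaffected.
  if (sched_setaffinity(0, sizeof(original), &original) != 0) return -1;
  return pinned_count;
}
```

A test built on this would assert that the pinned count is exactly 1, which an implementation based only on `hardware_concurrency()` would not satisfy on a multi-core machine.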

Comment on lines +1064 to +1068
#ifdef __linux__
ASSERT_EQ(ThreadPool::DefaultCapacity(), std::min(999, arrow::internal::GetAffinityCpuCount()));
#else
ASSERT_EQ(ThreadPool::DefaultCapacity(), std::min(999, hw_capacity));
#endif
Contributor

Can we simplify it? Leave only line 1065, and remove hw_capacity?

@pitrou
Member

pitrou commented Apr 8, 2025 via email

@kou
Member

kou commented Apr 9, 2025

> We generally put various platform abstractions in arrow/util/io_util.h.

Oh, we have GetThreadId(), GetCurrentRSS(), ... in arrow/util/io_util.h... I feel that it's strange because it's not related to IO.

> Also, in the future we ideally want to replace CpuInfo with a third-party library, which is easier if we don't add more functionality to it.

I think that we can wrap a third-party library and implement missing features by ourselves (or mix multiple third-party libraries).

@pitrou
Member

pitrou commented Apr 16, 2025

> I think that we can wrap a third-party library and implement missing features by ourselves (or mix multiple third-party libraries).

That's what we did originally for CpuInfo and it means we can't easily backport improvements from upstream. So I'd rather vendor the third-party library as is and implement other functionality in other headers.

@kou
Member

kou commented Apr 22, 2025

> > I think that we can wrap a third-party library and implement missing features by ourselves (or mix multiple third-party libraries).
>
> That's what we did originally for CpuInfo and it means we can't easily backport improvements from upstream. So I'd rather vendor the third-party library as is and implement other functionality in other headers.

You mean #785 that imports CpuInfo from parquet-cpp, right?

My idea is "wrap" not "import". For example, if we use https://github.com/google/cpu_features and https://github.com/anrieff/libcpuid , our CpuInfo will look like the following:

#include <cpuinfo_x86.h>
#include <libcpuid.h>

struct CpuInfo::Impl {
  Impl() {
    if (cpu_features::GetX86Info().features.avx) {
      hardware_flags |= CpuInfo::AVX;
    }

    // Read the raw CPUID data, then decode it into cpu_id_t.
    struct cpu_raw_data_t raw;
    struct cpu_id_t data;
    cpuid_get_raw_data(&raw);
    cpu_identify(&raw, &data);
    num_cores = data.num_cores;
  }
};

@pitrou
Member

pitrou commented Apr 22, 2025

> My idea is "wrap" not "import". For example, if we use https://github.com/google/cpu_features and https://github.com/anrieff/libcpuid , our CpuInfo will look like the following:

Ah, I see. Yes, I agree with that.

pitrou pushed a commit that referenced this pull request Aug 20, 2025
…ult capacity (#47152)

### Rationale for this change
We want the ThreadPool default capacity to follow the CPU affinity set by the user, if any.
For example:
```console
$ python -c "import pyarrow as pa; print(pa.cpu_count())"
24
$ taskset -c 5,6,7 python -c "import pyarrow as pa; print(pa.cpu_count())"
3
```

### What changes are included in this PR?
- Implement and expose CPU affinity detection as a utility function in `arrow/io_util.h`; on non-Linux platforms, it returns `Status::NotImplemented`
- Use CPU affinity count, if available, to choose the default ThreadPool capacity

(note: based on original changes by Zihan Qi in PR #46034)

### Are these changes tested?
By unit tests on CI, and by hand locally.

### Are there any user-facing changes?
ThreadPool capacity now follows CPU affinity settings on Linux.

* GitHub Issue: #45860

Lead-authored-by: AntoinePrv <AntoinePrv@users.noreply.github.com>
Co-authored-by: Zihan Qi <zihan.qi@tum.de>
Signed-off-by: Antoine Pitrou <antoine@python.org>
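
The behaviour described in this commit message can be sketched roughly as follows. This is an illustration only: `TryGetAffinityCpuCount()` and `ChooseDefaultCapacity()` are hypothetical names, and `std::optional` stands in for the `Status::NotImplemented` result the commit describes for non-Linux platforms.

```cpp
#include <algorithm>
#include <optional>
#include <thread>

#ifdef __linux__
#include <sched.h>  // cpu_set_t, sched_getaffinity, CPU_COUNT
#endif

// Returns the affinity CPU count where the platform supports it,
// or std::nullopt otherwise (the "not implemented" case in this sketch).
inline std::optional<int> TryGetAffinityCpuCount() {
#ifdef __linux__
  cpu_set_t mask;
  CPU_ZERO(&mask);
  if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
    return CPU_COUNT(&mask);
  }
#endif
  return std::nullopt;
}

// Picks a capacity: the affinity count when available, else the hardware count.
// Inputs are parameters so the selection logic can be checked in isolation.
inline int ChooseDefaultCapacity(std::optional<int> affinity_count,
                                 unsigned hardware_count) {
  const int chosen = affinity_count.value_or(static_cast<int>(hardware_count));
  return std::max(chosen, 1);  // never report fewer than one thread
}
```

A caller would combine the two as `ChooseDefaultCapacity(TryGetAffinityCpuCount(), std::thread::hardware_concurrency())`. Under `taskset -c 5,6,7` this would be expected to yield 3, in line with the console example in the commit message.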

4 participants