Add configurable pool properties to PinnedMemoryResource#851
Add configurable pool properties to PinnedMemoryResource#851rapids-bot[bot] merged 21 commits intorapidsai:mainfrom
Conversation
| */ | ||
| struct PinnedPoolProperties { | ||
| std::size_t initial_pool_size = 0; ///< initial size of the pool. Initial size is | ||
| ///< important for pinned memory performance. |
There was a problem hiding this comment.
This comment seems contrary https://github.com/rapidsai/rapidsmpf/pull/851/changes#diff-eabfc19cbc8bce3fdbf90c6fa14736f3e52ae669bcd3822fa39b6c57d26a14d8R34-R35, which one is applicable now?
There was a problem hiding this comment.
I think, priming pools (ie. make some allocations up front and deacllocating) has little effect on device pools. But for pinned memory pools, initial allocation and warming up is important. I extended the comment to include this.
There was a problem hiding this comment.
What does "important for pinned memory performance" mean? How is it important?
| // Before <https://github.com/NVIDIA/cccl/pull/6718>, the default | ||
| // `release_threshold` was 0, which defeats the purpose of having a pool. We | ||
| // now set it so the pool never releases unused pinned memory. |
There was a problem hiding this comment.
This comment seems outdated now, no?
| // `release_threshold` was 0, which defeats the purpose of having a pool. We | ||
| // now set it so the pool never releases unused pinned memory. | ||
| .release_threshold = std::numeric_limits<size_t>::max(), | ||
| .release_threshold = props.max_pool_size > 0 ? props.max_pool_size |
There was a problem hiding this comment.
max_pool_size seems to imply that the pool cannot go beyond that limit. However, AFAIU, release_threshold is something different, meaning it will start releasing unused memory back to the driver once that limit is reached. Can you clarify what max_pool_size really means, and if necessary rename it to release_threshold or something more accurate?
There was a problem hiding this comment.
Thinking about this again, I feel like my change is wrong. I reverted this. Thank you for catching this.
| // It was observed that priming async pools have little effect for performance. | ||
| // See <https://github.com/rapidsai/rmm/issues/1931>. | ||
| .initial_pool_size = 0, | ||
| .initial_pool_size = props.initial_pool_size, | ||
| // Before <https://github.com/NVIDIA/cccl/pull/6718>, the default | ||
| // `release_threshold` was 0, which defeats the purpose of having a pool. We | ||
| // now set it so the pool never releases unused pinned memory. | ||
| .release_threshold = std::numeric_limits<size_t>::max(), | ||
| .release_threshold = props.max_pool_size > 0 ? props.max_pool_size |
There was a problem hiding this comment.
All comments above are applicable here.
| */ | ||
| struct PinnedPoolProperties { | ||
| std::size_t initial_pool_size = 0; ///< initial size of the pool. Initial size is | ||
| ///< important for pinned memory performance. |
There was a problem hiding this comment.
What does "important for pinned memory performance" mean? How is it important?
| // Create a PinnedMemoryResource with max pool size of 1MiB | ||
| auto pinned_mr = std::make_shared<rapidsmpf::PinnedMemoryResource>( | ||
| rapidsmpf::get_current_numa_node(), | ||
| rapidsmpf::PinnedPoolProperties{.initial_pool_size = 0, .max_pool_size = 1_MiB} | ||
| ); | ||
| auto stream = cudf::get_default_stream(); | ||
|
|
||
| void* ptr = pinned_mr->allocate(stream, 512_KiB); | ||
| EXPECT_NE(nullptr, ptr); | ||
| pinned_mr->deallocate(stream, ptr, 512_KiB); | ||
|
|
||
| // NOTE: currently cuda driver rounds up max size to 32MB. So we need to allocate 32MB | ||
| // + 1 byte. | ||
| EXPECT_THROW( | ||
| { | ||
| void* ptr2 = pinned_mr->allocate(stream, 32_MiB + 1); |
There was a problem hiding this comment.
Can we inspect the max size we get so that this test is robust?
| #if CCCL_MAJOR_VERSION > 3 || (CCCL_MAJOR_VERSION == 3 && CCCL_MINOR_VERSION >= 2) | ||
| cuda::memory_pool_properties get_memory_pool_properties() { | ||
| cuda::memory_pool_properties get_memory_pool_properties( |
There was a problem hiding this comment.
I note that we are now on CCCL 3.2.1, so we should be able to remove all of this macro-conditional stuff.
It's also no longer gated behind experimental, so I think we can avoid needing to turn on experimental mode in cccl (and can just get the version from RMM).
And, the headers now no longer require nvcc, so we can remove the pimpl idiom.
Before doing this refactoring, can we migrate to the CCCL >= 3.2 version of the memory pool resource. It's plausible that we can then delete huge swathes of this code anyway
There was a problem hiding this comment.
Yes! Let me send a separate PR for it.
…s_pinned_pool Signed-off-by: niranda perera <niranda.perera@gmail.com>
| /// Discover the actual pool size the driver creates when a small max is requested. | ||
| /// Creates a pool with \p requested_max_pool_size (e.g. 1 MiB), then uses recursive | ||
| /// doubling of allocation size until allocation fails; returns the last successful size. | ||
| std::size_t discover_pinned_pool_actual_size( | ||
| rmm::cuda_stream_view stream, std::size_t requested_max_pool_size = 1_MiB | ||
| ) { | ||
| rapidsmpf::PinnedMemoryResource pinned_mr{ | ||
| rapidsmpf::get_current_numa_node(), | ||
| rapidsmpf::PinnedPoolProperties{.max_pool_size = requested_max_pool_size} | ||
| }; | ||
| std::size_t try_size = requested_max_pool_size; | ||
| while (true) { | ||
| try { | ||
| void* ptr = pinned_mr.allocate(stream, try_size); | ||
| pinned_mr.deallocate(stream, ptr, try_size); | ||
| try_size *= 2; | ||
| } catch (cuda::cuda_error const&) { | ||
| break; | ||
| } | ||
| } | ||
| return std::max(try_size / 2, requested_max_pool_size); | ||
| } |
There was a problem hiding this comment.
This should probably do bisection search, in case the actual size is not a power of two.
Signed-off-by: niranda perera <niranda.perera@gmail.com>
madsbk
left a comment
There was a problem hiding this comment.
Update PinnedMemoryResource::from_options with a pinned_memory_initial_pool_size option (also update READSME.md).
| std::size_t initial_pool_size = 0; | ||
|
|
||
| /// @brief Maximum size of the pool. 0 means no limit. | ||
| std::size_t max_pool_size = 0; |
Signed-off-by: niranda perera <niranda.perera@gmail.com>
|
@nirandaperera, we should introduce an config option |
@madsbk done |
| .max_pool_size = options.get<std::optional<size_t>>( | ||
| "pinned_max_pool_size", [](auto const& s) { | ||
| return s.empty() ? std::nullopt | ||
| : std::optional<size_t>(parse_string<size_t>(s)); | ||
| } | ||
| ) |
There was a problem hiding this comment.
Use parse_optional() to handle the optional case before parsing it to parse_string<size_t> like we do here:
rapidsmpf/cpp/src/memory/buffer_resource.cpp
Line 266 in c48f3a8
| [](auto const& s) { return parse_string<size_t>(s.empty() ? "0" : s); } | ||
| ), | ||
| .max_pool_size = options.get<std::optional<size_t>>( | ||
| "pinned_max_pool_size", [](auto const& s) { |
There was a problem hiding this comment.
Docs the pinned_max_pool_size option:
rapidsmpf/docs/source/configuration.md
Line 33 in c48f3a8
| if (pinned_memory) { | ||
| PinnedPoolProperties pool_properties{ | ||
| .initial_pool_size = options.get<size_t>( | ||
| "pinned_initial_pool_size", |
There was a problem hiding this comment.
Docs the pinned_initial_pool_size option:
rapidsmpf/docs/source/configuration.md
Line 33 in c48f3a8
Co-authored-by: Mads R. B. Kristensen <madsbk@gmail.com>
…apidsmpf into enable_props_pinned_pool
| - **`pinned_initial_pool_size`** | ||
| - **Environment Variable**: `RAPIDSMPF_PINNED_INITIAL_POOL_SIZE` | ||
| - **Default**: `0` | ||
| - **Description**: Initial size (in bytes) of the pinned host memory pool when | ||
| `pinned_memory` is enabled. A value of `0` means the pool starts empty and grows | ||
| on demand. Accepts byte counts (e.g. `"1GiB"`, `"512MiB"`). | ||
|
|
||
| - **`pinned_max_pool_size`** | ||
| - **Environment Variable**: `RAPIDSMPF_PINNED_MAX_POOL_SIZE` | ||
| - **Default**: `"disabled"` | ||
| - **Description**: Maximum size (in bytes) of the pinned host memory pool when | ||
| `pinned_memory` is enabled. When unset or empty, the pool is allowed to grow | ||
| without an upper bound. Accepts byte counts (e.g. `"4GiB"`, `"2048MiB"`). | ||
|
|
There was a problem hiding this comment.
move up to the pinned_memory section
Removed pinned memory pool size parameters from documentation.
|
/merge |
@wence- I will do the requested changes in follow-up PR
This PR adds
PinnedPoolPropertiesstruct to allow configuration of pinned memory pool behavior viainitial_pool_sizeandmax_pool_sizeparameters.Changes
New
PinnedPoolPropertiesstruct with two configurable fields:initial_pool_size: Pre-allocates pinned memory for improved performancemax_pool_size: Limits maximum pool size (0 = unlimited)Updated
PinnedMemoryResourceconstructor to accept optionalPinnedPoolPropertiesparameter with backward-compatible defaultsInitial allocation benchmark
initial_pool_size. On my workstation (RTX A6000) it shows around 10x allocation performance when the pool is set an initial size.