CUDA: Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 #21181
We wrongly calculated offset_grid as `ceildiv(nrows, block_size)`, while it must be `ceildiv(nrows + 1, block_size)`. As a consequence, we had uninitialized values in `offset_iterator[nrows]` for the case when `nrows % block_size == 0`. Fixes ggml-org#21162
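To make the off-by-one concrete, here is a minimal host-side sketch of the launch arithmetic. It is not the actual ggml kernel: the helper names, the one-offset-per-thread layout, the `idx <= nrows` guard, and the `idx * ncols` offset values are all assumptions made for illustration. The only parts taken from the description above are that the offsets array has `nrows + 1` entries and that the grid size is computed with `ceildiv`.

```cpp
#include <vector>

// Hypothetical helper, mirroring the ceildiv named in the description.
static int ceildiv(int a, int b) {
    return (a + b - 1) / b;
}

// Host-side stand-in for a kernel that writes one segment offset per thread.
// Segmented sorts need nrows + 1 offsets (the last one marks the end of the
// final row). Returns how many offsets were never written.
static int count_unwritten_offsets(int nrows, int ncols, int block_size, int grid_size) {
    std::vector<int> offsets(nrows + 1, -1);  // -1 marks "uninitialized"

    for (int block = 0; block < grid_size; ++block) {
        for (int thread = 0; thread < block_size; ++thread) {
            const int idx = block * block_size + thread;
            if (idx <= nrows) {               // assumed bounds guard
                offsets[idx] = idx * ncols;   // assumed offset layout
            }
        }
    }

    int unwritten = 0;
    for (int v : offsets) {
        unwritten += (v == -1) ? 1 : 0;
    }
    return unwritten;
}
```

With `nrows = 256` and `block_size = 256`, a grid of `ceildiv(nrows, block_size) = 1` block covers indices 0..255 and leaves `offsets[256]` unwritten, while `ceildiv(nrows + 1, block_size) = 2` blocks cover it. With `nrows = 255` the old formula happens to work, which is why the bug only shows up when `nrows % block_size == 0`.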
Maybe I misunderstood, but didn't @fairydreaming say that it crashed with CUB disabled?
Quoting from the issue
CUDA Toolkit (CTK) >= 11.0 always comes bundled with CUB, which was later integrated into CCCL. So in CTK 12.8 we have CCCL's CUB components available (see here for the compile-time check, which limits CUB availability to CTK >= 11.7). I see no way to disable this via cmake. Moreover, we correctly limit the support surface to bitonic sort in the case where CUB is unavailable. CTK 12.8 comes bundled with CUB 2.7, which does not yet have
@ORippler My confusion may stem from the statement in the related PR:
Either way, if @fairydreaming can confirm this completely fixes the issue we're good.
@CISC I confirm that it's all OK with this PR, no more crashes observed, tested up to
…l-org#21181) * CUDA: Fix CUB's argsort when nrows % block_size == 0 CCCL < 3.1 We wrongly calculated offset_grid as `ceildiv(nrows, block_size)`, while it must be `ceildiv(nrows + 1, block_size)`. As a consequence, we had uninitialized values in `offset_iterator[nrows]` for the case when `nrows % block_size == 0`. Fixes ggml-org#21162 * Reduce nrows in test case to 256, don't need 768
Overview
We wrongly calculated offset_grid as `ceildiv(nrows, block_size)`, while it must be `ceildiv(nrows + 1, block_size)`. As a consequence, we had uninitialized values in `offset_iterator[nrows]` for the case when `nrows % block_size == 0`. This bug affected CCCL < 3.1 only, as newer CCCL versions take the strided_iterator path.
Additional information
Fixes #21162
Requirements