Support fancy iterators in cuda.parallel #2788
Conversation
…rators.py (with unit tests).
…el_itertools branch. The other branch is: https://github.com/rwgk/cccl/tree/cuda_parallel_itertools
…to see if numba can still compile it).
…t then fails with: `Fatal Python error: Floating point exception`
…resolves the Floating point exception (but the `cccl_device_reduce()` call still does not succeed)

    LOOOK single_tile_kernel CALL /home/coder/cccl/c/parallel/src/reduce.cu:116
    LOOOK EXCEPTION CUDA error: invalid argument /home/coder/cccl/c/parallel/src/reduce.cu:703
…rametrize: `use_numpy_array`: `[True, False]`, `input_generator`: `["constant", "counting", "arbitrary", "nested"]`
…iterators.py (because numba.cuda cannot JIT classes).
… `unary_op`, which is then compiled with `numba.cuda.compile()`
… the `"map_mul2"` test and the added `"map_add10_map_mul2"` test work, too.
…s_iterators branch.
…dInputIterator` `LOAD_CS`
…o make it obvious that they are never used as Python methods, but exclusively as source for `numba.cuda.compile()`
…plied ruff format to newly added code.
/ok to test
🟩 CI finished in 39m 02s: Pass: 100%/3 | Total: 36m 23s | Avg: 12m 07s | Max: 27m 08s
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 3)
| # | Runner |
|---|---|
| 2 | linux-amd64-gpu-v100-latest-1 |
| 1 | linux-amd64-cpu16 |
…from_any(value_type)` function.

    class ConstantIterator:
        def __init__(self, val, ntype):
I think `ntype` -> `dtype` would be better. The use of Numba should be an implementation detail from the user's perspective. Alternately, we could just accept a typed scalar like `ConstantIterator(np.int32(0))`.
Ditto for CountingIterator.
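A minimal sketch of the typed-scalar variant suggested above (the class name and attributes are illustrative only, not the actual cuda.parallel API):

```python
import numpy as np

class ConstantIteratorSketch:
    """Illustrative only: accept a typed scalar such as np.int32(0) and
    derive the dtype from it, instead of requiring a separate
    ntype/dtype argument."""

    def __init__(self, value):
        arr = np.asarray(value)   # np.int32(0) -> 0-d array with dtype int32
        self.value = arr[()]      # the typed scalar itself
        self.dtype = arr.dtype    # inferred value type

it = ConstantIteratorSketch(np.int32(0))
```

If a Numba type is needed internally, it can then be recovered from `it.dtype` with `numba.from_dtype`, keeping Numba out of the public signature.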

    def count_advance(this, diff):
        this[0] += diff

    def count_dereference(this):
        return this[0]
Coming to think about it, it might be better to make these `@staticmethod`. After all:

    $ python -c "import this" | grep Namespace
    Namespaces are one honking great idea -- let's do more of those!
Done in commit c3c51a5
I added comments:

    # Exclusively for numba.cuda.compile (this is not an actual method).

My thinking:
Without a decorator (as we had originally), people will think it's a bound method, but wonder why `self` is called `this`.
Explicitly adding `@staticmethod` will make people believe it really is a static method, but that's not actually true.
Being explicit in the comment is only slightly more verbose than adding a decorator, but much more informative.
> Explicitly adding @staticmethod will make people believe it really is a static method, but that's not actually true.

I don't think I understand. If anything, adding `@staticmethod` will make it even more obvious to the reader that the function is independent of the class. Typically, functions that have no dependency on the class or its members, but are otherwise related to it, are defined as `@staticmethod`.
In other words, these are truly static methods in every sense.

    return self.it.alignment  # TODO fix for stateful op

    def TransformIterator(op, it, op_return_ntype):
I don't think we should require the `op_return_ntype` here. Numba should in theory have everything it needs to infer the return type when compiling `op`.
`cuda.compile` returns both the LTOIR and the inferred return type, which we seem to be discarding in `extract_ctypes_ltoirs`.
Are we able to use the numba-inferred return type and not require it from the user?
If not, it might be because numba doesn't have enough typing information. If that is the case, it will be fixed as part of #3064 by defining numba types corresponding to all of our Iterator types.

    return 3 * val

    SUPPORTED_VALUE_TYPE_NAMES = (
Why not just use numpy types, which are trivially convertible to numba types via `numba.from_dtype(...)`?

    @pytest.mark.parametrize(
        "type_obj_from_str", [_iterators.numba_type_from_any, numpy.dtype, cp.dtype]
    )
    @pytest.mark.parametrize("value_type_name", SUPPORTED_VALUE_TYPE_NAMES)
In general, we have found parametrized fixtures to be the better choice when sharing parameters across tests, especially as the codebase evolves.
Done in commit 6aeeff3
Nice. I didn't realize fixtures can be used in this way.

    import numba.cuda
    import numba.types
    import cuda.parallel.experimental as cudax
    from cuda.parallel.experimental import _iterators
If we need something from a non-public submodule in the tests, then it's possible that:
- it should go in a public API
- we don't really need it
For instance, we are using `_iterators.pointer()` to construct inputs for one of our tests. This suggests that `pointer()` should be a public API (or we are testing something that we don't expect users to ever do).
I think it might be a leftover. `_iterators.pointer()` is an implementation detail of the transform iterator (a glue layer to make it support containers). I would suggest avoiding testing reduce with `pointer()` directly and testing only reduction of a transformed `cp.array`.
Summary of a short offline discussion: maybe in a follow-on PR:

    TransformIterator(identity_op, cupy_array, op_return_value_type)

This way we'd still have a test targeted at RawPointer, but through a public API.
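A host-side sketch of that follow-on idea (the class and names are illustrative only; the real TransformIterator is compiled for the device via numba): wrapping an array with an identity op leaves dereferenced values unchanged, so the raw-pointer path is exercised through the public transform API.

```python
import numpy as np

def identity_op(val):
    return val

class TransformIteratorSketch:
    # Illustrative model only: advance moves the position, dereference
    # applies op to the underlying element.
    def __init__(self, op, array):
        self.op = op
        self.array = array
        self.pos = 0

    def advance(self, diff):
        self.pos += diff

    def dereference(self):
        return self.op(self.array[self.pos])

arr = np.arange(4, dtype=np.int32)
it = TransformIteratorSketch(identity_op, arr)
it.advance(2)
```

With `identity_op`, `it.dereference()` returns `arr[2]` unchanged: the transform layer is a no-op over the raw data, which is exactly what a RawPointer-targeted test wants.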

    )

    def TransformIterator(op, it, op_return_value_type):
question: can we infer the return type of `op(it.value_type)` somehow? I'd prefer not having a value type parameter on the transform iterator if possible.
Suggested change:

    - def TransformIterator(op, it, op_return_value_type):
    + def TransformIterator(op, it):

    from . import _iterators

    def CacheModifiedInputIterator(device_array, value_type, modifier):
question: can we infer the value type from `device_array`? I'd prefer not having a `value_type` parameter on this iterator if possible. The value type should match the underlying memory's value type exactly.
Yes, it should be just `numba.from_dtype(device_array.dtype)`.
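Sketched with a hypothetical signature following the suggestion above: the value type comes from the array itself, so it cannot disagree with the underlying memory. Only the dtype plumbing is shown, using numpy; the real code would additionally convert it with `numba.from_dtype`.

```python
import numpy as np

def cache_modified_input_iterator_sketch(device_array, modifier):
    # Hypothetical signature with no value_type parameter: the value type
    # is taken from the array's dtype, so it always matches the underlying
    # memory exactly. The real implementation would convert this to a
    # numba type via numba.from_dtype(device_array.dtype).
    value_type = device_array.dtype
    return {"dtype": value_type, "modifier": modifier}

arr = np.zeros(8, dtype=np.float32)
it = cache_modified_input_iterator_sketch(arr, "stream")
```

Dropping the parameter also removes a whole class of user errors, since a mismatched `value_type` could otherwise silently reinterpret the buffer.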
… functions back to class scope, with comments to explicitly state that these are not actual methods.
shwina left a comment
In an offline sync with @rwgk and @gevtushenko, we decided to merge this sooner rather than later and follow up to address any remaining review items.
/ok to test
🟩 CI finished in 1h 55m: Pass: 100%/3 | Total: 42m 55s | Avg: 14m 18s | Max: 30m 51s
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| | Catch2Helper |
Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| | CUB |
| | Thrust |
| | CUDA Experimental |
| +/- | python |
| +/- | CCCL C Parallel Library |
| | Catch2Helper |
🏃 Runner counts (total jobs: 3)
| # | Runner |
|---|---|
| 2 | linux-amd64-gpu-v100-latest-1 |
| 1 | linux-amd64-cpu16 |
Description
closes #2479
closes #2480
closes #2536
Partially done: #2481