Fix cuda_bindings conjugate_gradient_multi_block_cg.py example#1922
Fix cuda_bindings conjugate_gradient_multi_block_cg.py example#1922leofang merged 2 commits intoNVIDIA:mainfrom
Conversation
Remove the unconditional NVRTC waiver from the renamed example so CI can exercise its real execution path again. While re-enabling it, replace the standalone pytest.skip() checks with requirement_not_met() and simplify the platform gating. The waived code path had been hiding several Python-side runtime bugs that required these fixes: - replaced the invalid C-style %d f-string formatting - fixed gen_tridiag() variable shadowing so the CSR row-offset array is actually populated - passed the computed dynamic shared-memory size into cuLaunchCooperativeKernel() and made that size integer-valued - stopped overwriting managed-memory pointer variables with loop indices before kernel launch and cleanup - cached the residual before freeing dot_result, which removed the teardown segfault Made-with: Cursor
QNX is an operating system rather than a machine architecture, so checking platform.machine() can miss the requirement_not_met() path on QNX hosts. Use platform.system() so the example is waived consistently on that platform. Made-with: Cursor
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
/ok to test |
This comment has been minimized.
This comment has been minimized.
|
Cursor-generated analysis of CI logs for commit be43e41:
I unpacked and inspected the combined CI logs locally under For completeness, these are the log archives the combined directory came from: I then scanned all Summary
Skip reasonAll 16 skips appear to be consistent with the same grouped pytest summary reason: I did not see any skips for this example caused by its own runtime gates At a high level, this looks as expected:
Passed in
Skipped in
|
mdboom
left a comment
There was a problem hiding this comment.
LGTM.
The pointer arithmetic in this example should probably be removed someday, but that was pre-existing.
|
…A#1922) * fix: re-enable and fix the conjugate gradient multi-block example Remove the unconditional NVRTC waiver from the renamed example so CI can exercise its real execution path again. While re-enabling it, replace the standalone pytest.skip() checks with requirement_not_met() and simplify the platform gating. The waived code path had been hiding several Python-side runtime bugs that required these fixes: - replaced the invalid C-style %d f-string formatting - fixed gen_tridiag() variable shadowing so the CSR row-offset array is actually populated - passed the computed dynamic shared-memory size into cuLaunchCooperativeKernel() and made that size integer-valued - stopped overwriting managed-memory pointer variables with loop indices before kernel launch and cleanup - cached the residual before freeing dot_result, which removed the teardown segfault Made-with: Cursor * fix: check the QNX example gate via platform.system() QNX is an operating system rather than a machine architecture, so checking platform.machine() can miss the requirement_not_met() path on QNX hosts. Use platform.system() so the example is waived consistently on that platform. Made-with: Cursor
This PR started out as a humble attempt to eliminate the only remaining
pytest.skip()in our examples (similar to #1861), but it turned into a bug-fix pass once the example actually began running.cuda_bindings/examples/4_CUDA_Libraries/conjugate_gradient_multi_block_cg.pyby removing its unconditional NVRTC waiver (which dates all the way back to the initial cuda-python commit 4a36f83) and replacing standalonepytest.skip()calls withrequirement_not_met()%dformatting inside an f-stringgen_tridiag()variable shadowing that broke CSR constructiondot_result, which caused a teardown segfaultcuda_bindings/examples/3_CUDA_Features/global_to_shmem_async_copy.pyby checkingplatform.system() == "QNX"instead ofplatform.machine() == "qnx"— QNX is an operating system rather than a machine architecture, sochecking platform.machine() == "qnx"does not detect QNX hosts.