Add debugging code to gpu_dynamic_shared to try to track down hard-to-repro bug on buildbots #5892

Closed
steven-johnson wants to merge 12 commits into master from srj/cuda-debug

Conversation

@steven-johnson
Contributor

No description provided.

@steven-johnson
Contributor Author

Aha: upgrading the NVIDIA driver on my local Linux box to a recent one (460.x) lets me repro locally; the older driver I was using (440.x-ish) didn't fail. So it's driver-specific to some extent.

@steven-johnson steven-johnson added the skip_buildbots label Apr 13, 2021
@steven-johnson
Contributor Author

If I run with cuda-memcheck, I get:

========= Invalid shared write of size 4
========= at 0x00000210 in _kernel_g_s0_x_x___block_id_x
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x00001900 is out of bounds

@steven-johnson
Contributor Author

Bisect is suggesting that #5757 is the injection spot; investigating

@steven-johnson
Contributor Author

(1) The mystery hang seems to have resolved itself, or at least is no longer reproducing at recent top-of-tree.
(2) The bug with disable-llvm-opt enabled seems to be an unrelated red herring (this test fails with this flag enabled for as far back as I can find). See #5900 for a workaround there.

@steven-johnson steven-johnson deleted the srj/cuda-debug branch April 14, 2021 16:21
@alexreinking alexreinking modified the milestone: v12.0.0 May 19, 2021

Labels

skip_buildbots Do not run buildbots on this PR. Must add before opening PR as we scan labels immediately.

2 participants