Should not modify index in im2col kernel loop #427
|
I think there'll be a problem with the data pointers too |
|
Ok, I have checked whether anything dangerous can happen (i.e., the loop body running more than once) with the current launch configuration. Fortunately, even though the index is decreased in the loop, the value of ... However, if gridDim.x is changed (e.g. halved to compute two elements in one kernel, which is what the grid-stride loop #225 is meant to make possible), then the loop will run an incorrect number of times (and write garbage data) or may never terminate. Here is a Python script showing the problem:

nthreads = 1024
blockDim_x = nthreads
n = 6000
# ceil of n/nthreads
gridDim_x = (n + nthreads - 1) // nthreads
width_col = 2

def kernel(blockIdx_x=1, threadIdx_x=10):
    index = blockIdx_x * blockDim_x + threadIdx_x
    while index < n:
        print('index0', index)
        index //= width_col  # the problem: the loop variable is overwritten
        print('index1', index)
        # do stuff...
        index += blockDim_x * gridDim_x
        print('index2', index)

# With current launch config will run once
kernel()
# With different launch config will loop wrongly
gridDim_x //= 2
kernel()
# Or infinitely (this call never returns)
width_col = 6
kernel()
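The three cases above can also be compared side by side without hanging on the last one. The following is a bounded sketch of the same simulation, not the actual kernel: `count_iterations`, `modify_index`, and `max_iters` are illustrative names that do not appear in the PR, and the "fixed" branch simply assumes the fix keeps the division result in a separate variable so that the stride is added to the untouched index.

```python
def count_iterations(n, block_dim, grid_dim, width_col, modify_index,
                     block_idx=1, thread_idx=10, max_iters=1000):
    """Count loop-body executions for one simulated thread.

    Returns max_iters if the loop has not exited by then, which this
    sketch treats as "does not terminate".
    """
    index = block_idx * block_dim + thread_idx
    iters = 0
    while index < n and iters < max_iters:
        if modify_index:
            index //= width_col           # buggy: clobbers the loop variable
        else:
            h_index = index // width_col  # fixed: index is left untouched
        iters += 1
        index += block_dim * grid_dim     # grid-stride step
    return iters


n, block_dim = 6000, 1024
full_grid = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) = 6

configs = [(full_grid, 2), (full_grid // 2, 2), (full_grid // 2, 6)]
for grid_dim, width_col in configs:
    buggy = count_iterations(n, block_dim, grid_dim, width_col, modify_index=True)
    fixed = count_iterations(n, block_dim, grid_dim, width_col, modify_index=False)
    print(grid_dim, width_col, buggy, fixed)
```

With these numbers the buggy loop runs 1, 6, and (capped) 1000 times for the simulated thread, while the version that leaves `index` alone runs 1, 2, and 2 times.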
|
@shelhamer et al may want to check |
|
@jamt9000 thanks for investigating this and making an illustrative example! We'll be sure to review this and make changes post-NIPS deadline (or sooner, time permitting...). |
Minor comment: maybe condense the above 4 lines to 2:
Dtype* data_col_ptr = data_col + (channel_out * height_col + h_out) * width_col + w_out;
const Dtype* data_im_ptr = data_im + (channel_in * height + h_in) * width + w_in;
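To illustrate why the pointers should be derived from the unmodified base on every pass, here is a small host-side sketch in Python that treats pointers as integer offsets. The two offset expressions are the ones quoted above; the decomposition of `index` into `w_out`, `h_out`, and `channel_in`, the `ksize`/`stride`/`pad` parameters, and the helper names `offsets` and `visit` are assumptions made for illustration and may not match the kernel exactly.

```python
def offsets(index, height, width, height_col, width_col, ksize, stride, pad):
    """Return (data_col offset, data_im offset) for one output index.

    The index decomposition below is an assumed layout, not taken from the PR.
    """
    w_out = index % width_col
    h_index = index // width_col
    h_out = h_index % height_col
    channel_in = h_index // height_col
    channel_out = channel_in * ksize * ksize
    h_in = h_out * stride - pad
    w_in = w_out * stride - pad
    col = (channel_out * height_col + h_out) * width_col + w_out
    im = (channel_in * height + h_in) * width + w_in
    return col, im


def visit(indices, in_place, **shape):
    """Simulate one thread handling several indices in a grid-stride loop."""
    data_col = 0  # base "pointer", modelled as an integer offset
    written = []
    for index in indices:
        col, _ = offsets(index, **shape)
        if in_place:
            data_col += col           # buggy: offsets accumulate across passes
            ptr = data_col
        else:
            ptr = data_col + col      # fresh pointer from the untouched base
        written.append(ptr)
    return written


shape = dict(height=4, width=4, height_col=3, width_col=3,
             ksize=2, stride=1, pad=0)
print(visit([1, 7], in_place=False, **shape))
print(visit([1, 7], in_place=True, **shape))
```

On the first pass both variants agree; from the second pass on, the in-place version writes to a location shifted by the previous offset, which is harmless only while each thread handles a single index.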
|
I took a look and it looks great. Adding a unit test could be even better if you have time to write one :) |
|
Thanks for the kernel fix. Please follow up on Yangqing's comments, then we'll merge. |
|
I'm not sure how to go about writing a test for it. I probably want to test ...
On Sun, Jun 8, 2014 at 7:38 AM, Evan Shelhamer notifications@github.com
|
I agree that is the right kind of test. I don't see a problem with adding this as a further GPU test with the required Makefile change. @jeffdonahue thoughts? |
|
Merging, as this has been illustrated to be correct. Tests to verify launch configurations can follow in a future PR. Thanks @jamt9000! |
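As a rough idea of what a launch-configuration check could look like, the sketch below simulates a grid-stride loop on the host and verifies that every index in `[0, n)` is processed exactly once for several grid sizes. It is only a Python model under assumed names (`visited_indices`, `covers_exactly_once`); an actual test would have to launch the GPU kernel with different grid dimensions and compare its output against a reference.

```python
from collections import Counter


def visited_indices(n, block_dim, grid_dim):
    """Count how often each index is processed by a simulated grid-stride loop."""
    counts = Counter()
    stride = block_dim * grid_dim
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            index = block_idx * block_dim + thread_idx
            while index < n:
                counts[index] += 1
                index += stride  # index itself is never modified in the body
    return counts


def covers_exactly_once(n, block_dim, grid_dim):
    """True if every index in [0, n) is visited once and only once."""
    counts = visited_indices(n, block_dim, grid_dim)
    return len(counts) == n and all(c == 1 for c in counts.values())


# Grid sizes smaller than, equal to, and larger than ceil(n / block_dim).
for grid_dim in (1, 3, 6, 12):
    assert covers_exactly_once(6000, 1024, grid_dim)
```

The property holds for any positive grid size as long as the loop variable is only ever advanced by the stride, which is the invariant this PR restores.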
Should not modify index in im2col kernel loop
I assume this is a mistake, although I don't think it actually breaks anything in practice with the launch configuration used.