CUDA shared memory addressing
I understand that when you declare a shared memory array in a kernel, an array of the same size is declared for all of the threads. Code like
__shared__ int s[5];
will create a 20-byte array in each thread. The way I understand it, addressing shared memory is universal across threads. So, if I address subscript 10 as follows
s[10] = 1900;
it is the exact same memory location across threads; it won't be the case that different threads access a different shared memory address for subscript 10. Is that correct? (The compiler, of course, throws a warning that the subscript is out of range.)
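To make the question concrete, here is a minimal sketch of what I mean (the kernel name sharedDemo and the out parameter are illustrative, not part of the original question): one thread writes to the shared array, and after a barrier every thread in the block reads the same value back from the same location.

__global__ void sharedDemo(int *out)
{
    __shared__ int s[5];        // one 20-byte array, visible block-wide

    if (threadIdx.x == 0)
        s[0] = 1900;            // a single thread writes the shared slot

    __syncthreads();            // make the write visible to all threads

    out[threadIdx.x] = s[0];    // every thread reads the same location
}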
It actually creates a 20-byte array per block, not per thread. Every thread within the block is able to access these 20 bytes. If you need to have n bytes per thread and your block has m threads, you'll need to create an n*m-byte buffer per block.
In that case, if there were 128 threads, you would have had
__shared__ int array[5*128];
and array[10] would have been a valid address for any thread within the block.
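Here is a short sketch of that layout (the kernel name perThreadScratch, the launch shown, and the out parameter are illustrative assumptions, not from the original answer): each thread carves out its own 5-int slice of the block-wide buffer.

#define THREADS_PER_BLOCK 128

__global__ void perThreadScratch(int *out)
{
    __shared__ int array[5 * THREADS_PER_BLOCK];  // 5 ints per thread

    // Each thread owns the 5-element slice starting at threadIdx.x * 5.
    int *mine = &array[threadIdx.x * 5];
    for (int i = 0; i < 5; ++i)
        mine[i] = threadIdx.x + i;

    __syncthreads();

    // array[10] is a valid index for every thread; with this layout it
    // falls at the start of thread 2's slice (2 * 5 + 0).
    out[threadIdx.x] = array[10];
}

Launched as perThreadScratch<<<numBlocks, THREADS_PER_BLOCK>>>(d_out), every thread reads the value 2 that thread 2 stored at index 10, confirming that the subscript names one block-wide location rather than a per-thread one.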