gpu - CUDA shared memory addressing


I understand that when you declare a shared memory array in a kernel, a same-sized array is declared for the threads. Code like

__shared__ int s[5]; 

will create a 20-byte array in each thread. The way I understand it, addressing of shared memory is universal across threads. So, if I address subscript 10 as follows

s[10] = 1900; 

it is the exact same memory location across threads; it won't be the case that different threads access a different shared memory address for subscript 10. Is that correct? (The compiler of course throws warnings when the subscript is out of range.)
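For what it's worth, here is a minimal test kernel for the sharing behaviour I'm asking about (the kernel name, launch configuration, and use of an in-bounds subscript are my own, illustrative choices): thread 0 writes to s[4], and after a __syncthreads() every thread in the block should read the same value back if the location really is shared.

#include <cstdio>

// Hypothetical test kernel: thread 0 writes, every thread reads back.
__global__ void sharedTest(int *out)
{
    __shared__ int s[5];          // 20-byte shared array
    if (threadIdx.x == 0)
        s[4] = 1900;              // single writer
    __syncthreads();              // make the write visible to the rest of the block
    out[threadIdx.x] = s[4];      // does every thread see 1900?
}

int main()
{
    const int threads = 32;
    int h_out[threads];
    int *d_out;
    cudaMalloc(&d_out, threads * sizeof(int));
    sharedTest<<<1, threads>>>(d_out);
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("thread 7 read %d\n", h_out[7]);   // prints 1900 if the location is shared
    cudaFree(d_out);
    return 0;
}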

It will actually create a 20-byte array per block, not per thread.

Every thread within the block is able to access these 20 bytes. If you need to have n bytes per thread, and a block has m threads, you'll need to create an n*m buffer per block.

In your case, if there were 128 threads, you would have had

__shared__ int array[5*128]; 

and array[10] would have been a valid address for any thread within the block.
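A minimal sketch of that layout (the kernel name and indexing scheme below are illustrative, not from the original answer): each thread works on the 5-int slice starting at threadIdx.x * 5, so no two threads touch the same elements.

// Illustrative kernel, launched as perThreadScratch<<<1, 128>>>(out):
// each thread gets a private 5-int slice of the block's shared array.
__global__ void perThreadScratch(int *out)
{
    __shared__ int array[5 * 128];           // 5 ints per thread, 128 threads per block
    int *mine = &array[threadIdx.x * 5];     // this thread's slice
    for (int i = 0; i < 5; ++i)
        mine[i] = threadIdx.x + i;           // disjoint slices, so no race
    __syncthreads();
    out[threadIdx.x] = array[10];            // valid for every thread; written by thread 2
}

If the block size is not known at compile time, the same layout works with a dynamically sized extern __shared__ array, whose byte size is passed as the third parameter of the kernel launch configuration.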

