Please refer to this page for how to access threads in the 1D case.
->http://feelmare.blogspot.com/2015/03/meaning-of-threadidx-blockidx-blockdim.html
If you launch a kernel like this:
dim3 blocks(2,3);
dim3 threads(3,2);
Kernel<<< blocks, threads >>>
The threads are created as shown in the following figure.
36 threads are created; gridDim is (2,3) and blockDim is (3,2).
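The count of 36 comes from multiplying the grid and block dimensions together. Here is a minimal host-side sketch of that arithmetic in plain C++ (it also compiles with nvcc). `Dim2` and `totalThreads` are hypothetical names used only for illustration; they are not part of the CUDA API.

```cpp
// Host-only sketch: Dim2 is a hypothetical stand-in for the x and y fields of dim3.
struct Dim2 {
    int x;
    int y;
};

// Total threads in a 2D launch = (blocks in the grid) * (threads per block).
int totalThreads(Dim2 grid, Dim2 block) {
    int blocksInGrid = grid.x * grid.y;       // 2 * 3 = 6 blocks for the launch above
    int threadsPerBlock = block.x * block.y;  // 3 * 2 = 6 threads per block
    return blocksInGrid * threadsPerBlock;
}
```

Calling `totalThreads(Dim2{2, 3}, Dim2{3, 2})` multiplies 6 blocks by 6 threads per block and returns 36.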
The problem now is...
How do we access the 15th thread?
See this figure.
Do you understand?
We have to do an index calculation, because threadIdx.x and threadIdx.y only indicate the index within their own block.
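The index calculation can be tried out on the host before running anything on the GPU. The sketch below is plain C++ and uses the same arithmetic as the `increase` kernel in this post. `Coord`, `globalTid`, and `locate` are hypothetical names, and the usage note assumes that "15th thread" means the global index tid = 15, counted from 0.

```cpp
// Host-only sketch of the 2D-to-1D index calculation; all names are hypothetical.
struct Coord {
    int blockX, blockY;    // plays the role of blockIdx.x, blockIdx.y
    int threadX, threadY;  // plays the role of threadIdx.x, threadIdx.y
};

// Forward mapping: block and thread coordinates -> global 1D index.
int globalTid(Coord c, int blockDimX, int blockDimY, int gridDimX) {
    int x = c.threadX + c.blockX * blockDimX;  // column across the whole grid
    int y = c.threadY + c.blockY * blockDimY;  // row across the whole grid
    return x + y * blockDimX * gridDimX;       // row-major linear index
}

// Inverse mapping: which block and thread produce a given tid?
Coord locate(int tid, int blockDimX, int blockDimY, int gridDimX) {
    int width = blockDimX * gridDimX;  // number of threads in one grid row
    int x = tid % width;
    int y = tid / width;
    return Coord{x / blockDimX, y / blockDimY, x % blockDimX, y % blockDimY};
}
```

With blockDim (3,2) and gridDim (2,3), one grid row is 6 threads wide, so `locate(15, 3, 2, 2)` gives x = 3 and y = 2. That is block (1,1), thread (0,0). Feeding those coordinates back into `globalTid` returns 3 + 2 * 6 = 15.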
For more detail, refer to the figure below, which lists the results of the tid calculation.
#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>

#define N 40

__global__ void increase(int *c)
{
    // Global x and y coordinates of this thread across the whole grid
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // Flatten (x, y) into a 1D index; one grid row holds blockDim.x * gridDim.x threads
    int tid = x + y * blockDim.x * gridDim.x;
    if (tid < N)
        c[tid] = tid;
}

int main(void)
{
    int c[N];
    int *dev_c;
    cudaMalloc((void**)&dev_c, N * sizeof(int));

    // Initialize every element to -1 so untouched elements are easy to spot
    for (int i = 0; i < N; ++i) {
        c[i] = -1;
    }
    cudaMemcpy(dev_c, c, N * sizeof(int), cudaMemcpyHostToDevice);

    dim3 blocks(2,3);
    dim3 threads(3,2);
    increase<<< blocks, threads >>>(dev_c);

    cudaMemcpy(c, dev_c, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i) {
        printf("c[%d] = %d \n", i, c[i]);
    }
    cudaFree(dev_c);
}
...
The source code creates only 36 threads, so the 37th through 40th array elements (c[36] to c[39]) keep their initial value of -1.
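This can be checked without a GPU by emulating the launch on the host. The sketch below is plain C++: nested loops stand in for the blocks and threads, and the loop body repeats the kernel's arithmetic and guard. `emulateIncrease` is a hypothetical helper, not a CUDA API. It only mimics the result and says nothing about how the GPU schedules threads.

```cpp
#include <vector>

// Host-only emulation of the launch; emulateIncrease is a hypothetical helper.
// The four loops visit every (block, thread) pair once, as the launch would.
std::vector<int> emulateIncrease(int n, int gridX, int gridY, int blockX, int blockY) {
    std::vector<int> c(n, -1);  // same initial value as the host array
    for (int by = 0; by < gridY; ++by) {
        for (int bx = 0; bx < gridX; ++bx) {
            for (int ty = 0; ty < blockY; ++ty) {
                for (int tx = 0; tx < blockX; ++tx) {
                    int x = tx + bx * blockX;
                    int y = ty + by * blockY;
                    int tid = x + y * blockX * gridX;
                    if (tid < n) {
                        c[tid] = tid;  // same guard as the kernel
                    }
                }
            }
        }
    }
    return c;
}
```

With `emulateIncrease(40, 2, 3, 3, 2)`, the 36 emulated threads write indices 0 to 35, and the last four elements keep the value -1.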