Meaning of threadIdx, blockIdx, blockDim, gridDim in CUDA (2D)

This article explains how to compute a thread's index when you launch a kernel with two-dimensional blocks and threads.

For the one-dimensional case, please refer to this page.

Suppose you launch a kernel like this:

dim3 blocks(2,3);
dim3 threads(3,2);
Kernel<<< blocks, threads >>>();

The threads are laid out as shown in the following figure.

A total of 36 threads are created (2 x 3 = 6 blocks, each holding 3 x 2 = 6 threads). gridDim is (2,3) and blockDim is (3,2).
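The thread count can be checked with a few lines of plain C on the host. This is only a sketch: Dim2 and total_threads are hypothetical names introduced here, with Dim2 standing in for a two-dimensional dim3.

```c
/* Hypothetical stand-in for CUDA's dim3, restricted to two dimensions. */
typedef struct { unsigned int x, y; } Dim2;

/* Total threads in a launch = threads per block * blocks per grid. */
unsigned int total_threads(Dim2 grid, Dim2 block) {
    unsigned int threads_per_block = block.x * block.y;
    unsigned int blocks_per_grid = grid.x * grid.y;
    return threads_per_block * blocks_per_grid;
}
```

Calling total_threads with a grid of {2, 3} and a block of {3, 2} gives 36, matching the launch above.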

The problem now is:
How do we access the 15th thread?
See this figure.

Do you understand?
We have to do an index calculation, because threadIdx.x and threadIdx.y only give a thread's position inside its own block.
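The calculation can be sketched as an ordinary C function that runs on the host. It assumes a row-major layout, where one row of the grid holds blockDim.x * gridDim.x threads. Dim2 and global_tid are hypothetical names used only for this illustration.

```c
/* Hypothetical stand-in for CUDA's dim3, restricted to two dimensions. */
typedef struct { unsigned int x, y; } Dim2;

/* Linear index of a thread, given its block and its position in that block. */
unsigned int global_tid(Dim2 grid_dim, Dim2 block_dim,
                        Dim2 block_idx, Dim2 thread_idx) {
    /* Global coordinates of the thread inside the whole grid. */
    unsigned int x = thread_idx.x + block_idx.x * block_dim.x;
    unsigned int y = thread_idx.y + block_idx.y * block_dim.y;
    /* Number of threads in one full row of the grid. */
    unsigned int row_width = block_dim.x * grid_dim.x;
    return x + y * row_width;
}
```

For example, with a (2,3) grid and (3,2) blocks, the thread at threadIdx (0,0) in blockIdx (1,1) has x = 3 and y = 2, so its index is 3 + 2 * 6 = 15.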

For more detail, refer to the figure below, which lists the result of the tid calculation for every thread.
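A table of this kind can also be generated on the host. The sketch below assumes the (2,3) grid and (3,2) block sizes used in this article and loops over every block and thread. The macro and function names are hypothetical, and the output is not guaranteed to match the layout of the original figure.

```c
#include <stdio.h>

#define GRID_X 2
#define GRID_Y 3
#define BLOCK_X 3
#define BLOCK_Y 2
#define WIDTH (GRID_X * BLOCK_X)   /* threads per grid row */
#define HEIGHT (GRID_Y * BLOCK_Y)  /* thread rows in the grid */

/* Store each thread's tid at its global (y, x) position. */
void fill_tid_table(unsigned int table[HEIGHT][WIDTH]) {
    for (unsigned int by = 0; by < GRID_Y; ++by)
        for (unsigned int bx = 0; bx < GRID_X; ++bx)
            for (unsigned int ty = 0; ty < BLOCK_Y; ++ty)
                for (unsigned int tx = 0; tx < BLOCK_X; ++tx) {
                    unsigned int x = tx + bx * BLOCK_X;
                    unsigned int y = ty + by * BLOCK_Y;
                    table[y][x] = x + y * WIDTH;
                }
}

/* Print the table one grid row per line. */
void print_tid_table(unsigned int table[HEIGHT][WIDTH]) {
    for (unsigned int y = 0; y < HEIGHT; ++y) {
        for (unsigned int x = 0; x < WIDTH; ++x)
            printf("%3u", table[y][x]);
        printf("\n");
    }
}
```

After fill_tid_table runs, the first row of the table holds 0 to 5 and the last row holds 30 to 35, so every one of the 36 threads gets a distinct index.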

#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

#define N 40

__global__ void increase(int *c){
 // global x and y coordinates of this thread in the whole grid
 int x = threadIdx.x + blockIdx.x * blockDim.x;
 int y = threadIdx.y + blockIdx.y * blockDim.y;
 // row-major linear index; one grid row holds blockDim.x * gridDim.x threads
 int tid = x + y*blockDim.x * gridDim.x;
 if(tid < N)
  c[tid] = tid;
}

int main(void){
 int c[N];
 int *dev_c;

 cudaMalloc( (void**)&dev_c, N*sizeof(int) );

 // set every element to -1 so that untouched elements stay visible
 for(int i=0; i< N; ++i)
  c[i] = -1;

 cudaMemcpy(dev_c, c, N*sizeof(int), cudaMemcpyHostToDevice);

 // 2x3 blocks of 3x2 threads = 36 threads
 dim3 blocks(2,3);
 dim3 threads(3,2);
 increase<<< blocks, threads>>>(dev_c);

 cudaMemcpy(c, dev_c, N*sizeof(int), cudaMemcpyDeviceToHost );

 for(int i=0; i< N; ++i)
  printf("c[%d] = %d \n" ,i, c[i] );

 cudaFree( dev_c );

 return 0;
}


In this source code only 36 threads are created, so the 37th to 40th elements (c[36] to c[39]) keep their initial value of -1.
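This result can be reproduced without a GPU by emulating the kernel on the host. The sketch below is not CUDA code. It visits every (block, thread) pair once in plain C and applies the same tid < N guard. Dim2, emulate_increase and count_untouched are hypothetical names used only for this illustration.

```c
#define N 40

/* Hypothetical stand-in for CUDA's dim3, restricted to two dimensions. */
typedef struct { unsigned int x, y; } Dim2;

/* Host-side emulation of the increase kernel. */
void emulate_increase(int *c, Dim2 grid, Dim2 block) {
    for (unsigned int by = 0; by < grid.y; ++by)
        for (unsigned int bx = 0; bx < grid.x; ++bx)
            for (unsigned int ty = 0; ty < block.y; ++ty)
                for (unsigned int tx = 0; tx < block.x; ++tx) {
                    unsigned int x = tx + bx * block.x;
                    unsigned int y = ty + by * block.y;
                    unsigned int tid = x + y * block.x * grid.x;
                    if (tid < N)
                        c[tid] = (int)tid;
                }
}

/* Count the elements that still hold the initial value -1. */
int count_untouched(const int *c) {
    int count = 0;
    for (int i = 0; i < N; ++i)
        if (c[i] == -1)
            ++count;
    return count;
}
```

With the (2,3) grid, four elements stay at -1. A larger grid such as (2,4) would create 48 threads and cover all 40 elements, with the tid < N guard skipping the extra eight.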
