
3/12/2015

Meaning of threadIdx, blockIdx, blockDim, gridDim in CUDA (2D)

This article explains how to compute the thread index when blocks and threads are laid out in two dimensions.

For the 1D case, please refer to this page:
->http://feelmare.blogspot.com/2015/03/meaning-of-threadidx-blockidx-blockdim.html

If you launch a kernel like this:

dim3 blocks(2,3);
dim3 threads(3,2);
Kernel<<< blocks, threads >>>

The threads are laid out as shown in the following figure.


This creates 36 threads (2 x 3 blocks of 3 x 2 threads each). gridDim is (2,3) and blockDim is (3,2).

The problem now is:
how do we access the 15th thread?
See this figure.



Do you understand?
We have to calculate the index ourselves, because threadIdx.x and threadIdx.y only give a thread's position inside its own block.

For more detail, refer to the figure below, which lists the result of the tid calculation for every thread.
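The calculation can also be tried on the host, without a GPU. The sketch below is plain C, not CUDA: `Dim2`, `global_tid` and `locate` are hypothetical names made up for this illustration, and the struct only stands in for the built-in CUDA variables. It assumes the same formula as the kernel in this article.

```c
#include <assert.h>

/* Host-side stand-in for the built-in CUDA variables (x and y only).
   Dim2 is a made-up type for this sketch, not part of CUDA. */
typedef struct { int x, y; } Dim2;

/* Same arithmetic as the kernel: position in the whole grid,
   then flattened row by row. */
int global_tid(Dim2 threadIdx, Dim2 blockIdx, Dim2 blockDim, Dim2 gridDim)
{
    assert(threadIdx.x >= 0 && threadIdx.x < blockDim.x);
    assert(threadIdx.y >= 0 && threadIdx.y < blockDim.y);
    assert(blockIdx.x >= 0 && blockIdx.x < gridDim.x);
    assert(blockIdx.y >= 0 && blockIdx.y < gridDim.y);

    int x = threadIdx.x + blockIdx.x * blockDim.x;  /* column in the grid */
    int y = threadIdx.y + blockIdx.y * blockDim.y;  /* row in the grid */
    return x + y * blockDim.x * gridDim.x;          /* one row = blockDim.x * gridDim.x threads */
}

/* Inverse direction: which block and which thread inside it produce tid? */
void locate(int tid, Dim2 blockDim, Dim2 gridDim, Dim2 *blockIdx, Dim2 *threadIdx)
{
    int width = blockDim.x * gridDim.x;
    int x = tid % width;
    int y = tid / width;

    blockIdx->x  = x / blockDim.x;
    threadIdx->x = x % blockDim.x;
    blockIdx->y  = y / blockDim.y;
    threadIdx->y = y % blockDim.y;
}
```

With gridDim (2,3) and blockDim (3,2), calling `locate` with tid 15 gives blockIdx (1,1) and threadIdx (0,0), and `global_tid` maps that pair back to 15.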




#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

#define N 40

// Each thread writes its own global index into c.
__global__ void increase(int *c){
    // column and row of this thread within the whole grid
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    // flatten (x, y) row by row; one row holds blockDim.x * gridDim.x threads
    int tid = x + y * blockDim.x * gridDim.x;
    if(tid < N)
        c[tid] = tid;
}

int main(void)
{
    int c[N];
    int *dev_c;

    cudaMalloc( (void**)&dev_c, N*sizeof(int) );

    // mark every element as "not written yet"
    for(int i=0; i<N; ++i)
    {
        c[i] = -1;
    }

    cudaMemcpy(dev_c, c, N*sizeof(int), cudaMemcpyHostToDevice);

    // 2 x 3 blocks, 3 x 2 threads per block = 36 threads
    dim3 blocks(2,3);
    dim3 threads(3,2);
    increase<<< blocks, threads >>>(dev_c);

    cudaMemcpy(c, dev_c, N*sizeof(int), cudaMemcpyDeviceToHost);

    for(int i=0; i<N; ++i)
    {
        printf("c[%d] = %d \n", i, c[i]);
    }

    cudaFree( dev_c );
    return 0;
}

...


In the source code, only 36 threads are created, so the last four elements (c[36] to c[39]) keep their initial value of -1.
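To see why exactly four elements stay at -1, the launch can be replayed on the host with ordinary loops. This is only a sketch in plain C: `emulate_launch_2d` is a hypothetical helper written for this illustration, and it runs the "threads" one after another instead of in parallel.

```c
#include <assert.h>

/* Replays on the host what a <<< (gridX,gridY), (blockX,blockY) >>> launch
   of the kernel writes into c[0..n-1]. Returns how many elements were written.
   emulate_launch_2d is a made-up helper, not a CUDA function. */
int emulate_launch_2d(int *c, int n, int gridX, int gridY, int blockX, int blockY)
{
    assert(n >= 0 && gridX > 0 && gridY > 0 && blockX > 0 && blockY > 0);

    int written = 0;

    /* same initial state as the host array in the article */
    for (int i = 0; i < n; ++i)
        c[i] = -1;

    /* one loop iteration per (block, thread) pair */
    for (int by = 0; by < gridY; ++by)
        for (int bx = 0; bx < gridX; ++bx)
            for (int ty = 0; ty < blockY; ++ty)
                for (int tx = 0; tx < blockX; ++tx) {
                    int x = tx + bx * blockX;
                    int y = ty + by * blockY;
                    int tid = x + y * blockX * gridX;
                    if (tid < n) {          /* same guard as the kernel */
                        c[tid] = tid;
                        ++written;
                    }
                }
    return written;
}
```

Called with n = 40, a (2,3) grid and (3,2) blocks, it returns 36: c[0] to c[35] hold their own index and c[36] to c[39] are still -1.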





Meaning of threadIdx, blockIdx, blockDim, gridDim in CUDA (1D)


When we first study CUDA, thread indexing is very confusing.
So I have tried to sort it out.

First, let's get a feel for it by looking at this example.
...
#include "cuda_runtime.h"
#include "device_launch_parameters.h"

#include <stdio.h>

#define N 15

// Each thread writes its own global index into c.
__global__ void increase(int *c){
    // position in the block + number of threads in all earlier blocks
    int tid = threadIdx.x + blockIdx.x * blockDim.x;

    if(tid < N)
        c[tid] = tid;
}

int main(void)
{
    int c[N];
    int *dev_c;

    cudaMalloc( (void**)&dev_c, N*sizeof(int) );

    // mark every element as "not written yet"
    for(int i=0; i<N; ++i)
    {
        c[i] = -1;
    }

    cudaMemcpy(dev_c, c, N*sizeof(int), cudaMemcpyHostToDevice);

    // 4 blocks, 3 threads per block = 12 threads
    increase<<< 4, 3 >>>(dev_c);

    cudaMemcpy(c, dev_c, N*sizeof(int), cudaMemcpyDeviceToHost);

    for(int i=0; i<N; ++i)
    {
        printf("c[%d] = %d \n", i, c[i]);
    }

    cudaFree( dev_c );
    return 0;
}
...

The result of this example is that c[0] to c[11] hold their own index, while c[12] to c[14] keep the initial value -1.

In the source code, the kernel function (increase) is launched with <<< 4, 3 >>>, which creates 12 threads (4 blocks of 3 threads each).
These 12 threads run in parallel.
So each thread running the kernel needs to know: which thread am I?
threadIdx and blockIdx provide the answer.

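But we have to calculate the thread index ourselves, because threadIdx and blockIdx count in different spaces: threadIdx counts threads inside a block, and blockIdx counts blocks inside the grid.
The calculation looks like this: " int tid = threadIdx.x + blockIdx.x * blockDim.x; "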

threadIdx is the index of the current thread within its block.
blockIdx is the index of the current block within the grid.

gridDim is the number of blocks in the grid.
blockDim is the number of threads in a block.
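The four values can be tried out on the host as well. The following is a plain C sketch, not CUDA: `global_tid_1d` and `list_tids_1d` are hypothetical helpers, and their parameter names only mirror the `.x` fields of the built-in variables.

```c
#include <assert.h>

/* Same formula as the kernel: threads in all earlier blocks
   plus the position inside the current block. */
int global_tid_1d(int threadIdx_x, int blockIdx_x, int blockDim_x)
{
    assert(threadIdx_x >= 0 && threadIdx_x < blockDim_x);
    assert(blockIdx_x >= 0);
    return threadIdx_x + blockIdx_x * blockDim_x;
}

/* Stores the tid of every thread of a <<< gridDim_x, blockDim_x >>> launch
   in tids[], block by block. tids must have room for gridDim_x * blockDim_x
   entries. Returns the number of threads. */
int list_tids_1d(int gridDim_x, int blockDim_x, int *tids)
{
    int count = 0;
    for (int b = 0; b < gridDim_x; ++b)
        for (int t = 0; t < blockDim_x; ++t)
            tids[count++] = global_tid_1d(t, b, blockDim_x);
    return count;
}
```

For <<< 4, 3 >>>, `list_tids_1d` yields the 12 values 0 to 11. The last thread of the last block, for example, is threadIdx.x = 2 in blockIdx.x = 3, so its tid is 2 + 3 * 3 = 11.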


Are you more confused now?

Here is my final explanation. See this figure.




Next time, I will introduce the 2D kernel.
http://study.marearts.com/2015/03/meaning-of-threadidx-blockidx-blockdim_12.html
Thank you.