2. Gather: each calculation gathers input data elements from different places to compute one output result.
3. Scatter: each task computes where to write its output.
*out[i] = pi * in[i]
There is a one-to-one correspondence between output and input elements, so this is clearly a Map operation.
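As a rough illustration of the Map case, here is a sequential C sketch in which each loop iteration stands in for one thread. The function name `map_scale`, the `PI` constant and the length parameter `n` are assumptions made for this example; the notes only give the one-line assignment.

```c
#include <assert.h>
#include <stddef.h>

#define PI 3.14159265f /* assumed value; the notes only call it pi */

/* Sequential stand-in for a map kernel: iteration i reads in[i]
   and writes out[i], and touches nothing else. */
void map_scale(float *out, const float *in, size_t n) {
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = PI * in[i];
    }
}
```

Because no iteration reads or writes an element that another iteration uses, the iterations could run in any order, which is what makes the pattern easy to parallelize.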
*out[i+j*128] = in[j+i*128];
The indices i and j swap roles between the read and the write, which reorders the array, so this is a Transpose operation.
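To see the reordering concretely, the index swap can be sketched as a sequential C loop over a square, row-major matrix. The function name `transpose_square` and the size parameter `n` are assumptions for illustration; the snippet in the notes hardcodes a width of 128.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential stand-in for a transpose kernel on an n x n row-major
   matrix: element (row i, column j) of in lands at (row j, column i)
   of out. The notes use n = 128. */
void transpose_square(float *out, const float *in, size_t n) {
    assert(out != in); /* this sketch is out-of-place only */
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            out[i + j * n] = in[j + i * n];
        }
    }
}
```

With n = 2 and in = {1, 2, 3, 4}, the output is {1, 3, 2, 4}: every element is still read once and written once, only its position changes.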
*out[i-1] += pi * in[i]; out[i+1] += pi * in[i]
Each computed value is written to a couple of different places in the output, so this is a Scatter operation.
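A minimal sequential C sketch of this Scatter case follows. The name `scatter_neighbors`, the `PI` constant, and the choice to skip the first and last index (so that `i-1` and `i+1` stay in bounds) are assumptions for this example; the notes do not say how the boundaries are handled.

```c
#include <assert.h>
#include <stddef.h>

#define PI 3.14159265f /* assumed value; the notes only call it pi */

/* Sequential stand-in for a scatter kernel: iteration i computes one
   value and accumulates it into two neighboring output locations. */
void scatter_neighbors(float *out, const float *in, size_t n) {
    assert(out != NULL && in != NULL);
    /* Assumption: interior indices only, so i-1 and i+1 are valid. */
    for (size_t i = 1; i + 1 < n; ++i) {
        float v = PI * in[i];
        out[i - 1] += v;
        out[i + 1] += v;
    }
}
```

Note that iterations i and i+2 both update out[i+1]. In this sequential loop that is harmless, but in a real parallel kernel the two updates could collide, so the `+=` would need some form of synchronization such as an atomic add.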
*if (i % 2) out[i] = (in[i] + in[i-1] + in[i+1]) * pi / 3.0f;
Every thread writes a single location in the output array and reads from multiple locations in the input array, locations that it computes.
This looks very much like a Stencil operation, since it reads from a local neighborhood, but because of the "if (i % 2)" guard it does not write to every output location, so it is a Gather.
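The effect of the "if (i % 2)" guard is easier to see in a small sequential C sketch. The name `gather_odd`, the `PI` constant, and the restriction to interior indices are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define PI 3.14159265f /* assumed value; the notes only call it pi */

/* Sequential stand-in for the guarded kernel: only odd indices
   average their three-element neighborhood; even indices are
   left untouched. */
void gather_odd(float *out, const float *in, size_t n) {
    assert(out != NULL && in != NULL);
    /* Assumption: interior indices only, so i-1 and i+1 are valid. */
    for (size_t i = 1; i + 1 < n; ++i) {
        if (i % 2) {
            out[i] = (in[i] + in[i - 1] + in[i + 1]) * PI / 3.0f;
        }
    }
}
```

For a five-element input of all 3s and a zeroed output, only out[1] and out[3] are written; out[0], out[2] and out[4] keep their initial zeros, which is why the pattern falls short of a stencil.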
Image captured from Udacity.