



Also, instead of dividing by 2 to calculate the average with this method, one can shift right by 1 bit (x >> 1), which gives the same result as x / 2 for unsigned or non-negative integers.
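Note that the average of 3 elements cannot, in general, be calculated by taking averages of 2 elements twice: averaging a and b and then averaging that result with c gives (a + b + 2c)/4, which weights c twice as heavily, instead of (a + b + c)/3.

When we launch a kernel with the <<< >>> syntax, the execution configuration is given as dim3 values (plain integers are converted to dim3 automatically): the first specifies the number of blocks per grid and the second the number of threads per block. Inside the kernel these are available as the built-in variables gridDim and blockDim.

I am writing a CUDA version of merge sort. If I use cudaMemcpy with cudaMemcpyDeviceToHost to copy the list of elements back from the GPU, I get a memory error; on the other hand, if I comment that line out, the program does not sort properly.

Instead of multiplying by 32, why not shift left by 5 bits (x << 5)? Since 32 = 2^5, this is equivalent to shifting left by 1 five times and gives the same result for unsigned or non-negative integers, though optimizing compilers generally make this substitution on their own.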
