Working out CUDA Just-In-Time (JIT) compilation

The CUDA workflow followed by many programmers consists of splitting the code across several .cpp and .cu files: the .cu files contain the __global__ functions, while the .cpp files contain GPU memory allocations made with cudaMalloc, host-to-device and device-to-host memory transfers performed with cudaMemcpy, and __global__ function invocations executed by the