Both CUDA and OpenCL are well-known GPGPU languages. Unfortunately there are quite a few differences in terminology between the two, which are shown below.
You might have heard of HIP, the language AMD created to target both its modern GPUs (such as the Fiji family) and CUDA devices. CUDA code can be translated (mostly automatically) to HIP, and from that moment on your code also runs on AMD's high-end devices.
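To make that translation concrete, below is a minimal sketch of the kind of mechanical rewrite the hipify tooling performs, using a hypothetical `square` kernel (all names here are illustrative). The device code stays untouched; only the host-side API calls are renamed, and the `<<< >>>` launch becomes a `hipLaunchKernel` call. The exact `hipLaunchKernel` signature has varied between HIP versions, so take the commented lines as an approximation.

```cpp
#include <cuda_runtime.h>

// Hypothetical kernel: squares n floats in place.
// Under HIP this device code stays exactly the same.
__global__ void square(float *a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= a[i];
}

void square_on_gpu(float *h_a, int n) {
    float *d_a;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_a, bytes);                  // HIP: hipMalloc(&d_a, bytes);
    cudaMemcpy(d_a, h_a, bytes,               // HIP: hipMemcpy(d_a, h_a, bytes,
               cudaMemcpyHostToDevice);       //        hipMemcpyHostToDevice);
    square<<<(n + 255) / 256, 256>>>(d_a, n);
    // HIP: hipLaunchKernel(square, dim3((n + 255) / 256), dim3(256),
    //                      0 /*shared mem*/, 0 /*stream*/, d_a, n);
    cudaMemcpy(h_a, d_a, bytes,               // HIP: hipMemcpy(h_a, d_a, bytes,
               cudaMemcpyDeviceToHost);       //        hipMemcpyDeviceToHost);
    cudaFree(d_a);                            // HIP: hipFree(d_a);
}
```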
To give an overview of how HIP compares to other APIs, Ben Sander made a comparison table. Below you'll find the part covering CUDA, OpenCL and HIP, slightly altered to be more complete. The columns for HC and C++ AMP can be found in the original.
Term | CUDA | OpenCL | HIP |
---|---|---|---|
Device | int deviceId | cl_device_id | int deviceId |
Queue | cudaStream_t | cl_command_queue | hipStream_t |
Event | cudaEvent_t | cl_event | hipEvent_t |
Memory | void * | cl_mem | void * |
Grid of threads | grid | NDRange | grid |
Subgroup of threads | block | work-group | block |
Thread | thread | work-item | thread |
Scheduled execution | warp | sub-group (warp, wavefront, etc.) | warp |
Thread-index | threadIdx.x | get_local_id(0) | hipThreadIdx_x |
Block-index | blockIdx.x | get_group_id(0) | hipBlockIdx_x |
Block-dim | blockDim.x | get_local_size(0) | hipBlockDim_x |
Grid-dim | gridDim.x | get_num_groups(0) | hipGridDim_x |
Device Kernel | __global__ | __kernel | __global__ |
Device Function | __device__ | N/A. Implied in device compilation | __device__ |
Host Function | __host__ (default) | N/A. Implied in host compilation. | __host__ (default) |
Host + Device Function | __host__ __device__ | N/A. | __host__ __device__ |
Kernel Launch | <<< >>> | clEnqueueNDRangeKernel | hipLaunchKernel |
Global Memory | __device__ | __global | __device__ |
Group Memory | __shared__ | __local | __shared__ |
Private Memory | (default) | __private | (default) |
Constant | __constant__ | __constant | __constant__ |
Thread Synchronisation | __syncthreads | barrier(CLK_LOCAL_MEM_FENCE) | __syncthreads |
Atomic Builtins | atomicAdd | atomic_add | atomicAdd |
Precise Math | cos(f) | cos(f) | cos(f) |
Fast Math | __cosf(f) | native_cos(f) | __cosf(f) |
Vector | float4 | float4 | float4 |
As you can see, HIP borrows its terminology almost directly from CUDA.
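To put a few of the rows above side by side, here is the same hypothetical kernel, once as CUDA (which is also valid HIP device code) and once as OpenCL C. It reverses each block/work-group of 256 elements in place, so it assumes the array length is a multiple of 256 and a launch with 256 threads per block (work-items per work-group).

```cpp
// CUDA / HIP device code (launch with 256 threads per block)
__global__ void reverse_blocks(float *d) {
    __shared__ float tile[256];              // group memory
    int t = threadIdx.x;                     // index within the block
    int i = blockIdx.x * blockDim.x + t;     // global index
    tile[t] = d[i];
    __syncthreads();                         // thread synchronisation
    d[i] = tile[blockDim.x - 1 - t];
}
```

```c
// OpenCL C device code (enqueue with a work-group size of 256)
__kernel void reverse_blocks(__global float *d) {
    __local float tile[256];                 // group memory
    int t = get_local_id(0);                 // index within the work-group
    int i = get_global_id(0);                // global index
    tile[t] = d[i];
    barrier(CLK_LOCAL_MEM_FENCE);            // thread synchronisation
    d[i] = tile[get_local_size(0) - 1 - t];
}
```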
The discussion is, of course, whether such similar APIs shouldn't simply use the same terminology. The best outcome would be a mix of the two: CUDA's "shared" is much clearer than OpenCL's "local", while OpenCL's index and dimension functions (get_global_id(0) and the like), on the other hand, are often more appreciated than CUDA's built-in variables. And CUDA's "<<< >>>" syntax is not valid C/C++, which makes it very hard to build a frontend or IDE plugin for it.
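As background to that last point: the chevron launch is special syntax that only NVIDIA's nvcc understands, but it desugars to an ordinary runtime call. The sketch below shows the equivalent launch through the CUDA runtime's cudaLaunchKernel, plain C++ that any parser can handle, and essentially the style HIP adopts with hipLaunchKernel (the square kernel is the hypothetical one from earlier).

```cpp
#include <cuda_runtime.h>

__global__ void square(float *a, int n);  // kernel defined elsewhere

void launch_square(float *d_a, int n, cudaStream_t stream) {
    // Chevron form, only parseable by nvcc:
    //     square<<<(n + 255) / 256, 256, 0, stream>>>(d_a, n);
    // Equivalent plain function call, valid for any C++ frontend:
    void *args[] = { &d_a, &n };
    cudaLaunchKernel((void *)square, dim3((n + 255) / 256), dim3(256),
                     args, 0 /*shared mem*/, stream);
}
```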
I hope the above helps you to better understand the differences between CUDA and OpenCL, and also to see how HIP comes into the picture.