Comparing Syntax for CUDA, OpenCL and HiP

Both CUDA and OpenCL are well-known GPGPU languages. Unfortunately, there are some slight differences in terminology and syntax between them, which are shown below.

You might have heard of HiP, the language that AMD made to support both modern AMD Fiji GPUs and CUDA devices. CUDA code can be translated (mostly automatically) to HiP, and from that moment on your code also runs on AMD's high-end devices.

To show how HiP compares to other APIs, Ben Sanders made an overview. Below you’ll find the table for CUDA, OpenCL and HiP, slightly altered to be more complete. The languages HC and C++AMP can be found in the original.

| Term | CUDA | OpenCL | HiP |
| --- | --- | --- | --- |
| Device | int deviceId | cl_device_id | int deviceId |
| Queue | cudaStream_t | cl_command_queue | hipStream_t |
| Event | cudaEvent_t | cl_event | hipEvent_t |
| Memory | void * | cl_mem | void * |
| Grid of threads | grid | NDRange | grid |
| Subgroup of threads | block | work-group | block |
| Thread | thread | work-item | thread |
| Scheduled execution | warp | sub-group (warp, wavefront, etc.) | warp |
| Thread-index | threadIdx.x | get_local_id(0) | hipThreadIdx_x |
| Block-index | blockIdx.x | get_group_id(0) | hipBlockIdx_x |
| Block-dim | blockDim.x | get_local_size(0) | hipBlockDim_x |
| Grid-dim | gridDim.x | get_num_groups(0) | hipGridDim_x |
| Device Kernel | __global__ | __kernel | __global__ |
| Device Function | __device__ | N/A. Implied in device compilation. | __device__ |
| Host Function | __host__ (default) | N/A. Implied in host compilation. | __host__ (default) |
| Host + Device Function | __host__ __device__ | N/A | __host__ __device__ |
| Kernel Launch | <<< >>> | clEnqueueNDRangeKernel | hipLaunchKernel |
| Global Memory | __device__ (on variables) | __global | __device__ (on variables) |
| Group Memory | __shared__ | __local | __shared__ |
| Private Memory | (default) | __private | (default) |
| Constant | __constant__ | __constant | __constant__ |
| Thread Synchronisation | __syncthreads() | barrier(CLK_LOCAL_MEM_FENCE) | __syncthreads() |
| Atomic Builtins | atomicAdd | atomic_add | atomicAdd |
| Precise Math | cos(f) | cos(f) | cos(f) |
| Fast Math | __cosf(f) | native_cos(f) | __cosf(f) |
| Vector | float4 | float4 | float4 |

As you can see, HiP borrowed most of its terms from CUDA.
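To make the index and dimension terms concrete, here is a minimal host-only C++ sketch of how they relate in one dimension. It is not GPU code: the struct `Dim1` and the helpers `global_id` and `global_size` are hypothetical names made up for this illustration, and it assumes an OpenCL global work offset of zero.

```cpp
#include <cstddef>

// CUDA-style launch geometry in one dimension (hypothetical helper type).
struct Dim1 {
    std::size_t gridDim;   // number of blocks      (OpenCL: get_num_groups(0))
    std::size_t blockDim;  // threads per block     (OpenCL: get_local_size(0))
};

// What OpenCL returns from get_global_id(0), written with CUDA-style indices.
// Assumes a global work offset of zero.
constexpr std::size_t global_id(const Dim1& d,
                                std::size_t blockIdx,    // OpenCL: get_group_id(0)
                                std::size_t threadIdx) { // OpenCL: get_local_id(0)
    return blockIdx * d.blockDim + threadIdx;
}

// What OpenCL returns from get_global_size(0): the total number of work-items.
constexpr std::size_t global_size(const Dim1& d) {
    return d.gridDim * d.blockDim;
}

// 4 blocks of 256 threads: thread 5 of block 2 is global work-item 517.
static_assert(global_id(Dim1{4, 256}, 2, 5) == 517, "global index");
static_assert(global_size(Dim1{4, 256}) == 1024, "total work-items");
```

In a CUDA or HiP kernel you compute this global index yourself from the block and thread indices, while OpenCL hands it to you directly through get_global_id(0).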

The question is, of course, whether such similar APIs shouldn’t use the same wording. The best approach would be to take the best of each: CUDA’s “shared” is much clearer than OpenCL’s “local”. OpenCL’s functions for locations and dimensions (get_global_id(0) and such), on the other hand, are often more appreciated than what CUDA offers. CUDA’s “<<< >>>” is not standard C/C++ syntax and breaks every regular C/C++ compiler, making it very hard to build a front end or IDE plug-in.
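The launch calls differ in more than spelling. CUDA's “<<< >>>” takes the number of blocks and the block size, whereas clEnqueueNDRangeKernel takes the total number of work-items (global work size) and the work-group size (local work size). The host-only C++ sketch below converts between the two. The types `CudaLaunch` and `OpenClLaunch` and both helper functions are hypothetical names for this illustration and are not part of either API.

```cpp
#include <cstddef>

// Arguments as CUDA expects them: kernel<<<gridDim, blockDim>>>(...)
struct CudaLaunch {
    std::size_t gridDim;   // number of blocks
    std::size_t blockDim;  // threads per block
};

// Sizes as clEnqueueNDRangeKernel expects them (one dimension).
struct OpenClLaunch {
    std::size_t globalSize;  // total work-items
    std::size_t localSize;   // work-items per work-group
};

// Enough blocks of blockSize threads to cover n elements (rounds up).
constexpr CudaLaunch cuda_launch_for(std::size_t n, std::size_t blockSize) {
    return CudaLaunch{(n + blockSize - 1) / blockSize, blockSize};
}

// The same geometry expressed as OpenCL global and local work sizes.
constexpr OpenClLaunch to_opencl(const CudaLaunch& c) {
    return OpenClLaunch{c.gridDim * c.blockDim, c.blockDim};
}

// 1000 elements with blocks of 256: 4 blocks, or 1024 work-items in total.
static_assert(cuda_launch_for(1000, 256).gridDim == 4, "blocks");
static_assert(to_opencl(cuda_launch_for(1000, 256)).globalSize == 1024, "work-items");
```

Because the rounded-up launch covers 1024 work-items for only 1000 elements, a kernel launched this way needs a bounds check on its global index in either language.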

I hope you found the above useful to better understand the differences between CUDA and OpenCL, but also to see how HiP comes into the picture.
