Comparing Syntax for CUDA, OpenCL and HiP

Both CUDA and OpenCL are well-known GPGPU languages. Unfortunately there are quite a few small differences between the two, which are shown below.

You might have heard of HiP, the language AMD made to support both modern AMD GPUs (such as Fiji) and CUDA devices. CUDA code can be translated to HiP (mostly automatically), and from that moment on your code also supports AMD's high-end devices.
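To give an impression of what that translation looks like, here is a minimal sketch of some CUDA host code, with the HiP equivalent of each call in the trailing comments. The buffer names are just for illustration:

```cuda
#include <cuda_runtime.h>

int main() {
    const int n = 1024;
    float h_buf[n] = {0};

    float *d_buf;                               // unchanged in HiP
    cudaMalloc(&d_buf, n * sizeof(float));      // hipMalloc(&d_buf, n * sizeof(float));
    cudaMemcpy(d_buf, h_buf, n * sizeof(float), // hipMemcpy(d_buf, h_buf, n * sizeof(float),
               cudaMemcpyHostToDevice);         //           hipMemcpyHostToDevice);
    cudaDeviceSynchronize();                    // hipDeviceSynchronize();
    cudaFree(d_buf);                            // hipFree(d_buf);
    return 0;
}
```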

To show how HiP compares to other APIs, Ben Sander made an overview. Below you’ll find the table for CUDA, OpenCL and HiP, slightly altered to be more complete. The columns for the languages HC and C++AMP can be found in the original.

| Term | CUDA | OpenCL | HiP |
| --- | --- | --- | --- |
| Device | int deviceId | cl_device | int deviceId |
| Queue | cudaStream_t | cl_command_queue | hipStream_t |
| Event | cudaEvent_t | cl_event | hipEvent_t |
| Memory | void * | cl_mem | void * |
| Grid of threads | grid | NDRange | grid |
| Subgroup of threads | block | work-group | block |
| Thread | thread | work-item | thread |
| Scheduled execution | warp | sub-group (warp, wavefront, etc.) | warp |
| Thread-index | threadIdx.x | get_local_id(0) | hipThreadIdx_x |
| Block-index | blockIdx.x | get_group_id(0) | hipBlockIdx_x |
| Block-dim | blockDim.x | get_local_size(0) | hipBlockDim_x |
| Grid-dim | gridDim.x | get_num_groups(0) | hipGridDim_x |
| Device Kernel | __global__ | __kernel | __global__ |
| Device Function | __device__ | N/A (implied in device compilation) | __device__ |
| Host Function | __host__ (default) | N/A (implied in host compilation) | __host__ (default) |
| Host + Device Function | __host__ __device__ | N/A | __host__ __device__ |
| Kernel Launch | <<< >>> | clEnqueueNDRangeKernel | hipLaunchKernel |
| Global Memory | __global__ | __global | __global__ |
| Group Memory | __shared__ | __local | __shared__ |
| Private Memory | (default) | __private | (default) |
| Constant | __constant__ | __constant | __constant__ |
| Thread Synchronisation | __syncthreads() | barrier(CLK_LOCAL_MEM_FENCE) | __syncthreads() |
| Atomic Builtins | atomicAdd | atomic_add | atomicAdd |
| Precise Math | cos(f) | cos(f) | cos(f) |
| Fast Math | __cosf(f) | native_cos(f) | __cosf(f) |
| Vector | float4 | float4 | float4 |

You can see that HiP clearly borrowed its terminology from CUDA.
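To make the mapping concrete, here is a minimal vector-add kernel written as compilable CUDA, with the OpenCL and HiP equivalents from the table shown in comments. The kernel name vecAdd and the launch configuration are my own picks for illustration, and the hipLaunchKernel line follows the macro form of the early HiP releases, so treat it as a sketch rather than the definitive way:

```cuda
// Minimal CUDA vector-add kernel; OpenCL and HiP equivalents per the table.

// OpenCL: __kernel void vecAdd(__global const float *a,
//                              __global const float *b,
//                              __global float *c, int n)
// HiP:    __global__ void vecAdd(...)  -- same qualifier as CUDA
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    // CUDA:   blockIdx.x * blockDim.x + threadIdx.x
    // OpenCL: get_global_id(0)
    // HiP:    hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Launching 256-thread blocks over n elements (host side):
// CUDA:   vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
// HiP:    hipLaunchKernel(HIP_KERNEL_NAME(vecAdd), dim3((n + 255) / 256),
//                         dim3(256), 0, 0, d_a, d_b, d_c, n);
// OpenCL: clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
//                                &global_size, &local_size, 0, NULL, NULL);
```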

The discussion is of course whether all these similar APIs shouldn’t use the same wording. The best option would be to mix the strongest parts of each: CUDA’s “shared” is much clearer than OpenCL’s “local”, while OpenCL’s functions for indices and dimensions (get_global_id(0) and the like), on the other hand, are often more appreciated than what CUDA offers. And CUDA’s “<<< >>>” breaks every standard C/C++ parser, making it very hard to build a front-end or IDE plugin for it.
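Nothing stops you from doing such mixing yourself, by the way. As a small sketch, here is a hypothetical OpenCL-style helper (the name get_global_id_x is mine) defined for CUDA device code; the same works unchanged in HiP:

```cuda
// A hypothetical OpenCL-style index helper for CUDA device code:
// get_global_id(0) is simply group index * group size + local index.
__device__ inline unsigned int get_global_id_x() {
    return blockIdx.x * blockDim.x + threadIdx.x;
}

__global__ void vecScale(float *data, float factor, int n) {
    unsigned int i = get_global_id_x();  // reads like the OpenCL version
    if (i < n)
        data[i] *= factor;
}
```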

I hope you found the above useful for better understanding the differences between CUDA and OpenCL, and for seeing how HiP comes into the picture.
