Comparing Syntax for CUDA, OpenCL and HiP

Both CUDA and OpenCL are well-known GPGPU languages. Unfortunately there are quite a few small differences between the two, which are shown below.

You might have heard of HiP, the language AMD made to support both modern AMD GPUs (such as Fiji) and CUDA devices. CUDA code can be translated to HiP (mostly automatically), and from that moment on your code also supports AMD's high-end devices.
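To give an impression of what that translation looks like, here is a minimal sketch of some CUDA host code, with the HiP equivalent of each call in the trailing comments. The buffer names are just for illustration:

```cuda
#include <cuda_runtime.h>

int main() {
    const int n = 1024;
    float h_buf[n] = {0};

    float *d_buf;                               // unchanged in HiP
    cudaMalloc(&d_buf, n * sizeof(float));      // hipMalloc(&d_buf, n * sizeof(float));
    cudaMemcpy(d_buf, h_buf, n * sizeof(float), // hipMemcpy(d_buf, h_buf, n * sizeof(float),
               cudaMemcpyHostToDevice);         //           hipMemcpyHostToDevice);
    cudaDeviceSynchronize();                    // hipDeviceSynchronize();
    cudaFree(d_buf);                            // hipFree(d_buf);
    return 0;
}
```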

To show how HiP compares to other APIs, Ben Sander made an overview. Below you’ll find the table for CUDA, OpenCL and HiP, slightly altered to be more complete. The columns for the languages HC and C++AMP can be found in the original.

| Term | CUDA | OpenCL | HiP |
| --- | --- | --- | --- |
| Device | int deviceId | cl_device | int deviceId |
| Queue | cudaStream_t | cl_command_queue | hipStream_t |
| Event | cudaEvent_t | cl_event | hipEvent_t |
| Memory | void * | cl_mem | void * |
| Grid of threads | grid | NDRange | grid |
| Subgroup of threads | block | work-group | block |
| Thread | thread | work-item | thread |
| Scheduled execution | warp | sub-group (warp, wavefront, etc.) | warp |
| Thread-index | threadIdx.x | get_local_id(0) | hipThreadIdx_x |
| Block-index | blockIdx.x | get_group_id(0) | hipBlockIdx_x |
| Block-dim | blockDim.x | get_local_size(0) | hipBlockDim_x |
| Grid-dim | gridDim.x | get_num_groups(0) | hipGridDim_x |
| Device Kernel | __global__ | __kernel | __global__ |
| Device Function | __device__ | N/A (implied in device compilation) | __device__ |
| Host Function | __host__ (default) | N/A (implied in host compilation) | __host__ (default) |
| Host + Device Function | __host__ __device__ | N/A | __host__ __device__ |
| Kernel Launch | <<< >>> | clEnqueueNDRangeKernel | hipLaunchKernel |
| Global Memory | __global__ | __global | __global__ |
| Group Memory | __shared__ | __local | __shared__ |
| Private Memory | (default) | __private | (default) |
| Constant | __constant__ | __constant | __constant__ |
| Thread Synchronisation | __syncthreads() | barrier(CLK_LOCAL_MEM_FENCE) | __syncthreads() |
| Atomic Builtins | atomicAdd | atomic_add | atomicAdd |
| Precise Math | cos(f) | cos(f) | cos(f) |
| Fast Math | __cosf(f) | native_cos(f) | __cosf(f) |
| Vector | float4 | float4 | float4 |

You can see that HiP clearly borrowed its terminology from CUDA.
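To make the mapping concrete, here is a minimal vector-add kernel written as compilable CUDA, with the OpenCL and HiP equivalents from the table shown in comments. The kernel name vecAdd and the launch configuration are my own picks for illustration, and the hipLaunchKernel line follows the macro form of the early HiP releases, so treat it as a sketch rather than the definitive way:

```cuda
// Minimal CUDA vector-add kernel; OpenCL and HiP equivalents per the table.

// OpenCL: __kernel void vecAdd(__global const float *a,
//                              __global const float *b,
//                              __global float *c, int n)
// HiP:    __global__ void vecAdd(...)  -- same qualifier as CUDA
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    // CUDA:   blockIdx.x * blockDim.x + threadIdx.x
    // OpenCL: get_global_id(0)
    // HiP:    hipBlockIdx_x * hipBlockDim_x + hipThreadIdx_x
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// Launching 256-thread blocks over n elements (host side):
// CUDA:   vecAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
// HiP:    hipLaunchKernel(HIP_KERNEL_NAME(vecAdd), dim3((n + 255) / 256),
//                         dim3(256), 0, 0, d_a, d_b, d_c, n);
// OpenCL: clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
//                                &global_size, &local_size, 0, NULL, NULL);
```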

The discussion is of course whether all these similar APIs shouldn’t use the same wording. The best option would be to mix the strongest parts of each: CUDA’s “shared” is much clearer than OpenCL’s “local”, while OpenCL’s functions for indices and dimensions (get_global_id(0) and the like), on the other hand, are often more appreciated than what CUDA offers. And CUDA’s “<<< >>>” breaks every standard C/C++ parser, making it very hard to build a front-end or IDE plugin for it.
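Nothing stops you from doing such mixing yourself, by the way. As a small sketch, here is a hypothetical OpenCL-style helper (the name get_global_id_x is mine) defined for CUDA device code; the same works unchanged in HiP:

```cuda
// A hypothetical OpenCL-style index helper for CUDA device code:
// get_global_id(0) is simply group index * group size + local index.
__device__ inline unsigned int get_global_id_x() {
    return blockIdx.x * blockDim.x + threadIdx.x;
}

__global__ void vecScale(float *data, float factor, int n) {
    unsigned int i = get_global_id_x();  // reads like the OpenCL version
    if (i < n)
        data[i] *= factor;
}
```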

I hope you found the above useful for better understanding the differences between CUDA and OpenCL, and for seeing how HiP comes into the picture.
