CUDA Compute Capability 6.1 Features in OpenCL 2.0

On the CUDA page of Wikipedia there is a table with compute capabilities, as shown below. While double-checking support for AMD Fiji GPUs (like the Radeon Nano and FirePro S9300X2) I got curious how much of this support is still missing in OpenCL. For Fiji it looks like all features are fully supported. For OpenCL 2.0, read on.

CUDA features per Compute Capability on Wikipedia

Feature overview

The table below does not discuss performance, which is of course also a factor.

| CUDA 3.5 or higher | OpenCL 2.0 |
|---|---|
| Integer atomic functions operating on 32-bit words in global memory | yes |
| atomicExch() operating on 32-bit floating-point values in global memory | function: atomic_xchg() |
| Integer atomic functions operating on 32-bit words in shared memory | yes |
| atomicExch() operating on 32-bit floating-point values in shared memory | function: atomic_xchg() |
| Integer atomic functions operating on 64-bit words in global memory | extensions: cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics |
| Double-precision floating-point operations | supported if the device info CL_DEVICE_DOUBLE_FP_CONFIG is not empty; for backwards compatibility the extension cl_khr_fp64 is still available |
| Atomic functions operating on 64-bit integer values in shared memory | extensions: cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics |
| Floating-point atomic addition operating on 32-bit words in global and shared memory | N/A – see this post for a hack |
| Warp vote functions | implemented in the new work-group functions – see this post by Intel |
| __ballot() | hack: work_group_all() with a bit-shift using get_local_id() |
| __threadfence_system() | hack: needs a sync from the host |
| __syncthreads_count() | hack: work_group_reduce_add() + work_group_barrier() |
| __syncthreads_and() | hack: work_group_all() + work_group_barrier() |
| __syncthreads_or() | hack: work_group_any() + work_group_barrier() |
| Surface functions | images |
| 3D grid of thread blocks | 3-dimensional work-groups |
| Warp shuffle functions | N/A – see the notes below |
| Funnel shift | a bit-shift where the shifted-in positions are filled not with zeroes but with bits from a second integer. Hack: shift one integer left by N bits, shift the other right by (32−N) bits, then bitwise-OR the results |
| Dynamic parallelism | nested parallelism (device-side enqueue) |

So you see that OpenCL 2.0 covers almost everything CUDA offers – the most notable omission is the warp shuffle, while the other missing functions can be implemented in a step or two.

If you want to know what is new in OpenCL (including features not existing in CUDA, like pipes), see this blog post.
