We were too busy lately to tell you about it: OpenCL 2.0 is getting ready for prime time! Because it makes use of more recent hardware features, it is more powerful than OpenCL 1.x could ever be.
To get you up to speed, see this list of new OpenCL 2.0 features:
- Shared Virtual Memory: host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.
- Dynamic Parallelism: device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks.
- Generic Address Space: functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application.
- Improved image support: including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop.
- C11 Atomics: a subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between the OpenCL device and host.
- Pipes: memory objects that store data organized as a FIFO. OpenCL 2.0 provides built-in functions for kernels to read from or write to a pipe, providing straightforward programming of pipe data structures that can be highly optimized by OpenCL implementers.
- Android Installable Client Driver Extension: Enables OpenCL implementations to be discovered and loaded as a shared object on Android systems.
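To give a quick taste of just one item from the list: because OpenCL 2.0 atomics are a subset of C11 atomics, the pattern can be tried on the host with plain C11 first. The sketch below is host C, not kernel code, and the helper name `atomic_max_int` is made up for illustration. In OpenCL C 2.0 the `_explicit` functions can additionally take a `memory_scope` argument, which is not shown here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: raise *target to `candidate` if candidate is larger.
 * Returns the value that was observed before any update took place. */
static int atomic_max_int(atomic_int *target, int candidate)
{
    assert(target != NULL);

    int seen = atomic_load_explicit(target, memory_order_relaxed);

    /* On failure the compare-exchange writes the current value into `seen`,
     * so the loop condition re-checks against fresh data each time round. */
    while (seen < candidate &&
           !atomic_compare_exchange_weak_explicit(target, &seen, candidate,
                                                  memory_order_release,
                                                  memory_order_relaxed)) {
        /* retry */
    }
    return seen;
}
```

The same compare-exchange loop is what you would typically write in a kernel when no ready-made fetch-and-modify built-in fits.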
I could write many articles about the above subjects, but I’ll leave that for later. This article won’t get into these technical details, but more into what’s available from the vendors. So let’s see what toys we were given!
A note: don’t start with OpenCL 2.0 directly if you don’t know the basic concepts of OpenCL.
AMD has support on every new GPU, APU and CPU, like the R9 300 series. The AMD OpenCL 2.0 driver is also compatible with the following AMD products.
|AMD Desktop Product Family Compatibility|
|AMD Radeon™ R9 200 Series||AMD Radeon™ HD 8950|
|AMD Radeon™ R7 200 Series||AMD Radeon™ HD 8600 Series|
|AMD Radeon™ HD 7700 Series||AMD Radeon™ HD 8500 Series|
|AMD Workstation Product Family Compatibility|
|AMD FirePro™ W9100||AMD FirePro™ W5100|
|AMD FirePro™ S9150||AMD FirePro™ W2100|
|AMD APU Product Family Compatibility|
|AMD FX-7600P||AMD A10-7300|
|AMD FX-7500||AMD A8-7600|
|AMD A10 PRO-7850B||AMD A8-7200P|
|AMD A8 PRO-7600B||AMD A8-7100|
|AMD A6 PRO-7400B||AMD A6-7400K|
|AMD A4 PRO-7350B||AMD A6-7000|
|AMD A10-7850K||AMD RX225FB|
|AMD A10-7700K||AMD FX427BB|
|AMD A10-7400P||AMD FX425BB|
|AMD Mobility Product Family Compatibility |
|AMD Radeon™ HD 8790M||AMD Radeon™ R9 M280X|
|AMD Radeon™ HD 8530M||AMD Radeon™ R9 M270X|
|AMD Radeon™ HD 8600/8700M||AMD Radeon™ R7 M265 Series|
|AMD Radeon™ HD 8500/8700M||AMD Radeon™ R7 M260 Series|
|AMD Radeon™ HD 8600M Series||AMD Radeon™ R5 M200 Series|
|AMD Radeon™ HD 8500M Series|||
|AMD Embedded Product Family Compatibility |
|AMD A6-6000 Series||AMD Athlon™ 5350|
|AMD A4-5000 Series||AMD GX-420CA|
|AMD A4-1000 Series||AMD GX-415CA|
|AMD A10 Micro-6700T||AMD GX-217GA|
|AMD A4 Micro-6400T||AMD GX-210HA|
|AMD E1 Micro-6200T||AMD E2/E1 Series|
Driver, SDK & samples
The driver has support, but is known not to be fully compliant yet. It is in the latest drivers, called “Omega”, which you can download via their website: Linux 64 and Windows 8.1 64 bit.
You can download the SDK here. AMD blogged about the beta SDK here, describing all 20 (!) samples written or updated for OpenCL 2.0.
AMD has been blogging about new OpenCL 2.0 concepts for some months now.
- Device Enqueue and Workgroup Built-in Functions.
- Shared Virtual Memory.
- Fine-Grain Shared Virtual Memory.
- Image Enhancements.
- Generic Address Space and Program-Scope Variables.
OpenCL 2.0 support is limited to CPUs with Intel HD Graphics GPUs of type 5300, 5500 and newer. On older CPUs there is support for OpenCL 1.2 only.
Note that early Broadwell systems with HD Graphics 5300 do not support fine grained buffer Shared Virtual Memory. See the samples and articles below for more information.
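Since SVM support differs per device, it is worth checking at runtime instead of assuming it. A minimal sketch, assuming the capability bitfield has already been queried with `clGetDeviceInfo` and `CL_DEVICE_SVM_CAPABILITIES`: the `SVM_*` macros below are local stand-ins that mirror the `CL_DEVICE_SVM_*` bit values as I understand them from `cl.h` (verify against your own headers), and `best_svm_level` is a hypothetical helper.

```c
#include <stdint.h>

/* Local stand-ins for the CL_DEVICE_SVM_* flags (assumed bit positions). */
#define SVM_COARSE_GRAIN_BUFFER (UINT64_C(1) << 0)
#define SVM_FINE_GRAIN_BUFFER   (UINT64_C(1) << 1)
#define SVM_FINE_GRAIN_SYSTEM   (UINT64_C(1) << 2)
#define SVM_ATOMICS             (UINT64_C(1) << 3)

typedef enum {
    SVM_NONE,        /* no SVM: fall back to buffers and explicit copies */
    SVM_COARSE,      /* coarse-grained buffer SVM only */
    SVM_FINE_BUFFER, /* fine-grained buffer SVM */
    SVM_FINE_SYSTEM  /* fine-grained system SVM */
} svm_level;

/* Hypothetical helper: pick the most capable SVM flavour a device reports. */
static svm_level best_svm_level(uint64_t caps)
{
    if (caps & SVM_FINE_GRAIN_SYSTEM)   return SVM_FINE_SYSTEM;
    if (caps & SVM_FINE_GRAIN_BUFFER)   return SVM_FINE_BUFFER;
    if (caps & SVM_COARSE_GRAIN_BUFFER) return SVM_COARSE;
    return SVM_NONE;
}
```

A device that reports only the coarse-grained bit, as described for the early HD Graphics 5300 systems, would end up on the `SVM_COARSE` path.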
There is no support for Xeon Phi anymore, unfortunately.
Driver & SDK
The 15.1 driver can be downloaded here.
Intel has a central starting point for OpenCL here, with links to software and points of interest. Most interesting are the Experimental Development Environment (free) and VTune for OpenCL (paid software).
Below are the samples for Intel. Some of the articles also have code.
- Shared Virtual Memory Code Sample. The fundamentals of using Shared Virtual Memory (SVM) capabilities in OpenCL applications.
- GPU-Quicksort in OpenCL 2.0. Nested Parallelism and Work-Group Scan Functions.
- Sierpiński Carpet. How to create a Sierpinski Carpet in OpenCL 2.0. Uses Dynamic Parallelism.
- Using Image2D From Buffer Extension. How to connect buffer-based kernel and image-based kernel into pipeline using the cl_khr_image2d_from_buffer extension.
More samples here – most before September 2014 are for 1.x. Later samples at least mention 2.0.
Intel is also blogging on OpenCL 2.0 concepts:
- Shared Virtual Memory Overview. A new feature that enables OpenCL developers to write code with extensive use of pointer-linked data structures like linked lists or trees that are shared between the host and a device side of an OpenCL application.
- Non-Uniform Work-Groups (with code). A new feature that relaxes the requirement that the NDRange size must be evenly divisible by the work-group size.
- Using 2.0 Work-group Functions (with code). An introduction to work-group functions and their usage.
- The Generic Address Space. Parameters without a global, local, private or constant qualifier.
- Using the new sRGB Image Format (with code). Writing OpenCL code to be viewed on CRT-monitors.
- Using 2.0 Atomics. Discusses some caveats in the atomics usage and applicability to various GPU programming tasks.
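For the non-uniform work-groups item in the list above, the arithmetic is easy to show. A small sketch in plain C (the struct and function names are made up for illustration): under the 1.x rule you pad the global size up to a multiple of the work-group size and guard against the extra work-items in the kernel, while with 2.0 you can enqueue the real size and the last work-group is simply smaller.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t padded_global;   /* global size to enqueue under the 1.x rule */
    size_t full_groups;     /* number of full-size work-groups */
    size_t remainder_items; /* work-items in the trailing partial group, 0 if none */
} ndrange_plan;

/* Hypothetical helper for one dimension of an NDRange. */
static ndrange_plan plan_ndrange(size_t global, size_t local)
{
    ndrange_plan p;
    assert(local > 0);

    /* Round up to the next multiple of the work-group size. */
    p.padded_global = ((global + local - 1) / local) * local;
    p.full_groups = global / local;
    p.remainder_items = global % local;
    return p;
}
```

With 1000 work-items and a work-group size of 64, the 1.x approach enqueues 1024 work-items (24 of them idle), whereas a non-uniform 2.0 launch runs 15 full work-groups plus one of 40.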
Not yet, but they did take “pipes” from the 2.0 specs (under some conditions). We’ll discuss this in a separate blog article.
Even though they lead the Khronos group, they prefer to only implement the new standard in their proprietary CUDA language. On who invented the new functionality in OpenCL 2.0, I can only say one thing: NVidia doesn’t have patents to stop OpenCL from “copying”.
There are rumours that Nvidia will start supporting 2.0 this year, but these are not confirmed. A new discussion on it is on LinkedIn groups. Let others know if you know more.
Want to port your software to OpenCL 2.0?
If you want to target AMD hardware next to Nvidia CUDA, then now is a good moment to go for AMD + OpenCL 2.0. For Intel it’s probably best to support both OpenCL 1.2 and 2.0 for now – it is possible to support several versions of OpenCL.
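Supporting several versions usually starts with reading the device's version string, which has the form `OpenCL <major>.<minor> <vendor-specific information>`. A minimal sketch, assuming the string was obtained through `clGetDeviceInfo` with `CL_DEVICE_VERSION`: the enum and the function `choose_path` are hypothetical names, and the minimum of 1.2 is just this example's choice.

```c
#include <assert.h>
#include <stdio.h>

typedef enum {
    PATH_UNSUPPORTED, /* unparsable string or older than OpenCL 1.2 */
    PATH_CL12,        /* use the OpenCL 1.2 code path */
    PATH_CL20         /* use the OpenCL 2.0 code path */
} kernel_path;

/* Hypothetical helper: pick a code path from a CL_DEVICE_VERSION string. */
static kernel_path choose_path(const char *device_version)
{
    int major = 0, minor = 0;
    assert(device_version != NULL);

    /* Both numbers must parse, otherwise the string is not in the
     * expected "OpenCL <major>.<minor> ..." form. */
    if (sscanf(device_version, "OpenCL %d.%d", &major, &minor) != 2)
        return PATH_UNSUPPORTED;

    if (major >= 2)
        return PATH_CL20;
    if (major == 1 && minor >= 2)
        return PATH_CL12;
    return PATH_UNSUPPORTED;
}
```

On the 2.0 path, remember to pass `-cl-std=CL2.0` in the build options, otherwise the kernels are not compiled as OpenCL C 2.0.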
The above information should help to get you going, to make the right design decisions and to start porting. Know that several 2.0-concepts are quite easy to upgrade to.
If you want to know if it’s the right time for you to go to OpenCL 2.0, just ask.