We were too busy lately to tell you about it: OpenCL 2.0 is getting ready for prime time! Because it makes use of more recent hardware features, it is more powerful than OpenCL 1.x could ever be.
To get you up to speed, see this list of new OpenCL 2.0 features:
- Shared Virtual Memory: host and device kernels can directly share complex, pointer-containing data structures such as trees and linked lists, providing significant programming flexibility and eliminating costly data transfers between host and devices.
- Dynamic Parallelism: device kernels can enqueue kernels to the same device with no host interaction, enabling flexible work scheduling paradigms and avoiding the need to transfer execution control and data between the device and host, often significantly offloading host processor bottlenecks.
- Generic Address Space: functions can be written without specifying a named address space for arguments, especially useful for those arguments that are declared to be a pointer to a type, eliminating the need for multiple functions to be written for each named address space used in an application.
- Improved image support: including sRGB images and 3D image writes, the ability for kernels to read from and write to the same image, and the creation of OpenCL images from a mip-mapped or a multi-sampled OpenGL texture for improved OpenGL interop.
- C11 Atomics: a subset of C11 atomics and synchronization operations to enable assignments in one work-item to be visible to other work-items in a work-group, across work-groups executing on a device or for sharing data between the OpenCL device and host.
- Pipes: memory objects that store data organized as a FIFO. OpenCL 2.0 provides built-in functions for kernels to read from or write to a pipe, providing straightforward programming of pipe data structures that can be highly optimized by OpenCL implementers.
- Android Installable Client Driver Extension: Enables OpenCL implementations to be discovered and loaded as a shared object on Android systems.
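To give a flavour of one item from the list: the OpenCL 2.0 atomics borrow their names and memory orders from C11. The sketch below is plain host-side C11, not kernel code, and the `mailbox` type and its two functions are hypothetical names made up for this illustration. It shows the release/acquire pairing that makes a write in one thread (or work-item) visible to another. In an OpenCL 2.0 kernel the same `atomic_store_explicit` and `atomic_load_explicit` calls can take an extra memory-scope argument.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox shared by one producer and one consumer. */
typedef struct {
    int payload;        /* plain data, published through the flag below */
    atomic_int ready;   /* 0 = empty, 1 = payload is valid */
} mailbox;

/* Producer: write the payload first, then publish it with a release store. */
void mailbox_publish(mailbox *m, int value) {
    assert(m != NULL);
    m->payload = value;
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: the acquire load pairs with the release store above, so once
 * the flag reads 1 the payload written before it is guaranteed visible. */
bool mailbox_try_consume(mailbox *m, int *out) {
    assert(m != NULL && out != NULL);
    if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
        return false;   /* nothing published yet, *out is left untouched */
    *out = m->payload;
    return true;
}
```

A caller would initialise the flag with `atomic_init(&m.ready, 0)`, call `mailbox_publish` from the producing side and poll `mailbox_try_consume` from the consuming side.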
I could write many articles about the above subjects, but I’ll leave that for later. This article won’t get into these technical details, but focuses on what’s available from the vendors. So let’s see what toys we were given!
A note: don’t start with OpenCL 2.0 directly if you don’t know the basic concepts of OpenCL.
AMD
Hardware support
AMD supports OpenCL 2.0 on every new GPU, APU and CPU, such as the R9 300 series. The AMD OpenCL 2.0 driver is also compatible with the following AMD products.
| AMD Desktop Product Family Compatibility | |
| AMD Radeon™ R9 200 Series | AMD Radeon™ HD 8950 | 
| AMD Radeon™ R7 200 Series | AMD Radeon™ HD 8600 Series | 
| AMD Radeon™ HD 7700 Series | AMD Radeon™ HD 8500 Series | 
| AMD Workstation Product Family Compatibility | |
| AMD FirePro™ W9100 | AMD FirePro™ W5100 | 
| AMD FirePro™ S9150 | AMD FirePro™ W2100 | 
| AMD APU Product Family Compatibility | |
| AMD FX-7600P | AMD A10-7300 | 
| AMD FX-7500 | AMD A8-7600 | 
| AMD A10 PRO-7850B | AMD A8-7200P | 
| AMD A8 PRO-7600B | AMD A8-7100 | 
| AMD A6 PRO-7400B | AMD A6-7400K | 
| AMD A4 PRO-7350B | AMD A6-7000 | 
| AMD A10-7850K | AMD RX225FB | 
| AMD A10-7700K | AMD FX427BB | 
| AMD A10-7400P | AMD FX425BB | 
| AMD Mobility Product Family Compatibility  | |
| AMD Radeon™ HD 8790M | AMD Radeon™ R9 M280X | 
| AMD Radeon™ HD 8530M | AMD Radeon™ R9 M270X | 
| AMD Radeon™ HD 8600/8700M | AMD Radeon™ R7 M265 Series | 
| AMD Radeon™ HD 8500/8700M | AMD Radeon™ R7 M260 Series | 
| AMD Radeon™ HD 8600M Series | AMD Radeon™ R5 M200 Series | 
| AMD Radeon™ HD 8500M Series |  | 
| AMD Embedded Product Family Compatibility  | |
| AMD A6-6000 Series | AMD Athlon™ 5350 | 
| AMD A4-5000 Series | AMD GX-420CA | 
| AMD A4-1000 Series | AMD GX-415CA | 
| AMD A10 Micro-6700T | AMD GX-217GA | 
| AMD A4 Micro-6400T | AMD GX-210HA | 
| AMD E1 Micro-6200T | AMD E2/E1 Series | 
Driver, SDK & samples
The driver supports OpenCL 2.0, but is known not to be fully compliant yet. Support is included in the latest drivers, called “Omega”, which you can download from AMD’s website: Linux 64-bit and Windows 8.1 64-bit.
You can download the SDK here. AMD blogged about the beta SDK here, describing all 20 (!) samples written or updated for OpenCL 2.0.
Articles
AMD has been blogging about new OpenCL 2.0 concepts for some months now.
- Pipes.
- Device Enqueue and Workgroup Built-in Functions.
- Shared Virtual Memory.
- Fine-Grain Shared Virtual Memory.
- Image Enhancements.
- Generic Address Space and Program-Scope Variables.
Intel
Hardware Support
OpenCL 2.0 support is limited to CPUs with Intel HD Graphics GPUs of type 5300, 5500 and newer. On older CPUs there is support for OpenCL 1.2 only.
Note that early Broadwell systems with HD Graphics 5300 do not support fine-grained buffer Shared Virtual Memory. See the samples and articles below for more information.
Unfortunately, there is no support for Xeon Phi anymore.
Driver & SDK
The 15.1 driver can be downloaded here.
Intel has a central starting point for OpenCL here, with links to software and points of interest. Most interesting are the Experimental Development Environment (free) and VTune for OpenCL (paid software).
Samples
Below are the samples for Intel. Some of the articles also have code.
- Shared Virtual Memory Code Sample. The fundamentals of using Shared Virtual Memory (SVM) capabilities in OpenCL applications.
- GPU-Quicksort in OpenCL 2.0. Nested Parallelism and Work-Group Scan Functions.
- Sierpiński Carpet. How to create a Sierpinski Carpet in OpenCL 2.0. Uses Dynamic Parallelism.
- Using Image2D From Buffer Extension. How to connect a buffer-based kernel and an image-based kernel into a pipeline using the cl_khr_image2d_from_buffer extension.
More samples here – most before September 2014 are for 1.x. Later samples at least mention 2.0.
Articles
Intel is also blogging on OpenCL 2.0 concepts:
- Shared Virtual Memory Overview. A new feature that enables OpenCL developers to write code with extensive use of pointer-linked data structures like linked lists or trees that are shared between the host and a device side of an OpenCL application.
- Non-Uniform Work-Groups (with code). A new feature that loosens the requirement that the NDRange size must be evenly divisible by the work-group size.
- Using 2.0 Work-group Functions (with code). An introduction to work-group functions and their usage.
- The Generic Address Space. Pointer parameters without a global, local or private address space qualifier (the constant address space stays separate).
- Using the new sRGB Image Format (with code). Writing OpenCL code to be viewed on CRT-monitors.
- Using 2.0 Atomics. Discusses some caveats in the atomics usage and applicability to various GPU programming tasks.
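The arithmetic behind non-uniform work-groups, mentioned in the list above, can be sketched in a few lines of plain C. This is only an illustration of the sizes involved in one dimension, not code from Intel's article, and the function names are hypothetical. With OpenCL 1.x the global size is typically padded up to a multiple of the work-group size; with 2.0 the last work-group may simply be smaller.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension when the last group may be
 * smaller than the others (OpenCL 2.0 non-uniform work-groups). */
size_t num_groups_1d(size_t global_size, size_t local_size) {
    assert(local_size > 0);
    return (global_size + local_size - 1) / local_size;  /* ceiling division */
}

/* Size of work-group `group_id`: full-sized, except possibly the last one. */
size_t group_size_1d(size_t global_size, size_t local_size, size_t group_id) {
    assert(group_id < num_groups_1d(global_size, local_size));
    size_t start = group_id * local_size;       /* first work-item of the group */
    size_t remaining = global_size - start;     /* work-items left from there on */
    return remaining < local_size ? remaining : local_size;
}

/* The usual OpenCL 1.x workaround: pad the global size up to a multiple of
 * the local size and let the kernel skip the out-of-range work-items. */
size_t padded_global_size_1d(size_t global_size, size_t local_size) {
    return num_groups_1d(global_size, local_size) * local_size;
}
```

For 1000 work-items and a work-group size of 256, this gives four groups of which the last holds 232 work-items, where a 1.x launch would be padded to 1024 work-items.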
Altera
Not yet, but they did take “pipes” from the 2.0 specs (under some conditions). We’ll discuss this in a separate blog article.
NVidia
Even though they lead the Khronos group, they prefer to only implement the new standard in their proprietary CUDA language. On who invented the new functionality in OpenCL 2.0, I can only say one thing: NVidia doesn’t have patents to stop OpenCL from “copying”.
There are rumours that Nvidia will start supporting 2.0 this year, but these are not confirmed. A new discussion on it is on LinkedIn groups. Let others know if you know more.
Want to port your software to OpenCL 2.0?
If you want to target AMD hardware next to Nvidia CUDA, then now is a good moment to go for AMD + OpenCL 2.0. For Intel it’s probably best to support both OpenCL 1.2 and 2.0 for now – it is possible to support several versions of OpenCL.
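One way to support several versions is to look at the version string a device reports and pick a code path from it. The sketch below assumes you already obtained that string, for example from `clGetDeviceInfo` with `CL_DEVICE_VERSION`, which has the form "OpenCL <major>.<minor> <vendor-specific information>". The two helper names are hypothetical, and the parsing is deliberately minimal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Parse a CL_DEVICE_VERSION-style string such as "OpenCL 2.0 <vendor info>".
 * Returns true and fills major/minor when both numbers were found. */
bool parse_opencl_version(const char *version, int *major, int *minor) {
    assert(version != NULL && major != NULL && minor != NULL);
    return sscanf(version, "OpenCL %d.%d", major, minor) == 2;
}

/* Hypothetical dispatch helper: true if the device can take the 2.0 code
 * path, false if the application should fall back to its 1.x path. */
bool supports_opencl_2(const char *version) {
    int major = 0, minor = 0;
    return parse_opencl_version(version, &major, &minor) && major >= 2;
}
```

An application could call `supports_opencl_2` once per device and, for devices where it returns true, build its kernels with the `-cl-std=CL2.0` option; everything else keeps using the 1.2 kernels.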
The above information should help you get going, make the right design decisions and start porting. Know that several 2.0 concepts are quite easy to upgrade to.
If you want to know if it’s the right time for you to go to OpenCL 2.0, just ask.
Vincent, great summary! Thanks.
You missed one of the AMD blogs, however. It’s on SVM (as distinguished from fine grain SVM): http://developer.amd.com/community/blog/2014/10/24/opencl-2-shared-virtual-memory/. Happy coding!
Thanks! I’ve added it.
Couple of comments on the Intel OpenCL 2.0 offerings:
1. Intel released OpenCL 2.0 drivers for 5th Generation Intel(R) Core Processors, which means not only processors with Intel HD Graphics 5300, but also processors with Intel HD Graphics 5500 and beyond.
2. Some of the articles you mentioned also contain code samples (Non-Uniform Work-Groups, Using OpenCL 2.0 Work-group Functions).
3. The first link under Hardware Support is broken.
4. Intel has an article and a sample on Generic Address Space in OpenCL 2.0: https://software.intel.com/en-us/articles/the-generic-address-space-in-opencl-20
5. More OpenCL 2.0 articles with samples from Intel: Using OpenCL 2.0 sRGB Image Format: https://software.intel.com/en-us/articles/using-opencl-20-srgb-image-format , Using OpenCL 2.0 Atomics: https://software.intel.com/en-us/articles/using-opencl-20-atomics
6. The link to Using Image2D From Buffer Extension is broken. The right link is https://software.intel.com/en-us/articles/using-image2d-from-buffer-extension
Please update the article accordingly.
Thanks for your feedback! And even more thanks for writing some of the articles!
1) I’ve been searching for this information thoroughly, but I chose to copy the information from Intel’s website.
2) True. I’ll make that more clear.
3) It was not broken when I published this article 10 days ago. Do you know where it is now?
4+5) Thanks. Added.
6) Thanks! I seem not to have noticed this during the final check.
Thanks for fixing some of the links. One link is still broken: the sentence in the Driver & SDK section starting with “Intel has a central starting point for OpenCL” should point to https://software.intel.com/en-us/intel-opencl-support
Also note that Intel drivers are Release drivers, not Beta, so the first sentence in Hardware Support should change.
Not sure what 3) referred to. I tried to search the Intel site without any luck. Bottom line: OpenCL 2.0 is supported on all 5th Gen Intel(R) Core Processors on Windows. Early Broadwell systems with HD Graphics 5300 do not support fine-grained buffer SVM, but systems with HD Graphics 5500 and above support fine-grained buffer SVM in addition to coarse-grained buffer SVM, which is supported on all 5th Gen processors.