Warning: below is raw material, and needs some editing.
There was quite a lot of OpenCL news today, and I'm afraid it can't wait until later to be covered. Some of the news is unexpected, some is great. Let's start with the great news, as the unexpected news needs some discussion.
Khronos released OpenCL 2.1 final specs
As of today you can download the header files and specs from https://www.khronos.org/opencl/. The biggest changes are:
- C++ kernels (still separate source files, which is to be tackled by SYCL)
- Subgroups are now core functionality. This enables finer-grained control of hardware threading.
- New function clCloneKernel enables copying of kernel objects and state for safe implementation of copy constructors in wrapper classes. Hear all Java and .NET folks cheer?
- Low-latency device timer queries for alignment of profiling data between device and host code.
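To make the device timer item concrete: OpenCL 2.1 adds `clGetDeviceAndHostTimer`, which returns a device timestamp and a host timestamp sampled together. The sketch below shows only the arithmetic for aligning profiling data, not the API call itself. It assumes both timestamps are in nanoseconds and that the two clocks do not drift noticeably during the profiled interval. The helper name and all numbers are hypothetical.

```python
def device_to_host_ns(device_ts_ns, sync_device_ns, sync_host_ns):
    """Map a device timestamp onto the host timeline.

    sync_device_ns and sync_host_ns are one synchronized pair, as a call
    like clGetDeviceAndHostTimer would return (assumed: nanoseconds).
    """
    offset = sync_host_ns - sync_device_ns
    return device_ts_ns + offset


# Hypothetical synchronized pair, sampled once before profiling starts.
sync_device_ns = 5_000_000
sync_host_ns = 1_000_005_000_000

# Hypothetical device-side profiling timestamps of one kernel run.
kernel_start_device_ns = 5_200_000
kernel_end_device_ns = 5_950_000

kernel_start_host_ns = device_to_host_ns(
    kernel_start_device_ns, sync_device_ns, sync_host_ns)
kernel_end_host_ns = device_to_host_ns(
    kernel_end_device_ns, sync_device_ns, sync_host_ns)

print(kernel_start_host_ns, kernel_end_host_ns)
```

With the kernel events on the host timeline, they can be plotted next to host-side events in a single trace. For long runs it would probably be wise to resample the pair now and then rather than trust a single offset.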
OpenCL 2.1 will be supported by AMD. Intel was very loud with support when the provisional specs got released, but gave no comments today. Other vendors did not release an official statement.
Khronos released SPIR-V 1.0 final specs
SPIR-V 1.0 can represent the full capabilities of OpenCL 2.1 kernels.
This is very important! OpenCL is no longer the only language that is seen as input for GPU compilers. Neither is OpenCL host code the only API that can handle compute shaders, as Vulkan can do this too. Lots of details still have to be seen, as not all SPIR-V compilers will have full support for all OpenCL-related commands.
With the specs the following tools have been released:
- A bi-directional translator between LLVM and SPIR-V to enable flexible use of both intermediate languages in tool chains.
- An OpenCL C to LLVM compiler that generates SPIR-V through the above translator, as Clang can compile OpenCL 1.2/2.0 C kernels.
- A SPIR-V assembler and disassembler.
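For a feel of what such tools work with: a SPIR-V module is a stream of 32-bit words that starts with a five-word header (magic number, version, generator, ID bound, reserved schema). The following is a minimal sketch that reads only this header. The function name is hypothetical, and the example bytes are built by hand rather than taken from a real compiler.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module


def parse_spirv_header(data):
    """Return the header fields of a SPIR-V binary given as bytes."""
    if len(data) < 20:
        raise ValueError("a SPIR-V module needs at least a 5-word header")
    # The magic number also reveals the byte order of the module.
    for byte_order in ("<", ">"):
        words = struct.unpack(byte_order + "5I", data[:20])
        if words[0] == SPIRV_MAGIC:
            break
    else:
        raise ValueError("magic number not found; not a SPIR-V module")
    _, version, generator, bound, schema = words
    return {
        # Version word layout: 0 | major | minor | 0 (one byte each).
        "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
        "generator": generator,
        "bound": bound,
        "schema": schema,
        "little_endian": byte_order == "<",
    }


# Hand-made header for illustration: SPIR-V 1.0, generator 0, ID bound 8.
example_header = struct.pack("<5I", SPIRV_MAGIC, 0x00010000, 0, 8, 0)
info = parse_spirv_header(example_header)
print(info)
```

Everything after the header is a sequence of instructions, which is what the assembler and disassembler translate to and from text.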
SPIR-V will make many frontends possible, giving co-processor powers to every programming language that exists. I will blog more about SPIR-V possibilities in the coming year.
Intel claims OpenMP is up to 10x faster than OpenCL
The image below appeared on Twitter, claiming that OpenMP was much faster than OpenCL. After some discussion, we could conclude they compared apples and oranges. We’re happy to peer-review the results, putting the claims in full perspective, with MKL and the operation mode mentioned. Unfortunately they did not react, as <sarcasm>we would be very happy to admit that for the first time in history a directive language is faster than an explicit language: finally we have magic!</sarcasm>
.@IntelHPC Need a second opinion to show it is true that OpenMP is faster than OpenCL on the XeonPhi?
— StreamHPC (@StreamHPC) November 16, 2015

We will get back later this week to Intel and their upcoming Xeon+FPGA chip, and to whether OpenCL is the best language for that job. It is of course possible that they will try to run OpenMP on the FPGA, but that would be a big surprise. The truth is that Intel doesn’t like this powerful open standard intruding on the HPC market, where they have a monopoly.
AMD claims OpenCL is hardly used in HPC
Well, this is one of those claims that they did not really think through. OpenCL is used in HPC quite a lot, but mostly on NVidia hardware. Why not just CUDA there? Well, there is demand for OpenCL for several reasons:
- Avoid vendor lock-in.
- Making code work on more hardware.
- General interest in co-processors, not in one specific brand.
- Initial code is being developed on different hardware.
- …
The thing is that NVidia did a superb job of getting their processors into supercomputers and clouds. So OpenCL is mostly run on NVidia hardware, and that is the biggest reason why the company is so successful in slowing the advancement of the standard by rolling out upgrades 4 years later. Even though I tried to get the story out, NVidia is not eager to tell the beautiful love story between OpenCL and the NVidia co-processor, as the latter has CUDA as its wife.
OpenCL also gets love at HPC sites with Intel XeonPhi. Same here: Intel prefers to talk about their OpenMP instead of OpenCL.
AMD has few HPC sites, and indeed that is where OpenCL is used.
No, we’re not happy that AMD tells such things, only to promote its own new languages.
CUDA goes AMD and open source
AMD now supports CUDA! The details: they have made a tool that can compile CUDA to “HiP”, a new language without many details at the moment. Yes, I have the same questions as you are asking now.

Google also joined in and showed progress on their open source implementation of CUDA. Phoronix is currently the best source for this initiative, and today they shared a story with a link to slides from Google on the project. The results are great: “it is up to 51% faster on internal end-to-end benchmarks, on par with open-source benchmarks, compile time is 8% faster on average and 2.4x faster for pathological compilations compared to NVIDIA’s official CUDA compiler (NVCC)”.
For compiling CUDA in LLVM you need three parts:
- a pre-processor that works around the non-standard <<<…>>> notation and splits off the kernels.
- a source-to-source compiler for the kernels.
- a bridge between the CUDA API and another API, like OpenCL.
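The first of these parts can be illustrated with a toy sketch: a regular expression that rewrites a simple `<<<…>>>` launch into an ordinary function call. This is not how Google's or AMD's tools work; a real pre-processor needs a proper parser. The target name `launch_kernel` is hypothetical, and launches with nested parentheses or template arguments are deliberately not handled.

```python
import re

# Matches: kernel_name<<<launch config>>>(arguments)
LAUNCH_PATTERN = re.compile(
    r"(?P<kernel>\w+)\s*<<<\s*(?P<config>[^<>]*?)\s*>>>\s*\((?P<args>[^()]*)\)"
)


def rewrite_launches(source):
    """Replace simple triple-chevron launches with launch_kernel(...) calls."""
    def replace(match):
        parts = [match.group("kernel"), match.group("config")]
        args = match.group("args").strip()
        if args:  # kernels without arguments get no trailing comma
            parts.append(args)
        return "launch_kernel(" + ", ".join(parts) + ")"

    return LAUNCH_PATTERN.sub(replace, source)


cuda_line = "vector_add<<<grid, block>>>(a, b, c, n);"
rewritten = rewrite_launches(cuda_line)
print(rewritten)
```

After such a rewrite the host code is standard C++ that any compiler can parse, and the hypothetical `launch_kernel` becomes the place where the bridge to another API would hook in.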
Google has done most of this and now focuses mostly on performance. The OpenCL community can use this project to make a complete CUDA-to-SPIR-V compiler and use the rest to improve POCL.
Khronos gets a more open homepage
Starting today you can help keep the Khronos webpage up-to-date. Just open a pull request at https://github.com/KhronosGroup/Khronosdotorg and wait until it gets accepted. This should keep the pages more current, as you can now improve them in more ways.
More news?
AMD released HCC, a C++ language with OpenMP built-in that doesn’t compile to SPIR-V.
There have been tutorials and talks on OpenCL, which I should have shared with you earlier.
Tomorrow another post with more news. If I forgot something on Sunday or Monday, I’ll add it here.






















