Today brought quite a bit of OpenCL news, too much to save for later. Some of it is unexpected, some of it is great. Let's start with the great news, as the unexpected news needs some discussion.
Khronos released OpenCL 2.1 final specs
As of today you can download the header files and specs from https://www.khronos.org/opencl/. The biggest changes are:
- C++ kernels (still in separate source files; single-source C++ is to be tackled by SYCL)
- Subgroups are now core functionality. This enables finer-grained control of hardware threading.
- The new function clCloneKernel enables copying of kernel objects and their state, for safe implementation of copy constructors in wrapper classes. Hear the Java and .NET folks cheer?
- Low-latency device timer queries for alignment of profiling data between device and host code.
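To see why wrapper authors cheer about clCloneKernel: a host-side kernel wrapper can now have a correct copy constructor, because the set kernel arguments are cloned along with the object. A minimal sketch (not a complete wrapper class, and it needs an OpenCL 2.1 platform to actually run):

```cpp
#include <CL/cl.h>

// Sketch of an RAII kernel wrapper whose copy constructor is finally safe,
// thanks to OpenCL 2.1's clCloneKernel.
class Kernel {
    cl_kernel k_ = nullptr;
public:
    explicit Kernel(cl_kernel k) : k_(k) {}
    Kernel(const Kernel &other) {
        cl_int err = CL_SUCCESS;
        // Deep-copies the kernel object, including arguments already set
        // with clSetKernelArg on the source kernel.
        k_ = clCloneKernel(other.k_, &err);
    }
    ~Kernel() { if (k_) clReleaseKernel(k_); }
    cl_kernel get() const { return k_; }
};
```

Before 2.1 a wrapper had to either forbid copying or re-create the kernel from its program and replay the arguments itself.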
OpenCL 2.1 will be supported by AMD. Intel was very vocal in its support when the provisional specs were released, but gave no comment today. Other vendors did not release an official statement.
Khronos released SPIR-V 1.0 final specs
SPIR-V 1.0 can represent the full capabilities of OpenCL 2.1 kernels.
This is very important! OpenCL C is no longer the only language that can serve as input for GPU compilers. Neither is OpenCL host code the only API that can dispatch compute shaders, as Vulkan can do this too. Many details still have to be seen, as not all SPIR-V consumers will fully support all OpenCL-related instructions.
Along with the specs, the following tools have been released:
- A bi-directional translator between LLVM IR and SPIR-V, to enable flexible use of both intermediate languages in tool chains.
- An OpenCL C to LLVM compiler that generates SPIR-V through the above translator (Clang can already compile OpenCL 1.2/2.0 C kernels to LLVM IR).
- A SPIR-V assembler and disassembler.
SPIR-V will make many frontends possible, giving co-processor powers to every programming language out there. I will blog more about SPIR-V's possibilities in the coming year.
Intel claims OpenMP is up to 10x faster than OpenCL
The image below appeared on Twitter, claiming that OpenMP is much faster than OpenCL. After some discussion, we could conclude they compared apples and oranges. We're happy to peer-review the results and put the claims in full perspective, one in which MKL and the operation mode are mentioned. Unfortunately they did not react; <sarcasm>we would be very happy to admit that, for the first time in history, a directive-based language is faster than an explicit language – finally we have magic!</sarcasm>
Later this week we'll get back to Intel and their upcoming Xeon+FPGA chip, and whether OpenCL is the best language for that job. It is of course possible that they will try to run OpenMP on the FPGA, but that would be a big surprise. Truth is that Intel doesn't like this powerful open standard intruding on the HPC market, where they have a monopoly.
AMD claims OpenCL is hardly used in HPC
Well, this is one of those claims they did not really think through. OpenCL is used quite a lot in HPC, but mostly on NVidia hardware. Why not just use CUDA there? Well, there is demand for OpenCL for several reasons:
- Avoid vendor lock-in.
- Making code work on more hardware.
- General interest in co-processors, not in one specific brand.
- Initial code is being developed on different hardware.
The thing is that NVidia did a superb job of getting their processors into supercomputers and clouds. So OpenCL mostly runs on NVidia hardware, which is also the biggest reason why that company is so successful at slowing the advancement of the standard by rolling out upgrades four years late. Even though I tried to get the story out, NVidia is not eager to tell the beautiful love story between OpenCL and the NVidia co-processor, as the latter has CUDA as its wife.
HPC sites with Intel Xeon Phi also give OpenCL some love. Same story there: Intel prefers to talk about OpenMP instead of OpenCL.
AMD has few HPC sites, and that is indeed where OpenCL is used.
No, we're not happy that AMD says such things only to promote its own new languages.
CUDA goes AMD and open source
AMD now supports CUDA! The details: they have made a tool that can compile CUDA to "HIP" – a new language about which few details are available at the moment. Yes, I have the same questions you are asking now.
Google also joined in and showed progress on their open source implementation of CUDA. Phoronix is currently the best source for this initiative, and today they shared a story with a link to Google's slides on the project. The results are great: "it is up to 51% faster on internal end-to-end benchmarks, on par with open-source benchmarks, compile time is 8% faster on average and 2.4x faster for pathological compilations compared to NVIDIA's official CUDA compiler (NVCC)".
To compile CUDA with LLVM you need three parts:
- a pre-processor that works around the non-standard <<<…>>> notation and splits off the kernels.
- a source-to-source compiler for the kernels.
- a bridge between the CUDA API and another API, such as OpenCL.
Google has done most of this and now focuses mostly on performance. The OpenCL community can use this project to build a complete CUDA-to-SPIR-V compiler, and use the rest to improve POCL.
Khronos gets a more open homepage
Starting today you can help keep the Khronos webpage up-to-date. Just submit a pull request at https://github.com/KhronosGroup/Khronosdotorg and wait for it to be accepted.
AMD also released HCC, a C++ dialect with OpenMP built in, which does not compile to SPIR-V.
There have been tutorials and talks on OpenCL, which I should have shared with you earlier.
Tomorrow there will be another post with more news. If I forgot something from Sunday or Monday, I'll add it here.