The rise of the GPGPU-compilers

Painting “High Rise” made by Huma Mulji

If you read The Disasters of Visual Designer Tools you’ll find a common thought about easy programming: many programmers no longer learn the hard-to-grasp background, but fiddle around and click a program together. In the early days of BASIC, you could add Assembly code to speed up calculations; you only needed to understand registers, cache and other details of the CPU. The people who did that and learnt about the hardware can arguably be considered better programmers than the C++ programmer who relies completely on the compiler’s intelligence. So never be happy when control is taken out of your hands, because it only speeds up the easy parts. An important side note: recompiling easily readable code with a future compiler might give faster code than your hand-optimised, well-engineered code, so it is a careful trade-off.

Okay, let’s be honest: OpenCL is not easy fun. It is closer to readable Assembly than to click-and-play programming. But, oh boy, you learn a lot from it! You learn about architectures, the capabilities of GPUs, special-purpose processors and much more. As blogged before, OpenCL is probably going to be the hidden power behind non-CPUs, wrapped in something like OpenMP.


The first stage of compiler development for GPUs would be a special language that is translated to a shader language such as OpenGL’s. As the limits of that approach become known over time, the optimisations would eventually bypass the shader language. The special language would also get easier to program in over time. For example, CUDA 3 shows that within a few years the language got faster in many respects, and there have been many changes on the driver side too. The second stage (which is happening now) is automatic recognition of the parts of the code that can be accelerated. We know that OpenMP-enriched programs can be rewritten to run (partly) on the GPU, so now the compiler only has to be told in which cases and how. OpenCL, being an open standard, is well suited as an intermediate language, also for debugging purposes; and we need an intermediate language, since the optimisations differ for each new piece of hardware. Such a study has been done using CUDA: “OpenMP to GPGPU: A Compiler Framework for Automatic Translation and Optimization”, by Seyong Lee, Seung-Jai Min and Rudolf Eigenmann.
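
To make the second stage concrete, here is a minimal sketch in C of the kind of OpenMP-enriched loop such a compiler could pick up. The function name saxpy and its parameters are illustrative assumptions, not taken from the paper mentioned above; the only OpenMP construct used is the standard "parallel for" pragma. Every iteration is independent of the others, which is what makes the loop a plausible candidate for translation into a GPU kernel with one work-item per iteration.

```c
/* Hypothetical example: y[i] = a * x[i] + y[i] for i in [0, n).
 * The name and signature are chosen for illustration only. */
void saxpy(int n, float a, const float *x, float *y)
{
    int i;

    /* With OpenMP enabled (for example -fopenmp on GCC) the iterations
     * are divided over CPU threads. No iteration reads what another
     * iteration writes, so a translating compiler could in principle
     * map each iteration to one GPU work-item instead. */
    #pragma omp parallel for
    for (i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

When the code is built without OpenMP support the pragma is simply ignored and the loop runs serially with the same result. That is the appeal of the annotation approach: the source stays readable, and the decision where and how to run the loop is left to the compiler.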

I want to stress that understanding hardware architectures stays important for GPGPU and for any other programming with performance in mind. In 2010 and 2011 you’ll still see OpenCL out in the open, and before you know it, it will all be hidden in libraries and handled by the compiler; so learn! The one thing that does not change is that the programmer must still understand what hardware the software will run on.

Current GPGPU compilers

The reason I wanted to take a week to find all GPGPU compilers is that I ran into some 404 pages and projects that had not changed in months. Where did those projects go? The techniques depend so much on driver optimisations that most projects have been abandoned or now target CUDA or OpenCL instead of a shader language. I had expected this to happen later, or more gradually, even though I described it above. There are more and more wrappers and compilers that target OpenCL or CUDA; a separate “wrappers and libraries” post will be written about those. So we have the following projects left:

  • Nvidia CUDA: Currently the most mature GPGPU compiler, started in 2007. It uses an LLVM-based compiler.
  • Khronos Group OpenCL: The industry standard for GPGPU, working on a growing number of processors (AMD, Nvidia, ARM, IBM Cell). LLVM, GCC and other compilers can be used.
  • PathScale ENZO: In collaboration with CAPS (with their HMPP compiler) they want to bring an alternative to the market. Sold for $2,495.
  • Codeplay Offload: Only targets IBM’s Cell processor.
  • Sh: When RapidMind was acquired by Intel, the code for targeting GPUs was open-sourced.
  • Microsoft DirectCompute: It is actually mostly built on top of CUDA and OpenCL, but since it’s nicely integrated into DirectX I can’t leave it out.

If you’re missing a project, please let me know.

Disclaimer: PrimeCortex/StreamHPC offers training in GPGPU/OpenCL. We like to show you how to learn GPGPU and OpenCL and like to share our enthusiasm via this blog, but of course our training is more thorough.



  • I mostly agree, but I’m not sure whether those libraries will be generated by a compiler or will just be normal C++ libraries. Anyway, I started an experiment with a scripting language for OpenCL and will see how it goes.

  • OpenCL is a clear target and an open standard; it doesn’t matter if it’s binary or human readable. I don’t see the difference between a compiler-as-a-library and a stand-alone compiler, since they serve the same goal.
    I tried to open the discussion on how OpenMP and OpenCL would merge, but most people liked the part where I say OpenCL is somewhat Spartan. Thank you, Khronos Group.

  • Jacket includes a JIT compiler for GPUs and, to our knowledge, it is the only JIT for GPUs outside of small research projects. Jacket’s JIT currently emits PTX code for NVIDIA chips – this is the major feature that we just released in Jacket 1.4, whereas Jacket 1.3 and earlier emitted CUDA code. Not sure if you wanted to include JIT compilers in your list.

    Currently, Jacket’s JIT capabilities are only available to MATLAB programmers, but by the end of the year, we’ll have a C/C++ version available.

    -John Melonakos, CEO, AccelerEyes

  • There is also Brook+; unfortunately it no longer seems to be actively developed, as AMD is promoting only OpenCL. OpenCL is good. I’m still waiting to see good wrappers emerge that are suitable for a wider range of developers, and a performance comparison with CUDA.

    • xman, the article is about actively developed software, so Brook+ is not included for that reason. Performance is theoretically the same as CUDA, but NVIDIA’s drivers are currently optimised for CUDA. Once my dual-GPU machine (one AMD, one NVIDIA) is in, I can show you some benchmarks. The wrappers are getting better; I discuss them at great length in my book, and an article about them is to be expected in October.

  • Hi Vincent, that’s great. Looking forward to seeing more benchmarks on CUDA and OpenCL. Many people are still wondering whether it’s a good time to use OpenCL, as it’s still in its early stage.