“Nothing worth having comes easy” – Dr. Kelso (from the series “Scrubs”)

OpenCL is getting more and more important, and for more and more developers it is a skill worth having. At StreamHPC we saw this coming in 2010 and have been training people in OpenCL since. A few weeks ago I got a question that could be interesting for more people: how to take on OpenCL. In other words: which steps to take to learn OpenCL the quickest. Since it is almost two years since I last wrote on learning OpenCL, it is a good time to share more recent insights on this matter.

Taking on OpenCL takes four main steps in this order:

Understanding the hardware and architectures.
Thinking both in parallel and in vectors.
Learning the OpenCL language itself.
Profiling and debugging.

You see that this is quite different from learning, for instance, Java with a Pascal background. Learning VHDL for programming FPGAs comes closer, though you don’t need to tinker with timings when doing OpenCL. Let’s go through the steps.

1 – Understanding the new hardware

Realising that 600 mini-cores are different from 4 big cores is only a small part of it.

A graphics card (GPU) is totally different from a CPU (the normal processor), with different caches, memories and buses. The differences between the various GPU architectures are also quite big. The same goes for the vector extensions on the CPU (SSE, AVX, NEON and such), which are quite different from the normal instructions CPU programmers are used to. These capabilities of the processor are hidden and are only used when the compiler sees ways to use them.

Most people want to skip this step, as OpenCL is C-based and should therefore be very comparable. The familiarity of the language is only a convenience; it does not make this step optional. Compare it to the type of manager who knows all the buzzwords and therefore wants to have his say, but just does not get it.

This is a main part of our trainings and is not shared much via the blog. But most information on hardware can be found on the web, or is available via the OpenCL function clGetDeviceInfo.

2 – Thinking in parallel and in vectors

This step is the actual rewiring of the brain that is needed to be able to write good kernels.

I have discussed how to think in parallel instead of in loops in previous blog posts, which are part of the series ‘programming theories’. There is still a lot more to say, so this series is not finished.
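To make the switch from loops to work-items a bit more concrete, here is a minimal sketch in plain C (not OpenCL). The names saxpy_loop, saxpy_item and launch_saxpy are made up for this illustration, and the launch function is only a serial stand-in for what an OpenCL runtime would do in parallel. In a real kernel the index would come from get_global_id(0) instead of a parameter.

```c
#include <assert.h>
#include <stddef.h>

/* Loop version: one thread walks over all elements in order. */
void saxpy_loop(size_t n, float a, const float *x, const float *y, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}

/* Kernel-style version: the body handles exactly one element.
 * gid plays the role of get_global_id(0) in an OpenCL kernel; it is a
 * plain parameter here so the sketch compiles as ordinary C. */
void saxpy_item(size_t gid, float a, const float *x, const float *y, float *out)
{
    out[gid] = a * x[gid] + y[gid];
}

/* Hypothetical stand-in for a kernel launch: calls the body once per
 * work-item. The items are visited in reverse order on purpose, to show
 * that the result does not depend on the order of execution. */
void launch_saxpy(size_t global_size, float a,
                  const float *x, const float *y, float *out)
{
    assert(x != NULL && y != NULL && out != NULL);
    for (size_t gid = global_size; gid-- > 0; ) {
        saxpy_item(gid, a, x, y, out);
    }
}
```

Because each work-item only touches its own element, the order in which the items run does not matter. That is the property that allows the hardware to run them side by side.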

Very fast operations such as dot products and cross products were last used by most of us in university math class or in OpenGL. Graphics programmers therefore have an advantage here, and others need to get the math books down from the attic. Try “Erwin Kreyszig – Advanced Engineering Mathematics” if you threw away all your math memories.
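As a refresher, the sketch below spells out in plain C what a dot product and a cross product compute for 3-component vectors. The vec3 type and the vec3_* function names are assumptions for this illustration; in OpenCL C you would use the built-in vector types (such as float3 or float4) together with the built-in functions dot and cross instead of writing this yourself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 3-component vector type for this sketch. */
typedef struct {
    float x, y, z;
} vec3;

/* Load three consecutive floats from an array as one vector. */
vec3 vec3_load(const float *p)
{
    assert(p != NULL);
    vec3 v = { p[0], p[1], p[2] };
    return v;
}

/* Dot product: multiply component-wise, then sum. The result is a scalar. */
float vec3_dot(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Cross product: a vector perpendicular to both a and b. */
vec3 vec3_cross(vec3 a, vec3 b)
{
    vec3 c = {
        a.y * b.z - a.z * b.y,
        a.z * b.x - a.x * b.z,
        a.x * b.y - a.y * b.x
    };
    return c;
}
```

For the unit vectors along x and y, the cross product is the unit vector along z and the dot product is zero, which is a quick sanity check when you are getting back into this kind of math.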

3 – Learning the OpenCL language itself

OpenCL as a programming language, in a few sentences: it is an extension of C with a host part and a kernel part. The host part prepares all communication and launches the kernels. The kernels do the actual work and have special extensions for vectors and for accessing the different memories.

This step is best done by doing – hands on the keyboard. Once you have completed the above two steps, this one will come naturally. All commands are explained in the OpenCL specifications. All vendors also ship SDKs with loads of examples, so you can see the commands in context. On this site there is a self-study section, where you can find even more resources.

The most important rule in this step: when you don’t get it, go back to the basics (steps 1 and 2).

Instead of OpenCL, you can fill in OpenACC, CUDA or DirectCompute here.

4 – Profiling and debugging

To find out where the bottlenecks are, both in the large software that needs to be sped up and in the software you wrote, you need some tools. Tools for profiling and debugging are provided by the vendor of the hardware – unfortunately there is hardly any tool that works with all hardware.

As you can hit driver bugs, these tools are certainly needed. But to recognise real problems, you need to know what you are doing and what you can expect (step 1). Starting with tools too soon gives you the impression that you understand what is going on, but builds real understanding much more slowly.

Other ways

With CPU languages you can work by trial and error and just code with the debugger open on the other screen. With GPU languages this is not a good route. The reason is not that a GPU is different from a CPU, but that you have experience with a CPU and will make assumptions.

Going from theoretical understanding to hands-on is the fastest way to get to well-performing software. In practice it often goes in rounds: getting back to step 1 and then starting the profiler again with the theories in mind. Most developers just want to program instead of getting things explained. For that reason our trainings have been updated this year to get a better balance between the hands-on and the “dull” theories.

What are your ideas on training OpenCL? And if you learned OpenCL already, how did you keep it both exciting and effective?

StreamHPC communications
