OpenCL and CUDA programming training in Amsterdam

As it has been very busy here, we have not held public trainings for a long time. This year we’re opening our Amsterdam offices again to train future GPU developers. For now there is one date, but we’ll add more dates to this blog post later on.

From Monday 23 to Thursday 26 March 2020 you’re welcome to get training in the architecture and writing of efficient GPU software using OpenCL and CUDA. The dates are subject to change, so that those interested can suggest another date – the dates will be final on 14 February.

If you need to learn solid GPU programming, this is the training you should attend. The concepts can be applied to other GPU-languages too, which makes it a good investment for any probable future where GPUs exist.

This is a public training, which means there are attendees from various companies. If you prefer not to be in a public class, get in contact to learn more about our in-company trainings.

It includes:

  • Four days of training in Amsterdam, including coffee, tea, snacks, fruit and lunch;
  • Free code-review after the training, to get feedback on what you created with the new knowledge;
  • 1 month of limited support, so you can avoid StackOverflow;
  • Certificate.

Trainings are given by employees of Stream HPC, who all have a lot of experience applying the techniques you are going to learn.


Most trainings have around 40% lectures, 50% lab-sessions and 10% discussions.


  • The training is guaranteed to take place. If you are the only one, you’ll simply get personal training.
  • The schedule below is indicative when it comes to lab-sessions – some lab-sessions can be (partly) skipped or replaced if time runs short.

Day 1: OpenCL/CUDA Foundations

This is close to our standard OpenCL crash course. We start with the basic concepts, write our first OpenCL program, discuss the architectures, discuss the difficulties of GPU-programming, compare to CUDA and C++, and end with writing simple code that runs on a laptop (CPU or GPU).

  • OpenCL/CUDA model
  • OpenCL language
  • CUDA language
  • Memory objects
  • General hardware overview
  • Task-parallelism and data-parallelism
  • Mapping code to CPUs and GPUs
  • Comparison to other languages like HIP, SYCL, OpenMP and OpenACC
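To give a feel for the data-parallel model listed above, here is a minimal sketch in plain C++. It is not OpenCL or CUDA code and not taken from the course material: `vector_add_kernel` and `launch_vector_add` are hypothetical names, and a serial loop stands in for the device. The point is only that a kernel is a small function executed once per element, selected by a global index.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical "kernel": the body that a single work-item (OpenCL) or
// thread (CUDA) would execute. It handles exactly one element, which is
// selected by its global id.
void vector_add_kernel(std::size_t gid,
                       const std::vector<float>& a,
                       const std::vector<float>& b,
                       std::vector<float>& out) {
    out[gid] = a[gid] + b[gid];
}

// Hypothetical host-side launcher. On a GPU these iterations would run in
// parallel; here a serial loop emulates the device so the sketch runs on
// any machine.
std::vector<float> launch_vector_add(const std::vector<float>& a,
                                     const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t gid = 0; gid < a.size(); ++gid) {
        vector_add_kernel(gid, a, b, out);
    }
    return out;
}
```

Calling `launch_vector_add` on two vectors of equal length returns their element-wise sum. Because no iteration depends on another, the loop can be handed to thousands of GPU threads without changing the kernel body.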

Day 2 + 3: Optimise a program from scratch

During these two days, we will gradually raise the requirements and touch on all important aspects of OpenCL-programming.

As we have several GPU-servers available for development, we can provide you with login-credentials, a git-account and a short how-to for using the extra GPUs from NVIDIA and AMD, and optionally Intel. This way you can use different graphics cards to find out which optimisations work and which don’t.

Optimisations we discuss during these days:

  • Host-code
  • Data-flow
  • Memory handling
  • Data transfer speed increase
  • Memory alignments
  • Scheduling
  • Parallelism increase
  • Latency reduction
  • The most important kernel-optimisations from the different vendors

Lab sessions

During the days we use various lab-sessions to support the explained theory. We’ll discuss most of the following:

  • Clinfo
  • ColorBalance
  • Matrix-multiplication
  • Convolution
  • Histogram
  • Contrast Stretching
  • Frame correlation
  • Fixing non-optimal and broken code
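As an illustration of the kind of problem a histogram lab deals with, the sketch below shows one commonly used pattern in plain C++: each group of workers counts into its own private histogram, and the partial results are merged afterwards. This is an assumption about a typical approach, not the actual lab code; `Histogram` and `histogram_privatised` are hypothetical names, and the groups run serially here instead of on a GPU.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Histogram = std::array<std::uint32_t, 256>;

// Hypothetical sketch: count 8-bit pixel values using one private
// histogram per group, then merge. On a GPU the private copies avoid many
// threads contending for the same global counters.
Histogram histogram_privatised(const std::vector<std::uint8_t>& pixels,
                               std::size_t num_groups) {
    assert(num_groups > 0);
    std::vector<Histogram> partial(num_groups, Histogram{});  // all zeros
    const std::size_t chunk = (pixels.size() + num_groups - 1) / num_groups;

    // Phase 1: every group counts only its own chunk of the input.
    for (std::size_t g = 0; g < num_groups; ++g) {
        const std::size_t begin = std::min(g * chunk, pixels.size());
        const std::size_t end = std::min(begin + chunk, pixels.size());
        for (std::size_t i = begin; i < end; ++i) {
            ++partial[g][pixels[i]];
        }
    }

    // Phase 2: merge the private histograms into the final result.
    Histogram total{};
    for (const Histogram& h : partial) {
        for (std::size_t bin = 0; bin < total.size(); ++bin) {
            total[bin] += h[bin];
        }
    }
    return total;
}
```

The result does not depend on `num_groups`, which makes a single-group run a convenient reference when checking an optimised version for correctness.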

Day 4: Tools, special subjects and final project


There are various tools you need to understand to get code that runs well on GPUs. On this day you will learn to use various vendor-provided and open source tools to help you analyse your code.

The tools are discussed and used along the following four subjects:

  • Software correctness: data-races.
  • Profiling: timing and finding hot-spots.
  • Reporting: let tools create useful reports.
  • Debugging: finding bugs and learning what actually happens.
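For the timing part of profiling, a minimal sketch of a host-side stopwatch in plain C++ is shown below. It is an illustration only, not one of the vendor tools covered in the training; `best_time_ms` is a hypothetical helper. It runs a piece of work several times and keeps the fastest run, which reduces the influence of warm-up effects and system noise.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <limits>

// Hypothetical helper: run `work` `repeats` times and return the fastest
// wall-clock time in milliseconds.
// Note: GPU launches are usually asynchronous, so when timing device code
// this way, `work` must wait for the device to finish before it returns.
double best_time_ms(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (ms < best) {
            best = ms;
        }
    }
    return best;
}
```

A stopwatch like this only says how long something took. Finding out why, and where the hot-spots are, is what dedicated profilers are for.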

We discuss tools last, because you need to understand the concepts before letting a tool solve problems for you.

Special subjects

These often differ per training, as they are chosen by the attendees. Subjects that have been discussed most frequently:

  • Splitting work over CPU and GPU: Running different kernels on CPU and GPU, to make maximum use of the whole computer.
  • GL-CL interop: Understanding how interoperability with OpenGL works. With this the results can be shown on the screen with minimal latency.
  • Optimising data-throughput when using multiple kernels.

Final Project

In the final project you will need to use everything you’ve learnt and try to write the fastest code in the class.


  • Fixing non-optimal and/or broken code


We will send a questionnaire to understand the needs of each trainee. For larger groups, we also have a separate phone call with the representative.


Attendees need to bring their own laptops for the lab sessions. The only hardware requirement is that the laptop has an OpenCL-capable CPU or GPU with correctly installed OpenCL drivers. A complete list of OpenCL-compliant devices can be found here. The following software needs to be installed:

  • CMake 3.1 or higher: lab-sessions are provided as CMake projects, so almost all IDEs are supported.
  • An IDE or text-editor with coding-support.
  • One or more OpenCL SDKs, for each OpenCL-device in the computer.
  • A C/C++ compiler suite. Examples of supported suites are Microsoft Visual Studio, Apple Xcode and GNU GCC/G++. We will send a small project in advance, which can be used to test compilers.
  • ssh/putty: optional. Needed when working on Stream HPC’s GPU servers.
  • git: optional. When easy transfer of lab-work between several computers/servers is needed, this is required. Lab-sessions can also be downloaded as zip-files.


We prefer to focus on the core of the training, and therefore we require a set of skills up front. This avoids other attendees needing to wait.

Attendees are required to have intermediate programming experience and good C/C++ knowledge. This means that you should at least be able to write an application in C/C++ from scratch, debug it at a high level with GDB and be very (!) comfortable working with pointers. If this is not the case, do contact us so we can provide pre-training material.

Info Amsterdam training


The cost is €2500. Additional days for personal training and consultancy are excluded – please ask us for a quote.

Nearby Hotels

At walking distance there are various hotels. In random order:

We have good experience with the Amsterdam ID Aparthotel, which is nearby, affordable and even provides a kitchen in the apartment. Holiday Inn Express is next to the noisy railway station, so select it only as a last resort. There are three restaurants in the area, but many more in the city centre. In busy weeks AirBnB can offer good options too.

If you prefer a hotel in the city centre (central station is only 6 minutes by train) or want us to book one of the hotels, give us a call with your preferences and payment details, and we’ll arrange everything for you. The cost for full handling is €100 per group.

Reserve your spot today!

If you’re interested, do fill in the pre-training questionnaire already.

Initiating the reservation can be done by email or phone.
Phone: +31 854865760
