Asynchronous and parallel programming in C++26

Posted by Chris Tsiaousis on 11 June 2026 with 0 Comment

For the past decade, every C++ programmer who wanted to do real concurrent work has had the same conversation with themselves. std::async is a toy. std::thread is too low-level. std::future doesn’t compose. So you reach for TBB, or another third-party library, or a thread pool you wrote in 2017, or std::coroutine that you have to wire up to an executor you also wrote yourself.

C++26 finally puts an answer in the standard library: std::execution, which is C++’s execution control library, also known as senders/receivers, also known as P2300. It is the largest single addition to the language since modules, and it is going to take the community a few years to digest. This post is a tour. We’ll start with the parallel algorithm policies you may already know, build up to senders/receivers, schedulers, cancellation, and end with a worked example that ties everything together.

What is `std::execution`

std::execution defines a model for describing asynchronous work and provides a small set of generic algorithms for composing those descriptions into larger ones. The actual execution is delegated to a scheduler, which is a handle to whatever resource happens to run the work: a thread pool, a GPU queue, an I/O reactor, or the calling thread itself.

The three core abstractions are:

A scheduler is a lightweight handle that says where work runs.
A sender is a lazy description of what work runs. It produces one of three completion signals: a value, an error, or “stopped” (cancelled).
A receiver is the callback object that consumes those signals. You almost never write one by hand; the algorithms wire them up for you.

Continue reading →

The 13 application areas where OpenCL and CUDA can be used

Posted by Vincent Hindriksen on 3 June 2013 with 16 Comments

visitekaartje-achter-2013-V — Did you find your specialism in the list? The formula is the easiest introduction to GPGPU I could think of, including the need of auto-tuning.

Which algorithms map is best to which accelerator? In other words: What kind of algorithms are faster when using accelerators and OpenCL/CUDA?

Professor Wu Feng and his group from VirginiaTech took a close look at which types of algorithms were a good fit for vector-processors. This resulted in a document: “The 13 (computational) dwarves of OpenCL” (2011). It became an important document here in StreamHPC, as it gave a good starting point for investigating new problem spaces.

The document is inspired by Phil Colella, who identified seven numerical methods that are important for science and engineering. He named “dwarves” these algorithmic methods. With 6 more application areas in which GPUs and other vector-accelerated processors did well, the list was completed.

As a funny side-note, in Brothers Grimm’s “Snow White” there were 7 dwarves and in Tolkien’s “The Hobbit” there were 13.

Continue reading →

Intel OpenCL CPU-drivers 2013 beta with OpenCL 1.2 support

Posted by Vincent Hindriksen on 7 August 2012 with 7 Comments

38744 — Screenshot from Intel’s “God Rays” demo

This article is still work-in-progress

Intel has just released its OpenCL bit CPU-drivers, version 2013 bèta. It has support for OpenCL 1.1 (not 1.2 as for the CPU) on Intel HD Graphics 4000/2500 of the 3rd generation Core processors (Windows only). The release notes mention support for Windows 7 and 8, but the download-site only mentions windows 8. Support under Linux is limited to 64 bits.

The release notes mention:

General performance improvements for many OpenCL* kernels running on CPU.
Preview Tool: Kernel Builder (Windows)
Preview Feature: support of kernel source code hotspots analysis with the Intel VTuneT Amplifier XE 2011 update 3 or higher.
The GNU Project Debugger (GDB) debugging support on Linux operating systems.
New OpenCL 1.2 extensions supported by the CPU device:
- cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics
- cl_khr_fp16
- cl_khr_gl_sharing
- cl_khr_gl_event
- cl_khr_d3d10_sharing
- cl_khr_dx9_media_sharing
- cl_khr_d3d11_sharing.
OpenCL 1.1 extensions that were changed in OpenCL 1.2:
- Device Fission supports both OpenCL 1.1 EXT API’s and also OpenCL* 1.2 fission core features
- Media Sharing support intel 1.1 media sharing extension and also the 1.2 KHR media sharing extension
- Printf extension is aligned with OpenCL 1.2 core feature.

Check the release notes for full information.

The drivers can be found on http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk-2013/. Installation is simple. For Windows there is a installer. If you have Linux, make sure you remove any previous version of Intel’s openCL drivers. If you have a Debian-based Linux, use the command ‘alien’ to convert the rpm to deb, and make sure ‘libnuma1‘ is installed. There are requirements for libc 2.11 or 2.12 – more information on that later as Ubuntu 12.04 has libc6 2.15.

Continue reading “Intel OpenCL CPU-drivers 2013 beta with OpenCL 1.2 support” →

How expensive is an operation on a CPU?

Posted by Vincent Hindriksen on 16 July 2012 with 13 Comments

Programmers know the value of everything and the costs of nothing. I saw this quote a while back and loved it immediately. The quote by Alan Perlis is originally about ~~Perl~~ LISP-programmers, but only highly trained HPC-programmers seem to have obtained this basic knowledge well. In an interview with Andrew Richards of Codeplay I heard it from another perspective: software languages were not developed in a time that cache was 100 times faster than memory. He claimed that it should be exposed to the programmer what is expensive and what isn’t. I agreed again and hence this post.

I think it is very clear that programming languages (and/or IDEs) need to be redesigned to overcome the hardware-changes of the past 5 years. I talked about that in the article “Separation of compute, control and transfer” and “Lots of loops“. But it does not seem to be enough.

So what are the costs of each operation (on CPUs)?

This article is just to help you on your way, and most of all: to make you aware. Note it is incomplete and probably not valid for all kinds of CPUs.

Continue reading “How expensive is an operation on a CPU?” →

Basic Concepts: online kernel compiling

Posted by Vincent Hindriksen on 28 October 2011 with 1 Comment

Typos are a programmers worst nightmare, as they are bad for concentration. The code in your head is not the same as the code on the screen and therefore doesn’t have much to do with the actual problem solving. Code highlighting in the IDE helps, but better is to use the actual OpenCL compiler without running your whole software: an Online OpenCL Compiler. In short is just an OpenCL-program with a variable kernel as input, and thus uses the compilers of Intel, AMD, NVidia or whatever you have installed to try to compile the source. I have found two solutions, which both have to be built from source – so a C-compiler is needed.

CLCC. It needs the boost-libraries, cmake and make to build. Works on Windows, OSX and Linux (needs possibly some fixes, see below).
OnlineCLC. Needs waf to build. Seems to be Linux-only.

Continue reading “Basic Concepts: online kernel compiling” →

Kernels and the GPL. Are we safe and linking?

Posted by Vincent Hindriksen on 19 October 2011

Disclaimer: I am not a lawyer and below is my humble opinion only. The post is for insights only, not for legal matters.

GPL was always a protection that somebody or some company does not run away with your code and makes the money with it. Or at least force that improvements get back into the community. For unprepared companies this was quite some stress when they were forced to give their software away. Now we have host-kernels-languages such as OpenCL, CUDA, DirectCompute, RenderScript don’t really link a kernel, but load it and launch it. As GPL is quite complicated if it comes to mixing with commercial code, I try to give a warning that GPL might not be prepared for this.

If your software is dual-licensed, you cannot assume the GPL is not chosen when eventually used in commercial software. Read below why not.

I hope we can have a discussion here, so we get to the bottom of this.

Continue reading “Kernels and the GPL. Are we safe and linking?” →

Basic Concepts: OpenCL Convenience Methods for Vector Elements and Type Conversions

Posted by Vincent Hindriksen on 18 October 2011

In the series Basic Concepts I try to give an alternative description to what is said everywhere else. This time my eye fell on alternative convenience methods in two cases which were introduced there to be nice to devs with i.e. C/C++ and/or graphics backgrounds. But I see it explained too often from the convenience functions and giving the “preferred” functions as a sort of bonus which works for the cases the old functions don’t get it done. Below is the other way around and I hope it gives better understanding. I assume you have read another definition, so you see it from another view not for the first time.

Continue reading “Basic Concepts: OpenCL Convenience Methods for Vector Elements and Type Conversions” →

Installing both NVidia GTX and AMD Radeon on Linux for OpenCL

Posted by Vincent Hindriksen on 12 October 2011 with 6 Comments

August 2012: article has been completely rewritten and updated. For driver-specific issues, please refer to this article.

Want to have both your GTX and Radeon working as OpenCL-devices under Linux? The bad news is that attempts to get Radeon as a compute device and the GTX as primary all failed. The good news is that the other way around works pretty easy (with some luck). You need to install both drivers and watch out that libglx.so isn’t overwritten by NVidia’s driver as we won’t use that GPU for graphics – this is also the reason why it is impossible to use the second GPU for OpenGL.

Continue reading “Installing both NVidia GTX and AMD Radeon on Linux for OpenCL” →

AMD OpenCL coding competition

Posted by Vincent Hindriksen on 3 October 2011 with 1 Comment

The AMD OpenCL coding competition seems to be Windows 7 64bit only. So if you are on another version of Windows, OSX or (like me) on Linux, you are left behind. Of course StreamHPC supports software that just works anywhere (seriously, how hard is that nowadays?), so here are the instructions how to enter the competition when you work with Eclipse CDT. The reason why it only works with 64-bit Windows I don’t really get (but I understood it was a hint).

I focused on Linux, so it might not work with Windows XP or OSX rightaway. With little hacking, I’m sure you can change the instructions to work with i.e. Xcode or any other IDE which can import C++-projects with makefiles. Let me know if it works for you and what you changed.

Continue reading “AMD OpenCL coding competition” →

Qt Creator OpenCL Syntax Highlighting

Posted by Vincent Hindriksen on 7 July 2011

With highlighting for Gedit, I was happy to give you the convenience of a nice editor to work on OpenCL-files. But it seems that one of the most popular IDEs for C++-programming is Qt Creator. So you receive another free syntax highlighter. You need at least Qt Creator 2.1.0.

The people of Qt have written everything you need to know about their Syntax highlighting, which was enough help to create this file. You see that they use the system of Kate, so logically this file works with this editor too.

In this article there is all you need to know to use Qt Creator with OpenCL.

Installing

First download the file to your computer.

Under Windows and OSX you need to copy this file to the directory shareqtcreatorgeneric-highlighter in the Qt installation dir (i.e. c:Qtqtcreator-2.2.1shareqtcreatorgeneric-highlighter). Under Linux copy this file to ~/.kde/share/apps/katepart/syntax or to /usr/share/kde4/apps/katepart/syntax (all users). That’s all, have fun!

Install OpenCL on Debian, Ubuntu and Mint orderly

Posted by Vincent Hindriksen on 24 June 2011 with 22 Comments

If you read different types of manuals how to compile OpenCL software on Linux, then you can get dizzy of all the LD-parameters. Also when installing the SDKs from AMD, Intel and NVIDIA, you get different locations for libraries, header-files, etc. Now GPGPU is old-fashioned and we go for heterogeneous programming, the chances get higher you will have more SDKs on your machine. Also if you want to keep it the way you have, reading this article gives you insight in what the design is after it all. Note that Intel’s drivers don’t give OpenCL support for their GPUs, but CPUs only.

As my mother said when I was young: “actually cleaning up is very simple”. I’m busy creating a PPA for this, but that will take some more time.

First the idea. For developers OpenCL consists of 5 parts:

GPUs-only: drivers with OpenCL-support
The OpenCL header-files
Vendor specific libraries (needed when using -lOpenCL)
libOpenCL.so -> a special driver
An installable client driver

Currently GPU-drivers are always OpenCL-capable, so you only need to secure 4 steps. These are discussed below.

Please note that in certain 64-bit distributions there is not lib64, but only ‘lib’ and ‘lib32’. If that is the case for you, you can use the commands that are mentioned with 32-bit.

Continue reading “Install OpenCL on Debian, Ubuntu and Mint orderly” →

Intel’s OpenCL SDK examples for GCC

Posted by Vincent Hindriksen on 20 June 2011 with 3 Comments

Update august 2012: There is a new post for the latest Linux examples.

Note: these patches won’t work anymore! You can learn from the patches how to fix the latest SDK-code for GCC and Linux/OSX.

Code-examples are not bundled with the Linux OpenCL SDK 1.1 beta. Their focus is primarily Windows, so VisualStudio seems to be a logical target. I just prefer GCC/LLVM which you can get to work with all OSes. After some time trying to find the alternatives for MS-specific calls, I think I managed. Since ShallowWater uses DirectX and is quite extensive, I did not create a patch for that one – sorry for that.

I had a lot of troubles getting the BMP-export to work, because serialisation of the struct added an extra short. Feedback (such as a correct BMP-export of a file) is very welcome, since I the colours are correct. For the rest: most warnings are removed and it just works – tested with g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2 on 64 bit (llvm-g++-4.2 seems to work too, but not fully tested).

THE PATCHES ARE PROVIDED AS IS – NO WARRANTIES!

Continue reading “Intel’s OpenCL SDK examples for GCC” →

ImageJ and OpenCL

Posted by Vincent Hindriksen on 30 January 2011 with 5 Comments

For a customer I’m writing a plugin for ImageJ, a toolkit for image-processing and analysis in Java. Rick Lentz has written an OpenCL-plugin using JOCL. In the tutorial step 1 is installing the great OS Ubuntu, but that would not be the fastest way to get it going, and since JOCL is multi-platform this step should be skippable. Furthermore I rewrote most of the code, so it is a little more convenient to use.

In this blog-post I’ll explain how to get it up and running within 10 minutes with the provided information.

Continue reading “ImageJ and OpenCL” →

Engineering GPGPU into existing software

Posted by Vincent Hindriksen on 5 November 2010 with 1 Comment

At the Thalesian talk about OpenCL I gave in London it was quite hard to find a way to talk about OpenCL for a very diverse public (without falling back to listing code-samples for 50 minutes); some knew just everything about HPC and other only heard of CUDA and/or OpenCL. One of the subjects I chose to talk about was how to integrate OpenCL (or GPGPU in common) into existing software. The reason is that we all have built nice, cute little programs which were super-fast, but it’s another story when it must be integrated in some enterprise-level software.

Readiness

The most important step is making your software ready. Software engineering can be very hectic; managing this in a nice matter (i.e. PRINCE2) just doesn’t fit in a deadline-mined schedule. We all know it costs less time and money when looking at the total picture, but time is just against.

Let’s exaggerate. New ideas, new updates of algorithms, new tactics and methods arrive on the wrong moment, Murphy-wise. It has to be done yesterday, so testing is only allowed when the code will be in the production-code too. Programmers just have to understand the cost of delay, but luckily is coming to the rescue and says: “It is my responsibility”. And after a year of stress your software is the best in the company and gets labelled as “platform”; meaning that your software is chosen to include all small ideas and scripts your colleagues have come up “which are almost the same as your software does, only little different”. This will turn the platform into something unmanageable. That is a different kind of software-acceptance!

Continue reading “Engineering GPGPU into existing software” →

Qt Hello World

Posted by Vincent Hindriksen on 28 May 2010 with 1 Comment

The earlier blog-post was about how to use Qt Creator with OpenCL. The examples are all around Images, but nowhere a simple Hello World. So here it is: AMD’s infamous OpenCL Hello World in Qt. Thank’s to linuxjunk for glueing the parts together.



int main(int argc, char *argv[]) {

    // Define the kernel. Take a good look what it does.
    QByteArray prog(
    "#pragma OPENCL EXTENSION cl_khr_byte_addressable_store : enablen" 
    "__constant char hw[] = "Hello World from Qt!"; n" 
    "__kernel void hello(__global char * out) {n" 
    "  size_t tid = get_global_id(0); n" 
    "  out[tid] = hw[tid]; n" 
    "}n"
    );

     // Get a context on the first available GPU.
     QCLContext context;
     if (!context.create(QCLDevice::GPU))
         qFatal("Could not create OpenCL context");

     // Allocate 100 bytes of memory on the Host.
     size_t mem_size = 100;
     char* outH = new char[mem_size];
     // Allocate buffer on the Device.
     QCLBuffer outCL = context.createBufferHost(outH, sizeof(char) * mem_size,
                                                QCLMemoryObject::WriteOnly);

     // Compile program against device
      QCLProgram program = context.buildProgramFromSourceCode(prog);

     // Create a kernel object, tell it we are using the kernel called "hello".
     QCLKernel kernel = program.createKernel("hello");

     // Set the necessary global memory. In this case it is the buffer-size.
     kernel.setGlobalWorkSize(outCL.size(), 1);

     // Turn on profiling.
     QCLCommandQueue queue = context.commandQueue();
     queue.setProfilingEnabled(true);

     // Queue the kernel up to run.
     // Give it an argument which is the memory we allocated above.
     QCLEvent event = kernel(outCL);

     // Use the event object above to block until processing has completed.
     event.waitForFinished();

     // Timing only works with profiling on. runTime is unsigned.
     printf(" time '%u'n", event.runTime());

     // Read the results out of the shared memory area.
     outCL.read(outH, mem_size);

     // Write to screen.
     printf(" result = '%s'", outH);
}

Have fun!

Using Qt Creator for OpenCL

Posted by Vincent Hindriksen on 25 May 2010 with 2 Comments

More and more ways are getting available to bring easy OpenCL to you. Most of the convenience libraries are wrappers for other languages, so it seems that C and C++ programmers have the hardest time. Since a while my favourite way to go is Qt: it is multi-platform, has a good IDE, is very extensive, has good multi-core and OpenGL-support and… has an extension for OpenCL: ~~http://labs.trolltech.com/blogs/2010/04/07/using-opencl-with-qt~~ http://blog.qt.digia.com/blog/2010/04/07/using-opencl-with-qt/

Other multi-platform choices are Anjuta, CodeLite, Netbeans and Eclipse. I will discuss them later, but wanted to give Qt an advantage because it also simplifies your OpenCL-development. While it is great for learning OpenCL-concepts, please know that the the commercial version of Qt Creator costs at least €2995,- a year. I must also warn the plugin is still in beta.

streamhpc.com is not affiliated with Qt.

Getting it all

Qt Creator is available in most Linux-repositories: install packages ‘qtcreator’ and ‘qt4-qmake’. For Windows, MAC and the other Linux-distributions there are installers available: http://qt.nokia.com/downloads. People who are not familiar with Qt, really should take a look around on http://qt.nokia.com/.

You can get the source for the plugin QtOpenCL, by using GIT:

git clone http://git.gitorious.org/qt-labs/opencl.git QtOpenCL

See http://qt.gitorious.org/qt-labs/opencl for more information about the status of the project.

You can download it here: https://dl.dropbox.com/u/1118267/QtOpenCL_20110117.zip (version 17 January 2011)

Building the plugin

For Linux and MAC you need to have the ‘build-essentials’. For Windows it might be a lot harder, since you need make, gcc and a lot of other build-tools which are not easily packaged for the Windows-OS. If you’ve made a win32-binary and/or a Windows-specific how-to, let me know.

You might have seen that people have problems building the plugin. The trick is to use the options -qmake and -I (capital i) with the configure-script:

./configure -qmake <location of qmake 4.6 or higher> -I<location of directory CL with OpenCL-headers>

make

Notice the spaces. The program qmake is provided by Qt (package ‘qt4-qmake’), the OpenCL-headers by the SDK of ATI or NVidia (you’ll need the SDK anyway), or by Khronos. By example, on my laptop (NVIDIA, Ubuntu 32bit, with Qt 4.7):

./configure -qmake /usr/bin/qmake-qt4 -I/opt/NVIDIA_GPU_Computing_SDK_3.2/OpenCL/common/inc/

make

This should work. On MAC the directory is not CL, but OpenCL – I haven’t tested it if Qt took that into account.

After building , test it by setting a environment-setting “LD_LIBRARY_PATH” to the lib-directory in the plugin, and run the provided example-app ‘clinfo’. By example, on Linux:

export LD_LIBRARY_PATH=`pwd`/lib:$LD_LIBRARY_PATH

cd util/clinfo/

./clinfo

This should give you information about your OpenCL-setup. If you need further help, please go to the Qt forums.

Configuring Qt Creator

Now it’s time to make a new project with support for OpenCL. This has to be done in two steps.

First make a project and edit the .pro-file by adding the following:

LIBS += -L<location of opencl-plugin>/lib -L<location of OpenCL-SDK libraries> -lOpenCL -lQtOpenCL

INCLUDEPATH += <location of opencl-plugin>/lib/

<location of OpenCL-SDK include-files>

<location of opencl-plugin>/src/opencl/

By example:

LIBS += -L/opt/qt-opencl/lib -L/usr/local/cuda/lib -lOpenCL -lQtOpenCL

INCLUDEPATH += /opt/qt-opencl/lib/

/usr/local/cuda/include/

/opt/qt-opencl/src/opencl/

The following screenshot shows how it could look like:

Second we edit (or add) the LD_LIBRARY_PATH in the project-settings (click on ‘Projects’ as seen in screenshot):

/usr/lib/qtcreator:location of opencl-plugin>:<location of OpenCL-SDK libraries>:

By example:

/usr/lib/qtcreator:/opt/qt-opencl/lib:/usr/local/cuda/lib:

As you see, we now also need to have the Qt-creator-libraries and SDK-libraries included.

The following screenshot shows the edit-field for the project-environment:

Testing your setup

Just add something from the clinfo-source to your project:

printf("OpenCL Platforms:n"); 
QList platforms = QCLPlatform::platforms();
foreach (QCLPlatform platform, platforms) { 
   printf("    Platform ID       : %ldn", long(platform.platformId())); 
   printf("    Profile           : %sn", platform.profile().toLatin1().constData()); 
   printf("    Version           : %sn", platform.version().toLatin1().constData()); 
   printf("    Name              : %sn", platform.name().toLatin1().constData()); 
   printf("    Vendor            : %sn", platform.vendor().toLatin1().constData()); 
   printf("    Extension Suffix  : %sn", platform.extensionSuffix().toLatin1().constData());  
   printf("    Extensions        :n");
} QStringList extns = platform.extensions(); 
foreach (QString ext, extns) printf("        %sn", ext.toLatin1().constData()); printf("n");

If it gives errors during programming (underlined includes, etc), focus on INCLUDEPATH in the project-file. If it complaints when building the application, focus on LIBS. If it complaints when running the successfully built application, focus on LD_LIBRARY_PATH.

Ok, it is maybe not that easy to get it running, but I promise it gets easier after this. Check out our Hello World, the provided examples and http://doc.qt.nokia.com/opencl-snapshot/ to start building.

Tag: Programming

What is std::execution