Asynchronous and parallel programming in C++26

Posted by Chris Tsiaousis on 11 June 2026 with 0 Comment

For the past decade, every C++ programmer who wanted to do real concurrent work has had the same conversation with themselves. std::async is a toy. std::thread is too low-level. std::future doesn’t compose. So you reach for TBB, or another third-party library, or a thread pool you wrote in 2017, or std::coroutine that you have to wire up to an executor you also wrote yourself.

C++26 finally puts an answer in the standard library: std::execution, which is C++’s execution control library, also known as senders/receivers, also known as P2300. It is the largest single addition to the language since modules, and it is going to take the community a few years to digest. This post is a tour. We’ll start with the parallel algorithm policies you may already know, build up to senders/receivers, schedulers, cancellation, and end with a worked example that ties everything together.

What is `std::execution`

std::execution defines a model for describing asynchronous work and provides a small set of generic algorithms for composing those descriptions into larger ones. The actual execution is delegated to a scheduler, which is a handle to whatever resource happens to run the work: a thread pool, a GPU queue, an I/O reactor, or the calling thread itself.

The three core abstractions are:

A scheduler is a lightweight handle that says where work runs.
A sender is a lazy description of what work runs. It produces one of three completion signals: a value, an error, or “stopped” (cancelled).
A receiver is the callback object that consumes those signals. You almost never write one by hand; the algorithms wire them up for you.

Continue reading →

NVIDIA: mobile phones, tablets and HPC (cloud)

Posted by Vincent Hindriksen on 12 May 2012

If you want to see what is coming up in the market of consumer-technology (PC, mobile and tablet), then NVIDIA can tell you the most. The company is very flexible, and shows time after time it really knows in which markets is currently operates and can enter. I sometimes strongly disagree with their marketing, but watch them closely as they are in the most important markets to define the near future in: PCs, Mobile/Tablet and HPC.

You might think I completely miss interconnects (buses between processors, devices and memory) and memory-technologies as clouds have a large need for high-speed data-transport, but the last 20 years have shown that this is a quite stable developing market based on IP-selling to the hardware-vendors. With the acquisition of Cray’s interconnect technology, we have seen this is serious business for Intel, so things might change indeed. For this article I want to focus on NVIDIA’s choices.

Continue reading “NVIDIA: mobile phones, tablets and HPC (cloud)” →

Let’s enter the Top500 HPC list using GPUs

Posted by Vincent Hindriksen on 6 June 2010 with 1 Comment

The #500 super-computer has only 24 TFlops (2010-06-06): http://www.top500.org/system/9677

update: scroll down to see the best configuration I have found. In other words: a cluster with at least 30 nodes with 4 high-end GPUs each (costing almost €2000,- per node and giving roughly 5 TFlops single precision, 1 TFLOPS double precision) would enter the Top500. 25 nodes to get to a theoretic 25TFlops and 5 extra for overcoming the overhead. So for about €60 000,- of hardware anyone can be on the list (and add at least €13 000 if you want to use Windows instead of Linux for some reason). Ok, you pay most for the services and actual building when buying such a cluster, but you get the idea it does not cost you a few millions any more. I’m curious: who is building these kind of clusters? Could you tell me the specs (theoretical TFlops, LinPack TFlops and watts/TFlop) of your (theoretical) cluster, which costs the customer less then €100 000,- in total? Or do you know companies who can do this? I’ll make a list of companies who will be building the clusters of tomorrow, the “Top €100.000,- HPC cluster list”. You can mail me via vincent [at] this domain, or put your answer in a comment.

Update: the hardware shopping-list

Nobody told in the remarks it is easy to build a faster machine than the one described above. So I’ll do it. We want the most flops per box, so here’s the wishlist:

A motherboard with as many slots as possible for PCI-E, CPU-sockets and memory-banks. This because the lag between the nodes is high.
A CPU with at least 4 cores.
Focus on the bandwidth, else we will not be able to use all power.
Focus on price per GFLOPS.

The following is what I found in local computer stores (which for some reason people there love to talk about extreme machines). AMD currently has the graphics cards with the most double precision power, so I chose for their products. I’m looking around for Intel + Nvidia, but currently they are far behind. Is AMD back on stage after being beaten by Intel’s Core-products for so many years?

The GigaByte GA-890FXA-UD7 (€245,-) has 1 AM3-socket, 6(!) PCI-e slots and supports up to 16GB of memory. We want some power, so we use the AMD Phenom II X6 1090T (€289,-), which I chose for the 6 cores and the low price per FLOPS. And to make it a monster, we add 6 times a AMD HD5970 (€599,-) giving 928 x 6 = 3264 DP-GLOPS. If it can handle 16GB DDR3 (€750,-), so we put it in. It needs about 3 Power-supplies of 700 Watt (€100,-). We add 128GB SSD (€350,-) for working data and a big 2 TB HDD (€100,-). Case needs to house the 3 power supplies (€100,-). Cooling is important and I suggest you compete with a wind-tunnel (€500,-). It will cost you €6228,- for 5,6 Double Precision TFLOPS, and 27 TFLOPS single precision. A cluster would be on the HPC500-list for around €38000,- (pure hardware-price, not taking network-devices too much into account, nor the price for man-hours).

Disclaimer: this is the price of a single node, excluding services, maintenance, software-installation, networking, engineering, etc. Please note that the above price is pure for building a single node for yourself, if you have the knowledge to do so.

Tag: HPC

What is std::execution

Continue reading “NVIDIA: mobile phones, tablets and HPC (cloud)” →

Update: the hardware shopping-list

What is `std::execution`