Power to the Vector Processor

Reducing energy-consumption is “hot”

The Inquirer's article “Nvidia is losing on the HPC front” mixes up the demand for low-power architectures with the other side of the market: the demand for high performance. It made me realise that it is not that obvious that there are two markets using the same technology. Nvidia has also proven the article's claim wrong, since the super-computer “Nebulae” uses almost half the watts per flop of the #1. How come? I quote The Register from an article that is one year old:

>>When you do the math, as far as Linpack is concerned, Jaguar takes just under 4 watts to deliver a megaflops at a cost of $114 per megaflops for the iron, while Nebulae consumes 2 watts per megaflops at a cost of $39 per megaflops for the system. And there is little doubt that the CUDA parallel computing environment is only going to get better over time and hence more of the theoretical performance of the GPU ends up doing real work. (Nvidia is not there yet. There is still too much overhead on the CPUs as they get hammered fielding memory requests for GPUs on some workloads.)<<

Nvidia is (and should be) very proud. But I'm already looking forward to the time when hybrids become more common. They will really shake up the HPC-market (as The Register agrees) by lowering both the latency between GPU and CPU and the energy-consumption. But an even bigger market can be found in mobile.

Reducing Watts per Flop

It is mostly ARM and Intel that are in the market for low-power devices, but GPUs can do a lot too, as Nvidia has proven. AMD also joined in this year with its low-power Fusion-line. It is not a coincidence that GPUs are used for computations across the whole range from mobile devices to super-computers. It is really simple: say the GPU draws twice as much power as the CPU. With a speed-up of 4 times the job finishes in a quarter of the time, so the resulting energy-consumption is still reduced by 50%.
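The arithmetic above can be sketched in a few lines of Python. This is only an illustration of "energy = power × time"; the function name `relative_energy` and its parameters are made up for this example and do not come from any library.

```python
def relative_energy(power_ratio: float, speedup: float) -> float:
    """Energy used by the accelerator relative to the CPU baseline.

    Energy = power x time. If the accelerator draws `power_ratio` times
    the CPU's power and finishes `speedup` times sooner, the energy ratio
    is power_ratio / speedup. (Hypothetical helper, for illustration only.)
    """
    if power_ratio <= 0 or speedup <= 0:
        raise ValueError("power_ratio and speedup must be positive")
    return power_ratio / speedup


# The example from the text: twice the power draw, four times faster.
ratio = relative_energy(power_ratio=2.0, speedup=4.0)
print(f"GPU energy relative to CPU: {ratio:.0%}")
print(f"Energy saved: {1 - ratio:.0%}")
```

The same function also shows the break-even point: as long as the speed-up is larger than the increase in power draw, the ratio stays below 1 and the GPU version uses less energy in total.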

At StreamHPC we see a whole new market popping up in mobile, and we want to be part of it. It is also not a coincidence that I wrote about new materials which could decrease power-consumption. If you look around, there is a huge consumer-market in need of more compute power for less energy, and that need has grown as fast as the app-revolution. As OpenCL can solve this problem, it is no surprise that many visitors of this site search for OpenCL on ARM.

As we learnt from the interview with ZiiLabs a week ago, with OpenCL a recently released mobile processor is already faster than the fastest Pentium 4. This means big data can be processed locally without a compute-cloud, opening doors to many new possibilities that are currently planned for a few years from now. So from super-computing to mobile devices, OpenCL (and similar methods) will give green computing a new meaning. With many governments stimulating any green investment and the many advantages of reduced energy-costs, I expect these markets to grow much more rapidly than the PC-market did over the past 10 years.

The green 500

The green top 500 is an overview of the super-computers which have the most FLOPS per Watt. I just want to focus on the top 10. The greenest is the upcoming IBM Blue Gene/Q, but I cannot tell much about that machine yet. You find AMD Radeon at 3 and 10, and Nvidia Tesla at 4 and 5. At 6 comes the world's fastest super-computer, the K. At places 7 to 9 you find IBM's PowerXCell 8i. What is so special about this list is that there are a lot of vector processors on it, and most can be programmed with OpenCL. The GPUs of AMD and Nvidia are well known to be programmable with OpenCL, but the PowerXCell 8i can be programmed with it too, which makes 7 out of 10. So we will see more and more hybrids between scalar and vector processors in the coming years.

Your own power reduction

What can you do? It's clear that by using vector processors you can reduce power-consumption a lot. You can investigate whether your software can be made greener using recent processors. Your new mobile app which drained the battery too quickly can also be made ready for the market using OpenCL. And when you are buying a compute cluster, check whether the type of processors matches the problems you solve, so its efficiency is optimal.

Do you need advice or more info? Contact us – we have a network of partners around the world and can serve any demand.