A few months ago I wrote a theoretical article on how to build the cheapest and greenest supercomputer to enter the Top500 and Green500. There I showed that AMD would theoretically win on both GFLOPS/costs and GFLOPS/Watt. Last week I learned a large cluster is actually being built in Germany, which now leads the Green500 (GFLOPS/Watt). It is powered by Intel Ivy Bridge CPUs, an FDR Infiniband network and accelerated by air-cooled(!) AMD FirePro S9150 GPUs, as can be seen on the Green 500 report of November. The score: 5.27 GFLOPS per Watt, mostly because of AMD’s surprise act: extremely efficient SGEMM and DGEMM.
The first NVIDIA Tesla-based system on the list is at #3 with 4.45 GFLOPS per Watt for a liquid cooled system. If the AMD FirePro S9150 would be oil or water cooled, the system could go to over 6 GFLOPS per Watt. I’m expecting such system on the Green500 of June. The PEZY-SC (#2 on the list) is a very interesting, unexpected newcomer to the field – I’ll share more with you later, as I heard it supports OpenCL.
The price metric
The cluster at GSI Helmholtz Center has around 1.65 double precision PetaFlops (theoretical). Let’s do the same calculation as with the 150 GFLOPS system using the latest prices, only taking the accelerator part.
640 x AMD FirePro S9150.
- 2.53 GFLOPS * 640 = 1.62 TFLOPS (I rounded down to 2.0 GFLOPS in the other article)
- US$ 3300. Total price: $2.112M. Price per TFLOPS: $1.304M
- 235 Watt * 640 = 150 kWatt (excluding network, CPU, etc)
640 x NVIDIA Tesla K40x
- 1.42 GFLOPS * 640 = 0.91 TFLOPS
- US$ 3160 (got down a lot due to introduction K80!). Total price: $2.022M. Price per TFLOPS: $2.225M
- 235 Watt * 640 = 150 kWatt
640 x Intel XeonPhi 7120P
- 1.21 GFLOPS * 640 = 0.65 TFLOPS
- US$ 3450. Total price: 2.208$M. Price per TFLOPS: $3.397M
- 300 Watt * 640 = 192 kWatt
So it’s pretty clear, why GSI chose AMD: $92M or $209M less costs for the same GFLOPS. Also note that more GFLOPS per accelerator is important to lower overhead.
What to expect from June’s Green500
Next year Nvidia probably comes with Maxwell, which probably will do very well in the Green500. Intel has their new XeonPhi, but it’s a very new architecture and no samples have arrived yet – I would be surprised, as they over-promised for too long now. Besides bringing surprises, Intel’s other strengths are its vast collaborations and strong fanbase – the past years I heard the most ridiculous responses on why such underperforming accelerator was chosen instead of FirePro or Tesla, so it’s certainly aiming for a rampage (based on hope). AMD did not enclose any information on a new version of the S9150 (Something like S9200 or S9250).
Then there are the dual GPUs, which have no advantages but lower energy-usage. The K80 just arrived, but the number don’t add up yet – we’ll have to see when the samples arrive. AMD did not say anything about the next version of the S10000, but probably arrives next year – no ETA. Intel did not do dual-chip cards until now. These systems can be built more compact, as 4 GPUs per system is becoming a standard.
Another important change will be the CPUs with embedded CPU being used in the clusters, where now mostly Intel Xeons rule the world. Intel’s Iris Pro line and AMD new Carrizo APU could certainly get more popular, as more complex code can be accelerated very well by such processors. Also 64-bit ARM-processors we’ll see more – hopefully with GPU. This subject I’ll handle in a separate article, as OpenCL could be a big enabler for easy offloading.
Based on the current information I have available, Nvidia aims for Maxwell based Teslas, AMD with S9150 and the dual-GPU variant, Intel with none (aiming for November 2015). It’ll be exciting to see HPC get to 6+ GFLOPS/Watt as a standard – I find that more important than building the biggest cluster.
OpenCL will help select hardware from that year’s winner, not being locked in to that year’s loser. Meanwhile at StreamHPC we will keep building OpenCL-based software, to help our customers pick that winner.