AMD now leads the Green500

With SC14 behind us, there are a few things I'd like to share with you. I'd like to start with the biggest win for OpenCL: AMD powering the most power-efficient GPU cluster.

A few months ago I wrote a theoretical article on how to build the cheapest and greenest supercomputer to enter the Top500 and Green500. There I showed that AMD would theoretically win on both GFLOPS/cost and GFLOPS/Watt. Last week I learned that a large cluster is actually being built in Germany, and it now leads the Green500 (GFLOPS/Watt). It is powered by Intel Ivy Bridge CPUs and an FDR InfiniBand network, and accelerated by air-cooled(!) AMD FirePro S9150 GPUs, as can be seen in the November Green500 report. The score: 5.27 GFLOPS per Watt, mostly thanks to AMD's surprise act: extremely efficient SGEMM and DGEMM.


The first NVIDIA Tesla-based system on the list is at #3: a liquid-cooled system with 4.45 GFLOPS per Watt. If the AMD FirePro S9150 were oil- or water-cooled, the system could reach over 6 GFLOPS per Watt. I expect to see such a system on the June Green500. The PEZY-SC (#2 on the list) is a very interesting, unexpected newcomer to the field; I'll share more about it later, as I heard it supports OpenCL.
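A rough check on the "over 6 GFLOPS per Watt" estimate: assuming the LINPACK result stays the same and liquid cooling only lowers the power draw (for example by removing fan power; this is an assumption, not a measurement), the required power saving follows directly from the two efficiencies, since efficiency = FLOPS / power:

```latex
\frac{P_{\text{liquid}}}{P_{\text{air}}}
  = \frac{5.27\ \text{GFLOPS/W}}{6\ \text{GFLOPS/W}} \approx 0.88
\quad\Longrightarrow\quad
\text{power saving} \approx 1 - 0.88 = 12\%
```

So liquid cooling would need to cut total system power by roughly 12% for that estimate to hold.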

The price metric

The cluster at the GSI Helmholtz Center has around 1.65 double precision PFLOPS (theoretical peak). Let's repeat the calculation from the 150 TFLOPS system in that article, using the latest prices and counting only the accelerators.

640 x AMD FirePro S9150

  • 2.53 TFLOPS * 640 = 1.62 PFLOPS (I rounded down to 2.0 TFLOPS in the other article)
  • US$ 3300 each. Total price: $2.112M. Price per PFLOPS: $1.304M
  • 235 W * 640 = 150 kW (excluding network, CPUs, etc.)

640 x NVIDIA Tesla K40x

  • 1.42 TFLOPS * 640 = 0.91 PFLOPS
  • US$ 3160 each (the price dropped a lot with the introduction of the K80!). Total price: $2.022M. Price per PFLOPS: $2.225M
  • 235 W * 640 = 150 kW

640 x Intel Xeon Phi 7120P

  • 1.21 TFLOPS * 640 = 0.77 PFLOPS
  • US$ 3450 each. Total price: $2.208M. Price per PFLOPS: $2.851M
  • 300 W * 640 = 192 kW

So it's pretty clear why GSI chose AMD: $0.92M (vs. NVIDIA) or $1.55M (vs. Intel) less per PFLOPS. Also note that more FLOPS per accelerator matters, as it lowers the overhead: fewer cards per PFLOPS means fewer host nodes, CPUs and network ports.
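To make the arithmetic above easy to check, or to rerun with newer prices, here is a small Python sketch. It only uses the per-card figures quoted in this article (theoretical peak double precision TFLOPS, late-2014 card prices, board power); the class and method names are my own, and like the comparison above it ignores CPUs, network, cooling and everything else in the cluster. The GFLOPS/W column is theoretical peak per board, not a LINPACK/Green500 result.

```python
# Accelerator-only cost/power comparison, using the per-card figures quoted
# in the article (peak DP TFLOPS, US$ per card, board watts).
# Host CPUs, network and cooling are deliberately left out.
from dataclasses import dataclass

CARDS = 640  # number of accelerators in the GSI cluster


@dataclass
class Accelerator:
    name: str
    dp_tflops: float  # theoretical peak double precision per card
    price_usd: float  # price per card (late-2014 figure)
    watts: float      # board power per card

    def total_pflops(self, n: int = CARDS) -> float:
        return self.dp_tflops * n / 1000.0

    def total_price_musd(self, n: int = CARDS) -> float:
        return self.price_usd * n / 1e6

    def musd_per_pflops(self) -> float:
        # Independent of the card count: it cancels out in the ratio.
        return self.total_price_musd() / self.total_pflops()

    def total_kw(self, n: int = CARDS) -> float:
        return self.watts * n / 1000.0

    def peak_gflops_per_watt(self) -> float:
        # Theoretical peak per board, not a measured Green500 score.
        return self.dp_tflops * 1000.0 / self.watts


ACCELERATORS = [
    Accelerator("AMD FirePro S9150", 2.53, 3300, 235),
    Accelerator("NVIDIA Tesla K40x", 1.42, 3160, 235),
    Accelerator("Intel Xeon Phi 7120P", 1.21, 3450, 300),
]

if __name__ == "__main__":
    amd = ACCELERATORS[0]
    for acc in ACCELERATORS:
        extra = acc.musd_per_pflops() - amd.musd_per_pflops()
        print(f"{acc.name:22s} {acc.total_pflops():5.2f} PFLOPS  "
              f"${acc.total_price_musd():.3f}M  "
              f"${acc.musd_per_pflops():.3f}M/PFLOPS "
              f"(+${extra:.2f}M vs AMD)  "
              f"{acc.total_kw():.0f} kW  "
              f"{acc.peak_gflops_per_watt():.1f} peak GFLOPS/W")
```

Because the card count cancels out in the price per PFLOPS, changing `CARDS` only affects the totals, not the ranking.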

What to expect from June’s Green500

Next year Nvidia will probably come with Maxwell, which will likely do very well in the Green500. Intel has its new Xeon Phi, but it's a very new architecture and no samples have arrived yet; I would be surprised if it delivered, as Intel has over-promised for too long now. Besides surprises, Intel's other strengths are its vast collaborations and strong fanbase: over the past years I heard the most ridiculous reasons why such an underperforming accelerator was chosen instead of a FirePro or Tesla, so the new Xeon Phi will certainly be pushed hard (based on hope). AMD did not disclose any information on a new version of the S9150 (something like an S9200 or S9250).

Then there are the dual-GPU cards, whose main advantage is lower energy usage. The K80 just arrived, but the numbers don't add up yet; we'll have to see when the samples arrive. AMD did not say anything about the next version of the S10000, but it will probably arrive next year (no ETA). Intel has not made dual-chip cards so far. Systems with dual-GPU cards can be built more compactly, as 4 GPUs per system is becoming the standard.

Another important change will be CPUs with an embedded GPU being used in clusters, where Intel Xeons currently rule the world. Intel's Iris Pro line and AMD's new Carrizo APU could certainly become more popular, as more complex code can be accelerated very well by such processors. We'll also see more 64-bit ARM processors, hopefully with a GPU. I'll cover this subject in a separate article, as OpenCL could be a big enabler for easy offloading.

Based on the information I currently have, for June Nvidia aims for Maxwell-based Teslas, AMD for the S9150 and its dual-GPU variant, and Intel for nothing (it is aiming for November 2015). It'll be exciting to see HPC reach 6+ GFLOPS/Watt as a standard; I find that more important than building the biggest cluster.

OpenCL lets you pick hardware from each year's winner instead of being locked in to that year's loser. Meanwhile at StreamHPC we will keep building OpenCL-based software, to help our customers pick that winner.
