AMD recently announced its new FirePro GPUs for servers: the S9000 (shown at the right) and the S7000. They use passive cooling, as server racks are already actively cooled. AMD's server partners will have products ready in Q1 2013 or even earlier. SuperMicro, Dell and HP will probably be among the first.

What does this mean? We finally get a very good alternative to TESLA: servers with probably 2 (1U) or 4+ (3U) FirePro S9000 GPUs, giving 6.46 to 12.92 TFLOPS or more of theoretical single-precision performance on top of the available CPU. At StreamHPC we are happy with that, as AMD is a strong OpenCL supporter and the FirePro S9000 gives more performance per GPU than the TESLA K10 (see the comparison below). The S9000 also outperforms the unreleased Intel Xeon Phi in single precision and is close in double precision.
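The aggregate figures above are simply the per-GPU peak multiplied by the number of GPUs. A minimal sketch of that arithmetic, assuming the S9000's single-precision peak of 3.23 TFLOPS from the comparison table and treating the 2-GPU and 4-GPU chassis as hypothetical configurations (the function name is illustrative only):

```python
# Peak single-precision throughput of one FirePro S9000 in TFLOPS,
# taken from the comparison table in this article.
S9000_SP_TFLOPS = 3.23


def aggregate_peak_tflops(gpu_count, per_gpu_tflops=S9000_SP_TFLOPS):
    """Theoretical peak of gpu_count identical GPUs.

    This ignores PCIe transfers and any scaling losses, so it is an
    upper bound and not an expected application speed.
    """
    return gpu_count * per_gpu_tflops


# Hypothetical configurations: 2 GPUs in a 1U chassis, 4 GPUs in a 3U chassis.
for chassis, gpus in (("1U", 2), ("3U", 4)):
    print(f"{chassis}: {gpus} GPUs -> {aggregate_peak_tflops(gpus):.2f} TFLOPS")
```

These are theoretical peaks; real workloads will land below them.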

Edit: About the multi-GPU configuration

A multi-GPU card has various advantages, as it uses less power and space, but it does not behave like a single, larger GPU. Because communication between the GPUs still goes over the PCIe bus, the compute capabilities of two single-GPU cards and one dual-GPU card are not that different. Compute problems are most often memory-bound, and very high memory bandwidth is an important reason why GPUs outperform CPUs. Therefore I put a lot of weight on the memory and cache available per GPU and per core.
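To illustrate why memory bandwidth weighs so heavily, here is a rough sketch using the roofline estimate, which is a standard back-of-the-envelope model and not something taken from the vendors' material. It uses the per-GPU peak and bandwidth figures from the comparison table in this article; the SAXPY kernel and its byte counts are an assumed example, and caches are ignored.

```python
# Per-GPU figures taken from the comparison table in this article.
GPUS = {
    "Tesla K10 (per GPU)": {"peak_gflops": 2288.0, "bandwidth_gb_s": 160.0},
    "FirePro S9000": {"peak_gflops": 3230.0, "bandwidth_gb_s": 264.0},
}


def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: the lower of the compute and memory ceilings."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Assumed example kernel: single-precision SAXPY (y = a*x + y).
# Per element: 2 FLOPs, 8 bytes read (x and y) and 4 bytes written (y).
SAXPY_FLOPS_PER_BYTE = 2 / 12

for name, gpu in GPUS.items():
    bound = attainable_gflops(
        gpu["peak_gflops"], gpu["bandwidth_gb_s"], SAXPY_FLOPS_PER_BYTE
    )
    print(f"{name}: at most {bound:.1f} GFLOPS of {gpu['peak_gflops']:.0f} peak")
```

For a streaming kernel like this, both GPUs are limited by memory bandwidth to a small fraction of their peak, so the ratio of the bandwidths matters more than the ratio of the TFLOPS figures.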

Performance comparison

At IBC’12 I was at the AMD booth and looked at the impressive performance of OpenCL-enabled software from AMD partners when using the latest GPUs in the 9000 series. Let’s put into perspective why this is, by comparing the TESLA K10 against the FirePro S9000. As the K10 is a dual-GPU card, the table lists values per GPU.

| Specification | TESLA K10 | FirePro S9000 |
|---|---|---|
| GPU-processor count | 2 | 1 |
| Architecture | Kepler GK104 | Graphics Core Next |
| Memory per GPU-processor | 4 GB GDDR5 ECC | 6 GB GDDR5 ECC |
| Memory bandwidth per GPU-processor | 160 GB/s | 264 GB/s |
| Performance (single precision, per GPU-processor) | 2.288 TFLOPS | 3.230 TFLOPS |
| Performance (double precision, per GPU-processor) | 0.095 TFLOPS | 0.806 TFLOPS |
| Max power usage per GPU-processor | 150 (?) Watt (225 Watt total) | 225 Watt |
| Greenness (single precision) | 15.25 GFLOPS/Watt | 14.35 GFLOPS/Watt |
| Bus interface | PCIe 3.0 x16 | PCIe 3.0 x16 |
| Price (per GPU-processor) | $1638 | $2500 |
| Price per GFLOPS (SP) | $0.72 | $0.77 |
| Price per GFLOPS (DP) | $17.24 | $3.10 |
| Cooling | Passive | Passive |

Sources for this table are below.
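The derived rows of the table (GFLOPS/Watt and price per GFLOPS) follow directly from the raw rows. A small sketch of that arithmetic, with the per-GPU inputs copied from the table; note that the K10's 150 Watt per GPU is the uncertain estimate marked with "(?)" above, and the dictionary keys and function name are illustrative only.

```python
# Per-GPU inputs copied from the table above.
# The K10's 150 W per GPU is the article's own uncertain estimate.
CARDS = {
    "Tesla K10 (per GPU)": {
        "sp_gflops": 2288, "dp_gflops": 95, "watts": 150, "price_usd": 1638,
    },
    "FirePro S9000": {
        "sp_gflops": 3230, "dp_gflops": 806, "watts": 225, "price_usd": 2500,
    },
}


def derived_metrics(card):
    """Compute the efficiency and cost ratios from the raw per-GPU figures."""
    return {
        "sp_gflops_per_watt": card["sp_gflops"] / card["watts"],
        "usd_per_sp_gflops": card["price_usd"] / card["sp_gflops"],
        "usd_per_dp_gflops": card["price_usd"] / card["dp_gflops"],
    }


for name, card in CARDS.items():
    m = derived_metrics(card)
    print(
        f"{name}: {m['sp_gflops_per_watt']:.2f} GFLOPS/Watt, "
        f"${m['usd_per_sp_gflops']:.2f} per SP GFLOPS, "
        f"${m['usd_per_dp_gflops']:.2f} per DP GFLOPS"
    )
```

The results agree with the table up to rounding in the last digit. Changing the assumed K10 wattage shifts only its GFLOPS/Watt figure, which is why that comparison is the least certain one.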


Tesla K10 is better in:

  • GFLOPS/Watt
  • price

The lack of a dual-GPU FirePro card could also be seen as a disadvantage, as for instance fewer GPUs fit in a 1U server. In a 3U server this is less of a problem.

For the rest the S9000 is the better choice:

  • more memory per GPU,
  • more compute power,
  • higher memory bandwidth,
  • more double-precision FLOPS per dollar.

FirePro GPUs also provide DirectGMA, which gives other PCIe cards direct access to GPU memory without going through the CPU, so data from, for example, FireWire cards can be transported directly to the FirePro cards via a DMA channel. TESLA cards don’t have this. Note that this is different from pinned memory, which both GPUs support.

Note that, due to architectural differences, the actual performance of the two professional GPUs may be closer together or further apart than these theoretical figures suggest, depending on the workload.

Based on the above information, we chose S9000s for our next-generation Hadoop servers.

AMD FirePro S9000:

