AMD’s answer to NVIDIA TESLA K10: the FirePro S9000

AMD recently announced its new FirePro GPUs for servers: the S9000 (shown at the right) and the S7000. They are passively cooled, as server racks are already actively cooled. AMD’s server partners will have products ready in Q1 2013 or even earlier. SuperMicro, Dell and HP will probably be among the first.

What does this mean? We finally get a very good alternative to TESLA: servers with probably 2 (1U) or 4+ (3U) FirePro GPUs, giving 6.46 to 12.92 TFLOPS or more of theoretical single-precision performance on top of the available CPU. At StreamHPC we are happy with that, as AMD is a strong OpenCL supporter and the FirePro S9000 offers more peak performance per GPU than the TESLA K10. It also outperforms the unreleased Intel Xeon Phi in single precision and comes close in double precision.
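The aggregate figures above are simply the per-card single-precision peak multiplied by the number of cards. A minimal sketch of that arithmetic, assuming the 3.23 TFLOPS per S9000 from the comparison table in this post; the function name is only illustrative, and the card counts per chassis are the estimates from the text, not vendor specifications:

```python
# Theoretical single-precision peak of one FirePro S9000, in TFLOPS
# (figure taken from the comparison table in this post).
S9000_SP_TFLOPS = 3.23


def aggregate_peak_tflops(gpu_count, per_gpu_tflops=S9000_SP_TFLOPS):
    """Theoretical peak of all GPUs in one server, ignoring the CPU."""
    return gpu_count * per_gpu_tflops


# Card counts per chassis are the guesses used in the text, not vendor specs.
for chassis, gpu_count in (("1U", 2), ("3U", 4)):
    total = aggregate_peak_tflops(gpu_count)
    print(f"{chassis}: {gpu_count} GPUs -> {total:.2f} TFLOPS")
```

These are theoretical peaks only; real applications will land below them.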

Edit: About the multi-GPU configuration

A multi-GPU card has various advantages, as it uses less power and space, but it is not comparable to a single GPU. As communication still goes over the PCIe bus, the compute capabilities of two single-GPU cards and one multi-GPU card are not that different. Compute problems are most often memory-bound, and the very high memory bandwidth of GPUs is an important reason they outperform CPUs. Therefore I put a lot of weight on the memory and cache available per GPU and per core.
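One way to make "memory-bound" concrete is the roofline model. It is not part of the original comparison, but it needs only numbers from this post: dividing peak FLOPS by memory bandwidth gives the number of floating-point operations a kernel must perform per byte moved before compute, rather than memory, becomes the limit. A rough sketch, assuming the per-GPU figures from the comparison table; the function names are illustrative:

```python
# Per-GPU figures from the comparison table in this post.
# The roofline model is an added illustration, not the author's method.
GPUS = {
    "TESLA K10 (per GPU)": {"sp_gflops": 2288.0, "bandwidth_gb_s": 160.0},
    "FirePro S9000": {"sp_gflops": 3230.0, "bandwidth_gb_s": 264.0},
}


def ridge_point(sp_gflops, bandwidth_gb_s):
    """FLOPs per byte needed before a kernel stops being memory-bound."""
    return sp_gflops / bandwidth_gb_s


def attainable_gflops(intensity, sp_gflops, bandwidth_gb_s):
    """Roofline estimate: capped by memory traffic or by peak compute."""
    return min(sp_gflops, intensity * bandwidth_gb_s)


for name, spec in GPUS.items():
    print(f"{name}: ridge at {ridge_point(**spec):.1f} FLOP/byte")
    # A single-precision vector add does 1 FLOP per 12 bytes moved
    # (two 4-byte reads, one 4-byte write).
    estimate = attainable_gflops(1 / 12, **spec)
    print(f"  vector add estimate: {estimate:.1f} GFLOPS")
```

Under this simple model, a streaming kernel such as a vector add is limited by bandwidth on both cards, which is why memory bandwidth per GPU carries so much weight in the comparison.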

Performance comparison

At IBC’12 I was at the AMD booth and saw the impressive performance of OpenCL-enabled software from AMD partners running on the latest GPUs in the 9000 series. Let’s put into perspective why this is, by comparing the TESLA K10 against the FirePro S9000. As the K10 is a dual-GPU card, the table is per GPU.


| Functionality | TESLA K10 | FirePro S9000 |
|---|---|---|
| GPU-processor count | 2 | 1 |
| Architecture | Kepler GK104 | Graphics Core Next |
| Memory per GPU-processor | 4 GB GDDR5 ECC | 6 GB GDDR5 ECC |
| Memory bandwidth per GPU-processor | 160 GB/s | 264 GB/s |
| Performance (single precision, per GPU-processor) | 2.288 TFLOPS | 3.230 TFLOPS |
| Performance (double precision, per GPU-processor) | 0.095 TFLOPS | 0.806 TFLOPS |
| Max power usage per GPU-processor | 150 (?) Watt (225 total) | 225 Watt |
| Greenness | 15.25 GFLOPS/Watt | 14.35 GFLOPS/Watt |
| Bus interface | PCIe 3.0 x16 | PCIe 3.0 x16 |
| Price (per GPU-processor) | $1638 | $2500 |
| Price per GFLOPS (SP) | $0.72 | $0.77 |
| Price per GFLOPS (DP) | $17.24 | $3.10 |
| Cooling | Passive | Passive |

Sources for this table are below.
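The greenness and price-per-GFLOPS rows follow from the raw specifications. A small sketch that recomputes them, assuming the per-GPU figures from the table (including the uncertain 150 Watt for the K10); the dictionary layout and function name are only illustrative:

```python
# Raw per-GPU specs copied from the table above; the rest is derived.
# The 150 W figure for the K10 is marked as uncertain in the table.
SPECS = {
    "TESLA K10": {
        "sp_gflops": 2288, "dp_gflops": 95, "watt": 150, "price_usd": 1638,
    },
    "FirePro S9000": {
        "sp_gflops": 3230, "dp_gflops": 806, "watt": 225, "price_usd": 2500,
    },
}


def derived_metrics(spec):
    """Recompute the efficiency and cost rows from the raw specs."""
    return {
        "gflops_per_watt": spec["sp_gflops"] / spec["watt"],
        "usd_per_sp_gflops": spec["price_usd"] / spec["sp_gflops"],
        "usd_per_dp_gflops": spec["price_usd"] / spec["dp_gflops"],
    }


for name, spec in SPECS.items():
    metrics = derived_metrics(spec)
    print(name)
    for key, value in metrics.items():
        print(f"  {key}: {value:.2f}")
```

Rounded to two decimals, the S9000 comes out at 14.36 GFLOPS/Watt; the 14.35 in the table appears to be truncated rather than rounded. The other derived values match the table.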

Conclusion

The TESLA K10 is better in:

  • GFLOPS/Watt
  • price

The lack of a dual-GPU S9000 could also be seen as a disadvantage, as for instance fewer GPUs fit in a 1U server. In a 3U server this is less of a problem.

For the rest the S9000 is the better choice:

  • more memory per GPU
  • more compute power per GPU
  • higher memory bandwidth
  • more double-precision FLOPS per dollar

FirePro GPUs also provide DirectGMA, which is direct access from other PCIe cards that bypasses the CPU, so data from, for example, FireWire cards can be transported directly to the FirePro cards via a DMA channel. TESLA cards don’t have this. Note that this is different from pinned memory, which both GPUs support.

Note that, due to architectural differences, the actual performance gap between the two professional GPUs may be smaller or larger in various cases.

Based on the above information, we chose S9000s for our next-generation Hadoop servers.

As usual, use the comments to point out anything I missed or anything you agree or disagree with. The comments are unmoderated, but be polite.

Sources

NVIDIA TESLA K10:

http://www.nvidia.com/object/tesla-servers.html
http://www.nvidia.com/content/PDF/kepler/Tesla_K10_BD-06280-001_v05.pdf
http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Delectronics&field-keywords=nvidia+tesla+k10

AMD FirePro S9000:

http://www.amd.com/us/products/workstation/graphics/firepro-remote-graphics/S9000/Pages/S9000.aspx
http://www.amd.com/us/Documents/FirePro_S9000_Data_Sheet.pdf
http://www.amd.com/us/Documents/SDI-tech-brief.pdf

 
