Recently AMD announced their new FirePro GPUs for use in servers: the S9000 and the S7000. They use passive cooling, as server racks are actively cooled already. AMD's server partners will have products ready in Q1 2013 or even before; SuperMicro, Dell and HP will probably be among the first.
What does this mean? We finally get a very good alternative to TESLA: servers with probably 2 (1U) or 4+ (3U) FirePro GPUs, giving 6.46 up to 12.92 TFLOPS or more of theoretical extra performance on top of the available CPU. At StreamHPC we are happy with that, as AMD is a strong OpenCL supporter and FirePro GPUs deliver considerably more performance than TESLAs. The S9000 also outperforms the unreleased Intel Xeon Phi in single precision and comes close in double precision.
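The aggregate numbers above follow directly from the per-GPU single-precision peak of the S9000 (3.23 TFLOPS, see the table below). A quick sketch of that arithmetic:

```python
# Sketch: reproduce the quoted aggregate SP peaks from the
# per-GPU single-precision peak of the FirePro S9000.
S9000_SP_TFLOPS = 3.23

def server_peak_tflops(n_gpus):
    """Theoretical added SP peak for a server holding n_gpus S9000 cards."""
    return round(n_gpus * S9000_SP_TFLOPS, 2)

print(server_peak_tflops(2))  # 1U server with 2 GPUs -> 6.46
print(server_peak_tflops(4))  # 3U server with 4 GPUs -> 12.92
```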
Edit: About the multi-GPU configuration
A multi-GPU card has its advantages, as it uses less power and space, but per GPU it is not that different from two separate cards: communication still goes over the PCIe bus, so the compute capabilities of two single-GPU cards and one dual-GPU card are quite similar. Compute problems are most often memory-bound, and high memory bandwidth is an important reason GPUs outperform CPUs. That is why I put a lot of weight on the memory and cache available per GPU.
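The memory-bound argument can be made concrete with a simple roofline-style estimate: divide peak compute by peak bandwidth to get the arithmetic intensity (FLOPs per byte moved) below which a kernel is bandwidth-limited. A sketch using the S9000 figures from the table below:

```python
# Sketch: roofline balance point for the FirePro S9000,
# using the peak figures quoted in this post.
PEAK_SP_GFLOPS = 3230.0  # single-precision peak, GFLOPS
PEAK_BW_GBS = 264.0      # memory bandwidth, GB/s

# Kernels doing fewer FLOPs per byte than this are memory-bound,
# i.e. limited by bandwidth rather than by the ALUs.
balance_point = PEAK_SP_GFLOPS / PEAK_BW_GBS
print(round(balance_point, 1))  # roughly 12.2 FLOPs per byte
```

Few real-world kernels reach 12 FLOPs per byte, which is why bandwidth, not peak FLOPS, often decides actual performance.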
At IBC’12 I was at the AMD booth and saw the impressive performance of OpenCL-enabled software by AMD partners using the latest GPUs in the 9000-series. Let’s put into perspective why this is, by comparing the TESLA K10 against the FirePro S9000. As the K10 is a dual-GPU card, the table lists figures per GPU.
| Functionality | TESLA K10 | FirePro S9000 |
|---|---|---|
| Architecture | Kepler GK104 | Graphics Core Next |
| Memory per GPU processor | 4 GB GDDR5 ECC | 6 GB GDDR5 ECC |
| Memory bandwidth per GPU processor | 160 GB/s | 264 GB/s |
| Performance (single precision, per GPU proc.) | 2.288 TFLOPS | 3.230 TFLOPS |
| Performance (double precision, per GPU proc.) | 0.095 TFLOPS | 0.806 TFLOPS |
| Max power usage per GPU processor | 150 (?) Watt (225 Watt total) | 225 Watt |
| Greenness (SP GFLOPS/Watt) | 15.25 | 14.35 |
| Bus interface | PCIe 3.0 x16 | PCIe 3.0 x16 |
| Price per GPU processor | $1638 | $2500 |
| Price per GFLOPS (SP) | $0.72 | $0.77 |
| Price per GFLOPS (DP) | $17.24 | $3.10 |
Sources for this table are below.
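The derived rows of the table (greenness and price per GFLOPS) follow from the raw specs. A sketch that re-derives them per GPU processor:

```python
# Sketch: re-derive the "Greenness" and "Price per GFLOPS" rows
# from the raw per-GPU specs in the table above.
specs = {
    "TESLA K10":     {"sp_gflops": 2288, "dp_gflops": 95,  "watts": 150, "price": 1638},
    "FirePro S9000": {"sp_gflops": 3230, "dp_gflops": 806, "watts": 225, "price": 2500},
}

for name, s in specs.items():
    greenness = round(s["sp_gflops"] / s["watts"], 2)   # SP GFLOPS per Watt
    usd_per_sp = round(s["price"] / s["sp_gflops"], 2)  # $ per SP GFLOPS
    usd_per_dp = round(s["price"] / s["dp_gflops"], 2)  # $ per DP GFLOPS
    print(name, greenness, usd_per_sp, usd_per_dp)
```

(The S9000's greenness computes to 14.36 when rounded; the table truncates it to 14.35.)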
The TESLA K10 is better in:
- greenness (more SP GFLOPS per Watt),
- price per GFLOPS in single precision.
Also, the S9000's lack of a dual-GPU configuration could be seen as a disadvantage, as for instance fewer GPUs fit in a 1U server. In 3U servers this is less of a problem.
For the rest, the S9000 is the better choice:
- more memory per GPU,
- more compute power,
- higher memory bandwidth,
- more double-precision FLOPS per dollar.
FirePro GPUs also provide DirectGMA, which gives other PCIe cards direct access to the GPU while bypassing the CPU, so data from for example FireWire cards can be transported straight to the FirePro cards via a DMA channel. TESLA cards don't have this. Note that this is different from pinned memory, which both GPUs support.
Note that due to architectural differences, the actual performance of the two professional GPUs may be closer together or further apart, depending on the workload.
Based on the above information, we chose the S9000 for our next-generation Hadoop servers.
As usual, voice your opinion in the comments on points I missed or on anything you agree or disagree with. The comments are unmoderated, but be polite.
NVIDIA TESLA K10:
AMD FirePro S9000: