During the “little” HPC-show, SC12, several vendors launched some very impressive products. The question is: who steals the show from whom? Intel finally got their Phi-processor launched, NVIDIA came with the TESLA K20 and K20X, and AMD introduced the FirePro S10000.

The S10000 is the fastest card out there with 5.91 TFLOPS of single-precision processing power, much faster than the TESLA K20X, which only does 3.95 TFLOPS. But comparing a dual-GPU card to a single-GPU card is not always fair. Once you choose to have more than one GPU anyway (several GPUs in one case or a small cluster), the S10000 can be fully compared to the Tesla K20 and K20X.

The S10000 can be seen as a dual-GPU version of the S9000, but the numbers do not fully add up. Most obvious are the big difference in power-usage (325 Watt) and the active cooling. As server-cases are made for 225 Watt of cooling-power per card, this is a potential disadvantage. But AMD has clearly looked around: GPU-servers are usually not 1U-cases, but 3U-servers that use the full width to stack several GPUs.

Performance comparison

As I see a multi-GPU approach as very different from a single-GPU one, I chose to put the S10000 against the dual-GPU TESLA K10.

Functionality	TESLA K10	FirePro S10000
GPU-Processor count	2	2
Architecture	Kepler GK104	Graphics Core Next
Memory per GPU-processor	4 GB GDDR5 ECC	3 GB GDDR5 ECC
Memory bandwidth per GPU-processor	160 GB/s per GPU	240 GB/s per GPU
Performance (single precision, per GPU-proc.)	2.29 TFLOPS per GPU	2.95 TFLOPS per GPU
Performance (double precision, per GPU-proc.)	0.095 TFLOPS per GPU	0.74 TFLOPS per GPU
Max power usage for whole dual-GPU card	225 Watt	325 Watt
Greenness for whole dual-GPU card (SP)	20.35 GFLOPS/Watt	18.15 GFLOPS/Watt
Bus Interface	PCIe 3.0 x16	PCIe 3.0 x16
Price (per GPU-processor)	$1638 ($3275 total)	$1799 ($3599 total)
Price per GFLOPS (SP)	$0.72	$0.60
Price per GFLOPS (DP)	$17.24	$2.43
Cooling	Passive	Active (!)

Sources for this table are below.
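
The derived rows of the table (greenness and price per GFLOPS) follow from the raw figures by simple division. Below is a minimal sketch of that arithmetic, using only the numbers from the table. The `Card` class and its method names are made up for this illustration, and the results can differ from the table in the last digit because of rounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """Raw figures for one dual-GPU card, as listed in the table."""
    name: str
    gpus: int
    sp_tflops_per_gpu: float
    dp_tflops_per_gpu: float
    max_watt: float      # for the whole card
    price_usd: float     # for the whole card

    def sp_gflops(self) -> float:
        # Peak single precision of the whole card, in GFLOPS.
        return self.gpus * self.sp_tflops_per_gpu * 1000.0

    def dp_gflops(self) -> float:
        # Peak double precision of the whole card, in GFLOPS.
        return self.gpus * self.dp_tflops_per_gpu * 1000.0

    def gflops_per_watt(self) -> float:
        # "Greenness": peak SP GFLOPS divided by max power usage.
        return self.sp_gflops() / self.max_watt

    def usd_per_gflops_sp(self) -> float:
        return self.price_usd / self.sp_gflops()

    def usd_per_gflops_dp(self) -> float:
        return self.price_usd / self.dp_gflops()


K10 = Card("TESLA K10", 2, 2.29, 0.095, 225.0, 3275.0)
S10000 = Card("FirePro S10000", 2, 2.95, 0.74, 325.0, 3599.0)

for card in (K10, S10000):
    print(
        f"{card.name}: "
        f"{card.gflops_per_watt():.2f} GFLOPS/Watt (SP), "
        f"${card.usd_per_gflops_sp():.2f} per GFLOPS (SP), "
        f"${card.usd_per_gflops_dp():.2f} per GFLOPS (DP)"
    )
```

These are peak numbers from the spec-sheets, so they say nothing about what a real kernel achieves. To compare other cards the same way, only the two `Card(...)` lines need to change.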

Conclusion

The S10000 has the best performance in both SP and DP, but is the worst in total power-usage and cooling. At first I was not really enthusiastic about a card that consumes so much. But the moment your algorithm can be split over several GPUs, this card is a very good choice.

In comparison to the Tesla K10 it is a clear winner in most aspects. In GFLOPS/Watt (SP) the TESLA comes out ahead, but you get a much larger memory bandwidth in return.
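
The trade-off between power-efficiency and memory bandwidth can be put into numbers by relating the bandwidth from the table to power-usage and to peak compute. This is only a rough sketch on spec-sheet figures. The dictionary layout and the two helper functions are made up for this illustration.

```python
# Per-GPU bandwidth and SP peak, plus whole-card power, taken from the table.
GPUS_PER_CARD = 2

cards = {
    "TESLA K10": {"bw_gb_s": 160.0, "sp_gflops": 2290.0, "watt": 225.0},
    "FirePro S10000": {"bw_gb_s": 240.0, "sp_gflops": 2950.0, "watt": 325.0},
}


def bandwidth_per_watt(card: dict) -> float:
    """Total memory bandwidth of the card (GB/s) per Watt of max power."""
    return GPUS_PER_CARD * card["bw_gb_s"] / card["watt"]


def bytes_per_flop(card: dict) -> float:
    """Bytes of memory traffic available per single-precision FLOP at peak."""
    return card["bw_gb_s"] / card["sp_gflops"]


for name, card in cards.items():
    print(
        f"{name}: {bandwidth_per_watt(card):.2f} GB/s per Watt, "
        f"{bytes_per_flop(card):.3f} bytes per FLOP (SP)"
    )
```

With the figures from the table the S10000 comes out slightly ahead on both ratios, so for bandwidth-bound algorithms the higher power-usage buys proportionally more bandwidth.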

Be sure to also check the other accelerators in the Answer-to series.

Feel free to mail me for some free advice if you are about to buy new accelerators. I will send a PDF with all the StreamHPC-trainings together with the answer.