During the “little” HPC-show, SC12, several vendors launched some very impressive products. The question is: who steals the show from whom? Intel finally got its Phi-processor launched, NVIDIA came with the TESLA K20 and K20X, and AMD introduced the FirePro S10000.
The S10000 is the fastest card out there with 5.91 TFLOPS of single-precision processing power, much faster than the TESLA K20X, which only does 3.95 TFLOPS. But comparing a dual-GPU card to a single-GPU card is not always fair. Once you choose to have more than one GPU anyway (several GPUs in one case, or a small cluster), the S10000 can be fully compared to the Tesla K20 and K20X.
The S10000 can be seen as a dual-GPU version of the S9000, but the specifications do not fully add up. Most obvious are the big difference in power-usage (325 Watt) and the active cooling. As server-cases are made for 225 Watt cooling-power, this is seen as a potential disadvantage. But AMD has clearly looked around: GPUs are not put in 1U-cases, but in 3U-servers that use the full width to stack several GPUs.
As I see a multi-GPU approach as very different from a single-GPU one, I chose to put the S10000 against the dual-GPU TESLA K10.
|Functionality||TESLA K10||FirePro S10000|
|Architecture||Kepler GK104||Graphics Core Next|
|Memory per GPU-processor||4 GB GDDR5 ECC||3 GB GDDR5 ECC|
|Memory bandwidth per GPU-processor||160 GB/s per GPU||240 GB/s per GPU|
|Performance (single precision, per GPU-proc.)||2.29 TFLOPS per GPU||2.95 TFLOPS per GPU|
|Performance (double precision, per GPU-proc.)||0.095 TFLOPS per GPU||0.74 TFLOPS per GPU|
|Max power usage for whole dual-GPU card||225 Watt||325 Watt|
|Greenness for whole dual-GPU card (SP)||20.35 GFLOPS/Watt||18.15 GFLOPS/Watt|
|Bus Interface||PCIe 3.0 x16||PCIe 3.0 x16|
|Price (per GPU-processor)||$1638 ($3275 total)||$1799 ($3599 total)|
|Price per GFLOPS (SP)||$0.72||$0.61|
|Price per GFLOPS (DP)||$17.24||$2.43|
Sources for this table are below.
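The derived rows in the table (GFLOPS/Watt and price per GFLOPS) follow directly from the per-GPU figures. Here is a minimal Python sketch of that arithmetic, using only numbers from the table. The dictionary layout and the function name `derived_metrics` are my own illustration, and small last-digit differences from the table come down to rounding.

```python
# Per-GPU performance plus whole-card power and price, copied from the table.
# The field names are hypothetical, chosen for this sketch only.
CARDS = {
    "TESLA K10": {
        "sp_tflops_per_gpu": 2.29,
        "dp_tflops_per_gpu": 0.095,
        "gpus": 2,
        "max_watt": 225,
        "price_usd": 3275,
    },
    "FirePro S10000": {
        "sp_tflops_per_gpu": 2.95,
        "dp_tflops_per_gpu": 0.74,
        "gpus": 2,
        "max_watt": 325,
        "price_usd": 3599,
    },
}


def derived_metrics(card):
    """Compute whole-card efficiency and price-per-GFLOPS figures."""
    # Convert per-GPU TFLOPS to whole-card GFLOPS.
    sp_gflops = card["sp_tflops_per_gpu"] * card["gpus"] * 1000
    dp_gflops = card["dp_tflops_per_gpu"] * card["gpus"] * 1000
    return {
        "sp_gflops_per_watt": sp_gflops / card["max_watt"],
        "usd_per_sp_gflops": card["price_usd"] / sp_gflops,
        "usd_per_dp_gflops": card["price_usd"] / dp_gflops,
    }


if __name__ == "__main__":
    for name, card in CARDS.items():
        m = derived_metrics(card)
        print(
            f"{name}: {m['sp_gflops_per_watt']:.2f} GFLOPS/Watt (SP), "
            f"${m['usd_per_sp_gflops']:.2f} per GFLOPS (SP), "
            f"${m['usd_per_dp_gflops']:.2f} per GFLOPS (DP)"
        )
```

Swapping in the figures of another card (say, a K20X with `"gpus": 1`) gives the same three metrics, which makes it easier to compare single-GPU and dual-GPU cards on equal terms.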
The S10000 has the best performance in both SP and DP, but is the worst in total power-usage and cooling. At first I was not really enthusiastic about a card that consumes so much. But the moment your algorithm can be split over several GPUs, this card is a very good choice.
Compared to the Tesla K10 it is a clear winner in most aspects. In GFLOPS/Watt the TESLA does better, but you get a much larger memory bandwidth in return.
Be sure to also check the other accelerators in the Answer-to series.
Feel free to mail me for some free advice if you are about to buy new accelerators. I will send a PDF with all the StreamHPC-trainings together with the answer.
AMD FirePro S10000:
NVIDIA TESLA K10: