
NVidia
As NVidia has no x86 CPU, they mostly focus on GPUs and bet on POWER and ARM for CPUs. They already sell Pascal-architecture GPUs in small numbers, and 2017 will be all about Pascal.
Tesla K80 (Kepler)
- The K80 is not simply two K40s (GK110B GPUs) on one board: its two GPUs are a different chip, the GK210.
- It is the Nvidia GPU with the largest private memory size (used in kernels): 255.
Tesla P100 (Pascal)
- 20 TFLOPS half precision (HP), 10 TFLOPS single precision (SP), 5 TFLOPS double precision (DP)
- 16 GB HBM2 (720 GB/s).
- NVLink: up to 64 GB/s effective per direction (20% of the raw 80 GB/s is protocol overhead). The links are dual simplex, with dedicated wires for each direction: each of the four NVLinks offers 16 GB/s up and 16 GB/s down. Compared to the 12 GB/s of PCIe 3.0 x16 (24 GB/s for both directions combined), this is a good speed-up. NVLink is only available between Pascal GPUs, not yet between the GPU and the CPU.
- OpenPOWER support coming, to compete with Intel.
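The NVLink figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming four links per P100 at 20 GB/s raw per direction and the 20% protocol overhead stated above; the function names are hypothetical.

```python
# Back-of-the-envelope NVLink vs PCIe comparison for the Tesla P100.
# Assumptions: 4 links, 20 GB/s raw per link per direction, 20% protocol overhead.
LINKS = 4
RAW_PER_LINK_GBS = 20.0    # raw bandwidth per link, per direction
PROTOCOL_OVERHEAD = 0.20   # fraction of raw bandwidth lost to the protocol
PCIE3_X16_GBS = 12.0       # practical PCIe 3.0 x16 bandwidth, per direction


def effective_nvlink_gbs(links=LINKS):
    """Effective NVLink bandwidth per direction, in GB/s."""
    return links * RAW_PER_LINK_GBS * (1.0 - PROTOCOL_OVERHEAD)


def transfer_seconds(gigabytes, bandwidth_gbs):
    """Time to move `gigabytes` of data at `bandwidth_gbs` GB/s."""
    return gigabytes / bandwidth_gbs


if __name__ == "__main__":
    nvlink = effective_nvlink_gbs()
    print(f"per link:  {effective_nvlink_gbs(1):.0f} GB/s each way")
    print(f"all links: {nvlink:.0f} GB/s each way")
    print(f"speed-up over PCIe3 x16: {nvlink / PCIE3_X16_GBS:.1f}x")
    # Copying the full 16 GB of HBM2 once in one direction:
    print(f"16 GB over NVLink: {transfer_seconds(16, nvlink):.2f} s")
    print(f"16 GB over PCIe:   {transfer_seconds(16, PCIE3_X16_GBS):.2f} s")
```

Peak link numbers are upper bounds; a real transfer also pays for latency and synchronisation, so measure before relying on the ratio.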
Titan Black (Kepler) and GTX 980 (Maxwell)
- The Titan Black has 1.7 TFLOPS DP, 4.5 TFLOPS SP.
- The GTX 980 has 0.14 TFLOPS DP, 4.6 TFLOPS SP.
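Peak figures like these follow from cores × 2 FLOPs per fused multiply-add × clock. As a sketch, assuming the GTX 980's published 2048 CUDA cores at a 1126 MHz base clock and a DP rate of 1/32 of SP (these specs are not listed above):

```python
def peak_tflops(cores, clock_ghz, flops_per_core_per_cycle=2):
    """Theoretical peak in TFLOPS; 2 FLOPs per cycle = one fused multiply-add."""
    return cores * flops_per_core_per_cycle * clock_ghz / 1000.0


# GTX 980 (Maxwell): assumed 2048 CUDA cores at a 1.126 GHz base clock.
sp = peak_tflops(2048, 1.126)
dp = sp / 32  # assumed: consumer Maxwell runs DP at 1/32 of the SP rate
print(f"GTX 980: {sp:.2f} TFLOPS SP, {dp:.2f} TFLOPS DP")
```

The result lines up with the 4.6 and 0.14 TFLOPS listed above, which shows that the low DP number is a design choice (few DP units), not a clock or memory limitation.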
Tegra X1
- 0.5 TFLOPS SP (GPU), 1 TFLOPS HP
- 10 Watts
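The factor of two between HP and SP comes from packing two 16-bit values into one 32-bit lane, so that one paired FP16 instruction on the Tegra X1's Maxwell GPU does two half-precision operations. The snippet below only illustrates the storage side with Python's `struct` module (format `'e'` is IEEE 754 half precision); it does not run anything on a GPU.

```python
import struct

# Two half-precision values occupy the same 4 bytes as one single-precision value.
packed_halves = struct.pack("<2e", 1.5, -2.0)
packed_single = struct.pack("<f", 1.5)
print(len(packed_halves), len(packed_single))

# Both values come back exactly, because they are representable in FP16.
print(struct.unpack("<2e", packed_halves))

# The trade-off: FP16 keeps only about 3 significant decimal digits,
# so this prints a value close to, but not equal to, 3.14159.
print(struct.unpack("<e", struct.pack("<e", 3.14159))[0])
```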
AMD
Known for the strongest OpenCL developer support since 2012. With the HSA-capable Fiji GPUs, they have now arrived at their third GPGPU architecture after “VLIW” and “GCN”, fully driven by their HSA initiative. For 2017 they focus on their main advantages: raw single-precision performance, HBM (to which they have early access), their new CPU (Zen) and their new GPU (Polaris).
FirePro S9170 (GCN)
- 32GB GDDR5 global memory
- 2.5 TFLOPS DP, 5 TFLOPS SP
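The DP:SP ratio is often the first filter for scientific codes. A small sketch that computes it from figures listed in this post only (no other data assumed):

```python
# (DP TFLOPS, SP TFLOPS) exactly as listed in this post.
devices = {
    "FirePro S9170": (2.5, 5.0),
    "Tesla P100": (5.0, 10.0),
    "Titan Black": (1.7, 4.5),
    "GTX 980": (0.14, 4.6),
}


def dp_sp_ratio(dp, sp):
    """Fraction of the single-precision peak that remains in double precision."""
    return dp / sp


# Rank from the best to the worst double-precision ratio (stable for ties).
ranked = sorted(devices.items(), key=lambda kv: dp_sp_ratio(*kv[1]), reverse=True)
for name, (dp, sp) in ranked:
    print(f"{name:14s} DP:SP = 1:{sp / dp:.1f}")
```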
Radeon Nano and FirePro S9300X2 (Fiji)
- Nano: 0.8 TFLOPS DP, 8 TFLOPS SP; no HP arithmetic in the processor (HP is supported only for data transfers)
- S9300X2: 1.4 TFLOPS DP, 13.9 TFLOPS SP (two Fiji GPUs, lower clocked)
- Nano 175 Watt, S9300X2 300 Watt
- Nano has 4 GB HBM with up to 512 GB/s bandwidth; the S9300X2 has 2x 4 GB HBM (4 GB per GPU).
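With HBM, bandwidth and compute grow together. A quick way to compare devices is the roofline "ridge point": peak FLOPS divided by memory bandwidth, i.e. how many FLOPs a kernel must do per byte moved before it stops being memory-bound. A sketch using SP peaks and bandwidths listed in this post; the SAXPY byte count assumes 4-byte floats.

```python
def ridge_point(peak_tflops, bandwidth_gbs):
    """FLOPs per byte at which a kernel moves from memory-bound to compute-bound."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)


def attainable_tflops(peak_tflops, bandwidth_gbs, flops_per_byte):
    """Roofline model: the lower of the compute roof and the bandwidth roof."""
    return min(peak_tflops, bandwidth_gbs * flops_per_byte / 1000.0)


# (SP TFLOPS, GB/s) as listed in this post.
devices = [("Radeon Nano", 8.0, 512.0), ("Tesla P100", 10.0, 720.0)]

for name, peak, bw in devices:
    print(f"{name}: ridge at {ridge_point(peak, bw):.1f} FLOP/byte")
    # SAXPY (y = a*x + y): 2 FLOPs per 12 bytes (load x, load y, store y).
    print(f"  SAXPY-like kernel: {attainable_tflops(peak, bw, 2 / 12):.3f} TFLOPS")
```

A streaming kernel like SAXPY reaches only a few percent of peak on either card, which is why memory bandwidth often matters more than the headline TFLOPS.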
AMD Carrizo A10-8890K APU (HSA)
- CPU with built-in GPU
- About one TFLOPS
- TDP of 95 Watt
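Since TCO depends heavily on power, GFLOPS per watt is a useful first cut. A sketch with the SP peaks and power figures listed in this post; these are TDP or board power, not measured draw, and the APU entry assumes the rounded "about one TFLOPS" is SP.

```python
# (SP TFLOPS, watts) as listed in this post; peak values, not sustained.
devices = {
    "Tegra X1": (0.5, 10),
    "Radeon Nano": (8.0, 175),
    "FirePro S9300X2": (13.9, 300),
    "A10-8890K APU": (1.0, 95),  # assumed: "about one TFLOPS" taken as SP
}


def gflops_per_watt(tflops, watts):
    """Peak single-precision GFLOPS per watt of rated power."""
    return tflops * 1000.0 / watts


for name, (tflops, watts) in devices.items():
    print(f"{name:16s} {gflops_per_watt(tflops, watts):5.1f} GFLOPS/W (SP)")
```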
Intel
After years of “boy who cried wolf” stories, they finally seem to have delivered the Larrabee they promised years ago. With the acquisition of Altera, new processors are on the horizon. Their focus is still on customers who practice test-driven design and want to “make it run quickly, make it perform later”.
Xeon E5-2699 v4
- 55 MB cache, 22 cores
- AVX2 (256-bit vector operations)
- DDR4 (60 GB/s)
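For CPUs, the vector width sets the peak. A sketch for the E5-2699 v4, assuming its 2.2 GHz base clock and two 256-bit FMA units per Broadwell core (neither is listed above); sustained AVX clocks can be lower than the base clock, so treat the result as an upper bound.

```python
def cpu_peak_gflops(cores, clock_ghz, vector_bits, element_bits, fma_units):
    """Theoretical peak GFLOPS of a vector CPU doing only fused multiply-adds."""
    lanes = vector_bits // element_bits      # values per vector register
    flops_per_cycle = lanes * 2 * fma_units  # an FMA counts as 2 FLOPs
    return cores * clock_ghz * flops_per_cycle


# Assumed: 22 cores, 2.2 GHz base, 2 FMA units of 256 bits per core.
dp = cpu_peak_gflops(22, 2.2, 256, 64, 2)
sp = cpu_peak_gflops(22, 2.2, 256, 32, 2)
print(f"E5-2699 v4: {dp:.0f} GFLOPS DP, {sp:.0f} GFLOPS SP at base clock")
```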
Xeon Phi Knights Landing
- Available as a socketed processor and as a PCIe card
- 3 TFLOPS DP, 6 TFLOPS SP
- AVX-512 (512-bit vector operations)
- 16 GB MCDRAM high-bandwidth memory (over 400 GB/s), up to 384 GB DDR4 (60 GB/s).
- Currently not programmable with OpenCL, as far as we know
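Doubling the vector width doubles the values per instruction, and the 16 GB on-package memory changes how fast data can be fed. A sketch that reproduces the ~3 TFLOPS DP figure, assuming the Xeon Phi 7250 model (68 cores at 1.4 GHz, two AVX-512 units per core), which is one of several KNL models and is not named above:

```python
# Values per vector register for the two Intel vector ISAs mentioned in this post.
for isa, bits in [("AVX2", 256), ("AVX-512", 512)]:
    print(f"{isa:8s} {bits // 64:2d} doubles, {bits // 32:2d} floats per register")

# Assumed model: Xeon Phi 7250, 68 cores at 1.4 GHz, 2 vector units per core,
# each doing an 8-wide DP fused multiply-add (2 FLOPs per lane) per cycle.
peak_dp_tflops = 68 * 1.4 * 2 * 8 * 2 / 1000
print(f"Xeon Phi 7250 peak: {peak_dp_tflops:.2f} TFLOPS DP")

# Time for one pass over a 16 GB working set, bandwidths as listed in this post.
for mem, gbs in [("on-package (16 GB)", 400), ("DDR4", 60)]:
    print(f"{mem:20s} {16 / gbs:.2f} s per pass")
```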
Xeon+FPGA
- Task-parallel processor
- Low-latency
Iris GPUs
- CPU with built-in GPU
- 0.7 TFLOPS SP
Selecting the right hardware
Choosing the best hardware has become quite complex, especially when focusing on TCO (Total Cost of Ownership). At StreamHPC we have experience with many of the devices above, but also with various embedded devices that compete with these processors on a totally different scale. You need to select the right benchmarks to find out which device fits your needs; we can help with that.
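As a rough illustration of why TCO rather than list price should drive the choice, the sketch below adds energy cost to purchase price over a service life and divides by sustained performance. Every input (price, power, utilisation, tariff, PUE for cooling overhead, sustained TFLOPS) is a hypothetical parameter to replace with your own data and benchmark results; none of it comes from this post.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    price: float             # hypothetical purchase price (any currency)
    watts: float             # average power draw under load
    sustained_tflops: float  # measured on your own benchmark, not peak


def tco(dev, years, utilisation, cost_per_kwh, pue=1.5):
    """Purchase price plus energy cost; PUE scales IT power for cooling etc."""
    hours = years * 365 * 24 * utilisation
    energy_kwh = dev.watts / 1000.0 * hours * pue
    return dev.price + energy_kwh * cost_per_kwh


def cost_per_tflops(dev, **kwargs):
    """TCO divided by sustained performance: lower is better."""
    return tco(dev, **kwargs) / dev.sustained_tflops


# Hypothetical example values, only to show the calculation.
a = Device("card A", price=4000, watts=300, sustained_tflops=6.0)
b = Device("card B", price=1500, watts=175, sustained_tflops=3.0)
for dev in (a, b):
    cost = cost_per_tflops(dev, years=3, utilisation=0.8, cost_per_kwh=0.20)
    print(f"{dev.name}: {cost:,.0f} per sustained TFLOPS over 3 years")
```

The sustained TFLOPS input is where the right benchmark matters: a device that wins on peak numbers can lose once its real throughput on your workload is plugged in.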