NOTE: there are many contradicting sources out there, so there may be mistakes in this article. Please give me feedback via Twitter, mail or comments, so the information can be completed.
Yes, another post in the answer-to series. At SC12 Intel tries to steal the show from the Tesla K20 and FirePro S10000.
After two years of waiting, Intel finally comes with an accelerator card: the Xeon Phi. Compare it to NVIDIA skipping the GTX 200 series and presenting the GTX 500 series right away. Or maybe even the GTX 600 series – we cannot tell yet.
The Phi is not a compute card as we know it. Just as you cannot do a 1-to-1 comparison between AMD's GCN architecture and NVIDIA's Kepler, neither can be easily compared to the Phi. But this article should give an idea of where it is positioned.
The architecture
It contains 60 cores with a vector width of 512 bits (8 times 64 bits). This means that per clock tick it can do one computation on an 8-wide vector of double-precision floats on each of the 60 cores (SIMD). Compare this to an AMD card, which has several hundred cores with support for 4-wide vectors of single-precision floats (VLIW). At 1.053 GHz this gives 1.053 * 60 * 8 * 2 = 1011 GFLOPS.
The factor 2 above is there because the Phi can do MAD operations: a multiply plus an add in one instruction. This means that if you have a multiply operation, you get the add for free; if your code has no such multiply-add pairs, you get only 0.5 TFLOPS. For more information, check “Fused Multiply-Add” on page two of Differences in floating-point arithmetic between Intel® Xeon® processors and the Intel® Xeon Phi™ coprocessor [PDF].
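The peak figure above is plain multiplication, so it can be reproduced in a few lines. This is a minimal sketch using only the numbers quoted in this article (1.053 GHz, 60 cores, 8 double-precision lanes); the function name `peak_gflops` is made up for illustration.

```python
def peak_gflops(clock_ghz, cores, vector_lanes, flops_per_lane_per_cycle):
    """Theoretical peak: every lane of every core is busy on every clock tick."""
    return clock_ghz * cores * vector_lanes * flops_per_lane_per_cycle


CLOCK_GHZ = 1.053      # clock speed quoted in the article
CORES = 60             # core count quoted in the article
DP_LANES = 512 // 64   # 512-bit vector / 64-bit double = 8 lanes

# A multiply-add counts as 2 floating-point operations per lane per cycle.
with_mad = peak_gflops(CLOCK_GHZ, CORES, DP_LANES, 2)
# Code without multiply-add pairs gets only 1 operation per lane per cycle.
without_mad = peak_gflops(CLOCK_GHZ, CORES, DP_LANES, 1)

print(f"peak with multiply-add:    {with_mad:.0f} GFLOPS")
print(f"peak without multiply-add: {without_mad:.0f} GFLOPS")
```

Real code will land somewhere between the two values, depending on how many multiplies can be paired with an add.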
Most interesting would be to know how well the scheduler is implemented. If there is one (full) scheduler per core, then the Phi will be much easier to program than an accelerator from AMD or NVIDIA. Do note that upcoming architectures of the two GPU vendors are much more advanced in this respect.
There is no official information that single precision has double the performance of double precision; what is clear is that Intel focuses on double precision. The Phi has a strong focus on cache sizes (± 1.8 MB L1, ± 30 MB L2 cache, probably in total rather than per core (?)) and a high memory bandwidth (320 GB/s → ± 5.33 GB/s per core); both will increase the programmability of the accelerator. This makes it easier to write code that runs at 70% of peak or better.
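To put the bandwidth number in perspective, it helps to divide it by the core count and by the peak compute rate. The sketch below uses only figures from this article; the "bytes per flop" ratio is my own illustration of how much memory traffic the card can sustain per double-precision operation at peak, not a number published by Intel.

```python
BANDWIDTH_GB_S = 320.0    # memory bandwidth quoted in the article
CORES = 60                # core count quoted in the article
PEAK_DP_GFLOPS = 1011.0   # double-precision peak derived in the article

# Share of the memory bandwidth available to each core if all cores stream at once.
bandwidth_per_core = BANDWIDTH_GB_S / CORES

# Bytes that can be fetched from memory per double-precision operation at peak.
# One double is 8 bytes, so a kernel must reuse data from cache to approach peak.
bytes_per_dp_flop = BANDWIDTH_GB_S / PEAK_DP_GFLOPS

print(f"bandwidth per core: {bandwidth_per_core:.2f} GB/s")
print(f"bytes per DP flop:  {bytes_per_dp_flop:.2f}")
```

The lower the second number, the more a kernel depends on the caches to keep the vector units fed, which is why the cache sizes matter for programmability.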
The Phi is special in more ways. When the Phi was still called Knights Corner, it was mentioned that it comes pre-loaded with an embedded Linux. This means it is a computer on its own. You can read more about it here.
Knowing this capability of the Phi, it is strange that it is so strongly positioned to be used alongside a strong CPU. Also for future releases, Intel focuses its system architecture on combining a Phi with an (Intel) CPU (see image: blue is CPU, yellow is Phi).
This is a different approach from what is popular with other chip designers, who try to find ways to put the accelerator on the same die as the CPU. But as the interconnect war is currently heating up, we cannot draw any conclusion from this. Think of the various ways the 386/486 co-processors could be connected to the motherboard/CPU – this time, too, nothing has been decided yet.
Programming models
Intel chose the safest way to attract as many developers as possible: support all models. This list could be shortened later for the sake of vendor lock-in, but for now we can enjoy it. The image below is taken from this PDF. Of course OpenCL is in it too.
Performance comparison
Below is a comparison with both competitors. There are many, many sources all claiming different things, so these tables will be updated a lot in the coming time.
Tesla K20:
Functionality | XEON Phi 5110P | TESLA K20 |
---|---|---|
GPU-Processor count | 1 | 1 |
Architecture | x86 Knights Corner | Kepler GK110 |
Memory per GPU-processor | up to 8 GB GDDR5. No ECC! | 6GB GDDR5 (5 GB w/ ECC) |
Memory bandwidth per GPU-processor | 320 GB/s | 200 GB/s |
Performance (single precision, per GPU-proc.) | 2.022 TFLOPS | 3.52 TFLOPS |
Performance (double precision, per GPU-proc.) | 1.011 TFLOPS | 1.17 TFLOPS |
Max power usage per GPU-processor | 225 Watt | 225 Watt |
Greenness (SP) | 8.99 GFLOPS/Watt ? | 15.6 GFLOPS/Watt |
Bus Interface | PCIe 3.0 x16 | PCIe 3.0 x16 |
Price (per GPU-processor) | $2649 | $3199 |
Price per GFLOPS (SP) | $1.31 | $0.91 |
Price per GFLOPS (DP) | $2.62 | $2.73 |
Cooling | Passive (P, A = Active) | Passive |
FirePro S9000 – see next article for the S10000:
Functionality | XEON Phi 5110P | FirePro S9000 |
---|---|---|
GPU-Processor count | 1 | 1 |
Architecture | x86 Knights Corner | Graphics Core Next |
Memory per GPU-processor | up to 8 GB GDDR5. No ECC! | 6GB GDDR5 (5 GB w/ ECC) |
Memory bandwidth per GPU-processor | 320 GB/s | 264 GB/s |
Performance (single precision, per GPU-proc.) | 2.022 TFLOPS | 3.230 TFLOPS |
Performance (double precision, per GPU-proc.) | 1.011 TFLOPS | 0.806 TFLOPS |
Max power usage per GPU-processor | 225 Watt | 225 Watt |
Greenness (SP) | 8.99 GFLOPS/Watt | 14.36 GFLOPS/Watt |
Bus Interface | PCIe 3.0 x16 | PCIe 3.0 x16 |
Price (per GPU-processor) | $2649 | $2500 |
Price per GFLOPS (SP) | $1.31 | $0.77 |
Price per GFLOPS (DP) | $2.62 | $3.10 |
Cooling | Passive (P, A = Active) | Passive |
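The derived rows in both tables (greenness and price per GFLOPS) are simple divisions of the raw rows, so they are easy to recompute whenever a specification changes. A minimal sketch, using the raw numbers from the tables; note that the Phi's single-precision figure is the unconfirmed 2x-double-precision assumption, and the `Card` class is only an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    sp_gflops: float   # single-precision peak from the tables
    dp_gflops: float   # double-precision peak from the tables
    watts: float       # max power usage
    price_usd: float   # list price

    def gflops_per_watt_sp(self):
        return self.sp_gflops / self.watts

    def usd_per_gflops_sp(self):
        return self.price_usd / self.sp_gflops

    def usd_per_gflops_dp(self):
        return self.price_usd / self.dp_gflops


CARDS = [
    # The Phi SP value assumes SP = 2 x DP, which is not officially confirmed.
    Card("Xeon Phi 5110P", 2022.0, 1011.0, 225.0, 2649.0),
    Card("Tesla K20", 3520.0, 1170.0, 225.0, 3199.0),
    Card("FirePro S9000", 3230.0, 806.0, 225.0, 2500.0),
]

for card in CARDS:
    print(
        f"{card.name:15s} "
        f"{card.gflops_per_watt_sp():6.2f} GFLOPS/Watt (SP)  "
        f"${card.usd_per_gflops_sp():.2f}/GFLOPS (SP)  "
        f"${card.usd_per_gflops_dp():.2f}/GFLOPS (DP)"
    )
```

Changing a price or a peak figure in `CARDS` updates all derived values at once, which is handy while the sources still disagree.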
Sources for the Phi-specifications are below.
As not all information is public, no conclusions can be drawn yet. Follow us on Twitter or LinkedIn to get notified of any update to this article and other interesting information.
Sources
- The Register: Intel Xeon Phi battles GPUs, defends x86 in supercomputers
- CPU World: Details of Intel Xeon Phi coprocessors (September, so a lot of guesses)
- Intel: The Intel Xeon Phi Coprocessor: Parallel Processing, Unparalleled Discovery
- Intel: Differences in floating-point arithmetic between Intel® Xeon® processors and the Intel® Xeon Phi™ coprocessor [PDF]
- Intel: Intel Xeon Phi Coprocessor Instruction Set Reference Manual, 2.2M (updated September 7, 2012) [PDF]
- Intel: Introducing OpenCL 1.2 for Intel Xeon Phi coprocessor
- ELEKS: NVIDIA Tesla K20 benchmark: facts, figures and some conclusions
- HPC-wire: Intel brings manycore x86 to market with knights corner
Blogs of people at Intel who are into Phi and OpenCL
Sad OpenACC isn’t supported.
Are you sure? http://www.caps-entreprise.com/products/caps-compilers/