The knowns and unknowns of the PEZY-SC accelerator at RIKEN

PEZY-SC_QuadPCB-1_smallThe green500 is out and one unknown processor takes the number one position with a huge improvement over last year. It is a new super-computer installed at RIKEN with an incredible 7 GFLOPS/Watt. It is powered by the processor-boards at the right: two Xeons, 4 PEZY-SC 1.4 accelerators and 128GB DRAM, which have a combined performance of about 6.2 TFLOPS. It has been designed for immersive cooling.

The second and third positions are also powered by the PEZY-SC, before we find the winner of last year: the AMD FirePro S9150 and a bit after that the rest (mostly NVidia Tesla). One constant is the CPUs used: Intel XEON is taking most. To my big surprise no ARM64.

green500_2015june_top5

From the third to the first PEZY-SC installation there is an improvement of 13%. It seems the first two are the new type, called “bricks”, while the third is the same as last year. Comparing with that super from last year (4.4945 GFLOPS/W) there is an improvement of 42% and 25%. The 13% improvement from the previous version is interesting enough, but the 25% improvement on exactly the same system raised questions. Probably it is due to compiler-optimisations. As the November-version of the Green500 is much more strict, it will be clear if the rules were bent – let’s hope it’s for real!

It supports OpenCL!

When new accelerators support OpenCL, it gets accepted more easily. So it is very interesting the PEZY-SC runs on OpenCL. I asked at ISC and got explained it was a subset of OpenCL, but could not get the finger on which subset, nor could I get access to test it. It does mean that code that would run well on this machine is easy to port. And then I mean the same “easy” Intel uses for explaining the easyness of porting OpenMP software to XeonPhi: PEZI-specific optimisations and writing around the missing functionality would still take effort – the typical stuff we do at StreamHPC.

RIKEN Shoubu

Some information on “Shoubu” (“Iris” in Japanese), the top 1 on the Green 500. According to the Green500 it is 353.8 TFLOPS (based on 50kW, using an actual benchmark). On 25 June RIKEN announced the Shoubu is 2 PFLOPS (theoretical). If the full machine is used for the Green500, then the efficiency was only 18%!

Below are some images of the installation.

shoubu2 shoubu3 shoubu1

Source: http://www.exascaler.co.jp/wp-content/uploads/2015/06/20150625.pdf

An important part is Exascaler’s immersion technology, what I understood is a spin-off of PEZY. I’m very curious what the AMD FirePro S9150 does when it uses immersion-cooling – I think we have to do some frying at the office to find out.

PEZY-SC1.4 and PEZY-SC2

PEZY started with a multi-core processor of 512 cores, the PEZY-1. The PEZY-SC has 1024 cores and has had a few gradual upgrades – currently PEZY-SC 1.4 (“the brick”) is installed.

PEZY-SC Specification:

Logic Cores(PE) 1,024
Core Frequency 733MHz
Peak Performance Floating Point. Single 3.0TFlops / Double 1.5TFlops
Host Interface PCI Express GEN3.0 x8Lane x 4Port (x16 bifurcation available)
JESD204B Protocol support
DRAM Interface DDR4, DDR3 combo 64bit x 8Port Max B/W 1533.6GB/s
+Ultra WIDE IO SDRAM (2,048bit) x 2Port Max B/W 102.4GB/s
Control CPU ARM926 dual core
Process Node 28nm
Package FCBGA 47.5mm x 47.5mm, Ball Pitch 1mm, 2,112pin

Source: http://pezy.co.jp/en/products/pezy-sc.html

Development on PEZY-SC2 is ongoing, which will have a staggering 4096 cores. Ofcourse efficiency has to go up (if the 18% is correct), to make this a good upgrade.

There is no promise on when the PEZY-SC2 will be announced, but it will certainly surprise us again hen it arrives.