The Green500 is out, and a largely unknown processor takes the number one position with a huge improvement over last year. It powers a new supercomputer installed at RIKEN that reaches an incredible 7 GFLOPS/Watt. The machine is built from the processor boards shown at the right: two Xeons, 4 PEZY-SC 1.4 accelerators and 128GB DRAM, each board with a combined performance of about 6.2 TFLOPS. It has been designed for immersion cooling.
The second and third positions are also powered by the PEZY-SC. Only then do we find last year's winner, the AMD FirePro S9150, and a bit after that the rest (mostly NVIDIA Tesla). One constant is the CPUs used: Intel Xeon takes most of the list. To my big surprise there is no ARM64.
From the third to the first PEZY-SC installation there is an improvement of 13%. It seems the first two are the new type, called “bricks”, while the third is the same system as last year. Compared with that supercomputer's result from last year (about 4.95 GFLOPS/W), the first installation improves by 42% and the third by 25%. The 13% improvement from the previous version is interesting enough, but the 25% improvement on exactly the same system raises questions. Probably it is due to compiler optimisations. As the November version of the Green500 is much stricter, it will become clear whether the rules were bent. Let’s hope it’s for real!
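The percentages above can be cross-checked against each other. The sketch below is only a consistency check on the rounded figures quoted in this post, not on the official Green500 numbers; the variable names are my own.

```python
# Rounded gains as quoted in the text above.
same_system_gain = 0.25   # third installation vs. the same system last year
new_hardware_gain = 0.13  # first installation vs. the third installation

# Relative improvements multiply; they do not add.
combined_gain = (1 + same_system_gain) * (1 + new_hardware_gain) - 1

print(f"combined improvement: {combined_gain:.0%}")
```

This lands at roughly 41%, in line with the quoted 42% once you allow for the rounding of the inputs.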
It supports OpenCL!
When a new accelerator supports OpenCL, it gets accepted more easily. So it is very interesting that the PEZY-SC runs OpenCL. I asked at ISC and was told it is a subset of OpenCL, but I could not put my finger on which subset, nor could I get access to test it. It does mean that code that would run well on this machine is easy to port. And by that I mean the same “easy” Intel uses when explaining the ease of porting OpenMP software to Xeon Phi: PEZY-specific optimisations and writing around the missing functionality would still take effort, which is the typical stuff we do at StreamHPC.
RIKEN Shoubu
Some information on “Shoubu” (Japanese for “iris”), number one on the Green500. According to the Green500 it reaches 353.8 TFLOPS (based on 50kW, using an actual benchmark). On 25 June RIKEN announced that Shoubu is 2 PFLOPS (theoretical). If the full machine was used for the Green500, then the efficiency was only 18%!
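The arithmetic behind those numbers, as a minimal sketch. It uses only the figures quoted above and assumes, as the text does, that the whole 2 PFLOPS machine took part in the benchmark run; the variable names are my own.

```python
# Figures quoted in the text above.
rmax_tflops = 353.8    # measured performance listed by the Green500
power_kw = 50.0        # power figure quoted for that run (rounded)
rpeak_tflops = 2000.0  # 2 PFLOPS theoretical peak announced by RIKEN

# TFLOPS per kW equals GFLOPS per W: both sides scale by a factor of 1000.
gflops_per_watt = rmax_tflops / power_kw

# Assumption: the full machine was used for the benchmark.
efficiency = rmax_tflops / rpeak_tflops

print(f"{gflops_per_watt:.2f} GFLOPS/W")
print(f"{efficiency:.1%} of theoretical peak")
```

With the rounded 50kW this gives slightly over 7 GFLOPS/W, and just under 18% of peak.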
Below are some images of the installation.
Source: http://www.exascaler.co.jp/wp-content/uploads/2015/06/20150625.pdf
An important part is Exascaler’s immersion technology; Exascaler, as I understood it, is a spin-off of PEZY. I’m very curious what the AMD FirePro S9150 does under immersion cooling. I think we have to do some frying at the office to find out.
PEZY-SC1.4 and PEZY-SC2
PEZY started with a multi-core processor of 512 cores, the PEZY-1. The PEZY-SC has 1024 cores and has had a few gradual upgrades – currently PEZY-SC 1.4 (“the brick”) is installed.
PEZY-SC Specification:
Logic Cores(PE) | 1,024 |
Core Frequency | 733MHz |
Peak Performance | Floating Point. Single 3.0TFlops / Double 1.5TFlops |
Host Interface | PCI Express GEN3.0 x8Lane x 4Port (x16 bifurcation available) JESD204B Protocol support |
DRAM Interface | DDR4, DDR3 combo 64bit x 8Port Max B/W 1533.6GB/s +Ultra WIDE IO SDRAM (2,048bit) x 2Port Max B/W 102.4GB/s |
Control CPU | ARM926 dual core |
Process Node | 28nm |
Package | FCBGA 47.5mm x 47.5mm, Ball Pitch 1mm, 2,112pin |
Source: http://pezy.co.jp/en/products/pezy-sc.html
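As a rough sanity check on the specification above, the peak numbers can be broken down per core. This is a sketch on the published figures only. Reading 2 FLOPs per cycle as one fused multiply-add is my own interpretation, and so is the assumption that the 6.2 TFLOPS per board mentioned earlier is a double-precision figure.

```python
# Values taken from the specification table above.
cores = 1024
clock_hz = 733e6
peak_double_flops = 1.5e12
peak_single_flops = 3.0e12

# FLOPs per core per clock cycle implied by the peak numbers.
double_per_cycle = peak_double_flops / (cores * clock_hz)
single_per_cycle = peak_single_flops / (cores * clock_hz)

# Board level: 4 accelerators, about 6.2 TFLOPS combined (from the intro).
# Assumption: that 6.2 TFLOPS is double precision.
accelerators_tflops = 4 * peak_double_flops / 1e12
remainder_tflops = 6.2 - accelerators_tflops  # what is left for the two Xeons

print(f"double: {double_per_cycle:.2f} FLOPs/cycle/core")
print(f"single: {single_per_cycle:.2f} FLOPs/cycle/core")
print(f"accelerators: {accelerators_tflops:.1f} TFLOPS, rest: {remainder_tflops:.1f} TFLOPS")
```

So the spec is consistent with about 2 double-precision (or 4 single-precision) FLOPs per core per cycle, and the four accelerators account for nearly all of a board's performance.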
Development on PEZY-SC2 is ongoing, which will have a staggering 4096 cores. Of course efficiency has to go up (if the 18% is correct) to make this a good upgrade.
There is no promise on when the PEZY-SC2 will be announced, but it will certainly surprise us again when it arrives.