ARM Mali-T604 GPU has 3.5x more performance than dual core Cortex-A15

According to the latest newsletter of the Mont-Blanc Project, the GPU on a Samsung Exynos 5 is much faster and greener than its CPU: 3.5 times faster at half the energy. They built a supercomputer using 810 Exynos SoCs that can deliver 26 TFLOPS of peak performance. With upcoming mobile GPUs getting faster at a rapid pace, the team has all the expertise to build an even faster and greener ARM supercomputer after this.

The Mont-Blanc compute cards deliver considerably higher performance at 50% lower energy consumption, compared with previous ARM-based developer platforms.

The Mont-Blanc prototype is based on the Samsung Exynos 5 Dual SoC, which integrates a dual-core ARM Cortex-A15 and an on-chip ARM Mali-T604 GPU, and has been featured and market-proven in advanced mobile devices. The dual-core ARM Cortex-A15 delivers twice the performance of the quad-core ARM Cortex-A9 used in the previous generation of ARM-based prototypes, whilst consuming 20% less energy for the same workload. Furthermore, the on-chip ARM Mali-T604 GPU provides 3.5 times higher performance than the dual-core Cortex-A15, whilst consuming half the energy for the same workload.
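To see what these ratios imply when chained, here is a small Python sketch. It assumes that "performance" means throughput on the same workload (so runtime is the inverse of the speedup) and that the quoted ratios simply multiply; the function and variable names are my own, not from the newsletter.

```python
# Relative figures quoted in the newsletter (each relative to its baseline).
A15_VS_A9 = {"speedup": 2.0, "energy": 0.8}   # dual A15 vs quad A9
GPU_VS_A15 = {"speedup": 3.5, "energy": 0.5}  # Mali-T604 vs dual A15


def relative_power(speedup, energy):
    """Average power relative to the baseline.

    Assumption: power = energy / time, and time scales as 1 / speedup
    for the same workload.
    """
    return energy * speedup


# Chain the two comparisons to get the GPU relative to the old quad A9.
gpu_vs_a9_speedup = A15_VS_A9["speedup"] * GPU_VS_A15["speedup"]
gpu_vs_a9_energy = A15_VS_A9["energy"] * GPU_VS_A15["energy"]

gpu_vs_a15_power = relative_power(**GPU_VS_A15)

print(f"GPU vs A9: {gpu_vs_a9_speedup:.1f}x speed, "
      f"{gpu_vs_a9_energy:.2f}x energy for the same workload")
print(f"GPU vs A15 average power while running: {gpu_vs_a15_power:.2f}x")
```

Under these assumptions the GPU draws more power than the CPU while it runs, but finishes so much sooner that the total energy for the job is still halved.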

Each Mont-Blanc compute card integrates one Samsung Exynos 5 Dual SoC, 4 GB of DDR3-1600 DRAM, a microSD slot for local storage and a 1 GbE NIC, all in an 85×56 mm card (3.3×2.2 inches). A single Mont-Blanc blade integrates fifteen Mont-Blanc compute cards and a 1 GbE crossbar switch, which is connected to the rest of the system via two 10 GbE links. Nine Mont-Blanc blades fit into a standard BullX 9-blade INCA chassis. A complete Mont-Blanc rack hosts up to six such chassis, providing a total of 1620 ARM Cortex-A15 cores and 810 on-chip ARM Mali-T604 GPU accelerators, delivering 26 TFLOPS of peak performance.
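The rack totals follow directly from the card, blade and chassis counts. A quick Python check, using only the numbers quoted above (the constant names are mine, and the per-card peak is simply the rack peak divided evenly over the cards, which is an assumption):

```python
# Figures from the newsletter.
CARDS_PER_BLADE = 15
BLADES_PER_CHASSIS = 9
CHASSIS_PER_RACK = 6
CPU_CORES_PER_SOC = 2      # dual-core Cortex-A15
DRAM_GB_PER_CARD = 4
RACK_PEAK_TFLOPS = 26.0

# One SoC (and thus one Mali-T604 GPU) per compute card.
socs = CARDS_PER_BLADE * BLADES_PER_CHASSIS * CHASSIS_PER_RACK
cpu_cores = socs * CPU_CORES_PER_SOC
dram_gb = socs * DRAM_GB_PER_CARD

# Assumption: peak performance is spread evenly over all cards.
gflops_per_card = RACK_PEAK_TFLOPS * 1000 / socs

print(f"{socs} SoCs/GPUs, {cpu_cores} CPU cores, {dram_gb} GB DRAM")
print(f"about {gflops_per_card:.1f} GFLOPS peak per compute card")
```

That works out to 810 SoCs and 1620 cores, matching the newsletter, and roughly 32 GFLOPS of peak performance per card.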

“We are only scratching the surface of the Mont-Blanc potential”, says Alex Ramirez, coordinator of the Mont-Blanc project. “There is still room for improvement in our OpenCL algorithms, and for optimizations, such as executing on both the CPU and GPU simultaneously, or overlapping MPI communication with computation.”

Not much is said about power usage. The (old!) slide below shows the numbers for the prototype, which seem to be comparable.


Fully integrated, the system turns 1 Watt into 2.2 GFLOPS of compute power. The #1 on the Green500 (TSUBAME-KFC) does 4.5 GFLOPS per Watt! Officially the ARM Mali delivers 68 GFLOPS; I’m currently checking which numbers are right.
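To put those figures side by side, here is a rough Python sketch. It only uses the numbers quoted in this post; multiplying the official per-GPU figure by the number of GPUs is a naive assumption of mine, not something the project states.

```python
# Figures quoted in this post.
MONT_BLANC_GFLOPS_PER_WATT = 2.2
GREEN500_TOP_GFLOPS_PER_WATT = 4.5   # TSUBAME-KFC
GPUS_PER_RACK = 810
MALI_OFFICIAL_GFLOPS = 68.0
RACK_PEAK_TFLOPS = 26.0

# How far the prototype is from the Green500 leader.
efficiency_gap = GREEN500_TOP_GFLOPS_PER_WATT / MONT_BLANC_GFLOPS_PER_WATT

# Naive assumption: every GPU reaches the official figure at once.
naive_gpu_peak_tflops = GPUS_PER_RACK * MALI_OFFICIAL_GFLOPS / 1000

print(f"Green500 #1 is {efficiency_gap:.2f}x more efficient")
print(f"naive GPU-only peak: {naive_gpu_peak_tflops:.2f} TFLOPS "
      f"vs {RACK_PEAK_TFLOPS:.0f} TFLOPS quoted for the rack")
```

The naive GPU-only total comes out at about twice the 26 TFLOPS quoted for the whole rack, which is exactly why the 68 GFLOPS figure needs checking.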

So is the project there yet? No. Is this the right direction? Absolutely. The scientists at the project think that this is the way to get to Exascale in 2020.

Nicolas Boichat’s easy entry to get OpenCL on ARM Chromebooks

Want to try OpenCL programming on just 1 chip? As mentioned above, the ARM Mali can be programmed with OpenCL. And the Samsung ARM Chromebook has the same chip as used in the Mont-Blanc Project.

Installing OpenCL on a Samsung ARM Chromebook is easy, thanks to Nicolas Boichat. Click here for his instructions. Most of the work is installing Crouton and Linux on the Chromebook; after that it takes 5-10 minutes to get an example running.

He has also worked on porting the Rodinia benchmarks (instructions here). After that you can also run the AO-benchmark.

Enjoy playing!