Our projects range from several person-weeks to fix software performance problems, to several person-years to build high-performance software and libraries.

Join a growing list of companies that trust us with designing and building their core software with performance in mind.

 

A selection of our customers

We have helped many companies become more competitive, but most of them we cannot name here. Below are the public examples.

RocRAND. The world’s fastest random number generator is built for AMD GPUs, and it’s open source. With random numbers generated at several hundred gigabytes per second, the library can speed up existing code many times over. The code is faster than Nvidia’s cuRAND, which makes it the preferred library on any high-end GPU.

RocPRIM. A version of CUB optimised for AMD GPUs, and fully open source. This enables software like TensorFlow to run on AMD hardware at full performance.

OpenCL 2.2 test suite. When a hardware company wants to support OpenCL 2.2 on its processor, it needs a large test suite to test its drivers and device. We updated that test suite from 2.1, which was a big change because of the addition of C++ kernels. We hope to see more devices support OpenCL 2.2, and that vendors find the new test suite complete and correct.

GROMACS performs soft-matter simulations at the molecular scale

We ported GROMACS to OpenCL and optimised the code for use with AMD FirePro accelerators. This resulted in code that is as fast as the CUDA version. GROMACS is used worldwide by over 5000 research centers, for work ranging from simulating molecular docking to examining the hydrogen bonds in a falling water drop. Read more…

For Stanford University we optimised a part of TeraChem, general-purpose quantum chemistry software designed to run on NVIDIA GPU architectures. Our work added a further 70% performance on top of the already optimised CUDA code.

For the University of Manchester we achieved a large speedup with UNIFAC by going from OpenMP code to optimised OpenCL. Where OpenMP could bring the single-threaded code down to about 8 seconds, we brought it down to 0.062 seconds, roughly 130 times faster than the OpenMP version. Read more…

We helped the Memorial Sloan-Kettering Cancer Center improve a tool they use daily. What previously took one hour now takes just two minutes, a speed-up of 30x. Their productivity rose, as they no longer had to wait so long for results and could get more done without buying new computers.

Success stories

Want to read more about what we did? Read about the work we do.

Our customers wanted the code to be fast the first time, without having to hire another team afterwards.

Technologies we work with

CUDA
HSA
OpenMP
OpenACC
ROCm


Want to know more? Get in contact!

We are the acknowledged experts in OpenCL, CUDA and performance optimisation for CPUs and GPUs. We have a portfolio of satisfied customers worldwide, and can help you build high-performance software too. E-mail us today.