We specialise in GPGPU computing, where GPUs are programmed to efficiently run scientific and large-scale simulations, AI training and inference, and other compute-intensive mathematical software. Customers, mostly from the US and Europe, trust us as a recognized expert to speed up their software.
Our projects range from several person-weeks spent fixing software performance problems to several person-years spent building extensive high-performance software and libraries.
Join a growing list of companies that trust us with designing and building their core software with performance in mind.
A selection of our customers
We have helped many companies become competitive, but we cannot name most of them here at this time. Below are public examples.
RocRAND. The world’s fastest random number generator is built for AMD GPUs, and it’s open source. Generating random numbers at several hundred gigabytes per second, the library can make existing code many times faster. The code is faster than NVIDIA’s cuRAND and is therefore the preferred library for any high-end GPU.
RocPRIM. A fully open-source version of CUB optimised for AMD GPUs. It enables software like TensorFlow to run on AMD hardware at full performance.
OpenCL 2.2 test suite. When a hardware company wants to support OpenCL 2.2 on its processor, it needs a large test suite to test its drivers and device. We updated that test suite for version 2.2, which was a big change from 2.1 because of the addition of C++ kernels. We hope to see more devices support OpenCL 2.2, and hope that implementers find the new test suite complete and correct.
We ported GROMACS to OpenCL and optimised the code for use with AMD FirePro accelerators. This resulted in code that is as fast as the CUDA version. GROMACS is used worldwide by over 5000 research centers, for work ranging from simulating molecular docking to examining the hydrogen bonds in a falling water drop. Read more…
For Stanford University we optimised a part of TeraChem, a general-purpose quantum chemistry package designed to run on NVIDIA GPU architectures. Our work added an extra 70% performance to the already optimised CUDA code.
For the University of Manchester we achieved a large speed-up of UNIFAC by going from OpenMP code to optimised OpenCL. Where OpenMP could get the single-threaded code down to about 8 seconds, we brought it down to 0.062 seconds, roughly 130 times faster. Read more…
We helped the Memorial Sloan-Kettering Cancer Center improve a tool they used daily. Where a run previously took one hour, it now takes just two minutes, a speed-up of 30x. Their productivity rose, as they no longer had to wait as long for results and could get more done without buying new computers.
Success stories
Want to read more about what we did? Read about the work we do.
Our customers did not want to hire another team later on; they wanted the code to be fast the first time.