We help our customers get faster, more responsive and/or more precise software. But what does that mean? What is it what we do here in Amsterdam?
Our work is under NDA for a large part, so unfortunately we cannot share all details of all the work we’re very proud of.
Below is a selection of blog posts discussing our demos and Github-links showing our work and open-source software.
Projects
Note that various GPU and CPU code does contain low-level code like PTX, AMDIL and Assembly, but is minimal and only used by exception.
- Porting GROMACS, OpenMM, AMBER and more (active project) to supercomputers running AMD MI100 GPUs. While we were busy, we optimized some code that also runs faster on Nvidia GPUs, so the comparisons between Nvidia and AMD would be fair. If you run one of these on your local supercomputer – you’re welcome.
- Building the Khronos OpenCL SDK [OpenCL, C, C++]. To be published.
- Speeding up pyPasWAS 3-5x [C, Python, OpenCL]. We claimed that we could speed up this open-source software to do DNA/RNA/protein sequence alignment and trimming, and so we did.
- Building multiple libraries for AMD GPUs. Several foundational libraries on ROCm Github were built by us, and we still maintain.
- rocRAND [HIP, C++]. The world’s fastest random number generator (or second, depending on Nvidia’s response) is built for AMD GPUs, and it’s open source. With random numbers generated at several hundreds of gigabytes per second, the library makes it possible to speed up existing code numerous times. The code is often faster than Nvidia’s cuRAND and is therefore the preferred library to be used on any high-end GPU.
- rocThrust – AMD’s optimized version of Thrust [HIP, C++]. Highly optimized for CDNA GPUs. Lots of software for CUDA is Thrust based, and now has no lock-in anymore.
- hipCUB – AMD’s optimized version of CUB [HIP, C++]. Highly optimized for CDNA GPUs. Now porting CUB-based software to AMD is a lot simpler. Both rocThrust and hipCUB share a library rocPRIM which unites many of the GPU-primitives.
- Porting Gromacs from CUDA to OpenCL [CUDA, OpenCL, C, C++]. Until we ported the simulation software end of 2014, it has been CUDA-only. Porting took several man-months to manually port all code. You can now download the source, build it and run it on AMD/Intel hardware – see here for more info. All is open source, so you can see our code.
- Porting Manchester’s UNIFAC to OpenCL@XeonPhi [OpenCL, C++, MPI]. Even though XeonPhi Knights Corner is not a very performant accelerator, we managed to get a 160x speedup from single threaded code. Most of the speedup is due to clever code-optimizations and less due to low-level optimizations.
- Porting a set of ADSL-algorithms to an embedded special purpose GPU [OpenCL, C, C++]. Allowing central ADSL-routers in large buildings to handle modern ADSL-protocols.
- Optimizing and extending the main image processing framework of a large photo hosting platform [CUDA, C++].
- Flooding simulation [OpenCL, C++, MPI]. Software that simulates flooding of land, which we ported to multi-GPU on OpenCL and got a 35x speedup over MPI.
Demos
- Cartoonizer. The webcam or video stream is “cartoonized” using several image filters on an FPGA using OpenCL.
- Android video filter demo. Real-time Android-app, where the webcam stream has several real-time OpenGL filters applied to make it look like an old movie. This was a proof-of-concept to show we could apply our knowledge to Android and OpenGL.
- Speeding up Excel. A heavy financial algorithm is offloaded to a GPU, resulting in a big speedups. Most Excel-sheets are slow because they’re bigger than where Excel was designed for, so unfortunately offloading does often not work when Excel gets really too slow.
Do you need a secret weapon too? We like to work together with you, to build fast software together. Get in touch to discuss your needs and goals.