An algorithm that was not designed for parallel execution usually cannot be ported directly to OpenCL. We have the expertise in both algorithm design and GPU programming to take the shortest path to optimal performance.
Designing parallel versions of algorithms is an intensive but important part of the speed-up process. By focusing on aspects such as caching, parallelisability and data locality while redesigning the algorithms from the ground up, we can maximise performance. For example, a recursive algorithm is easy to understand, but it typically uses much more memory than an equivalent version built on an explicit stack. We have experience in converting algorithms for use in various parallel programming languages.
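As a minimal sketch of the recursion example, the snippet below sums the values in a tree twice: once recursively and once with an explicit stack. The names `Node`, `sum_recursive`, `sum_with_stack` and `build_chain` are hypothetical and chosen only for illustration; this is not taken from any delivered code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node used only for this illustration."""
    value: int
    children: List["Node"] = field(default_factory=list)


def sum_recursive(node: Node) -> int:
    # Easy to read, but every level of the tree adds call frames,
    # so memory use grows with the depth of the tree.
    return node.value + sum(sum_recursive(child) for child in node.children)


def sum_with_stack(root: Node) -> int:
    # Same traversal with an explicit stack: only node references are
    # stored, and the worst-case stack size can be reasoned about up front.
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        total += node.value
        stack.extend(node.children)
    return total


def build_chain(depth: int) -> Node:
    # Builds a degenerate tree (a single chain) of the given depth.
    root = Node(1)
    current = root
    for _ in range(depth - 1):
        child = Node(1)
        current.children.append(child)
        current = child
    return root


small_tree = Node(1, [Node(2, [Node(4)]), Node(3)])
print(sum_recursive(small_tree), sum_with_stack(small_tree))

# A chain deeper than Python's default recursion limit is still
# handled by the explicit-stack version.
print(sum_with_stack(build_chain(5000)))
```

The explicit-stack form also matters when targeting GPUs: OpenCL C does not support recursion, so a recursive algorithm has to be rewritten along these lines before it can run in a kernel.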
The algorithm document will help you fully understand what has changed, how it works and how you can continue from what we delivered.
- Redesign of algorithms for optimal performance on parallel architectures.
- Implementation in OpenCL, CUDA, OpenMP, MPI and more.
- Full documentation of the redesign process.
Check out the blog series on Programming Theories to learn more about what we do to make your software more scalable!
Want to know more? Get in contact!
We are acknowledged experts in OpenCL, CUDA and performance optimization for CPUs and GPUs. We have a portfolio of satisfied customers worldwide, and we can help you build high-performance software too. E-mail us today.