I find micro-kernels an important subject because they have clear advantages. OpenCL 2.0 offers more possibilities for creating smaller kernels. Moreover, writing smaller, more focused functions is considered good software engineering, a principle known as “Separation of Concerns”.
For a general introduction to the “mega vs. micro” kernel concept, read “Megakernels Considered Harmful: Wavefront Path Tracing on GPUs” by Samuli Laine, Tero Karras, and Timo Aila of NVIDIA. The abstract:
When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.
The OpenCL kernels in “SmallLuxGPU” (a ray tracer originally written by David) have followed the micro-kernel approach from the very beginning. However, after the merge with LuxRender and the introduction of LuxRender materials, textures, light sources, etc., one of the kernels grew to the point of becoming a “mega-kernel”.
The major problem with mega-kernels, aside from the AMD OpenCL compiler's inability to compile them, is their huge register usage and very low GPU utilization. The paper explains well why this happens.
PATHOCL Micro-kernels edition, the results
The number of kernels increases from 2 to 10, register usage drops from 196 (!!!) to between 3 and 84, and GPU utilization rises from a miserable 10% to a healthier 30%-100%.
A speedup of 20% to 40% has been reported on MacOS and Windows with NVIDIA GPUs.
It solves the problems with the AMD compiler
Micro-kernels not only improve performance but also address the major issues with the AMD OpenCL compiler. For the very first time since the release of the first AMD OpenCL SDK beta, I’m not aware of any scene that fails to run on AMD GPUs. This is SATtva’s Mic scene running on GPUs for the first time:
Try it out yourself
This feature will be extended to BIASPATHOCL and will be available in LuxRender v1.5.
To run with micro-kernels, use “path.microkernels.enable=1”.