Basic concepts: malloc in the kernel

22489954_ml
Pointers and allocated memory space with a hint to Oktoberfest.

During the last training I got a question how to do malloc in the kernel. It was one of those good questions, as it gives another view on a basic concept of OpenCL. Simply put: you cannot allocate (local or global) memory from within the kernel. Luckily it’s possible, but it is somewhat hidden in another function.

clSetKernelArg to the rescue

The way to do it is from the host, using one of the kernel arguments.

cl_int clSetKernelArg ( cl_kernel kernel,
cl_uint arg_index,
size_t arg_size,
const void *arg_value)

This function allocates the memory on the device for you. Just as with normal malloc, it doesn’t clear the memory for you.

To make sure the host cannot access it (and you don’t accidentally pin/write/read it, when using host-generation scripts), you can use a flag for that: CL_MEM_HOST_NO_ACCESS. All the flags have been explained in a previous article about this same function, setting flags for creating kernel arguments.

The advantage of only allowing malloc to be done from the host, before the kernel is launched, is that the memory-planning can be done more efficiently.

Local memories

When you need a local space, you can specify that at the kernel-side. For example:

__kernel void foo(__local int* bar) { ... }

This mallocs an area in all local memories with size specified by arg_size.

Basic Concepts

This short article is in the basic concept series. It contains several subjects I did not see well-enough explained in books or the reference manual. If you see a subject that you would like to see in this series, just contact us.

Related Posts

stocks

Improving FinanceBench for GPUs Part II – low hanging fruit

...  was made in tackling the low hanging fruits in the kernels and tackling any potential structural problems outside of the ...

FIA_F1_Austria_2018_Nr._33_Verstappen

The Art of Benchmarking

How fast is your software? The simpler the software setup, the easier to answer this question. The more complex the software, the more the answer will ...

5yearsSC

Birthday present! Free 1-day Online GPGPU crash course: CUDA / HIP / OpenCL

...  to the most normal projects. The training is very basic, but there will be a lot of aha!s that will help in making better ...

network-of-boxes

Problem solving tactic: making black boxes smaller

We are a problem solving company first, specialised in HPC - building software close to the processor. The more projects we finish, the more it's clea ...