Basic concepts: malloc in the kernel

Pointers and allocated memory space with a hint to Oktoberfest.

During the last training I was asked how to do malloc in the kernel. It was one of those good questions, as it gives another view on a basic concept of OpenCL. Simply put: you cannot allocate (local or global) memory from within the kernel. Luckily you can get the same result from the host, but it is somewhat hidden in another function.

clSetKernelArg to the rescue

The way to do it is from the host, using one of the kernel arguments.

cl_int clSetKernelArg(cl_kernel kernel,
                      cl_uint arg_index,
                      size_t arg_size,
                      const void *arg_value);

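For __local arguments this function reserves the memory on the device for you: pass NULL as arg_value and the number of bytes as arg_size. For global memory, arg_value points to a buffer object (cl_mem) that was created on the host with clCreateBuffer. Just as with normal malloc, the memory is not cleared for you in either case.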

To make sure the host cannot access a global buffer (and you don’t accidentally pin/write/read it, when using host-generation scripts), you can pass a flag when creating it: CL_MEM_HOST_NO_ACCESS, available since OpenCL 1.2. All the flags have been explained in a previous article, on setting flags for creating kernel arguments.

The advantage of only allowing malloc to be done from the host, before the kernel is launched, is that all memory requirements are known up front, so the memory planning can be done more efficiently.

Local memories

When you need local memory, you can declare that at the kernel side with a __local pointer argument. For example:

__kernel void foo(__local int* bar) { ... }

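When the host calls clSetKernelArg for this argument with arg_value set to NULL, this mallocs an area of arg_size bytes in all local memories, so each work-group gets its own copy.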

Basic Concepts

This short article is part of the basic concepts series. The series covers subjects that I did not find explained well enough in books or the reference manual. If there is a subject that you would like to see in this series, just contact us.
