
Getting the available private memory size is not easy; in fact, the OpenCL API offers no way to query it directly from the device or driver. To see why, you need to dive deep into clGetKernelWorkGroupInfo, the function that tells you how well your kernel fits on the device. Strangely, this function is not often discussed.
Memory sizes
CL_KERNEL_LOCAL_MEM_SIZE
Returns the amount of local memory, in bytes, being used by a kernel (per work-group). Use CL_DEVICE_LOCAL_MEM_SIZE to find out the maximum.
CL_KERNEL_PRIVATE_MEM_SIZE
Returns the minimum amount of private memory, in bytes, used by each work-item in the kernel.
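The two local-memory numbers combine into a simple headroom calculation. The sketch below is plain C arithmetic, not OpenCL calls: the helper name local_mem_headroom and its parameters are made up for illustration, and the inputs are assumed to have been read beforehand with clGetKernelWorkGroupInfo (CL_KERNEL_LOCAL_MEM_SIZE) and clGetDeviceInfo (CL_DEVICE_LOCAL_MEM_SIZE), which both return a cl_ulong.

```c
#include <stdint.h>

/* Hypothetical helper: bytes of local memory still free per work-group.
 *   kernel_local_bytes: value queried with CL_KERNEL_LOCAL_MEM_SIZE
 *   device_local_bytes: value queried with CL_DEVICE_LOCAL_MEM_SIZE
 * Both are cl_ulong in OpenCL; uint64_t stands in for it here so the
 * sketch compiles without the OpenCL headers. */
uint64_t local_mem_headroom(uint64_t kernel_local_bytes,
                            uint64_t device_local_bytes)
{
    if (kernel_local_bytes >= device_local_bytes)
        return 0; /* the kernel already uses all available local memory */
    return device_local_bytes - kernel_local_bytes;
}
```

For example, if the device reports 32768 bytes and the kernel uses 4096, local_mem_headroom(4096, 32768) gives 28672 bytes, which is room for 7168 more floats in a dynamically sized __local buffer.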
Work sizes
CL_KERNEL_GLOBAL_WORK_SIZE
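This answers the question “What is the maximum value for the global_work_size argument that can be given to clEnqueueNDRangeKernel?”. The result is of type size_t[3]. Note that this query is only valid for a custom device or a built-in kernel; in all other cases clGetKernelWorkGroupInfo returns CL_INVALID_VALUE.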
CL_KERNEL_WORK_GROUP_SIZE
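This is the same for local_work_size: the maximum work-group size with which this kernel can run on the device, returned as a single size_t (the total number of work-items in a group). The implementation uses the kernel’s resource requirements (register usage etc.) to determine what this work-group size should be.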
CL_KERNEL_COMPILE_WORK_GROUP_SIZE
If __attribute__((reqd_work_group_size(X, Y, Z))) is used, then (X, Y, Z) is returned, else (0, 0, 0).
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
It returns a performance hint: if the work-group size is a multiple of this number, you can expect good results. So there is no need to remember 32 or 64 for specific GPUs; simply query this value.
Combined with clGetDeviceInfo’s CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS (and CL_DEVICE_MAX_WORK_ITEM_SIZES for the per-dimension limits), you can fine-tune your work-group size in case you need the group size to be as large as possible.
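As a rough, one-dimensional sketch of that fine-tuning: take the largest multiple of the preferred multiple that still fits under the kernel's maximum, then pad the global size to a whole number of groups (OpenCL 1.x requires the global size to be divisible by the local size). The helper names pick_work_group_size and round_up_global_size are hypothetical, and the inputs are assumed to come from the CL_KERNEL_WORK_GROUP_SIZE and CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE queries described above.

```c
#include <stddef.h>

/* Hypothetical helper: largest work-group size that is a multiple of the
 * preferred multiple and does not exceed the kernel's maximum.
 *   max_wg_size:        value queried with CL_KERNEL_WORK_GROUP_SIZE
 *   preferred_multiple: value queried with
 *                       CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE */
size_t pick_work_group_size(size_t max_wg_size, size_t preferred_multiple)
{
    if (preferred_multiple == 0 || max_wg_size < preferred_multiple)
        return max_wg_size; /* no usable multiple; fall back to the maximum */
    return (max_wg_size / preferred_multiple) * preferred_multiple;
}

/* Hypothetical helper: round the number of work-items up to a whole number
 * of work-groups. The kernel must then skip the padded work-items itself,
 * e.g. by comparing get_global_id(0) against the real item count. */
size_t round_up_global_size(size_t work_items, size_t wg_size)
{
    if (wg_size == 0)
        return work_items; /* nothing to round to */
    return ((work_items + wg_size - 1) / wg_size) * wg_size;
}
```

With a maximum of 200 and a preferred multiple of 64, pick_work_group_size returns 192, and 1000 work-items are padded to a global size of 1152. The result is only a starting point; measuring a few candidate sizes on the actual device remains the safer way to choose.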
Read more?
You’ll find interesting usages when specifically searching for these flags on GitHub or Stack Overflow.
Short list of interesting Stack Overflow discussions:
- http://stackoverflow.com/questions/10096443/what-is-the-algorithm-to-determine-optimal-work-group-size-and-number-of-workgro/10098063#10098063
- http://stackoverflow.com/questions/13496681/is-clgetkernelworkgroupinfo-cl-kernel-work-group-size-the-size-opencl-uses-whe
- http://stackoverflow.com/questions/9331696/physical-memory-on-amd-devices-local-vs-private