It’s not easy to get the available private memory size. In fact, the OpenCL API offers no way to get this information directly from the device or its drivers. Why that is becomes clear once you dive deep into clGetKernelWorkGroupInfo, the function that tells you how well your kernel fits on the device. It is strange that this function is not discussed more often.
Memory sizes
CL_KERNEL_LOCAL_MEM_SIZE
Returns the amount of local memory, in bytes, being used by a kernel (per work-group). Use CL_DEVICE_LOCAL_MEM_SIZE to find out the maximum.
CL_KERNEL_PRIVATE_MEM_SIZE
Returns the minimum amount of private memory, in bytes, used by each work-item in the kernel.
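To make the relation between the per-kernel and per-device values concrete, here is a minimal sketch in plain C. It assumes you have already fetched the two numbers: kernel_local_bytes from clGetKernelWorkGroupInfo with CL_KERNEL_LOCAL_MEM_SIZE and device_local_bytes from clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE (both are reported as cl_ulong, represented here as uint64_t). The helper name local_mem_headroom is made up for illustration and is not part of OpenCL.

```c
#include <stdint.h>

/* Hypothetical helper, not an OpenCL function.
 * kernel_local_bytes: value queried with CL_KERNEL_LOCAL_MEM_SIZE
 * device_local_bytes: value queried with CL_DEVICE_LOCAL_MEM_SIZE
 * Returns how many bytes of local memory are left per work-group,
 * or 0 if the kernel already needs more than the device offers. */
uint64_t local_mem_headroom(uint64_t kernel_local_bytes,
                            uint64_t device_local_bytes)
{
    if (kernel_local_bytes >= device_local_bytes) {
        return 0; /* no room left (also avoids unsigned underflow) */
    }
    return device_local_bytes - kernel_local_bytes;
}
```

Keep in mind that the kernel-side value also counts __local pointer arguments whose size was already set with clSetKernelArg, so it makes sense to query it after setting the arguments. There is no such comparison for private memory, because no device-side maximum can be queried.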
Work sizes
CL_KERNEL_GLOBAL_WORK_SIZE
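This answers the question “What is the maximum value for the global_work_size argument that can be given to clEnqueueNDRangeKernel?”. The result is of type size_t[3]. Note that this query is only valid for a custom device or a built-in kernel; in all other cases the call returns CL_INVALID_VALUE.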
CL_KERNEL_WORK_GROUP_SIZE
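This is the equivalent for local_work_size: the maximum work-group size that can be used to execute this kernel on the given device. Unlike the previous flag, the result is a single size_t (the total number of work-items in a group), not a per-dimension limit. The kernel’s resource requirements (register usage etc.) are used to determine what this work-group size should be.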
CL_KERNEL_COMPILE_WORK_GROUP_SIZE
If __attribute__((reqd_work_group_size(X, Y, Z))) is used, then (X, Y, Z) is returned, else (0, 0, 0).
CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
It returns a performance hint: if the work-group size is a multiple of this number, you’ll get good results. So no more remembering 32 or 64 for specific GPUs; simply make a call to this function.
Combined with clGetDeviceInfo’s CL_DEVICE_MAX_WORK_ITEM_SIZES (and CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS for the number of dimensions), you can fine-tune your work-group size in case you need the group size to be as large as possible.
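One way to combine the two kernel queries is sketched below in plain C. It assumes max_group_size holds the result of CL_KERNEL_WORK_GROUP_SIZE and preferred_multiple holds the result of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE. The function pick_work_group_size is a hypothetical helper, and rounding down to the nearest multiple is just one reasonable heuristic, not something the OpenCL specification prescribes.

```c
#include <stddef.h>

/* Hypothetical helper, not an OpenCL function.
 * max_group_size:     value queried with CL_KERNEL_WORK_GROUP_SIZE
 * preferred_multiple: value queried with
 *                     CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
 * Returns the largest multiple of preferred_multiple that does not
 * exceed max_group_size. If no such multiple exists (the multiple is
 * 0 or larger than the maximum), it falls back to max_group_size. */
size_t pick_work_group_size(size_t max_group_size, size_t preferred_multiple)
{
    if (preferred_multiple == 0 || preferred_multiple > max_group_size) {
        return max_group_size;
    }
    /* Integer division drops the remainder, so this rounds down. */
    return (max_group_size / preferred_multiple) * preferred_multiple;
}
```

For a one-dimensional launch, the result could be passed as local_work_size to clEnqueueNDRangeKernel. For two or three dimensions you would still have to split it over the dimensions without exceeding the per-dimension limits of the device.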
Read more?
You’ll find interesting usages when you search specifically for these flags on GitHub or Stack Overflow.
Short list of interesting Stack Overflow discussions:
- http://stackoverflow.com/questions/10096443/what-is-the-algorithm-to-determine-optimal-work-group-size-and-number-of-workgro/10098063#10098063
- http://stackoverflow.com/questions/13496681/is-clgetkernelworkgroupinfo-cl-kernel-work-group-size-the-size-opencl-uses-whe
- http://stackoverflow.com/questions/9331696/physical-memory-on-amd-devices-local-vs-private