Basic Concepts: Writing OpenCL code for single and double precision

What’s precise enough?

Support for double precision floating-point type double in OpenCL kernels requires an extension. AMD provides cl_khr_fp64 for newer high-edn hardware, but also a non-fully compliant cl_amd_fp64 extension for other hardware. NVIDIA and Intel support the cl_khr_fp64, so no exceptions need to be made for those drivers.

The code you see bellow these lines is based on a page you can find on Bealto and it was written by Eric Bainville. I added extra typedefs, removed a constant and added DOUBLE_SUPPORT_AVAILABLE for easier fallback.


#if defined(cl_khr_fp64)  // Khronos extension available?
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)  // AMD extension available?
#pragma OPENCL EXTENSION cl_amd_fp64 : enable



// double
typedef double real_t;
typedef double2 real2_t;
typedef double3 real3_t;
typedef double4 real4_t;
typedef double8 real8_t;
typedef double16 real16_t;
#define PI 3.14159265358979323846


// float
typedef float real_t;
typedef float2 real2_t;
typedef float3 real3_t;
typedef float4 real4_t;
typedef float8 real8_t;
typedef float16 real16_t;
#define PI 3.14159265359f


A macro is defined by the OpenCL C compiler for each available extension, which is cl_khr_fp64 in this example. This macro can be tested to enable the extension with #pragma OPENCL EXTENSION cl_khr_fp64 : enable.

Now, you need to use the defined constant(s) and real_t, real2_t types instead of float or double. The definition of CONFIG_USE_DOUBLE is passed as compilation option to clBuildProgram to make the switch between double and single precision. If there is no double-support, it falls back to single precision.

Enjoyed this post? Share it!

Related Posts


Improving FinanceBench for GPUs Part II – low hanging fruit

...  the basics We could have chosen to rewrite the algorithms from scratch, but ...  for easy improvements, but needs some ...


The Art of Benchmarking

...  of liking to make algorithms that solve labyrinths and writing code that measures efficiency of the algorithm - for various reasons ...


Birthday present! Free 1-day Online GPGPU crash course: CUDA / HIP / OpenCL

...  goal is hands on. We'll provide templates for CUDA/HIP and OpenCL, where you focus on making the core compute part. The third goal ...


Problem solving tactic: making black boxes smaller

...  know you don't knowAristotle Practical example: porting code Let's start with reverse engineering code, before going to porting. ...