
Know that the new SPIR-V is something completely different in implementation, and we are only discussing the old SPIR here.
__kernel void sum(const int size, __global float * vec1, __global float * vec2){ int ii = get_global_id(0); if(ii < size) vec2[ii] += vec1[ii]; }[/raw]
The LLVM-SPIR output + comments
; ModuleID = '/tmp/vincent/spirBinaryTempFile.tmp' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024" target triple = "spir-unknown-unknown"[/raw]
define spir_kernel void @sum(i32 %size, float addrspace(1)* %vec1, float addrspace(1)* %vec2) nounwind {[/raw]
[list1]
- 0 – private
- 1 – global
- 2 – constant
- 3 – local
If there would be GPUs with extended memory (like we have on a CPU motherboard) and we would add an OpenCL extension for "extern memory", then we'd add a addrspace(4) here - just as an example.
%1 = alloca i32, align 4 %2 = alloca float addrspace(1)*, align 4 %3 = alloca float addrspace(1)*, align 4[/raw] Allocating memory for 'size' and for pointers to 'vec1' and 'vec2' (see below for actual value setting). The "align 4" means the variable should be align per 4 bytes - try vectors and arrays and see what happens here, for example "float4" gives an "align 16".
[raw]
%ii = alloca i32, align 4[/raw] Line 1, first step: allocation of the pointer.
[raw]
store i32 %size, i32* %1, align 4[/raw] Storing value 'size' to %1. You can learn from this that if you set a const variable as a parameter, this implies a load from global memory to a local variable.
[raw]
store float addrspace(1)* %vec1, float addrspace(1)** %2, align 4 store float addrspace(1)* %vec2, float addrspace(1)** %3, align 4[/raw] Setting the pointers %2 and %3 to point to the position 0 of vec1 and vec2.
[raw]
%4 = call spir_func i64 @_Z13get_global_idj(i32 0) nounwind readnone %5 = trunc i64 %4 to i32 store i32 %5, i32* %ii, align 4[/raw] Line 1, execution. See below for the declaration of "_Z13get_global_idj".
The function returns a 64 bit variable, and is truncated to 32 bit. I suspect SPIR must truncate all 64-bit variables to 32 bit to be valid, but could not find
[raw]
%6 = load i32* %ii, align 4 %7 = load i32* %1, align 4 %8 = icmp slt i32 %6, %7[/raw] Stack-operations: pushing 'ii' (%ii) and 'size' (%1), executing 'icmp'.
According to the LLVM-manual, the 'icmp' instruction returns a boolean value or a vector of boolean values based on comparison of its two integer, integer vector, pointer, or pointer vector operands. The operator 'slt' stands for "(signed) less than".
[raw]
br i1 %8, label %9, label %19[/raw]
; < label>:9 ; preds = %0 %10 = load i32* %ii, align 4 %11 = load float addrspace(1)** %2, align 4 %12 = getelementptr inbounds float addrspace(1)* %11, i32 %10 %13 = load float addrspace(1)* %12, align 4[/raw] (The space added to the label-tag is because WordPress tries to help out by adding a closing tag). These four lines load the value of vec1[ii] into %13. %10 and %11 are to push the values %ii and %2 (pointer to the start of vec1) onto the stack. The function 'getelementptr inbounds' performs an address calculation (no memory access), and can return a 'poison value' (for simplicity: undefined) when out of range. So if all went ok, then %12 is a pointer to vec1[ii], and the memory access is done with the load-function for %13.
[raw]
%14 = load i32* %ii, align 4 %15 = load float addrspace(1)** %3, align 4 %16 = getelementptr inbounds float addrspace(1)* %15, i32 %14 %17 = load float addrspace(1)* %16, align 4[/raw] As above, to load vec2[ii] into %17.
[raw]
%18 = fadd float %17, %13 store float %18, float addrspace(1)* %16, align 4[/raw] Here the actual computation at last: 'fadd'. When the '+' is replaced by a '*', then this will be 'fmul'. See page 20 of the SPIR-specs for more functions. Finally the result is stored into %16, which is the pointer to vec2[ii].
[raw]
br label %19[/raw] Goto label %19. Looks a bit redundant in this case. I tried: if there was an "else" used in the kernel, an extra label was created for the second branch - this line would then step over the second branch, which is now omitted.
[raw]
;[/raw] End the function, return nothing.
[raw]
} declare spir_func i64 @_Z13get_global_idj(i32) nounwind readnone[/raw] The kernel was "defined" - this is declared. This is a promise that the function will be implemented. I could not find much information how functions like get_global_id are implemented in SPIR, strangely enough. If you know more information, please tell me via the comments.
[raw]
!opencl.kernels = !{!0} !opencl.enable.FP_CONTRACT = !{} !opencl.spir.version = !{!6} !opencl.ocl.version = !{!7} !opencl.used.extensions = !{!8} !opencl.used.optional.core.features = !{!8} !opencl.compiler.options = !{!8}[/raw] These metadata variables are defined in chapters 2.8, 2.9, 2.11, 2.12 and 2.13. For example "!opencl.kernels" points to the metadata entries (defined below) which are kernels - in this case we only have kernel !0.
The variables !6 to !8 are defined further below.
[raw]
!0 = metadata !{void (i32, float addrspace(1)*, float addrspace(1)*)* @sum, metadata !1, metadata !2, metadata !3, metadata !4, metadata !5} !1 = metadata !{metadata !"kernel_arg_addr_space", i32 0, i32 1, i32 1} !2 = metadata !{metadata !"kernel_arg_access_qual", metadata !"none", metadata !"none", metadata !"none"} !3 = metadata !{metadata !"kernel_arg_type", metadata !"int", metadata !"float*", metadata !"float*"} !4 = metadata !{metadata !"kernel_arg_type_qual", metadata !"const", metadata !"", metadata !""} !5 = metadata !{metadata !"kernel_arg_name", metadata !"size", metadata !"vec1", metadata !"vec2"}[/raw] This is described in chapter 2.4 of the specs "Kernel Arg Info" (page 13), and gives extra information about the parameters of the kernel. This is exactly as is I defined the kernel - no questions asked or optimisations done.
[raw]
!6 = metadata !{i32 1, i32 0} !7 = metadata !{i32 0, i32 0} !8 = metadata !{}[/raw] This is needed for !opencl.spir.version, !opencl.ocl.version and !opencl.used.extensions. So we have SPIR version 1.0, OpenCL version 0.0 (?) and no extensions.
Final words
Related Posts
Khronos Releases OpenCL 2.2 With SPIR-V 1.2
... Khronos has released OpenCL 2.2 with SPIR-V 1.2. The most important changes are: A static subset of the ... Add ...
8 reasons why SPIR-V makes a big difference
... news that came out of GDC, I'm most eager to talk about SPIR-V. This intermediate language will make a big difference for ...
The 13 application areas where OpenCL and CUDA can be used
... kind of algorithms are faster when using accelerators and OpenCL/CUDA? Professor Wu Feng and his group from VirginiaTech took a ...
OpenCL Videos of AMD’s AFDS 2012
... was full of talks on OpenCL. You missed them, just like me? Then you will be happy that they put ... when considering them ...