Know that the new SPIR-V is something completely different in implementation, and we are only discussing the old SPIR here.
[raw]
__kernel void sum(const int size, __global float * vec1, __global float * vec2){ int ii = get_global_id(0); if(ii < size) vec2[ii] += vec1[ii]; }
[/raw]
The LLVM-SPIR output + comments
[raw]
; ModuleID = '/tmp/vincent/spirBinaryTempFile.tmp' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024" target triple = "spir-unknown-unknown"
[/raw]
[raw]
define spir_kernel void @sum(i32 %size, float addrspace(1)* %vec1, float addrspace(1)* %vec2) nounwind {
[/raw]
[list1]
- 0 – private
- 1 – global
- 2 – constant
- 3 – local
[/list1]
As you see, addrspace is can be undefined (for size) - see chapter 2.2 "Address space qualifiers" for more information.
If there would be GPUs with extended memory (like we have on a CPU motherboard) and we would add an OpenCL extension for "extern memory", then we'd add a addrspace(4) here - just as an example.
[raw]
%1 = alloca i32, align 4 %2 = alloca float addrspace(1)*, align 4 %3 = alloca float addrspace(1)*, align 4
[/raw]
Allocating memory for 'size' and for pointers to 'vec1' and 'vec2' (see below for actual value setting). The "align 4" means the variable should be align per 4 bytes - try vectors and arrays and see what happens here, for example "float4" gives an "align 16".
[raw]
%ii = alloca i32, align 4
[/raw]
Line 1, first step: allocation of the pointer.
[raw]
store i32 %size, i32* %1, align 4
[/raw]
Storing value 'size' to %1. You can learn from this that if you set a const variable as a parameter, this implies a load from global memory to a local variable.
[raw]
store float addrspace(1)* %vec1, float addrspace(1)** %2, align 4 store float addrspace(1)* %vec2, float addrspace(1)** %3, align 4
[/raw]
Setting the pointers %2 and %3 to point to the position 0 of vec1 and vec2.
[raw]
%4 = call spir_func i64 @_Z13get_global_idj(i32 0) nounwind readnone %5 = trunc i64 %4 to i32 store i32 %5, i32* %ii, align 4
[/raw]
Line 1, execution. See below for the declaration of "_Z13get_global_idj".
The function returns a 64 bit variable, and is truncated to 32 bit. I suspect SPIR must truncate all 64-bit variables to 32 bit to be valid, but could not find
[raw]
%6 = load i32* %ii, align 4 %7 = load i32* %1, align 4 %8 = icmp slt i32 %6, %7
[/raw]
Stack-operations: pushing 'ii' (%ii) and 'size' (%1), executing 'icmp'.
According to the LLVM-manual, the 'icmp' instruction returns a boolean value or a vector of boolean values based on comparison of its two integer, integer vector, pointer, or pointer vector operands. The operator 'slt' stands for "(signed) less than".
[raw]
br i1 %8, label %9, label %19
[/raw]
[raw]
; < label>:9 ; preds = %0 %10 = load i32* %ii, align 4 %11 = load float addrspace(1)** %2, align 4 %12 = getelementptr inbounds float addrspace(1)* %11, i32 %10 %13 = load float addrspace(1)* %12, align 4
[/raw]
(The space added to the label-tag is because WordPress tries to help out by adding a closing tag). These four lines load the value of vec1[ii] into %13. %10 and %11 are to push the values %ii and %2 (pointer to the start of vec1) onto the stack. The function 'getelementptr inbounds' performs an address calculation (no memory access), and can return a 'poison value' (for simplicity: undefined) when out of range. So if all went ok, then %12 is a pointer to vec1[ii], and the memory access is done with the load-function for %13.
[raw]
%14 = load i32* %ii, align 4 %15 = load float addrspace(1)** %3, align 4 %16 = getelementptr inbounds float addrspace(1)* %15, i32 %14 %17 = load float addrspace(1)* %16, align 4
[/raw]
As above, to load vec2[ii] into %17.
[raw]
%18 = fadd float %17, %13 store float %18, float addrspace(1)* %16, align 4
[/raw]
Here the actual computation at last: 'fadd'. When the '+' is replaced by a '*', then this will be 'fmul'. See page 20 of the SPIR-specs for more functions. Finally the result is stored into %16, which is the pointer to vec2[ii].
[raw]
br label %19
[/raw]
Goto label %19. Looks a bit redundant in this case. I tried: if there was an "else" used in the kernel, an extra label was created for the second branch - this line would then step over the second branch, which is now omitted.
[raw]
;
[/raw]
End the function, return nothing.
[raw]
} declare spir_func i64 @_Z13get_global_idj(i32) nounwind readnone
[/raw]
The kernel was "defined" - this is declared. This is a promise that the function will be implemented. I could not find much information how functions like get_global_id are implemented in SPIR, strangely enough. If you know more information, please tell me via the comments.
[raw]
!opencl.kernels = !{!0} !opencl.enable.FP_CONTRACT = !{} !opencl.spir.version = !{!6} !opencl.ocl.version = !{!7} !opencl.used.extensions = !{!8} !opencl.used.optional.core.features = !{!8} !opencl.compiler.options = !{!8}
[/raw]
These metadata variables are defined in chapters 2.8, 2.9, 2.11, 2.12 and 2.13. For example "!opencl.kernels" points to the metadata entries (defined below) which are kernels - in this case we only have kernel !0.
The variables !6 to !8 are defined further below.
[raw]
!0 = metadata !{void (i32, float addrspace(1)*, float addrspace(1)*)* @sum, metadata !1, metadata !2, metadata !3, metadata !4, metadata !5} !1 = metadata !{metadata !"kernel_arg_addr_space", i32 0, i32 1, i32 1} !2 = metadata !{metadata !"kernel_arg_access_qual", metadata !"none", metadata !"none", metadata !"none"} !3 = metadata !{metadata !"kernel_arg_type", metadata !"int", metadata !"float*", metadata !"float*"} !4 = metadata !{metadata !"kernel_arg_type_qual", metadata !"const", metadata !"", metadata !""} !5 = metadata !{metadata !"kernel_arg_name", metadata !"size", metadata !"vec1", metadata !"vec2"}
[/raw]
This is described in chapter 2.4 of the specs "Kernel Arg Info" (page 13), and gives extra information about the parameters of the kernel. This is exactly as is I defined the kernel - no questions asked or optimisations done.
[raw]
!6 = metadata !{i32 1, i32 0} !7 = metadata !{i32 0, i32 0} !8 = metadata !{}
[/raw]
This is needed for !opencl.spir.version, !opencl.ocl.version and !opencl.used.extensions. So we have SPIR version 1.0, OpenCL version 0.0 (?) and no extensions.
Final words
Hi, the declaration you are thinking about
declare spir_func i64 @_Z13get_global_idj(i32) nounwind readnone
means that the spir compiler declares that there is such a function which will be defined later. In this case the get_global_id() is defined by each opencl vendor (manufacturer), not by the spir compiler. The function call and it’s actual implementation is linked when compiled to the final device code.