# OPENCL ON THE FPGA

So founders can drink beer now

Huh?

#### HI! I'M VINCENT HINDRIKSEN

- FOUNDER OF A COMPANY CALLED STREAMCOMPUTING
- MY COLLEAGUES KNOW HOW TO MAKE SOFTWARE FAST, REALLY FAST!
- WE USE EVERYTHING, FROM ALGORITHM-OPTIMISATIONS TO DUCT-TAPE.
- ONE OF THE TARGET DEVICES WE USE IS THE FPGA

# SEE OUR HOMEPAGE FOR BRAGGING POSTS ON SPEEDUPS WE GOT

[SCREENSHOT OF THE STREAMCOMPUTING BLOG POST "PORTING MANCHESTER'S UNIFAC TO OPENCL@XEONPHI: 160X SPEEDUP", WRITTEN BY VINCENT HINDRIKSEN, NOVEMBER 04, 2015]

Excerpt from the post: "As we cannot use the performance results for most of our commercial projects because they contain sensitive data, we were happy that Dr. David Topping from the University of Manchester was so kind to allow us to share the data for the UNIFAC project. The goal for this project was simple, port the UNIFAC algorithm to the Intel Xeon Phi using ..."

# From 33 seconds to 0.068 seconds

### OPENCL – THE COMPUTE LANGUAGE

- YOUNG LANGUAGE, STARTED ONLY IN 2009
- ENABLES COMPUTE PARALLELISM AND EXPLICIT MEMORY MANAGEMENT IN ONE C-LIKE LANGUAGE.
- OPEN STANDARD SUPPORTED BY MANY BIG COMPANIES, LIKE AMD, NVIDIA, INTEL, ALTERA, XILINX, ARM, VIVANTE, IMAGINATION, TEXAS INSTRUMENTS, QUALCOMM, ETC.
- NOW AT VERSION 2.1, BUT NO DRIVERS AVAILABLE YET.
- OPENCL 1.2 IS THE MOST USED VERSION, WHILE 2.0 IS GAINING TRACTION QUICKLY.

#### OPENCL – THE COMPUTE LANGUAGE

- WORKS ON CPUS: YOU KNOW THEM
- WORKS ON GPUS: GRAPHICS UNITS THAT NOW ALSO PROCESS MATRICES
- WORKS ON DSPS: DIGITAL SIGNAL PROCESSORS, LIKE YOUR SOUNDCARD
- WORKS ON FPGAS: OUR SUBJECT
- WORKS ON STEROIDS: BECAUSE IT'S FAST

#### OPENCL PARALLELISM IN 3 SIMPLE STEPS

- TAKE A SMALL FUNCTION (KERNEL) THAT OPERATES ON ONE DATA ELEMENT
- DEFINE THE DATA TO OPERATE ON
- COMBINE THE ABOVE TWO AND TADA!
- (THEN OPTIMISE THE HELL OUT OF IT, WHICH TAKES MOST OF THE EFFORT)
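The three steps above can be sketched in plain C, without any OpenCL runtime. This is only an emulation under stated assumptions: `square_kernel` and `run_over_range` are hypothetical names, the element index is passed in as an argument instead of coming from `get_global_id(0)`, and the loop runs the calls one after another where a real device would run many of them at the same time.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: a "kernel" that handles exactly one data element.
   Hypothetical stand-in for an OpenCL kernel: the index is a plain
   argument here, not the result of get_global_id(0). */
static void square_kernel(size_t index, const float *in, float *out)
{
    out[index] = in[index] * in[index];
}

/* Step 3: combine kernel and data by applying the kernel to every
   element. This emulation is sequential; a device would execute the
   calls in parallel. */
static void run_over_range(size_t count, const float *in, float *out)
{
    assert(in != NULL && out != NULL);
    for (size_t index = 0; index < count; ++index) {
        square_kernel(index, in, out);
    }
}
```

Step 2, defining the data, is left to the caller: for example an input array `{1, 2, 3, 4}` and an output array of the same length, passed as `run_over_range(4, in, out)`.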

#### APPLY KERNEL-FUNCTION TO EACH ELEMENT



#### WHAT IS AN FPGA?

- IT'S A PROCESSOR WHERE THE PROGRAM DOESN'T RUN ON IT; THE PROCESSOR IS THE PROGRAM ITSELF. SO... THESE ARE ACTUALLY HARDWARE-ONLY PROGRAMS.
- IT RUNS PROGRAMS WITH VERY LOW LATENCY: SIMPLE INPUT CAN BE PROCESSED IN UNDER 0.1 MICROSECOND = 0.0001 MILLISECOND = 0.0000001 SECOND.
- IT'S MUCH COOLER THAN A GPU. MOSTLY BECAUSE IT USES 10X LESS POWER.
- MANY BRANDS; THE TWO BIGGEST ARE ALTERA AND XILINX

#### WHY ALSO PROGRAM FPGAS WITH OPENCL?

- STANDARD LANGUAGES VHDL AND VERILOG ARE FOR DESIGNING IP (FUNCTIONALITY)
- OPENCL IS FOR DESIGNING FULL SYSTEMS (CONNECT IP TO RESOURCES LIKE MEMORY)
- OPENCL IS GOOD AT DEFINING PARALLELISM

#### OPENCL ON FPGAS

- ONLY ON ALTERA AND XILINX
- ALTERA IS SLOWLY GETTING TO VERSION 1.2
- XILINX IS STILL PRETTY MUCH IN BETA STATE, BUT IT WORKS.

#### STANDARD "SIMD" PARALLELISM

- CORE 1 PROCESSES TASK 1
- CORE 2 PROCESSES TASK 2
- CORE 3 PROCESSES TASK 3
- CORE 4 PROCESSES TASK 4
- ETC.

THING IS: AN FPGA DOESN'T HAVE "CORES". SO WHAT NOW?



## PARALLELISM ON FPGAS 1/2

- AN FPGA USES A PIPELINE
- EACH TASK IS PIPED THROUGH. WHEN A PHASE IS FINISHED, ALL TASKS HOP FORWARD ONE POSITION.
- IF THERE IS SPACE, MULTIPLE IDENTICAL PIPELINES CAN BE CREATED.
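The hop-forward behaviour described above can be illustrated with a small simulation in plain C. It is a toy model, not how an FPGA toolchain schedules anything: the names `pipeline_cycles` and `multi_pipeline_cycles`, the `MAX_STAGES` limit, and the assumption that every phase takes exactly one cycle are all made up for this sketch.

```c
#include <assert.h>

#define MAX_STAGES 16  /* arbitrary limit chosen for this sketch */

/* Simulates one pipeline with `stages` phases that is fed `tasks` tasks.
   Every cycle all tasks in flight hop forward one position and, if any
   are left, a new task enters the first phase. Returns the number of
   cycles until the last task has passed the final phase. */
static int pipeline_cycles(int tasks, int stages)
{
    assert(tasks >= 0 && stages >= 1 && stages <= MAX_STAGES);
    int slot[MAX_STAGES] = {0};  /* slot[s] = id of task in phase s, 0 = empty */
    int next_task = 1, finished = 0, cycles = 0;

    while (finished < tasks) {
        cycles++;
        /* hop forward, starting at the back so nothing is overwritten */
        for (int s = stages - 1; s >= 1; --s) {
            slot[s] = slot[s - 1];
        }
        slot[0] = (next_task <= tasks) ? next_task++ : 0;
        /* whatever sits in the final phase completes during this cycle */
        if (slot[stages - 1] != 0) {
            finished++;
        }
    }
    return cycles;
}

/* With several identical pipelines the tasks are split evenly, so the
   total time is that of the busiest pipeline. */
static int multi_pipeline_cycles(int tasks, int stages, int pipelines)
{
    assert(pipelines >= 1);
    int busiest = (tasks + pipelines - 1) / pipelines;  /* ceil(tasks / pipelines) */
    return pipeline_cycles(busiest, stages);
}
```

In this model a pipeline with S phases finishes N tasks in S + N - 1 cycles: once the pipeline is full, one task completes every cycle. Doubling the number of pipelines roughly halves the N part, while the fill time S stays the same.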

## PARALLELISM ON FPGAS 2/2



- MULTIPLE PIPELINES

#### EXAMPLE! DEVICE-SIDE

    __kernel void vector_add(__global const float *x,
                             __global const float *y,
                             __global float *restrict z)
    {
        // get index of the work item
        int index = get_global_id(0);
        // add the vector elements
        z[index] = x[index] + y[index];
    }

### HOST-SIDE BOILERPLATE (STANDARD)

1. GET A LIST OF AVAILABLE PLATFORMS
2. SELECT DEVICE
3. CREATE CONTEXT
4. CREATE COMMAND QUEUE
5. CREATE MEMORY OBJECTS
6. READ KERNEL FILE
7. CREATE PROGRAM OBJECT
8. COMPILE KERNEL
9. CREATE KERNEL OBJECT
10. SET KERNEL ARGUMENTS
11. EXECUTE KERNEL (ENQUEUE TASK)
12. READ MEMORY OBJECT
13. FREE OBJECTS

# HOST-SIDE BOILERPLATE (WHEN FPGA-ONLY)

1. SELECT FPGA DEVICE
2. CREATE CONTEXT AND COMMAND QUEUE
3. CREATE MEMORY OBJECTS
4. READ PRE-COMPILED KERNEL FILE
5. CREATE KERNEL OBJECT OF PRE-COMPILED KERNEL
6. SET KERNEL ARGUMENTS
7. EXECUTE KERNEL (ENQUEUE TASK)
8. READ MEMORY OBJECT
9. FREE OBJECTS

OR:

1. COPY HOST-CODE FROM PREVIOUS PROJECT

2. ALTER TILL IT WORKS

# TRY AN FPGA YOURSELF



- CHEAP BOARD: \$250
- COMES WITH A ONE-YEAR LICENCE (NOT SURE IF STILL APPLICABLE...)
- IT RUNS "HELLO WORLD" AND MORE!

### WANT TO KNOW MORE?

- UHM, UH... GOOGLE?
- FOLLOW @OPENCLONFPGAS ON TWITTER
- CHECK WEBSITES OF ALTERA AND XILINX
- ASK US

