9 questions on OpenCL’s future answered at IWOCL

During the panel discussion some very interesting questions were asked, which I’d like to share with you.

Should the Khronos group poll the community more often about the future of OpenCL?

I asked it on Twitter, and this is the current result:

Khronos needs more feedback from OpenCL developers, to better serve the user base. Tell the OpenCL working group what holds you back in solving your specific problems here. Want more influence? You can also join the OpenCL advisory board, or join Khronos with your company. Get in contact with Neil Trevett for more information.

How to (further) popularise OpenCL?

While the open standard is popular at IWOCL, it is not popular enough at universities. NVidia puts a lot of effort into convincing academics that OpenCL is not as good as CUDA, to keep CUDA as the only GPGPU API in the curriculum.

Altera: “It is important that OpenCL is taught at universities; because of the low-level parts, it creates better programmers”. And I agree: too many freshly graduated CS students don’t understand malloc() and say “the compiler should solve this for me”.

The short answer is: more marketing.

At StreamHPC we have been supporting OpenCL with marketing (via this blog) since 2010. 6 years already. We are now developing the website opencl.org to continue the effort, while we have diversified at the company.

How to get all vendors to OpenCL 2.0?

Of course this was a question targeted at NVidia, and thus Neil Trevett answered this one. Use a carrot and not a stick, as it is business in the end.

Think more marketing and more apps. We already have a big list:

Know more active(!) projects? Share!

Can we break the backwards compatibility to advance faster?

This was a question from the panel to the audience. From what I sensed, the audience and panel are quite open to this. It would mean that OpenCL could make a big step forward and fix its initial problems. Deprecation would be the way to go, the panel said. (OpenCL 2.3 deprecates all nuisances and OpenCL 3.0 is a redesign? Or will it take longer?)

See also the question below on better serving FPGAs and DSPs.

Should we do a specs freeze and harden the implementations?

Michael Wong (OpenMP) was clear on this. Learn from C++98. Two years were focused on hardening the implementations. After that it took 11 years to restart the innovation process and get to C++11! So don’t do a specs freeze.

How to evolve OpenCL faster?

Vendor extensions are the only option.

At StreamHPC we have discussed this a lot, especially fall-backs. In most cases it is very doable to create slower fall-backs, and in other cases (like with special features on e.g. FPGAs) it can be the only option to make it work.

How to get more robust OpenCL implementations?

Open-sourcing the Vulkan conformance tests was a very good decision to make Vulkan more robust; Khronos gets a lot of feedback on the test cases. It will be discussed soon to what extent this can also be done for OpenCL.

Test-cases from open source libraries are often used to create more test cases.

How to better support FPGAs and DSPs?

Right now GPUs are the majority, and democracy doesn’t work well for minorities.

An option to better support FPGAs and DSPs in OpenCL is to introduce feature sets. A lesson learnt from Vulkan. This way GPU vendors don’t need to spend time implementing features that they don’t find interesting.

Do we see you at IWOCL 2017?

Location will be announced later. Boston and Toronto are mentioned.


StreamComputing is 2 years old! A personal story.

More than two years ago, on 13 January 2010, I wrote my first blog-post. Four months later StreamComputing (redacted: rebranded to StreamHPC in 2017) was both official and unknown. I want to share with you my personal story of how I came to start this company.

The push-factor

I wanted to create a company which was about innovative projects – something I had hardly encountered until then. In the years before, I programmed parts of what I call A-to-B-flows: software that is at its base quite simple, but tediously discussed as if it were very, very complex.

“Complex” software

The complexity is not the software, as you can see. It is undocumented APIs, forgotten knowledge, knowledge in the heads of unknown people, bossy and demanding people who kindly ask for last-minute architecture changes, deadlines around promotion-rounds, new deadlines due to board-decisions, people being afraid of getting replaced once the software is finished, jealousy if another team makes version 2 of the software, etc. The rule of office-software is therefore understandable:

Software is either unfinished,
or turned into a platform for unintended functionality.

The fun in office-software is there for the analyst, architect or manager – the developer just puts in his earphones and makes all the requested changes (hooray for services like Spotify). But as I did not want to become a manager and wished to keep improving my development skills, I had to conclude I was on the wrong track.

Continue reading “StreamComputing is 2 years old! A personal story.”

“That is not what programmers want”

“I think you should be more explicit here in step two” (original print)

This post is part of the series Programming Theories, in which we discuss new and old ways of programming.

When discussing the design of programming languages or the extension of existing ones, the question What concepts can simplify the tasks of the programmer? always triggers lots of interesting debates. After that, when an effective solution is found, inventors are cheered and a new language is born. Up till this point all seems OK, but the problem comes with the intervention of the status quo: C, C++, Java, C#, PHP, Visual Basic. Those languages want the new feature implemented in the way their programmers expect it. But this would be like trying to implement the advantages of a motorcycle into a car without paying attention to the adjustments needed by the design of the car.

I’m in favor of learning concepts instead of doing new things the old way… but only when the new way has proven to be better than the old. The slow acceptance of e.g. functional languages tells a lot about how it goes in reality (with great exceptions like LINQ). That brings a lot of trouble when moving to multi-core. So, how do we get existing languages to change instead of just evolve?

High Level Languages for Multi-Core

Let’s start with a quote from Edsger Dijkstra:

Projects promoting programming in “natural language” are intrinsically doomed to fail.

In other words: a language can be too high-level. A programmer needs the language to be able to effectively micro-manage what is being done. We speak of separation of concerns for a reason. Still, the urge to create the highest-level programming language is strong.

Don’t get me wrong: a high-level language can be very powerful once its concepts are defined in both directions. One direction concerns the developer: does the programmer understand the concept and the contract of the command or programming style being offered? The other concerns the machine: can it be effectively programmed to run the command, or could a new machine be made to do just that? This two-sided contract is one of the reasons why natural languages are not fit for programming.

And we have also found out that binary programming is not fit for humans.

The cartoon refers to this gap between what programmers want and what computers want.

Continue reading ““That is not what programmers want””

OpenCL Wrappers

Mostly providing simplified handling of kernel-code and more convenient error-checking, but sometimes with quite advanced additions: these are the wrappers for OpenCL 1.x. As OpenCL is an open standard, these projects are an important part of the evolution of OpenCL.

You won’t find solutions here that provide a new programming paradigm or work with pragmas to generate optimised OpenCL. This list is in the making.

C++

Goopax: an object-oriented GPGPU programming environment, allowing high-performance GPGPU applications to be written directly in C++ in a way that is easy, reliable, and safe.

OCL-Library: Simplified OpenCL in C++. Documentation. By Prof. Tim Warburton and David Medina.

Openclam: possibility to write kernels inside C++ code. Project is not active.

EPGPU: provides expressions in C++. Paper. By Dr. Lawlor.

VexCL: a vector expression template library. Documentation.

Qt-C++. The advantages of Qt in the OpenCL programmer’s hands.

Boost.Compute. An extensive C++ library using Boost. Documentation.

ArrayFire. A wrapper and a library in one.

OpenCLHelper. Easy to run kernels using OpenCL.

SkelCL. A more advanced C++ wrapper, providing various skeleton functions.

HPL, or the Heterogeneous Programming Library, where special Arrays are used to easily communicate between CPU and GPU.

ViennaCL, a wrapper around an OpenCL-library focused on linear algebra.

C, Objective C

C Framework for OpenCL. Rapid development of OpenCL programs in C/C++.

Simple OpenCL. Much simpler host-code.

COPRTHR: STDCL: Simplified programming interface for OpenCL designed to support the most typical use-cases in a style inspired by familiar and traditional UNIX APIs for C programming.

Grand Central Dispatch: integration into Apple’s environment. Documentation [PDF].

GoCL: For combination with Gnome GLib/GObject.

Computing-Language-Utility: C/C++ wrapper by Intel. Documentation included, slides of presentation here.

Delphi/Pascal

Delphi-OpenCL: Delphi/Pascal-bindings for OpenCL. Seems not active.

OpenCLforDelphi: OpenCL 1.2 for Delphi.

Fortran

Howto-article: It describes how to link with c-files. A must-read for Fortran-devs who want to integrate OpenCL-kernels.

FortranCL: OpenCL interface for Fortran 90. Seems to be the only matured wrapper around. FAQ.

Wim’s OpenCL Integration: contains a very simple f95 file ‘oclWrapper.f95’.

Go

GOCL: Go OpenCL bindings

Go-OpenCL: Go OpenCL bindings

Haskell

HopenCL: Haskell-bindings for OpenCL. Paper.

Java

JavaCL: Java bindings for OpenCL.

ClojureCL: OpenCL 2.0 wrapper for Clojure.

ScalaCL: Much more advanced integration than is possible with JavaCL.

JoCL by JogAmp: Java bindings for OpenCL. Good integration with sister-projects JoGL and JoAL.

JoCL.org: Java bindings for OpenCL.

The Lightweight Java Game Library (LWJGL): Support for OpenCL 1.0-1.2 plus extensions and OpenGL interop.

Aparapi, a very high level language for enabling OpenCL in Java.

JavaScript

Standardised WebCL-support is coming via the Khronos WebCL project.

Nokia WebCL. Javascript bindings for OpenCL, which works in Firefox.

Samsung WebCL. Javascript bindings for OpenCL, which works in Safari and Chromium on OSX.

Intel Rivertrail: built on top of WebCL.

Julia

JuliaGPU. OpenCL 1.2 bindings for Julia.

Lisp

cl-opencl-3b: Lisp-bindings for OpenCL. Not active.

.NET: C#, F#, Visual Basic

OpenCL.NET: .NET bindings for OpenCL 1.1.

Cloo: .NET bindings for OpenCL 1.1. Used in OpenTK, which has good integration with OpenGL, OpenGL|ES and OpenAL.

ManoCL: Not active project for .NET bindings.

FSCL.Compiler: FSharp OpenCL Compiler

Perl

Perl-OpenCL: Perl bindings for OpenCL.

Python

General article

PyOpenCL: Python bindings for OpenCL with convenience wrappers. Documentation.

Cython: C-extension for Python. More info on the extension.

PyCL: not active.

PythonCL: not active.

Clyther: Python bindings for OpenCL. No code yet, but check out the predecessor.

Ruby

Ruby-OpenCL: Ruby bindings for OpenCL. Not active.

Barracuda: seems to be not active.

Rust

Rust-OpenCL: Rust-bindings for OpenCL. Blog-article by author

Math-software

Mathematica

OpenCLLink. OpenCL-bindings in Mathematica 9.

Matlab

There is native support in Matlab.

OpenCL-toolbox. Alternative bindings. Not active. Works with Octave.

R

R-OpenCL. Interface allowing R to use OpenCL.

 

Suggestions?

If you know of similar projects that are not mentioned here, let us know! Even inactive ones, as that is important information too.

Windows on ARM

In 2010 Microsoft got interested in ARM because of low-power solutions for server-parks. ARM had lobbied for years to convince Microsoft to port Windows to their architecture, and now the result is there. Let’s not look to the past and ask why they did not do it earlier, depending completely on Intel, AMD/ATI and NVIDIA. NB: This article is a personal opinion, meant to open up the conversation about Windows plus OpenCL.

While Google and Apple have taken their share of the ARM-OS market, Microsoft wants some too. A wise choice, but again late. We’ve seen how the Windows-PC market was targeted first from the cloud (run services in the browser on any platform) and by Apple’s user-friendly eye-candy (a personal computer had to be distinguished from a dull working-machine), then from smartphones and tablets (many users want e-mail and a browser, not to sit behind their desk). MS’s responses were Azure (cloud, Q1 2010), Windows 7 (OS with slick user-interface, Q3 2009), Windows Phone 7 (smartphones, Q4 2010) and now Windows 8 (OS for X86 PCs and ARM tablets, 2012 or later).

Windows 8 for ARM will be made with assistance from NVIDIA, Qualcomm and Texas Instruments, according to their press-release [1]. They even demonstrated a beta of Windows 8 running Microsoft Office on ARM-hardware, so it is not just a promise.

How can Microsoft support this new platform and (for StreamHPC more importantly) what will the consequences be for OpenCL, CUDA and DirectCompute?

Continue reading “Windows on ARM”

All the members of the OpenCL working group 2010

(If you’re searching for companies who offer OpenCL-products and services, please visit OpenCL:Pro)

You probably have heard AMD is on the OpenCL working group of Khronos; but there are many more and they possibly all have plans to use it. Here is an overview, so you can make your own conclusions about the future that lays ahead. Is your company on “the list”?

We’re especially interested in the less known companies, so most information is about the companies you and we possibly have not heard of before. We’ve made assumptions about what the companies use OpenCL for, so we need your feedback if you think we’re wrong! Most of these companies have not openly written about their (future) accelerated products, so we had to make those guesses.

Disclaimer: All brand and product names are or may be trademarks of, and are used to identify products or services of, their respective owners.

Last updated 6-Oct-2010.

GPU Manufacturers

GPUs being the first products targeted by OpenCL, we blast away with a list of GPU-manufacturers. You might see some unknown companies and learn which companies missed the train; it is pretty clear why GPU-manufacturers have an interest in OpenCL.
We skip the companies who have a GPU-stack built upon ARM-technology and only focus on pure GPU-manufacturers in this category.

AMD

We’ve already discussed the biggest fan of OpenCL several times. While having better GPU-cards than NVIDIA (arguably, depending on the quarter), they put their bets completely on OpenCL. They even get credits like “AMD’s OpenCL” when compared with NVIDIA’s CUDA.

At the end of 2010 or the beginning of 2011 they will ship their Fusion-product, having a CPU and GPU on one chip. The first Fusion-chips will not have a high-end GPU because of heating problems, PC-store employees are told.

NVIDIA

AMD’s biggest competitor, with the very well marketed similar product CUDA. Currently they have the most specialised products in the market for servers. While they put more energy into their own technology CUDA, it must be said that they have adopted OpenCL more than any other hardware vendor.

Intel

Intel has the biggest part of the CPU-market, and guess who has the biggest GPU-market in hands? Correct: onboard-GPUs are Intel’s speciality, but their high-end GPU Larrabee might once see the market. Just like AMD they have the technology (and products) to build an integrated CPU/GPU, which will be very interesting for the upcoming OpenCL-market.

They are openly interested in OpenCL. Here is a nice interview which explains how a CPU-designer looks at GPU-designs.

Vivante

Vivante manufactures GPU-chips. They claim their OpenGL ES 2.0-compliant silicon footprint is the smallest on the market. There is a lot of talk about the OpenGL Shading Language (OpenCL’s grandpa), for which their products are very well suited. Quote: “The recent trend in graphics hardware has been to replace fixed functionality with programmability in areas that have grown exceedingly complex, such as vertex processing and fragment processing. The OpenGL® Shading Language was designed to allow application programmers to express the processing that occurs at those programmable points of the OpenGL pipeline. Independently compilable units written in this language are called shaders. A program is a set of shaders that are compiled and linked together.”

Takumi

Japanese corporation Takumi manufactures the GSHARK, a 2D/3D hardware accelerator. The focus is on shaders, like Vivante.

Imagination Technologies (ImTech)

From their homepage: “POWERVR enables a powerful and flexible solution for all forms of multimedia processing, including 3D/2D/vector graphics and general purpose processing (GP-GPU) including image processing.

POWERVR’s unique tile-based, deferred rendering/shading architecture allows a very small area of a die to deliver higher performance and image quality at lower power consumption than all competing technologies. All major APIs are supported including OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX9/10.1 and OpenCL.”

Currently all ARM-based OpenCL-capable devices have POWERVR-technology.

Toshiba

As with other huge Japanese everything-factories, it’s hard to know all they make. Besides rice cookers, they also make multimedia chips.

S3

Once they were big in the consumer-market of graphics cards, but S3 still exists as a more business-oriented manufacturer of graphics products.

CPU Manufacturers

We miss the Power Architecture, but IBM and Freescale are members of this group.

Intel

While AMD tries to make OpenCL available for the CPU, we have not heard of a similar product from Intel yet. They see a future for multi-core CPUs, as seen in these slides.

ARM

Most known for its same-named low-power processor, not supported by MS Windows. You can read below how many companies have a license on their technology. Together with POWERVR-technology they power all the embedded OpenCL devices of the coming year.

IBM

Currently they are most known for their Cell-processor (co-developed with Toshiba and Sony) and have a license to build PowerArchitecture-CPUs. The Cell was the first non-GPU with full OpenCL-support. Older types of PS3s (without the latest firmware) and IBM’s servers can use the power of OpenCL. At the end of June 2010, Khronos declared their “Development Kit for Linux” conformant for Power VMX and PowerXCell8i processors.

Freescale

Once a Motorola-division, they make lots of different CPUs. Besides ARM- and PowerArchitecture-based ones, they also have their own ‘Coldfire’. We cannot say which architecture they are interested in OpenCL for, but we really would like to hear something from them, since they can open many markets for OpenCL.

Systems on a Chip (SoC)

While it is cool to have a GPU-card in your PC, graphics-functionality is more and more integrated onto the CPU. Especially in the mobile/embedded/gadget-market you’ll find such System-on-a-Chip solutions, which are actually all ARM- or PowerArchitecture-based.

3DLABS (ZiiLabs)

Creators of embedded hardware with a focus on handhelds. They have been partners of Khronos for a long time, having built the first merchant OpenGL GPU, the GLINT 300SX. They have just released a multimedia-processor: an ARM-processor with pretty interesting graphic capabilities.

They have an “early access program for OpenCL” for their ZMS product line.

Movidia

On their Technology overview-page they imply they have flexible accelerators in their designs, which *could* in the future be controlled by OpenCL-kernels. They manufacture mobile GPUs plus loads of extras, which are quite impressive.

Texas Instruments

Besides ARM-based processors they also have DSPs. We are watching to see which product they have OpenCL in mind for.

Qualcomm

They might be most famous for their ARM-based Snapdragon-chipset. They have many more products, but we think they will start with Snapdragon before building OpenCL into other products.

Apple

The Apple A4 powers their newest product, the iPad. It becomes more and more clear that Apple has really learned you cannot rely on one supplier, after waiting for IBM’s G6. With OpenCL Apple can now make software that works on ARM and all kinds of GPUs and CPUs.

Samsung

They make anything that is fed by batteries, so for that reason they should be in the “other” category: mobile phones, mp3-players, photo-cameras, camcorders, laptops, TVs, DVD-players and Blu-ray-players – all products where OpenCL can be put to use.

A good reason to make their own semi-conductors, ARM-based.

At the beginning of June 2010 they launched their own Linux-based OS for mobiles: Bada.

Broadcom

Manufactures networking and communications ICs for data, voice, and video applications. They could use OpenCL for their mobile multimedia processors.

Seaweed

Acquired by Presagis since September. We cannot be sure they will continue Seaweed’s OpenCL-business, but at least GPGPU is mentioned once.

Presagis is “the worldwide leader in embedded graphics solutions for mission-critical display applications.  The company has provided human-machine interface (HMI) graphical modeling tools, drivers and devices for embedded systems for over 20 years. Presagis pioneered both the prototyping of display graphics and automatic code generation for embedded systems in the 1990s. Since then, code generated by its flagship HMI modeling products  has been deployed to hundreds of aircraft worldwide and its software has been certified on over 30 major aircraft programs worldwide.   Presagis is your trusted partner for reliable, high-performance embedded graphics products and services.”

ST Microelectronics

ST has many products: “Singapore Technologies Electronics is a leader in ICT. It has main businesses in Enterprise, Satellite Communications and Interactive Digital Media. It is divided into several Strategic Business Units consisting of Info-Comms, Info-Software, Training and Simulation, Electro-Optics, Large Scale Group, Satcom & Sensor Systems.”

We think they’ve shown interest in OpenCL for use with their imaging processors. Together with Ericsson they have a joint-venture in the mobile market, ST-Ericsson.

Handheld Manufacturers

While most companies will find it hard to do OpenCL-business in the consumer-market, consumer-products from other companies make sales a little bit easier.

Apple

At least the iPad and iPhone have the hardware-capabilities for running OpenCL. It is expected that it will become available in the next major release of the iPhone-OS, iOS 4. We’re waiting for more news.

Nokia

The largest manufacturer of mobile phones, from Finland, has a lot of technology. Besides smartphones and possibly a netbook (in cooperation with Intel), they also have Symbian and the Qt-library. For a while now, Qt has had support for OpenCL. We think the support of OpenCL in programming languages (in a more high-level way) is very important. See these slides to read some insights from the company.

Motorola

They have consumer products like mobile phones and business products like networking. It is not clear what they are going to use OpenCL for, since they mostly use other companies’ technologies.

Super-computers

While OpenCL can revive old computers once upgraded with a new GPU, imagine what they can do with Super-computers.

IBM

IBM builds super-computers based on different technologies. With OpenCL-support for their Power VMX and PowerXCell8i processors, it is already possible to use OpenCL with IBM-hardware.

Fujitsu

They have many products, but they also make super-computers which use GPGPU.

Los Alamos National Laboratory

They build super-computers and really can use the extra power.

A job-post talks about heterogeneous architectures and OpenCL.

Petapath

Petapath, founded in 2008, focuses on delivering innovative hardware and software solutions into the high performance computing (HPC) and embedded markets. As can be seen from their homepage they build grids.

NVIDIA

As a newcomer in the super-computer business they do very well, having helped to build the #2 HPC system. Many clusters are upgraded with their streaming-processors.

Other Hardware

We don’t know what they are actually doing with the technology, purely because they are too big to make assumptions about.

GE

US-based electronics-giant General Electric builds everything there is that is fed by electricity, and now also GPGPU-powered solutions, as can be found on their GPGPU-page. They probably switched to CUDA.

ST-Ericsson

Ericsson and ST have a joint-venture in the mobile market, ST-Ericsson. Ericsson is big in (mobile) networking and also builds mobile phones with Sony. It is unclear what the joint-venture wants to do with the technology, but it must be mobile.

Software Developers

While OpenCL is very close to hardware, we have to talk software too. Did anybody say there is a strict line between hardware and software?

Graphic Remedy

Builders of debugging software. You will hear more from us about this company soon. See something about debugging in this presentation.

RapidMind

RapidMind provided a software product aimed at making it simpler for software developers to target multi-core processors and accelerators (GPUs). It was acquired by Intel in August 2009.

HI

Japanese corporation HI has a product, MascotCapsule, which is a real-time 3D rendering engine (native library) that runs on embedded devices. In its specification we see the names of other companies covered here, plus SMedia. If you’re not familiar with mobile GPUs, here you have a list.

This is another big hint that OpenCL will have a big future on mobile devices.

MascotCapsule V4 product specification:

  • Operating environment (CPU): ARM ARM9 or above; Freescale i.MX Series; Marvell XScale; Qualcomm MSM6280/6550/7200/7500 etc.; Renesas Technology SH-Mobile etc.; Texas Instruments OMAP. A 32-bit 150 MHz CPU or above is recommended (capable of running without floating-point hardware).
  • Code size: approx. 200 KB.
  • Engine work area: 2 MB or more is recommended, including the data load area. Note: the actual required work area varies depending on the content.
  • 3D hardware accelerator: ATI Imageon; Imagination Technologies PowerVR MBX/MBX Lite/SGX; NVIDIA GoForce; SMedia Glamo; TAKUMI GSHARK; Toshiba T4G/T5G; other OpenGL ES compliant 3D accelerators.
  • OS/platforms: BREW, iPhone, iPod touch, ITRON, Java, Linux, Symbian OS, Windows CE, Windows Mobile.
  • 3D authoring tools: 3ds Max 9.0/2008/2009/2010; Maya 8.5/2008/2009/2010; LightWave3D 7.5 or later; SOFTIMAGE|XSI 5.x/6.x/7.0.

Codeplay

They are most famous for their compilers for the Playstation. They also make code-analysis software.

QNX

From their homepage: “Middleware, development tools, realtime operating system software and services for superior embedded design”. Their real-time OS is used in all kinds of embedded products, and they might want to see ways to support specialised low-power chips.

RIM acquired QNX in April 2010.

Fixstars

Newcomer in the 2010 list. Famous for their PS3-Linux and for having written one of the few OpenCL books. They also have FOXC, the Fixstars OpenCL Cross Compiler.

Kestrel Institute

http://www.kestrel.edu/ does not show anything GPGPU-related. We’ll probably hear from them when the next version of their Specware-product is finished.

Game Designers

Physics-calculations and AI are too demanding to do on a CPU. The game-industry keeps pushing the GPU-industry, but now in a different way than in the ’90s.

Electronic Arts

This game-studio builds loads and loads of games with impressive AI. See these slides to see what EA thinks GPGPU can do.

Activision Blizzard

Yes, they are one company now, together famous for the best-selling hit “World of Warcraft”. Currently not much is known about what they use OpenCL for, but probably the same as EA.

Thank you for your interest in this article

If you know more about OpenCL at these companies or job-posts, please let us know via comment or via e-mail.

We’ve made some assumptions about what these companies use OpenCL for – we need your feedback!

Event: Embedded boards comparison

One of the first OpenCL enabled boards from 2012.

Date: 17 September 2015, 17:00
Location: Naritaweg 12B, Amsterdam
Costs: free

Selecting the right hardware for your OpenCL-powered product is very important. We therefore organise a three-hour open house where you can test, benchmark and discuss many available chipsets that support OpenCL. For each showcased board you can read and hear about the advantages, disadvantages and preferred types of algorithms.

Boards with the following chipsets will be showcased:

  • ARM Mali
  • Imagination PowerVR
  • Qualcomm Snapdragon
  • NVidia Tegra
  • Freescale i.MX6 / Vivante
  • Adapteva

Several demos and benchmarks are prepared and will continuously run on each board. We will walk around to answer your questions.

During the evening drinks and snacks are available.

Test your own code

There is time to test OpenCL code for free in our labs. Please get in contact, as time that evening is limited.

Registration

Register by filling in the form below. Mention how many people you will come with, whether you come by car and whether you want to run your own code.

[contact_form]

PRACE Spring School 2014

On 15 – 17 April 2014 a 3-day workshop around HPC is organised. It is free, and focuses on bringing industry and academia together.

Location: Research Institute for Symbolic Computation (RISC) / Johannes Kepler University Linz, Kirchenplatz 5b (Castle of Hagenberg), 4232 Hagenberg, Austria

The PRACE Spring School 2014 will take place on 15 – 17 April 2014 at the Castle of Hagenberg in Austria. The PRACE Seasonal School event is hosted and organised jointly by the Research Institute for Symbolic Computation / Johannes Kepler University Linz (Austria), IT4Innovations / VSB-Technical University of Ostrava (Czech Republic) and PRACE.

The 3-day program includes:

  • A 1-day HPC usage for Industry track bringing together researchers and attendees from industry and academia to discuss the variety of applications of HPC in Europe.
  • Two 2-day tracks on software engineering practices for parallel & emerging computing architectures and deep insight into solving multiphysical problems with Elmer on large-scale HPC resources with lecturers from industry and PRACE members.

The PRACE Spring School 2014 programme offers a unique opportunity to bring users, developers and industry together to learn more about efficient software development for HPC research infrastructures. The program is free of charge (not including travel and accommodations).

Applications are open to researchers, academics and industrial researchers residing in PRACE member countries, and European Union Member States and Associated Countries. All lectures and training sessions will be in English.

Please visit http://prace-ri.eu/PRACE-Spring-School-2014/ for more details and registration.

At StreamHPC we support such initiatives.

Avoiding false dependencies in only two steps

Let’s approach the concept of programming through looking at the brain, the code and the computer.

The idea of a program lives in the brain of a programmer. The way to get the program to the computer is a process called coding. When the program as coded on the computer and the program as an idea in the brain are alike, the programmer is happy. When over time the difference between the brain-version and the computer-version grows, we go into a maintenance phase (although this still flows mostly from brain to computer).

When the coding-language or important coding-paradigms change, something completely different happens. In such a case the program in the brain has to be updated or altered. Humans are not good at that; at least, not many textbooks discuss how to change from one model to another.

In this article I want to discuss one of these new coding-paradigms: dependencies in parallel software.
Continue reading “Avoiding false dependencies in only two steps”

Using async_work_group_copy() on 2D data

async_all_the_things1When copying data from global to local memory, you often see code like below (1D data):
[raw]

if (get_local_id(0)==0) {
  for (int i=0; i < N; i++) {
      data_local[i] = data_global[offset+i];
  }
}
barrier(CLK_LOCAL_MEM_FENCE);

[/raw]
This can be replaced with an asynchronous copy using the function async_work_group_copy, which results in more manageable and cleaner code. The function behaves like an asynchronous version of the memcpy() you know from C.

event_t async_work_group_copy(__local gentype *dst,
                              const __global gentype *src,
                              size_t num_gentypes,
                              event_t event);

event_t async_work_group_copy(__global gentype *dst,
                              const __local gentype *src,
                              size_t num_gentypes,
                              event_t event);

Note that num_gentypes is the number of elements to copy, not the number of bytes.

The Khronos registry describes async_work_group_copy: it provides asynchronous copies between global and local memory, and there is a related prefetch from global memory. This makes it much easier to hide the latency of the data-transfer. In the example below you effectively get the time spent in do_other_stuff() for free; this results in faster code.

As I could not find a good code-snippet online, I decided to clean up and share some of my code. Below is a kernel that uses a patch of size (offset*2+1) and works on 2D data, flattened to a float-array. You can use it for standard convolution-like kernels.

The code is executed on workgroup-level, so there is no need to write code that makes sure it’s only executed by one work-item.

[raw]

// 'offset' is assumed to be a compile-time constant, e.g. set via the "-D offset=..." build option.
kernel void using_local(const global float* dataIn, local float* dataInLocal) {
    event_t event = 0; // 0: let the first copy create a new event
    const int dataInLocalWidth = (offset*2 + get_local_size(0));

    for (int i=0; i < (offset*2 + get_local_size(1)); i++) {
        event = async_work_group_copy(
             &dataInLocal[i*dataInLocalWidth],
             &dataIn[(get_group_id(1)*get_local_size(1) - offset + i) * get_global_size(0) 
                 + (get_group_id(0)*get_local_size(0)) - offset],
             dataInLocalWidth,
             event);
   }
   do_other_stuff(); // code that you can execute for free
   wait_group_events(1, &event); // waits until all copies sharing the event have finished.
   use_data(dataInLocal);
}

[/raw]

On the host (C++), the most important part:
[raw]

cl::Buffer cl_dataIn(*context, CL_MEM_READ_ONLY|CL_MEM_HOST_WRITE_ONLY, sizeof(float) 
          * gsize_x * gsize_y);
cl::LocalSpaceArg cl_dataInLocal = cl::Local(sizeof(float) * (lsize_x+2*offset) 
          * (lsize_y+2*offset));
queue.enqueueWriteBuffer(cl_dataIn, CL_TRUE, 0, sizeof(float) * gsize_x * gsize_y, dataIn);
cl::make_kernel<cl::Buffer, cl::LocalSpaceArg> kernel_using_local(cl::Kernel(*program, "using_local", &error));
cl::EnqueueArgs eargs(queue,cl::NullRange ,cl::NDRange(gsize_x, gsize_y), 
          cl::NDRange(lsize_x, lsize_y));
kernel_using_local(eargs, cl_dataIn, cl_dataInLocal);

[/raw]
This should work. Some prefer to do the local initialisation in the kernel itself, but I prefer not to do this just-in-time.

This code might not work optimally if you have special tricks for handling the outer border. If you see any improvement, please share via the comments.

8 reasons why SPIR-V makes a big difference

From all the news that came out of GDC, I’m most eager to talk about SPIR-V. This intermediate language spir-v will make a big difference for the compute-industry. In this article I’d like to explain why. If you need a technical explanation of what SPIR-V is, I suggest you first read gtruc’s article on SPIR-V and then return here to get an overview of the advantages.

Currently there are several shader and compute languages, which SPIR-V tries to replace or support. We have GLSL and HLSL for graphics shaders, and SPIR (without the V), OpenCL C, CUDA and many others for compute.

If you have questions after reading this article, feel free to ask them in a comment or to us directly. Continue reading “8 reasons why SPIR-V makes a big difference”

Memberships

We are active in several foundations, communities and collaborations. Below is an overview.

Khronos

Khronos_500px_Dec14

Associate member of Khronos, the non-profit technology consortium that maintains important open standards like OpenCL, OpenGL, SPIR and Vulkan.

The Khronos Group was founded in 2000 to provide a structure for key industry players to cooperate in the creation of open standards that deliver on the promise of cross-platform technology. Today, Khronos is a not for profit, member-funded consortium dedicated to the creation of royalty-free open standards for graphics, parallel computing, vision processing, and dynamic media on a wide variety of platforms from the desktop to embedded and safety critical devices.

High Tech NL

High Tech NL is the sector organization by and for innovative Dutch high-tech companies and knowledge institutes. High Tech NL is committed to the collective interests of the sector, with a focus on long-term innovation and international collaboration.

We’re a member because High Tech NL is one of the few organizations that understands IT is far more than digitisation. Our main focus there is robotics.

HSA Foundation

HSA-logo

The Heterogeneous System Architecture (HSA) Foundation is a not-for-profit industry standards body focused on making it dramatically easier to program heterogeneous computing devices. The consortium comprises various semiconductor companies, tools providers, software vendors, IP providers, and academic institutions that develop royalty-free standards and open-source software.

HiPEAC

HiPEAC’s mission is to steer and increase the European research in the area of high-performance and embedded computing systems, and stimulate (international) collaborations.

We’ve sponsored multiple conferences over the years.

ETP4HPC

ETP4HPC is the European Technology Platform (ETP) in the area of High-Performance Computing (HPC). It is an industry-led think-tank comprising European HPC technology stakeholders: technology vendors, research centres and end-users. The main objective of ETP4HPC is to define research priorities and action plans in the area of HPC technology provision (i.e. the provision of supercomputing systems).

OpenPower

The OpenPOWER Foundation is an open, not-for-profit technical membership group incorporated in December 2013. It was founded to enable today’s data centers to rethink their approach to technology. OpenPOWER was created to develop a broad ecosystem of members that create innovative and winning solutions based on the POWER architecture.

4 October talk in Amsterdam on mobile compute

On Thursday 4 October I will give a talk at Hackers&Founders Amsterdam on what mobile compute can do. The goal is to initiate new ideas for start-ups, as not many people know their mobile phone and tablet are very powerful, and next year can be used for compute-intensive tasks.

The other talk is from Mozilla on Firefox OS (Edit: it was cancelled), which is actually reason enough to visit this Hackers&Founders Meetup. Entrance is free, drinks are not. Alternatively you could go to the Hadoop User Group Meetup at Science Park, Amsterdam.

Continue reading “4 October talk in Amsterdam on mobile compute”

European HPC Magazines

If one thing can be said about Europe, it is that it is quite diverse. Each country solves or fails to solve its own problems individually, while European goals are not always well-cared for. Nevertheless, you can notice things changing. One of the areas where things have changed is HPC. HPC has always been a well-interconnected research field in Europe (with its centre at CERN), but there is a catch-up going on in the European commercial market. The whole of Europe has set new goals for better collaboration between companies and research institutes, with programs like Horizon 2020. This means it becomes necessary to improve interconnections among much larger groups.

In most magazines HPC is a section of a broader scope. This is also very important as this introduces HPC to more people. Now, I’d like to concentrate on the focus magazines. There are mainly two magazines available: Primeur Magazine and HPC Today.

Primeur Magazine

logo-weeklyThe Netherlands-based magazine Primeur Magazine has been around for years, with HPC news from Europe, a video-channel, a knowledge-base, a calendar and more. Issues of past weeks can be read online (for free), but news can also be delivered via a weekly e-mail (a paid service, with prices ranging from €125 to €4000 per company/institute, depending on size).

They focus on being a news-channel for what is going on in the HPC-world, both in the EU and the US. Don’t forget to follow them on Twitter.

HPC Today

Update: the magazine changed its name from HPC Magazine to HPC Today.

With several editions (Americas, Europe and France), websites and TV channels, the France based HPC Today brings an actionable coverage of the HPC and Big Data news, technologies, uses and research. Subscriptions are free, as the magazine is paid-for by advertisements. They balance their articles by targeting both people who deeply understand malloc() and people who want to know what is going on. Their readers are developers and researchers from both academic and private sectors.

With the change to HPC Today the content has slightly changed according to the requests from the readers: less science, more HPC news. For the rest it’s about the same.

HPC-today
To get an idea of how they’re doing, check the partners of HPC Magazine: Teratec, ISC events and SC conference.

Other European HPC sources

Not all information around the web is nicely bundled in a PDF. Find a small list below to help you start.

InSiDE

The German national supercomputing centers HLRS, LRZ and NIC publish the online magazine InSiDE (Innovatives Supercomputing in Deutschland) twice a year. The articles are available in HTML and PDF. It gives a good overview of what is going on in Germany and Europe. There is no way to subscribe via e-mail, so it is best to put it in your calendar.

e-IRG

The e-Infrastructure initiative‘s main goal is to support the creation of a political, technological and administrative framework for an easy and cost-effective shared use of distributed electronic resources across Europe.

e-IRG is not a magazine, but it is a good place to start finding information about HPC in Europe. Their knowledge-base is very useful when trying to get an overview of what is there in Europe: projects, country statistics, computer centers and more. They closely collaborate with Primeur Magazine, so you can see some overlap in the information.

PRACE Digest

The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high-impact scientific discovery, as well as engineering research and development across all disciplines, to enhance European competitiveness for the benefit of society. PRACE seeks to achieve this mission by offering world-class computing and data management resources and services through a peer-review process.

The PRACE Digest appears twice a year as a PDF.

More?

Did we miss an important news-source or magazine? Let us know in the comments below!

Will OpenCL work for me?

OpenCL_LogoOpenCL can accelerate your software by multiple factors, but… only if the data and the software are fit for it.

The same applies to CUDA and other GPGPU-methods.

Get to know if you can speed up your software with OpenCL in 4 steps.
[columns]
[one_half title=”1. Lots of repetitions”]
The main way to find code that can be run in parallel is looking for loops that take relatively much time. If an action needs to be done for each part of the data-input, then the code certainly contains a lot of loops. You can go to the next step.

If data goes through the code from A to B in a straight line without many loops, then there is a very low chance that computing-speed is the bottleneck. A faster network, better caching, faster memory and the like should be looked into first.
[/one_half]
[one_half title=”2. No or few dependencies”]
If the loops have no dependencies on previous iterations, then you can go to the next step.

As low interdependencies do not matter for single-core software, this was not an important focus for developers even five years ago. Note that there are many new algorithms now which decrease loop-internal dependencies. If your software has been optimised for several processors or even a cluster, then the step to OpenCL is much smaller.

For example, search-problems can be sped up by dividing the data between many processes. Even though the dependency within each thread is high, the dependency on other threads is very low.
[/one_half]

[/columns]
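To make the dependency question from step 2 concrete, here is a minimal C++ sketch (the function names are ours, purely for illustration). The first loop carries a dependency through the accumulator; the second has fully independent iterations, and each iteration could become one OpenCL work-item. Note that even the sum can still be parallelised with a tree-shaped reduction, one of those redesigned algorithms mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Loop-carried dependency: every iteration needs the previous value of
// 'acc', so the iterations cannot naively run side by side.
float running_sum(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v)
        acc += x; // depends on the previous iteration
    return acc;
}

// Independent iterations: each output element reads only its own input,
// so the loop maps directly onto parallel work-items.
std::vector<float> scale(const std::vector<float>& v, float f) {
    std::vector<float> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = v[i] * f; // no dependency between iterations
    return out;
}
```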

[columns]

[one_half title=”3. High predictability to avoid branching”]

Computations need to be as predictable as possible to get the highest speed-up. That means the code within the loops needs to have no or few branches; that is, code without statements like if, while or switch. This is because GPUs work better if the whole processor does the same thing. So if you have many threads which all do different things, then a CPU is still the best solution. As with decreasing dependencies in step two, in many cases redesigning the algorithm can result in well-performing GPU-code.

[/one_half]

[one_half title=”4. Low Data-transport overhead”]

In step 1 you looked for repeated computations. In this last step we look at the ratio between computations and data-size.

If the number of computations per data-chunk is high, then using the GPU is a good solution. A simple way to find out if a lot of computations are done is to look at the CPU-usage in the system monitor. The reason is that data needs to be transferred to and from the GPU, which takes time even at 3 to 6 GB per second of throughput.

When the number of computations per data-chunk is low, a doubling of speed is still possible when OpenCL is used on CPUs. See the technical explanation of how OpenCL on modern CPUs works and can even outperform a GPU.

[/one_half]
[/columns]
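A small C++ sketch of what step 3 means in practice (the clamp example is ours, not taken from any particular kernel): both versions compute the same result, but the min/max form typically compiles to select or predicated instructions, which keep all work-items of a group on the same path.

```cpp
#include <algorithm>

// Branchy version: neighbouring work-items may take different paths, and
// the GPU then has to step through both sides of the 'if' for the group.
float clamp_branchy(float x, float lo, float hi) {
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

// Branch-free version: the same result via min/max, which usually maps
// to select/predicated instructions with no divergence at all.
float clamp_branchless(float x, float lo, float hi) {
    return std::min(std::max(x, lo), hi);
}
```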


Does it fit?

Found out OpenCL is right for you? Contact us immediately and we can discuss how we can make your software faster. Not sure? Request a code-review or Rapid OpenCL Assessment to quickly find out if it works.

Do you think OpenCL is not the solution, but are you still processing data at the limits of your system? Feel free to contact us, as we can give you free feedback on how to solve your problem with other techniques.

More to read on our blog

OpenCL is supported on many CPUs and GPUs. See this blog article to have an extensive overview of hardware that supports OpenCL.

A list of application areas where OpenCL can be used is written down here.

Finally, there is also a series on parallel programming theories, which explains some of the theory behind OpenCL.

We accelerated the OpenCL backend of pyPaSWAS sequence aligner

Last year we accelerated the OpenCL-code in PaSWAS, which is open source software to do DNA/RNA/protein sequence alignment and trimming. It has users world-wide in universities, research groups and industry.

Below you’ll find the benchmark results of our acceleration work. You can also test it out yourself, as the code is public. In the readme-file you can learn more about the idea of the software. Lots of background information is described in these two papers:

We chose PaSWAS because we really like bio-informatics and computational chemistry – the science is interesting, the problems are complex and the potential GPU-speedup is real. Other examples of such software we worked on are GROMACS and TeraChem.

Continue reading “We accelerated the OpenCL backend of pyPaSWAS sequence aligner”

We don’t work for the war-industry

Last week we emphasized that we don’t work for the war-industry. We did talk to a national army some years ago, but even though the project never started, we would have probably said no. Recently we got a new request, got uncomfortable and did not send a quote for the training.

https://twitter.com/StreamHPC/status/1055121211787763712

This is because we like to think about the next 100 years, and investment in weapons is not something that would solve things for the long term.

To those, who liked the tweet or wanted to, thank you for your support to show us we’re not standing alone here. Continue reading “We don’t work for the war-industry”

Rebranding the company name from StreamComputing to StreamHPC

Since 2010 the name StreamComputing has been used and is widely known now in the GPU-computing industry. But the name has three problems: we don’t have the .com domain, it does not directly show what we do, and the name is quite long.

Some background on the old domain name

While the initial focus was Europe, for years now 95% of our projects have been done for customers outside the Netherlands and over 50% outside Europe; with the .eu domain we don’t show our current international focus.

But that’s not all. The name sticks well in academia, as researchers are used to longer names; just try to read a book on chemistry. Names I tested as alternatives were not well-received, for various reasons. Just like “fast” is associated with fast food, “computing” is not directly associated with HPC, so “fast computing” simply gets weird. Since several customers referred to us as Stream, it made much sense to keep that part of the name.

Not a new begin, but a more focused continuation

Stream HPC better defines what we are: we build HPC software. It combines the well-known name with our diverse specialization.

  • Programming GPUs with CUDA or OpenCL
  • Scaling code to multiple CPUs and GPUs
  • Creating AI-based software
  • Speeding up code and optimizing for given architectures
  • Code improvement
  • Compiler tests and compiler building (LLVM)

With the HPC-focus we were better able to improve ourselves. We have put a lot of time into professionalizing our development-environment and project-management by implementing suggestions from current customers and our friends. We were already used to working fully independently and being self-managed, but now we were able to standardize more of it.

The rebranding process

Rebranding will take some time, as our logo and name are in many places. For the legal part we will take some more time, as we don’t want to get into problems with e.g. all the NDAs. Email will keep working on both domains.

We will contact all organizations we’re a member of over the coming weeks. You can also contact us, if you read this.

StreamComputing will never really go away. The name was with us for 7 years and stands for growing up alongside the rise of GPU-computing.

What does it mean to work at Stream HPC?

High-performance computing on many-core environments and low-level optimizations are very important concepts in large scientific projects nowadays. Stream HPC is one of the market’s more prominent companies, active mostly in North America and Europe.

As we often get asked what it is like to work at the company, we’d like to give you a little peek into our kitchen.

What we find important

We’re a close-knit group of motivated individuals who get a kick out of performance optimizations and are experienced in programming GPUs. Every day we have discussions on performance: finding out why certain hardware behaves in a certain manner when a specific computing load is applied, why certain code is not as fast as theoretically promised, and then finding the bottlenecks by analyzing the device and finding solutions for removing them. As a team we make better code than we ever could as individuals.

Quality is important for everybody on the team, which is a whole step further than “just getting the job done”. This has a simple reason: we cannot speed up code that is of low quality. This is also why we don’t use many tools that automatically do magic, as these often miss many significant improvements and don’t improve the code quality. We don’t expect AI to fully replace us soon, but once it’s possible we’ll probably be part of that project ourselves.

Computer science in general is evolving at a fast rate, and therefore learning is an important part of the job. Reading papers, finding new articles, discussing future hardware architectures and how they would affect performance is very important. With every project we have to gather as much data as possible from scientific publications, interesting blog posts and code repositories, in order to be on the bleeding edge of technology for our project. Why use a hammer to speed up code, when you don’t know which hammer is best to use?

Our team-culture

Personality of the team

We are all kind, focused on structured problem-solving, communicative about wins and struggles, put group-wins above personal gains, and we’re all gamers. To have good discussions and good disagreements, we seek people who are also open-minded.

And we share and appreciate humor!

Tailored work environment

We have all kinds of people on the team, who need different ways of recharging: one needs a walk, while somebody else needs a quiet place. We help each other with more than just work-related obstacles. We think that a broad approach to differences makes us understand how to progress to the next professional level the quickest. This is inclusivity-in-action that we’re proud of. Oh, and we have noise-canceling headphones.

Creating a safe place to speak up is critical for us. It helps us learn new skills and do things we never did before. And this approach also works well for all those who don’t have Asperger’s or ADHD at all, but need to progress without first fitting a certain norm.

Projects we do

Today we work on plenty of exciting projects and no year has been the same. Below is a page with projects we’re proud of.

https://streamhpc.com/about-us/work-we-do

Style of project handling

We use Gitlab and Mattermost to share code and have discussions. This makes it possible to keep good track of each project – searching for what somebody said or coded two years ago is quite easy. Using modern tools has changed the way we work a lot, thus we have questioned and optimized everything that was presented as “good practice”. Most notable are the management and documentation style.

Saying an engineer hates documentation and being managed because he or she is lazy is simply false. It’s because most management and documentation styles are far from optimal.

Pull-style management means the tasks are written down by the team, based on the proposal. All these tasks are put into the task-list of the project, and then each team member picks the tasks that are a good fit. The last resort, pushing tasks that stay behind and have a deadline, was only needed in a few cases.

All code (in merge requests) is checked by one or two colleagues, chosen by the one who wrote the code. More important are the discussions in advance, as the group can give more insight than any individual, and one can get into the task well-prepared. The goal is not just to get the job finished, but to avoid having written the code in which a future bug will be found.

All types of code can contain comments, and Doxygen can create documentation automatically, so there is no need to copy functions into a Word-document. Log-style documentation was introduced because git history and Doxygen don’t answer why a certain decision was made. With such a logbook, a new member of the team can just read these remarks and fully understand why the architecture is how it is and what the limits are. We’ll discuss this in more detail later.

These types of solutions describe how we work and how we differ from a corporate environment: no-nonsense and effective.

Where do we fit in your career?

Each job should get you forward when done at the right moment. The question is when Stream HPC is the right choice.

As you might have seen, we don’t require a certain education. This is because a career is a sum, and an academic study can be replaced by various types of experience. The optimum is often both a study and the right type of experience. This means that, for us, a senior can be a student and a junior can have been in the field for 20 years.

So what is the “right type of experience”? Let’s talk about those who only have job-experience with CPUs. First, being hooked by performance as a primary interest would be the first reason to get into HPC and GPGPU. Second, being good at C and C++ programming. Third, knowing algorithms and mathematics really well and being able to quickly apply them. Fourth, being a curious and quick learner, which shows from having experimented with GPUs. This is also exactly what we test and check during the application procedure.

During your job you’ll learn everything around GPU-programming, with a balance between theory and practice. Preparation is key in how we work, and this is something you will develop in many circumstances.

Those who left Stream HPC have gotten very senior roles, from team lead to CTO. With Stream HPC growing in size, the growth opportunities within the company are also increasing.

Make the decision for a new job

Would you like to work for a rapidly growing company of motivated GPU professionals in Europe? We seek motivated, curious, friendly people. If you liked what you read here, do check our open job positions.

We’re looking for an intern to do the cool stuff: benchmarking and Linux wizarding

intern
So, don’t let us retype your documents and blog posts, as that would make us your intern.

We have some embedded devices here which badly need attention. Some have gotten some private time on the bench, but we have not shared anything on the blog with our readers yet. We simply need some extra hands to do this. Because it’s actually cool to do, but admittedly a bit boring when doing several devices, it is the perfect job for an intern. Besides the benchmarking, we have some other Linux-related projects for you. You’ll get an average payment for an internship in the Netherlands (in Dutch: “stagevergoeding”), lunch, a desk and a bunch of devices (aka toys-for-techies).

Like more companies in the Netherlands, we don’t care about how you were born, but about who you are as a person. We expect from you that you…

  • know everything about Linux administration, from servers to embedded devices.
  • know how to setup a benchmark.
  • document all what you do, not only the results.
  • speak and write Dutch and English.
  • have great humor! (Even if you’re the only one who laughs at your jokes).
  • study in the EU, or can arrange the paperwork to get to the EU yourself.
  • have a place to live/crash in or near Amsterdam, or don’t mind the daily travelling. You cannot sleep in the office.

Together with your educational institute we’ll discuss the exact learning goals of the internship, and make a plan for a period of 3 to 6 months.

If you are interested, send a mail to jobs@streamhpc.com. If you know somebody who would be interested, please tell that person we’re waiting for him or her! Also, tips & tricks for finding the right person are very welcome.