Mega-kernel versus Micro-kernels in LuxRender (repost)

LuxRenderer demo-rendering

Below is a (slightly edited) repost of a blog by

I find micro-kernels an important subject, since they have clear advantages. OpenCL 2.0 offers more possibilities to create smaller kernels. Making smaller and more focused functions is also considered good software engineering, known as “Separation of Concerns”.


 

For a general introduction to the concept of “Mega Vs Micro” kernels, read “Megakernels Considered Harmful: Wavefront Path Tracing on GPUs” by Samuli Laine, Tero Karras, and Timo Aila of NVIDIA. Abstract:

When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.

OpenCL kernels in “SmallLuxGPU” (raytracer, originally made by David) have followed the micro-kernel approach from the very beginning. However, after the merge with LuxRender and the introduction of LuxRender materials, textures, light sources, etc., one of the kernels grew to the point of being a “Mega-kernel”.

The major problem with a “Mega-kernel”, aside from the inability of the AMD OpenCL compiler to compile it, is the huge register usage and the very low GPU utilization. Why this happens is well explained in the paper.
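To make the difference concrete, here is a minimal toy sketch in Python. It is not the PATHOCL code: the stage names, the fake "materials" and the path dictionary are all made up for illustration. It only shows the scheduling idea: a mega-kernel walks each path through every stage, while micro-kernels each handle only the paths waiting in one stage.

```python
MAX_BOUNCES = 3  # hypothetical path length for this toy example

def generate(p):
    p["throughput"] = 1.0
    p["stage"] = "intersect"

def intersect(p):
    # Made-up hit test: alternate between two fake materials.
    p["material"] = "diffuse" if (p["id"] + p["bounce"]) % 2 == 0 else "glossy"
    p["stage"] = "shade"

def shade(p):
    p["throughput"] *= 0.5 if p["material"] == "diffuse" else 0.8
    p["bounce"] += 1
    p["stage"] = "done" if p["bounce"] == MAX_BOUNCES else "intersect"

KERNELS = {"generate": generate, "intersect": intersect, "shade": shade}

def new_paths(n):
    return [{"id": i, "bounce": 0, "stage": "generate"} for i in range(n)]

def run_megakernel(paths):
    # One big kernel: every work-item carries the code (and registers) of all stages.
    for p in paths:
        while p["stage"] != "done":
            KERNELS[p["stage"]](p)
    return [p["throughput"] for p in paths]

def run_microkernels(paths):
    # Several small kernels: each launch only sees the paths queued for its stage.
    launches = 0
    while any(p["stage"] != "done" for p in paths):
        for stage, kernel in KERNELS.items():
            queue = [p for p in paths if p["stage"] == stage]
            if queue:
                for p in queue:
                    kernel(p)
                launches += 1
    return [p["throughput"] for p in paths], launches

mega = run_megakernel(new_paths(8))
micro, launches = run_microkernels(new_paths(8))
print(mega == micro, launches)
```

Both variants compute the same result per path. The difference on a real GPU is that each small kernel needs only the registers of its own stage, and all work-items in a launch run the same code.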

PATHOCL Micro-kernels edition, the results

The number of kernels increases from 2 to 10, the register usage decreases from 196 (!!!) to 3-84, and the GPU utilization rises from a miserable 10% to a healthier 30%-100%.

Occupancy increases from 10% to 30% or more
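The link between register usage and occupancy can be sketched with a small calculation. The limits below are my assumptions for a GCN-style AMD GPU and are not stated in the original post: 256 vector registers per lane on each SIMD, at most 10 wavefronts in flight per SIMD, and registers allocated in blocks of 4. I also assume that the reported register counts are vector registers.

```python
# Assumed GCN-style limits (not taken from the post).
VGPRS_PER_SIMD_LANE = 256
MAX_WAVES_PER_SIMD = 10
VGPR_GRANULARITY = 4

def waves_per_simd(vgprs_per_work_item):
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_per_work_item // VGPR_GRANULARITY) * VGPR_GRANULARITY
    # The register file limits how many wavefronts fit next to each other.
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // allocated)

def occupancy_percent(vgprs_per_work_item):
    return 100 * waves_per_simd(vgprs_per_work_item) // MAX_WAVES_PER_SIMD

for regs in (196, 84, 3):
    print(regs, "registers ->", occupancy_percent(regs), "% occupancy")
```

Under these assumptions, 196 registers leave room for one wavefront per SIMD (10%), 84 registers for three (30%), and very small kernels reach the maximum of ten (100%), which is in line with the numbers reported above.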

The performance increase is huge on some platforms (Linux + FirePro W8100), about 3.6 times:

Speed increases from 0.84M to 3.07M samples/sec

A speedup in the 20% to 40% range has been reported on MacOS/Windows + NVIDIA GPUs.

It solves the problems with the AMD compiler

Micro-kernels not only improve the performance but also address the major issues with the AMD OpenCL compiler. For the very first time since the release of the first AMD OpenCL SDK beta, I’m not aware of a scene not running on AMD GPUs. This is SATtva’s Mic scene running on GPUs for the first time:

Scene builds correctly on AMD hardware for the first time

Try it out yourself

This feature will be extended to BIASPATHOCL and will be available in LuxRender v1.5.

A new version of PATHOCL is available in this branch. The sources of micro-kernels are available here.

To run with micro-kernels, use “path.microkernels.enable=1”.
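If you keep your render settings in a plain `key=value` text file, switching the option on can be scripted. The sketch below is only an illustration: the helper `set_property` is hypothetical, and the `renderengine.type=PATHOCL` line is my assumption of what such a file might already contain. Only `path.microkernels.enable` comes from the text above.

```python
def set_property(cfg_text, key, value):
    """Return cfg_text with key=value set, replacing an existing line for key."""
    lines = []
    found = False
    for line in cfg_text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines.append(f"{key}={value}")
            found = True
        else:
            lines.append(line)
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

# Hypothetical starting configuration (assumption, not from the post).
cfg = "renderengine.type=PATHOCL\n"
cfg = set_property(cfg, "path.microkernels.enable", 1)
print(cfg)
```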

9 questions on OpenCL’s future answered at IWOCL

During the panel discussion some very interesting questions were asked, which I’d like to share with you.

Should the Khronos group poll the community more often about the future of OpenCL?

I asked it on Twitter, and this is the current result (image: khronos-community-feedback).

Khronos needs more feedback from OpenCL developers, to better serve the user base. Tell the OpenCL working group what holds you back in solving your specific problems here. Want more influence? You can also join the OpenCL advisory board, or join Khronos with your company. Get in contact with Neil Trevett for more information.

How to (further) popularise OpenCL?

While the open standard is popular at IWOCL, it is not popular enough at universities. NVidia puts a lot of effort into convincing academics that OpenCL is not as good as CUDA, and into keeping CUDA as the only GPGPU API in the curriculum.

Altera: “OpenCL is important to be taught at universities, because of the low-level parts, it creates better programmers”. And I agree: too many freshly graduated CS students don’t understand malloc() and say “The compiler should solve this for me”.

The short answer is: more marketing.

At StreamHPC we have been supporting OpenCL with marketing (via this blog) since 2010. 6 years already. We are now developing the website opencl.org to continue the effort, while we have diversified at the company.

How to get all vendors to OpenCL 2.0?

Of course this was a question targeted at NVidia, and thus Neil Trevett answered this one. Use a carrot and not a stick, as it is business in the end.

Think more marketing and more apps. We already have a big list (image: opencl-library-ecosphere).

Know more active(!) projects? Share!

Can we break the backwards compatibility to advance faster?

This was a question from the panel to the audience. From what I sensed, the audience and panel are quite open to this. This would mean that OpenCL could make a big step forward, fixing the initial problems. Deprecation would be the way to go, the panel said. (OpenCL 2.3 deprecates all nuisances and OpenCL 3.0 is a redesign? Or will it take longer?)

See also the question below on better serving FPGAs and DSPs.

Should we do a specs freeze and harden the implementations?

Michael Wong (OpenMP) was clear on this. Learn from C++98. Two years were focused on hardening the implementations. After that it took 11 years to restart the innovation process and get to C++11! So don’t do a specs freeze.

How to evolve OpenCL faster?

Vendor extensions are the only option.

At StreamHPC we have discussed this a lot, especially fall-backs. In most cases it is very doable to create slower fall-backs, and in other cases (like with special features on e.g. FPGAs) it can be the only option to make it work.
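A fall-back usually starts with checking the extension string that a device reports (OpenCL returns the extensions as one space-separated string). Below is a minimal sketch of that decision. The kernel names are hypothetical, and `cl_intel_subgroups` is just one example of a vendor extension; your case will have its own.

```python
def pick_kernel(device_extensions, wanted="cl_intel_subgroups"):
    """Choose a kernel variant based on a space-separated extension string."""
    available = set(device_extensions.split())
    if wanted in available:
        return "reduce_subgroups"      # hypothetical fast path using the extension
    return "reduce_local_memory"       # hypothetical slower, portable fall-back

print(pick_kernel("cl_khr_fp64 cl_intel_subgroups"))
print(pick_kernel("cl_khr_fp64"))
```

Splitting the string before comparing matters: a plain substring test would also match longer names such as `cl_intel_subgroups_short`.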

How to get more robust OpenCL implementations?

Open sourcing the Vulkan conformance tests was a very good decision to make Vulkan more robust. Khronos gets a lot of feedback on the test cases. It will be discussed soon to what extent this can also be done for OpenCL.

Test-cases from open source libraries are often used to create more test cases.

How to better support FPGAs and DSPs?

Now GPUs are the majority and democracy doesn’t work for the minorities.

An option to better support FPGAs and DSPs in OpenCL is to introduce feature sets. A lesson learnt from Vulkan. This way GPU vendors don’t need to spend time implementing features that they don’t find interesting.

Do we see you at IWOCL 2017?

Location will be announced later. Boston and Toronto are mentioned.

Birthday present! Free 1-day Online GPGPU crash course: CUDA / HIP / OpenCL

Stream HPC is 10 years old on 1 April 2020. Therefore we offer our one day GPGPU crash course for free that whole month.

Now that Corona (and the fear of it) is spreading, we had to rethink how to celebrate 10 years. So while there were different plans, we simply had to adapt to the market and world dynamics.

5 years ago…
Continue reading “Birthday present! Free 1-day Online GPGPU crash course: CUDA / HIP / OpenCL”

OpenCL alternatives for CUDA Linear Algebra Libraries

While CUDA has had the advantage of having many more libraries, this is no longer its main advantage when it comes to linear algebra. If one thing changed over the past year, it is linalg library support for OpenCL. The choices have increased at a steady rate, as you can see in the list below.

A general remark on using these libraries: you need to handle your data transfers and data formats with great care. If you don’t think it through, you won’t get the promised speed-up. Unless mentioned otherwise, a library is free.
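Why the transfers matter so much is easy to see with a back-of-the-envelope calculation. The timings below are invented for illustration only; measure your own.

```python
def effective_speedup(cpu_ms, gpu_kernel_ms, transfer_ms):
    """Speed-up of the whole GPU path, including host-device transfers."""
    return cpu_ms / (gpu_kernel_ms + transfer_ms)

# Hypothetical numbers: the library call itself is 10x faster than the CPU ...
print(effective_speedup(cpu_ms=100, gpu_kernel_ms=10, transfer_ms=0))
# ... but copying the data back and forth for every call eats most of that.
print(effective_speedup(cpu_ms=100, gpu_kernel_ms=10, transfer_ms=40))
```

In this made-up case the promised 10x shrinks to 2x, which is why it pays off to keep data on the device between library calls.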

FFT

CUDA: The NVIDIA CUDA Fast Fourier Transform library (cuFFT) provides a simple interface for computing FFTs up to 10x faster. By using hundreds of processor cores inside NVIDIA GPUs, cuFFT delivers the…

OpenCL: clFFT is a software library containing FFT functions written in OpenCL. In addition to GPU devices, the library also supports running on CPU devices to facilitate debugging and multicore programming.

Linear Algebra

CUDA: MAGMA is a collection of next generation, GPU accelerated, linear algebra libraries. Designed for heterogeneous GPU-based architectures. It supports interfaces to current LAPACK and BLAS standards.

OpenCL: clMAGMA is an OpenCL port of MAGMA for AMD GPUs. The clMAGMA library dependencies, in particular optimized GPU OpenCL BLAS and CPU optimized BLAS and LAPACK for AMD hardware, can be found in the AMD Accelerated Parallel Processing Math Libraries (APPML).

Sparse Linear Algebra

CUDA: CUSP is an open source C++ library of generic parallel algorithms for sparse linear algebra and graph computations on CUDA architecture GPUs. CUSP provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems.

OpenCL: clBLAS implements the complete set of BLAS level 1, 2 & 3 routines. Please see Netlib BLAS for the list of supported routines. In addition to GPU devices, the library also supports running on CPU devices to facilitate debugging and multicore programming.

OpenCL: ViennaCL is a free open-source linear algebra library for computations on many-core architectures (GPUs, MIC) and multi-core CPUs. The library is written in C++ and supports CUDA, OpenCL, and OpenMP. In addition to core functionality and many other features including BLAS level 1-3 support and iterative solvers, the latest release ViennaCL 1.5.0 provides many new convenience functions and support for integer vectors and matrices.

OpenCL: VexCL is a vector expression template library for OpenCL/CUDA. It has been created for ease of GPGPU development with C++. VexCL strives to reduce amount of boilerplate code needed to develop GPGPU applications. The library provides convenient and intuitive notation for vector arithmetic, reduction, sparse matrix-vector products, etc. Multi-device and even multi-platform computations are supported.

Random number generation

CUDA: The NVIDIA CUDA Random Number Generation library (cuRAND) delivers high performance GPU-accelerated random number generation (RNG). The cuRAND library delivers high quality random numbers 8x…

OpenCL: The Random123 library is a collection of counter-based random number generators (CBRNGs) for CPUs (C and C++) and GPUs (CUDA and OpenCL). They are intended for use in statistical applications and Monte Carlo simulation and have passed all of the rigorous SmallCrush, Crush and BigCrush tests in the extensive TestU01 suite of statistical tests for random number generators. They are not suitable for use in cryptography or security even though they are constructed using principles drawn from cryptography.

Math

CUDA: The CUDA Math library is an industry proven, highly accurate collection of standard mathematical functions. Available to any CUDA C or CUDA C++ application simply by adding “#include math.h” in…

OpenCL: Looking into the details of what the CUDA math lib exactly is.

AI

CUDA: A technology preview with CUDA accelerated game tree search of both the pruning and backtracking styles. Games available: 3D Tic-Tac-Toe, Connect-4, Reversi, Sudoku and Go.

OpenCL: There are many tactics to speed up such algorithms. This CUDA-library can therefore only be used for limited cases, but nevertheless it is a very interesting research-area. Ask us for an OpenCL based backtracking and pruning tree searching, tailored for your problem.

Dense Linear Algebra

CUDA: CULA tools. Provides accelerated implementations of the LAPACK and BLAS libraries for dense linear algebra. Contains routines for systems solvers, singular value decompositions, and eigenproblems. Also provides various solvers. Free (with limitations) and commercial.

OpenCL: See ViennaCL, VexCL and clBLAS above. Kudos to the CULA-team, as they were one of the first with a full GPU-accelerated linear algebra product.

Fortran

CUDA: The IMSL Fortran Numerical Library is a comprehensive set of mathematical and statistical functions available that offloads CPU work to NVIDIA GPU hardware where the cuBLAS library is utilized. Free (with limitations) and commercial.

OpenCL: OpenCL-FORTRAN is not available yet. Contact us, if you have interest and wish to work with a pre-release once available.

ArrayFire

CUDA: Comprehensive GPU function library, including functions for math, signal processing, image processing, statistics, and more. Interfaces for C, C++, Fortran, and Python. Integrates with any CUDA-program. Free (with limitations) and commercial.

OpenCL: ArrayFire 2.0 is also available for OpenCL. Note that currently fewer functions are supported in the OpenCL-version than are supported in CUDA-ArrayFire, so please check the OpenCL documentation for supported feature list. Free (with limitations) and commercial.

NVIDIA Performance Primitives

CUDA: The NVIDIA Performance Primitives library (NPP) is a collection of over 1900 image processing primitives and nearly 600 signal processing primitives that deliver 5x to 10x faster performance than…

OpenCL: Kudos to NVIDIA for bringing it all together in one place. OpenCL-devs have to do some googling for specific algorithms.

So the gap between CUDA and OpenCL is certainly closing. CUDA provides a lot more convenience, so OpenCL-devs still have to keep reading blogs like this one to find what’s out there.

As usual, if you have additions to this list (free and commercial), please let me know in the comments below or by mail. I also have a few more additions to this list myself – depending on your feedback, I might represent the data differently.

AccelerEyes ArrayFire

There is a lot going on at the path to GPGPU 2.0 – the libraries on top of OpenCL and/or CUDA. Among many solutions we see for example Microsoft with C++ AMP on top of DirectCompute, NVidia (and more) with OpenACC, and now AccelerEyes (most known for their Matlab-extension Jacket and libJacket) with ArrayFire.

I want to show you how easy programming GPUs can be when using such libraries. Know that for using all features, such as complex numbers, multi-GPU and linear algebra functions, you need to buy the full version. Prices start at $2500,- for a workstation/server with 2 GPUs.

It comes in two flavours: for OpenCL (C++) and for CUDA (C, C++, Fortran). The code for both is the same, so you can easily switch – though you still see references to cuda.h you can compile most examples from the CUDA-version using the OpenCL-version with little editing. Let’s look a little into what it can do.

Continue reading “AccelerEyes ArrayFire”

N-Queens project from over 10 years ago

Why you should just delve into porting difficult puzzles using the GPU, to learn GPGPU-languages like CUDA, HIP, SYCL, Metal or OpenCL. And if you did not pick one, why not N-Queens? N-Queens is a truly fun puzzle to work on, and I am looking forward to learning about better approaches via the comments.

We love it when junior applicants have a personal project to show, even if it’s unfinished. As it can be scary to share such unfinished project, I’ll go first.

Introduction in 2023

Everybody who starts in GPGPU has this moment where they feel great about the progress and speedup, but then suddenly get totally overwhelmed by the endless paths to more optimizations. And of course 90% of the potential optimizations don’t work well – it takes many years of experience (and mentors in the team) to excel at it. This was also a main reason why I like GPGPU so much: it remains difficult for a long time, and it never bores. My personal project where I had this overwhelmed+underwhelmed feeling was N-Queens – till then I could solve the problems in front of me.

I worked on this backtracking problem as a personal fun-project in the early days of the company (2011?), and decided to blog about it in 2016. But before publishing I thought the story was not ready to be shared, as I changed the way I coded, learned so many more optimization techniques, and (like many programmers) thought the code needed a full rewrite. Meanwhile I had to focus much more on building the company, and also my colleagues got better at GPGPU-coding than me – this didn’t change in the years after, and I’m the dumbest coder in the room now.

Today I decided to just share what I wrote down in 2011 and 2016, and for now focus on fixing the text and links. As the code was written in Aparapi and not pure OpenCL, it would take some good effort to make it available – I decided not to do that, to prevent postponing it even further. Luckily somebody on this same planet had about the same approaches as I had (plus more), and actually finished the implementation – scroll down to the end, if you don’t care about approaches and just want the code.

Note that when I worked on the problem, I used an AMD Radeon GPU and OpenCL. Tools from AMD were hardly there, so you might find a remark that did not age well.

Introduction in 2016

What do 1, 0, 0, 2, 10, 4, 40, 92, 352, 724, 2680, 14200, 73712, 365596, 2279184, 14772512, 95815104, 666090624, 4968057848, 39029188884, 314666222712, 2691008701644, 24233937684440, 227514171973736, 2207893435808352 and 22317699616364044 have to do with each other? They are the numbers of solutions of the N-Queens problem for N = 1 to 26. Even if you are not interested in GPGPU (OpenCL, CUDA), this article should give you some background on this interesting puzzle.
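To check the start of that sequence yourself, a small CPU-only backtracking counter is enough. This is a generic bitmask sketch in Python for illustration, not the Aparapi/OpenCL code this article is about.

```python
def count_queens(n):
    """Count all placements of n non-attacking queens on an n x n board."""
    full = (1 << n) - 1  # one bit per column

    def place(cols, diag_l, diag_r):
        if cols == full:          # a queen in every column: one solution found
            return 1
        total = 0
        # Columns in the current row that are not attacked.
        free = full & ~(cols | diag_l | diag_r)
        while free:
            bit = free & -free    # pick the lowest free column
            free -= bit
            total += place(cols | bit,
                           ((diag_l | bit) << 1) & full,  # diagonals shift per row
                           (diag_r | bit) >> 1)
        return total

    return place(0, 0, 0)

print([count_queens(n) for n in range(1, 9)])
```

The search tree grows very quickly with N, which is exactly what makes the larger boards an interesting GPGPU problem.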

An existing N-Queens implementation in OpenCL took 2.89 seconds for N=17 on my GPU, while Nvidia hardware took half that. I knew it did not use the full potential of the GPU, because bitcoin-mining dropped to 55% and not to 0%. 🙂 I only had to find those optimizations by redoing the port from another angle.

This article was written while I programmed (as a journal), so you see which questions I asked myself to get to the solution. I hope this also gives some insight into how I work, and shows that the hard part of the job is that most of the energy goes into resultless preparations.

Continue reading “N-Queens project from over 10 years ago”

NVIDIA ended their support for OpenCL in 2012

If you are looking for the samples in one zip-file, scroll down. The removed OpenCL-PDFs are also available for download.

This sentence “NVIDIA’s Industry-Leading Support For OpenCL” was proudly used on NVIDIA’s OpenCL page last year. It seems that NVIDIA saw a great future for OpenCL on their GPUs. But when CUDA began borrowing the idea of using LLVM for compiling kernels, NVIDIA’s support for OpenCL slowly started to fade instead. Since with LLVM CUDA-kernels can be loaded in OpenCL and vice versa, this could have brought the two techniques more together.

What is the cause of this decreased support for OpenCL? Did they suddenly become aware that LLVM would decrease any advantage of CUDA over OpenCL, and therefore decrease support for OpenCL? Or did they decide so long ago, as their last OpenCL-conformant product on Windows is from July 2010? We cannot be sure, but we do know NVIDIA does not have an official statement on the matter.

The latest action demonstrating NVIDIA’s reduced support of OpenCL is the absence of the samples in their GPGPU-SDK. NVIDIA removed them without notice or clear statement on their position on OpenCL. Therefore we decided to start a petition to get these OpenCL samples back. The only official statement on the removal of the samples was on LinkedIn:

All of our OpenCL code samples are available at http://developer.nvidia.com/opencl, and the latest versions all work on the new Kepler GPUs.
They are released as a separate download because developers using OpenCL don’t need the rest of the CUDA Toolkit, which is getting to be quite large.
Sorry if this caused any alarm, we’re just trying to make life a little easier for OpenCL developers.

Best regards,

Will.

William Ramey
Sr. Product Manager, GPU Computing
NVIDIA Corporation

Continue reading “NVIDIA ended their support for OpenCL in 2012”

Caffe and Torch7 ported to AMD GPUs, MXnet WIP

Last week AMD released ports of Caffe, Torch and (work-in-progress) MXnet, so these frameworks now work on AMD GPUs. With the Radeon MI6, MI8 and MI25 (25 TFLOPS half precision) to be released soonish, it is of course simply necessary to have software that runs on these high-end GPUs.

The ports were announced in December. You see the MI25 is about 1.45x faster than the Titan XP. With the release of three frameworks, current GPUs can now be benchmarked and compared.

The expected good performance/price ratio will make this very interesting, especially on large installations. Another slide discussed which frameworks will be ported: Caffe, TensorFlow, Torch7, MxNet, CNTK, Chainer and Theano.

This leaves HIP-ports of TensorFlow, CNTK, Chainer and Theano still to be released.

Continue reading “Caffe and Torch7 ported to AMD GPUs, MXnet WIP”

Online Tutorials are here

Online training

We’re going online with our presentations and tutorials. This makes it easy to reach more people and make our trainings more flexible.

We’re starting with short introductory trainings, but we have bigger plans. Keep an eye on our events (shared on Twitter, LinkedIn, this blog and the newsletter) to see what the offerings are. And you’re very welcome to join!

On 4 October (new date) there will be an OpenCL 101 of two hours for free. Target timezone is East-America and Europe.

Agenda Online OpenCL 101

  • Introductions (20 minutes)
    • StreamHPC
    • GPUs and parallelism
    • OpenCL
  • By example: Getting started with OpenCL (30 minutes)
  • By example: Porting a simple program to OpenCL (30 minutes)
  • Q&A in parallel (30 minutes). Ask us any question, for instance:
    • General OpenCL.
    • OpenCL on GPUs.
    • OpenCL on FPGAs.
    • What algorithms work well with GPUs, CPUs and FPGAs.
    • StreamHPC services.
  • The next steps (5 minutes).
  • Closing words (5 minutes).

Read more here…

Tutorial server

You can already test if the tutorial server works for you by looking around in our demo room. The tutorial itself will be in another room. Use your own name and password “ap“.


See you soon!

AMD gets into Machine Intelligence with “MI” range of hardware and software

Always good to have a share out of that curve.

In June we wrote “AMD is back!”, and this is one of the follow-up blog posts with more details in a specific direction. This post is about AMD specifically targeting machine learning with the MI ( = Machine Intelligence) range of hardware and software.

With all the news around AMD’s new processors Ryzen (CPU) and VEGA (GPU), it became apparent that AMD wants a good share of the Deep Learning market.

And they seem to succeed. Here is the current status.

Hardware: 25 TFLOPS @ 16-bit

The “Radeon Instinct” series, which focuses purely on compute, was recently released. How AMD’s new naming is organised will be discussed in a separate blog post.

Continue reading “AMD gets into Machine Intelligence with “MI” range of hardware and software”

Thalesians talk – OpenCL in financial computations

At the end of October I gave a talk for the Thalesians, a group that organises different kinds of talks for people working in or interested in the financial market. If you live in London, I would certainly recommend you visit one of their talks. But from a personal perspective I had a difficult task: how to make a very diverse public happy? The talks I gave in the past were for a more homogeneous and known public, and now I did not know at all what the level of OpenCL-programming of the attendees was. I chose to give an overview and reserve time for questions.

After starting with some honest remarks about my understanding of the British accent and that I will kill my business for being honest with them, I spoke about 5 subjects. Some of them you might have read here, but not all. You can download the sheets [PDF] via this link: Vincent.Hindriksen.20101027-Thalesians. The below text is to make the sheets more clear, but certainly is not the complete talk. So if you have the feeling I skipped a lot of text, your feeling is right.

Continue reading “Thalesians talk – OpenCL in financial computations”

Download all OpenCL header files and build your own OpenCL library

OpenCL header files

When you develop professional software, it is a best practice to have external header files fixed, to have versioning under full control. This way you don’t get surprises when your colleague has another OpenCL SDK installed. Luckily the Khronos Group has put all versions of the OpenCL header files on Github, so you can easily download the targeted OpenCL version.

Download a zip of the header files here:

If you found problems in one of these, you can directly communicate with the working group by submitting an issue on Github.

OpenCL.lib / libOpenCL.so

But wait, there is more!

You can build your own ICD, as the sources are open (licence). OpenCL version 2.1 is implemented, but it is fully backwards compatible to OpenCL 1.0. You can assume that the vendors use this code for their own, so you can safely use this code in your project.

Get the project from Github.

All the members of the OpenCL working group 2010

(If you’re searching for companies who offer OpenCL-products and services, please visit OpenCL:Pro)

You probably have heard AMD is on the OpenCL working group of Khronos; but there are many more and they possibly all have plans to use it. Here is an overview, so you can make your own conclusions about the future that lies ahead. Is your company on “the list”?

We’re especially interested in the lesser-known companies, so most information is about the companies you and we possibly have not heard of before. We’ve made assumptions about what the companies use OpenCL for, so we need your feedback if you think we’re wrong! Most of these companies have not openly written about their (future) accelerated products, so we had to make those guesses.

Disclaimer: All brand and product names are or may be trademarks of, and are used to identify products or services of, their respective owners.

Last updated 6-Oct-2010.

GPU Manufacturers

GPUs being the first products targeted by OpenCL, we start with a list of GPU-manufacturers. You might see some unknown companies and now know which companies missed the train; it is pretty clear why GPU-manufacturers have interest in OpenCL.
We skip the companies who have a GPU-stack built upon ARM-technology and only focus on pure GPU-manufacturers in this category.

AMD

We’ve already discussed the biggest fan of OpenCL several times. While having better GPU-cards than NVIDIA (arguable per quarter of the year), they put their bets completely on OpenCL. They even get credit like “AMD’s OpenCL” when compared with NVIDIA’s CUDA.

At the end of 2010 or the beginning of 2011 they will ship their Fusion product, which has a CPU and GPU on one chip. The first Fusion chips will not have a high-end GPU because of heat problems, PC-store employees have been told.

NVIDIA

AMD’s biggest competitor with the very well marketed similar product CUDA. Currently they have the most specialised products in market for servers. While they put more energy in their own technology CUDA, it must be said that they have adopted OpenCL more than any other hardware vendor.

Intel

The biggest part of the CPU-market belongs to Intel, and guess who has the biggest share of the GPU-market? Correct: onboard-GPUs are Intel’s speciality, but their high-end GPU Larrabee might one day see the market. Just like AMD they have the technology (and products) to have an integrated CPU/GPU which will be very interesting for the upcoming OpenCL-market.

They are openly interested in OpenCL. Here is a nice interview which explains how a CPU-designer looks at GPU-designs.

Vivante

Vivante manufactures GPU-chips. They claim their OpenGL ES 2.0-compliant silicon footprint is the smallest on the market. There is a lot of talk about OpenGL Shader Language (OpenCL’s grandpa), for which their products are very well suited. Quote: “The recent trend in graphics hardware has been to replace fixed functionality with programmability in areas that have grown exceedingly complex, such as vertex processing and fragment processing. The OpenGL® Shading Language was designed to allow application programmers to express the processing that occurs at those programmable points of the OpenGL pipeline. Independently compilable units written in this language are called shaders. A program is a set of shaders that are compiled and linked together.”

Takumi

Japanese corporation Takumi manufactures the GSHARK, a 2D/3D hardware accelerator. The focus is on shaders, like Vivante.

Imagination Technologies (ImTech)

From their homepage: >>POWERVR enables a powerful and flexible solution for all forms of multimedia processing, including 3D/2D/vector graphics and general purpose processing (GP-GPU) including image processing.

POWERVR’s unique tile-based, deferred rendering/shading architecture allows a very small area of a die to deliver higher performance and image quality at lower power consumption than all competing technologies. All major APIs are supported including OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX9/10.1 and OpenCL.<<

Currently all ARM-based OpenCL-capable devices have POWERVR-technology.

Toshiba

Like other huge Japanese everything-factories, you don’t know what else they make. Besides rice cookers they also make multimedia chips.

S3

Once they were big in the consumer-market of graphics cards, but S3 still exists as a more business-oriented manufacturer of graphics products.

CPU Manufacturers

We miss the Power Architecture, but IBM and Freescale are members of this group.

Intel

While AMD tries to make OpenCL available for the CPU, we have not heard of a similar product from Intel yet. They see a future for multi-core CPUs, as seen in these slides.

ARM

Most known for its same-named low-power processor, not supported by MS Windows. You can read below how many companies have a license on their technology. Together with POWERVR-technology they power all the embedded OpenCL devices of the coming year.

IBM

Currently they are most known for their Cell-processor (co-developed with Toshiba and Sony) and have a license to build PowerArchitecture-CPUs. The Cell has full OpenCL-support, as the first non-GPU. Older types of PS3s (without the latest firmware) and IBM’s servers can use the power of OpenCL. At the end of June 2010 Khronos declared their “Development Kit for Linux” for Power VMX and PowerXCell8i processors conformant.

Freescale

Once a Motorola-division, they make lots of different CPUs. Besides ARM- and PowerArchitecture-based ones, they also have their own ‘Coldfire’. We cannot say for which architecture they are interested in OpenCL, but we really would like to hear something from them since they can open many markets for OpenCL.

Systems on a Chip (SoC)

While it is cool to have a GPU-card in your pc, more and more the Graphics-functionality is integrated onto a CPU. Especially in the mobile/embedded/gadget-market you’ll find such System-on-a-Chip solutions, which are actually all ARM- or PowerArchitecture based.

3DLABS (ZiiLabs)

Creators of embedded hardware with focus on handhelds. They have been partners of Khronos for a long time, having built the first merchant OpenGL GPU, the GLINT 300SX. They have just released a multimedia-processor, which is an ARM-processor with pretty interesting graphic capabilities.

They have an “early access program for OpenCL” for their ZMS product line.

Movidia

On their Technology overview-page they imply they have flexible accelerators in their designs, which *could* in the future be controlled by OpenCL-kernels. They manufacture mobile GPUs-plus-loads-of-extras which are quite impressive.

Texas Instruments

Besides ARM-based processors they also have DSPs. We are watching to see which product they have OpenCL in mind for.

Qualcomm

They might be most famous for their ARM-based Snapdragon-chipset. They have many more products, but we think they will start with Snapdragon before building OpenCL into other products.

Apple

The Apple A4 powers their new product, the iPad. It becomes more and more clear Apple has really learned that you cannot rely on one supplier, after waiting for IBM’s G6. With OpenCL Apple can now make software that works on ARM, all kinds of GPUs and CPUs.

Samsung

They make anything that is fed by batteries, so for that reason they should be in the “other” category: mobile phones, mp3-players, photo-cameras, camcorders, laptops, TVs, DVD-players and Bluray-players. All products where OpenCL can be put to use.

A good reason to make their own semi-conductors, ARM-based.

At the beginning of June 2010 they launched their own Linux-based OS for mobiles: Bada.

Broadcom

Manufactures networking and communications ICs for data, voice, and video applications. They could use OpenCL for their mobile multimedia processors.

Seaweed

Since September acquired by Presagis. We cannot be sure they continue the OpenCL-business of Seaweed, but at least GPGPU is mentioned once.

Presagis is “the worldwide leader in embedded graphics solutions for mission-critical display applications.  The company has provided human-machine interface (HMI) graphical modeling tools, drivers and devices for embedded systems for over 20 years. Presagis pioneered both the prototyping of display graphics and automatic code generation for embedded systems in the 1990s. Since then, code generated by its flagship HMI modeling products  has been deployed to hundreds of aircraft worldwide and its software has been certified on over 30 major aircraft programs worldwide.   Presagis is your trusted partner for reliable, high-performance embedded graphics products and services.”

ST Microelectronics

ST has many products: “Singapore Technologies Electronics is a leader in ICT. It has main businesses in Enterprise, Satellite Communications and Interactive Digital Media. It is divided into several Strategic Business Units consisting of Info-Comms, Info-Software, Training and Simulation, Electro-Optics, Large Scale Group, Satcom & Sensor Systems.”

We think they’ve shown interest in OpenCL for use with their Imaging processors. Together with Ericsson they have a joint-venture in the mobile market, ST-Ericsson.

Handheld Manufacturers

While most companies will find it hard to make OpenCL-business in the consumer-market, consumer-products of other companies make sales a little bit warmer.

Apple

At least the iPad and iPhone have hardware-capabilities of running OpenCL. It is expected that it will become available in the next major release of the iPhone-OS, iOS 4. We’re waiting for more news.

Nokia

The largest manufacturer of mobile phones, from Finland, has a lot of technology. Besides smartphones and possibly a netbook (in cooperation with Intel), they also have Symbian and the Qt-library. Qt has had support for OpenCL for a while. We think the support of OpenCL in programming languages (in a more high-level way) is very important. See these slides to read some insights of the company.

Motorola

They have consumer products like mobile phones and business products like networking. It is not clear what they are going to use OpenCL for, since they mostly use other companies’ technologies.

Super-computers

While OpenCL can revive old computers once upgraded with a new GPU, imagine what they can do with Super-computers.

IBM

IBM builds super-computers based on different technologies. With OpenCL-support for their Power VMX and PowerXCell8i processors, it is already possible to use OpenCL with IBM-hardware.

Fujitsu

They have many products, but they also make super-computers which use GPGPU.

Los Alamos National Laboratory

They build super-computers and really can use the extra power.

A job-post talks about heterogeneous architectures and OpenCL.

Petapath

Petapath, founded in 2008, focuses on delivering innovative hardware and software solutions into the high performance computing (HPC) and embedded markets. As can be seen from their homepage they build grids.

NVIDIA

As a newcomer in the super-computer business, they do very well having helped to build the #2 HPC. Many clusters are upgraded with their streaming-processors.

Other Hardware

We don’t know what they are actually doing with the technology, purely because they are too big to make assumptions.

GE

US-based electronics-giant General Electric builds everything there is that is fed by electricity, and now also GPGPU-powered solutions, as can be found on their GPGPU-page. They probably switched to CUDA.

ST-Ericsson

Ericsson and ST have a joint-venture in the mobile market, ST-Ericsson. Ericsson is big in (mobile) networking. It also builds mobile phones with Sony. It is unclear what the joint-venture wants to do with the technology, but it must be mobile.

Software Developers

While OpenCL is very close to hardware, we have to talk software too. Did anybody say there is a strict line between hardware and software?

Graphic Remedy

Builders of debugging software. You will hear more from us about this company soon. See something about debugging in this presentation.

RapidMind

RapidMind provided a software product that aims to make it simpler for software developers to target multi-core processors and accelerators (GPUs). It was acquired by Intel in August 2009.

HI

Japanese corporation HI has a product MascotCapsule, which is a real-time 3D rendering engine (native library) that runs on embedded devices. We see names of other companies, except SMedia. If you’re not familiar with mobile GPUs, here you have a list.

This is another big hint, OpenCL will have a big future on mobile devices.

MascotCapsule V4 product specification

Operating environment

  • CPU:
    • ARM: ARM9 or above
    • Freescale: i.MX Series
    • Marvell: XScale
    • Qualcomm: MSM6280/6550/7200/7500 etc.
    • Renesas Technology: SH-Mobile etc.
    • Texas Instruments: OMAP
    • 32-bit 150 MHz or above is recommended (capable of running without floating-point hardware)
  • Code size: approx. 200 KB
  • Engine work area: 2 MB or more is recommended, including data load area. Note: the actual required work area varies depending on the content.
  • 3D hardware accelerator:
    • ATI: Imageon
    • Imagination Technologies: PowerVR MBX/MBX Lite/SGX
    • NVIDIA: GoForce
    • SMedia: Glamo
    • TAKUMI: GSHARK
    • Toshiba: T4G/T5G
    • Other OpenGL ES compliant 3D accelerators
  • OS/platforms: BREW, iPhone, iPod touch, ITRON, Java, Linux, Symbian OS, Windows CE, Windows Mobile
  • 3D authoring tools:
    • 3ds Max 9.0/2008/2009/2010
    • Maya 8.5/2008/2009/2010
    • LightWave3D 7.5 or later
    • SOFTIMAGE|XSI 5.x/6.x/7.0

Codeplay

They are most famous for their compilers for the Playstation. They also make code-analysis software.

QNX

From their homepage: “Middleware, development tools, realtime operating system software and services for superior embedded design”. Their real-time OS is in all kinds of embedded products and they might want to see ways to support specialised low-power chips.

RIM acquired QNX in April 2010.

Fixstars

A newcomer to the list in 2010. Famous for their PS3-Linux and for having written one of the few books on OpenCL. They also have FOXC, the Fixstars OpenCL Cross Compiler.

Kestrel Institute

http://www.kestrel.edu/ does not show anything GPGPU. We’ll probably hear from them when the next version of their Specware-product is finished.

Game Designers

Physics-calculations and AI are too demanding to do on a CPU. The game-industry keeps pushing the GPU-industry, but now in a different way than in the 90’s.

Electronic Arts

This game-studio builds loads and loads of games with impressive AI. See these slides to see what EA thinks GPGPU can do.

Activision Blizzard

Yes, they are one company now, so now they are together famous for the best-selling hit “World of Warcraft”. Currently not much is known about what they use OpenCL for, but probably the same as EA.

Thank you for your interest in this article

If you know more about OpenCL at these companies or job-posts, please let us know via comment or via e-mail.

We’ve made some assumptions about what these companies use OpenCL for – we need your feedback!

Self-Assessment GPGPU-role

As we’re not a university but a company, there needs to be a balance between things you can offer and the things we offer. Like in every job description, there is a list of bullet points to explain what we seek. To make it possible to self-assess your fitness for the job, we’ve put the number of points (✪) for each bullet point.

INSTRUCTIONS. For each section, assess yourself as being a:

  1. beginner: have been in contact with it briefly
  2. junior: had some experience, but not difficult problems
  3. medior: had more experience, but cannot coach others yet
  4. senior: experienced enough to coach others to really advance on this subject
  5. lead: can teach new things to a senior
  6. principal/master/guru: one of the world’s best

You need to look for the level where you get the most points. For example, if you are a master in C++ but are a medior in C and math, it might actually be best to assess as a medior and mention your C++ knowledge specifically. Or for example, if you are sure you can successfully finish a tutorial on GPGPU, you’re a beginner.

Real question to answer before applying: do you want to become a senior in GPGPU?

If you have the imposter syndrome, don’t be too harsh on yourself. If you’re overconfident, be realistic. If you worry you’re both, pick imposter syndrome only.

Heads up. During the interviews, we ask questions for the above self-assessment. If you assess yourself as a senior for GPU-coding because you were the best-of-class, you’ll get questions. And there is no person who can be defined by lists, so do mention where you stand out.

You, as a CPU Developer (9 items, 18 weights)

We seek people with experience. This can be open source projects for first jobseekers, or past jobs for those with job experience.

  1. You are capable of designing mathematical algorithms, both serial and parallel. ✪
  2. You are strong in math and hard sciences. ✪✪
  3. You know how compilers work, and you are unfortunate enough to know when they don’t. ✪✪
  4. You are experienced in C. ✪✪
  5. You are experienced in C++. ✪✪
  6. You have experience designing performance-driven architectures. ✪✪
  7. You know how to write tests. ✪✪✪
  8. You are experienced with continuous development. ✪✪
  9. You have experience with low-level optimizations. ✪✪

You, as a GPU Developer (6 items, 15 weights)

We seek people with experience. This can be open source projects for first jobseekers, or past jobs for those with job experience.

  1. You have read “hardware architecture specification documents” or ISA-docs. ✪
  2. You know how GPU-compilers work, and you are unfortunate enough to know when they don’t. ✪✪
  3. You are experienced in CUDA and/or HIP. ✪✪✪✪
  4. You are experienced in OpenCL and/or SYCL. ✪✪✪✪
  5. You know your way around with GPU-libraries. ✪
  6. You are experienced in porting algorithms to the GPUs, without the use of any library. ✪✪✪

As the four stars indicate, we do need minimal GPGPU-experience, as you won’t learn it here.

You, as a Problem Solver (8 items, 21 weights)

Coding is only one part of the solution. Most of the time we’re solving problems, where coding is just the means.

  1. You like the ideas and theories around the “learning mindset”. ✪✪✪
  2. You have a structured problem-solving approach that you could explain. ✪✪✪
  3. You have high self-awareness and can self-observe. ✪✪✪✪
  4. You have high standards for yourself. ✪✪
  5. You test out approaches by making quick experiments. ✪✪
  6. You test out possible solutions by mentally putting them in different scenarios. ✪
  7. You regularly take time to zoom out to get an overview on the problem, to be able to balance the inputs for the solution. ✪✪✪✪
  8. You always follow through. ✪✪

If you score high here, this will compensate for any lack of technical experience. Also for continuous growth, you’ll need to score high here.

You, as a Project Team Member (10 items, 20 weights)

Our company’s strength is that we work in teams. We don’t know everything as individuals, but as a team we can solve almost any problem around HPC and GPUs. This means we highly value collaboration and thus must be efficient in project handling.

  1. You have a proven track record of being focused on results. ✪
  2. You have a talent for turning vague problems into the right actions, and you want to build on it. ✪
  3. You normally write down tasks, and then prioritize & ESTIMATE them. ✪✪✪
  4. You understand that well-defined, well-communicated delivery criteria are the responsibility of every team member. ✪✪✪
  5. You can identify what’s missing to move a project forward smoothly. ✪✪✪
  6. You speak up when the project diverges from the trajectory. ✪✪
  7. You are used to administrating your time spent on an issue. ✪
  8. You can delegate work. ✪✪
  9. You can get work delegated. ✪✪
  10. You can explain, with examples, why the above are important. ✪✪

We explicitly did not state “project management”. It is about playing your part of making an efficient team.

What’s next?

If you scored at least junior on CPU, team, and problem-solving, at least beginner on GPU, and medior on at least one of the four, then you should apply. Go to https://streamhpc.com/jobs/ for the instructions and links to other articles that should help you with understanding if this is a job for you.
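The rule of thumb above can be read as a simple check. The sketch below is only an illustration: it assumes the six levels from the instructions are ranked in the order listed, and that "medior" means medior or higher. The function name `should_apply` and the area keys are made up for this example and are not part of any official tooling.

```python
# Levels in the order given in the self-assessment instructions.
LEVELS = ["beginner", "junior", "medior", "senior", "lead", "principal"]
RANK = {name: i for i, name in enumerate(LEVELS, start=1)}

# Hypothetical keys for the four areas of the self-assessment.
AREAS = ("cpu", "gpu", "problem_solving", "team")


def should_apply(levels):
    """Return True if the self-assessed levels meet the rule of thumb.

    `levels` maps each area in AREAS to one of the names in LEVELS.
    """
    ranks = {area: RANK[levels[area]] for area in AREAS}
    # At least junior on CPU, team and problem-solving; at least beginner on GPU.
    minimums_met = (
        ranks["cpu"] >= RANK["junior"]
        and ranks["team"] >= RANK["junior"]
        and ranks["problem_solving"] >= RANK["junior"]
        and ranks["gpu"] >= RANK["beginner"]
    )
    # Assumption: "medior" is read as "medior or higher" in at least one area.
    one_medior = any(rank >= RANK["medior"] for rank in ranks.values())
    return minimums_met and one_medior
```

For example, someone who is medior in problem-solving, junior on CPU and team, and a beginner on GPU passes this check, while someone who is junior everywhere does not.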

Understand that if you are a true beginner in GPGPU, it’s best to follow the tips&tricks explained here.

We have been awarded the Khronos project to upgrade the OpenCL test suite to 2.2!

Some weeks ago we started implementing the Compiler Test Suite for OpenCL 2.2. The biggest improvement of OpenCL 2.2 is C++ kernels, which were originally planned for 2.1. SPIR-V 1.1 is another big improvement.

We are very happy to have a part in making OpenCL better! We find OpenCL C++ kernels very important, even if they have their limitations. Thanks to SPIR-V 1.1 it gets easier to have more (unofficial) kernel languages next to C and C++, and to get SYCL. Also upgrading from 2.0 to 2.2 is rather easy thanks to the open source libclcxx.

Personally I found this project to also be very important for our internal knowledge building, as almost every function would be touched and discussed.

OpenCL 2.2 CTS RFQ has been awarded to StreamHPC

Khronos issued a Request For Quote (RFQ) back in September 2016 to enhance and expand the existing OpenCL 2.1 conformance tests to create an OpenCL 2.2 test suite to be used to define conformance for OpenCL 2.2 implementations. The contract has been awarded to StreamHPC. StreamHPC is a software consultancy company specialized in performance tuned software development for CPU, GPU and FPGA. A large part of their clients hires them for their OpenCL expertise.

Improvements have already been added, bugs squashed and documentation improved. We hope to continue this in the coming months!

We’ll be ready in March. Hopefully the first implementations are ready by then, as there is a test suite ready to iron out any bug discovered. Which three OpenCL drivers do you think will be first to have OpenCL 2.2? Intel, AMD, NVidia, ARM, Imagination, Qualcomm, TI, Intel FPGA (Altera), Xilinx, Portable OpenCL or another?

AMD gDEBugger 6.2 for Linux

The printf-function in kernels isn’t the solution to everything, hence there are profilers and debuggers specially tailored for GPU-programming. On Windows there is a lot of choice, but mostly only if you have a paid version of Visual Studio. On Linux you have GDB, but that program is not really user-friendly for the GUI-lovers.

For AMD, gDEBugger is now available for Linux again. Again, because version 5.8 by Gremedy worked on Linux, but after AMD bought the company version 6 became Windows-only. A few weeks ago, 10 months after 6.0, Linux-binaries came back with version 6.2. It supports OpenCL 1.2, OpenGL 3.2 and quite a few extensions. As only AMD is supported, more on debugging OpenCL-applications on NVidia and Intel will follow later.

Installation is quite straightforward. For creating a menu-item, you’ll find a useful image in /opt/gDEBugger6.2.xxx/tutorial/images/.

Professional and Consumer Media Software using OpenCL

More and more professional media software now has support for OpenCL. It starts to be a race where you cannot stay behind. If the competitor runs more than twice as fast on the same hardware, then you just can’t say “Sorry, you should buy NVIDIA hardware”. I expected this to happen, but could not tell in what industry they would run fastest. Seems it is fluid dynamics, video-editors and photo-editors.

AMD and Intel have mostly been selected as collaboration partners. Apple has been a main driver, especially with the introduction of their new Mac Pro with two high-end AMD FirePro GPUs.

Sony Catalyst Family

Sony released three new software packages to support video professionals in pre- and post-production.

This new family of products, Catalyst Browse (media management), Catalyst Prepare (video preproduction assistant) and Catalyst Edit (4K and Sony RAW video editing) has OpenCL support from the start.

Colorfront Express Dailies and On-Set Dailies

This software is an on-set dailies processing system (playback and sync, QC, colour grading, audio and metadata management).

The 2014 versions have OpenCL support in their transcoder plugin, Transkoder.

RED REDCINE-X PRO

REDCINE-X is a coloring toolset, integrated timeline, and post effects collection in a professional, flexible environment for your 4K or 5K .R3D files. RED has added support for OpenCL in build 22.

The Foundry Nuke Blink framework

As presented on GPUconf, The Foundry has opened their framework for running OpenCL kernels. It creates OpenCL-kernels (optimised for AMD or NVIDIA) from C++ Blink kernels.

NUKE studio is a node-based VFX, editorial and finishing studio. As with most products on this page, look for the “reel” to get a nice demo of its capabilities.

Magix Hybrid Video Engine

Video Deluxe and Movie Edit both have OpenCL support since 2012, thanks to the new shared video engine.

http://www.youtube.com/watch?v=27M7vJIYR3c

Adobe CS6 creative suite

Adobe has entered the OpenCL market publicly with CS6, with Premiere Pro (video editing) and Photoshop (photo-editing) as two main products with advanced GPU-acceleration via OpenCL.

http://www.youtube.com/watch?v=F3LwNT1QUPQ

Video on GPU-effects on Premiere Pro CS5.

FAQ on GPU-acceleration on Photoshop CS6.

Sony Vegas Pro

Vegas Pro is a video editing software package for non-linear editing systems, and has OpenCL support since version 10d. Also in the consumer version (Sony Movie Studio) there is OpenCL-support.

RealFlow Hybrido2 engine

RealFlow is fluid dynamics software and its new engine Hybrido2 has support for OpenCL since this year. And you just have to love their commercial videos.

http://www.youtube.com/watch?v=Fj0err96BbQ

Autodesk Maya

Maya is a toolset to help create and maintain the modern, open pipelines you need to address today’s challenging 3D animation, visual effects, game development and post-production projects. Since the 2013 version it is accelerated for physics simulations via Bullet and OpenCL.

http://www.youtube.com/watch?v=36bIdH6EBkM

ArcSoft SimHD and Sim3D engine

ArcSoft media-engines SimHD and Sim3D have had OpenCL support for several years and are used in several of their products.

http://www.youtube.com/watch?v=tvXyLKEeX2I

BlackMagic Design

BMD has two suites which use OpenCL, Resolve and Fusion. DaVinci was acquired in 2009 and EyeOn in 2014.

(DaVinci) Resolve

Resolve has real-time colour correction thanks to OpenCL.

http://www.youtube.com/watch?v=lfrudtCTwv0

(Eyeon) Fusion

Fusion is an image compositing software program created by eyeon Software Inc. It is typically used to create visual effects and digital compositing for film, HD and commercials.

It uses OpenCL since version 6.

Roxio Creator Suite

Roxio uses OpenCL for accelerated rendering in their suite. They were one of the first to implement OpenCL – I think already in 2010, before OpenCL was even cool.

Unfortunately they don’t have much information – just a mention that they have support.

Apple Final Cut Pro and iMovie

Apple has support in Final Cut Pro X, Motion 5 and Compressor 4.

Also iMovie works a lot faster when you have an OpenCL capable Mac.

Blender Cycles & Bullet

You cannot find any demonstration of new video hardware without Big Buck Bunny, the short CG movie created with Blender.

It uses OpenCL in two parts: physics simulations (Bullet) and rendering (Cycles).

http://www.youtube.com/watch?v=QbzE8jOO7_0

Side Effects Houdini

Houdini is a procedural, node-based 3D animation and visual effects tool for film, broadcast, entertainment and visualisation production.

http://vimeo.com/46444204

Digital Film Tools

There is support for OpenCL in zMatte, Composite Suite Pro and Film Stocks since Q4 2013.

zMatte is a keyer for blue and green screen composites. Composite Suite Pro is a collection of visual effects plug-ins. Film Stocks simulates color and black and white still photographic film stocks, motion picture film stocks and historical photographic processes.

OTOY OctaneRender 3

OctaneRender is a GPU-based, real-time 3D, unbiased rendering application. In March 2015 OTOY announced OctaneRender 3, which has full OpenCL support:

OpenCL support: OctaneRender 3 will support the broadest range of processors possible using OpenCL to run on Intel CPUs with support for out-of-core geometry, OpenCL FPGAs and ASICs, and AMD GPUs.

Below is a reel of OctaneRender 2 with CUDA. According to OTOY the performance on AMD and NVidia is comparable.

https://www.youtube.com/watch?v=gLSBVt0VQSI

SAM Alchemist XF

Alchemist XF supports format and framerate conversion from SD up to 4K for a wide variety of file formats at high speed.

More?

There is a lot more OpenCL-powered software coming up rapidly (we hear things). But we also missed (or accidentally forgot) software. Please help making this list complete and send us an email.

Apple’s dragging OpenCL compiler problem

Remember the times when OpenCL compilers were not as good as they are now? Correct source-code being rejected, typos being accepted, long compile times, crashes during compiling and other irritating bugs. These made the work of an OpenCL developer in “the old days” quite tiresome – you needed a lot of persistence and had to report bugs. Luckily, on desktops the drivers have improved a lot.

Apple’s buggy OpenCL compiler

Now to Apple. There have always been complaints about the irritating bugs that were in Apple’s compiler. Recently the Luxrender community started to make more complaints, as the guy responsible for the OSX port decided to quit. This was due to utter frustration: code that worked on every other OS, simply did not work on OSX. Luxrender’s Paolo Ciccone stood up and made this extremely public, by writing an open letter to Apple’s CEO Tim Cook (posted below).

The letter is not specific about the kind of bugs, and therefore I asked him via Twitter which bugs he was talking about. He explained to me that it’s very simple:

https://twitter.com/RealityPaolo/status/595972568961519616

Here at StreamHPC we could write around those bugs in most cases, but Luxrender has bigger and more complex kernels than we used in our projects – then it’s simply impossible to write around, as the compiler simply crashes. It seems that OSX still has those old compilers that Linux and Windows had years ago.

Metal

Metal is the OpenCL-alternative on iOS 8 and up.

If you’re thinking that Metal could be a reason – that language looks very much like OpenCL, as it’s simply OpenCL as Apple would like it to be. Porting between the two languages is therefore quite simple. This also means that with some small fixes a Metal-kernel could be compiled by the existing OpenCL-compiler. OK, there is much more to it than the compute part, but the message is that more complex Metal wouldn’t be possible using this driver-stack.

If we end up in a situation that Metal comes to OSX and is more stable than OpenCL, only then we can say that Apple tries to block OpenCL in favour of their own APIs.

The letter

I’m really happy that Paolo Ciccone had the guts to publicly complain. This is the letter he wrote:

Dear Mr. Cook.

I’m sorry to bother you but we have tried all other channels and nothing worked.

I’m part of a group of developers of a physically-based renderer called LuxRender. LuxRender has been written to use OpenCL to accelerate its enormous amount of computation necessary to generate photo-realistic scenes. You can see some of the images generated by Lux at http://luxrender.net. Lux is an Open Source program.

Apple has defined OpenCL and we have adopted this API instead of the proprietary CUDA in order to be able to work with all kind of hardware on all major platforms. It made sense for an OSS to use an open standard.

The reason why I’m writing to you is that, after waiting for years, we still have broken GPU drivers on OS X. Scenes that render perfectly well on Windows and even on Linux simply abort on OS X. This is happening with both AMD and nVidia GPUs.

The problem is unsolvable from our side. We need updated, fixed drivers for OS X. The problem is so bad that our main OS X developer has announced, today, that he is giving up OS X. He simply can’t do his job.

I kindly request that you look into this and give us working AMD and nVidia drivers in an upcoming, possibly soon, update of OS X. We are more than willing to work with your engineers, if you need any kind of specific help in identifying the problem.

Thank you for your attention.

Paolo Ciccone

If you want to help, also post this letter on your blog or in a forum. The more this is shared, the better. Especially Apple’s forum, asking for the official statement.

Our Onboarding Process

We currently have 4 to 6 months for onboarding. This helps you, as our new colleague, get used to the different environment, the company’s priorities and ways of working. You’re not left alone.

What requirements we signal before hiring

The self-assessment contains all the elements we try to foster. It shows 4 main areas:

  • CPU programming & algorithms
  • GPU programming & performance oriented programming
  • Problem solving
  • Collaboration, planning & prioritization

Each of these, and some personal skills, we discuss, improve and measure. 4 to 6 months is never enough to improve on everything, but it is enough to focus on the items where most needs to be done. It’s certainly enough time to understand what quality means and how to get there.

It’s easy to get confused about quality. Quality might mean deluxness, fanciness, expensiveness. That’s not what quality means. Quality means meeting spec. Keeping the promise. Doing exactly what you said you would do. It is measured in consistency.

Seth Godin

First week(s)

Getting started means you get everything on your desk at once. Best to have a checklist:

  • Creation of accounts
  • Introduction to others via chat
  • Welcome call
  • Learning the basics of the development and collaboration environments
  • Determine a learning plan
  • Implement a time management system
  • Get necessary hardware, including noise-canceling headphones.
  • Make technical contributions from the first day, as this provides:
    • Measurable contributions
    • Time management

Months 1-4

With a learning-plan we try to get you to a certain target-level in all 4 areas. We have defined 20 levels, where every 5 levels you move one step further, from beginner to junior to medior to senior. How quickly this goes depends on the person.
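As a small illustration of how 20 levels could map onto the four named stages, here is a sketch. It assumes the levels are numbered 1 to 20 and that each stage covers exactly five consecutive levels; the function name `band_for_level` is hypothetical and only serves this example.

```python
# Stages in the order named in the onboarding text.
BANDS = ["beginner", "junior", "medior", "senior"]


def band_for_level(level):
    """Map a level from 1 to 20 to its stage, five levels per stage (assumed)."""
    if not 1 <= level <= 20:
        raise ValueError("level must be between 1 and 20")
    # Levels 1-5 -> index 0, 6-10 -> index 1, 11-15 -> index 2, 16-20 -> index 3.
    return BANDS[(level - 1) // 5]
```

Under these assumptions, level 5 is still a beginner and level 6 is the first junior level.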

  • Trainings:
    • Problem-solving
    • Gitlab, Mattermost and other services
    • GPU and CPU coding
    • Special focus subjects according to the learning-plan
  • Self-study
  • Weekly check-ins with HR:
    • Reflections
    • Planning
    • Time management
    • Obstacles
    • Learning plan
  • Daily reflection exercises (journaling)
  • Learn by doing:
    • Do isolated non-customer projects
    • Work on customer-projects
    • Give and receive code-reviews
    • Discuss approaches with colleagues

In both week 3 and week 11 there is a discussion on continuation. We don’t expect the answer to be no, but we always have these discussions. We found that clear communication on yes/maybe/no gives peace of mind, as expectations and goals are synchronized and on the table.

Post-onboarding

The time after onboarding still has the same learning-focused progress, but with less help from outside. You should know your colleagues well enough, to find ways to get to your goals.

Scientific Visualisation of Molecules

In many hard sciences focus is on formulas and text, whereas images are mainly graphs or simplified representations of researched matters. Beautiful visualisations are mainly artist’s impressions in popular media targeting hobby-scientists. When Cyrille Favreau made the first good-working version of his real-time GPU-accelerated raytracer, he saw potential in exactly this area: beautiful, realistic visualisations to be used in serious science. This resulted in software called IPV.

He chose to focus on rendering molecules of proteins and this article discusses raytracing in molecular sciences, while highlighting the features of the software.

This project has been discussed on GPU Science, but this article looks at the software from a slightly different perspective. If you don’t want to know how the software works and what it can do, scroll down for a download-link.

The current state of WebCL

Years ago Microsoft was in court as it claimed Internet Explorer could not be removed from Windows without breaking the system, while competitors claimed it could. Why was this so important? Because (as it seems) the browser would get more important than the OS and internet as important as electricity in the office and at home. I was therefore very happy to see the introduction of WebGL, the browser-plugin for OpenGL, as this would push web-interfaces as the default for user-interfaces. WebCL is a browser-plugin to run OpenCL-kernels. Meaning that more powerful hardware-devices are available to JavaScript. This post is work-in-progress as I try to find more resources! Seen stuff like this? Let me know.
