Install OpenCL on Debian, Ubuntu and Mint orderly

Libraries – can’t have enough

If you read different types of manuals how to compile OpenCL software on Linux, then you can get dizzy of all the LD-parameters. Also when installing the SDKs from AMD, Intel and NVIDIA, you get different locations for libraries, header-files, etc. Now GPGPU is old-fashioned and we go for heterogeneous programming, the chances get higher you will have more SDKs on your machine. Also if you want to keep it the way you have, reading this article gives you insight in what the design is after it all. Note that Intel’s drivers don’t give OpenCL support for their GPUs, but CPUs only.

As my mother said when I was young: “actually cleaning up is very simple”. I’m busy creating a PPA for this, but that will take some more time.

First the idea. For developers OpenCL consists of 5 parts:

  • GPUs-only: drivers with OpenCL-support
  • The OpenCL header-files
  • Vendor specific libraries (needed when using -lOpenCL)
  • -> a special driver
  • An installable client driver

Currently GPU-drivers are always OpenCL-capable, so you only need to secure 4 steps. These are discussed below.

Please note that in certain 64-bit distributions there is not lib64, but only ‘lib’ and ‘lib32’. If that is the case for you, you can use the commands that are mentioned with 32-bit.

Continue reading “Install OpenCL on Debian, Ubuntu and Mint orderly”

OpenCL vs CUDA Misconceptions

Translation available: Russian/Русский. (Let us know if you have translated this article too… And thank you!)

Last year I explained the main differences between CUDA and OpenCL. Now I want to get some old (and partly) false stories around CUDA-vs-OpenCL out of this world. While it has been claimed too often that one technique is just better, it should be also said that CUDA is better in some aspects, whereas OpenCL is better in others.

Why did I write this article? I think NVIDIA is visionary in both technology and marketing. But as I’ve written before, the potential market for dedicated graphics cards is shrinking and therefore forecasting the end of CUDA on desktop. Not having this discussion opens the door for closed standards and delaying innovation, which can happen on top of OpenCL. The sooner people & companies start choosing for a standard that gives equal competitive advantages, the more we can expect from the upcoming hardware.

Let’s stand by what we have learnt at school when gathering information sources, don’t put all your eggs in one basket! Gather as many sources and references as possible. Please also read articles which claim (and underpin!) why CUDA has a more promising future than OpenCL. If you can, post comments with links to articles you think others should read too. We appreciate contributions!

Also found that Google Insights agrees with what I constructed manually.

Continue reading “OpenCL vs CUDA Misconceptions”

Intel’s OpenCL SDK examples for GCC

Update august 2012: There is a new post for the latest Linux examples.

Note: these patches won’t work anymore! You can learn from the patches how to fix the latest SDK-code for GCC and Linux/OSX.

Code-examples are not bundled with the Linux OpenCL SDK 1.1 beta. Their focus is primarily Windows, so VisualStudio seems to be a logical target. I just prefer GCC/LLVM which you can get to work with all OSes. After some time trying to find the alternatives for MS-specific calls, I think I managed. Since ShallowWater uses DirectX and is quite extensive, I did not create a patch for that one – sorry for that.

I had a lot of troubles getting the BMP-export to work, because serialisation of the struct added an extra short. Feedback (such as a correct BMP-export of a file) is very welcome, since I the colours are correct. For the rest: most warnings are removed and it just works – tested with g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2 on 64 bit (llvm-g++-4.2 seems to work too, but not fully tested).


Continue reading “Intel’s OpenCL SDK examples for GCC”

InsideHPC: SuperComputing. Where to from here?

In this video, Moderator Bob Feldman hosts a session entitled: Supercomputing: Where to from Here? Recorded at the National HPCC Conference 2011 in Newport.

Dr. Eng Lim Goh, SGI
Bill Feiereisen, Intel
Shumel Shottan, BlueARC
Steve Lyness, Appro International, Inc.
Marc Hamilton, HP Americas

Below is a summary of what is told. It is just my notes, so go to the times mentioned to listen to the exact answers. Some details I did not write down, you might think are important, but I did not (or missed as I English is not my mother-tongue).

Continue reading “InsideHPC: SuperComputing. Where to from here?”

28 June: OpenCL course in Utrecht, NL

At 28 June 2011 StreamCompting will give a 1-day course on OpenCL in Utrecht. As it is quite new, the priced is reduced. Also if you want to learn CUDA or any other GPGPU-language, this course is also a good option for you. The most important thing about GPGPU are the concepts. In other words the “why” they chose to make GPGPU-languages ike this. In my course you will get it after a one-day training. Most of the day consists of lectures with a short lab-sessions. The training makes use of a unique block-method, so you learn the technique top-down and almost can fill in the spaces yourself. At least 2 years of thorough programming-experience in Java, C++ or Objective C is preferred, because of the level of the subjects. The following is discussed with the big why-question as leading:

  • OpenCL debunked: getting to understand how OpenCL is engineered.
  • Algoritms: which can be sped-up with GPGPU/OpenCL and which not.
  • Architectures & Optimalisations: why does one OpenCL-program work better on one architecture and not on another.
  • Software-engineering: wrapper-languages, code re-use and integration in existing software.
  • Debugging: not the screenshots, but giving you insight in how the memory-models work.

The lab-sessions are very minimal; you get (fully documented) homework which you can do the subsequent week (with assistance via mail). If you prefer to have extensive lab-sessions, please inform to the possibilities. After the session and the homework you’ll be able to decide on your own what kind of software can be sped up by using OpenCL and which not. Als you will be able to integrate OpenCL into your own software and engineer OpenCL-kernels. Note that the advances you make depend heavily on your seniority in programming. If all attendees are Dutch, it is given in Dutch. Future sessions will be in other cities, so if you prefer to receive training more local or at your company, please ask for the possibilities.

If you want more information, contact us.

MS just did not port Windows 8 to ARM

With a lot of fanfare Microsoft said they would offer Windows 8 in both an X86 as an ARM version. I was happy to see that Microsoft was innovating again after 10 years, and even saw loads of advantages of their Java-clone .NET. But then I started to read into Windows CE, Windows Embedded Compact, Windows Embedded Standard, Windows Mobile and Windows 8 (Desktop). I want to share this with you, even if it has not anything to do with OpenCL.

So they did port Windows to ARM which evolved to the 2012 version of the OS, but did not port Windows 8.0 from scratch. Below you can read why.

Continue reading “MS just did not port Windows 8 to ARM”

AMD OpenCL Presentation as OpenDocument

You remember AMD’s OpenCL University Kit? It was for universities and completely written in PPTX. (For people who are on university: PPTX is a undocumented document-form which claims to be open and actually works well with an editor/viewer of only one vendor). So I took the freedom to convert all documents to ODF, so anybody can open them.

Download it here: AMD OpenCL University Kit as ODF.

It has 13 chapters, covering all the basics you need to know for further study. Say “thanks AMD” and enjoy!

StreamHPC’s Newsletter

stapel_krantenWanting to know what really happens in the world of OpenCL? StreamHPC’s monthly newsletter is the most complete and independent source around the business and techniques around OpenCL. Subscribe, because the written news doesn’t always end up on this blog.

StreamHPC hates spam and will use the subscription-information only for the newsletter.

Sign up for our Newsletter
* = required field

I hope you enjoy it!

WebCL – a next step

WebGL is already secured to be a success; only IE-users will not have the 3D-web without plugin. But once sites like Wikipedia starts to offer 3D-imagery of the human body and buildings (as we know in Google Earth’s KML-format), things can go really fast in favour of the WebGL-supported browsers. This is important, because the balance between the computers/smartphones and the servers (you know: internet) just got somewhat more connected. I was first somewhat critical, because I want the web to have content (text and images) and not be “an ultimate experience” – luckily it turned out to be good for the content. I’m looking forward to Wikipedia and hardware accelerated services like Streetview!

A possible next step would be WebCL. But is it technically possible? And what would the internet-landscape be to be ready for such thing? Khronos did mention to be working on such technique, according to this article. But not much attention was given to it. So I was happy to see a GSOC11 proposal WebCL-plugin for Firefox by Adrien Plagnol. They even have some code. But it was already finished for Firefox 4 (Windows and Linux), I learnt about a week ago.

WebCL by Nokia

It is very simple: it is a Javascript-version of the host-specific OpenCL code. Kernels are just kernels as we know them.

Nokia has put together a very nice WebCL homepage, which contains tutorials. And at lesson one we see how it looks like:

function detectCL() {
  // First check if the WebCL extension is installed at all

  if (window.WebCL == undefined) {
    alert("Unfortunately your system does not support WebCL. " +
          "Make sure that you have both the OpenCL driver " +
          "and the WebCL browser extension installed.");
    return false;

  // Get a list of available CL platforms, and another list of the
  // available devices on each platform. If there are no platforms,
  // or no available devices on any platform, then we can conclude
  // that WebCL is not available.

  try {
    var platforms = WebCL.getPlatformIDs();
    var devices = [];
    for (var i in platforms) {
      var plat = platforms[i];
      devices[i] = plat.getDeviceIDs(WebCL.CL_DEVICE_TYPE_ALL);
    alert("Excellent! Your system does support WebCL.");
  } catch (e) {
    alert("Unfortunately platform or device inquiry failed.");

As you can see this is very understandable code, if you know the basics of OpenCL and JavaScript. It is built for stability, so it seems to crash less easily than I expected.

I’ve written/tweeted a lot about OpenCL wrappers and how I think the OpenCL-ecosphere advances mainly by the growing up of the wrappers. Complaints about the far too long initialisation of OpenCL-software can easily be put in just a few lines of code. We now start from scratch again, but I will not be wonder-struck if there will be a jQuery-plugin released soon.


In the first place, think real-time encryption which can be adapted per user without the browser knowing. There are many more reasons all going back to the demand to have a browser-based computer (like Google is trying with its ChromeOS). All OS-APIs need to be available in a HTML5-like language and this is exactly that.

What are you still doing here? Install the Opencl-plugin for Firefox 4 and try Nokia’s online OpenCL-sandbox now! +1 for crashing it, +2 for sending in a bug-report.

The history of the PC from 2000 – 2012

After IBM-compatible clones took over from Apple, Atari and ZX Spectrum, we just got used to that a PC is an X86 with MS Windows and Office on it. Around a decade ago Apple fought back with OSX on which Windows 7 (launched in 2009) was the first real answer. Meanwhile Apple switched to Intel, since IBM was not fast enough with the development of the POWER-processor – a huge operation, which seemed a one-time-only step for Apple at the time. SemiAccurate now speaks of Intel being replaced by ARM on Apple’s laptops.

A few weeks ago I asked Computer Science students if they knew ARM. Not even 1% had heard of it, but lots more knew there was a Samsung-chip in their smartphone. So what’s going on without us knowing it?

I’ll try to describe the market for a few key-years and then try to put the big names in it. There is a lot going on between i.e. Nvidia, Samsung, Texas Instruments and Imagination Technologies in the ARM-market, but I’ll leave that out of the story. Also not mentioned are the game-consoles and servers, but they did have big influences on the home-PC market.

In the picture at the right you see an idea of how fast the markets would have grown from a 2006 perspective. (Click on it for the full report). You see that the explosive growth of smartphones was not expected; the other detail is that the cloud also was not foreseen here.

After reading you understand why Nvidia focuses so much on HPC and mobile.

After IBM-compatible clones took over from Apple, Atari and ZX Spectrum, we just got used to that a PC is an X86 with MS Windows and Office on it. Around a decade ago Apple fought back with OSX on which Windows 7 (launched in 2009) was the first real answer. Meanwhile Apple switched to Intel, since IBM was not fast enough with the development of the POWER-processor – a huge operation, which seemed a one-time-only step for Apple at the time. SemiAccurate now speaks of Intel being replaced by ARM on Apple’s laptops.

A few weeks ago I asked Computer Science students if they knew ARM. Not even 1% had heard of it, but lots more knew there was a Samsung-chip in their smartphone. So what’s going on without us knowing it?

I’ll try to describe the market for a few key-years and then try to put the big names in it. There is a lot going on between i.e. Nvidia, Samsung, Texas Instruments and Imagination Technologies in the ARM-market, but I’ll leave that out of the story. Also not mentioned are the game-consoles and servers, but they did have big influences on the home-PC market.

In the picture at the right you see an idea of how fast the markets would have grown from a 2006 perspective. (Click on it for the full report). You see that the explosive growth of smartphones was not expected; the other detail is that the cloud also was not foreseen here.

After reading you understand why Nvidia focuses so much on HPC and mobile.

Continue reading “The history of the PC from 2000 – 2012”

Molybdenite and graphene to the helping hand?

The rabbit in “The Last Mimzy” was very special. What material was it made of?

You might have read about Molybdenite a few months ago. It is more efficient than Graphene which is in turn more efficient than good old Silicon, most notable energy-wise. Magazine ‘Nature’ had an article on it, which is summarised by Psychorg, so check it out. The claim it is 100 000 times more efficient than Silicon (and more efficient than the already very promising Graphene). This fan-free Silicon-replacer would be a major disaster for the cooling-industry!

But what would change for us? We are now on the edge to move to ARM (started by the smartphone- and tablet-industry), but is al this needed if the energy-costs drop to prices comparable to the costs to keep ice-cream cold on the North-Pole (20 years ago). This technique would give huge potential to Fusion-chips which now have a long way to go, to solve the heat-problem. But since it would take several years (and thus decades in hi-tech years) to get these chips on the market, no assumptions for market-share can be made based on what will happen in a few years.

Low-power ARM and Molybdenite X86

So this is European ARM (and licensees around the world) vs US Intel and AMD. The sarcastic joke among me and a few friends make, is that the fight of the past 20, 30 years between the economic US and EU is actually about who has the money to hire the most Asians, to develop the revolutionising devices. But as long as the US and EU have the feeling we are actually the equation of the competition as we are a massive 12% of the world-population, I won’t be behind the facts too much.

Since batteries don’t evolve as fast as processors, the power-problem needed to get slashed differently. A mayor reason for choosing ARM is that it uses less energy than X86, just like LCD/TFT is replaced by e-ink and organic LEDs and memory is non-volatile in portable devices.

In case we get a big reduction for CPU and memory, then the efficiency of the architecture is less of a problem. So then Intel and AMD can re-enter the market again, but then with much more powerful devices. Until then ARM-licensees like NVIDIA and ImTec have a better market if it comes to near-future devices. As I expected more tablet-manufacturers come up with docking-stations to replace the PC with a tablet. AMD and Intel have to keep surprising (and probably protect their market) the coming years to avoid losing from ARM. In other words: the coming years will be exciting how the consumer-market looks like and which companies deal in it. When thinking about these years, keep in mind what Windows XP has thought us: computers are fast enough for what average Joe wants to do with it. Hey, I use my laptop for OpenCL and the big screen, for the rest I use my mobile phone.

Hybrid chips

While I did not see it as a serious problem last year, the heat-problem for a GPU+CPU on one chip is quite a challenge. Waiting for the Molybdrenite or Graphene chips to mature will be like digging your own grave. Each step forward will result in two new products: one which is more power and/or heat efficient, and one which is more powerful. Since the competition from ARM-companies is heavy, the chances that the focus will be on more powerful Hybrid CPUs is bigger. As I stated above the losses are in the low-power area. Intel and AMD are very aware of this challenge.

Have you checked the differences between DirectX 10 and 11 games? Just check the discussions on the growing side of not needing to support DirectX 11, because 10 is good enough. Also here, the demand is higher to have the same graphics-quality for less money on more portable devices. Hybrid CPUs will eat the GPU-market for sure.

ARM-processors are hybrid processors. That’s all I tell, so you can -in combination with all stated above- formulate your own conclusions. I was very surprised NVIDIA started targeting ARM with their high-end GPUs, but was this a real bad idea?

Device vs Data-centre

Reduction of energy-costs for processors will reduce the head-less servers in the data-centre enormously. Internet costs loads of energy, both the transport and the servers – this will reduce the server-part of energy-consumption-sum with quite some factors. All positive news.

But if it all this becomes true, that chips don’t use much energy anymore and actually mobile internet and other radios take the most, what will happen to the cloud? Will you upload your video to get it processed or put your mobile in the sun to charge it while waiting a shorter period?

Current developments, future needs

We need arithmetic, media-processing and input/output; we all have that. We need long battery-life, a good screen and a fast way to input our data and commands; we get more of that each day. But heat-production is Silicon limits a lot, so we get the perfect electronic device the moment we can replace Silicon. Getting rid of the heat could give us square chips, with challenges like reinventing the socket and multi-multi-layerness.

So the question to you: is in The Last Nimzy sequel (you know, the movie with the molybdenite rabbit) a logo of Intel, AMD, ARM or another company found?

PathScale ENZO

My todo-list gets too large, because there seem so many things going on in GPGPU-world. Therefore the following article is not really complete, but I hope it gives you an idea of the product.

ENZO was presented as the alternative to CUDA and OpenCL. In that light I compared them to Intel Array Building Blocks a few weeks ago, not that their technique is comparable in a technical way. PathScale’s CTO mailed me and tried to explain what ENZO really is. This article consists mainly of what he (Mr. C. Bergström) told me. Any questions you have, I will make sure he receives them.


ENZO is a complete GPGPU solution and ecosystem of tools that provide full support for NVIDIA Tesla (kernel driver, runtime, assembler, code generation, front-end programming model and various other things to make a developer’s life easier).

Right now it supports the HMPP C/Fortran programming model. HMPP is an open standard jointly developed by CAPS and PathScale. I’ve mentioned HMPP before, as it can translate Fortran and C-code to OpenCL, CUDA and other languages. ENZO’s implementation differs from HMPP by using native front-ends and does hardware optimised code generation.

You can learn ENZO in 5 minutes if you’ve done any OpenMP-like programming in the past. For example the Fortran-code (sorry for the missing indenting):

!$hmpp simple codelet, target=TESLA1

subroutine add(n, a, b, c)
implicit none
integer, intent(in) :: n
real, intent(in) :: a(n), b(n)
real, intent(out) :: c(n)
integer :: i

do i=1, n
if (a(i) > 5) then
c(i) = 1
c(i) = 2

end subroutine add

subroutine test
integer, parameter :: n=10
real :: a(n), b(n), c(n)
integer :: i

do i=1, n
a(i) = i
b(i) = i

!$hmpp simple callsite
call add(n, a, b, c)
end subroutine test

This is somewhat different we know from OpenCL, mostly because we don’t need a specific kernel. This is because with just a few hints, the compiler does a lot for you. Like in OpenMP you tell the compiler with directives/pragmas which parts you want to be parallelised. More explanation can be found in the user manual [PDF]. You can try it out yourself for free if you have a Tesla-card; future versions of ENZO will support more architectures.

OpenCL Developer support by NVIDIA, AMD and Intel

There was some guy at Microsoft who understood IT very well while being a businessman: “Developers, developers, developers, developers!”. You saw it again in the mobile market and now with OpenCL. Normally I watch his yearly speech to see which product they have brought to their own ecosphere, but the developers-speech is one to watch over and over because he is so right about this! (I don’t recommend the house-remixes, because those stick in your head for weeks.)

Since OpenCL needs to be optimised for each platform, it is important for the companies that developers start developing for their platform first. StreamComputer is developing a few different Eclipse-plugins for OpenCL-development, so we were curious what was already there. Why not share all findings with you? I will keep this article updated – know this article does not cover which features are supported by each SDK.

Continue reading “OpenCL Developer support by NVIDIA, AMD and Intel”

Support matrix of Compute SDKs

Multi-Core Processors and the SDKs

The empty boxes tell IBM and ARM have a lot of influence. With NVIDIA’s current pace with introducing new products (hardware and CUDA), they could also take on ARM.

The matrix is restricted to current better-known compute technologies OpenCL, CUDA, Intel ArrBB, Pathscale ENZO, MS DirectCompute and AccelerEyes JacketLib.

X = All OSes, including MAC
D = Developer (private alpha or private beta)
P = Planned (as i.e. stated in Intel’s Q&A)
U = Unofficial (IBM’s OpenCL-SDK is promoted for their POWER-line)
L = Linux-only
W= Windows-only
? = Unknown if planned

Continue reading “Support matrix of Compute SDKs”

Disruptive Technologies

Steve Streeting tweeted a few weeks ago: “Remember, experts are always wrong about disruptive tech, because it disrupts what they’re experts in.”. I’m happy I evangelise and work with such a disruptive technology and it will take time until it is bypassed by other technologies. And that other technologies will be probably be source-to-OpenCL-source compilers. At StreamHPC we therefore keep track of all these pre-compilers continuously.

Steve’s tweet got me triggered, since the stability-vs-progression-balance make changes quite hard (we see it all around us). Another reason was heard during the opening-speech of engineering world 2011 about “the cloud”, with a statement which went something like: “80% of today’s IT will be replaced by standardised cloud-solutions”. Most probably true; today any manager could and should click his/her “data from A to B”-report instead of buying a “oh, that’s very specialised and difficult” solution. But at the other side companies try to let their business live as long as possible. It’s therefore an intriguing balance.

So I came up with the idea to play my own devil’s advocate and try to disrupt GPGPU. I think it’s important to see what can disrupt the current parallel-kernel-execution model of OpenCL, CUDA and the others.

Continue reading “Disruptive Technologies”

Engineering World 2011: OpenCL in the Cloud

[Dutch] Op het Sogeti Engineering World 2011 heb ik een presentatie gehouden over OpenCL in de cloud, in het Nederlands. Om the coolheidsfactor te verhogen heb ik gebruik gemaakt van Prezi als contrast met de standaard dia-show-presentaties. Het resultaat treft u hier beneden, maar kan helaas onmogelijk het hele verhaal vertellen dat ik gedeeld heb tijdens de presentatie. Wilt u ergens iets meer van afweten, vraag gewoon of zet een comment onderaan dit artikel. Ik luister naar mijn lezers via Twitter.

De presentation bestaat uit 4 delen: een introductie, uitleg van OpenCL, Mobiele apparaten en and Data-centres. De laatste twee vormen cloud-computing.

[English] At the Sogeti Engineering World 2011 I presented about OpenCL in the cloud, in Dutch. To increase the relative cool-factor I made sure I had the only Prezi-presentation between the standard sheet-flip presentations. The result you can see below, but can impossibly tell all I shared at the presentation. If you want to know more, just ask or put an comment under this article. I listen to my readers via Twitter.

The presentation has four parts: an introduction, explanation of OpenCL, Mobile devices and data centres. The last two form a segment cloud-computing I want to focus on.

Continue reading “Engineering World 2011: OpenCL in the Cloud”

Waiting for Mobile OpenCL – Q1 2011

About 5 months ago we started waiting for Mobile OpenCL. Meanwhile we had all the news around ARM on CES in January, and of course all those beta-programs made progress meanwhile. And after a year of having “support“, we actually want to see the words “SDK” and/or “driver“. So who’s leading? Ziilabs, ImTech, Vivante, Qualcomm, FreeScale or newcomer nVIDIA?

Mobile phone manufacturers could have a big problem with the low-level access to the GPU. While most software can be sandboxed in some form, OpenCL can crash the phone. But at the other side, if the program hasn’t taken down the developer’s test-phone, the chances are low it will take any other phone. And also there are more low-level access-points to the phone. So let’s check what has happened until now.

Note: this article will be updated if more news comes from MWC ’11.


For mobile devices Khronos has specified a profile, which is optimised for (ARM) phones: OpenCL Embedded Profile. Read on for the main differences (taken from a presentation by Nokia).

Main differences

  • Adapting code for embedded profile
  • Added macro __EMBEDDED_PROFILE__
  • CL_PLATFORM_PROFILE capabilityreturns the string EMBEDDED_PROFILE if only the embedded profile is supported
  • Online compiler is optional
  • No 64-bit integers
  • Reduced requirements for constant buffers, object allocation, constant argument count and local memory
  • Image & floating point support matches OpenGL ES 2.0 texturing
  • The extensions of full profile can be applied to embedded profile

Continue reading “Waiting for Mobile OpenCL – Q1 2011”

Benchmarks Q1 2011

February Benchmark Month. The idea is that you do at least one of the following benchmarks and put the results on the Khronos Forum. If you encounter any technical problems or you think the benchmark favours a certain brand, discuss it below this post. If I missed a benchmark, please put a comment under this post.

Since OpenCL works on all kinds of hardware, we can find out which is the fastest: Intel, AMD or NVIDIA.I don’t think all benchmarks are fit for IBM’s hardware, but I hope to see results of some IBM Cells too. If all goes well, I’ll show the first results of the fastest cards posted in April . Know that if the numbers are off too much, I might want to see further proof.

Happy benchmarking!

Continue reading “Benchmarks Q1 2011”

Felix Fernandez's "More, More, More"

SSEx, AVX, FMA and other extensions through OpenCL

Felix Fernandez's "More, More, More"This discussion is about a role OpenCL could play in a diversifying processor-market.

Both AMD and Intel have added parallel instruction-sets for their CPUs to accelerate in media-operations. Each time a new instruction-set comes out, code needs to be recompiled to make use of it. But what about support for older processors, without penalties? Intel had some troubles with how to get support for their AVX-instructions, and choose for both their own Array Building Blocks and OpenCL. What I want to discuss here are the possibilities available to make these things easier. Also I want to focus on if a general solution “OpenCL for any future extensions” could hold. I make an assumption that most extensions target mostly parallelisation with media in mind, most notable embedded GPUs on upcoming hybrid processors. I talked about this subject before in “The rise of the GPGPU compiler“.

Virtual machines

Java started in 1996 with the idea that end-point optimisation should be done by compiling intermediate code to the target-platform. The idea still holds and there are many possibilities to optimise intermediate code for SSE4/5, AVX, FMA, XOP, CLMUL and any other extension. Same is of course for dotNET.

Disadvantage is the device-models that are embedded in such compilers, which have not really take specialised instructions into account. So if I have a normal loop, I’m not sure it will work great on processors launched this year. C has pragmas for message-protocols, Java needs extensions. See Neal Gafter’s discussion about concurrent loops from 2006 for a nice discussion.

Smart Compilers

With for instance LLVM and Intel’s fast compilers, a lot can be done to get code optimised for all current processors. A real danger is that too many specialised processors will arrive the coming years; how to get maximum speed at all processors? We already have 32 and 64 bit; 128 bit is really not the only direction there is. Multi-target compilers can be something we should be getting used to, for which no standard is created for yet – only Apple has packed 32 and 64 bits together.

Years ago when CPUs started to have support for the multiply-add operation, a part of the compiled code had to be specially for this type of processor – giving a bigger binary. With any new type of extension, the binary gets bigger. It has to, else the potential of your processor will not be used and sales will drop in favour of cheaper chips. To sell software with support for each new extension, it takes time – in most cases reserved only for major releases.

Because not everybody has Gentoo (A Linux-distribution which compiles each piece of software targeting the user’s computer for maximum optimisation), it takes at least a year to get full use of the processor for most software.


So where does OpenCL fit in this picture? Virtual machines are optimised for threads and platform-targeting compilers are slow in distribution. Since drivers for CPUs are part of the OS-updating system, OpenCL-support in those drivers can get the new extensions utilised soon after market-introduction. The coming year more will be done for automatic optimisation for a broad range of processor-types – more about that later. This focus from the compiler to an OpenCL-library for handling optimal kernel-launching will get an optimum somewhere in between.

The coming time we will see OpenCL is indeed a more stable solution than for instance Intel’s Array Building Blocks, seen from the light of recompiling. If OpenCL can target all kinds of parallel extensions, it will offer the demanded flexibility the market demands in this diversifying processor-market. I used the word ‘demand’, because the consumer (being it an individual or company) who buys a new computer, wants his software to be faster, not potentially faster. What do you think?

Gedit OpenCL Syntax Highlighting

Update 17-06-2011: updated version of opencl.lang and added opencl_host.lang.

When learning a language it is nice to do it the hard way, so you take the default txt-file editor provided with your OS. No colours, not help, no nothing, pure hard-core learning. But in Linux-desktop Gnome the default editor Gedit is quite powerful without doing too much, has an official Windows-port and has a OSX Darwin-port. It took just a few hours to understand how highlighting in Gedit works and to get it implemented. I got some nice help from the work done at the cuda-highlighter by Hüseyin Temucin (for showing how to extend the c-highlighter the best way) and the VIM OpenCL-highlighter by Terence Ou (for all the reserved words). This is work in progress; I will tell about updates via Twitter.

Get it

Windows-users first need to download Gedit for Windows. OSX-folks can check Darwin-ports. Then the files opencl.lang (.cl-files) and opencl_host.lang (extension of c to highlight OpenCL-keywords) needs to be put in /usr/share/gtksourceview-2.0/language-specs/ (or in ~/.local/share/gtksourceview-2.0/language-specs/ for local usage only), or for Window in C:Program Filesgeditsharegtksourceview-2.0language-specs or for OSX in /Applications/ Make sure all Gedit-windows are closed so the configuration will be re-read, and then open a .cl-file with Gedit. If you have opened cl-files as C or Cuda, you have to set the highlighting to OpenCL manually (under view -> highlighting). For host-code you always need to set the highlighting manually to “OpenCL host”. You might want to associate cl-files with Gedit.





StreamHPC is working on Eclipse-support and I’ve understood also work is done for Netbeans-support. Let me know if there are more alternatives.

ImageJ and OpenCL

For a customer I’m writing a plugin for ImageJ, a toolkit for image-processing and analysis in Java. Rick Lentz has written an OpenCL-plugin using JOCL. In the tutorial step 1 is installing the great OS Ubuntu, but that would not be the fastest way to get it going, and since JOCL is multi-platform this step should be skippable. Furthermore I rewrote most of the code, so it is a little more convenient to use.

In this blog-post I’ll explain how to get it up and running within 10 minutes with the provided information.

Continue reading “ImageJ and OpenCL”