X86 Systems-on-a-Chip and GPGPU

The System-on-a-Chip (SoC) for X86 will be a revolution for GPGPU. Why? Because a big problem today is transferring data from CPU-memory to GPU-memory and back, and SoCs will solve that. Below you can read why this architecture is a very realistic target.

With AMD+ATI, Intel and its future high-end GPUs, and NVidia with the rumours around its X86-chips, we will certainly get changes in the field. If it is the way to go, what is probable?

  1. Get both CPU and high-end GPU on 1 chip, separated memory
  2. Techniques for sharing memory
  3. Translating OpenCL from and to C on the fly

ARM-processors are often combined with GPUs, but they currently lack support for a common shader-language (read: OpenCL) that would bring GPGPU within reach. We’ve asked ourselves many times why ARM & friends have been involved in OpenCL since the beginning, but still don’t have any public and promoted driver-support. More on ARM once there is more news on multi-core ARM-CPUs or OpenCL drivers.

1: One chip for everything

The biggest problem with split CPU/GPU-functionality is that the bus-speed between the two is limited. The higher this speed, the more useful GPGPU can be. The highest speeds are possible when the signal does not have to leave the chip and no concessions are made to the architecture of the graphics-card; in other words: glueing CPU and GPU together, but leaving the memory-buses the same.
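
The trade-off between bus-speed and usefulness can be put in numbers with a back-of-the-envelope model. The following is only a minimal sketch, not a benchmark: the function name `offload_pays_off` and all bandwidth and timing figures are hypothetical, chosen purely for illustration.

```python
def offload_pays_off(data_bytes, bus_bytes_per_s, cpu_seconds, gpu_seconds):
    """Return True if copying the data to the GPU, computing there and
    copying the result back is faster than computing on the CPU."""
    transfer_seconds = 2 * data_bytes / bus_bytes_per_s  # to the GPU and back
    return transfer_seconds + gpu_seconds < cpu_seconds


# Hypothetical workload: 1 GB of data, 0.5 s on the CPU, 0.1 s on the GPU.
one_gb = 1_000_000_000
slow_bus = offload_pays_off(one_gb, 4e9, cpu_seconds=0.5, gpu_seconds=0.1)
fast_bus = offload_pays_off(one_gb, 100e9, cpu_seconds=0.5, gpu_seconds=0.1)
print(slow_bus, fast_bus)
```

With these made-up figures the GPU is five times faster at the calculation itself, yet offloading only wins once the bus is fast enough that the two copies no longer dominate.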

Currently there is Intel’s Nehalem and AMD’s Fusion, but they use DDR3 for both GPU and CPU; this will not really unlock the GPGPU-possibilities of high-end GPUs. It seems these products were designed with lower costs in mind.

But the chances that high-end GPUs will be integrated on the CPU are rising. Going to 32nm gives room for more functionality, such as GPUs. Other choices can be smaller chips, more cores and integrating functionality of the north/south-bridge of the motherboard. If GPU-cores that do not work optimally when tested in the factory can be turned off (just like is done with multi-core CPUs), integrating high-end GPU-cores will even become a safe choice.

Another way it could go is using optical buses between the GPU and CPU. It’s unknown if it will really see mainstream markets soon enough.

2: Shared memory – new style

Some levels of cache and all memory should be easily accessible by both types of cores. Why? Because eventually you want to switch between CPU- and GPU-instructions continuously. CUDA already has a nice feature which keeps objects synchronised between CPU and GPU; one step further is removing the need for synchronising altogether.

The problem is that video-memory (GDDR5) is accessed in a more parallel way to provide higher data-speeds, so we don’t want to limit the GPU by attaching it to slower (= lower bandwidth) DDR3. Doing it the other way around would then be the best solution: giving CPUs direct access to GDDR. There is also the option that a new type of (replaceable) memory will be used, which has a dual-bus by design.

The hard part is memory-protection: now that more devices get control over memory, the overhead of controlling and arranging memory regions can increase enormously and might need a separate core, just like in the Cell-processor. This need for control is a reason I don’t expect GPU and CPU to access each other’s memory before there is a fast bus between them, since access to GDDR via the GPU’s memory-manager will then be much faster and maybe even fast enough.

3: Grown up software

If software were able to easily select devices and use the same code for each device, we would have made a giant step forwards. Software has always been one step behind hardware; so if you do not develop such techniques yourself, you just have to wait a while.

Translating OpenCL into normal C and back will be possible in all kinds of ways, once there is more acceptance of (and thus demand for) GPGPU. AMD’s OpenCL-implementation for CPUs is also a way to merge the fields of CPU and GPU. It’s hard to tell how these techniques will merge, but it will certainly happen. Think of situations in which some instructions are sent to the GPU by the OS even when the (OpenCL) programmer did not think of it. Or, when you write an OpenCL-kernel now, do you expect an ARM-processor to be integrated in a near-future CPU?

See our article on the bright future of GPGPU to read more about it.

What’s next?

In case this is the way it goes, a lot will be possible for both OpenCL and CUDA, depending on market demands. Some possibilities will be discussed in an upcoming article about FPGAs, but also let me hear what you think about X86-SoCs. Comment or send an e-mail.

Difference between CUDA and OpenCL 2010

THIS ARTICLE IS VERY OUTDATED AND NOW SIMPLY UNTRUE FOR CERTAIN PARTS! NEW ARTICLE COMING UP.

Most GPGPU-enthusiasts have heard of both OpenCL and CUDA. While there are more solutions, these have the most potential. Both techniques are very comparable, like a BMW and a Mercedes, but there are some differences. Since the technologies will evolve, we’ll take a look at the differences again next year. We’ve discussed this difference with a focus on marketing earlier this year.

Disclaimer: we have a strong focus on OpenCL (but actually for reasons explained in this article).

Terminology

If you have seen kernels of OpenCL and CUDA, you see the biggest difference might be the prefix “cl_” or the prefix “cu_”, but there is also a difference in terminology.

Matt Harvey (developer of Cuda2OpenCL-translator Swan) has summed up the differences in a presentation “Experiences porting from CUDA to OpenCL” (PDF):

CUDA term | OpenCL term
GPU | Device
Multiprocessor | Compute Unit
Scalar core | Processing element
Global memory | Global memory
Shared (per-block) memory | Local memory
Local memory (automatic, or local) | Private memory
kernel | program
block | work-group
thread | work item

As far as I know, the kernel-program is also called a kernel in OpenCL. Personally I like CUDA’s terms “thread” and “per-block memory” more. It is very clear that CUDA targets the GPU only, while in OpenCL it can be any device.
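
When porting, the table above can be kept at hand as a small lookup. This is just a sketch: the dictionary is copied from the table, while the names `CUDA_TO_OPENCL` and `to_opencl_term` are made up for this example.

```python
# CUDA term -> OpenCL term, copied from the table above.
CUDA_TO_OPENCL = {
    "GPU": "device",
    "multiprocessor": "compute unit",
    "scalar core": "processing element",
    "global memory": "global memory",
    "shared memory": "local memory",
    "local memory": "private memory",
    "kernel": "program",  # the text notes that OpenCL also says "kernel"
    "block": "work-group",
    "thread": "work-item",
}


def to_opencl_term(cuda_term):
    """Look up the OpenCL name for a CUDA term, ignoring case."""
    lookup = {key.lower(): value for key, value in CUDA_TO_OPENCL.items()}
    return lookup[cuda_term.lower()]


print(to_opencl_term("Shared memory"))
print(to_opencl_term("local memory"))
```

The two calls show the main pitfall: “local memory” exists in both vocabularies but means something different in each.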

Edit 2011-01-15: In a talk by Sami Rosendahl the differences are also discussed.

Speed-comparison

We would like to present you a full benchmark comparing OpenCL and CUDA, but we don’t have enough hardware in-house to do one. The information below is what we’ve found on the net, plus a little bit based on our own experience.

On NVidia hardware, OpenCL is up to 10% slower (see Matt Harvey’s presentation); this is mainly because OpenCL is implemented on top of the CUDA-architecture (this shouldn’t be a reason, but saying that NVidia has put more energy into CUDA is also just a wild guess). On the ATI 4000-series OpenCL is just slow, but on the 5000-series the results are very comparable to NVidia’s. The specialised streaming processors, NVidia’s Tesla and AMD’s FireStream, are closely matched, while the Playstation 3 unbelievably still wins on some tasks.

The architecture of AMD/ATI-hardware is very different from NVidia’s, and that’s why a kernel written with a specific brand or GPU in mind just performs better than a version which is not optimised. So if you do a benchmark, it really depends on which kernels you use for it. To be more precise: any benchmark can be written in favour of a specific architecture. Fine-tuning the software to work at maximum speed on current and future(!) hardware for different kinds of datasets is (still) a specialised task for that reason. This is also one of the current problems of GPGPU, but kernel-optimisers will get better.

If you like pictures, Hugh Merz comes to the rescue: he compared CUDA-FFT against FFTW (“the fastest FFT in the West”). The page is offline now, but it was clear that the data-transfer from and to the GPU is a huge bottleneck, and Hugh Merz was rather sceptical about GPU-computing in 2007. He extended his benchmark with the PS3 and a Tesla-s1070 and now you see bigger differences. Since CPUs go multi-multi-core, you cannot tell how big this gap will be in the future; but you can tell the gap will be bigger and CPUs will more and more be programmed like GPUs (massively parallel).

What we learn from this is 1) that different devices will improve if the demands are clearer, and 2) that it will be all about specialisation, since different manufacturers will hear different demands. The latest GPUs from AMD work much better with OpenCL; the next might beat all others in many or only specific areas in 2011, who knows? IBM’s Cell-processor is expected to enter the ring outside the home-brew PS3 render-farms, but with what specialised product? NVidia wants to enter high in the HPC-world, and they might even win it. ARM is developing multiple-core CPUs, but will they support OpenCL for a better FLOP/Watt than competitors?

Which way CUDA and OpenCL will develop is all about the choices manufacturers make.

Homogeneous vs Heterogeneous

For us this is the most important reason to have chosen OpenCL, even if CUDA is more mature. While CUDA only targets NVidia’s GPUs (homogeneous), OpenCL can target any digital device that has an input and an output (very heterogeneous). AMD/ATI and Intel are both on the path of making architectures that are heterogeneous, just like Systems-on-a-Chip (SoCs) based on an ARM-architecture. Watch for our upcoming article about ARM & SoCs.

While I was searching for more information about this difference, I came across a blog-item by RogueWave, which claims something different. I think they switched Intel’s architectures with NVidia’s, or the author knew things were going to change. The near future could bring us an x86-chip from NVidia. This will change a lot in the field, so more about this later. They already have an ARM-chip in their Tegra mobile processor, so NVidia/CUDA still has some big bullets.

Missing language-features

Just as Java and .NET are very comparable, developers from both sides know very well which favourite feature is missing in the other camp. Most of the time such a feature is just an external library, built in. Or is it taste? Or even a stack of soapboxes?

OpenCL has:

  • Task-parallel execution mode (to be used on CPUs) – not needed on NVidia’s GPUs.

CUDA has unique features too:

  • FFT library – so in OpenCL you need to have your own kernels for it.
  • Atomic operations – which make double-write threads easier to implement.
  • Hardware texture interpolation – OpenCL has to fall back to a larger kernel or OpenGL.
  • Templating – in OpenCL you have to create new kernels for every data-type.

In short CUDA certainly has made a lot of things just easier for the developer, but OpenCL has its potential in support for more than just GPUs. All differences are based on this difference in focus-area.
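
A common workaround for the missing templating is to generate the kernel source on the host, once per data-type, before handing it to the OpenCL compiler. The following is a minimal sketch of that idea using only the Python standard library; the kernel `scale_<type>` and the helper `kernel_sources` are hypothetical names invented for this example, and the generated strings are not compiled here.

```python
from string import Template

# OpenCL C source with a placeholder where the element type goes.
KERNEL_TEMPLATE = Template("""
__kernel void scale_$type(__global $type *data, const $type factor)
{
    size_t i = get_global_id(0);
    data[i] = data[i] * factor;
}
""")


def kernel_sources(types):
    """Return one kernel source string per requested element type."""
    return {t: KERNEL_TEMPLATE.substitute(type=t) for t in types}


sources = kernel_sources(["float", "int"])
print(sources["float"])
```

Each generated string would still have to be built as a separate kernel, which is exactly the extra work the list above refers to; the generation step only saves writing the variants by hand.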

I’m pretty sure this list is not complete at all, and only explains the type of differences. So please come to the LinkedIn GPGPU Users Group to discuss this.

Last words

As is the case with other shared standards, there is no win and no gain in promoting it. If you promote it, a lot of companies thank you, but the Return-on-Investment is lower than when you have your own standard. So OpenCL is just used-as-it-is-available, while CUDA is highly promoted; for that reason more people invest in partnerships with NVidia to use CUDA than with the non-profit organisation Khronos. And eventually CUDA-drivers can be ported to IBM’s Cell-processors or to ARM, since CUDA is very comparable to OpenCL. It really depends on the profit NVidia will make with such deals, so who can tell what will happen.

We still think OpenCL will win eventually on consumer-markets (desktop and mobile) because of support for more devices, but CUDA will stay a big player in professional and scientific markets because of the legacy software they are currently building up and the more friendly development-support. We hope they will both exist and help each other push forward, just like OpenGL vs DirectX, nVidia vs ATI, Europe vs the USA vs Asia, etc. Time will tell what features will eventually end up in each technology.

Update August 2012: due to higher demand StreamHPC is explicitly offering CUDA to OpenCL porting.

Does GPGPU have a bright future?

This post is aimed at programmers. The main question is: “should I invest in learning CUDA/OpenCL?”

Using the video-processor for parallel processing has actually been possible since the beginning of 2006; you just had to know how to use the OpenGL Shader Language. Not long after that (end of 2006) CUDA was introduced. A lot has happened since, which resulted in the introduction of OpenCL in the fall of 2008. But the acceptance of OpenCL is actually pretty low. Many companies which do use it want to keep it as their own advantage and don’t tell the competition they just saved hundreds of thousands of Euros/Dollars because they could replace their compute-cluster with a single computer which cost them €10 000,- and a rewrite of the calculation-core of their software. Has it become a secret weapon?

This year a lot of effort will be put into integrating OpenCL within the existing programming languages (without all the thousands of tweak-options visible). Think about wizards around pre-built kernels and libraries. Next year everything will be around kernel-development (kernels are the programs which do the actual calculations on the graphics processor). The year after that, the peak is over and nobody knows it is built into their OS or programming-language. It’s just like how current programmers use security-protocols, but don’t know what they actually are.

If I want to slide to the next page on a modern mobile phone, I just make a call to a slide-function. A lot is happening when the function is called, such as building up the next page in a separate part of memory, calling the GPU-functions to show the slide, and possibly unloading the previous page. The same goes for OpenCL: I want to calculate an FFT with a specified precision and I don’t want to care on which device the calculation is done. The advantage of building blocks (like LEGO) is that we keep the focus of development on the end-target, while we can tweak it later (if the customer has paid for this extra time). What’s a bright future if nobody knows it?
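
Such a building block could look roughly like the sketch below. It is only an illustration of the idea: the `BACKENDS` registry and the `fft` wrapper are hypothetical, the only back-end here is a naive CPU transform rather than a real OpenCL or CUDA one, and rounding to a number of decimals is a simple stand-in for “specified precision”.

```python
import cmath


def dft_cpu(samples):
    """Naive O(n^2) discrete Fourier transform on the CPU."""
    n = len(samples)
    return [
        sum(samples[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


# Available back-ends in order of preference. A real library would register
# an OpenCL or CUDA implementation ahead of this CPU fallback.
BACKENDS = [("cpu", dft_cpu)]


def fft(samples, digits=6):
    """Transform `samples` on the first available back-end and round the
    result to `digits` decimals. The caller never names a device."""
    _name, transform = BACKENDS[0]
    return [
        complex(round(z.real, digits), round(z.imag, digits))
        for z in transform(samples)
    ]


print(fft([1, 0, 0, 0]))
```

The calling code only states what it wants; which device does the work is decided inside the block and can be tweaked later without touching the callers.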

Has it become a secret weapon?

Yes and no. Companies want to brag about their achievements, but don’t want the competitors to go the same way and don’t want their customers to demand lower prices. AMD and NVidia are pushing OpenCL/CUDA, so it won’t stop growing in the market, but actually this pushing is the biggest source of growth in the market. NVidia does a good job with marketing their CUDA-platform.

What’s a bright future if nobody knows it?

Everything that has market-wide acceptance has a bright future. It might be replaced by a successor, but acceptance is the key. With acceptance there will always be a demand for (specialised) kernels to be integrated in building blocks.

We also have the new processors with 32+ cores, which actually need to be used; you know the problem with dual-core “support”.

Also the mobile market is growing rapidly. Once that is opened for OpenCL, there will be a huge growth in demand for accelerated software.

My advice: if high performance is very important for your current or future tasks, invest in learning how to write kernels (CUDA or OpenCL, whatever your favourite is). Use wrapper-libraries which make it easy for you, because by the time you’ve learned how to use the OpenCL-calls, they will be completely integrated in your favourite programming language.

All the members of the OpenCL working group 2010

(If you’re searching for companies who offer OpenCL-products and services, please visit OpenCL:Pro)

You have probably heard that AMD is on the OpenCL working group of Khronos; but there are many more members and they possibly all have plans to use it. Here is an overview, so you can draw your own conclusions about the future that lies ahead. Is your company on “the list”?

We’re especially interested in the lesser-known companies, so most information is about the companies you and we possibly have not heard of before. We’ve made assumptions about what the companies use OpenCL for, so we need your feedback if you think we’re wrong! Most of these companies have not openly written about their (future) accelerated products, so we had to make those guesses.

Disclaimer: All brand and product names are or may be trademarks of, and are used to identify products or services of, their respective owners.

Last updated 6-Oct-2010.

GPU Manufacturers

GPUs being the first products targeted by OpenCL, we blast away with a list of GPU-manufacturers. You might see some unknown companies and now know which companies missed the train; it is pretty clear why GPU-manufacturers have an interest in OpenCL.
We skip the companies who have a GPU-stack built upon ARM-technology and only focus on pure GPU-manufacturers in this category.

AMD

We’ve already discussed the biggest fan of OpenCL several times. While having better GPU-cards than NVIDIA (arguably, depending on the quarter of the year), they put their bets completely on OpenCL. They even get credits like “AMD’s OpenCL” when compared with NVIDIA’s CUDA.

At the end of 2010 or the beginning of 2011 they will ship their Fusion-product, which has a CPU and GPU on one chip. The first Fusion-chips will not have a high-end GPU because of heat problems, or so PC-store employees have been told.

NVIDIA

AMD’s biggest competitor, with the very well marketed similar product CUDA. Currently they have the most specialised products on the market for servers. While they put more energy into their own technology CUDA, it must be said that they have adopted OpenCL more than any other hardware vendor.

Intel

The biggest part of the CPU-market belongs to Intel, and guess who has the biggest GPU-market in hands? Correct: onboard-GPUs are Intel’s speciality, but their high-end GPU Larrabee might one day see the market. Just like AMD they have the technology (and products) to have an integrated CPU/GPU which will be very interesting for the upcoming OpenCL-market.

They are openly interested in OpenCL. Here is a nice interview which explains how a CPU-designer looks at GPU-designs.

Vivante

Vivante manufactures GPU-chips. They claim their OpenGL ES 2.0-compliant silicon footprint is the smallest on the market. There is a lot of talk about OpenGL Shader Language (OpenCL’s grandpa), for which their products are very well suited. Quote: “The recent trend in graphics hardware has been to replace fixed functionality with programmability in areas that have grown exceedingly complex, such as vertex processing and fragment processing. The OpenGL® Shading Language was designed to allow application programmers to express the processing that occurs at those programmable points of the OpenGL pipeline. Independently compilable units written in this language are called shaders. A program is a set of shaders that are compiled and linked together.”

Takumi

Japanese corporation Takumi manufactures the GSHARK, a 2D/3D hardware accelerator. The focus is on shaders, like Vivante.

Imagination Technologies (ImTech)

From their homepage: >>POWERVR enables a powerful and flexible solution for all forms of multimedia processing, including 3D/2D/vector graphics and general purpose processing (GP-GPU) including image processing.

POWERVR’s unique tile-based, deferred rendering/shading architecture allows a very small area of a die to deliver higher performance and image quality at lower power consumption than all competing technologies. All major APIs are supported including OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX9/10.1 and OpenCL.<<

Currently all ARM-based OpenCL-capable devices have POWERVR-technology.

Toshiba

Like other huge Japanese everything-factories, you don’t know what else they make. Besides rice cookers they also make multimedia chips.

S3

Once they were big in the consumer-market of graphics cards, but S3 still exists as a more business-oriented manufacturer of graphics products.

CPU Manufacturers

We miss the Power Architecture, but IBM and Freescale are members of this group.

Intel

While AMD tries to make OpenCL available for the CPU, we have not heard of a similar product from Intel yet. They see a future for multi-core CPUs, as seen in these slides.

ARM

Best known for its same-named low-power processor, which is not supported by MS Windows. You can read below how many companies have a license on their technology. Together with POWERVR-technology they power all the embedded OpenCL devices of the coming year.

IBM

Currently they are best known for their Cell-processor (co-developed with Toshiba and Sony) and have a license to build PowerArchitecture-CPUs. The Cell has full OpenCL-support as the first non-GPU. Older types of PS3s (without the latest firmware) and IBM’s servers can use the power of OpenCL. At the end of June 2010 Khronos listed their “Development Kit for Linux” for Power VMX and PowerXCell8i processors as conformant.

Freescale

Once a Motorola-division, they make lots of different CPUs. Besides ARM- and PowerArchitecture-based ones, they also have their own ‘Coldfire’. We cannot say for which architecture they are interested in OpenCL, but we really would like to hear something from them since they can open many markets for OpenCL.

Systems on a Chip (SoC)

While it is cool to have a GPU-card in your pc, more and more the Graphics-functionality is integrated onto a CPU. Especially in the mobile/embedded/gadget-market you’ll find such System-on-a-Chip solutions, which are actually all ARM- or PowerArchitecture based.

3DLABS (ZiiLabs)

Creators of embedded hardware with a focus on handhelds. They have been partners of Khronos for a long time, having built the first merchant OpenGL GPU, the GLINT 300SX. They have just released a multimedia-processor, which is an ARM-processor with pretty interesting graphic capabilities.

They have an “early access program for OpenCL” for their ZMS product line.

Movidia

On their Technology overview-page they imply they have flexible accelerators in their designs, which *could* in the future be controlled by OpenCL-kernels. They manufacture mobile GPUs-plus-loads-of-extras which are quite impressive.

Texas Instruments

Besides ARM-based processors they also have DSPs. We are watching to see which product they have OpenCL in mind for.

Qualcomm

They might be most famous for their ARM-based Snapdragon-chipset. They have many more products, but we think they will start with Snapdragon before building OpenCL into other products.

Apple

The Apple A4 powers their new product, the iPad. It becomes more and more clear that Apple has really learned that you cannot rely on one supplier, after waiting for IBM’s G6. With OpenCL Apple can now make software that works on ARM and on all kinds of GPUs and CPUs.

Samsung

They make anything that is fed by batteries, so for that reason they should be in the “other” category: mobile phones, mp3-players, photo-cameras, camcorders, laptops, TVs, DVD-players and Bluray-players. All products where OpenCL can be put to use.

A good reason to make their own semi-conductors, ARM-based.

At the beginning of June 2010 they launched their own Linux-based OS for mobiles: Bada.

Broadcom

Manufactures networking and communications ICs for data, voice, and video applications. They could use OpenCL for their mobile multimedia processors.

Seaweed

Acquired by Presagis in September. We cannot be sure they will continue the OpenCL-business of Seaweed, but at least GPGPU is mentioned once.

Presagis is “the worldwide leader in embedded graphics solutions for mission-critical display applications. The company has provided human-machine interface (HMI) graphical modeling tools, drivers and devices for embedded systems for over 20 years. Presagis pioneered both the prototyping of display graphics and automatic code generation for embedded systems in the 1990s. Since then, code generated by its flagship HMI modeling products has been deployed to hundreds of aircraft worldwide and its software has been certified on over 30 major aircraft programs worldwide. Presagis is your trusted partner for reliable, high-performance embedded graphics products and services.”

ST Microelectronics

ST has many products: “Singapore Technologies Electronics is a leader in ICT. It has main businesses in Enterprise, Satellite Communications and Interactive Digital Media. It is divided into several Strategic Business Units consisting of Info-Comms, Info-Software, Training and Simulation, Electro-Optics, Large Scale Group, Satcom & Sensor Systems.”

We think they’ve shown interest in OpenCL for use with their imaging processors. Together with Ericsson they have a joint-venture in the mobile market, ST-Ericsson.

Handheld Manufacturers

While most companies will find it hard to make OpenCL-business in the consumer-market, consumer-products of other companies make sales a little bit warmer.

Apple

At least the iPad and iPhone have the hardware-capabilities to run OpenCL. It is expected that it will become available in the next major release of the iPhone-OS, iOS 4. We’re waiting for more news.

Nokia

The largest manufacturer of mobile phones, from Finland, has a lot of technology. Besides smartphones and possibly a netbook (in cooperation with Intel), they also have Symbian and the QT-library. QT has had support for OpenCL for a while now. We think the support of OpenCL in programming languages (in a more high-level way) is very important. See these slides to read some insights of the company.

Motorola

They have consumer products like mobile phones and business products like networking. It is not clear what they are going to use OpenCL for, since they mostly use other companies’ technologies.

Super-computers

While OpenCL can revive old computers once they are upgraded with a new GPU, imagine what it can do with super-computers.

IBM

IBM builds super-computers based on different technologies. With OpenCL-support for their Power VMX and PowerXCell8i processors, it is already possible to use OpenCL with IBM-hardware.

Fujitsu

They have many products, but they also make super-computers which use GPGPU.

Los Alamos National Laboratory

They build super-computers and really can use the extra power.

A job-post talks about heterogeneous architectures and OpenCL.

Petapath

Petapath, founded in 2008, focuses on delivering innovative hardware and software solutions into the high performance computing (HPC) and embedded markets. As can be seen from their homepage they build grids.

NVIDIA

As a newcomer in the super-computer business, they do very well having helped to build the #2 HPC. Many clusters are upgraded with their streaming-processors.

Other Hardware

We don’t know what they are actually doing with the technology, purely because they are too big to make assumptions.

GE

US-based electronics-giant General Electric builds everything there is that is fed by electricity, and now also GPGPU-powered solutions, as can be found on their GPGPU-page. They probably switched to CUDA.

ST-Ericsson

Ericsson and ST have a joint-venture in the mobile market, ST-Ericsson. Ericsson is big in (mobile) networking. It also builds mobile phones with Sony. It is unclear what the joint-venture wants to do with the technology, but it must be mobile.

Software Developers

While OpenCL is very close to hardware, we have to talk software too. Did anybody say there is a strict line between hardware and software?

Graphic Remedy

Builders of debugging software. You will hear more from us about this company soon. See something about debugging in this presentation.

RapidMind

RapidMind provided a software product that aimed to make it simpler for software developers to target multi-core processors and accelerators (GPUs). It was acquired by Intel in August 2009.

HI

Japanese corporation HI has a product MascotCapsule, which is a real-time 3D rendering engine (native library) that runs on embedded devices. In its specification we see the names of other companies from this list, with the exception of SMedia. If you’re not familiar with mobile GPUs, here you have a list.

This is another big hint that OpenCL will have a big future on mobile devices.

MascotCapsule V4 product specification

Operating environment
  CPU:
    ARM: ARM9 or above
    Freescale: i.MX Series
    Marvell: XScale
    Qualcomm: MSM6280/6550/7200/7500 etc.
    Renesas Technology: SH-Mobile etc.
    Texas Instruments: OMAP
    32-bit 150 MHz or above is recommended
    (Capable of running without a floating-point hardware)
  Code size: Approx. 200 KB
  Engine work area: 2 MB or more is recommended, including data load area
    Note: The actual required work area varies depending on the content
  3D hardware accelerator:
    ATI: Imageon
    Imagination Technologies: PowerVR MBX/MBX Lite/SGX
    NVIDIA: GoForce
    SMedia: Glamo
    TAKUMI: GSHARK
    Toshiba: T4G/T5G
    Other OpenGL ES compliant 3D accelerators
  OS/platforms: BREW, iPhone, iPod touch, ITRON, Java, Linux, Symbian OS, Windows CE, Windows Mobile
3D authoring tools:
  3ds Max 9.0/2008/2009/2010
  Maya 8.5/2008/2009/2010
  LightWave3D 7.5 or later
  SOFTIMAGE|XSI 5.x/6.x/7.0

Codeplay

They are most famous for their compilers for the Playstation. They also make code-analysis software.

QNX

From their homepage: “Middleware, development tools, realtime operating system software and services for superior embedded design”. Their real-time OS is used in all kinds of embedded products and they might want to see ways to support specialised low-power chips.

RIM acquired QNX in April 2010.

Fixstars

Newcomer in the list of 2010. Famous for their PS3-Linux and for having written one of the few books on OpenCL. They also have FOXC, the Fixstars OpenCL Cross Compiler.

Kestrel Institute

http://www.kestrel.edu/ does not show anything GPGPU. We’ll probably hear from them when the next version of their Specware-product is finished.

Game Designers

Physics-calculations and AI are too demanding to do on a CPU. The game-industry keeps pushing the GPU-industry, but now in a different way than in the 90’s.

Electronic Arts

This game-studio builds loads and loads of games with impressive AI. See these slides to see what EA thinks GPGPU can do.

Activision Blizzard

Yes, they are one company now, so together they are famous for the best-selling hit “World of Warcraft”. Currently not much is known about what they use OpenCL for, but probably the same as EA.

Thank you for your interest in this article

If you know more about OpenCL at these companies or job-posts, please let us know via comment or via e-mail.

We’ve made some assumptions about what these companies use OpenCL for – we need your feedback!

nVidia’s CUDA vs OpenCL Marketing

Please read this article about Microsoft and OpenCL before reading on. The requested benchmarking has been done by the people of Unigine, who have some results on the differences between the three big GPGPU-APIs: part I, part II and part III with some typical results.
The following read is not about the technical differences between the 3 APIs, but more about the reason why alternate APIs are still being maintained while OpenCL does the trick. Please comment if you think OpenCL doesn’t do the trick.

As was described in the article, it is important for big companies (such as Microsoft and nVidia) to defend their market-share. This protection is not provided through an open standard like OpenCL. As it went with OpenGL – which was sort of replaced by DirectX to gain market-share for Windows – now nVidia does the same with CUDA. First we will sum up the many ways nVidia markets Cuda and then we discuss the situation.

The way nVidia wanted to play the game, was soon very clear: it wanted to market CUDA to be seen as the better alternative for OpenCL. And it is doing a very good job.

The best way to get better acceptance is giving away free information, such as articles and courses. See Cuda’s university courses as an example. Sponsoring also helps a lot, so the first main-stream oriented books about GPGPU discussed Cuda in the first place and interesting conferences were always supported by Cuda’s owner. Furthermore, loads of money is put into an expensive webpage with the most complete information about GPGPU there is to find. nVidia does give a choice, by also having implemented OpenCL in its drivers; it just does not have big pages on how-to-learn-OpenCL.

AMD – having spent their money on buying ATI – could not put this much effort in the “war” and had to go for OpenCL. You have to know that AMD has faster graphics-cards for lower prices than nVidia at the moment; so based on that, they could become the winners on GPGPU (if that was the only thing to do). Intel saw the light too late and is even postponing their high-end GPU, the Larrabee. The only help for them is that Apple demands to have OpenCL on nVidia’s drivers – but for how long? Apple does not want strict dependency on nVidia, since it also has e.g. a mobile market. But what if all Apple-developers create their applications on CUDA?

Most developers – helped by the money and support of nVidia – see that there is just little difference between Cuda and OpenCL, and in case of a changing market they could translate their programs from one to the other. For now a demand to have a high-end videocard of nVidia can be justified: it is a card which many people actually have or could easily buy within the budget of their current project. The difference between Cuda and OpenCL is comparable with that between C# and Java – the corporate tactics are also the same. Possibly nVidia will have better driver-support for Cuda than for OpenCL, and since Cuda does not work on AMD-cards, the conclusion will be that Cuda is faster. A situation could then arise in which AMD and Intel have to buy Cuda-patents, since OpenCL does not have the support.

We hope that OpenCL will stay the main-stream GPGPU-API, so the battle will be on hardware/drivers and support of higher-level programming-languages. We do really appreciate what nVidia already has done for the GPGPU-industry, but we hope they will solely embrace OpenCL for the sake of the long-term market-development.

What we left out of this discussion is Microsoft’s DirectCompute. It will be used by game-developers on the Windows-platform, who just need physics-calculations. When discussing the game-industry, we will tell more about Microsoft’s DirectX-extension.
Also come back and read our upcoming article about FPGAs + OpenCL, a combination from which we expect a lot.

OpenCL – the battle, part II

Reading Time: 3 minutes

Part II: the software-companies

It is very clear what’s at stake for the hardware-companies; we’ve also discussed the operating systems. But what should the software companies do? For companies which make e.g. encoding-software or databases it is very simple: support OpenCL or be years behind (which marketing can’t fix). For most other software there is a dependency on the programming language, since OpenCL is a very specialised way of programming which (most times) is too different from in-house knowledge and can therefore be too expensive.

This article is somewhat brief, since most of the material will be discussed further in later-to-be-released articles.

Video-encoding and rendering

Why did we easily get 60 frames per second in games, while rendering an image of our own house would take minutes? You had the feeling there was a gap between worlds which needed to be closed. OpenGL/DirectX did a lot (also see our next article about OpenGL, OpenCL and DirectX), but was not able to help us outside games. Apple did a lot for the desktop by integrating hardware-acceleration (later copied by Linux and Windows), but somehow GPU-processed results were not regarded as professional and were maybe seen more as an intermediate result (to see what it would look like).

Elemental Technologies was first with its H.264/AVC encoder; Nero and nVidia joined forces somewhat later. Both are based on CUDA and not OpenCL. Since rendering is close to what we already expected to come out of a GPU, we think this market will catch up very soon by introducing the same products, based on OpenCL.

A few months ago nVidia released its GPU-based ray-tracing engine, OptiX. On Youtube you can find the demo of VRay’s accelerated ray-tracing engine.

We expect a lot of news from the graphics-world, since they already know how to program with shaders. A lot of artists will love the free speed-up, but it’s not breaking news this would be possible.

Programming languages

OpenCL has official bindings for C and C++, and thereby Objective-C (used e.g. on the iPhone) has native support.

As we described last week, we think that Oracle/Sun is taking OpenCL more seriously now. Several wrappers exist for Java, but native support is missing; we would suggest writing the OpenCL-part in C or C++ when using Java, even if this breaks the beauty of the multi-platform-language.

It is very clear Microsoft had a better view by being an early adopter, with Visual Studio integration through profilers (created by AMD and nVidia). You already see higher-level implementations, such as the C#-toolkit OpenTK, which has included support that goes beyond the default dll-bindings. Also here, programming parts in native C would be best.

Python is famous for its endless wrappers around anything, so it was to be expected to find an OpenCL-binding. Python has always been the safe choice for scientific programming, because of its enthusiastic community.

A binding for OpenCL in languages like PHP and Perl is completely absent. Most times this is not a problem, as C-libraries can easily be called.
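
To give an idea of what "calling a C-library" from a scripting language looks like, here is a minimal sketch in Python using the standard `ctypes` module. It does not touch OpenCL at all: it only calls `strlen` from the C standard library, on the assumption that you are on a POSIX-like system where that library can be located. The helper names `load_c_library` and `c_strlen` are our own, made up for this illustration; an OpenCL wrapper would declare the OpenCL C functions in the same way.

```python
import ctypes
import ctypes.util


def load_c_library():
    """Load the C standard library (assumption: a POSIX-like system)."""
    path = ctypes.util.find_library("c")
    # If no path is found, CDLL(None) exposes the symbols already
    # loaded into the running process, which include the C library.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libc = load_c_library()

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: bytes) -> int:
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(text)


if __name__ == "__main__":
    print(c_strlen(b"OpenCL"))
```

The only real work is declaring the argument and return types; after that the C function behaves like any other function in the scripting language.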

RapidMind had a product which provided higher-level programming on the GPU, but after its acquisition by Intel, we don’t see the product any more. So we can conclude we just have to wait for native support in other languages than C, C++ and Objective-C, to have better support.

Databases

We will cover databases later, when projects are more mature. In short, it is currently being investigated how GPUs can do what SUN’s UltraSparcs already did. Since the memory-bandwidth is only great when using the onboard-memory, this is not as promising as it looks. Index-searches can be sped up, but these are not the real bottle-neck in database-performance. We think it is very important to invest in OpenCL-research in this competing market.

Operating Systems

Apple has had good GPU-support since OSX and therefore a good understanding of graphics-cards. Apple started the OpenCL project and has already updated several core libraries with OpenCL.

Microsoft has built DirectCompute in DirectX 11. This is OpenCL-technology put in a MS-jacket, as we’ve seen the company do many times before. Coming up is an article which discusses the differences.

Linux (Desktop and HPC) does not have great support for OpenCL, but it works well enough to have most large OpenCL/CUDA-upgraded clusters to its name. Due to the flexibility of the OS and the strong competition between nVidia and AMD, a lot of research is done. Nevertheless there are no core-libraries in Linux which support OpenCL. We expect e.g. visualisation-libraries to support OpenCL this year.

Mathematical software

Matlab, Octave, Mathematica, R and Maple will all gain a big advantage from using the GPU. Matlab has the most support by external libraries: CUDA, Jacket, gpuMat, etc. A CUDA-version of Mathematica will be released soon. R is still in discussion, Octave has a few partial/abandoned implementations of some libraries; since there is a lot of money to make by selling these products we can only expect full open-source implementations. Maple refers to its external call routines, so we still have to wait a while until we can have GPU-support.

Conclusion

This short overview gives an idea of where to expect to find OpenCL-powered solutions. When we find more markets in the coming weeks, we’ll update this post.

Clear winners cannot be pinpointed, since the door has just opened. Maybe Nero, since it will now sell more of its encoding-products to owners of nVidia GPUs.

SUN jumping on the OpenCL-train?

Reading Time: 2 minutes

Edit (27 May 2010): until now Oracle/SUN has not shown anything that would validate this rumour, and the job-posting is not there anymore. Follow us on Twitter to be the first to know if Oracle/SUN will have better support for GPGPU for Solaris and/or Java and/or its hardware.

Job Description: The Power to change your world begins with your work at Sun!
This is a software staff engineering position requiring the ability to design, test, implement and maintain innovative and advanced graphics software. The person in this role is expected to identify areas for improvement and modification of Sun’s platform products and contribute to Sun’s overall product strategy. This person will work closely with others within the team and, as required, across teams to accomplish project objectives. May assume a leadership role in projects, including such activities as leading projects, participator in product planning and technology evaluation and related activities. May use technical leadership and influence to negotiate product design features or applications, both internally, and with open source groups as needed.
Requirements:
* Excellent problem solving, critical thinking, and communication skills
* Excellent knowledge of the C/C++ and Java programming languages
* Thorough working knowledge of 3D graphics, GPU architecture, and 3D APIs, such as OpenGL & Direct3D
* Thorough working knowledge of shader-level languages such as GLSL, HLSL, and/or Cg
* Experience designing cross-platform, public APIs for developers (Windows/MacOS/Linux)
* Experience with multi-threaded programming and debugging techniques
* Experience with operating systems level engineering
* Experience with performance profiling, analysis, and optimization
Education and Experience: Univ degree in computer science or engineering plus 5 years direct experience

In other words, a specialist in everything graphics-cards and Java, in a completely new area. Since OpenGL is an already known area that does not need such a specialist, there is a very good chance the position will target OpenCL. A good choice.

Sun has every reason to jump on the train with Java, since Microsoft is already integrating loads of OpenCL-tools into its Visual Studio product (created by AMD and nVidia). Java still has more than 3 times the market share of C#, but with this late jump the gap will narrow in favour of Microsoft. Remember C# can easily call C-functions (which is the language OpenCL is written in); Java has a far more difficult task when it comes to calling C-functions without hazards, which is sort of implemented here and here. If OpenCL is not supported by Java, then C, C++ and C# will punch a hole in Java’s share.

Besides Java, Sun’s super-multi-threaded Sparc-servers will also be in trouble, since the graphics-cards of nVidia and AMD are now serious competitors. There is no official support of Sparc-processors for OpenCL, while AMD has included X86-support and IBM PowerPC-support (also working on Cell) a few months ago.

Then we have the databases Oracle and MySQL; Oracle depends on Java a lot. While we see experiments speeding up competitor PostgreSQL with GPU-power, Oracle might become the slow turtle in a GPU-ruled database-world. MySQL has the same development-speed and also “bleeding edge” releases, but Oracle might slow down its official support. Expect Microsoft to have SQL-server fully loaded in its next major release.

If Oracle/Sun jumped on the train today, expect no OpenCL-products from Oracle/Sun before Q2-2011.

OpenCL – the battle, part I

Reading Time: 4 minutes

Part I: the Hardware-companies and Operating Systems

(Part II will be about programming languages and software-companies, part III about the gaming-industry)

OpenCL is the new, but already de-facto standard of stream-computing; but how it got there so fast is somewhat strange. A few years ago there were many companies and research-groups seeing the power of using the GPU, such as:

And the fight is really not over, since we are talking about a big shift in the super-computing industry. Just think of IBM BlueGene, which will lose lots of market to nVidia and AMD. Or Intel, who hasn’t acquired a GPU-creator as AMD did. Who would have expected the market to change this rigorously? If we’re honest, we could have seen it coming (when looking at the turbulence around PhysX and Havok), but “normally” such new techniques would be introduced slowly.

The fight is about market-shares. For operating-systems, it matters because the user wants to have their movies encoded in 20 minutes just like their neighbour. For HPC, it matters because clusters can be upgraded for a far lower price than was possible the old-fashioned way; here the fight is mostly between Linux HPC and Windows HPC (which still has a very small market-share), but it also involves database-engines which rely on high-performance hardware/software.
The most to gain is in the processor-market. The extremely large consumer-market has been declining since 2004, since most users do not need more than a netbook and have bought a separate gaming-computer for the more demanding games. We don’t only see Intel and AMD anymore, but also IBM’s powerful Cell- and Power-processors, very power-efficient ARM-processors, etc. Now that OpenCL could make it more interesting to buy an average processor and a good graphics-card, Intel (and AMD) have no choice but to take up the battle with nVidia.

Background: Why Apple made OpenCL

Short answer: pure frustration. All those different implementations would either get a share or fight for being named the standard; Apple wanted to bet on the right horse and therefore took the lead in creating an open standard. Money would be made by updating software and selling more hardware. For that reason Apple’s close partners Intel and nVidia were easily motivated to help develop the standard. Currently Apple’s only (public) reasons for giving away such an expensive and specialised project are publicity and being ahead of the competition. Since it will not be a core-business of Apple, it does not need to stay in the lead, but which companies do?

Acquisitions, acquisitions, acquisitions

No time to lose for the big companies, so they must get the knowledge in-house as soon as possible. Below are some examples.

  • Microsoft: Interactive Supercomputing (22-Sept-2009): made Star-P, software which allowed users performing scientific, engineering or analytical computation on array or matrix-based data to use parallel architectures such as multi-core workstations, multi-processor systems, distributed memory clusters or utility/cloud-based environments. This is completely in the field of OpenCL, which Microsoft needs to strengthen its products, such as SQL-server and Windows HPC, as Apple already did.
  • nVidia: Ageia technologies (22-Feb-2008): made specialized PC-cards and software for calculating complicated physics in games. They made the first commercial product aiming at the masses (gamers). PhysX-code could be integrated in nVidia-drivers to be used with modern nVidia-GPUs.
  • AMD: ATI (24-July-2006): graphics chip specialist. Although the price was too high, it saved AMD from being bought out by Intel and even let it stay ahead (if it had kept running).
  • Intel: Havok (17-Sept-2007): builds games-tools, such as a physics-engine. After Ageia was captured, it was the only good company left to buy; AMD, which had spent all its money on ATI, was too late. Wind River (4-June-2009): a company providing embedded systems, development tools for embedded systems, middleware, and other types of software. Also read this interesting article. Cilk (31-July-2009): offers parallel extensions that are tightly tied into a compiler. RapidMind (19-Aug-2009): created a high-level language Sh, which had an OpenCL-backend. Intel has a lead in CPU-compilers, which it wants to broaden to multi-core- and GPU-compilers. Intel discovered it was in the group of “old fashioned compiler-builders” and had lots to learn in a short time.

If you know more acquisitions of interest, please let us know.

Winners

Apple, Intel and NVidia are the winners for 2009 and 2010. They currently have the most knowledge in house and have their marketing-machine running. NVidia has the best insight for new markets.

Microsoft and game-developers are second; they took the first train by joining the OpenCL-consortium and taking it very seriously. At the end of 2010 Microsoft will be at Apple’s level of expertise, so we will see then who has the best novelties. The game-developers, most of whom already have experience with physics-calculations, all got a second chance after misjudging the physics-engines. More on gaming in part III.

AMD is currently actually a big loser, since it does not seem to take it all seriously enough. But AMD can afford to be late, since OpenCL makes it easy to switch. We hope for the best for AMD, since it has the technology of both CPU and GPU, and many years of experience in both fields. More on the competition between marketing-monster nVidia and silent AMD will be discussed in a blog-item next week.

Another possible loser is Linux, which has lots to lose on the HPC-market; BSD-based Apple and Windows HPC can actually win market-share now. Expect the hardware-manufacturers Intel, AMD and nVidia to give most code to the community, but also expect contributions from universities, who do lots of research on the ever-flexible Linux. In the end it all depends on the OpenCL-adoption of (Linux-specific) programming-languages, which will be discussed in part II.

ARM is a member of the OpenCL-group but does not seem to invest in it; they seem to target another growing market: the low-power mobile devices. We will write on OpenCL and the mobile market later and why ARM currently can be relaxed about OpenCL.

We hope you have more insights in this new market; please contact us for more specific information and feel free to give your comments. Please stay tuned for part II and III, which will be released in the next few weeks.

Starting up

Reading Time: 1 minute

The first steps have been made to create this company. The target is clear: serve customers in the more specialised fields (more than the usual office-applications) where mathematics and metrics are of importance. Most of our customers will have expertise in a specialised field such as mathematics, physics, chemistry or economics, and need extra power to use/sell their modelling-software.

The hardest target in the current market is finding those customers; since we can provide super-computers and -techniques at a tenth of the price, it will be easier for us than for our colleagues in the more expensive segment. Nevertheless the field is and will stay a specialised one.

The target-date to be open for customers is March 2010. Hard work, but that is what we are made of.