Learning both OpenCL and CUDA

Posted by Vincent Hindriksen on 23 September 2010 with 2 Comments

Be sure to read Taking on OpenCL where I’ve put my latest insights – also for CUDA.

The two¹ “camps” OpenCL and CUDA both claim you should first learn their language first, after which the other would be easy to learn. I’m from the OpenCL-camp, so I say you should learn OpenCL first, but with a strong emphasis on hardware-architecture understanding. If I had chosen for CUDA I would have said the opposite, so in other words it does not matter which you do first. But psychology tells us that you probably like the first language more since there is where you discovered the magic; also most people do not like to learn a second language which is much alike and does not add a real difference. Most programmers just want to get the job done and both camps know that. Be aware of that.

NVIDIA is very good in marketing their products, AMD has – to say it modest – a lower budget for GPGPU-marketing. As a programmer you should be aware of this difference.

The possibilities of OpenCL are larger than those of CUDA, because of task-parallel programming and support for far more different architectures. At the other side CUDA is much more user-friendly and has a lot of convenience built-in.

Continue reading “Learning both OpenCL and CUDA” →

Khronos OpenCL presentation at SIGGRAPH 2010

Posted by Vincent Hindriksen on 9 September 2010 with 1 Comment

Here you find the videos uploaded by Khronos of their presentation about OpenCL. I added the time-line, so you can scroll to the more interesting parts easily. The presentation by Ofer Rosenberg of Intel and Cliff Woolly of NVIDIA were not uploaded (yet). Please note that for non-American people the speech of Affi Munchie is hard to hear; luckily his sheets explain most.

For the first two presentations the sheets can be downloaded from the Khronos-website. The time-line has the sheet-numbers mentioned.

0:00 [sheet 1] Presentation by the president of Khronos and chair of the session: Neill Trevett of NVIDIA.
0:06 [sheet 2] Welcome and a quick overview
1:12 [sheet 3] The prizes for the attendees (not us, online viewers)
1:40 [4] Overview of all members of Khronos. Khronos does not only take care of OpenCL but also the more famous OpenGL and projects like Collada.
2:26 [5] Processor Parallelism. CPUs are getting more parallel and GPUs more programmable. The overlapping area is called Heteregenous Computing and there is where OpenCL pops up.
3:10 [6] OpenCL timeline from version 1.0 to 1.1.
4:44 [7] OpenCL workinggroup with only 30 logos. He mentions missing logos like the one from Apple.
5:18 [8] The Visual Computing Ecosystem, where OpenCL interoperability with other standards are shown. The talk is not complete, so I don;t know if he talks about DirectX.

Continue reading “Khronos OpenCL presentation at SIGGRAPH 2010” →

To think about: an OpenCL Wizard

Posted by Vincent Hindriksen on 2 September 2010

A part of reaction on my earlier post was “VB Programmers span the whole range from hobbyists to scientists and professions. There is no reason that VB programmers should be left out of the loop in GPU programming. (…) CUDA and OpenCL are fundamentally lower level languages that do not have the same level of user-friendlyness that VB gives, and so they require a bit more care to program. (…)“. By selecting parts, I probably put the emphasis very wrong, but it brought me an idea: “how a Visual Basic Wizzard for OpenCL” would look like.

It is actually very interesting to think how to get GPGPU-power to a programmer who’s used to “and then and then and then”; how do you get parallel processing into the mind of an iterative thinker? Simplification is not easy!

By thinking about an OpenCL-wizard, you start to understand the bottlenecks of, and need for OpenCL initialisation.

Actually it could better be built into an existing framework like OpenMP (or something alike in Python, Java, etc) or the IDE could give hints that the standard-function could be replaced by a GPGPU-version. But we just want a wizard which generates “My First OpenCL Program”, which puts a smile on the face of programmers who use their mouse a lot.

Continue reading “To think about: an OpenCL Wizard” →

OpenCL 1.1 changes compared to 1.0

Posted by Vincent Hindriksen on 23 June 2010 with 1 Comment

This blog-entry is of interest for you, if you don’t want to read the whole new specifications [PDF] for OpenCL 1.1, but just want an overview of the most important changes differences with 1.0.

The news-release sums up the changes for 1.1 like this:

New datatypes including 3-component vectors and additional formats
Handling command from multiple hosts and processing buffers across multiple devices
Operations on regions of a buffer including read, write and copy of 1D, 2D and 3D rectangular regions
Enhanced use of events to drive and control command execution
Additional OpenCL C built-in functions such as integer clamp, shuffle and asynchronous strided copies
Improved OpenGL interoperability through efficient sharing of images and buffers by linking OpenCL and OpenGL events.

Furthermore we can read the update is completely backwards-compatible with version 1.0. The obvious macro‘s CL_VERSION_1_0 and CL_VERSION_1_1 are added to handle the versioning, but what’s more? This blog-post discusses most changes and with some subjective opinions added to it.

Additional Formats

3-component vectors

We only had 2-, 4-, 8 or 16-component vectors, but not 3 which actually was somewhat strange. The functions vload3, vload_half3, vloada_half3 and vstore3, vstore_half3, vstorea_half3 have been added to the family. Watch out, that for the half-functions the offset is calculated somewhat different compared to the even-sized vectors. In version 1.0 you could have chosen for a 4-component vector when using a lot of calculations, or a struct. If you see the new function vec_step below, it seems that it is not more memory-efficient to use this vector instead of a 4-component vector.

RGB with Padding

We have support CL_RGB, CL_RGBa (= RGB with an alpha-channel) and now also RGBx (with padding-channel). The same variants are there for CL_R and CL_RG. Good for graphics-programmers, or for easier reading of 32 bpp BMPs.

Cloud Computing / Multi-user Environments

The support for different hosts gives possibilities for cloud-computing. Side-note: cloud-computing is another word for multi-user environments, with some promotion for big data-centres. All API-functions except clSetKernelArg are thread-safe now; but only when kernels are not shared between hosts; see appendix A.2 for more information. The important part is that you think clearly about how to design your software if you now can assume others can take your resources now too. OpenCL already needed a lot of planning when claiming and releasing resources, so you’re probably already mastering it; now just check more often how much resources are available.

Region-specific Operations

Regions make it possible to split a big buffers to form a queue without having to keeping track of dimensions and offsets during operations, run from host. See clCreateSubBuffer for more information. A real convenience, but watch out when writing to overlapping buffers. The functions clEnqueueCopyBufferRec, clEnqueueReadBufferRect en clEnqueueWriteBufferRect helps synchronising commands to copy, read to or write from a region.

Enhanced Control-room

My favourite description of the host is “the control-room”, since you are not over there on the device but in Houston. The more control and information, the better. The new events are clSetMemObjectDestructorCallback, clCreateUserEvent, clSetUserEventStatus and clSetEventCallback. The first event-listener lets you know when resources a freed, so you can keep track. User-events can be put in the event_wait_list in various functions just like the built-in events; the function will start when all events are CL_COMPLETE. With clSetEventCallback immediate actions-on-events can be programmed; combined with the user-events the programmer got some powerful tools. See the example at clSetUserEventStatus for how to use the user-events.

OpenGL events and Direct3D support

The function clCreateEventFromGLsyncKHR links a CL-event to a GL-event by just giving the name of the OpenGL-event. See gl_sharing for more info.

OpenCL has now support for Direct3D 10, which is great! This might also be a good step to make DirectCompute lighter. See cl_khr_d3d10_sharing for more info. Welcome DirectX-developers! One favour: please be aware that DirectX works on Windows only, not on Apple OSX or iOS, (Embedded) Linux or Symbian. If you use clean calls, it will be more easy to port to other platforms.

Other New Kernel-functions

The following new functions were added to the kernel:

get_global_offset: returns the offset of the enqueued kernels.
minmag and maxmag: returns the argument with the minimum or maximum distance to zero, falls back to fmin and fmax if distance is equal or an argument is NaN. Example: maxmag(-5, 3) = -5, minmag(-3, 3) = -3.
clamp: returns boundary-values if the given number is not between the boundaries.
vec_step: returns the number of elements in a scalar or a vector. A scalar returns 1, a vector 2, 4, 8 or 16. If the size is 3, the function returns 4.
shuffle and shuffle2: shuffles one or two vectors given another vector with the indices of the new order. Indeed plain old permutations.
async_workgroup_strided_copy: buffers between global and local memory on the device. When used correctly, this can overcome some of the hassle when you need to work on global memory objects, but need more speed. Correct usage is described in the reference.

The functions min and max now also work component-wise with a vector as first argument and a scalar as second. Min({2, 4, 6, 8}, 5) will give {2, 4, 5, 5}.

Conclusion

While the many revisions of OpenCL 1.0 were really minor and not a lot attention was paid to them, 1.1 is a big step forward. If you see what has been done to multi-user environments, NVidia and AMD have a lot of work to do with their drivers.

You can read in revision 33 there has been some heated discussion and there was pressure on the decision:

>>Should this extension be KHR or EXT?
PROPOSED: KHR. If this extension is to be approved by Khronos then it should be KHR, otherwise EXT. Not all platforms can support this extension, but that is also true of OpenGL interop.
RESOLVED: KHR.<<

The part “not all platforms” is very politically written down, since exactly one platform supports this specific extension. I have seen too many of these pressured discussions and I hope Khronos is stronger than i.e. ISO and OpenCL will remain as open as OpenGL.

I’m very happy with the new version, since there is more control with loads of extra events, now multiple hosts are possible, and the forgotten 3-component vector was added. Now let me know in the comments what you think of the new version.

By the way, not all is new. Deprecated are clSetCommandQueueProperty and the __ROUNDING_MODE__ macro.

Difference between CUDA and OpenCL 2010

Posted by Vincent Hindriksen on 22 April 2010 with 11 Comments

THIS ARTICLE IS VERY OUTDATED AND NOW SIMPLY UNTRUE FOR CERTAIN PARTS! NEW ARTICLE COMING UP.

Most GPGPU-enthusiasts have heard of both OpenCL and CUDA. While there are more solutions, these have the most potential. Both techniques are very comparable like a BMW and a Mercedes, but there are some differences. Since the technologies will evolve, we’ll take a look at the differences again next year. We’ve discussed this difference in a with a focus on marketing earlier this year.

Disclaimer: we have a strong focus on OpenCL (but actually for reasons explained in this article).

Terminology

If you have seen kernels of OpenCL and CUDA, you see the biggest difference might be the prefix “cl_” or the prefix “cu_”, but there is also a difference in terminology.

Matt Harvey (developer of Cuda2OpenCL-translator Swan) has summed up the differences in a presentation “Experiences porting from CUDA to OpenCL” (PDF):

CUDA term	OpenCL term
GPU	Device
Multiprocessor	Compute Unit
Scalar core	Processing element
Global memory	Global memory
Shared (per-block) memory	Local memory
Local memory (automatic, or local)	Private memory
kernel	program
block	work-group
thread	work item

As far as I know, the kernel-program is also called a kernel in OpenCL. Personally I like Cuda’s terms “thread” and “per-block memory” more. It is very clear CUDA targets the GPU only, while in OpenCL it an be any device.

Edit 2011-01-15: In a talk by Sami Rosendahl the differences are also discussed.

Speed-comparison

We would like to present you a benchmark between OpenCL and CUDA with full comparison, but we don’t have enough hardware in-house to do a full benchmark. Below information is what we’ve found on the net and a little bit based on our own experience.

On NVidia hardware, OpenCL is up to 10% slower (see Matt Harvey’s presentation); this is mainly because OpenCL is implemented on top of CUDA-architecture (this shouldn’t be a reason, but to say NVidia has put more energy in CUDA is just a wild guess also). On ATI 4000-series OpenCL is just slow, but gives very comparable to NVidia if compared to the 5000-series. The specialised streaming processors NVidia’s Tesla and AMD’s FireStream really bite each other, while the Playstation 3 unbelievably still wins on some tasks.

The architecture AMD/ATI-hardware is very different from NVidia’s and that’s why a kernel written with a specific brand or GPU in mind just performs better than a version which is not optimised. So if you do a benchmark, it really depends on which kernels you use for it. To be more precise: any benchmark can be written in favour of a specific architecture. Fine-tuning the software to work a maximum speed in current and future(!) hardware for different kinds of datasets is (still) a specialised task for that reason. This is also one of the current problems of GPGPU, but kernel-optimisers will get better.

If you like pictures, Hugh Merz comes to the rescue, who compared CUDA-FFT against FFTW (“the fastest FFT in the West”). The page is offline now, but you it was clear that the data-transfer from and to the GPU is a huge bottleneck and Hugh Merz was rather sceptical about GPU-computing in 2007. He extended his benchmark with the PS3 and a Tesla-s1070 and now you see bigger differences. Since CPUs go multi-multi-core, you cannot tell how big this gap will be in the future; but you can tell the gap will be bigger and CPUs will more and more be programmed like GPUs (massively parallel).

What we learn from this is 1) that different devices will improve if the demands are more clear, and 2) that it will be all about specialisation, since different manufacturers will hear different demands. The latest GPUs from AMD works much better with OpenCL, the next might beat all others in a many or only specific areas in 2011 – who knows? IBM’s Cell-processor is expected to enter the ring outside the home-brew PS3 render-farms, but with what specialised product? NVidia wants to enter high in the HPC-world, and they might even win it. ARM is developing multiple-core CPUs, but will it support OpenCL for a better FLOP/Watt than competitors?

It’s all about the choices manufacturers make, which way CUDA en OpenCL will develop.

Homogeneous vs Heterogeneous

For us the most important reason to have chosen for OpenCL, even if CUDA is more mature. While CUDA only targets NVidia’s GPUs (homogeneous), OpenCL can target any digital device that has an input and an output (very heterogeneous). AMD/ATI and Intel are both on the path of making architectures that are heterogeneous; just like Systems-on-a-Chip (SoCs) based on an ARM-architecture. Watch for our upcoming article about ARM & SoCs.

While I was searching for more information about this difference, I came across a blog-item by RogueWave, which claims something different. I think they switched Intel’s architectures with NVidia’s or he knew things were going to change. In the near future could bring us an x86-chip from NVidia. This will change a lot in the field, so more about this later. They already have an ARM-chip in their Tegra mobile processor, so NVidia/CUDA still has some big bullets.

Missing language-features

Like Java and .NET are very comparable, developers from both side know very well that their favourite feature is missing at the other camp. Most time such a feature is an external library, just built in. Or is it taste? Or even a stack of soapboxes?

OpenCL has:

Task-parallel execution mode (to be used on CPUs) – not needed on NVidia’s GPUs.

CUDA has unique features too:

FFT library – so in OpenCL you need to have your own kernels for it.
~~Atomic operations – which make double-write threads easier to implement.~~
Hardware texture interpolation – OpenCL has to fall back to a larger kernel or OpenGL.
Templating – in openCL you have to create new kernels for every data-type.

In short CUDA certainly has made a lot of things just easier for the developer, but OpenCL has its potential in support for more than just GPUs. All differences are based on this difference in focus-area.

I’m pretty sure this list is not complete at all, and only explains the type of differences. So please come to the LinkedIn GPGPU Users Group to discuss this.

Last words

THIS ARTICLE IS VERY OUTDATED AND NOW SIMPLY UNTRUE FOR CERTAIN PARTS! NEW ARTICLE COMING UP.

As it is done with more shared standards, there is no win and no gain to promote it. If you promote it, a lot of companies thank you, but the Rreturn-on-Investments is lower than when you have your own standard. So OpenCL is just used-as-it-is-available, while CUDA is highly promoted; for that reason more people invest in partnerships with NVidia to use CUDA instead of non-profit organisation Khronos. And eventually CUDA-drivers can be ported to IBM’s Cell-processors or to ARM, since it is very comparable to OpenCL. It really depends on the profit NVidia will make with such deals, so who can tell what will happen.

We still think OpenCL will win eventually on consumer-markets (desktop and mobile) because of support for more devices, but CUDA will stay a big player in professional and scientific markets because of the legacy software they are currently building up and the more friendly development-support. We hope they will both exist and help each other push forward, just like OpenGL vs DirectX, nVidia vs ATI, Europe vs the USA vs Asia, etc. Time will tell what features will eventually end up in each technology.

Update August 2012: due to higher demand StreamHPC is explicitly offering CUDA to OpenCL porting.

Does GPGPU have a bright future?

Posted by Vincent Hindriksen on 15 April 2010 with 4 Comments

This post has a focus towards programmers. The main question “should I invest in learning CUDA/OpenCL?”

Using the video-processor for parallel processing is actually possible since beginning 2006; you just had to know how to use the OpenGL Shader Language. Not long after that (end 2006) CUDA was introduced. A lot has happened after that, which resulted in the introduction of OpenCL in fall 2008. But actually the acceptance of OpenCL is pretty low. Many companies which do use it, want to have it as their own advantage and don’t tell the competition they just saved hundreds of thousands of Euros/Dollars because they could replace their compute-cluster with a single computer which cost them €10 000,- and a rewrite of the calculation-core of their software. Has it become a secret weapon?

This year a lot of effort will be put to integrate OpenCL within the existing programming languages (without all the thousands of tweak-options visible). Think about wizards around pre-built kernels and libraries. Next year everything will be around kernel-development (kernels are the programs which do the actual calculations on the graphics processor). The year after that, the peak is over and nobody knows it is built in their OS or programming-language. It’s just like current programmers use security-protocols, but don’t know what it actually is.

If I want to slide to the next page on modern mobile phones, I just make a call to a slide-function. A lot is happening when the function is called, such building up the next page in a separate part of memory, calling the GPU-functions to show the slide, possibly unloading the previous page. The same is with OpenCL; I want to calculate a FFT with specified precision and I don’t want to care on which device the calculation is done. The advantage of building blocks (like LEGO) is that we keeps the focus of development on the end-target, while we can tweak it later (if the customer has paid for this extra time). What’s a bright future if nobody knows it?

Has it become a secret weapon?

Yes and no. Companies want to brass about their achievements, but don’t want the competitors to go the same way and don’t want their customers to demand lower prices. AMD and NVidia are pushing OpenCL/CUDA, so it won’t stop growing in the market, but actually this pushing is the biggest growth in the market. NVidia does a good job with marketing their CUDA-platform.

What’s a bright future if nobody knows it?

Everything that has market-wide acceptation has a bright future. It might be replaced by a successor, but acceptance is the key. With acceptance there always will be a demand for (specialised) kernels to be integrated in building blocks.

We also have the new processors with 32+ cores, which actually need to be used; you know the problem with dual-core “support”.

Also the mobile market is growing rapidly. Once that is opened for OpenCL, there will be a huge growth in demand for accelerated software.

My advise: if high performance is very important for your current or future tasks, invest in learning how to write kernels (CUDA or OpenCL, whatever your favourite is). Use wrapper-libraries which make it easy for you, because once you’ve learned how to use the OpenCL-calls they are completely integrated in your favourite programming language.

nVidia’s CUDA vs OpenCL Marketing

Posted by Vincent Hindriksen on 8 February 2010 with 6 Comments

Please read this article about Microsoft and OpenCL, before reading on. The requested benchmarking is done by the people of Unigine have some results on differences between the three big GPGPU-APIs: part I, part II and part III with some typical results.
The following read is not about the technical differences of the 3 APIs, but more about the reason behind why alternate APIs are being kept maintained while OpenCL does the trick. Please comment, if you think OpenCL doesn’t do the trick

As was described in the article, it is important for big companies (such as Microsoft and nVidia) to defend their market-share. This protection is not provided through an open standard like OpenCL. As it went with OpenGL – which was sort of replaced by DirectX to gain market-share for Windows – now nVidia does the same with CUDA. First we will sum up the many ways nVidia markets Cuda and then we discuss the situation.

The way nVidia wanted to play the game, was soon very clear: it wanted to market CUDA to be seen as the better alternative for OpenCL. And it is doing a very good job.

The best way to get better acceptance is giving away free information, such as articles and courses. See Cuda’s university courses as an example. Also sponsoring helps a lot, so the first main-stream oriented books about GPGPU discussed Cuda in the first place and interesting conferences were always supported by Cuda’s owner. Furthermore loads of money is put into an expensive webpage with very complete information about GPGPU there is to find. nVidia does give a choice, by also having implemented OpenCL in its drivers – it does just not have big pages on how-to-learn-OpenCL.

AMD – having spent their money on buying ATI, could not put this much effort in the “war” and had to go for OpenCL. You have to know, AMD has faster graphics-cards for lower prices than nVidia at the moment; so based on that, they could become the winners on GPGPU (if that was to only thing to do). Intel saw the light too late and is even postponing their High-end GPU, the Larrabee. The only help for them is that Apple demands to have OpenCL on nVidia’s drivers – but for how long? Apple does not want strict dependency on nVidia, since it i.e. also has a mobile market. But what if all Apple-developers create their applications on CUDA?

Most developers – helped by the money&support of nVidia – see that there is just little difference between Cuda and OpenCL and in case of a changing market they could translate their programs from one to the other. For now a demand to have a high-end videocard of nVidia can be rectified, a card which actually many people have or easily could buy within the budget of their current project. The difference between Cuda and OpenCL is comparable with C# and Java – the corporate tactics are also the same. Possibly nVidia will have better driver-support for Cuda than OpenCL and since Cuda does not work on AMD-cards, the conclusion is that Cuda is faster. There can then be a situation that AMD and Intel have to buy Cuda-patents, since OpenCL does not have the support.

We hope that OpenCL will stay the main-stream GPGPU-API, so the battle will be on hardware/drivers and support of higher-level programming-languages. We do really appreciate what nVidia already has done for the GPGPU-industry, but we hope they will solely embrace OpenCL for the sake of the long-term market-development.

What we left out of this discussion is Microsoft’s DirectCompute. It will be used by game-developers in the Windows-platform, which just need physics-calculations. When discussing the game-industry, we will tell more about Microsoft’s DirectX-extension.
Also come back and read our upcoming article about FPGAs + OpenCL, a combination from which we expect a lot.

OpenCL – the battle, part I

Posted by Vincent Hindriksen on 28 January 2010 with 2 Comments

Part I: the Hardware-companies and Operating Systems

(Part II will be about programming languages and software-companies, part III about the gaming-industry)

OpenCL is the new, but already de-facto standard of stream-computing; but how it got there so fast is somewhat strange. A few years ago there were many companies and research-groups seeing the power of using the GPU, such as:

Ageia technologies’ PhysX (acquired by nVidia)
Havok Physics (acquired by Intel)
Stanford University’s Brook
PeakStream (acquired by Google)
ATI’s Close-To-Metal (abandoned, ATI acquired by AMD)
nVidia’s CUDA
AMD’s Brook-extensions Brook+ (completely replaced by OpenCL)
IBM’s InfoSphere

And the fight is really not over, since we are talking about a big shift in the super-computing industry. Just think of IBM BlueGene, which will lose lots of market to nVidia and AMD. Or Intel, who hasn’t acquired a GPU-creator as AMD did. Who had expected the market to change this rigorous? If we’re honest, we could have seen it coming (when looking at the turbulence around PhysX and Havok), but “normally” this new techniques would be introduced slowly.

The fight is about market-shares. For operating-systems, the user wants to have their movies encoded in 20 minutes just like their neighbour. For HPC-computing, since clusters can be updated for a far lower price than was possible with the old-fashioned way; here it is mostly between Linux HPC and windows HPC (which still has a very small market-share), but also database-engines which rely on high-performance hardware/software.
The most to gain is in the processor-market. The extremely large consumer-market is declining since 2004, since most users do not need more than a netbook and have bought a separate gaming-computer for the more demanding games. We don’t only see Intel and AMD anymore, but IBM’s powerful Cell- en Power-processors, very power-efficient ARM-processors, etc. Now OpenCL could make it more interesting to buy an average processor and a good graphics-card, Intel (and AMD) have no choice then to take the battle with nVidia.

Background: Why Apple made OpenCL

Short answer: pure frustration. All those different implementations would or get a share or fight for being named the standard; Apple wanted to bet on the right horse and therefore took the lead in creating an open standard. Money would be made by updating software and selling more hardware. For that reason Apple’s close partners Intel and nVidia were easily motivated to help developing the standard. Currently Apple’s only (public) reasons for giving away such an expensive and specialised project is publicity and to be ahead of the competition. Since it will not be a core-business of Apple, it does not need to stay in lead, but which companies do?

Acquisitions, acquisition, acquisitions

No time to lose for the big companies, so they must get the knowledge in-house as soon as possible. Below are some examples.

Microsoft: Interactive Supercomputing (22-Sept-2009): made Star-P, software which allowed users to perform scientific, engineering or analytical computation on array or matrix-based data to use parallel architectures such as multi-core workstations, multi-processor systems, distributed memory clusters or utility/cloud-based environments. This is completely in the field of OpenCL, which Microsoft needs to strengthen its products as Apple already did, such as SQL-server and Windows HPC.
nVidia: Ageia technologies (22-Febr-2008): made specialized PC-cards and software for calculating complicated physics in games. They made the first commercial product aiming at the masses (gamers). PhysX-code could by integrated in nVidia-drivers to be used with modern nVidia-GPUs.
AMD: ATI (24-juli-2006): graphics chip specialist. Although the price was too high, it saved AMD from being bought out by Intel and even stay ahead (if they had kept running).
Intel: Havok (17-Sept-2007): builds games-tools, such as a physics-engine. After Ageia was captured, the only good company out there to buy; AMD was too late, which spent all its money on ATI. Wind River (4-June-2009): a company providing embedded systems, development tools for embedded systems, middleware, and other types of software. Also read this interesting article. Cilk (31-July-2009): offers parallel extensions that are tightly tied into a compiler. RapidMind (19-Aug-2009): created a high-level language Sh, which had an OpenCL-backend. Intel has a lead in CPU-compilers, which it wants to broaden to multi-core- and GPU-compilers. Intel discovered it was in the group of “old fashioned compiler-builders” and had lots to learn in a short time.

If you know more acquisitions of interest, please let us know.

Winners

Apple, Intel and NVidia are the winners for 2009 and 2010. They have currently the most knowledge in house and have their marketing-machine running. NVidia has the best insight for new markets.

Microsoft and Game-developers are second; they took the first train by joining the OpenCL-consortium and taking it very serious. At the end of 2010 Microsoft will be at Apple’s level of expertise, so we will see then who has the best novelties. The game-developers, of which most already have experience with physics-calculations, all had a second chance when they had misjudged the Physics-engines. More on gaming in part III.

AMD is currently actually a big loser, since it does not seem to take it all seriously enough. But AMD can afford to be late, since OpenCL makes it easy to switch. We hope the best for AMD, since it has the technology of both CPU and GPU, and many years of experience in both fields. More on the competition between marketing-monster nVidia and silent AMD will be discussed in a blog-item, next week.

Another possible loser is Linux, which has lots to lose on HPC-market; OpenBSD-based Apple and Windows HPC can actually win market-share now. Expect most from hardware-manufacturers Intel, AMD and nVidia to give code to the community, but also from universities who do lots of research on the ever-flexible Linux. At the end it all depends on OpenCL-adaptation of (Linux-specific) programming-languages, which will be discussed in part II.

ARM is a member of the OpenCL-group but does not seem to invest in it; they seem to target another growing market: the low-power mobile devices. We will write on OpenCL and the mobile market later and why ARM currently can be relaxed about OpenCL.

We hope you have more insights in this new market; please contact us for more specific information and feel free to give your comments. Please stay tuned for part II and III, which will be released the next few weeks.

Tag: OpenCL