Separation of Compute and Transfer from the rest of the code.

What if trees had the roots, trunk and crown were mixed up? Would it still have the advantage over other plants?

In the beginning of 2012 I spoke with Patrick Viry, former CEO of Ateji – now out-of-business. We shared ideas on GPGPU, OpenCL and programming in general. While talking about the strengths of his product, he came with a remark which I found important and interesting: separation of transfer. This triggered me to think further – those were the times when you could not read on modern computing, but had to define it yourself.

Separation of focus-areas are known to increase effectiveness, but are said to be for experts only. I disagree completely – the big languages just don’t have good support for defining the separations of concerns.

For example, the concepts of loops is well-known to all programmers, but OpenCL and CUDA have broken with that. Instead of using huge loops, those languages describe what has to be done at one location in the data and what the data is to be processed. From what I see, this new type of loop is getting abandoned in higher level languages, while it is a good design pattern.

I would like to discuss separation of compute and transfer from the rest of the code, to show that this will improve the quality of code. Continue reading “Separation of Compute and Transfer from the rest of the code.”

Theoretical transfer speeds visualised

There are two overviews I use during my training, and I would like to share with you. Normally I write them on a whiteboard, but it has advantages having it in a digital form.

Transfer speeds per bus

The below image gives an idea of theoretical transfer speeds, so you know how a fast network (1GB of data in 10 seconds) compares to GPU-memory (1GB of data in 0.01 seconds). It does not show all the ins and outs, but just give an idea how things compare. For instance it does not show that many cores on a GPU need to work together to get that maximum transfer rate. Also I have not used very precise benchmark-methods to come to these views.

We zoom into the slower bus-speeds. So all the good stuff is at the left and all buses to avoid are on the right. What should be clear is that a read from or write to a SSD will make the software very slow if you use write-trough instead of write-back.

What is important to see that localisation of data makes a big difference. Take a look at the image and then try to follow with me. When using GPUs the following all can increase the speed on the same hardware: not using hard-disks in the computation-queue, avoiding transfers to and from the GPU and increasing the computations per byte of data. When an algorithm needs to do a lot of data-operations such as transposing a matrix, then it’s better to have a GPU that has high memory-access. When the number of operations is important, then clock-speed and cache-speed is most important.

Continue reading “Theoretical transfer speeds visualised”

Do your (X86) CPU and GPU support OpenCL?

Does your computer have OpenCL-capable hardware? Read on and find out if your computer is compatible…

If you want to know what other non-PC hardware (phones, tablets, FPGAs, DSPs, etc) is running OpenCL, see the OpenCL SDK page.

For people who only want to run OpenCL-software and have recent hardware, just read this paragraph. If you have recent drivers for your GPU, you can be sure OpenCL is already supported and you can run OpenCL-capable software. NVidia has support for OpenCL 1.1 since drivers 280.13, so if you need OpenCL 1.1, then make sure you have this version or later. If you want to use Intel-processors and you don’t have an AMD GPU installed, you need to download the runtime of Intel OpenCL.

If you want to know if your X86 device is supported, you’ll find answers in this article.

Often it is not clear how OpenCL works on CPUs. If you have a 8 core processor with double threading, then it mostly is understood that 16 pipelines of instructions are possible. OpenCL takes care of this threading, but also uses parallelism provided by SSE and AVX extension. I talked more about this here and here. Meaning that an 8-core processor with AVX can compute 8 times 32 bytes (8*8 floats or 8*4 doubles) in parallel. You could see it as parallelism of parallelism. SSE is designed with multimedia-operations in mind, but has enough to be used with OpenCL. The minimum requirement for OpenCL-on-a-CPU is SSE 4.2, though.

A question I see often is what to do if you have more devices. There is no OpenCL-package for all the available devices, so you then need to install drivers for each device. CPU-drivers are often included in the GPU-drivers.

Read on to find out exactly which processors are supported.

Continue reading “Do your (X86) CPU and GPU support OpenCL?”

Differences from OpenCL 1.1 to 1.2

This article will be of interest if you don’t want to read the whole new specifications [PDF] for OpenCL 1.2.

As always, feedback will be much appreciated.

After many meetings with the many members of the OpenCL task force, a lot of ideas sprouted. And every 17 or 18 months a new version comes out of OpenCL to give form to all these ideas. You can see totally new ideas coming up and already brought outside in another product by a member. You can also see ideas not appearing at all as other members voted against them. The last category is very interesting and hopefully we’ll see a lot of forum-discussion soon what should be in the next version, as it is missing now.

With the release of 1.2 there was also announced that (at least) two task forces will be erected. One of them will target integration in high-level programming languages, which tells me that phase 1 of creating the standard is complete and we can expect to go for OpenCL 2.0. I will discuss these phases in a follow-up and what you as a user, programmer or customer, can expect… and how you can act on it.

Another big announcement was that Altera is starting to support OpenCL for a FPGA-product. In another article I will let you know everything there is to know. For now, let’s concentrate on the actual differences in this version software-wise, and what you can do with it. I have added links to the 1.1 and 1.2 man-pages, so you can look it up.

Continue reading “Differences from OpenCL 1.1 to 1.2”

Basic Concepts: online kernel compiling

Typos are a programmers worst nightmare, as they are bad for concentration. The code in your head is not the same as the code on the screen and therefore doesn’t have much to do with the actual problem solving. Code highlighting in the IDE helps, but better is to use the actual OpenCL compiler without running your whole software: an Online OpenCL Compiler. In short is just an OpenCL-program with a variable kernel as input, and thus uses the compilers of Intel, AMD, NVidia or whatever you have installed to try to compile the source. I have found two solutions, which both have to be built from source – so a C-compiler is needed.

  • CLCC. It needs the boost-libraries, cmake and make to build. Works on Windows, OSX and Linux (needs possibly some fixes, see below).
  • OnlineCLC. Needs waf to build. Seems to be Linux-only.

Continue reading “Basic Concepts: online kernel compiling”

Basic Concepts: OpenCL Convenience Methods for Vector Elements and Type Conversions

In the series Basic Concepts I try to give an alternative description to what is said everywhere else. This time my eye fell on alternative convenience methods in two cases which were introduced there to be nice to devs with i.e. C/C++ and/or graphics backgrounds. But I see it explained too often from the convenience functions and giving the “preferred” functions as a sort of bonus which works for the cases the old functions don’t get it done. Below is the other way around and I hope it gives better understanding. I assume you have read another definition, so you see it from another view not for the first time.



Continue reading “Basic Concepts: OpenCL Convenience Methods for Vector Elements and Type Conversions”

Installing both NVidia GTX and AMD Radeon on Linux for OpenCL

August 2012: article has been completely rewritten and updated. For driver-specific issues, please refer to this article.

Want to have both your GTX and Radeon working as OpenCL-devices under Linux? The bad news is that attempts to get Radeon as a compute device and the GTX as primary all failed. The good news is that the other way around works pretty easy (with some luck). You need to install both drivers and watch out that isn’t overwritten by NVidia’s driver as we won’t use that GPU for graphics – this is also the reason why it is impossible to use the second GPU for OpenGL.

Continue reading “Installing both NVidia GTX and AMD Radeon on Linux for OpenCL”

OpenCL Potentials: Investment-industry

This is the second in the series “OpenCL potentials“. I chose this industry because it is the finest example where you are always late, even if you were first. So it always must be faster if you want to make the better analyses. Before I started StreamHPC I worked for an investment-company, and one of the things I did was reverse engineering a few megabytes of code with the primary purpose of updating the documentation. I then made a proof-of-concept to show the data-processing could be accelerated with a factor 250-300 using Java-tricks only and no GPGPU. That was the moment I started to understand that real-time data-computation was certainly possible. Also that IO is the next bottle-neck after computional power. Though I am more interested in other types of research, I do have my background and therefore try to give an overview for this sector and why it matters.
Continue reading “OpenCL Potentials: Investment-industry”

AMD OpenCL coding competition

The AMD OpenCL coding competition seems to be Windows 7 64bit only. So if you are on another version of Windows, OSX or (like me) on Linux, you are left behind. Of course StreamHPC supports software that just works anywhere (seriously, how hard is that nowadays?), so here are the instructions how to enter the competition when you work with Eclipse CDT. The reason why it only works with 64-bit Windows I don’t really get (but I understood it was a hint).

I focused on Linux, so it might not work with Windows XP or OSX rightaway. With little hacking, I’m sure you can change the instructions to work with i.e. Xcode or any other IDE which can import C++-projects with makefiles. Let me know if it works for you and what you changed.

Continue reading “AMD OpenCL coding competition”

The current state of WebCL

Years ago Microsoft was in court as it claimed Internet Explorer could not be removed from Windows without breaking the system, while competitors claimed it could. Why was this so important? Because (as it seems) the browser would get more important than the OS and internet as important as electricity in the office and at home. I was therefore very happy to see the introduction of WebGL, the browser-plugin for OpenGL, as this would push web-interfaces as the default for user-interfaces. WebCL is a browser-plugin to run OpenCL-kernels. Meaning that more powerful hardware-devices are available to JavaScript. This post is work-in-progress as I try to find more resources! Seen stuff like this? Let me know.

Continue reading “The current state of WebCL”

Interest in OpenCL

Since more than a year I have this blog and I want to show the visitors around the world. Why? Then you know where OpenCL is popular and where not. I chose an unknown period, so you cannot really reverse engineer how many visitors I have – but the nice thing is that not much changes between a few days and a month. Unluckily Google Analytics is not really great for maps (Greenland as big as Africa, hard to compare US states to EU countries, cities disappear at world-views, etc), so I needed to do some quick image-editing to make it somewhat clearer.

At the world-view you see that the most interest comes from 3 sub-continents: Europe, North America and South-East Asia. Africa is the real absent continent here, except some Arab countries and South-Africa only some sporadic visits from the other countries. What surprises me is that the Arab countries are among my frequent visitors – this could be a language-issue, but I expected about the same number of visitors as from i.e. China. Latin America has mostly only interest from Brazil.

Continue reading “Interest in OpenCL”

Is OpenCL coming to Apple iOS?

Answer: No, or not yet. Apple tested Intel and AMD hardware for OSX, and not portable devices. Sorry for the false rumour; I’ll keep you posted.

Update: It seems that OpenCL is on iOS, but only available to system-libraries and not for apps (directly). That explains part of the responsiveness of the system.

At the thirteenth of August 2011 Apple askked the Khronosgroup to test 7 unknown devices if they are conformant with OpenCL 1.1. As Apple uses OpenCL-conformant hardware by AMD, NVidia and Intel in their desktops, the first conclusion is that they have been testing their iOS-devices. A quick look at the list of available iOS devices for iOS 5 capable devices gives the following potential candidates:

  • iPhone 3GS
  • iPhone 4
  • iPhone 5
  • iPad
  • iPad 2
  • iPod Touch 4th generation
  • Apple TV
If OpenCL comes to iOS soon (as it is already tested), iOS 5 would be the moment. iOS 5 processors are all capable of getting speed-up by using OpenCL, so it is no nonsense-feature. This could speed up many features among media-conversion, security-enhancements and data-manipulation of data-streams. Where now the cloud or the desktop has to be used, in the future it can be done on the device.

Continue reading “Is OpenCL coming to Apple iOS?”

Power to the Vector Processor

Reducing energy-consumption is “hot”

After reading this article “Nvidia is losing on the HPC front” by The Inquirer which mixes up the demand for low-power architectures with the other side of the market: the demand for high performance. It made me think that it is not that clear there are two markets using the same technology. Also Nvidia has proven it to be not true, since the super-computer “Nebuale” uses almost half the watts per flop as the #1. How come? I quote The Register from an article of one year old:

>>When you do the math, as far as Linpack is concerned, Jaguar takes just under 4 watts to deliver a megaflops at a cost of $114 per megaflops for the iron, while Nebulae consumes 2 watts per megaflops at a cost of $39 per megaflops for the system. And there is little doubt that the CUDA parallel computing environment is only going to get better over time and hence more of the theoretical performance of the GPU ends up doing real work. (Nvidia is not there yet. There is still too much overhead on the CPUs as they get hammered fielding memory requests for GPUs on some workloads.)<<

Nvidia is (and should) be very proud. But actually I’m already looking forward when hybrids get more common. They will really shake up the HPC-market (as The Register agrees) in lowering latency between GPU and CPU and lowering energy-consumption. But where we can find a bigger market is the mobile market.

Continue reading “Power to the Vector Processor”

Keep The Hardware Focus

The real Apu

If you buy a car, the first choice is not often the kind of fuel. You first select on the engine-properties, the looks, the interior, the brand and for sure the total cost of ownership. The costs can be a reason to choose for a certain type of fuel though. In the parallel computation world it is different. There the fuel (CUDA or OpenCL) is the first decision and then the hardware is chosen. I think this is wrong and therefore speak a lot about CUDA-vs-OpenCL, while I think NVidia is a good choice for a whole list of algorithms.

If we give advise during a consult, we want to give the best advice. In case of CUDA, that would be based on budget to go for Tesla or the latest GTX; in case of OpenCL we can give much better advice on hardware. But actually starting with the technique is the worst thing you can do: focus on the hardware and then pick the technique that suits best.

IMPORTANT. The following is for understanding some concepts and limits only! It is pure theoretically, so I don’t claim any real-world results. Also what not is taken into account is how well different processors handle control-instructions (for, while, if, case, etc), which has quite some influence on actual performance.

Continue reading “Keep The Hardware Focus”

Qt Creator OpenCL Syntax Highlighting

With highlighting for Gedit, I was happy to give you the convenience of a nice editor to work on OpenCL-files. But it seems that one of the most popular IDEs for C++-programming is Qt Creator. So you receive another free syntax highlighter. You need at least Qt Creator 2.1.0.

The people of Qt have written everything you need to know about their Syntax highlighting, which was enough help to create this file. You see that they use the system of Kate, so logically this file works with this editor too.

In this article there is all you need to know to use Qt Creator with OpenCL.


First download the file to your computer.

Under Windows and OSX you need to copy this file to the directory shareqtcreatorgeneric-highlighter in the Qt installation dir (i.e. c:Qtqtcreator-2.2.1shareqtcreatorgeneric-highlighter). Under Linux copy this file to ~/.kde/share/apps/katepart/syntax or to /usr/share/kde4/apps/katepart/syntax (all users). That’s all, have fun!

Install OpenCL on Debian, Ubuntu and Mint orderly

Libraries – can’t have enough

If you read different types of manuals how to compile OpenCL software on Linux, then you can get dizzy of all the LD-parameters. Also when installing the SDKs from AMD, Intel and NVIDIA, you get different locations for libraries, header-files, etc. Now GPGPU is old-fashioned and we go for heterogeneous programming, the chances get higher you will have more SDKs on your machine. Also if you want to keep it the way you have, reading this article gives you insight in what the design is after it all. Note that Intel’s drivers don’t give OpenCL support for their GPUs, but CPUs only.

As my mother said when I was young: “actually cleaning up is very simple”. I’m busy creating a PPA for this, but that will take some more time.

First the idea. For developers OpenCL consists of 5 parts:

  • GPUs-only: drivers with OpenCL-support
  • The OpenCL header-files
  • Vendor specific libraries (needed when using -lOpenCL)
  • -> a special driver
  • An installable client driver

Currently GPU-drivers are always OpenCL-capable, so you only need to secure 4 steps. These are discussed below.

Please note that in certain 64-bit distributions there is not lib64, but only ‘lib’ and ‘lib32’. If that is the case for you, you can use the commands that are mentioned with 32-bit.

Continue reading “Install OpenCL on Debian, Ubuntu and Mint orderly”

OpenCL vs CUDA Misconceptions

Translation available: Russian/Русский. (Let us know if you have translated this article too… And thank you!)

Last year I explained the main differences between CUDA and OpenCL. Now I want to get some old (and partly) false stories around CUDA-vs-OpenCL out of this world. While it has been claimed too often that one technique is just better, it should be also said that CUDA is better in some aspects, whereas OpenCL is better in others.

Why did I write this article? I think NVIDIA is visionary in both technology and marketing. But as I’ve written before, the potential market for dedicated graphics cards is shrinking and therefore forecasting the end of CUDA on desktop. Not having this discussion opens the door for closed standards and delaying innovation, which can happen on top of OpenCL. The sooner people & companies start choosing for a standard that gives equal competitive advantages, the more we can expect from the upcoming hardware.

Let’s stand by what we have learnt at school when gathering information sources, don’t put all your eggs in one basket! Gather as many sources and references as possible. Please also read articles which claim (and underpin!) why CUDA has a more promising future than OpenCL. If you can, post comments with links to articles you think others should read too. We appreciate contributions!

Also found that Google Insights agrees with what I constructed manually.

Continue reading “OpenCL vs CUDA Misconceptions”

Intel’s OpenCL SDK examples for GCC

Update august 2012: There is a new post for the latest Linux examples.

Note: these patches won’t work anymore! You can learn from the patches how to fix the latest SDK-code for GCC and Linux/OSX.

Code-examples are not bundled with the Linux OpenCL SDK 1.1 beta. Their focus is primarily Windows, so VisualStudio seems to be a logical target. I just prefer GCC/LLVM which you can get to work with all OSes. After some time trying to find the alternatives for MS-specific calls, I think I managed. Since ShallowWater uses DirectX and is quite extensive, I did not create a patch for that one – sorry for that.

I had a lot of troubles getting the BMP-export to work, because serialisation of the struct added an extra short. Feedback (such as a correct BMP-export of a file) is very welcome, since I the colours are correct. For the rest: most warnings are removed and it just works – tested with g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2 on 64 bit (llvm-g++-4.2 seems to work too, but not fully tested).


Continue reading “Intel’s OpenCL SDK examples for GCC”

WebCL – a next step

WebGL is already secured to be a success; only IE-users will not have the 3D-web without plugin. But once sites like Wikipedia starts to offer 3D-imagery of the human body and buildings (as we know in Google Earth’s KML-format), things can go really fast in favour of the WebGL-supported browsers. This is important, because the balance between the computers/smartphones and the servers (you know: internet) just got somewhat more connected. I was first somewhat critical, because I want the web to have content (text and images) and not be “an ultimate experience” – luckily it turned out to be good for the content. I’m looking forward to Wikipedia and hardware accelerated services like Streetview!

A possible next step would be WebCL. But is it technically possible? And what would the internet-landscape be to be ready for such thing? Khronos did mention to be working on such technique, according to this article. But not much attention was given to it. So I was happy to see a GSOC11 proposal WebCL-plugin for Firefox by Adrien Plagnol. They even have some code. But it was already finished for Firefox 4 (Windows and Linux), I learnt about a week ago.

WebCL by Nokia

It is very simple: it is a Javascript-version of the host-specific OpenCL code. Kernels are just kernels as we know them.

Nokia has put together a very nice WebCL homepage, which contains tutorials. And at lesson one we see how it looks like:

function detectCL() {
  // First check if the WebCL extension is installed at all

  if (window.WebCL == undefined) {
    alert("Unfortunately your system does not support WebCL. " +
          "Make sure that you have both the OpenCL driver " +
          "and the WebCL browser extension installed.");
    return false;

  // Get a list of available CL platforms, and another list of the
  // available devices on each platform. If there are no platforms,
  // or no available devices on any platform, then we can conclude
  // that WebCL is not available.

  try {
    var platforms = WebCL.getPlatformIDs();
    var devices = [];
    for (var i in platforms) {
      var plat = platforms[i];
      devices[i] = plat.getDeviceIDs(WebCL.CL_DEVICE_TYPE_ALL);
    alert("Excellent! Your system does support WebCL.");
  } catch (e) {
    alert("Unfortunately platform or device inquiry failed.");

As you can see this is very understandable code, if you know the basics of OpenCL and JavaScript. It is built for stability, so it seems to crash less easily than I expected.

I’ve written/tweeted a lot about OpenCL wrappers and how I think the OpenCL-ecosphere advances mainly by the growing up of the wrappers. Complaints about the far too long initialisation of OpenCL-software can easily be put in just a few lines of code. We now start from scratch again, but I will not be wonder-struck if there will be a jQuery-plugin released soon.


In the first place, think real-time encryption which can be adapted per user without the browser knowing. There are many more reasons all going back to the demand to have a browser-based computer (like Google is trying with its ChromeOS). All OS-APIs need to be available in a HTML5-like language and this is exactly that.

What are you still doing here? Install the Opencl-plugin for Firefox 4 and try Nokia’s online OpenCL-sandbox now! +1 for crashing it, +2 for sending in a bug-report.

The history of the PC from 2000 – 2012

After IBM-compatible clones took over from Apple, Atari and ZX Spectrum, we just got used to that a PC is an X86 with MS Windows and Office on it. Around a decade ago Apple fought back with OSX on which Windows 7 (launched in 2009) was the first real answer. Meanwhile Apple switched to Intel, since IBM was not fast enough with the development of the POWER-processor – a huge operation, which seemed a one-time-only step for Apple at the time. SemiAccurate now speaks of Intel being replaced by ARM on Apple’s laptops.

A few weeks ago I asked Computer Science students if they knew ARM. Not even 1% had heard of it, but lots more knew there was a Samsung-chip in their smartphone. So what’s going on without us knowing it?

I’ll try to describe the market for a few key-years and then try to put the big names in it. There is a lot going on between i.e. Nvidia, Samsung, Texas Instruments and Imagination Technologies in the ARM-market, but I’ll leave that out of the story. Also not mentioned are the game-consoles and servers, but they did have big influences on the home-PC market.

In the picture at the right you see an idea of how fast the markets would have grown from a 2006 perspective. (Click on it for the full report). You see that the explosive growth of smartphones was not expected; the other detail is that the cloud also was not foreseen here.

After reading you understand why Nvidia focuses so much on HPC and mobile.

After IBM-compatible clones took over from Apple, Atari and ZX Spectrum, we just got used to that a PC is an X86 with MS Windows and Office on it. Around a decade ago Apple fought back with OSX on which Windows 7 (launched in 2009) was the first real answer. Meanwhile Apple switched to Intel, since IBM was not fast enough with the development of the POWER-processor – a huge operation, which seemed a one-time-only step for Apple at the time. SemiAccurate now speaks of Intel being replaced by ARM on Apple’s laptops.

A few weeks ago I asked Computer Science students if they knew ARM. Not even 1% had heard of it, but lots more knew there was a Samsung-chip in their smartphone. So what’s going on without us knowing it?

I’ll try to describe the market for a few key-years and then try to put the big names in it. There is a lot going on between i.e. Nvidia, Samsung, Texas Instruments and Imagination Technologies in the ARM-market, but I’ll leave that out of the story. Also not mentioned are the game-consoles and servers, but they did have big influences on the home-PC market.

In the picture at the right you see an idea of how fast the markets would have grown from a 2006 perspective. (Click on it for the full report). You see that the explosive growth of smartphones was not expected; the other detail is that the cloud also was not foreseen here.

After reading you understand why Nvidia focuses so much on HPC and mobile.

Continue reading “The history of the PC from 2000 – 2012”

Molybdenite and graphene to the helping hand?

The rabbit in “The Last Mimzy” was very special. What material was it made of?

You might have read about Molybdenite a few months ago. It is more efficient than Graphene which is in turn more efficient than good old Silicon, most notable energy-wise. Magazine ‘Nature’ had an article on it, which is summarised by Psychorg, so check it out. The claim it is 100 000 times more efficient than Silicon (and more efficient than the already very promising Graphene). This fan-free Silicon-replacer would be a major disaster for the cooling-industry!

But what would change for us? We are now on the edge to move to ARM (started by the smartphone- and tablet-industry), but is al this needed if the energy-costs drop to prices comparable to the costs to keep ice-cream cold on the North-Pole (20 years ago). This technique would give huge potential to Fusion-chips which now have a long way to go, to solve the heat-problem. But since it would take several years (and thus decades in hi-tech years) to get these chips on the market, no assumptions for market-share can be made based on what will happen in a few years.

Low-power ARM and Molybdenite X86

So this is European ARM (and licensees around the world) vs US Intel and AMD. The sarcastic joke among me and a few friends make, is that the fight of the past 20, 30 years between the economic US and EU is actually about who has the money to hire the most Asians, to develop the revolutionising devices. But as long as the US and EU have the feeling we are actually the equation of the competition as we are a massive 12% of the world-population, I won’t be behind the facts too much.

Since batteries don’t evolve as fast as processors, the power-problem needed to get slashed differently. A mayor reason for choosing ARM is that it uses less energy than X86, just like LCD/TFT is replaced by e-ink and organic LEDs and memory is non-volatile in portable devices.

In case we get a big reduction for CPU and memory, then the efficiency of the architecture is less of a problem. So then Intel and AMD can re-enter the market again, but then with much more powerful devices. Until then ARM-licensees like NVIDIA and ImTec have a better market if it comes to near-future devices. As I expected more tablet-manufacturers come up with docking-stations to replace the PC with a tablet. AMD and Intel have to keep surprising (and probably protect their market) the coming years to avoid losing from ARM. In other words: the coming years will be exciting how the consumer-market looks like and which companies deal in it. When thinking about these years, keep in mind what Windows XP has thought us: computers are fast enough for what average Joe wants to do with it. Hey, I use my laptop for OpenCL and the big screen, for the rest I use my mobile phone.

Hybrid chips

While I did not see it as a serious problem last year, the heat-problem for a GPU+CPU on one chip is quite a challenge. Waiting for the Molybdrenite or Graphene chips to mature will be like digging your own grave. Each step forward will result in two new products: one which is more power and/or heat efficient, and one which is more powerful. Since the competition from ARM-companies is heavy, the chances that the focus will be on more powerful Hybrid CPUs is bigger. As I stated above the losses are in the low-power area. Intel and AMD are very aware of this challenge.

Have you checked the differences between DirectX 10 and 11 games? Just check the discussions on the growing side of not needing to support DirectX 11, because 10 is good enough. Also here, the demand is higher to have the same graphics-quality for less money on more portable devices. Hybrid CPUs will eat the GPU-market for sure.

ARM-processors are hybrid processors. That’s all I tell, so you can -in combination with all stated above- formulate your own conclusions. I was very surprised NVIDIA started targeting ARM with their high-end GPUs, but was this a real bad idea?

Device vs Data-centre

Reduction of energy-costs for processors will reduce the head-less servers in the data-centre enormously. Internet costs loads of energy, both the transport and the servers – this will reduce the server-part of energy-consumption-sum with quite some factors. All positive news.

But if it all this becomes true, that chips don’t use much energy anymore and actually mobile internet and other radios take the most, what will happen to the cloud? Will you upload your video to get it processed or put your mobile in the sun to charge it while waiting a shorter period?

Current developments, future needs

We need arithmetic, media-processing and input/output; we all have that. We need long battery-life, a good screen and a fast way to input our data and commands; we get more of that each day. But heat-production is Silicon limits a lot, so we get the perfect electronic device the moment we can replace Silicon. Getting rid of the heat could give us square chips, with challenges like reinventing the socket and multi-multi-layerness.

So the question to you: is in The Last Nimzy sequel (you know, the movie with the molybdenite rabbit) a logo of Intel, AMD, ARM or another company found?