Social Media: Facebook, LinkedIn and Twitter

SC-Facebook

Facebook

We have presence on Facebook, via a company page: StreamHPC

Also check out Khronos’ OpenCL fanpage to hear more news on OpenCL.

LinkedIn

Via http://www.linkedin.com/company/StreamHPC you can hear more about company-specific news. It has the news comparable to the newsletter.

Twitter

You can also follow us on Twitter. We have several accounts:
[columns]
[one_half title=”General”]
 StreamHPC.
Our main account. Everything GPGPU, OpenCL and extreme software performance. .
OpenCL:Pro
with focus on jobs and internships. .
OpenCLHPC.
on OpenCL usage in HPC. .
WebCLNews
on the current state of WebCL. .
OpenCLGuru
to answer your questions on OpenCL – at your service. .
[/one_half]
[one_half title=”Hardware specific”]

OpenCLonAMD
on the current state of OpenCL on AMD-processors. .
 OpenCLonARM
on the current state of OpenCL on ARM-processors. .
 OpenCLonFPGAs
on the current state of OpenCL on FPGAs. .
 OpenCLonDSPs
on the current state of OpenCL on DSPs. .
OpenCLonRISC
on the current state of OpenCL on RISC. .
[/one_half]
[/columns]
We hope you enjoy our Twitter channels! If you have suggestions, just tweet us!

Qualcomm Snapdragon 600 & 800 (Adreno 320 & 330)

snapdragon-800-mdps

[infobox type=”information”]

Need a Snapdragon programmer? Hire us!

[/infobox]

There are two Adreno GPUs currently available known to have/get OpenCL support: the 320 and 330, respectively in the Snapdragon 600 and Snapdragon 800.

Qualcomm does not provide a developer’s board, but the Sony Xperia Z is known to have OpenCL. Other phones are expected to have drivers pre-installed too. That is interesting, as new phones with Adreno 330 are shipped soon, such as the LG Optimus G2 LS980, Sony Xperia Z Ultra and a version of the Samsung Galaxy S4.

Drivers are still in beta and are known to have bugs (as of April 2013). This discussion is the most interesting to follow, if you want keep up to date.

There are plenty of tools available, such as the Snapdragon SDK for Android and these Tools and Resources for the Adreno GPU. In the latter you’ll find OpenCL samples you can run too (it is a Windows-installer, for some vague reason, so MAC and Linux users need to do some extracting). You can start building the code from this project.

http://www.youtube.com/watch?feature=player_embedded&v=CaS0kpozyMM

Boards

Focus is on the more recent Snapdragon 800.

Inforce IFC6410 – Snapdragon 600

IFC6410websiteThe Ifc6410 is a $149 costing single-board computer with Adreno 320 and Qualcomm Snapdragon S4 Pro – APQ8064.

Datasheet (PDF)

Order here.

 

Bsquare Mobile Development Boards for Snapdragon 800

Processor: Quad-core Krait 400 CPU at up to 2.3GHz per core (Snapdragon 8974) , Adreno 330 GPU, Hexagon QDSP6 V5. A few highlights: wifi n/ac, bluetooth 4, USB 3.0, NFC, 1280x720p screen (tablet: 1920x1080p), 2GB 800MHZ memory, 12MP+2MP camera. It all runs on Android 4.2 (Jelly Bean), so no Linaro-packages. More info the Qualcomm MDB page and on this Qualcomm blog.

Phone form factor: $799 – Tablet: $1099 – Also check out Bsquare’s information page for these products, but be aware there are some links to the wrong PDFs.

Warning: you cannot call or use your provider’s internet with these devices! The word ‘phone’ only refers to the form factor.

DragonBoard Snapdragon APQ8060A for Snapdragon 800

Some highlight: Snapdragon 8074 quad core processor, 2GB of LPDDR3 RAM, 16GB of eMMC, Wi-Fi, Bluetooth, GPS, HDMI out and qHD LCD with capacitive multi touch, Adreno 330.

Can be ordered via http://mydragonboard.org/db8074/ for $499,-

DB8074_annotated_EAP_v1.1

Sony Xperia Z phones

OpenCL_SonyThe Xperia Z1 and Xperia Z Ultra have OpenCL support and drivers are ready-loaded. Go here for an introduction of OpenCL on these phones.

It needs the Android NDK to run the OpenCL programs.

Sony sees great advantages in using OpenCL on their mobile phones – from the website:

You can also see that the execution speed is much faster using OpenCL on the GPU when compared to the plain single threaded c-code running on the CPU (tested on Sony Xperia Z1). In addition to the speed benefit, you may also find that you decrease energy consumption by utilizing OpenCL on the GPU compared to using standard programming methods on the CPU.

The entanglement of Bitcoins and compute-capabilities

Every now and then I read stories on Bitcoins (Wikipedia-article), as GPUs are used a lot to “mine” Bitcoins. They have some extensive benchmarks, and also their discussions giving me insights in specific parts of accelerators like GPUs. Also is this group very upwards if it comes to accepting new techniques. Today something changed: they are a bank now. One of the thoughts I had with this, I’d like to share with you.

If you look at various types of currencies, you see they all have various goals (trade, power, resources, energy, properties, etc). The inequality and differences are even more important than the amount. Various currencies are entangled to a certain goal or resource, but there is nothing entangled strongly to technology. Here is where Bitcoins come in…

Bitcoins are entangled with compute-power – a current benchmark for technological progress.

In this article I’d like to share how the tech-economy and Bitcoins are entangled, seen from the perspective of computing. I left out a lot of the “rules of economy” and hope you can put these in – the below text is just to guide you through the thought-process only. Disagreement is only good – as we learn all from it.

Continue reading “The entanglement of Bitcoins and compute-capabilities”

Happy New Year!

About a year ago this site was launched and a half year ago StreamHPC as a company was official for the Chamber of Commerce. It has been a year of hard work, but the reason for this all started after seeing the cover of a book about bore-outs. The result is there with a growing number of visitors from all over the world (from 62 countries since 23-Dec-2010) and new twitter-followers every week. Now some mixed news for 2011:

  • We are soon going to release a few plugins for Eclipse, both free and paid, to simplify your development.
  • 2011 will be the year of hybrid processors (Intel SandyBridge and AMD Fusion), which will make OpenCL much more popular.
  • 2011 is also going to be the year of the smart-phone (prognosis: in 2011 more smart-phones will be sold than PCs). So even more OpenCL-potential.
  • At 31-Dec-2010 we migrated the site to a faster server to reduce waiting-time also online.
  • The book will be released in parts, to avoid more delays.
  • There will be around ten (short) articles published in January. Both developers and managers will be served.
  • Our goal is to expand. We have shown you our vision, but we want to show you more.

In a few words: 2011 is going to be exciting! We wish all our readers, business-partners, friends, family and (new) customers a super-accelerated 2011!

StreamHPC – we accelerate your computations

Image Processing

Vd-SharpVd-Blur2Vd-Edge3At StreamHPC, there is broad experience in the parallel, high-performance implementation of image filters. We have significantly improved the performance of various image processing software. For example, we have supported Pixelmator in achieving outstanding processing speeds on large image data, and users frequently praise the software’s speed in comparisons with competing software products.

StreamHPC is currently hosting an educational initiative that supports interested individuals in their efforts of porting algorithms from the open-source GEGL image processing framework to fast parallel versions based on OpenCL. GEGL is used by the popular image manipulation software Gimp as well as other free software. For more information on this project, look at our website OpenCL.org, which we dedicate to spreading knowledge on OpenCL.

Google blocked OpenCL on Nexus with Android 4.3

renderscript-eats-openclImportant: this is only for Google-branded Nexus phones – other brands are free to do what they want, and they most-probably will.

Also important: this doesn’t mean that OpenCL on Android devices will be over, but that there is a bump in the road now Google tries to lock-in customers to their own APIs.

The big idea behind OpenCL is that higher level languages and libraries can be built on top of it. This is exactly what was done under Android: RenderScript Compute (a higher-level language)  was implemented using OpenCL for ARM Mali GPUs.

Having OpenCL drivers on Android has several advantages, such that OpenCL can directly be used on Android and that there is room for other high-level languages that have OpenCL as back-end. Especially the latter is what probably made Google decide to cripple the OpenCL-drivers.

Google seems to be afraid of competition, and that’s a shame, as competition is the key factor that drives innovation. The OpenCL community is not the only one complaining about Google’s intentions concerning Android. Read page 3 of that article to understand how Google is controlling handset-vendors and chip-makers.

Google’s statement

In February OpenCL drivers were discovered on two Nexus tablets using a MALI T604 GPU. Around the same time there was one public answer from Google employee Tim Murray (twitter) why Google did not want to choose OpenCL: Continue reading “Google blocked OpenCL on Nexus with Android 4.3”

An OpenCL-on-FPGAs presentation in a bar

What do you do when you want to explain OpenCL and FPGAs and OpenCL-on-FPGAs to a beer drinking crowd in just 15 minutes? Well, you simply can’t go deep into the matter. On a Thursday evening, 5 November 2015,  I was standing on a chair for a beer-loving group of Hackers and Founders with my laser-powered presenter, trying not to loose everybody. It was not the first time I stood on that particular chair – some years ago I presented about OpenCL-on-GPUs.

Below you find the full presentation – feel free to use and change the slides for yourself (PDF here).

Do you want us to present OpenCL, accelerators or performance engineering in a talk tailored for your audience? Just give us a call.

OpenCL integer rounding in C

Square_rounding
Square pant rounding can simply be implemented with “return (NAN);“.

Getting about the same code in C and OpenCL has lots of advantages, when maximum optimisations and vectors are not needed. One thing I bumped into myself was that rounding in C++ is different, and decided to implement the OpenCL-functions for rounding in C.

The OpenCL-page for rounding describes many, many functions with this line:

destType convert_destType<_sat><_roundingMode>(sourceType)

So for each sourceType-destType combination there is a set of functions: 4 rounding modes and an optional saturation. Easy in Ruby to define each of the functions, but takes a lot more time in C.

The 4 rounding modes are:

Modifier Rounding Mode Description
_rte Round to nearest even
_rtz Round towards zero
_rtp Round toward positive infinity
_rtn Round toward negative infinity

The below pieces of code should also explain what the functions actually do.

Round to nearest even

This means that the numbers get rounded to the closest number. In case of 3.5 and 4.5, they both round to the even number 4. Thanks for Dithermaster, for pointing out my wrong assumption and clarifying how it should work.

inline int convert_int_rte (float number) {
   int sign = (int)((number > 0) - (number < 0));
   int odd = ((int)number % 2); // odd -> 1, even -> 0
   return ((int)(number-sign*(0.5f-odd)));
}

I’m sure there is a more optimal implementation. You can fix that in Github (see below).

Round to zero

This means that positive numbers are rounded up, negative numbers are rounded down. 1.6 becomes 1, -1.6 also becomes 1.

inline int convert_int_rtz (float number) {
   return ((int)(number));
}

Effectively, this just removes everything behind the point.

Round to positive infinity

1.4 becomes 2, -1.6 becomes 1.

inline int convert_int_rtp (float number) {
   return ((int)ceil(number));
}

Round to negative infinity

1.6 becomes 1, -1.4 becomes 2.

inline int convert_int_rtp (float number) {
   return ((int)floor(number));
}

Saturation

Saturation is another word for “avoiding NaN”. It makes sure that numbers are between INT_MAX and INT_MIN, and that NaN returns 0. If not used, the outcome of the function can be anything (-2147483648 in case of convert_int_rtz(NAN) on my computer). Saturation is more expensive, so therefore it’s optional.

inline float saturate_int(float number) {
  if (isnan(number)) return 0.0f; // check if the number was already NaN
  return (number>MAX_INT ? (float)MAX_INT : number

Effectively the other functions become like:

inline int convert_int__sat_rtz (float number) {
   return ((int)(saturate_int(number)));
}

Doubles, longs and getting started.

Yes, you need to make functions for all of these. But you could ofcourse also check out the project on Github (BSD licence, rudimentary first implementation).

You’re free to make a double-version of it.

Verify your OpenCL and CUDA kernels online for race conditions

gpuverifyGPUVerify is a tool for formal analysis of GPU kernels written in OpenCL and CUDA. The tool can prove that kernels are free from certain types of defect, such as data races and bugs. This is quite useful feedback for any GPU-programmer.

Below you find a online version of the tool (please don’t break it!). Play around and test your kernels. Be aware the number of groups is the global worksize divided by local worksize.

For demo-purposes some values have been pre-filled with a simple kernel – press “Check my OpenCL kernel” to find the results. Did you expect this from this kernel? Can you explain the result?

After the LEAP-conference I’ll extend this article – till then I’m too time-limited. For now I wanted to share the online version with you, especially with the people who will attend the tutorial at LEAP. Be sure to check out the GPUVerify website and paper to learn more about this fantastic tool! Continue reading “Verify your OpenCL and CUDA kernels online for race conditions”

Copyright

All content, media, theme and blogs are copyright 2010-2012 StreamHPC and Vincent Hindriksen, unless otherwise stated. For questions about using material for your own business, blog or personal usage, please contact us to ask for permission. We protect our copyrights by any means necessary.

OpenCL is a trademark of Apple Computers Inc.

We work a lot with open source software, such as WordPress and Eclipse. The brochures are created with Inkscape and svgslides. We believe that the base of innovation should be for everybody, so everybody can build on top of that. That’s why you get all the information from the blog for free, as sharing information could give us necessary information back in return.

Used photos

Many photos link to the origin and sometimes tell a story or show the webpage of an artist. The images bellow are not linked, as they are used in the slider.

All other photos and images are bought from paid services: 123RF and Big Stock Photo. Please contact us if you would like a to know the link of a certain photo or image.

Online Tutorials are here

46188854 - beautiful smiling female student using online education service. young woman looking in laptop display watching training course and listening it with headphones. modern study technology concept
Online training

We’re going online with our presentations and tutorials. This makes it easy to reach more people and make our trainings more flexible.

We’re starting with short introductory trainings, but we have bigger plans. Keep an eye on our events (shared on Twitter, LinkedIn, this blog and the newsletter) to see what the offerings are. And you’re very welcome to join!

On 4 October (new date) there will be an OpenCL 101 of two hours for free. Target timezone is East-America and Europe.

Agenda Online OpenCL 101

  • Introductions (20 minutes)
    • StreamHPC
    • GPUs and paralellism
    • OpenCL
  • By example: Getting started with OpenCL (30 minutes)
  • By example: Porting a simple program to OpenCL (30 minutes)
  • Q&A in parallel (30 minutes). Ask us any question, for instance:
    • General OpenCL.
    • OpenCL on GPUs.
    • OpenCL on FPGAs.
    • What algorithms work well with GPUs, CPUs and FPGAs.
    • StreamHPC services.
  • The next steps (5 minutes).
  • Closing words (5 minutes).

Read more here…

Tutorial server

You can already test if the tutorial server works for you by looking around in our demo room. The tutorial itself will be in another room. Use your own name and password “ap“.

The room linked to this resource is not configured correctly.

See you soon!

Why this new AMD FirePro Cluster is important for OpenCL

FirePro cluster
FirePro S9150 cluster

Then it hit the doormat:

AMD is proud to collaborate with ASUS, the Frankfurt Institute for Advanced Studies, (FIAS) and GSI to support such important physics and computer science research,” said David Cummings, senior director and general manager, professional graphics, AMD. “This installation reaffirms AMD’s leading role in HPC with the implementation of the AMD FirePro S9150 server GPUs in this three petaFLOPS supercomputer cluster. AMD and ASUS are enabling OpenCL applications for critical science research usage for this cluster. We’re committed to building our HPC leadership position in the industry as a foremost provider of computing applications, tools and technologies.

You read more here and the official news here.

Why is this important?

It could be that there is more flops for the same price, as AMD hardware is cheaper? Nice, but secondary.

That it runs OpenCL? We like that, but from a broader perspective this is not the most important.

It is important because it creates more diversity in the world of HPC. Currently there are a few XeonPhi clusters and only one big AMD FirePro S10000 cluster. The rest is NVidia Tesla or CPU only. With more AMD clusters the HPC market is democratised. That means that more software will be written in vendor-neutral software like OpenCL (with high-level software/libraries on top), and prices of HPC accelerators will not be kept high.

How to further democratise the HPC world?

We started with porting Gromacs to OpenCL, and we will continue to port large projects to OpenCL. This software will simply run on XeonPhi, Tesla and FirePro with just little porting time, reducing costs in many ways. We can not do it alone, but together we can. Start by telling us which software needs to be ported from OpenMP to OpenCL or OpenMP 4, or from CUDA to OpenCL. And if you are porting open source software to OpenCL, drop us a line for free advice and help with testing the software.

And the best you can do to break the monopoly of CUDA, is to simply buy AMD or Intel hardware. The price difference is enough to buy lots of extra FLOPS and to pay for a complete porting project to OpenCL of a large application.

 

An introduction to Grid-processors: Parallella, Kalray and KnuPath

gridWe have been talking about GPUs, FPGAs and CPUs a lot, but there are more processors that can solve specific problems. This time I’d like you to give a quick introduction to grid-processors.

Grid-processors are different from GPUs. Where a multi-core GPU gets its strength from being able to compute lots of data in parallel (SIMD data-parallellism), a grid-processors is able to have each core do something differently (MIMD, task-based parallelism). You could say that a grid-processor is a multi-core CPU, where the number of cores is at least 16, and the cores are only connected to their neighbours. The difference with full-blown CPUs is that the cores are smaller (like the GPU) and thus use less power. The companies themselves categorise their processors as DSPs or Digital Signal Processors, but most popular DSPs only have 1 to 8 cores.

For the context, there are several types of bus-configurations:

  • single bus: like the PCIe-bus in a PC or the iMX6.
  • ring bus: like the XeonPhi till Knights Corner, and the Cell processor.
  • star bus: a central communication core with the compute-cores around.
  • full mesh bus: each core is connected to each core.
  • grid bus: all cores are connected to their direct neighbours. Messages hop from core to core.

Each of them have their advantages and disadvantages. Grid-processors get great performance (per Watt) with:

  • video encoding
  • signal processing
  • cryptography
  • neural networks

Continue reading “An introduction to Grid-processors: Parallella, Kalray and KnuPath”

Computer Vision

Face_detectionComputing demands in computer vision are high, and often real-time processing with low latency is desirable. Computer vision can greatly benefit from parallelization as higher processing speeds can improve object recognition rates while FPGA solutions may reduce energy demands or support the perception of lag-free processing. At StreamHPC, we have supported several customers in optimizing their software to work on a lower power budget and on a higher speed. We can support you in dedicated solutions based on GPUs or FPGAs to meet your demands.

Improving FinanceBench for GPUs Part II – low hanging fruit

We found a finance benchmark for GPUs and wanted to show we could speed its algorithms up. Like a lot!

Following the initial work done in porting the CUDA code to HIP (follow article link here), significant progress was made in tackling the low hanging fruits in the kernels and tackling any potential structural problems outside of the kernel.

Additionally, since the last article, we’ve been in touch with the authors of the original repository. They’ve even invited us to update their repository too. For now it will be on our repository only. We also learnt that the group’s lead, professor John Cavazos, passed away 2 years ago. We hope he would have liked that his work has been revived.

Link to the paper is here: https://dl.acm.org/doi/10.1145/2458523.2458536

Scott Grauer-Gray, William Killian, Robert Searles, and John Cavazos. 2013. Accelerating financial applications on the GPU. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6). Association for Computing Machinery, New York, NY, USA, 127–136. DOI:https://doi.org/10.1145/2458523.2458536

Improving the basics

We could have chosen to rewrite the algorithms from scratch, but first we need to understand the algorithms better. Also, with the existing GPU-code we can quickly assess what are the problems of the algorithm, and see if we can get to high performance without too much effort. In this blog we show these steps.

Continue reading “Improving FinanceBench for GPUs Part II – low hanging fruit”

PathScale ENZO

My todo-list gets too large, because there seem so many things going on in GPGPU-world. Therefore the following article is not really complete, but I hope it gives you an idea of the product.

ENZO was presented as the alternative to CUDA and OpenCL. In that light I compared them to Intel Array Building Blocks a few weeks ago, not that their technique is comparable in a technical way. PathScale’s CTO mailed me and tried to explain what ENZO really is. This article consists mainly of what he (Mr. C. Bergström) told me. Any questions you have, I will make sure he receives them.

ENZO

ENZO is a complete GPGPU solution and ecosystem of tools that provide full support for NVIDIA Tesla (kernel driver, runtime, assembler, code generation, front-end programming model and various other things to make a developer’s life easier).

Right now it supports the HMPP C/Fortran programming model. HMPP is an open standard jointly developed by CAPS and PathScale. I’ve mentioned HMPP before, as it can translate Fortran and C-code to OpenCL, CUDA and other languages. ENZO’s implementation differs from HMPP by using native front-ends and does hardware optimised code generation.

You can learn ENZO in 5 minutes if you’ve done any OpenMP-like programming in the past. For example the Fortran-code (sorry for the missing indenting):

!$hmpp simple codelet, target=TESLA1

subroutine add(n, a, b, c)
implicit none
integer, intent(in) :: n
real, intent(in) :: a(n), b(n)
real, intent(out) :: c(n)
integer :: i

do i=1, n
if (a(i) > 5) then
c(i) = 1
else
c(i) = 2
endif
enddo

end subroutine add

subroutine test
integer, parameter :: n=10
real :: a(n), b(n), c(n)
integer :: i

do i=1, n
a(i) = i
b(i) = i
enddo

!$hmpp simple callsite
call add(n, a, b, c)
end subroutine test

This is somewhat different we know from OpenCL, mostly because we don’t need a specific kernel. This is because with just a few hints, the compiler does a lot for you. Like in OpenMP you tell the compiler with directives/pragmas which parts you want to be parallelised. More explanation can be found in the user manual [PDF]. You can try it out yourself for free if you have a Tesla-card; future versions of ENZO will support more architectures.

Hello

Welcome to the webpage of Stream HPC. We’re a company in Europe that work on solving the most difficult HPC problems with emphasis on scaling to GPUs and clusters. We have built up experience in speeding up software, designing performance oriented architectures, writing maintainable low-level code, selecting the best hardware for the job, and building benchmarks. Above all, we’re a customer oriented company, as we want our clients to feel in control, while we do that heavy lifting.

The company is multi-cultural and designed to be a safe space for everybody of our team – from LBGT+ to Asperger’s, we focus on making our differences our strengths. As you can read in the job self-assessment, we have 4 main strengths:

  • CPU development: algorithms, low-level code, architectures for CPU-based software. This includes clusters.
  • GPU development: algorithms, low-level code, architectures for GPU-based software. This includes graphics programming
  • Problem-solving: get from full understanding to full exploration quickly.
  • Self-managed teams: we don’t hire managers, but provide frameworks.

Our customers are all around the world, but especially North-America, West-Europe and East-Asia. We have built many high performance software that run from edge-computers to super-computers. See “What we do” for examples.

Our offices are in:

  • Amsterdam
  • Budapest
  • Barcelona

If you want to know more, feel free to get in contact.

See this page for Netherlands/Belgium, Hungary or Spain.

We more than halved the FPGA development time by using OpenCL

fast-fpga
A flying FPGA board

Over the past year we developed and fine-tuned a project setup for FPGA development that is much faster than any other method, including other high-level languages for making FPGA-based systems.

How we did it

OpenCL makes it easy to use the CPU and GPU and their tools. Our CPU and GPU developers would design software with FPGAs in mind, after which the FPGA developer took over and finalised the project. As we have expertise in the very different phases of such project, we could be much more effective than when sticking to traditional methods.

The bonus

It also works on CPU and GPU. It has to be said, that the code hasn’t been fully optimised for CPUs and GPUs – this can be done in a separate project. In case a decision has to be made on which hardware to use, our solution has the least risk and the most answers.

Our Unique Selling Points

For the FPGA market our USPs are clear:

  • We outperform traditional FPGA development companies in time-to-market and price.
  • We can discuss problems on hardware level, software level and algorithm level. This contrasts with traditional FPGA houses, where there are less bridges.
  • Our software also works on CPUs and GPUs for no additional charge.
  • The latencies of the resulting project are very comparable.

We’re confident we can make a difference in the FPGA market. If you want more information or want to discuss, feel free to contact us.

nVidia’s CUDA vs OpenCL Marketing

Please read this article about Microsoft and OpenCL, before reading on. The requested benchmarking is done by the people of Unigine have some results on differences between the three big GPGPU-APIs: part I, part II and part III with some typical results.
The following read is not about the technical differences of the 3 APIs, but more about the reason behind why alternate APIs are being kept maintained while OpenCL does the trick. Please comment, if you think OpenCL doesn’t do the trick

As was described in the article, it is important for big companies (such as Microsoft and nVidia) to defend their market-share. This protection is not provided through an open standard like OpenCL. As it went with OpenGL  – which was sort of replaced by DirectX to gain market-share for Windows – now nVidia does the same with CUDA. First we will sum up the many ways nVidia markets Cuda and then we discuss the situation.

The way nVidia wanted to play the game, was soon very clear: it wanted to market CUDA to be seen as the better alternative for OpenCL. And it is doing a very good job.

The best way to get better acceptance is giving away free information, such as articles and courses. See Cuda’s university courses as an example. Also sponsoring helps a lot, so the first main-stream oriented books about GPGPU discussed Cuda in the first place and interesting conferences were always supported by Cuda’s owner. Furthermore loads of money is put into an expensive webpage with very complete information about GPGPU there is to find. nVidia does give a choice, by also having implemented OpenCL in its drivers – it does just not have big pages on how-to-learn-OpenCL.

AMD – having spent their money on buying ATI, could not put this much effort in the “war” and had to go for OpenCL. You have to know, AMD has faster graphics-cards for lower prices than nVidia at the moment; so based on that, they could become the winners on GPGPU (if that was to only thing to do). Intel saw the light too late and is even postponing their High-end GPU, the Larrabee. The only help for them is that Apple demands to have OpenCL on nVidia’s drivers – but for how long? Apple does not want strict dependency on nVidia, since it i.e. also has a mobile market. But what if all Apple-developers create their applications on CUDA?

Most developers – helped by the money&support of nVidia – see that there is just little difference between Cuda and OpenCL and in case of a changing market they could translate their programs from one to the other. For now a demand to have a high-end videocard of nVidia can be rectified, a card which actually many people have or easily could buy within the budget of their current project. The difference between Cuda and OpenCL is comparable with C# and Java – the corporate tactics are also the same. Possibly nVidia will have better driver-support for Cuda than OpenCL and since Cuda does not work on AMD-cards, the conclusion is that Cuda is faster. There can then be a situation that AMD and Intel have to buy Cuda-patents, since OpenCL does not have the support.

We hope that OpenCL will stay the main-stream GPGPU-API, so the battle will be on hardware/drivers and support of higher-level programming-languages. We do really appreciate what nVidia already has done for the GPGPU-industry, but we hope they will solely embrace OpenCL for the sake of the long-term market-development.

What we left out of this discussion is Microsoft’s DirectCompute. It will be used by game-developers in the Windows-platform, which just need physics-calculations. When discussing the game-industry, we will tell more about Microsoft’s DirectX-extension.
Also come back and read our upcoming article about FPGAs + OpenCL, a combination from which we expect a lot.

What is Khronos as of today?

The Khronos Group is the organization behind APIs like OpenGL, Vulkan and OpenCL. Over one hundred companies are a member and decide together what your next year phone, camera, computer or media device will be capable of.

We’re at the right, near the bottom.

We work most with OpenCL, but you probably noticed we work with OpenGL, Vulkan and SPIR too. Currently they have the following APIs:

  • COLLADA, a file-format intended to facilitate interchange of 3D assets
  • EGL, an interface between Khronos rendering APIs such as OpenGL ES or OpenVG and the underlying native platform window system
  • glTF, a file format specification for 3D scenes and models
  • OpenCL, a cross-platform computation API.
  • OpenGL, a cross-platform computer graphics API
  • OpenGL ES, a derivative of OpenGL for use on mobile and embedded systems, such as cell phones, portable gaming devices, and more
  • OpenGL SC, a safety critical profile of OpenGL ES designed to meet the needs of the safety-critical market
  • OpenKCam, Advanced Camera Control API
  • OpenKODE, an API for providing abstracted, portable access to operating system resources such as file systems, networks and math libraries
  • OpenMAX, a layered set of three programming interfaces of various abstraction levels, providing access to multimedia functionality
  • OpenML, an API for capturing, transporting, processing, displaying, and synchronizing digital media
  • OpenSL ES, an audio API tuned for embedded systems, standardizing access to features such as 3D positional audio and MIDI playback
  • OpenVG, an API for accelerating processing of 2D vector graphics
  • OpenVX, Hardware acceleration API for Computer Vision applications and libraries
  • OpenWF, APIs for 2D graphics composition and display control
  • OpenXR, an open and royalty-free standard for virtual reality and augmented reality applications and devices
  • SPIR, a intermediate compiler target for OpenCL and Vulkan
  • StreamInput, an API for consistently handling input devices
  • Vulkan, a low-overhead computer graphics API
  • WebCL, a JavaScript binding to OpenCL within a browser
  • WebGL, a JavaScript binding to OpenGL ES within a browser on any platform supporting the OpenGL or OpenGL ES graphics standards

Too few people understand that the organization is very unique, as the biggest processor vendors are discussing collaborations and how to move the market, while they’re normally the fiercest competitors. Without Khronos it would have been a totally different world.

First Khronos Chapter meeting in Amsterdam: WebGL/OpenGL

highres_225764012.jpeg

Thursday 13 February 2014 the first Khronos meetup in will take place. We expect a small group, so the location will be cozy and there will be enough time to talk with a beer. First round is on me, admission is free.

Goal is to learn about open media-standards from Khronos and others. So when OpenCV is discussed, we’ll also talk OpenVX. The target group is programmers and Indy developers who are interested in creating multi-OS and multi-device software.

Program

I am very thrilled to tell that Ton Roosendaal of the Blender Foundation will talk about the releationship between his Blender and Khronos OpenGL.

Second Maarten and Jurjen of ThreeDee Media will talk about WebGL, from a technical and a market view. Is WebGL ready for prime-time?

Then you can show your stuff. For that I’ll bring a good laptop with Windows 8.1 and Ubuntu 13.10 64.

Prepare for Meetup today!

See the Meetup-page for more information. See you there!