General articles on technical subjects.

ARM forums to find useful information for OpenCL development

Posted by Vincent Hindriksen on 11 November 2013

OpenCL on ARM is hot, but it is only just getting started. Currently it takes some time to find the information you need about the processors.

For OpenCL discussions the best place is the Khronos OpenCL board. So where can you go when you want to ask questions specifically about ARM-based GPUs like MALI, PowerVR, Adreno and Vivante?

ARM’s new community site for all

ARM just launched the Connected Community (ARM CC). It is the place to go when you need general information about ARM IP, such as ARM MALI, Cortex-A9 and Cortex-A15.

And here is how ARM themselves explains this initiative on one slide:

Be sure to connect to StreamHPC. We hope this will indeed be the central place for the whole ecosystem, including Imagination, Qualcomm and Vivante.

ARM MALI

The MALI Developer Center has its forums on ARM Connected Community.

Imagination PowerVR

The graphics-section of their developer forums seems to be the best place.

(Not @ ARM CC)

Qualcomm Adreno

Qualcomm has dev-forums too and has a section called Mobile Gaming & Graphics Optimization (Adreno™).

(Not @ ARM CC)

Vivante

Vivante does not have a forum, but Freescale does. The i.MX forums seem to be the best place to ask your questions.

@ARM CC

Others

Do you know a good place to find and share interesting information on mobile GPUs? Share it with the others via the comments – the chances that your questions get answered increase when more people visit the forums.

Guest-blog: Accelerating sequential machine vision algorithms with OpenMP and OpenCL

Posted by Vincent Hindriksen on 11 November 2013 with 1 Comment

Guest-blogger Jaap van de Loosdrecht wants to share his thesis with you. He leads the Centre of Expertise in Computer Vision department at NHL University of Applied Sciences and owns his own company, and still managed to study and write an MSc thesis. The thesis is interesting because it extensively compares OpenCL with OpenMP, especially in chapters 7 and 8.

For those who are interested, my thesis “Acceleration sequential machine vision algorithms using commodity parallel hardware” is available at www.vdlmv.nl/thesis.

Keywords: Computer Vision, Image processing, Parallel programming, Multi-core CPU, GPU, C++, OpenMP, OpenCL.

Many other related research projects have considered using one domain specific algorithm to compare the best sequential implementation with the best parallel implementation on a specific hardware platform. This work was distinctive because it investigated how to speed up a whole library by parallelizing the algorithms in an economical way and execute them on multiple platforms. This work has:

Examined, compared and evaluated 22 programming languages and environments for parallel computing on multi-core CPUs and GPUs.

Chosen to use OpenMP as the standard for multi-core CPU programming and OpenCL for GPU programming.

Re-implemented a number of standard and well-known algorithms in Computer Vision using both standards.

Tested the performance of the implemented parallel algorithms and compared the performance to the sequential implementations of the commercially available software package VisionLab.

Evaluated the test results with a view to assessing:

Appropriateness of multi-core CPU and GPU architectures in Computer Vision.

Benefits and costs of parallel approaches to implementation of Computer Vision algorithms.

Using OpenMP it was demonstrated that many algorithms of a library could be parallelized in an economical way and that adequate speedups were achieved on two multi-core CPU platforms. With a considerable amount of extra effort, OpenCL was used to achieve much higher speedups for specific algorithms on dedicated GPUs.

At the end of the project, the choice of standards was re-evaluated including newly emerged ones. Recommendations are given for using standards in the future, and for future research and development.

Algorithmic improvements are suggested for Convolution and Connected Component Labelling.

Your feedback and/or questions are welcome.

If you put comments here, I’ll make sure Jaap van de Loosdrecht will get to know and answer your questions on the subjects discussed in his thesis.

All the members of the OpenCL working group 2013

Posted by Vincent Hindriksen on 7 November 2013

The list below contains the members of the OpenCL working group as of November 2013.

We can expect small changes each year, but this is close to the actual state. I need the rest of Q4 to finalise all the info – any help is appreciated.

This list was also compiled in 2010, and you can see several differences. If a company has an SDK available, there is a link. That is a big difference from the last list – this one is much more concrete. Continue reading “All the members of the OpenCL working group 2013” →

Reducing downtime with OpenCL… Ever thought of that?

Posted by Vincent Hindriksen on 3 November 2013

Something that creates extra value for OpenCL is the flexibility with which it runs on a wide variety of hardware. A well-known strategy is running the code on CPUs to find data-races and debug the code more easily. Another is to develop on GPUs and port to FPGAs to shorten the development cycles.

But there is one quite important advantage that is often forgotten: replacement of faulty hardware. You can blame the supplier, or even Murphy if you want, but what is almost certain is that there’s a high chance of facing downtime precisely when the hardware cannot be replaced right away.

Fail to plan is planning to fail

To limit downtime, there are a few options:

Have a good SLA in place for 24/7 hardware-replacement.
Have spare-hardware in stock.
Have over-capacity on your compute-servers.

But the problem is that all three are expensive in some form if you’re not flexible enough. If you use professional accelerators like Intel Xeon Phi, NVIDIA Tesla or AMD FirePro, you risk an unexpected stock shortage at your supplier.

With OpenCL the hardware can be replaced by any accelerator, whereas with vendor-specific solutions this is not possible.

Flexibility by OpenCL

I’d like to share with you one example of how to introduce flexibility in your hardware management, but there are various others which are more tailored to your requirements.

To detect faulty hardware, think of a server with three GPUs where selected jobs are run by all three – any hardware problem will be detected and pinpointed, because the faulty GPU is the one that disagrees with the other two. Keeping a record of which hardware has done which job completes the mechanism. Exactly this can be used to replace faulty hardware with any accelerator: let the replacement accelerator run the same jobs as the other two as an acceptance test.
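The comparison step of such a three-way check could look roughly like the sketch below. It is only an illustration in plain C: the function names and the tolerance parameter are my own assumptions, and it presumes the three result arrays have already been read back from the devices. A tolerance is used instead of a bit-exact comparison, because accelerators from different vendors may round floating-point results slightly differently.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when both result arrays match element-wise within tol.
 * Written as !(d <= tol) so that a NaN result counts as a mismatch. */
static int results_agree(const float *a, const float *b, size_t n, float tol)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        if (!(d <= tol))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: compares the output of the same job on three devices.
 * Returns the index (0, 1 or 2) of the device that disagrees with the other
 * two, -1 when all three agree, and -2 when no two devices agree. */
int find_faulty_device(const float *r0, const float *r1, const float *r2,
                       size_t n, float tol)
{
    int eq01, eq02, eq12;

    assert(r0 != NULL && r1 != NULL && r2 != NULL);
    eq01 = results_agree(r0, r1, n, tol);
    eq02 = results_agree(r0, r2, n, tol);
    eq12 = results_agree(r1, r2, n, tol);

    if (eq01 && eq02)
        return -1;              /* everything matches */
    if (eq01)
        return 2;               /* 0 and 1 agree, 2 is the odd one out */
    if (eq02)
        return 1;
    if (eq12)
        return 0;
    return -2;                  /* no majority, nothing can be pinpointed */
}
```

The same function serves as the acceptance test: put the replacement accelerator in the slot of the removed device and require a return value of -1 for the selected jobs.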

If you need your software to be optimised for several accelerators, you’re in the right place. We can help you with both machine and hand optimizations. That’s a plan that cannot fail!

Products using OpenCL on ARM MALI are coming

Posted by Vincent Hindriksen on 26 October 2013

The past year you might not have heard much from OpenCL-on-ARM, besides the Arndale developer-board. You have heard just a small portion of what has been going on.

Yesterday the (Linux) OpenCL drivers for the Chromebook (which contains an ARM MALI T604) were released, and several companies will launch products using OpenCL.

Below are a few interviews with companies who have built such products. This will give an idea of what is possible on those low-power devices. To first get an idea of what this MALI T604 GPU can do when it comes to OpenCL, here is a video from the 2013 edition of the LEAP conference we co-organised.

Understand that the whole board takes less than about 11.6 watts – that includes the CPU, GPU, memory, interconnects, networking, SD card, power adapter, etc. Only a small portion of that is the GPU. I don’t know the exact specs, as this developer board was not targeted towards energy-optimisation goals. I do know this is less than the 225 watts of a discrete GPU alone.

Interviews with ARM partners Continue reading “Products using OpenCL on ARM MALI are coming” →

Basic Concepts: Writing OpenCL code for single and double precision

Posted by Vincent Hindriksen on 17 October 2013

Support for the double-precision floating-point type double in OpenCL kernels requires an extension. AMD provides cl_khr_fp64 for newer high-end hardware, but also a not fully compliant cl_amd_fp64 extension for other hardware. NVIDIA and Intel support cl_khr_fp64, so no exceptions need to be made for those drivers.

The code you see below these lines is based on a page you can find on Bealto and was written by Eric Bainville. I added extra typedefs, removed a constant and added DOUBLE_SUPPORT_AVAILABLE for easier fallback.

#if CONFIG_USE_DOUBLE

#if defined(cl_khr_fp64)  // Khronos extension available?
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#define DOUBLE_SUPPORT_AVAILABLE
#elif defined(cl_amd_fp64)  // AMD extension available?
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#define DOUBLE_SUPPORT_AVAILABLE
#endif

#endif // CONFIG_USE_DOUBLE

#if defined(DOUBLE_SUPPORT_AVAILABLE)

// double
typedef double real_t;
typedef double2 real2_t;
typedef double3 real3_t;
typedef double4 real4_t;
typedef double8 real8_t;
typedef double16 real16_t;
#define PI 3.14159265358979323846

#else

// float
typedef float real_t;
typedef float2 real2_t;
typedef float3 real3_t;
typedef float4 real4_t;
typedef float8 real8_t;
typedef float16 real16_t;
#define PI 3.14159265359f

#endif

A macro is defined by the OpenCL C compiler for each available extension, which is cl_khr_fp64 in this example. This macro can be tested to enable the extension with #pragma OPENCL EXTENSION cl_khr_fp64 : enable.

Now, you need to use the defined constant(s) and real_t, real2_t types instead of float or double. The definition of CONFIG_USE_DOUBLE is passed as compilation option to clBuildProgram to make the switch between double and single precision. If there is no double-support, it falls back to single precision.
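On the host side, the switch could be sketched as below, in plain C. The helper names (make_build_options, has_extension, host_real_size) are made up for this illustration and are not part of OpenCL. The sketch assumes the extension list is the space-separated string that clGetDeviceInfo returns for CL_DEVICE_EXTENSIONS, and that the compiler macro is defined exactly when the extension appears in that list. The host needs to know which type the kernel ended up with, because the buffer sizes depend on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the option string that sets CONFIG_USE_DOUBLE.
 * The result is meant for the `options` argument of clBuildProgram. */
int make_build_options(char *buf, size_t len, int use_double)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "-D CONFIG_USE_DOUBLE=%d", use_double ? 1 : 0);
}

/* Hypothetical helper: whole-word search in a space-separated extension list,
 * so that a longer name starting with the same characters does not match. */
int has_extension(const char *ext_list, const char *name)
{
    size_t n = strlen(name);
    const char *p = ext_list;

    assert(n > 0);
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == ext_list) || (p[-1] == ' ');
        int ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return 1;
        p += n;
    }
    return 0;
}

/* Size in bytes of one real_t as the kernel will see it: double only when it
 * was requested and one of the two fp64 extensions is reported. */
size_t host_real_size(int use_double, const char *ext_list)
{
    int fp64 = has_extension(ext_list, "cl_khr_fp64") ||
               has_extension(ext_list, "cl_amd_fp64");
    return (use_double && fp64) ? sizeof(double) : sizeof(float);
}
```

With the option string in a buffer called opts, the build call would be clBuildProgram(program, 0, NULL, opts, NULL, NULL), and a buffer of n elements would need n * host_real_size(...) bytes.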


Basic Concepts: out of resources with clEnqueueReadBuffer

Posted by Vincent Hindriksen on 15 October 2013

“Oops! The best way to learn, when you love trial-on-error”™

In the series “Basic Concepts” various basics of GPGPU and OpenCL are discussed. This time we go into a typical one: when an error does not imply the actual problem. It is therefore good to have an overview of all errors with their descriptions.

When you get an out-of-resources error or a crash when using clEnqueueReadBuffer, you are sort of left in the dark. What does it mean? And how can you solve it?

Typical: one driver crashes/segfaults and another one gives this error.

Officially the error is defined as:

CL_OUT_OF_RESOURCES if there is a failure to allocate resources required by the OpenCL implementation on the device.

This means there can be more reasons than the device being out of resources. A better name would have been CL_RESOURCE_ALLOCATION_ERROR. It can be thrown by various functions, but we focus on this one function. You will rarely see it from clEnqueueWriteBuffer, as that depends mainly on the limits of the host.
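To keep the error codes readable while debugging, a small lookup helps. The sketch below is an illustration only: the function names are my own, and it covers just a handful of the codes that clEnqueueReadBuffer can return. The numeric values are the ones from the Khronos cl.h header; real code would include that header and use the named constants instead of the numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Maps a few OpenCL status codes to their names (values as in CL/cl.h). */
const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -34: return "CL_INVALID_CONTEXT";
    case -36: return "CL_INVALID_COMMAND_QUEUE";
    case -38: return "CL_INVALID_MEM_OBJECT";
    default:  return "unlisted OpenCL error";
    }
}

/* The three codes that report a failed allocation of some kind. */
int is_allocation_error(int err)
{
    return err == -4 || err == -5 || err == -6;
}

/* Hypothetical helper: formats a message such as
 * "clEnqueueReadBuffer failed: CL_OUT_OF_RESOURCES (-5)". */
int format_cl_error(char *buf, size_t len, const char *call, int err)
{
    assert(buf != NULL && call != NULL && len > 0);
    return snprintf(buf, len, "%s failed: %s (%d)",
                    call, cl_error_name(err), err);
}
```

Printing the name next to the call that returned it makes it easier to compare what two different drivers report for the same bug.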

Finding out the cause

The oldest trick of ‘m all: try to use the CPU and check what the error is then. CPUs are great to detect data-races (correct on CPU, not on GPU) and CPUs are a bit more stable when you have buggy code plus have more RAM. Be sure to install both Intel’s and AMD’s drivers.

Calling clFinish after each line helps you pinpoint the actual line where it happens, or get an error instead of a crash.

Then you have the following options:

9 out of 10 times you have a pointer problem at the host or are writing out of bounds. So you try to write to an illegal memory location, or try to cram a 35×35 float array into the space of a 10×10×10 float array (buffer overflow). Double-check the host memory sizes and whether the host pointers are correct.
You read out of bounds on the device. Double-check the used memory sizes.
You might have hit a limit of the driver, such as the 5s timeout if the NVIDIA card is also being used as a display. Rule out that you have used up all memory by trying both smaller and larger(!) objects. Also note down memory object sizes over time. Be sure you clean up unused objects. Fragmentation of device memory can also be the reason it eventually goes wrong.
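The first two causes can be caught before the read is even enqueued. Below is a minimal sketch of such a guard in plain C. The function names are hypothetical, and it assumes you keep track yourself of the size in bytes of both the device buffer and the host allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when reading `size` bytes starting at `offset` stays inside a
 * device buffer of `buffer_bytes` and also fits in a host allocation of
 * `host_bytes`. The subtraction avoids overflow in offset + size. */
int read_fits(size_t buffer_bytes, size_t offset, size_t size,
              size_t host_bytes)
{
    if (offset > buffer_bytes)
        return 0;
    if (size > buffer_bytes - offset)
        return 0;
    return size <= host_bytes;
}

/* Debug-build guard to call right before clEnqueueReadBuffer. */
void assert_read_fits(size_t buffer_bytes, size_t offset, size_t size,
                      size_t host_bytes)
{
    assert(read_fits(buffer_bytes, offset, size, host_bytes));
}
```

With the numbers from the first item: 35×35 floats are 4900 bytes and 10×10×10 floats are 4000 bytes, so read_fits(4900, 0, 4900, 4000) returns 0 and the mistake shows up at the host instead of as a crash or an out-of-resources error.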

The last one I have not encountered myself, but found on the Nvidia forums. I recently had this error (type 1), because I had introduced clear naming in the code I was working on. When I introduced the standard ‘h_‘ and ‘d_‘ prefixes for all variables, I immediately found the cause.

I hope this has helped you understand the resource allocation error. If you found other reasons, please share them via the comments and I’ll add them. If you have requests for what to discuss in this series, let me know via Twitter or the comments.

Help write the book “Numerical Computations with GPUs”

Posted by Vincent Hindriksen on 24 September 2013

There is an interesting book coming up: “Numerical Computations with GPUs” – a book explaining various numerical algorithms with code in CUDA or OpenCL.

edit: At the moment there are 21 articles to be included in the book.

edit 2: book should be out in July

edit 3: Order via Springer International or Amazon US.
TOC:

Accelerating Numerical Dense Linear Algebra Calculations with GPUs.
A Guide to Implement Tridiagonal Solvers on GPUs.
Batch Matrix Exponentiation.
Efficient Batch LU and QR Decomposition on GPU.
A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems.
Sparse Matrix-Vector Product.
Solving Ordinary Differential Equations on GPUs.
GPU-based integration of large numbers of independent ODE systems.
Finite and spectral element methods on unstructured grids for flow and wave propagation problems.
A GPU implementation for solving the Convection Diffusion equation using the Local Modified SOR method.
Pseudorandom numbers generation for Monte Carlo simulations on GPUs: Open CL approach.
Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA.
GPU-Accelerated computation routines for quantum trajectories method.
Monte Carlo Simulation of Dynamic Systems on GPUs.
Fast Fourier Transform (FFT) on GPUs.
A Highly Efficient FFT Using Shared-Memory Multiplexing.
Increasing parallelism and reducing thread contentions in mapping localized N-body simulations to GPUs.

Continue reading “Help write the book “Numerical Computations with GPUs”” →

Mobile Processor OpenCL drivers (Q3 2013) + rating

Posted by Vincent Hindriksen on 5 September 2013 with 1 Comment

For your convenience: an overview of all ARM-GPUs and their driver-availability. Please let me know if something is missing.

I’ve added a rating, as a friendly push to the vendors to get to at least a 7. Vendors can contact me if they think the rating does not reflect reality.

ZiiLabs

SDK-page@StreamHPC

Drivers can be delivered by Creative, when you pledge to order ZMS-40 processors. Mail us for a contact at Creative. Minimum order size is unknown.

This device can therefore only be used for custom devices.

Rating: 4

Vivante

SDK-page@StreamHPC

They are found on public devices. Android drivers that work on Freescale processors are openly available and can be found here.

Rating: 8

Even though the processors are not that powerful, Vivante/Freescale offers the best support.

Qualcomm

SDK-page@StreamHPC

Drivers are not shipped on devices, according to various sources. Android drivers are included in the SDK though, which can be found here.

Rating: 7

The rating will go up when drivers are publicly shipped on phones/tablets.

ARM MALI

Samsung SDK-page@StreamHPC

There are lots of problems around the drivers for Exynos, which only seem to work on the Arndale board when the LCD is also ordered. Android drivers can be downloaded here.

Rating: 5

It is all in the execution – half-baked drivers don’t cut it. It is unclear whom to blame, but it certainly influenced the creation of a new version of the Exynos 5, the Octa.

Imagination Technologies

SDK-page@StreamHPC

TI only delivers drivers under NDA. Samsung has one board coming up with OpenCL 1.1 EP drivers.

Rating: 5

The rating will go up when drivers from TI become available without obstacles, or when Samsung delivers what they failed to do with the previous Exynos 5.

Exciting times coming up

Mostly because of a power struggle between Google and the GPU vendors, there is some hesitation to ship OpenCL drivers on phones and tablets. Unfortunately, Google’s answer to OpenCL, RenderScript Compute, does not provide what developers need. Google’s official answer is that it wants neither fragmentation nor code that is optimised for a certain GPU. The interpreted answer is that Google wants vendor lock-in and therefore blocks the standard. Whatever the reason is, OpenCL is used as a sword in the fight over who has a say about the future of Android – only the advertisement company Google, or also the group of named processor makers and various phone/tablet vendors?

In H2 2014 Nvidia will ship CUDA-drivers with their Tegra 5 GPUs, making the soap complete.

There are rumours that Apple will intervene and make OpenCL available on iOS. This would explain why Imagination and Qualcomm put so much effort into showing OpenCL results.

And always keep a close watch on POCL, the vendor-independent OpenCL implementation.

Need a programmer for any of the above devices? Hire us!

Cancelled: StreamHPC at Mosaic3DX in Cambridge, UK

Posted by Vincent Hindriksen on 5 September 2013

Update: we are very sorry to tell that due to a deadline in a project we were forced to cancel Vincent’s talk.

StreamHPC will be at Mosaic3DX in Cambridge, UK, on 30+31 October. The brand-new conference managed to get big names on board, and I’m happy to be amongst them. Mosaic3DX describes itself as:

an international event comprising a conference, an exhibition, and opportunities for networking. Our intended audience are users as well as developers of Imaging, Visualisation, and 3D Digital Graphics systems. This includes researchers in Science and Engineering subjects, Digital Artists, as well as Software Developers in different industries.

Continue reading “Cancelled: StreamHPC at Mosaic3DX in Cambridge, UK” →

“That is not what programmers want”

Posted by Vincent Hindriksen on 14 August 2013 with 1 Comment

Cartoon: “I think you should be more explicit here in step two” (original print)

This post is part of the series Programming Theories, in which we discuss new and old ways of programming.

When discussing the design of programming languages or the extension of existing ones, the question What concepts can simplify the tasks of the programmer? always triggers lots of interesting debates. After that, when an effective solution is found, inventors are cheered, and a new language is born. Up ’till this point all seems ok, but the problem comes with the intervention of the status quo: C, C++, Java, C#, PHP, Visual Basic. Those languages want the new feature implemented in the way their programmers expect it. But this would be like trying to implement the advantages of a motorcycle into a car without paying attention to the adjustments needed by the design of the car.

I’m in favor of learning new concepts instead of doing new things the old way… but only when the new concepts have proven to be better than the old way. The slow acceptance of, for example, functional languages tells a lot about how it goes in reality (with great exceptions like LINQ). That brings a lot of trouble when moving to multi-core. So, how do we get existing languages to change instead of just evolve?

High Level Languages for Multi-Core

Let’s start with a quote from Edsger Dijkstra:

Projects promoting programming in “natural language” are intrinsically doomed to fail.

In other words: a language can be too high level. A programmer needs the language to be able to effectively micro-manage what is being done. We speak of concerns for a reason. Still, the urge to create the highest programming language is strong.

Don’t get me wrong. A high-level language can be very powerful once its concepts define both ways. One way concerns the developer: does the programmer understand the concept and the contract of the command or programming style being offered? The other concerns the machine: can it be effectively programmed to run the command, or could a new machine be made to do just that? This two-side contract is one of the reasons why natural languages are not fit for programming.

And we have also found out that binary programming is not fit for humans.

The cartoon refers to this gap between what programmers want and what computers want.

Continue reading ““That is not what programmers want”” →

AMD OpenCL Programming Guide August 2013 is out!

Posted by Vincent Hindriksen on 6 August 2013

AMD has just released an update to their AMD programming guide.

~~Download the guide (PDF) August version~~

Download the guide (PDF) November version

Download TOC (PDF)

For more optimisation guides, see the tutorials page of the knowledge base.

Chapter 1 OpenCL Architecture and AMD Accelerated Parallel Processing

1.1 Software Overview
1.1.1 Synchronization

1.2 Hardware Overview for Southern Islands Devices

1.3 Hardware Overview for Evergreen and Northern Islands Devices

1.4 The AMD Accelerated Parallel Processing Implementation of OpenCL

Continue reading “AMD OpenCL Programming Guide August 2013 is out!” →

A list of Desktop GPU architectures

Posted by Vincent Hindriksen on 5 August 2013 with 7 Comments

UPDATED in February 2017

Some optimisation tricks work really well on one architecture, and are useless on others. And even with better drivers, the older architectures need some help. In other words, it helps to know what architecture the GPU has. Therefore you get some help from your friends at StreamHPC.

Below you’ll find a list of the architecture names of all OpenCL-capable GPU models of Intel, NVIDIA and AMD. It does not contain the professional lines for now – first we are focusing on getting the general models right.

Understand it took a lot of time to gather the below information, and normally we share such information only with our clients.

Continue reading “A list of Desktop GPU architectures” →

Google blocked OpenCL on Nexus with Android 4.3

Posted by Vincent Hindriksen on 1 August 2013 with 8 Comments

Important: this is only for Google-branded Nexus phones – other brands are free to do what they want, and they most-probably will.

Also important: this doesn’t mean that OpenCL on Android devices is over, but that there is a bump in the road now that Google tries to lock customers in to their own APIs.

The big idea behind OpenCL is that higher level languages and libraries can be built on top of it. This is exactly what was done under Android: RenderScript Compute (a higher-level language) was implemented using OpenCL for ARM Mali GPUs.

Having OpenCL drivers on Android has several advantages: OpenCL can be used directly on Android, and there is room for other high-level languages that have OpenCL as back-end. Especially the latter is what probably made Google decide to cripple the OpenCL drivers.

Google seems to be afraid of competition, and that’s a shame, as competition is the key factor that drives innovation. The OpenCL community is not the only one complaining about Google’s intentions concerning Android. Read page 3 of that article to understand how Google is controlling handset-vendors and chip-makers.

Google’s statement

In February OpenCL drivers were discovered on two Nexus tablets using a MALI T604 GPU. Around the same time there was one public answer from Google employee Tim Murray (twitter) why Google did not want to choose OpenCL: Continue reading “Google blocked OpenCL on Nexus with Android 4.3” →

OpenCL 2.0 book on Indiegogo

Posted by Vincent Hindriksen on 25 July 2013

Edit: the project unfortunately did not get enough funding on Indiegogo

Launching a book takes a lot of effort. By using crowdfunding, we hope to get the book published much earlier and for a lower price.

~~Pre-order via Indiegogo – only in August 2013 (http://igg.me/at/opencl20manual)~~

What you’ll get

You will get the first OpenCL 2.0 book on the market. Fully updated with the latest function references and power tips. Also usable for OpenCL 1.1/1.2, to help you write backward-compatible software.

Reference pages for quick access to all OpenCL functions – available online and offline. This has nothing to do with the Khronos reference pages of OpenCL 2.0, as this is a complete rewrite and redesign of the description of each function definition.

Reference pages of functions

A lot of energy goes into completely revising the original OpenCL reference pages, to create real value for you. This is not just a small upgrade, but an alternative (and more complete) explanation of all the functions. Expect it to contain twice as much information.

Each function will be explained in a clear language with full explanation of background-knowledge and an example. If the function can be used in more contexts, more examples are given.

At one glance you can see what is new per OpenCL version. Also all functions are extensively tagged and grouped, so you can easily find similar functions.

Basic concepts and programming theories

Various new additions to the series of basic concepts and the series on programming theories will only be available in the book, not on the blog. These chapters will help you connect the dots and get a better overview of how OpenCL works.

This content is unique and not found anywhere else. It has its foundation in hundreds of articles and research papers, combined with the years of experience in the field as a developer and a trainer.

Hardware and Optimisation guide

An explanation of all OpenCL optimisation techniques. Including a guide how to use auto-tuning to find the best configurations for each optimisation.

How well does each optimisation work on the various architectures? The results of mini-benchmarks will give you a complete overview of what helps and what does not.

Tools & software

There are various tools out there – both open source and commercial. These tools make it easier to program more efficiently and faster. The top 10 OpenCL tools are described, including software not discussed online before.

For all contributors

Reference pages

You get access to the reference pages while I work on them. When they are finished, you also get a zip file with HTML files for times when you don’t have internet access. You will get updates for all 2.0 updates. You can give feedback at any time, and with this you have influence on the direction the manual is going.

E-book

At all times you get a progress report with a TOC. When finished, you’ll get the book sent as PDF. After some time of feedback, you’ll receive a new version. People who bought the print edition will receive it with the second version.

All prices are including Dutch VAT.

Ways You Can Help

Have you supported this project? Thank you very much for your support!

Please also tell your friends and colleagues, and on Twitter, Facebook, LinkedIn!

Installing and using Portable Computing Language (PoCL)

Posted by Vincent Hindriksen on 8 July 2013 with 1 Comment

Update August’13: 0.8 has been released

PoCL stands for Portable Computing Language, and the goal is to make a full and open source implementation of OpenCL 1.2 for LLVM.

This is about installing and using PoCL on Ubuntu 64. If you want to put some effort to build it on Windows, you will certainly help the project. See also this TODO for version 0.8, if you want to help out (or want to know its current state). Not all functionality is implemented, but the project progresses using test-driven development – using the samples in the SDKs as a base.

Backends

They are eager for collaboration, so new backends can be added. From what I’ve seen, this project is one of the best starts for new OpenCL drivers. First because of the work that has already been done (implement by example), second because it’s an active open source project (continuous post-development), third because of the MIT license (which permits reuse within proprietary software). Here at StreamHPC we keep a close eye on the project.

On a normal desktop it has only one device and that’s the CPU. It has backends for several other types of CPUs (check ./lib/kernel in the source):

ARM
Cell SPU
PowerPC
PowerPC64
x86_64

Also the TCE libraries can be used as backend. The maturity of each backend differs.

More support is coming. For instance Radeon via the R600 project, but PoCL first needs to support LLVM 3.3 for that.

It is also a good start for building drivers for your own processor (contact us if you want our assistance in building such a backend; see Supporting OpenCL on your own hardware for some related info).

Continue reading “Installing and using Portable Computing Language (PoCL)” →

Starting with Altera OpenCL-on-FPGAs

Posted by Vincent Hindriksen on 3 July 2013 with 2 Comments

Altera has been very busy adding resources and kicked off June by opening up their OpenCL program to the general public.
Only Stratix V devices are supported, but that could change later.

Below are all pages and PDFs concerning OpenCL I’ve found while searching Altera’s website.

Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms

Altera wanted to know where they could compete with GPUs and CPUs. For a big company their comparisons are quite honest (for instance about their limited access-speed to memory), but they don’t tell everything – like the hours(!) of compilation-time. The idea is that you develop on a GPU and when it’s correct, you port the correctly working software to the FPGA.

If you don’t have any experience working with their FPGAs, best is to ask around.

Image taken from Altera website.

Continue reading “Starting with Altera OpenCL-on-FPGAs” →

OpenCL 1.2 support on NVIDIA GT 700M series?

Posted by Vincent Hindriksen on 21 June 2013

If the information on the Geforce-website (an official NVIDIA-website) is correct, then the GT 700M series supports OpenCL 1.2.

We found no references to OpenCL 1.2 functions in any driver.

If you have more info, let us know in the comments.

Thanks to Rahul Garg for the tip.

AMD vs NVIDIA – Two figures that can tell a whole story

Posted by Vincent Hindriksen on 16 June 2013 with 11 Comments

Update September ’13: AMD gets their new GPUs “Volcanic Islands” with GCN 2.0 out in October. For this reason the HD 7970’s price has dropped to €250. This shakes up some of the things described in this article.

Update June ’14: It has become clear that Titan is not a consumer device and should be categorised as a “Quadro for compute”. All consumer devices of both AMD and Nvidia show relatively low GFLOPS for double precision.

Update July’14: Graphs updated with GTX Titan Z and R9 290X.

AMD/ATI has always had the fastest GPU out there. Yes, there were lots of times in which NVIDIA approached the throne, or even held the crown for a while (at least theoretically), but it was Radeon, at the end, the one who had the right claim.

Nevertheless, some things have changed:

AMD has focused more on the new architecture, making it easier to program while keeping the GFLOPS the same.
AMD bets on their A-series APU with integrated GPU.
NVIDIA has increased both memory bandwidth and GFLOPS at a steady pace.
NVIDIA has done the nitro-trick for double precision.

With NVIDIA GTX Titan (see three of them in the image), NVIDIA snatched victory from the jaws of defeat.

I’m not saying you should jump now to CUDA; there’s more than just GFLOPS. We should think also of costs and prevention of vendor-lockin. More particularly, I would like to show how unpredictable the market for accelerator-processors is.

Let’s take a look at the figures. Continue reading “AMD vs NVIDIA – Two figures that can tell a whole story” →

Sheets GPGPU-day 2012 online

Posted by Vincent Hindriksen on 11 June 2013

Photos by Cyrille Favreau

Better late than never. It has been almost a year, but finally they’re online: the sheets of the GPGPU-day Amsterdam 2012.

You can find the sheets at http://www.platformparallel.com/nl/gpgpu-day-2012/abstracts/ – don’t hotlink the files, but link to this page. The abstracts should introduce the sheets, but if you need more info just ask them in the comments here.

PDFs from two unmentioned talks:

Peter Michielse – Welcome and an introduction to GPGPU – GPGPUday12
What GPUs have brought in just a few sheets. This was the welcome by main sponsor SURFsara.
Gert-Jan van den Braak – Introduction to GPGPU and GPU-architectures – GPGPUday12
An introduction to GPGPU and CUDA.

I hope you enjoy the sheets. On 20 June the second edition will take place – see you there!

WebCL Widget for WordPress

Posted by Vincent Hindriksen on 5 June 2013

See the widget at the right showing if your browser+computer supports WebCL?

It is available under the GPL 2.0 license and based on code from WebCL@NokiaResearch (thanks guys for your great Firefox-plugin!)

Download from WordPress.org and unzip in /wp-content/plugins/. Or (better), search for a new plugin: “WebCL”. Feedback can be given in the comments.

I’d like to get your feedback on which features you would like to see in the next version.

Continue reading “WebCL Widget for WordPress” →

Category: Technical
