Cancelled: StreamHPC at Mosaic3DX in Cambridge, UK

Posted by Vincent Hindriksen on 5 September 2013

Update: we are very sorry to announce that, due to a project deadline, we were forced to cancel Vincent’s talk.

StreamHPC will be at Mosaic3DX in Cambridge, UK, on 30 and 31 October. The brand-new conference has managed to get big names on board, and I’m happy to be amongst them. Mosaic3DX describes itself as:

an international event comprising a conference, an exhibition, and opportunities for networking. Our intended audience are users as well as developers of Imaging, Visualisation, and 3D Digital Graphics systems. This includes researchers in Science and Engineering subjects, Digital Artists, as well as Software Developers in different industries.

Continue reading “Cancelled: StreamHPC at Mosaic3DX in Cambridge, UK” →

AMD OpenCL Presentation as OpenDocument

Posted by Vincent Hindriksen on 23 May 2011 with 1 Comment

You remember AMD’s OpenCL University Kit? It was made for universities and written completely in PPTX. (For people at a university: PPTX is an undocumented document format which claims to be open, but actually only works well with the editor/viewer of one vendor.) So I took the liberty of converting all documents to ODF, so anybody can open them.

Download it here: AMD OpenCL University Kit as ODF.

It has 13 chapters, covering all the basics you need to know for further study. Say “thanks AMD” and enjoy!

The OpenCL event of the year: IWOCL 2014 – Bristol, UK, 12 & 13 May

Posted by Vincent Hindriksen on 13 December 2013

For the second time, Khronos supports and organises the International Workshop on OpenCL (IWOCL, pronounced “eye-wok-ul”). Last year the event took place at Georgia Tech, Atlanta, Georgia, in the United States. This year the event will be held in Europe: Bristol University, Bristol, England, UK.

IWOCL 2013 Presentations

Last year there was a varied programme:

Porting a Commercial Application to OpenCL: A Case Study
Demonstrating Performance Portability of a Custom OpenCL Data Mining Application to the Intel Xeon Phi Coprocessor
Parallelization of the Shortest Path Graph Kernel on the GPU
OpenCL-based Approach to Heterogeneous Parallel TSP Optimization
clMAGMA: High Performance Dense Linear Algebra with OpenCL
Multi-Architecture ISA-Level Simulation of OpenCL
Optimizing OpenCL Applications on the Intel Xeon Phi

You can see and download these presentations here. This year the organisation aims to offer an equally exciting programme.

Workshop means it’s an active event

It’s all about sharing, but not just by letting you sit and listen. Below you’ll find some of the options.

Present your work

Did you use OpenCL in your software or research? You are very welcome to present your experience and results. IWOCL is the premier forum for the presentation and discussion of new designs, trends, algorithms, programming models, software, tools and ideas for OpenCL.

Abstract Submission Deadline: Friday 31 January, 2014

It can be in the form of:

Research paper
Technical presentation
Workshops and tutorials
Poster

(StreamHPC’s Vincent Hindriksen is on the Conference Sessions Committee)

Communicate with the workgroup

Khronos booth at SC13 – some of these people you will see again at IWOCL

The OpenCL workgroup likes to communicate with OpenCL’s users. IWOCL provides a formal channel for community feedback to the Khronos Group’s OpenCL workgroup. This is one of the best moments to be heard, discuss a hack/bug or share a great idea that should be in the next version of OpenCL.

Meet OpenCL developers and enthusiasts

During the breaks, the social events and the presentations, you can discuss all your ideas and thoughts, on-topic or off-topic, or join existing conversations.

If you are new to compute acceleration, you’ll find many people who are willing to explain what it does and add their personal view.

Test-drive software

We will bring some hardware, on which you can test your kernels. (We’ll put more info about this later!)

Sponsor and present your product

There will be booths available for the sponsors, where you can show your product to the public.

Stay up to date on the event

We will try to keep you up to date as much as possible, and IWOCL also has its own channels to keep you informed.

We’ll post a link when tickets go on sale.

Let others know you plan to be at the event by saying hi in the comments.

Hope to see you there!

AMD Hawaii power-management fix on Linux

Posted by Vincent Hindriksen on 4 February 2015 with 1 Comment

The new Hawaii-based GPUs from AMD (Radeon R9 2xx, FirePro W9100 and FirePro S9150) have a lot of improvements, one being the new OverDrive 6 (AMD’s version of NVIDIA GPU Boost). The problem is that it is not yet supported in the Linux drivers, so performance is too low – this will probably be solved in the next version. Luckily there is od6config, made by Jeremi M Gosney.

Follow the steps below to get the GPU running at normal speed.

Download the zip or tar.gz from http://epixoip.github.io/od6config/ and unpack.
Go to the directory where you unpacked the archive.
run:
```
make
```
run:
```
sudo make install
```
check if it’s needed to fix the power management:
```
od6config --get clocks,temp,fan
```
if the values are too low, run:
```
od6config --autofix --set power=10
```
check if it worked:
```
od6config --get clocks,temp,fan
```

Only OverDrive6 devices are set; devices using OverDrive5 will be ignored.

The PowerTune value of 10 was what we found convenient for us, but you might find better values for your case. There are several more options, which are listed on the homepage of od6config. You need to run “od6config --autofix --set power=10” after each reboot.

Remember it’s third party software, so no guarantees to you and no “you killed my GPU” to us.

Faster Development Cycles for FPGAs

The time difference between the normal and the OpenCL flow is large. The final product is as fast and efficient.

VHDL and Verilog are not the right tools when it comes to developing on FPGAs fast.

It is time-consuming. If the first cycle takes 3 months, then each subsequent cycle easily takes 2 weeks. Time is money.
Porting or upgrading a design from one FPGA device to another is also time-consuming. This makes it essential to choose the final FPGA vendor and family upfront.
Dual-platform development on CPU and FPGA needs synchronisation. The code works on either the CPU or the FPGA, which makes the functional tests made for the CPU-version less trustworthy.

Here is where OpenCL comes in.

Shorter development cycles. Programming in OpenCL is normally much faster than in VHDL or Verilog. If you are porting C/C++ code onto FPGA the development cycles will be dramatically shorter. Think weeks instead of months – as this news article explains. This means a radically reduced investment as well as providing time for architectural exploration.
OpenCL works on both CPUs and FPGAs, so functional tests can be run on either. As a bonus the code can be optimised for GPUs, within a short time-frame.
The performance is equal to VHDL and Verilog, unless FPGA-specific optimisations are used, such as vector-widths not equal to a power of two.
Vendor Agnostic solution. Porting to other FPGAs takes considerably less time and the compiler solves this problem for you.
Both Xilinx and Altera have OpenCL compilers. Altera were the first to come out with an OpenCL offering and have a full SDK, which is an add-on to Quartus II. Xilinx also have a stand-alone OpenCL development environment solution called SDAccel.

Support for OpenCL is strong from both Altera and Xilinx

Both vendors suggest OpenCL to overcome existing FPGA design problems. Altera suggests using OpenCL to speed up the process for existing developers. So OpenCL is not a third-party tool that you need to trust separately.

OpenCL allows a user to abstract away the traditional hardware FPGA development flow for a much faster and higher level software development flow – Altera

Xilinx suggests that OpenCL can enable companies without the needed developer resources to start working with FPGAs.

Teams with limited or no FPGA hardware resources, however, have found the transition to FPGAs challenging due to the RTL (VHDL or Verilog) development expertise needed to take full advantage of these devices. OpenCL eases this programming burden – Xilinx

Why choose StreamHPC?

There are several reasons to let us do the porting and prototyping of your product.

We have the right background, as our team consists of CPU, GPU and FPGA developers. Our code is therefore designed with easy porting in mind.
Our costs are lower than having the product done in Verilog/VHDL.
We give guarantees and support for our products on all platforms the product is ported to.
We can port the final OpenCL code to Verilog/VHDL, keeping the same performance. In case you don’t trust a high-level language, we have you covered.
Optionally you can get both the code and a technical report with a detailed explanation of how we did it. So you can learn from this and modify the code yourself.
You get free advice on when (and when not) to use OpenCL for FPGAs.

There are three ways to get in contact quickly:

call: +31 854865760 (European office hours)

e-mail: contact@streamhpc.com

Fill in this form – mention when you want to be called back (outside normal office hours is possible):

[contact_form]

Want to read more?

We wrote about OpenCL-on-FPGAs on our blog in the previous years.

Why use OpenCL on FPGAs? (2014)
Starting with Altera OpenCL-on-FPGAs (2013)
OpenCL on Altera FPGAs (2012)

Master+PhD students, applications for two PRACE summer activities open now

Posted by Vincent Hindriksen on 24 January 2017

PRACE is organising two summer activities for Master+PhD students. Both activities are expense-paid programmes and will allow participants to travel and stay at a hosting location and learn about HPC:

The 2017 International Summer School on HPC Challenges in Computational Sciences
The PRACE Summer of HPC 2017 programme

The main objective of this programme is to enable HiPEAC member companies in Europe to have access to highly skilled and exceptionally motivated research talent. In turn, it offers PhD students from Europe a unique opportunity to experience the industrial research environment and to work on R&D projects solving real problems.

Both programmes are explained in detail below. Continue reading “Master+PhD students, applications for two PRACE summer activities open now” →

Academic hackathons for Nvidia GPUs

Posted by Vincent Hindriksen on 6 April 2019

Are you working with Nvidia GPUs in your research and wish Nvidia would support you as they used to 5 years ago? This is now done with hackathons, where you get one full week of support to get your GPU code improved and your CPU code ported. You still have to do the work yourself, so it’s not comparable to the services we provide.

To start, get your team to commit to doing this. It takes preparation and a clear formulation of what your goals are.

When and where?

It’s already April, so some hackathons have already taken place. For 2019, the ones listed here are still to come, and you can work in any language, from OpenMP to OpenCL and from OpenACC to CUDA. Python + CUDA-libraries is also no problem, as long as the focus is Nvidia.

Continue reading →

The magic of clGetKernelWorkGroupInfo

Posted by Vincent Hindriksen on 22 October 2015

It’s not easy to get the available private memory size – actually it’s impossible to get this information directly from the device/drivers, using the OpenCL API. This can only be explained after you dive deep into clGetKernelWorkGroupInfo – the function that tells you how well your kernel fits on the device. It is strange this function is not often discussed.

Memory sizes

CL_KERNEL_LOCAL_MEM_SIZE

Returns the amount of local memory, in bytes, being used by a kernel (per work-group). Use CL_DEVICE_LOCAL_MEM_SIZE to find out the maximum.

CL_KERNEL_PRIVATE_MEM_SIZE

Returns the minimum amount of private memory, in bytes, used by each work-item in the kernel.

Work sizes

CL_KERNEL_GLOBAL_WORK_SIZE

This answers the question “What is the maximum value for global_work_size argument that can be given to clEnqueueNDRangeKernel?”. The result is of type size_t[3].

CL_KERNEL_WORK_GROUP_SIZE

This answers the same question for local_work_size. The kernel’s resource requirements (register usage etc.) are used to determine what this work-group size should be.

CL_KERNEL_COMPILE_WORK_GROUP_SIZE

If __attribute__((reqd_work_group_size(X, Y, Z))) is used, then (X, Y, Z) is returned, else (0, 0, 0).

CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE

It returns a performance hint: if the work-group size is a multiple of this number, then you’ll get good results. So no more remembering 32 or 64 for specific GPUs, but simply kick in a call to this function.

Combined with clGetDeviceInfo’s CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS, you can fine-tune your work-group size in case you need the group size to be as large as possible.
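As a rough illustration of how these query results could be combined, here is a minimal host-side sketch in plain Python. It makes no OpenCL calls: the function name `choose_work_sizes` and its parameters are hypothetical, and the two numeric inputs are assumed to be values you obtained earlier from clGetKernelWorkGroupInfo.

```python
def choose_work_sizes(n_items, kernel_wg_size, preferred_multiple):
    """Pick a 1D local size and a padded global size for n_items work-items.

    kernel_wg_size     -- stand-in for the CL_KERNEL_WORK_GROUP_SIZE result
    preferred_multiple -- stand-in for the
                          CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE result
    (Hypothetical helper for illustration; not part of any OpenCL binding.)
    """
    if n_items <= 0 or kernel_wg_size <= 0 or preferred_multiple <= 0:
        raise ValueError("all arguments must be positive")

    # Largest multiple of the performance hint that fits in the kernel's limit.
    local = (kernel_wg_size // preferred_multiple) * preferred_multiple
    if local == 0:
        # The limit is smaller than the hint: fall back to the limit itself.
        local = kernel_wg_size

    # Round the global size up so it is divisible by the local size.
    # The kernel then needs a bounds check for the padded work-items.
    global_size = ((n_items + local - 1) // local) * local
    return local, global_size


local, global_size = choose_work_sizes(1000, 256, 64)
print(local, global_size)
```

Padding the global size this way keeps all work-groups the same size, at the cost of a few idle work-items that the kernel has to skip.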

Learn about AMD’s PRNG library we developed: rocRAND – includes benchmarks

Posted by Vincent Hindriksen on 29 November 2017 with 4 Comments

As CUDA kept its dominance over OpenCL, AMD introduced HIP – a programming language that closely resembles CUDA. Porting code to AMD hardware no longer takes months: more and more CUDA software converts to HIP without problems. Even really large and complex code-bases take a few weeks at most, and we found that the problems solved along the way also made the CUDA code run faster.

The only problem is that CUDA-libraries need to have their HIP-equivalent to be able to port all CUDA-software.

Here is where we come in. We helped AMD make a high-performance Pseudo-Random Number Generator (PRNG) library, called rocRAND. Random number generation is important in many fields, from finance (Monte Carlo simulations) to cryptography, and from procedural generation in games to providing white noise. For some applications it’s enough to have some data, but for large simulations the PRNG is the limiting factor. Continue reading “Learn about AMD’s PRNG library we developed: rocRAND – includes benchmarks” →

NVIDIA beta-support for OpenCL 2.0 works on Linux too

Posted by Jakub Szuppe on 6 March 2017

In the release notes for 378.66 graphics drivers for Windows (February 2017), NVIDIA officially spoke about supporting OpenCL 2.0 for the first time. Unfortunately, this is partial support only and, as NVIDIA said, these new [OpenCL 2.0] features are available for evaluation purposes only.

We did our own tests on a GTX 1080 on Windows and could confirm that for Windows the green team is halfway there. NVIDIA still has to implement pipes, enable non-uniform work-group sizes (this happens when in ND-range global_work_size is not divisible by the local_work_size), and fix a few bugs in device side enqueue.
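To make the non-uniform case concrete, here is a small sketch in plain Python (no OpenCL involved; the helper names are made up for illustration). It shows how one dimension of an ND-range splits into work-groups when global_work_size is not divisible by local_work_size. On drivers without non-uniform work-group support, such a launch is rejected instead of getting a smaller last group.

```python
def work_group_sizes(global_work_size, local_work_size):
    """Sizes of the work-groups along one dimension of an ND-range."""
    if global_work_size <= 0 or local_work_size <= 0:
        raise ValueError("sizes must be positive")
    full_groups, remainder = divmod(global_work_size, local_work_size)
    sizes = [local_work_size] * full_groups
    if remainder:
        # The smaller trailing group is what makes the range non-uniform.
        sizes.append(remainder)
    return sizes


def is_uniform(global_work_size, local_work_size):
    """True when every work-group has the full local size."""
    return global_work_size % local_work_size == 0


print(work_group_sizes(1000, 256))
print(is_uniform(1000, 256), is_uniform(1024, 256))
```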

Today we decided to test NVIDIA’s latest driver (378.13) for 64-bit Linux and check its support for OpenCL 2.0.

NVIDIA, OpenCL 2.0 and Linux

Just like on Windows, our GTX 1080 reports that it is an OpenCL 1.2 device. This is understandable, since support for OpenCL 2.0 is only in the beta stage. In the following table you’ll find an overview of the 2.0 features supported by this Linux driver.

OpenCL 2.0 feature	Supported	Notes
SVM	Yes	Only coarse-grained SVM is supported. Fine-grained SVM (optional feature) is not.
Device side enqueue	Partially. Surprisingly, it works better than on Windows	Almost all OpenCL programs with device side queue that we tested work. Some advanced examples with multi-level device side kernel enqueuing and/or CLK_ENQUEUE_FLAGS_WAIT_WORK_GROUP fail. When using device side queue, it's only possible to use a 1D nd-range with uniform work-groups (or without specifying a local size). 2D and 3D nd-ranges don't work.
Work-group functions	Yes
Pipes	No	Pipe functions are defined in libOpenCL.so in the 378.13 drivers, but using them causes run-time errors.
Generic address space	Yes
Non-uniform work-groups	No
C11 Atomics	Partially	Using atomic_flag_* functions causes a CL_BUILD_ERROR error.
Subgroups extension	No

The host-side functions clSetKernelExecInfo(), clCreateSamplerWithProperties() and clCreateCommandQueueWithProperties() are also present and working.

As you can see, the support for OpenCL 2.0 on Linux is almost exactly the same as on Windows. But in contrast with the Windows-drivers, we were able to successfully compile and run several more kernels that use device side queue. It may indicate that this feature is being actively developed and maybe in future drivers it will work much better – for both Linux and Windows.

What you can do to make it better

As NVIDIA only adds new functionality to its OpenCL driver when requested, it is very important that they receive these requests. So if you or your employer is a paying customer, do keep requesting the features you need. Know that NVIDIA is aware that lacking required functionality will be bad for their sales.

All OpenCL SDKs now in our Knowledge Base

Posted by Vincent Hindriksen on 31 October 2012 with 3 Comments

For those who haven’t seen the latest addition to our knowledge base: we have added a list of (almost) all available OpenCL SDKs. You can find it in the menu under “Knowledge Base” -> “SDKs…“.

This list shows how important OpenCL is getting, as developers now can write compute-intensive parallel software on CPUs, GPUs, ARM-based accelerators and even FPGAs. This growth of OpenCL-devices is very exciting and important news, and that’s why it has got its own section on the site.

The current list is (in random order):

AMD GPUs & CPUs
ZiiLabs ARM Tablet
Altera FPGA board – available in Q2/Q3 2013
Adapteva Parallella board – available in Q2/Q3 2013
Intel CPUs
Samsung Exynos 5 board – available in December 2012
IBM POWER-processor

Currently looking into:

Intel Xeon Phi
Nintendo Wii U dev
Sony Playstation 4 Orbis
Vivante
Xilinx
NVidia GPUs
Qualcomm

The SDK of NVIDIA is on the second list, which you may not have expected. We have to wait until they release their official statement on what they are going to do with CUDA and OpenCL.

While you are there, also check the other parts of the Knowledge Base:

What is… -> Explanations of terminology. Put your requests in a comment.
Event&Talks -> A list of events which StreamHPC attends, gives talks at and helps organise. Interesting for both managers and engineers.
Self Study – The part of the site most visited after the blog. This is for the engineers who want to start learning programming GPUs.

This section will be updated and extended continuously with information not available anywhere else.

StreamHPC has been in the OpenCL business since 2010 as one of the few. We have been the most visible and known OpenCL-specialist ever since.

Double the performance on AMD Catalyst by tweaking subgroup operations

Posted by Jakub Szuppe on 22 March 2017

In the case of scan operations in standard OpenCL 2.0, less than half of the performance of AMD’s hardware was being used.

OpenCL 2.0 added several new built-in functions that operate on a work-group level. These include functions that work within sub-groups (also known as warps or wavefronts). The work-group functions perform basic parallel patterns for whole work-groups or sub-groups.

The most important ones are the reduce and scan operations. Those patterns have been used in a lot of OpenCL software and can now be implemented in a more straightforward way. The promise to developers was that vendors could now provide better performance using very little or no local memory. However, the promised performance wasn’t there from the beginning.
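For readers unfamiliar with the two patterns, the sketch below gives sequential reference semantics in plain Python for the add variants (what built-ins such as work_group_reduce_add, work_group_scan_inclusive_add and work_group_scan_exclusive_add compute across the work-items of a group). The Python function names are illustrative only; this shows what the result looks like, not how a driver implements it in parallel.

```python
from itertools import accumulate
import operator


def reduce_add(values):
    """Every work-item receives the sum of all values in the group."""
    return sum(values)


def scan_inclusive_add(values):
    """Work-item i receives values[0] + ... + values[i]."""
    return list(accumulate(values, operator.add))


def scan_exclusive_add(values):
    """Work-item i receives values[0] + ... + values[i-1] (0 for i == 0)."""
    inclusive = scan_inclusive_add(values)
    # For addition, the exclusive result is the inclusive one minus own value.
    return [total - v for total, v in zip(inclusive, values)]


group = [3, 1, 4, 1, 5]  # one value per work-item
print(reduce_add(group))
print(scan_inclusive_add(group))
print(scan_exclusive_add(group))
```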

Recently, at StreamHPC we worked on improving the performance of certain OpenCL kernels running specifically on AMD GPUs, where we needed OpenGL-interop and thus chose the Catalyst drivers. It turned out that work-group and sub-group functions did not give the expected performance on either Windows or Linux. Continue reading “Double the performance on AMD Catalyst by tweaking subgroup operations” →

Intel promotes OpenCL as THE heterogeneous compute solution

Posted by Vincent Hindriksen on 25 March 2014 with 4 Comments

At Intel they have CPUs (Xeon, Ivy Bridge), GPUs (Iris) and accelerators (Xeon Phi). OpenCL enables each processor to be used to the fullest and they now promote it as such. Watch the video below and see their view on why OpenCL makes a difference for Intel’s customers.

This is important, because until recently Intel was pushing OpenMP and its proprietary solutions more. I think it has something to do with the specialised processors that can be programmed with OpenCL, such as DSPs and FPGAs. Intel has always made generic processors that solve problems best for most. Customers of OpenCL happen to be the ones that could not be served with generic processors and preferred FPGAs and DSPs, before they tried GPUs. By showing that it can do OpenCL, Intel shows it is a trustworthy partner for a few years from now, when those problems can be handled by Intel processors.

Of course the Xeon Phi is also a good reason. The latest drivers have shown a huge improvement in performance, and that has increased Intel’s confidence in OpenCL for sure.

At StreamHPC we are very happy that Intel now openly promotes OpenCL and invests in it – this will increase trust in the programming language.

A small side-note. The differences between the Windows-drivers and Linux-drivers are somewhat vague: under Linux, the CPU is visible, but not supported officially. This makes development of multi-processor software not as straightforward as discussed in the video. Probably this will be more extensive in the future, as Intel only officially supports OpenCL on a processor when it’s very stable.

MediaTek’s partners deliver OpenCL on their phones

Posted by Vincent Hindriksen on 6 August 2015

Several Chinese phones bring OpenCL to millions of users, as MediaTek offers their drivers to all phone vendors who use their (recent) chipsets.

MediaTek said that you just need a phone with one of the chipsets below and you can run your OpenCL app, as they provide the driver stack with the hardware to their customers. I’ve added a few phone names, but there is no guarantee OpenCL drivers are actually there. So be on the safe side and don’t buy the cheapest phone, but a more respected Chinese brand. Contact us if you have a phone with one of these chipsets that doesn’t work – then I’ll contact MediaTek. Share your experience with the chipset in the comments.

In case you want to use the phone for actual use, be sure it supports your 4G frequencies. Also check this Gizchina article on the below chipsets. There are more MediaTek-chipsets that support OpenCL, but not openly – they prefer to focus on their latest 64-bit series.

Important note on conformance: MediaTek is an adopter and does conform for a few processors. Of the ones listed below, only MT6795 is certain to have official support. Continue reading “MediaTek’s partners deliver OpenCL on their phones” →

Adapteva Parallella board

A Parallella board that has been delivered: the 16-core Parallella board

[infobox type=”information”]

Need a Parallella programmer? Hire us!

[/infobox]

Adapteva is the creator of the Parallella developer boards – boards with an OpenCL-programmable grid processor and an FPGA. The $99 board consists of:

[list1]

A board with a 16-core Epiphany-III
Full Epiphany SDK
OpenCL compiler based on libcoprthr.
A few code-samples (total number unknown).

[/list1]

A few of the features:

[list1]

32 GFLOPS peak performance
2 Watt TDP (chip only, whole board unknown)
USB plug-and-play ready
Remote access support

[/list1]

Information on their OpenCL compiler stack can be found here. Latest news can be received via Twitter/ParallellaBoard, or via their community forums.

The 64-core Epiphany-IV chip has also been available since Q3 2014. More info later.

OpenCL mini buying guide for X86

Posted by Vincent Hindriksen on 17 January 2011 with 6 Comments

Developing with OpenCL is fun, if you like debugging. Having software with support for OpenCL is even more fun, because no debugging is needed. But what would be a good machine? Below is an overview of what kind of hardware you have to think about; it is not in-depth, but gives you enough information to make a decision in your local or online computer store.

Companies who want to build a cluster, contact us for information. Professional clusters need different hardware than described here.

Continue reading “OpenCL mini buying guide for X86” →

Let’s meet at ISC in Frankfurt

Posted by Vincent Hindriksen on 13 June 2016

Vincent Hindriksen will be walking around at ISC from 20 to 22 June. With me I bring our latest brochure, some examples of great optimisations and some Dutch delicacies. We will also have some exciting news with an important partner – stay tuned!

It will be a perfect time to discuss how StreamHPC can help you solve tough compute problems. Below is a regularly updated schedule of my time at ISC.

Get in contact to schedule a meeting.

If you’d like to talk technologies and bits&bytes, we’re trying to organise a get-together – date&time TBD.

VectorFabrics: 2014 will be parallel

Posted by Vincent Hindriksen on 9 January 2014

Toolmaker VectorFabrics sent 9 predictions for this year in their newsletter. I’d like to share them with you.

Nine predictions for 2014 that prove the programming landscape is changing

It is not hard to predict that this year will see a lot of activity around multicores and manycores. 2014 will be the year that software has to catch up with highly concurrent hardware. So we expect to see some major changes in how people view multicore programming:

Neither Intel, AMD nor Qualcomm releases any new single core processors in 2014. Therefore, it is less and less acceptable to release pure sequentially operating applications.

Intel releases a 15-core Xeon. On a typical 4-socket motherboard your OS sees 120 available cores. OpenMP is the preferred programming paradigm on such a platform for data-intensive shared-memory calculations. You have to deal with performance bottlenecks, including Amdahl’s law, cache performances and memory bandwidth issues.

Next to ARM big.LITTLE systems, 2014 sees the first true octacore cell phones and tablets. At that point, it becomes painfully clear that applications need changes to benefit from so many cores. Both true octacore and big.LITTLE processors see very little adoption in mobile devices as long as software that can benefit is missing.

At least one major mobile phone vendor loses market share because their hardware may be good, but the software (especially the web browser) cannot utilize the hardware to the max.

Both the XBox One and PlayStation 4 feature AMD Jaguar octacore processors with GP-GPUs. Very few games will use all the compute power, and customers will wonder why to upgrade, as the performance is not so different from that of their existing console.

Two great open standards, OpenACC and OpenMP see a nice boost in adoption thanks to upcoming support in the latest open source compilers. Clang 3.5 features OpenMP 4.0 support. In addition GCC 4.9 also receives OpenACC support.

In the mobile space, offloading to GP-GPUs is hot as new architectures from Qualcomm Adreno 420, Nvidia Tegra K1 and Imagination PowerVR Series 6 will each allow offloading. Managing and programming the offloading will remain a problem: OpenCL? OpenACC? CUDA? RenderScript?

Major desktop applications jump on the compute-offloading bandwagon to win performance, using either OpenCL or OpenACC

Programmability of the next-gen Intel Xeon Phi (Knights Landing) will raise some eyebrows. This 2015 chip will have 72 Atom Silvermont cores with local memory, cache and up to 384 GB shared memory. This is outside the comfort zone of most programmers.
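As an aside on the Amdahl’s law bottleneck mentioned in the Xeon prediction above, here is a minimal Python sketch of the formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelised and n the number of cores. The 95% figure is only an assumed example value, not something taken from the newsletter.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup according to Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed example: 95% of the work is parallel, 120 cores visible to the OS.
print(round(amdahl_speedup(0.95, 120), 2))
```

Even with 120 cores, the remaining 5% of serial work caps the speedup far below the core count, which is why the sequential parts of an application matter so much on such machines.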

You guessed right: their tool has to do with parallel programming.

What are your predictions for 2014?

Learning both OpenCL and CUDA

Posted by Vincent Hindriksen on 23 September 2010 with 2 Comments

Be sure to read Taking on OpenCL where I’ve put my latest insights – also for CUDA.

The two¹ “camps” OpenCL and CUDA both claim you should learn their language first, after which the other would be easy to learn. I’m from the OpenCL camp, so I say you should learn OpenCL first, but with a strong emphasis on understanding the hardware architecture. If I had chosen CUDA I would have said the opposite, so in other words it does not matter which you do first. But psychology tells us that you probably like the first language more, since that is where you discovered the magic; also, most people do not like to learn a second language which is much alike and does not add a real difference. Most programmers just want to get the job done and both camps know that. Be aware of that.

NVIDIA is very good at marketing its products; AMD has, to put it modestly, a lower budget for GPGPU marketing. As a programmer you should be aware of this difference.

The possibilities of OpenCL are larger than those of CUDA, because of task-parallel programming and support for far more different architectures. On the other hand, CUDA is much more user-friendly and has a lot of convenience built in.

Continue reading “Learning both OpenCL and CUDA” →

Differences from OpenCL 1.1 to 1.2

Posted by Vincent Hindriksen on 19 November 2011 with 3 Comments

This article will be of interest if you don’t want to read the whole new specifications [PDF] for OpenCL 1.2.

As always, feedback will be much appreciated.

After many meetings with the many members of the OpenCL task force, a lot of ideas sprouted. And every 17 or 18 months a new version of OpenCL comes out to give form to all these ideas. You can see totally new ideas coming up, as well as ideas a member has already brought out in another product. You can also see ideas not appearing at all, as other members voted against them. The last category is very interesting, and hopefully we’ll soon see a lot of forum discussion on what should be in the next version, as it is missing now.

With the release of 1.2 it was also announced that (at least) two task forces will be set up. One of them will target integration in high-level programming languages, which tells me that phase 1 of creating the standard is complete and we can expect to go for OpenCL 2.0. I will discuss these phases in a follow-up and what you as a user, programmer or customer, can expect… and how you can act on it.

Another big announcement was that Altera is starting to support OpenCL for an FPGA product. In another article I will let you know everything there is to know. For now, let’s concentrate on the actual differences in this version software-wise, and what you can do with it. I have added links to the 1.1 and 1.2 man-pages, so you can look things up.

Continue reading “Differences from OpenCL 1.1 to 1.2” →

Sheets GPGPU-day 2012 online

Posted by Vincent Hindriksen on 11 June 2013

Photos by Cyrille Favreau

Better late than never. It has almost been a year, but finally they’re online: the sheets of the GPGPU-day Amsterdam 2012.

You can find the sheets at http://www.platformparallel.com/nl/gpgpu-day-2012/abstracts/ – don’t hotlink the files, but link to this page. The abstracts should introduce the sheets, but if you need more info just ask them in the comments here.

PDFs from two unmentioned talks:

Peter Michielse – Welcome and an introduction to GPGPU – GPGPUday12
What GPUs have brought in just a few sheets. This was the welcome by main sponsor SURFsara.
Gert-Jan van den Braak – Introduction to GPGPU and GPU-architectures – GPGPUday12
An introduction to GPGPU and CUDA.

I hope you enjoy the sheets. On 20 June the second edition will take place – see you there!