OpenCL SPIR by example

Posted by Vincent Hindriksen on 27 December 2013 with 1 Comment

OpenCL SPIR (Standard Portable Intermediate Representation) is an intermediate representation for OpenCL-code, comparable to LLVM IL and HSAIL. It is a search for what would be a good representation, such that parallel software runs well on all kinds of accelerators. LLVM IL is too general, but SPIR is a subset of it. I’ll discuss HSAIL, on where it differs from SPIR – I thought SPIR was a better way to start introducing these. In my next article I’d like to give you an overview of the whole ecosphere around OpenCL (including SPIR and HSAIL), to give you an understanding what it all means and where we’re going to, and why.

Know that the new SPIR-V is something completely different in implementation, and we are only discussing the old SPIR here.

Contributors for the SPIR specifications are: Intel, AMD, Altera, ARM, Apple, Broadcom, Codeplay, Nvidia, Qualcomm and Xilinx. Boaz Ouriel of Intel is the pen-holder of the specifications and to no surprise Intel has had the first SPIR-compiler. I am happy to see Nvidia is in the committee too, and hope they don’t just take ideas for CUDA from this collaboration but finally join. Broadcom and Xilinx are new, so we can expect stuff from them.

For now, just see what SPIR is – as it can help us understand how the compiler work and write better OpenCL code. I used Intel’s offline OpenCL compiler for compiling the below kernel to SPIR can be done on the command line with: ioc64 -cmd=build -input=sum.cl -llvm-spir32=sum.ll (you need an Intel CPU to use the compiler).

[raw]

__kernel void sum(const int size, __global float * vec1, __global float * vec2){
  int ii = get_global_id(0);

  if(ii < size) vec2[ii] += vec1[ii];

}

[/raw]

There are two variations for generating SPIR-code: binary SPIR, LLVM-SPIR (both in 32 and 64 bit versions). As you might understand, the binary form is not really readable, but SPIR described in the LLVM IL language luckily is. Run ioc64 without parameters to see more options (Assembly, pure LLVM, Intermediate Binary).

Continue reading →

WebCL Widget for WordPress

Posted by Vincent Hindriksen on 5 June 2013

See the widget at the right showing if your browser+computer supports WebCL?

It is available under the GPL 2.0 license and based on code from WebCL@NokiaResearch (thanks guys for your great Firefox-plugin!)

Download from WordPress.org and unzip in /wp-content/plugins/. Or (better), search for a new plugin: “WebCL”. Feedback can be given in the comments.

I’d like to get your feedback what features you would like to see in the next version.

Continue reading “WebCL Widget for WordPress” →

AMD gDEBugger 6.2 for Linux

Posted by Vincent Hindriksen on 19 May 2012 with 1 Comment

The printf-funtion in kernels isn’t the solution to everything, so hence profilers and debuggers specially tailored for GPU-programming. On Windows there is a lot of choice, but mostly only if you have a paid version of Visual Studio. On Linux you have GDB, but that program is not really user-friendly for the GUI-lovers.

For AMD there is now gDEBugger again available for Linux. Again, as version 5.8 by Gremedy worked with Linux, after AMD bought the company it got Windows-only for version 6. A few weeks ago, 10 months after 6.0, Linux-binaries got back with version 6.2. It supports OpenCL 1.2, OpenGL 3.2 and quite some extensions. As only AMD is supported, later more on debugging OpenCL-applications on NVidia and Intel.

Installation is quite straightforward. For creating a menu-item, you’ll find an useful image in /opt/gDEBugger6.2.xxx/tutorial/images/.

Continue reading “AMD gDEBugger 6.2 for Linux” →

PDFs of Monday 19 September

Posted by Vincent Hindriksen on 19 September 2011

Already the fourth PDF-Monday. It takes quite some time, so I might keep it to 10 in the future – but till then enjoy! Not sure which to read? Pick the first one (for the rest there is not order).

Edit: and the last one, follow me on twitter to see the PDFs I’m reading. Reason is that hardly anyone clicked on the links to the PDFs.

I would like if you let others know in the comments which PDF you liked a lot.

Adding Physics to Animated Characters with Oriented Particles (Matthias Müller and Nuttapong Chentanez). Discusses how to accelerate movements of pieces of cloth attached to the bodies. Not time to read? There are nice pictures.

John F. Peddy’s analysis on the GPU market.

Hardware/Software Co-Design. Simple Solution to the Matrix Multiplication Problem using CUDA.

CUDA Based Algorithms for Simulating Cardiac Excitation Waves in a Rabbit Ventricle. Bioinformatics.

Real-time implementation of Bayesian models for multimodal perception using CUDA.

GPU performance prediction using parametrized models (Master-thesis by Andreas Resios)

A Parallel Ray Tracing Architecture Suitable for Application-Specific Hardware and GPGPU Implementations (Alexandre S. Nery, Nadia Nedjah, Felipe M.G. Franca, Lech Jozwiak)

Rapid Geocoding of Satellite SAR Images with Refined RPC Model. An ESA-presentation by Lu Zhang, Timo Balz and Mingsheng Liao.

A Parallel Algorithm for Flight Route Planning with CUDA (Master-thesis by Seçkîn Sanci). About the travelling salesman problem and much more.

Color-based High-Speed Recognition of Prints on Extruded Materials. Product-presentation on how to OCR printed text on cables.

Supplementary File of Sequence Homology Search using Fine-Grained Cycle Sharing of Idle GPUs (Fumihiko Ino, Yuma Munekawa, and Kenichi Hagihara). They sped up the BOINC-system (Folding@Home). Bit vague what they want to tell, but maybe you find it interesting.

Parallel Position Weight Matrices Algorithms (Mathieu Giraud, Jean-Stéphane Varré). Bioinformatics, DNA.

GPU-based High Performance Wave Propagation Simulation of Ischemia in Anatomically Detailed Ventricle (Lei Zhang, Changqing Gai, Kuanquan Wang, Weigang Lu, Wangmeng Zuo). Computation in medicine. Ischemia is a restriction in blood supply, generally due to factors in the blood vessels, with resultant damage or dysfunction of tissue

Per-Face Texture Mapping for Realtime Rendering. A Siggraph2011 presentation by Disney and NVidia.

Introduction to Parallel Computing. The CUDA 101 by Victor Eijkhout of University of Texas.

Optimization on the Power Efficiency of GPU and Multicore Processing Element for SIMD Computing. Presentation on what you find out when putting the volt-meter directly on the GPU.

NUDA: Programming Graphics Processors with Extensible Languages. Presentation on NUDA to write less code for GPGPU.

Qt FRAMEWORK: An introduction to a cross platform application and user interface framework. Presentation on the Qt-platform – which has great #OpenCL-support.

Data Assimilation on future computer architectures. The problems projected for 2020.

Current Status of Standards for Augmented Reality (Christine Perey1, Timo Engelke and Carl Reed). not much to do with OpenCL, but tells an interesting purpose for it.

Parallel Computations of Vortex Core Structures in Superconductors (Master-thesis by Niclas E. Wennerdal).

Program the SAME Here and Over There: Data Parallel Programming Models and Intel Many Integrated Core Architecture. Presentation on how to program the Intel MIC.

Large-Scale Chemical Informatics on GPUs (Imran S. Haque, Vijay S. Pande). Book-chapter on the design and optimization of GPU implementations of two popular chemical similarity techniques: Gaussian shape overlay (GSO) and LINGO.

WebGL, WebCL and Beyond! A presentation by Neil Trevett of NVidia/Khronos.

Biomanycores, open-source parallel code for many-core bioinformatics (Mathieu Giraud, Stéphane Janot, Jean-Frédéric Berthelot, Charles Delte, Laetitia Jourdan , Dominique Lavenier , Hélène Touzet, Jean-Stéphane Varré). A short description on the project http://www.biomanycores.org.

StreamHPC communications

StreamHPC on tour

We ported GROMACS from CUDA to OpenCL

Posted by Vincent Hindriksen on 1 November 2014 with 9 Comments

GROMACS does soft matter simulations on molecular scale. Let it fly.

GROMACS is an important molecular simulation kit, which can do all kinds of “soft matter” simulations like nanotubes, polymer chemistry, zeolites, adsorption studies, proteins, etc. It is being used by researches worldwide and is one of the bigger bio-informatics softwares around.

To speed up the computations, GPUs can be used. The big problem is that only NVIDIA GPU could be used, as CUDA was used. To make it possible to use other accelerators, we ported it to OpenCL. It took several months with a small team to get to the alpha-release, and now I’m happy to present it to you.

For who knows us from consultancy (and training) only, might have noticed. This is our first product!

We promised to keep it under the same open source license and that effectively means we are giving it away for free. Below I’ll explain how to obtain the sources and how to build it, but first I’d like to explain why we did it pro bono.

Why we did it

Indeed, we did not get any money (income or funds) for this. There have been several reasons, of which the below four are the most important.

The first reason is that we want to show what we can. Each project was under NDA and we could not demo anything we made for a customer. We chose for a CUDA package to port to OpenCL, as we notice that there is a trend to port CUDA-software to OpenCL (i.e. Adobe software).
The second reason is that bio-informatics is an interesting industry, where we would like to do more work.
Third reason is that we can find new employees. Joining the project is a way to get noticed and could end up in a job-proposal. The GROMACS project is big and needs unique background knowledge, so it can easily overwhelm people. This makes it perfect software to test out who is smart enough to handle such complexity.
Fourth is gaining experience with handling open source projects and distributed teams.

Therefore I think it’s a very good investment, while giving something (back) to the community.

Presentation of lessons learned during SC14

We just jumped in and went for it. We learned a lot, because it did not go as we expected. All this experience, we would like to share on SuperComputing 2014.

During SC14 I will give a presentation on the OpenCL port of GROMACS and the lessons learned. As AMD was quite happy with this port, they provided me a place to talk about the project:

“Porting GROMACS to OpenCL. Lessons learned”
SC14, New Orleans, AMD’s mini-theatre.
19 November, 15:00 (3:00 pm), 25 minutes

The SC14 demo will be available on the AMD booth the whole week, so if you’re curious and want to see it live with explanation.

If you’d like to talk in person, please send an email to make an appointment for SC14.

Getting the sources and build

It still has rough edges, so a better description would be “we are currently porting GROMACS to OpenCL”, but we’re very close.

As it is work in progress, no binaries are available. So besides knowledge of C, C++ and Cmake, you also need to know how to work with GIT. It builds on both Windows and Linux, and NVIDIA and AMD GPUs are the target platforms for the current phase.

The project is waiting for you on https://github.com/StreamHPC/gromacs.

The wiki has lots of information, from how to build, supported devices to the project planning. Please RTFM, before starting! If something is missing on the wiki, please let us know by simply reporting a new issue.

Help us with the GROMACS OpenCL port

We would like to invite you to join, so we can make the port better than the original. There are several reasons to join:

Improve your OpenCL skills. What really applies to the project is this quote:

Tell me and I forget.
Teach me and I remember.
Involve me and I learn.
Make the OpenCL ecosphere better. Every product that has OpenCL support, gives choice to the user what GPU to use (NVIDIA, AMD or Intel)
Make GROMACS better. It is already a large community and OpenCL-knowledge is needed now.
Get hired by StreamHPC. You’ll be working with us directly, so you’ll get to know our team.

What can you do? There is much you can do. Once you managed to build and run it, look at the bug reports. First focus is to get the failing kernels working – this is top priority to finalise phase 1. After that, the real fun begins in phase 2: add features and optimise for speed on specific devices. Since AMD FirePro is much better in double precision than Nvidia Tesla, it would be interesting to add support for double precision. Also certain parts of the code is done on the CPU, which have real potential to be ported to the GPU.

If things are not clear and obstruct you from starting, don’t get stressed and send an email with any question you have. We’re awaiting your merge request or issue report!

Special thanks

This project wasn’t possible without the help of many people. I’d like to thank them now.

The GROMACS team in Sweden, from the KTH Royal Institute of Technology.
- Szilárd Páll. A highly skilled GPU engineer and PhD student, who pro-actively keeps helping us.
- Mark Abraham. The GROMACS development manager, always quickly answering our various questions and helping us where he could.
- Berk Hess. Who helped answering the harder questions and feeding the discussions.
Anca Hamuraru, the team lead. Works at StreamHPC since June, and helped structure the project with much enthusiasm.
Dimitrios Karkoulis. Has been volunteering on the project since the start in his free time. So special thanks to Dimitrios!
Teemu Virolainen. Works at StreamHPC since October and has shown to be an expert on low-level optimisations.
Our contacts at AMD, for helping us tackle several obstacles. Special thanks go to Benjamin Coquelle, who checked out the project to reproduce problems.
Michael Papili, for helping us with designing a demo for SC14.
Octavian Fulger from Romanian gaming-site wasd.ro, for providing us with hardware for evaluation.

Without these people, the OpenCL port would never been here. Thank you.

Meet us in April

Posted by Vincent Hindriksen on 4 April 2016

9017503_m The coming month we’re travelling every week. This generates are a lot of opportunities where you can meet the StreamHPC team! For appointments, send an email to contact@streamhpc.com.

Meet us at ParallelCon (6 April 2016, Heidelberg, Germany). Besides the crash course (see below), we also have a talk on Vulkan.
Crash Course OpenCL @ ParallelCon (8 April 2016, Heidelberg, Germany). This is part of the conference – you can still buy tickets!
Meet us in Toronto (11 April 2016, Toronto, Canada). In Toronto for business, with time for appointments.
Meet us at IWOCL (19 April 2016, Vienna, Austria). The event-of-the-year for all OpenCL. So ofcourse we’re there.
Meet us in Grenoble (25 April 2016, Grenoble, France). For a training we’re there the whole week. On Thursday and Friday there is time for appointments.

We’re happy to talk business and about technology. Also giving presentations at your company is an option.

This information was previously communicated via the newsletter and on LinkedIn.

Intel’s answer to AMD and NVIDIA: the XEON Phi 5110P

Posted by Vincent Hindriksen on 12 November 2012 with 3 Comments

NOTE: there are many contradicting sources out there, so there are mistakes in this article. Please give me feedback via twitter, mail or comments, so all the info can be completed.

Yes, another post in the answer-to series. At SC12 Intel tries to steal away the show from the Tesla K20 and FirePro S10000.

After two years of waiting Intel finally comes with an accelerator-card: the Xeon Phi. Compare it if NVIDIA would have skipped the GTX 200 series and now has presented the GTX 500 series. Or maybe even the GTX 600 series – we cannot tell yet.

The Phi is not a compute-card as we know it. As you cannot do a 1-to-1 comparison between AMD GCN architecture and NVIDIA Kepler, neither can be easily compared to the Phi. But this article should give an idea on where it is positioned.

Continue reading “Intel’s answer to AMD and NVIDIA: the XEON Phi 5110P” →

SC14 Workshop Proposals due 7 February 2014

Posted by Vincent Hindriksen on 17 December 2013

Just to let you know that there should be even more OpenCL and related technologies on SC14

Are you interested in hosting a workshop at SC14?

Please mark your calendars as SC will be accepting proposals from 1 January – 7 February for independently planned full-, half-, or multi-day workshops.

Workshops complement the overall SC technical program. The goal is to expand the knowledge base of practitioners and researchers in a particular subject area providing a focused, in-depth venue for presentations, discussion and interaction. workshop proposals will be peer-reviewed academically with a focus on submissions that wil inspire deep and interactive dialogue in topics of interest to the HPC community.

For more information, please consult: http://sc14.supercomputing.org/program/workshops

Important Submission Info

Web Submissions Open: 1 January 2014
Submission Deadline: 7 February 2014
Notification of acceptance: 28 March 2014

Submit proposals via: https://submissions.supercomputing.org/
Questions: workshops@info.supercomputing.org

We’re thinking of proposing one too. Want to collaborate? Mail us! Don’t forget, to go to HiPEAC (20 January) and IWOCL (12 May) to meet us!

All the members of the OpenCL working group 2010

Posted by Vincent Hindriksen on 11 April 2010

(If you’re searching for companies who offer OpenCL-products and services, please visit OpenCL:Pro)

You probably have heard AMD is on the OpenCL working group of Khronos; but there are many more and they possibly all have plans to use it. Here is an overview, so you can make your own conclusions about the future that lays ahead. Is your company on “the list”?

We’re specially interested in the less known companies, so most information is about the companies you and us possibly have not heard from before. We’ve made assumptions what the companies use OpenCL for, so we need your feedback if you think we’re wrong! Most of these companies have not openly written about their (future) accelerated products, so we had to make those guesses.

Disclaimer: All brand and product names are or may be trademarks of, and are used to identify products or services of, their respective owners.

Last updated 6-Oct-2010.

GPU Manufacturers

GPUs being the first products targeted by OpenCL, we blast away with a list of CPU-manufacturers. You might see some unknown companies and now know which companies missed the train; it is pretty clear why GPU-manufacturers have interest in OpenCL.
We skip the companies who have a GPU-stack built upon ARM-techology and only focus on pure GPU-manufacturers in this category.

AMD

We’ve already discussed the biggest fan of OpenCL several times. While having better GPU-cards than NVIDIA (arguable per quarter of the year), they put their bets completely on OpenCL. They even get credits like “AMD’s OpenCL” when compared with NVIDIA’s CUDA.

The end of 2010, beginning of 2011 they will ship their Fusion-product having a CPU and GPU on one chip. The first Fusion-chips will not have a high-end GPU because of heating problems, is told to PC-store employees.

NVIDIA

AMD’s biggest competitor with the very well marketed similar product CUDA. Currently they have the most specialised products in market for servers. While they put more energy in their own technology CUDA, it must be said that they have adopted OpenCL more than any other hardware vendor.

Intel

The biggest part of the CPU-market is for Intel en guess once, who has the biggest GPU-market in hands? Correct: onboard-GPUs are Intel’s speciality, but their high-end GPU Larrabee might once see the market. Just like AMD they have the technology (and products) to have an integrated CPU/GPU which will be very interesting for the upcoming OpenCL-market.

They are openly interested in OpenCL. Here is a nice interview which explains how a CPU-designer looks at GPU-designs.

Vivante

Vivante manufactures GPU-chips. They claim their OpenGL ES 2.0-compliant silicon footprint is the smallest on the market. There is a lot of talk about OpenGL Shader Language (OpenCL’s grandpa), for which their products are very well suited for. Quote: “The recent trend in graphics hardware has been to replace fixed functionality with programmability in areas that have grown exceedingly complex, such as vertex processing and fragment processing. The OpenGL® Shading Language was designed to allow application programmers to express the processing that occurs at those programmable points of the OpenGL pipeline. Independently compilable units written in this language are called shaders. A program is a set of shaders that are compiled and linked together.”

Takumi

Japanese corporation Takumi manufactures the GSHARK, a 2D/3D hardware accelerator. The focus is on shaders, like Vivante.

Imagination Technologies (ImTech)

From their homepage: >>POWERVR enables a powerful and flexible solution for all forms of multimedia processing, including 3D/2D/vector graphics and general purpose processing (GP-GPU) including image processing.

POWERVR’s unique tile-based, deferred rendering/shading architecture allows a very small area of a die to deliver higher performance and image quality at lower power consumption than all competing technologies. All major APIs are supported including OpenGL ES 2.0/1.1, OpenVG 1.1, OpenGL 2.0/3.0 and DirectX9/10.1 and OpenCL.<<

Currently all ARM-based OpenCL-capable devices have POWERVR-technology.

Toshiba

Like other huge Japanese everything-factories, you don’t know what else they make. Besides rice cookers they also make multimedia chips.

S3

Once they were big in the consumer-market of graphics cards, but S3 still exists as a more business-oriented manufacturer of graphics products.

CPU Manufacturers

We miss the Power Architecture, but IBM and Freescale are members of this group.

Intel

While AMD tries to make OpenCL available for the CPU, we have not heard of a similar product from Intel yet. They see a future for multi-core CPUs, as seen in these slides.

ARM

Most known for its same-named low-power processor, not supported by MS Windows. You can read below how many companies have a license on their technology. Together with POWERVR-technology they power all the embedded OpenCL devices of the coming year.

IBM

Currently they are most known for their Cell-processor (co-developed with Toshiba and Sony) and have a license to build PowerArchitecture-CPUs. The Cell has full OpenCL-support as first non-GPU. Older types of PS3s (without the latest firmware) ad IBM’s servers can use the power of OpenCL. End of June 2010 Khronos conformed their “Development Kit for Linux” for Power VMX and PowerXCell8i processors.

Freescale

Once a Motorola-division, they make lots of different CPUs. Besides ARM- and PowerArchitecure-based ones, they also have it’s own ‘Coldfire’. We cannot say for which architecture they are interested in OpenCL, but we really would like to hear something from them since they can open many markets for OpenCL.

Systems on a Chip (SoC)

While it is cool to have a GPU-card in your pc, more and more the Graphics-functionality is integrated onto a CPU. Especially in the mobile/embedded/gadget-market you’ll find such System-on-a-Chip solutions, which are actually all ARM- or PowerArchitecture based.

3DLABS (ZiiLabs)

Creators of embedded hardware with focus on handhelds. They have partners of Khronos for a long time, having built the first merchant OpenGL GPU, the GLINT 300SX. They have just released a multimedia-processor, which is an ARM-processor with pretty interesting graphic capabilities.

They have an “early access program for OpenCL” for their ZMS product line.

Movidia

On their Technology overview-page they imply they have flexible accelerators in their designs, which *could* in the future be controlled by OpenCL-kernels. They manufacture mobile GPUs-plus-loads-of-extras which are quite impressive.

Texas Instruments

Besides ARM-based processors they also have DSPs. We watch them, for which product they have OpenCL in mind.

Qualcomm

They might be most famous for their ARM-based Snapdragon-chipset. They have much more products, but we think they start with Snapdragon before building OpenCL in other products.

Apple

The Apple A4 powers their new products, the iPad. It becomes more and more clear Apple has really learned that you cannot rely on one supplier, after waiting for IBM’s G6. With OpenCL Apple can now make software that works on ARM, all kind of GPUs and CPUs.

Samsung

They make anything that is fed by batteries, so for that reason they should be in the “other” category: mobile phones, mp3-players, photo-cameras, camcorders, laptops, TVs, DVD-players and Bluray-players. All products where OpenCL can wield.

A good reason to make their own semi-conductors, ARM-based.

In the beginning of June 2010 they have launched their own Linux-based OS for mobiles: Bada.

Broadcom

Manufactures networking and communications ICs for data, voice, and video applications. They could use OpenCL for their mobile multimedia processors.

Seaweed

Since September acquired by Presagis. We cannot be sure they continue the OpenCL-business of Seaweed, but at least GPGPU is mentioned once.

Presagis is “the worldwide leader in embedded graphics solutions for mission-critical display applications. The company has provided human-machine interface (HMI) graphical modeling tools, drivers and devices for embedded systems for over 20 years. Presagis pioneered both the prototyping of display graphics and automatic code generation for embedded systems in the 1990s. Since then, code generated by its flagship HMI modeling products has been deployed to hundreds of aircraft worldwide and its software has been certified on over 30 major aircraft programs worldwide. Presagis is your trusted partner for reliable, high-performance embedded graphics products and services.”

ST Microelectronics

ST has many products: “Singapore Technologies Electronics is a leader in ICT. It has main businesses in Enterprise, Satellite Communications and Interactive Digital Media. It is divided into several Strategic Business Units consisting of Info-Comms, Info-Software, Training and Simulation, Electro-Optics, Large Scale Group, Satcom & Sensor Systems.”

We think they’ve shown interest for OpenCL for use with their Imaging processors. Together with Ericsson they have a joint-venture in de mobile market, ST-Ericsson.

Handheld Manufacturers

While most companies will find it hard to make OpenCL-business in the consumer-market, consumer-products of other companies make sales a little bit warmer.

Apple

At least the iPad and iPhone have hardware-capabilities of running OpenCL. It is expected that it will come available in the next major release of the iPhone-OS, iOS 4. We’re waiting for more news.

Nokia

The largest manufacturer of mobile phones from Finland has a lot of technology. Besides smartphones, possibly a netbook (in cooperation with Intel) they also have Symbian and the QT-library. Since a while QT has support for OpenCL. We think the support of OpenCL in programming languages (in a more high-level way) is very important. See these slides to read some insights of the company.

Motorola

They have consumer products like mobile phones and business products like networking. It is not clear where they are going to use OpenCL for, since they mostly use other companies’ technologies.

Super-computers

While OpenCL can revive old computers once upgraded with a new GPU, imagine what they can do with Super-computers.

IBM

IBM builds super-computers based on different technologies. With OpenCL-support for their Power VMX and PowerXCell8i processors, it is already possible to use OpenCL with IBM-hardware.

Fujitsu

They have many products, but they also make super-computers which use GPGPU.

Los Alamos National Laboratory

They build super-computers and really can use the extra power.

A job-post talks about heterogeneous architectures and OpenCL.

Petapath

Petapath, founded in 2008, focuses on delivering innovative hardware and software solutions into the high performance computing (HPC) and embedded markets. As can be seen from their homepage they build grids.

NVIDIA

As a newcomer in the super-computer business, they do very well having helped to build the #2 HPC. Many clusters are upgraded with their streaming-processors.

Other Hardware

We don’t know what they are actually doing with the technology, purely because they are to big to make assumptions.

GE

US-based electronics-giant General Electronics builds everything there is, fed by electricity and now also GPGPU-powered solutions as can be found on their GPGPU-page. They probably switched to CUDA.

ST-Ericsson

Ericsson together with ST they have a joint-venture in de mobile market, ST-Ericsson. Ericssson is big in (mobile) networking. It also builds mobile phones with Sony. It is unclear what the joint-venture wants to do with the technology, but it must be mobile.

Software Developers

While OpenCL is very close to hardware, we have to talk software too. Did anybody say there is a strict line between hardware and software?

Graphic Remedy

Builders of debugging software. You will hear later more from us about this company soon. See something about debugging in this presentation.

RapidMind

RapidMind provided a software product that aims to make it simpler for software developers to target multi-core processors and accelerators (GPUs). It was acquired by Intel in august 2009.

HI

Japanese corporation HI has a product MascotCapsule, which is a real-time 3D rendering engine (native library) that runs on embedded devices. We see names of other companies, except SMedia. If you’re not familiar with mobile GPUs, here you have a list.

This is another big hint, OpenCL will have a big future on mobile devices.

MascotCapsule V4 product specification

Operating environment	CPU	ARM: ARM9 or above Freescale: i.MX Series Marvell: XScale Qualcomm: MSM6280/6550/7200/7500 etc. Renesas Technology: SH-Mobile etc. Texas Instruments: OMAP 32-bit 150 MHz or above is recommended (Capable of running without a floating-point hardware)
	Code size	Approx. 200 KB
	Engine work area	2 MB or more is recommended, including data load area Note: The actual required work area varies depending on the content
3D hardware accelerator		ATI: Imageon Imagination Technologies: PowerVR MBX/MBX Lite/SGX NVIDIA: GoForce SMedia: Glamo TAKUMI: GSHARK Toshiba: T4G/T5G Other OpenGL ES compliant 3D accelerators
OS/platforms		BREW, iPhone, iPod touch, ITRON, Java, Linux, Symbian OS, Windows CE, Windows Mobile
3D authoring tools		3ds Max 9.0/2008/2009/2010 Maya 8.5/2008/2009/2010 LightWave3D 7.5 or later SOFTIMAGE\|XSI 5.x/6.x/7.0

Codeplay

They are most famous for their compilers for the Playstation. They also make code-analysis software.

QNX

From their homepage: “Middleware, development tools, realtime operating systemsoftware and services for superior embedded design”. Their real-time OS in all kinds of embedded products and they might want to see ways to support specialised low-power chips.

RIM acquired QNX in april 2010.

Fixstars

Newcomer in the list 2010. Famous for their PS3-Linux and for their OpenCL-book. They also have FOXC, Fixstars OpenCL Cross Compiler. They have written one of the few books for OpenCL.

Kestrel Institute

http://www.kestrel.edu/ does not show anything GPGPU. We’ll probably hear from them when the next version of their Specware-product is finished.

Game Designers

Physics-calculations and AI are too demanding to do on a CPU. The game-industry keeps pushing the GPU-industry, but now on a different way than in the 90’s.

Electronic Arts

This game-studio builds loads and loads of games with impressive AI. See these slides to see what EA thinks GPGPU can do.

Activision Blizzard

Yes, they are one company now, so now they are together famous for best-selling hit “World of Warcraft”. Currently not much is known where they use OpenCL for, but probably the same as EA.

Thank you for your interest in this article

If you know more about OpenCL at these companies or job-posts, please let us know via comment or via e-mail.

We’ve made some assumptions about what these companies use OpenCL for – we need your feedback!

Let’s enter the Top500 HPC list using GPUs

Posted by Vincent Hindriksen on 6 June 2010 with 1 Comment

The #500 super-computer has only 24 TFlops (2010-06-06): http://www.top500.org/system/9677

update: scroll down to see the best configuration I have found. In other words: a cluster with at least 30 nodes with 4 high-end GPUs each (costing almost €2000,- per node and giving roughly 5 TFlops single precision, 1 TFLOPS double precision) would enter the Top500. 25 nodes to get to a theoretic 25TFlops and 5 extra for overcoming the overhead. So for about €60 000,- of hardware anyone can be on the list (and add at least €13 000 if you want to use Windows instead of Linux for some reason). Ok, you pay most for the services and actual building when buying such a cluster, but you get the idea it does not cost you a few millions any more. I’m curious: who is building these kind of clusters? Could you tell me the specs (theoretical TFlops, LinPack TFlops and watts/TFlop) of your (theoretical) cluster, which costs the customer less then €100 000,- in total? Or do you know companies who can do this? I’ll make a list of companies who will be building the clusters of tomorrow, the “Top €100.000,- HPC cluster list”. You can mail me via vincent [at] this domain, or put your answer in a comment.

Update: the hardware shopping-list

Nobody told in the remarks it is easy to build a faster machine than the one described above. So I’ll do it. We want the most flops per box, so here’s the wishlist:

A motherboard with as many slots as possible for PCI-E, CPU-sockets and memory-banks. This because the lag between the nodes is high.
A CPU with at least 4 cores.
Focus on the bandwidth, else we will not be able to use all power.
Focus on price per GFLOPS.

The following is what I found in local computer stores (which for some reason people there love to talk about extreme machines). AMD currently has the graphics cards with the most double precision power, so I chose for their products. I’m looking around for Intel + Nvidia, but currently they are far behind. Is AMD back on stage after being beaten by Intel’s Core-products for so many years?

The GigaByte GA-890FXA-UD7 (€245,-) has 1 AM3-socket, 6(!) PCI-e slots and supports up to 16GB of memory. We want some power, so we use the AMD Phenom II X6 1090T (€289,-), which I chose for the 6 cores and the low price per FLOPS. And to make it a monster, we add 6 times a AMD HD5970 (€599,-) giving 928 x 6 = 3264 DP-GLOPS. If it can handle 16GB DDR3 (€750,-), so we put it in. It needs about 3 Power-supplies of 700 Watt (€100,-). We add 128GB SSD (€350,-) for working data and a big 2 TB HDD (€100,-). Case needs to house the 3 power supplies (€100,-). Cooling is important and I suggest you compete with a wind-tunnel (€500,-). It will cost you €6228,- for 5,6 Double Precision TFLOPS, and 27 TFLOPS single precision. A cluster would be on the HPC500-list for around €38000,- (pure hardware-price, not taking network-devices too much into account, nor the price for man-hours).

Disclaimer: this is the price of a single node, excluding services, maintenance, software-installation, networking, engineering, etc. Please note that the above price is pure for building a single node for yourself, if you have the knowledge to do so.

X86 Systems-on-a-Chip and GPGPU

Posted by Vincent Hindriksen on 4 May 2010 with 3 Comments

The System-on-a-chip (SoC) for X86 will be a revolution for GPGPU. Why? Because currently a big problem is transferring data from CPU-memory to GPU-memory and back, which will be solved with SoCs. Below you can read this architecture-target is very possible.

With AMD+ATI, Intel and its future high-end GPUs, and NVidia with the rumours around its X86-chips, we will certainly get changes in the field. If it is the way to go, what is probable?

Get both CPU and high-end GPU on 1 chip, separated memory
Techniques for sharing memory
Translating OpenCL from and to C on the fly

ARM-processors are combined with GPUs a lot of times, but they don’t have current support for a common shader-languages (read: OpenCL) to make GPGPU in reach. We’ve asked ourselves many times why ARM & friends are involved in OpenCL since the beginning, but still don’t have any public and promoted driver-support. More on ARM, once there is more news on multi-core ARM-CPUs or OpenCL drivers.

1: One chip for everything

The biggest problem with split CPU/GPU-functionality is the bus-speed between the two is limited. The higher this speed, the more useful GPGPU can be. The highest speeds are possible when the signal does not have to leave the chip and there are no concessions made to the architecture of the graphics-card, in other words: glueing CPU and GPU together, but leave the memory-buses the same.

Currently there is Intel’s Nehalem and AMD’s Fusion, but they use DDR3 for both GPU and CPU; this will not really unlock the GPGPU-possibilities of high-end GPUs. It seems these products were designed with lower costs in mind.

But the chances high-end GPUs will be integrated on the CPU is rising. Going to 32nm gives room for more functionality, such as GPUs. Other choices can be smaller chips, more cores and integrating functionality of the north/south-bridge of the motherboard. If GPU-cores can be turned off when not working optimally when being tested in the factory (just like they do with mult-core CPUs), integrating high-end GPU-cores will even become a save choice.

Another way it could go is using optical buses between the GPU and CPU. It’s unknown if it will really see mainstream markets soon enough.

2: Shared memory – new style

Some levels of cache and all memory should be easy accessible by both types of cores. Why? Because eventually you want to switch between CPU- and GPU-instructions continuously. CUDA has a nice feature already, which keeps objects synchronised between CPU and GPU; one step further is leaving out the need of synchronising.

The problem is that video-memory is accessed more parallel to provide higher data-speeds (GDDR5), so we don’t want to limit the GPU by attaching them to slower (=lower bandwidth) DDR3. Doing it the other way would then be the best solution: giving CPUs direct access to GDDR. There is always a probable option that a new type of (replaceable) memory will be used, which has a dual-bus by design.

The hard part is memory-protection; since now more devices get control to memory, the overhead of controlling/arranging the spots can increase enormously and might need a separate core for it – just like the Cell-processor. This need-for-control is a reason I don’t expect access to each other memory before there will be a fast bus between GPU and CPU, since then the access to GDDR via the GPU’s memory-manager will be much faster and maybe even fast enough.

3: Grown up software

If software would be able to easily select devices and use the same code for each device, then we’ve made a giant step forwards. Software has always been one step behind hardware; so when you do not develop such techniques, you just have to wait a while.

Translating OpenCL into normal C and back will be possible in all kinds of ways, once there is more acceptance of (and thus demand for) GPGPU. AMD’s OpenCL-implementation for CPUs is also a way to merge the fields of CPU and GPU. It’s hard to tell how these techniques will merge, but it will certainly happen. Think of situations that some instructions will be sent to the GPU by the OS even when the (OpenCL) programmer did not think of it. Or do you expect to be an ARM-processor integrated in a near-future CPU, when you write an OpenCL-kernel now?

See our article on the bright future of GPGPU to read more about it.

What’s next?

In case this is the way it goes, there will be a lot possible for both OpenCL and CUDA – depending on market demands. Some possibilities will be discussed in an upcoming article about FPGAs, but also let me hear what you think about X86-SoCs. Comment or send an e-mail.

Copyright

All content, media, theme and blogs are copyright 2010-2012 StreamHPC and Vincent Hindriksen, unless otherwise stated. For questions about using material for your own business, blog or personal usage, please contact us to ask for permission. We protect our copyrights by any means necessary.

OpenCL is a trademark of Apple Computers Inc.

We work a lot with open source software, such as WordPress and Eclipse. The brochures are created with Inkscape and svgslides. We believe that the base of innovation should be for everybody, so everybody can build on top of that. That’s why you get all the information from the blog for free, as sharing information could give us necessary information back in return.

Used photos

Many photos link to the origin and sometimes tell a story or show the webpage of an artist. The images bellow are not linked, as they are used in the slider.

All other photos and images are bought from paid services: 123RF and Big Stock Photo. Please contact us if you would like a to know the link of a certain photo or image.

Vivante GPU (Freescale i.MX6)

Vivante got into the news with OpenCL, when winning in the automotive-industry from NVIDIA. Reason: the car-industry wanted the open standard OpenCL. We at StreamHPC were very glad with this recognition from the automotive-industry.

See their GPGPU-page for more info, where below table is taken from.

	GC800 Series	GC1000 Series	GC2000 Series	GC4000 Series	GC5000 Series	GC6000 Series
Clock Speed MHz	600 – 800	600 – 800	600 – 800	600 – 800	600 – 800	600 – 800
Compute Cores	1	1	1	2	4	Up to 8
Shader Cores	1 (Vec-4) 4 (Vec-1)	2 / 4 (Vec-4) 8 / 16 (Vec-1)	4 (Vec-4) 16 (Vec-1)	8 (Vec-4) 32 (Vec-1)	8 (Vec-4) 32 (Vec-1)	16 (Vec-4) 64 (Vec-1)
Shader GFLOPS	6–8 (High) 12–16 (Medium)	11–30 (High) 22–60 (Medium)	22–30 (High) 44–60 (Medium)	44–60 (High) 88–120 (Medium)	44–60 (High) 88–120 (Medium)	88–118 (High) 176–236 (Medium)
GPGPU Options	Embedded Profile	Embedded Profile	Embedded / Full Profile	Embedded / Full Profile	Embedded / Full Profile	Embedded / Full Profile
Cache Coherent	Yes	Yes	Yes	Yes	Yes	Yes

One big advantage Vivante claims to have over the competition, is the GFLOPS/mm². This could be of advantage to win the 1TFLOPS-war over their competition (which they’ve entered). The upcoming GC4000 series can push around 48GFLOPS, leaving the 1TFLOPS to the GC6000 series.

Their GPUs are sold as IP to CPU-makers, so they don’t sell their own chips. Vivante has created the GPU-drivers, but you have to contact the chip-maker to obtain them.

Freescale i.MX6

The i.MX6 Quad (4 ARM Cortex-A9 cores) and i.MX6 Dual (2 ARM Cortex-A9 cores) have support for OpenCL 1.1 EP (Embedded Profile). (source)

Both have a Vivante GC2000 GPU, which has 16 GFLOPS to 24 GFLOPS depending on the source. The GPU cores can be used to run OpenGL 2.0 ES shaders and OpenCL EP kernels.

Board: SABRE

There are several boards available. Freescale suggests to use the SABRE Board for Smart Devices ($399).

This Linux board support package including OpenCL drivers and general BSP documentation is available for free download on the product page under Software & Tools tab.

Other Boards

Alternative evaluation boards from 3^rd parties can be found by searching on internet for “i.MX6Q board” as there are many! For instance the Wandboard (i.MX6Q, $139, shown at the left – was tipped that the Dual is actually a Duallite and thus not have support!!).

OpenCL-driver found on the Wandboard Ubuntu-image – download clinfo-output here (gcc clinfo.c -lOpenCL -lGAL).

Drivers & SDK

Under the Software & Tools tab of the SABRE-board there are drivers – they have not been tested with other boards, so no guarantees are given.

Most information is given in Get started with OpenCL on Qualcomm i.MX6.

IMX6_GPU_SDK: a collection of GPU code samples, for OpenCL the work is still in progress. You can find it under “Software Development Tools” -> “Snippets, Boot Code, Headers, Monitors, etc.”
IMX_6D_Q_VIVANTE_VDK_<version>_TOOLS: GPU profiling tools, offline compiler and an emulator with CL support which runs on Windows platforms. Be sure you pick the latest version! You can find it under “Software Development Tools” -> “IDE – Debug, Compile and Build Tools“.

More info

Check out the i.MX-community. You can also contact us for more info.

To see a great application with 4 i.MX6 Quad boards using OpenCL, see this “Using i.MX6Q to Build a Palm-Sized Heterogeneous Mini-HPC” project.

NVIDIA’s answer to SandyBridge and Fusion

Posted by Vincent Hindriksen on 10 January 2011 with 1 Comment

Intel has Sandy Bridge, AMD has Fusion, now NVIDIA has a combination of CPU and GPU too: Project Denver. The only difference is that it is not X86-based, but an ARM-architecture. And most-probable the most powerful ARM-GPU of 2011.

For years there were ARM-based Systems-on-a-chip: a CPU and a GPU combined (see list below). On the X86-platform the “integrated GPU” was on the motherboard, and since this year now both AMD/ATI and Intel hit this “new market”.The big advantage is that it’s cheaper to produce, is more powerful per Watt (in total) and has good acceleration-potential. NVIDIA does not have X86-chips and would have been the big loser of 2011; they did everything to reinvent themselves: 3D was reintroduced, CUDA was actively developed and pushed (free libraries and tools, university-programs, many books and trainings, Tesla, etc), a mobile Tegra graphics solution [1] (see image at the right), and all existing products got extra backing from the marketing-department. A great time for researchers who needed to get free products in exchange of naming NVIDIA in their research-reports.

NVIDIA chose for ARM; interesting for who is watching the CUDA-vs-OpenCL battle, since CUDA was for GPUs of NVIDIA on X86 and ARM was solely for OpenCL. Period. In the contrary to their other ARM-based chips, this new chip probably won’t be in smartphones (yet); it targets systems that need more GPU-power like CUDA and games.

In a few days the article about Windows-on-ARM is to be released, which completes this article.

Continue reading “NVIDIA’s answer to SandyBridge and Fusion” →

The current state of WebCL

Posted by Vincent Hindriksen on 28 September 2011 with 7 Comments

Years ago Microsoft was in court as it claimed Internet Explorer could not be removed from Windows without breaking the system, while competitors claimed it could. Why was this so important? Because (as it seems) the browser would get more important than the OS and internet as important as electricity in the office and at home. I was therefore very happy to see the introduction of WebGL, the browser-plugin for OpenGL, as this would push web-interfaces as the default for user-interfaces. WebCL is a browser-plugin to run OpenCL-kernels. Meaning that more powerful hardware-devices are available to JavaScript. This post is work-in-progress as I try to find more resources! Seen stuff like this? Let me know.

Continue reading “The current state of WebCL” →

Imagination Technologies PowerVR

[infobox type=”information”]

Need a PowerVR programmer? Hire us!

[/infobox]

Currently there are two PowerVR GPU architectures with OpenCL support: the 5 series (scroll down) and the 6 series (introduced in 2014).

PowerVR 6

In 2013 companies will launch processors using IP from Imagination Technologies, the PowerVR G6230 and G6430. Named licensees are:

ST-Ericcson: NovaThor A9600 – not available yet or even mentioned on their own webpage.
Texas Instruments: no products anounced, latest OMAP5-series are PowerVR 5 based.
Renesas Electronics: no products announced, latest are based on Series5 (SGX54x and 53x).
MediaTek: no products anounced, latest MT6577 is based on Series5.
HiSilicon: licensed, but no products announced.

While a lot of news was around this platform, it has been delayed several times. Their latest designs in the series are running on an FPGA and more details will be given at CES 2013 (source).

Performance

Below you see where the PowerVR6 stands. It is clocked much higher (from 250MHz to 600MHz), which suggests it will be baked sub-sub 45 nm. The PowerVR 5 is used in for example the iPad2 and delivers around 70GFlops. The PowerVR6 G62x0 is promised to deliver 200GFlops and up. The TFLOPS barrier is promised to be broken with the series.

3_cpu_vs_gpu_GFLOPS_bars — Comparison between PowerVR 5 and 6 series.

http://withimagination.imgtec.com/powervr/powervr-series6xe-gpus-bring-opengl-es-3-0-graphics-everyone

http://blog.imgtec.com/news/accelerate-design-closure-for-ip-cores-from-imagination-dok-design-flows

The below image shows the 6-series are baked on 32nm and below. It shows different series-identifiers though. The “3” in G6x30 is the addition of “frame buffer compression logic” (source).

PowerVR 5

The chipset that currently dominates mobile devices, from the PSP to tablets to phones to Apple iPad.

Drivers

Imagination only sells IP and refers to their licensees for driver-support. If you have ideas what to do with OpenCL -on-PowerVR, you can request for an NDA here.

Texas Instruments has the drivers available, but only under an SLA. Contact your TI-representative for the most recent information.

Samsung delivers drivers with their Exynos 5 Octa development board (Odroid XU).

Boards

Let me know if you know a TI-board and can give me a description of the business-requirements to get hands on drivers.

Exynos 5410: ODROID-XU

ARM Cortex-A15 Quad 1.6GHz + Cortex-A7 Quad 1.2GHz
PowerVR 5 SGX544 MP3 GPU
2GB LPDDR3 (12.8GB/s memory bandwidth)
Lots of IO-ports (see image below). No wifi without dongle.
CCI-400 bug seems to be fixed (source). Not clear how.

OpenCL info from the FAQ:

Which OpenGL and OpenCL are included in Android?
OpenGL ES 1.1 and OpenGL ES 2.0
OpenCL 1.1 Embedded Profile

Will It run Ubuntu or other Linux distros?
Currently we supports only Ubuntu 13.04 server version with only serial console.
We need to develop HDMI/LCD driver for Xorg display.
We are trying to release Linux BSP with OpenGL/OpenCL in Q4 of 2013.

Buy here. Forum here.

Drivers

See this page for explanation how to write the images. The links don’t get updated, so check the above links for the latest versions.

Minimum price for board+eMMC+shipping is $273,-. Price including some needed (eMMC, shipping), possibly needed (HDMI, USB-UART) and convenience add-on’s (SD, wifi) is:

Board includes adaptor and case. An eMMC is needed for a fast OS, but a micro-SD also works – although slower.

Without the integrated power analysis tool, it’s $30,- less. Ordering only the board is $10 less shipping.

Devices

Once I get more (public) info on OpenCL-drivers, this section will be extended.

Kindle Fire HD

As seen on Engadget, Imagination Technologies is working with Amazon and Texas Instruments to deliver OpenCL-enabled Kindle Fire HDs.

http://www.youtube.com/watch?v=-twOwM4LP9o

The chipset is a OMAP 4470 by Texas Instruments, which contains a PowerVR SGX544 GPU running at 384MHz. It only delivers 24.5GFLOPS.

ST-Ericsson NovaThor LP9600 (Nova A9600)

This chipset has a dual-core ARM Cortex-A15 2,3 GHz, 28 nm, PowerVR series6 and “d-channel LP-DDR2”. It should become available in Q1 2013.

Since Ericcson will leave ST-Ericcson after a transition period, it is unclear if the chipset is delayed.

Imagination

Imagination is best known for their GPUs in Apples iDevices. They have support for:

OpenCL
Apple Metal
Vulkan
OpenGL
Google RenderScript

Imagination is a strong supporter of Khronos APIs OpenCL and Vulkan.

OpenCL

Currently there are two PowerVR GPU architectures with OpenCL support: the 5 series (scroll down) and the 6 series (introduced in 2014).

PowerVR 6

In 2013 companies will launch processors using IP from Imagination Technologies, the PowerVR G6230 and G6430. Named licensees are:

ST-Ericcson: NovaThor A9600 – not available yet or even mentioned on their own webpage.
Texas Instruments: no products anounced, latest OMAP5-series are PowerVR 5 based.
Renesas Electronics: no products announced, latest are based on Series5 (SGX54x and 53x).
MediaTek: no products anounced, latest MT6577 is based on Series5.
HiSilicon: licensed, but no products announced.

While a lot of news was around this platform, it has been delayed several times. Their latest designs in the series are running on an FPGA and more details will be given at CES 2013 (source).

Performance

http://withimagination.imgtec.com/powervr/powervr-series6xe-gpus-bring-opengl-es-3-0-graphics-everyone

http://blog.imgtec.com/news/accelerate-design-closure-for-ip-cores-from-imagination-dok-design-flows

The below image shows the 6-series are baked on 32nm and below. It shows different series-identifiers though. The “3” in G6x30 is the addition of “frame buffer compression logic” (source).

PowerVR 5

The chipset that currently dominates mobile devices, from the PSP to tablets to phones to Apple iPad.

Drivers

Imagination only sells IP and refers to their licensees for driver-support. If you have ideas what to do with OpenCL -on-PowerVR, you can request for an NDA here.

Texas Instruments has the drivers available, but only under an SLA. Contact your TI-representative for the most recent information.

Samsung delivers drivers with their Exynos 5 Octa development board (Odroid XU).

Boards

Let me know if you know a TI-board and can give me a description of the business-requirements to get hands on drivers.

Exynos 5410: ODROID-XU

ARM Cortex-A15 Quad 1.6GHz + Cortex-A7 Quad 1.2GHz
PowerVR 5 SGX544 MP3 GPU
2GB LPDDR3 (12.8GB/s memory bandwidth)
Lots of IO-ports (see image below). No wifi without dongle.
CCI-400 bug seems to be fixed (source). Not clear how.

OpenCL info from the FAQ:

Which OpenGL and OpenCL are included in Android?
OpenGL ES 1.1 and OpenGL ES 2.0
OpenCL 1.1 Embedded Profile

Buy here. Forum here.

Drivers

See this page for explanation how to write the images. The links don’t get updated, so check the above links for the latest versions.

Minimum price for board+eMMC+shipping is $273,-. Price including some needed (eMMC, shipping), possibly needed (HDMI, USB-UART) and convenience add-on’s (SD, wifi) is:

Board includes adaptor and case. An eMMC is needed for a fast OS, but a micro-SD also works – although slower.

Without the integrated power analysis tool, it’s $30,- less. Ordering only the board is $10 less shipping.

Devices

Once I get more (public) info on OpenCL-drivers, this section will be extended.

Kindle Fire HD

As seen on Engadget, Imagination Technologies is working with Amazon and Texas Instruments to deliver OpenCL-enabled Kindle Fire HDs.

http://www.youtube.com/watch?v=-twOwM4LP9o

The chipset is a OMAP 4470 by Texas Instruments, which contains a PowerVR SGX544 GPU running at 384MHz. It only delivers 24.5GFLOPS.

ARM MALI

[infobox type=”information”]

Need a MALI programmer? Hire us!

[/infobox]

ARM takes OpenCL serious and has various developer boards and drivers for their MALI GPU. Most notable Samsung has their Exynos chips, but now also Rockchip brings high-end MALI GPUs.

Drivers and SDK

ARM MALI Linux SDK

The SDK can be downloaded here. Developers manual is here. A 14-page FAQ with lots of answers on your questions is here.

For compilation on Ubuntu g++-arm-linux-gnueabi is needed. Also remove the “-none” in platform.mk. Then compilation will result in a libOpenCL.so.

Drivers for Android

Software available for the Arndale can be found here – drivers (including graphics drivers with OpenCL-support) are here. The current state is that their OpenCL drivers are sometimes working, most times not – we are very sorry for that and try to find fixes.

We did not test the drivers with other devices than the Arndale with the same chipset (such as the new Chromebook and the Nexus 10).

Drivers for Linux

For the Samsung Chromebook drivers are available here. For Arndale these drivers should work too (not tested yet), if you use the kernel-drivers of the same version.

Firefly RK3288

The Firefly has:

RK3288 Cortex-A17 quad core@ 1.8GHz
Mali-T764 GPU with support for OpenGL ES 1.1/2.0 /3.0, OpenVG1.1, OpenCL, Directx11

Drivers have not yet been tested with this board yet! Cheap alternatives can be found here.

Exynos 5 Boards

The following boards are available:

Arndale 5250-A
~~ARMBRIX Zero~~
YICsystem YSE5250
YICsystem SMDKC520
Nexus 10
Samsung Chromebook

Scroll down for more info on these boards.

Board 1: Arndale 5250-A

The board is fully loaded and can be extended with touch-screen, SSD, Wifi+sound and camera. Below is an image with the soundboard and connectivity-board attached.

Working boards using OpenCL on display (click on image to see the Twitter-status):

Here are a few characteristics:

Cortex-A15 @ 1.7 GHz dual core
128 bit SIMD NEON
2GB 800MHz DDR3 (32 bit)

More information can be found on the wiki and forum.

Order information

For more information and to order, go to http://howchip.com/shop/item.php?it_id=AND5250A. For an overview of extensions, go to: http://howchip.com/shop/content.php?co_id=ArndaleBoard_en. The price is $250 for the board, $50 for shipping to Europe, and extension-boards start at $60. You need a VAT-number to get it through the customs, but you have to pay EU-VAT anyway.

Currently you need to order the LCD too, as the latest proprietary drivers (which includes OpenCL) does not work with HDMI. There are (vague) solutions, to be found on the forums.

Be sure you can buy a good 5V adapter (the + at the pin). A minimum of 3A is required for the board (TDP of the whole board is 11 to 12 Watt). Adapter costs around $25,- in the store, or you can buy them online for $7,50. You also need a serial cable – there are USB2COM-cables under €20,-. If you are in doubt, buy the $60,- package with cables (no COM2USB), adapter and microSD card.

Board 2: ARMBRIX Zero cancelled

This board has the same processor, but lacks the extension-boards and wifi/bluetooth. It does have 2GB 800MHz DDR3, various USB-ports, SATA(!) and Ethernet RJ45.

Order information

~~For more information and to order, go to http://www.howchip.com/shop/item.php?it_id=BRIX5250A.~~ The price is $145 for the board, $50 for shipping to Europe. You need a VAT-number to get it through the customs.

Make sure you can buy a good adapter (the + at the pin). The minimum for the Arndale is 3A, but I am not sure this board draws less. Adapter costs around $25,- in the store, or you can buy them online for $7,50.

Board 3: YICsystem YSE5250

This board has 2GB DDR3L(32bit 800MHz) and 8GB eMMC (onboard memory-card), USB 3.0 and LAN. Optional boards are for audio, WIFI, sensors (Gyroscope, Accelerometer, Magnetic, Light & Proximity), 5MB camera, LCD and GPS.

Currently it is unknown if OpenCL-drivers will be delivered, and there is no mention of it on their site.

Below you’ll find the layout of the board.

Order information

You can order at http://www.yicsystem.com/products/low-cost-board/yse5250/. The complete board costs $245,-. Costs for shipping and import are unknown.

Board 4: YICsystem SMDKC520

The SMDKC520 board is the offical reference board of Samsung Exynos5250 System. Currently it is unknown if OpenCL-drivers will be delivered, but as chances are high I already put it here.

It is like the YSE5250, but it seems it includes WIFI, camera and LCD – though the webpage is very vague. Once I have more info on the YSE5250, I’ll continue on getting more infor on this board.

Price is unknown, but does not fall under “budget boards”.

Order information

You can send an enquiry at http://www.yicsystem.com/products/smdk-board/smdk-c520/. Remember that OpenCL-support is currently unknown!

Board 5: Google Nexus 10

OpenCL-drivers have been found pre-installed on this tablet, so with some tinkering you can run openCL rightaway.

It is a complete tablet, so no case-modding is needed. It has 2GB RAM, WIFI, 16 or 32GB eMMC, 5MP camera, 10″ WXGA LCD, all sensors, NFC, sound, etc. For all the specs, see this page.

Order information

You can order the Nexus 10 not in all countries, as google has restricted sales channels. See http://www.google.com/nexus/10/ for more info on ordering. With some creativity you can find ways to order this tablet into countries not selected by Google. Price is $400 or €400.

Board 6: Samsung Chromebook

For €300 a complete laptop that runs Linux and has OpenGL ES and OpenCL 1.1 drivers? That makes it a great OpenCL “board”.

See ARM’s Chromebook dev-page for more information on how to get Linux running with OpenCL and OpenGL.

The drivers are brand new – when we’ve tested it, we’ll add more information on this page.

Products using OpenCL on ARM MALI are coming

Posted by Vincent Hindriksen on 26 October 2013

The past year you might not have heard much from OpenCL-on-ARM, besides the Arndale developer-board. You have heard just a small portion of what has been going on.

Yesterday the (Linux) OpenCL-drivers for the Chromebook (which contains an ARM MALI T604) the have been released and several companies will launch products using OpenCL.

Below are a few interviews with companies who have built such products. This will give an idea of what is possible on those low-power devices. To first get an idea of what this MALI T604 GPU can do if it comes to OpenCL, here a video from the 2013-edition of the LEAP-conference we co-organised.

http://www.youtube.com/watch?v=UQXfjvcqiQg

Understand that the whole board takes less than ~11.6 Watts – that is including the CPU, GPU, memory , interconnects, networking, SD-card, power-adapter, etc. Only a small portion of that is the GPU. I don’t know the exact specs as this developer-board was not targeted towards energy-optimisation goals. I do know this is less than the 225 Watts of a discrete GPU alone.

Interviews with ARM partners Continue reading “Products using OpenCL on ARM MALI are coming” →