Beyond function there is performance.

We work in the niche of GPGPU computing, where GPUs are programmed to efficiently run scientific and large-scale simulations, AI training/inference and other compute-intensive mathematical software. As recognised experts, we are trusted by customers, mostly from the US and Europe, to speed up their software.

Our projects range from several person-weeks spent fixing software performance problems to several person-years spent building extensive high-performance software and libraries.

Join a growing list of companies that trust us with designing and building their core software with performance in mind.

 

A selection of our customers

We have helped many companies become more competitive, but most of them we cannot mention here today. Below are some public examples.

rocRAND. The world’s fastest random number generator is built for AMD GPUs, and it’s open source. Generating random numbers at several hundred gigabytes per second, the library makes it possible to speed up existing code many times over. It is faster than Nvidia’s cuRAND, which makes it the preferred library on any high-end GPU.
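As a rough sketch of how the library is driven from the host side (a minimal example assuming a ROCm installation with the rocRAND C API; the header paths and the chosen generator type are illustrative and may differ per version):

/* Minimal sketch: fill a device buffer with uniform random floats using rocRAND. */
#include <hip/hip_runtime.h>
#include <rocrand/rocrand.h>
#include <stdio.h>

int main(void)
{
    const size_t n = 1 << 26;                  /* ~67 million floats, ~256 MB */
    float *d_out = NULL;
    hipMalloc((void **)&d_out, n * sizeof(float));

    rocrand_generator gen;
    rocrand_create_generator(&gen, ROCRAND_RNG_PSEUDO_PHILOX4_32_10);
    rocrand_set_seed(gen, 1234ULL);

    rocrand_generate_uniform(gen, d_out, n);   /* the numbers are produced on the GPU */
    hipDeviceSynchronize();

    rocrand_destroy_generator(gen);
    hipFree(d_out);
    printf("Generated %zu random floats on the device\n", n);
    return 0;
}

Compiled with hipcc, this keeps the generated numbers on the device, where the consuming kernels can use them without a round trip over the PCIe bus.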

rocPRIM. A version of CUB optimised for AMD GPUs, fully open source. It enables software like TensorFlow to run on AMD hardware at full performance.

OpenCL 2.2 test suite. When a hardware company wants to offer OpenCL 2.2 on their processor, they need a large test suite to validate their drivers and device. We made that update, which was a big change from 2.1 because of the addition of C++ kernels. We hope to see more devices support OpenCL 2.2, and that vendors find the new test suite complete and correct.

GROMACS performs soft-matter simulations at the molecular scale

We ported GROMACS to OpenCL and optimised the code for use with AMD FirePro accelerators. The result is code that is as fast as the CUDA version. GROMACS is used worldwide by over 5000 research centres, from simulating molecular docking to examining the hydrogen bonds in a falling water drop. Read more…


For Stanford University we optimised part of TeraChem, general-purpose quantum chemistry software designed to run on NVIDIA GPU architectures. Our work added an extra 70% performance on top of the already optimised CUDA code.


For the University of Manchester we achieved a large speedup on UNIFAC by going from OpenMP code to optimised OpenCL. Where OpenMP brought the single-threaded code down to about 8 seconds, we brought it down to 0.062 seconds. Read more…
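The shape of such a port is roughly the following; the model function below is a hypothetical stand-in for illustration, not the actual UNIFAC code:

/* The OpenMP original looked conceptually like:
 *   #pragma omp parallel for
 *   for (int i = 0; i < n; ++i) out[i] = heavy_model(in[i], params);
 * In OpenCL, each work-item computes one independent iteration, so thousands
 * of GPU threads run the model in parallel instead of a handful of CPU cores. */

float heavy_model(float x, __constant float *params)
{
    /* Placeholder for the real per-item computation. */
    float acc = 0.0f;
    for (int k = 0; k < 64; ++k)
        acc += exp(-params[k] * x) * x;
    return acc;
}

__kernel void model_batch(__global const float *in,
                          __global float *out,
                          __constant float *params,
                          int n)
{
    int i = get_global_id(0);
    if (i < n)
        out[i] = heavy_model(in[i], params);
}

In practice most of the remaining speed-up comes from restructuring the data layout and memory-access patterns around such a kernel, not from the kernel launch itself.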


We helped the Memorial Sloan-Kettering Cancer Center improve a tool they use daily. Where it previously took one hour, it now takes just two minutes – a speed-up of 30x. Their productivity rose, as they no longer had to wait long for results and could get more done without buying new computers.

Success stories

Want to read more about what we did? Read about the work we do.

Our customers did not want to hire just another team: they wanted the code to be fast the first time.

Technologies we work with

CUDA
HSA
OpenMP
OpenACC
ROCm

Mobile Processor OpenCL drivers (Q3 2013) + rating

For your convenience: an overview of all ARM GPUs and their driver availability. Please let me know if something is missing.

I’ve added a rating to gently push the vendors to reach at least a 7. Vendors can contact me if they think the rating does not reflect reality.

ZiiLabs

SDK-page@StreamHPC

Drivers can be delivered by Creative when you pledge to order ZMS-40 processors. Mail us for a contact at Creative. The minimum order size is unknown.

This processor can therefore only be used in custom devices.

Rating: 4

Vivante

SDK-page@StreamHPC

They are found on publicly available devices. Android drivers that work on Freescale processors are openly available and can be found here.

Rating: 8

Even though the processors are not that powerful, Vivante/Freescale offers the best support.

Qualcomm

SDK-page@StreamHPC

Drivers are not shipped on devices, according to various sources. Android drivers are included in the SDK though, which can be found here.

Rating: 7

The rating will go up when drivers are publicly shipped on phones/tablets.

ARM MALI

Samsung SDK-page@StreamHPC

There are lots of problems around the drivers for Exynos, which only seem to work on the Arndale board when the LCD is also ordered. Android drivers can be downloaded here.

Rating: 5

It all comes down to execution – half-baked drivers don’t do it. It is unclear whom to blame, but it has certainly influenced the creation of a new version of the Exynos 5, the Octa.

Imagination Technologies

SDK-page@StreamHPC

TI only delivers drivers under NDA. Samsung has one board coming up with OpenCL 1.1 EP drivers.

Rating: 5

The rating will go up when drivers from TI become available without obstacles, or when Samsung delivers what they failed to deliver with the previous Exynos 5.

Exciting times coming up

Mostly because of a power struggle between Google and the GPU vendors, there is some hesitation to ship OpenCL drivers on phones and tablets. Unfortunately, Google’s answer to OpenCL, RenderScript Compute, does not provide what developers need. Google’s official answer is that it does not want fragmentation, nor code that is optimised for a certain GPU. The interpreted answer is that Google wants vendor lock-in and therefore blocks the standard. Whatever the reason, OpenCL is being used as a sword in a show of teeth over who has a say about the future of Android – only the advertisement company Google, or also the group of named processor makers and the various phone/tablet vendors?

In H2 2014 Nvidia will ship CUDA drivers with their Tegra 5 GPUs, completing the soap opera.

There are rumours that Apple will intervene and make OpenCL available on iOS. This would explain why Imagination and Qualcomm put so much effort into showing OpenCL results.

And always keep a close watch on POCL, the vendor-independent OpenCL implementation.
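Because driver availability is this uncertain, applications usually probe for OpenCL at runtime instead of linking against libOpenCL.so at build time. A minimal sketch for Android/Linux (the simplified function-pointer type is an assumption for illustration; the real signature uses the cl_* types):

/* Check at runtime whether an OpenCL driver is installed, without a build-time
 * dependency on libOpenCL.so. */
#include <dlfcn.h>
#include <stdio.h>

typedef int (*clGetPlatformIDs_fn)(unsigned int, void *, unsigned int *);

int main(void)
{
    void *lib = dlopen("libOpenCL.so", RTLD_NOW | RTLD_LOCAL);
    if (!lib) {
        printf("No OpenCL driver found: %s\n", dlerror());
        return 1;
    }

    clGetPlatformIDs_fn get_platforms =
        (clGetPlatformIDs_fn)dlsym(lib, "clGetPlatformIDs");
    unsigned int num_platforms = 0;

    if (get_platforms && get_platforms(0, NULL, &num_platforms) == 0 && num_platforms > 0)
        printf("OpenCL driver present: %u platform(s)\n", num_platforms);
    else
        printf("libOpenCL.so exists, but no usable platform was reported\n");

    dlclose(lib);
    return 0;
}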


Need a programmer for any of the above devices? Hire us!


StreamComputing is 7 years!

As of 1 April we are 7 years old. Because of all the jokes on that day, this post is a bit later.

Let me take you through our journey of growing from a one-person company to what we are now. With pride I can say that (with ups and downs) StreamComputing (now rebranded to StreamHPC) has become a brand synonymous with (extremely) fast software, HPC, GPUs and OpenCL.

7 years of changes

Different services

After 7 years it’s also time for changes. Initially we worked solely on OpenCL-related services, mostly on GPUs. This is what we’re currently doing:

  • HPC GPU computing: OpenCL, CUDA, ROCm.
  • Embedded GPU computing: OpenCL, CUDA, RenderScript, Metal.
  • Networked FPGA programming: OpenCL.
  • GPU-drivers testing and optimisation.
  • Software architecture optimisations.

While you see OpenCL a lot, our expertise in the vendor-specific CUDA (Nvidia), ROCm (AMD), RenderScript (Google) and Metal (Apple) cannot be ignored. Hence “Performance Engineers” and not “GPU consultants” or “OpenCL programmers”.

From Fixers to Builders and getting new competition

Another change is that we have been moving from fixing code after the fact to building the software ourselves.

This has been a slow process, tied to the growing confidence in performance engineering as an expert profession rather than a trick. We’re seeing new companies come into the market and provide GPU computing alongside their usual services. This is a sign of the market growing up.

We’re confident we can grow further in our market, as we have the expertise to design fast software, while the newcomers have so far only gained the expertise to write code that runs on the GPU with little speedup.

Community: OpenCL:PRO to OpenCL.org

There have been several moments when we wanted to support the community more. The first try was OpenCL:PRO, which did not live long, as it was unclear to us what “the community” actually wanted.

In the end it was not that hard. Everybody who starts with OpenCL has the same problems:

  • Lack of convenience code, resulting in many, many wrappers and libraries that are incompatible.
  • Lack of practice projects.
  • Lack of overview on what’s available.

With OpenCL.org we aim to solve these problems together with the community. Everything is shared on GitHub, and anybody can join to complete the information we’ve shared. While our homepage had around 40 pages on these subjects, they only gave our personal view or contained outdated info.

So we’re going to donate most of the OpenCL-related technical pages we’ve written over the years to the community.

There is much more to share – watch our blog, the OpenCL.org Twitter account and the newsletter!

Different Logo

For those who remember: in 2010 the logo looked quite different. We still use the blocks in the background (like on our Twitter account), but since 2014 the colours and font have been quite different. This change went along with the company growing up. The old logo is careful, while the new one is bold – now we’re more confident about our expertise and value.

Over the past 3 years the new logo has stayed the same and has fully become our identity.

Same kind of customers

It has been quite a journey! We could not have done it without all the customers we served over those 7 years.

Thank you!

Qt Creator OpenCL Syntax Highlighting

With the highlighting for Gedit, I was happy to give you the convenience of a nice editor for working on OpenCL files. But it seems one of the most popular IDEs for C++ programming is Qt Creator, so here is another free syntax highlighter. You need at least Qt Creator 2.1.0.

The people of Qt have written everything you need to know about their syntax highlighting, which was enough help to create this file. They use Kate’s highlighting system, so the file works with that editor too.

This article contains all you need to know to use Qt Creator with OpenCL.

Installing

First download the file to your computer.

Under Windows and OS X you need to copy this file to the directory share\qtcreator\generic-highlighter in the Qt installation dir (e.g. c:\Qt\qtcreator-2.2.1\share\qtcreator\generic-highlighter). Under Linux, copy this file to ~/.kde/share/apps/katepart/syntax or to /usr/share/kde4/apps/katepart/syntax (for all users). That’s all, have fun!

How to speed up Excel in 6 steps

After the last post on Excel (“Accelerating an Excel Sheet with OpenCL“), there have been various requests and discussions about how we do “the miracle”. Short story: we simply apply proper engineering tactics. Below I’ll explain how you can also speed up Excel, and when you actually have to call us (the last step).

A computer can handle tens of gigabytes per second. Now look at how big your Excel sheet is and how much time it takes. You’ll understand that the problem is probably not your computer.

Excel is a special piece of software from a developer’s perspective. An important rule of software engineering is to keep functionality (code) and data separate. Excel mixes the two like no other, which actually works fine in many cases, until the data gets too big or the computations too heavy. At that point you’ve reached Excel’s limits and need to solve the problem properly.

An Excel file often does things one by one, with a new formula in every cell. This prevents any kind of automatic optimisation – besides that, Excel sheets are very prone to errors.
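For comparison, this is what “code and data kept separate” looks like outside Excel: one plain loop over a whole column (a hypothetical compound-interest calculation), which a compiler can vectorise and a developer can profile – something a sheet full of per-cell formulas prevents.

/* Compute one output column from one input column in a single pass. */
#include <stddef.h>

void compound_interest(const double *principal, double *result,
                       size_t n, double rate, int years)
{
    double factor = 1.0;
    for (int y = 0; y < years; ++y)
        factor *= 1.0 + rate;            /* computed once, not once per row */

    for (size_t i = 0; i < n; ++i)       /* the whole column in one pass */
        result[i] = principal[i] * factor;
}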

Below are the steps to go through, most of which you can do yourself!

Continue reading “How to speed up Excel in 6 steps”

AMD GPUs & CPUs


Need a programmer for OpenCL on AMD FirePro, Radeon or APU? Hire us!


AMD supports all their recent GPUs and CPUs, and offers good performance on products from 2010/2011 onwards:

AMD does not provide a standard SDK that contains both hardware and software, as their hardware is available in many computer shops.

SDK

The OpenCL SDK (software) needs to be downloaded in several steps:


CodeXL replaces the following software in the AMD APP software family:

These are still available for download.

Training

There is (free) training material available:


Other AMD software for OpenCL

The APP Math Libraries contain FFT and BLAS functions optimised for AMD GPUs.

OpenCL-in-Java can be done using Aparapi.

ARM Mali-T604 GPU has 3.5x more performance than dual core Cortex-A15

According to the latest newsletter of the Mont-Blanc Project, the GPU on a Samsung Exynos 5 is much faster and greener than its CPU: 3.5 times faster at half the energy. They built a supercomputer using 810 Exynos SoCs that can deliver 26 TFLOPS of peak performance. With upcoming mobile GPUs becoming exponentially faster, they now have all the expertise to build an even faster and greener ARM supercomputer after this.

The Mont-Blanc compute cards deliver considerably higher performance at 50% lower energy consumption, compared with previous ARM-based developer platforms.

The Mont-Blanc prototype is based on the Samsung Exynos 5 Dual SoC, which integrates a dual-core ARM Cortex-A15 and an on-chip ARM Mali-T604 GPU, and has been featured and market proven in advanced mobile devices. The dual-core ARM Cortex-A15 delivers twice the performance of the quad-core ARM Cortex-A9, used in the previous generation of ARM-based prototype, whilst consuming 20% less energy for the same workload. Furthermore, the on-chip ARM Mali-T604 GPU provides 3.5 times higher performance than the dual-core Cortex-A15, whilst consuming half the energy for the same workload.

Each Mont-Blanc compute card integrates one Samsung Exynos 5 Dual SoC, 4 GB of DDR3-1600 DRAM, a microSD slot for local storage and a 1 GbE NIC, all in an 85x56mm card (3.3×2.2 inches). A single Mont-Blanc blade integrates fifteen Mont-Blanc compute cards and a 1 GbE crossbar switch, which is connected to the rest of the system via two 10 GbE links. Nine Mont-Blanc blades fit into a standard BullX 9-blade INCA chassis. A complete Mont-Blanc rack hosts up to six such chassis, providing a total of 1620 ARM Cortex-A15 cores and 810 on-chip ARM Mali-T604 GPU accelerators, delivering 26 TFLOPS of peak performance.

“We are only scratching the surface of the Mont-Blanc potential”, says Alex Ramirez, coordinator of the Mont-Blanc project. “There is still room for improvement in our OpenCL algorithms, and for optimizations, such as executing on both the CPU and GPU simultaneously, or overlapping MPI communication with computation.”

Continue reading “ARM Mali-T604 GPU has 3.5x more performance than dual core Cortex-A15”

Is the CPU slowly turning into a GPU?

It’s all in the plan?

Years ago I was surprised by the fact that CPUs were also programmable with OpenCL – I had chosen that language solely for the cool factor of being able to program GPUs. It was weird at the start, but now I cannot think of a world without OpenCL working on a CPU.

But why is this important? Who cares about the 4 cores of a modern CPU? Let me first go into why CPUs stayed at mostly 2 cores for so long. Simply put, it was very hard to write multi-threaded software that made use of all cores. Software like games did, as it needed all available resources, but even the computations in MS Excel are still mostly single-threaded today. Multi-threading was perhaps used most for keeping the user interface non-blocking. Even though OpenMP was standardised 15 years ago, it took many years before the multi-threaded paradigm was used for performance. If you want to read more on this, search the web for “the CPU frequency wall”.

More interesting is what is happening with CPUs now. Both Intel and AMD are releasing CPUs with lots of cores. Intel recently released an 18-core processor (Xeon E5-2699 v3) and AMD has been offering 16-core CPUs for a while (Opteron 6300 series). Both have SSE and AVX, which means extra parallelism. If you don’t know what this is precisely about, read my 2011 article on how OpenCL uses SSE and AVX on the CPU.

AVX3.2

Intel now steps forward with AVX 3.2 on their Skylake CPUs. AVX 3.1 is in the Xeon Phi “Knights Landing” – see this rumoured roadmap.

It is 512 bits wide, which means that 8 times as much vector data can be computed! With 16 cores, this would mean 128 float operations per clock tick. Like a GPU.

The disadvantage is like the VLIW we had in the pre-GCN generation of AMD GPUs: one needs to fill the vector instructions to get the speed-up. The relatively slow DDR3 memory is also an issue, but lots of progress is being made there with DDR4 and stacked memory.
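On the CPU, OpenCL implementations map the built-in vector types onto those SSE/AVX registers, so writing kernels with wide vectors is one way to fill the lanes. A minimal sketch: float8 maps onto a 256-bit register, and float16 would target the 512-bit units.

/* Each work-item processes 8 values at once; a CPU OpenCL runtime can map this
 * directly onto AVX registers instead of issuing 8 scalar operations. */
__kernel void saxpy_vec8(__global const float8 *x,
                         __global float8 *y,
                         const float a)
{
    size_t i = get_global_id(0);
    y[i] = a * x[i] + y[i];
}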


So is the CPU turning into a GPU?

I’d say yes.

With AVX 3.2 the CPU gets all the characteristics of a GPU, except the graphics pipeline. That means the CPU part of the CPU-GPU is acting more like a GPU. The funny part is that, with the GPU’s scalar architecture and more complex schedulers, the GPU is slowly turning into a CPU.

In this 2012 article I discussed the marriage between the CPU and GPU. This merger will continue in many ways – a frontier where the HSA Foundation is doing great work now. So from that perspective, the CPU is transforming into a CPU-GPU, and we’ll keep calling it a CPU.

This all strengthens my belief in the future of OpenCL, as that language is prepared for both task-parallel and data-parallel programs – for both CPUs and GPUs, to put it in current terminology.

Photo by Eugene Mah (source: http://www.flickr.com/photos/imabug/2946930401/)

OpenCL Potentials: Medical Imaging

If you have ever seen a CT or MRI scanner, you might have noticed the full-sized computer next to it (especially with the older ones). Quite some processing power is needed to keep up with the data stream coming from the scanner, to process the data into a 3D image and to visualise it on a 2D screen. Luckily we have OpenCL to make this even faster; which doctor doesn’t want real-time high-resolution results, and which patient doesn’t want to see the results on an Apple iPad or Samsung Galaxy Tab?

Architects, bankers and doctors have one thing in common: they get a better feel for the subject at hand if they can play with the data. OpenCL makes it possible to process data much faster and thus lets the specialist play with it. The interesting part of IT is that it is in every domain now – hence a new series: OpenCL potentials.

Continue reading “OpenCL Potentials: Medical Imaging”

OpenCL basics: Multiple OpenCL devices with the ICD.

Xeon Phi, Tesla, FirePro

Most systems nowadays have more than one OpenCL device, often from different vendors. How can they all coexist from a programming standpoint? How do they interact?

OpenCL platforms and OpenCL devices

Firstly, please bear with me for a few words about OpenCL devices and OpenCL platforms.

An OpenCL platform usually corresponds to a vendor, which is responsible for providing the OpenCL implementation for its devices. For instance, a machine with an Intel i7-4790 CPU is going to have one OpenCL platform, probably named “Intel OpenCL”, and this platform will include two OpenCL devices: one is the Intel CPU itself and the other is the Intel HD Graphics 4600 GPU. This Intel platform provides the OpenCL implementation for the two devices and is responsible for managing them.

Let’s have another example, but this time from outside the Windows ecosystem. A MacBook running OS X and having both the Intel Iris Pro GPU and a dedicated GeForce card will show one single OpenCL platform called “Apple”. The two GPUs and the CPU will appear as devices belonging to this platform. That’s because the “Apple” platform is the one providing the OpenCL implementation for all three devices.

Last but not least, keep in mind that:

  • An OpenCL platform can have one or several devices.
  • The same device can have one or several OpenCL implementations from different vendors. In other words, an OpenCL device can belong to more than just one platform.
  • The OpenCL version of the platform is not necessarily the same as the OpenCL version of the device.
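A small host program makes the platform/device relationship concrete – it lists every installed platform and the devices it exposes (a minimal sketch; link against the OpenCL library with -lOpenCL):

/* List every OpenCL platform and the devices it exposes. */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id platforms[16];
    cl_uint num_platforms = 0;
    clGetPlatformIDs(16, platforms, &num_platforms);
    if (num_platforms > 16) num_platforms = 16;

    for (cl_uint p = 0; p < num_platforms; ++p) {
        char name[256];
        clGetPlatformInfo(platforms[p], CL_PLATFORM_NAME, sizeof(name), name, NULL);
        printf("Platform %u: %s\n", p, name);

        cl_device_id devices[16];
        cl_uint num_devices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devices, &num_devices);
        if (num_devices > 16) num_devices = 16;

        for (cl_uint d = 0; d < num_devices; ++d) {
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            printf("  Device %u: %s\n", d, name);
        }
    }
    return 0;
}

On the MacBook example above this prints a single “Apple” platform with three devices; on a machine with drivers from two vendors it prints two platforms, each with its own devices.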

The OpenCL ICD

ICD stands for Installable Client Driver, and it refers to a model that allows several OpenCL platforms to coexist. It is actually not core functionality, but an extension to OpenCL.

  • For Windows and Linux the ICD has been available since OpenCL 1.0.
  • OS X doesn’t have an ICD at all; Apple chose to provide all the drivers itself, under its single platform.
  • Android did not have the extension under OpenCL 1.1, but people ported its functionality. With OpenCL 2.0 the ICD is also on Android.

How does this model work?

The OpenCL ICD on Windows

While a machine can have several OpenCL platforms, each with its own driver and OpenCL version, there is always just one ICD Loader. The ICD Loader acts as a supervisor for all installed OpenCL platforms and provides a unique entry point for all OpenCL calls. Based on the platform id, it dispatches the OpenCL host calls to the right driver.

This way you can compile against the ICD (opencl.dll on Windows or libOpenCL.so on Linux), not directly against all the possible drivers. At run-time, an OpenCL application will search for the ICD and load it. The ICD in turn looks in the registry (Windows) or a special directory (Linux) to find the registered OpenCL drivers. Each OpenCL call from your software will be resolved by the ICD, which will further dispatch requests to the selected OpenCL platform.

A few things to keep in mind

The ICD gets installed on your system together with the drivers of the OpenCL devices. Hence, a driver update can also result in an update of the ICD itself. To avoid problems, an OS can decide to handle OpenCL itself.

Please note that the ICD, the platform, and the OpenCL library the application links against do not necessarily correspond to the same OpenCL version.

I hope this explains how the ICD works. If you have any questions or suggestions, just leave a comment. Also check out the Khronos page for the ICD extension. And if you need the sources to build your own ICD (with a license that allows you to distribute it with your software), check the OpenCL registry at Khronos.

AMD is back!

For years we have been complaining on this blog about what AMD was lacking and what needed to be improved. And as you might have concluded from the title of this blog post, there has been a lot of progress.

AMD is back! It will all come together in the beginning of 2017, but you’ll already see a lot of progress in the coming weeks and months.

AMD quietly recognised and solved various totally new problems in HPC, becoming the hidden innovator everybody needed.

This blog post gives an overview of how AMD managed to come back and what it took to get there. Their market cap supports the story, as you can see.

AMD’s market cap is back at 2012 levels (source)

Continue reading “AMD is back!”

Bits&Chips promotion for OpenCL training

You have arrived on this page via the Bits&Chips newsletter or on the recommendation of a friend or colleague.

Your goal is to speed up a heavy computation or image-processing workload. We teach you how by handing you a few concepts, so that you design and program your software in a different way – with much faster software as the result. We do this using OpenCL, a programming language for parallel processors.

The training

In 3½ days you will learn:

  • How to design a parallel software architecture,
  • How to use the graphics card as a co-processor,
  • The OpenCL programming language,
  • How to implement parallel algorithms.

Half of the time is explanation by the trainer, the other half hands-on labs. On the 4th day we discuss a problem together, to bring everything together.

After the training you will be able to recognise slow code (without special tools), redesign it and port it to the graphics card. You take the course book and some basic software home with you, so you can easily keep practising.

The dates

The next 3 trainings already have several registrations. Each has a special theme and target audience.

You can attend any of the trainings if you want to learn OpenCL. The books used differ per training.

The fine print

Some important aspects of the training:

  • The language of instruction is English, because of trainees from across Europe.
  • A good basic knowledge of C is required. A special home-study package can be sent to you free of charge.
  • You should schedule time to practise after the training, so you don’t quickly forget everything again.
  • You need a laptop.
  • Programming is mostly done on a Linux server. The required basics are explained on the spot.

Registration

The normal price for the training is €1850 per person. If you mention Bits&Chips, you get a €100 discount.

All you have to do now is send an e-mail to trainings@streamhpc.com with your preferred date and/or topics. We will then send you back a questionnaire as preparation for the training.

For questions, call +31854865760 (office) or +31645400456 (mobile).

We don’t work for the war-industry

Last week we emphasised that we don’t work for the war industry. We did talk to a national army some years ago, and even though that project never started, we would probably have said no. Recently we got a new request, felt uncomfortable and did not send a quote for the training.

https://twitter.com/StreamHPC/status/1055121211787763712

This is because we like to think about the next 100 years, and investment in weapons is not something that would solve things for the long term.

To those who liked the tweet or wanted to: thank you for your support and for showing us we’re not standing alone here. Continue reading “We don’t work for the war-industry”

HPC Appliance

We offer fully supported computer appliances with our high-performance software included, ready to integrate with your IT workflow. Our solution makes it possible to standardise office PCs and still offer high-performance solutions to both your employees and your customers.

Applications are very diverse:

  • Centralised office compute – running heavy simulations on a secure private office-cloud
  • Products with high compute-need – focus on your product, leveraging our expertise
  • Last, but not least: magic black boxes that solve everything.

Our advantages are:

  • Appliances are tailored to your requirements.
  • Anywhere from 1 to 10k units can be shipped.
  • Software is maintained, improved and coupled with a safe update procedure.

Free knowledge lunch at your location in the Netherlands/Belgium

Steel pipes

Would you and your colleagues like to know more about terms such as GPGPU, OpenCL and massive multi-core? Do you want to prepare for a competitive market in which software can suddenly be several factors faster when programmed correctly?

Apart from the fact that we can no longer really avoid them, multi-core processors with hundreds of cores offer several advantages. If software is written for many light processors instead of a few big ones, scaling up is easier. This also explains why graphics cards work through a computational problem several factors faster than a regular processor. Another advantage is that flexible scaling allows for simpler and better energy savings. The picture with the pipes was chosen deliberately: we started with a single core until the end of the ’90s, got multiple cores at the beginning of the 21st century, and now we are getting even more cores – think 500 or more.

In at most an hour, we give an overview of the new possibilities, using practical examples that fit your sector. Are you running into the limits of data processing? And do you want to know whether the new types of processors can solve your problem? Then this is your chance to gather information. Others who received the same information earlier indicated that they had no idea beforehand how many new possibilities had appeared in the processor field, and that they could now put much of the IT news around cloud computing and big data into better perspective.

Interested? We would be happy to visit you. Send an e-mail to kennislunch@StreamHPC.nl and we will contact you, or call 06-45400456 to make an appointment directly.

StreamHPC specialises in maximising computational speed in data processing. We offer OpenCL trainings and deliver custom accelerated software. We see this knowledge lunch as a pleasant way to get to know you – you are not obliged to purchase services from us or our partners.

“That is not what programmers want”

“I think you should be more explicit here in step two” (original print)

This post is part of the series Programming Theories, in which we discuss new and old ways of programming.

When discussing the design of programming languages or the extension of existing ones, the question “What concepts can simplify the tasks of the programmer?” always triggers lots of interesting debate. Then, when an effective solution is found, the inventors are cheered and a new language is born. Up to this point all seems fine, but then the status quo intervenes: C, C++, Java, C#, PHP, Visual Basic. Those languages want the new feature implemented in the way their programmers expect it. But that would be like trying to bring the advantages of a motorcycle to a car without paying attention to the adjustments the car’s design needs.

I’m in favour of learning new concepts instead of doing new things the old way… but only when the new way has proven to be better than the old one. The lean acceptance of, for instance, functional languages says a lot about how this goes in reality (with great exceptions such as LINQ). That brings a lot of trouble when moving to multi-core. So, how do we get existing languages to change instead of merely evolve?

High Level Languages for Multi-Core

Let’s start with a quote from Edsger Dijkstra:

Projects promoting programming in “natural language” are intrinsically doomed to fail.

In other words: a language can be too high-level. A programmer needs the language to be able to effectively micro-manage what is being done. We speak of concerns for a reason. Still, the urge to create the highest-level programming language is strong.

Don’t get me wrong. A high-level language can be very powerful once its concepts are well-defined in both directions. One direction concerns the developer: does the programmer understand the concept and the contract of the command or programming style being offered? The other concerns the machine: can it be effectively programmed to run the command, or could a new machine be built to do just that? This two-sided contract is one of the reasons why natural languages are not fit for programming.

And we have also found out that binary programming is not fit for humans.

The cartoon refers to this gap between what programmers want and what computers want.

Continue reading ““That is not what programmers want””

Memberships

We are active in several foundations, communities and collaborations. Below is an overview.

Khronos


Associate member of Khronos, the non-profit technology consortium that maintains important standards such as OpenCL, OpenGL, SPIR and Vulkan.

The Khronos Group was founded in 2000 to provide a structure for key industry players to cooperate in the creation of open standards that deliver on the promise of cross-platform technology. Today, Khronos is a not for profit, member-funded consortium dedicated to the creation of royalty-free open standards for graphics, parallel computing, vision processing, and dynamic media on a wide variety of platforms from the desktop to embedded and safety critical devices.

High Tech NL

High Tech NL is the sector organization by and for innovative Dutch high-tech companies and knowledge institutes. High Tech NL is committed to the collective interests of the sector, with a focus on long-term innovation and international collaboration.

We’re a member because High Tech NL is one of the few organisations that understands IT is far more than digitisation. Our main focus there is robotics.

HSA Foundation


The Heterogeneous System Architecture (HSA) Foundation is a not-for-profit industry standards body focused on making it dramatically easier to program heterogeneous computing devices. The consortium comprises various semiconductor companies, tool providers, software vendors, IP providers and academic institutions, and develops royalty-free standards and open-source software.

HiPEAC

HiPEAC’s mission is to steer and increase the European research in the area of high-performance and embedded computing systems, and stimulate (international) collaborations.

We’ve sponsored multiple conferences over the years.

ETP4HPC

ETP4HPC is the European Technology Platform (ETP) in the area of High-Performance Computing (HPC). It is an industry-led think-tank comprising European HPC technology stakeholders: technology vendors, research centres and end-users. The main objective of ETP4HPC is to define research priorities and action plans in the area of HPC technology provision (i.e. the provision of supercomputing systems).

OpenPower

The OpenPOWER Foundation is an open, not-for-profit technical membership group incorporated in December 2013. It was founded to enable today’s data centres to rethink their approach to technology. OpenPOWER was created to develop a broad ecosystem of members that create innovative and winning solutions based on the POWER architecture.

European HPC Magazines

If one thing can be said about Europe, it is that it is quite diverse. Each country solves or fails to solve its own problems individually, while European goals are not always well cared for. Nevertheless, you can notice things changing. One of the areas where things have changed is HPC. HPC has always been a well-interconnected research field in Europe (with its centre at CERN), but the European commercial market is still catching up. The whole of Europe has set new goals for better collaboration between companies and research institutes with programmes like Horizon 2020. This means it becomes necessary to improve interconnections among much larger groups.

In most magazines HPC is one section within a broader scope. That is also very important, as it introduces HPC to more people. Here, however, I’d like to concentrate on the dedicated magazines. There are mainly two: Primeur Magazine and HPC Today.

Primeur Magazine

The Netherlands-based Primeur Magazine has been around for years, with HPC news from Europe, a video channel, a knowledge base, a calendar and more. Issues from past weeks can be read online for free, but news can also be delivered via a weekly e-mail (a paid service; prices range from €125 to €4000 per company/institute, depending on size).

They focus on being a news channel for what is going on in the HPC world, both in the EU and the US. Don’t forget to follow them on Twitter.

HPC Today

Update: the magazine changed its name from HPC Magazine to HPC Today.

With several editions (Americas, Europe and France), websites and TV channels, the France-based HPC Today brings actionable coverage of HPC and Big Data news, technologies, uses and research. Subscriptions are free, as the magazine is paid for by advertisements. They balance their articles by targeting both people who deeply understand malloc() and people who just want to know what is going on. Their readers are developers and researchers from both the academic and the private sector.

With the change to HPC Today, the content has changed slightly according to readers’ requests: less science, more HPC news. For the rest it’s about the same.

To get an idea of how they’re doing, check the partners of HPC Today: Teratec, ISC events and the SC conference.

Other European HPC sources

Not all information around the web is nicely bundled in a PDF. Find a small list below to help you start.

InSiDE

The German national supercomputing centres HLRS, LRZ and NIC publish the online magazine InSiDE (Innovatives Supercomputing in Deutschland) twice a year. The articles are available in HTML and PDF. It gives a good overview of what is going on in Germany and Europe. There is no way to subscribe via e-mail, so it is best to put it in your calendar.

e-IRG

The e-Infrastructure initiative‘s main goal is to support the creation of a political, technological and administrative framework for an easy and cost-effective shared use of distributed electronic resources across Europe.

e-IRG is not a magazine, but it is a good starting point for finding information about HPC in Europe. Their knowledge base is very useful when trying to get an overview of what exists in Europe: projects, country statistics, computing centres and more. They collaborate closely with Primeur Magazine, so you will see some overlap in the information.

PRACE Digest

The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high-impact scientific discovery, as well as engineering research and development across all disciplines, to enhance European competitiveness for the benefit of society. PRACE seeks to achieve this mission by offering world-class computing and data-management resources and services through a peer-review process.

The PRACE Digest appears twice a year as a PDF.

More?

Did we miss an important news-source or magazine? Let us know in the comments below!

Event: Embedded boards comparison

One of the first OpenCL-enabled boards, from 2012.

Date: 17 September 2015, 17:00
Location: Naritaweg 12B, Amsterdam
Costs: free

Selecting the right hardware for your OpenCL-powered product is very important. We therefore organise a three-hour open house where you can test, benchmark and discuss many available chipsets that support OpenCL. For each showcased board you can read and hear about its advantages, disadvantages and preferred types of algorithms.

Boards with the following chipsets will be showcased:

  • ARM Mali
  • Imagination PowerVR
  • Qualcomm Snapdragon
  • NVidia Tegra
  • Freescale i.MX6 / Vivante
  • Adapteva

Several demos and benchmarks have been prepared and will run continuously on each board. We will walk around to answer your questions.

During the evening drinks and snacks are available.

Test your own code

There is time to test OpenCL code for free in our labs. Please get in contact, as time that evening is limited.

Registration

Register by filling in the form below. Mention how many people you will bring, whether you are coming by car and whether you want to run your own code.


OpenCL.org

Last year we bought OpenCL.org with the purpose of supporting the OpenCL community and OpenCL-focused companies. In January we launched the first community project on the website: porting GEGL to OpenCL. See below for more info.

The knowledge section of our homepage will be moved to the OpenCL.org website, but will still be maintained by us.

GEGL project

GEGL is a free/libre graph based image processing framework used by GIMP, GNOME Photos, and other free software projects.

In January 2016 we launched an educational initiative that aims to get more developers to study and use OpenCL in their projects. Within this project, up to 20 collaborators will port as many GEGL operations to OpenCL as possible.

The goal of this project is to seek a way for a group to educate themselves in OpenCL, while supporting an open source project. One of the ways is to gamify the porting by benchmarking the kernels and defining winners, and another way is to optimize kernels within StreamHPC to push the limits. Victor Oliveira, who wrote most of the OpenCL code in GEGL, joined the GEGL-OpenCL project to advise.

All work is being done on GitHub. The communication between participants is taking place in a dedicated Slack channel (invite-only).

Want a say in what the next porting project after GEGL will be? Vote here.

PRACE Spring School 2014

On 15–17 April 2014 a 3-day workshop around HPC is being organised. It is free, and focuses on bringing industry and academia together.

Research Institute for Symbolic Computation (RISC) / Johannes Kepler University Linz, Kirchenplatz 5b (Castle of Hagenberg), 4232 Hagenberg, Austria

The PRACE Spring School 2014 will take place on 15 – 17 April 2014 at the Castle of Hagenberg in Austria. The PRACE Seasonal School event is hosted and organised jointly by the Research Institute for Symbolic Computation / Johannes Kepler University Linz (Austria), IT4Innovations / VSB-Technical University of Ostrava (Czech Republic) and PRACE.

The 3-day program includes:

  • A 1-day HPC usage for Industry track bringing together researchers and attendees from industry and academia to discuss the variety of applications of HPC in Europe.
  • Two 2-day tracks: one on software engineering practices for parallel & emerging computing architectures, and one offering deep insight into solving multiphysics problems with Elmer on large-scale HPC resources, with lecturers from industry and PRACE members.

The PRACE Spring School 2014 programme offers a unique opportunity to bring users, developers and industry together to learn more about efficient software development for HPC research infrastructures. The programme is free of charge (travel and accommodation not included).

Applications are open to researchers, academics and industrial researchers residing in PRACE member countries, European Union Member States and Associated Countries. All lectures and training sessions will be in English. Please visit http://prace-ri.eu/PRACE-Spring-School-2014/ for more details and registration.

At StreamHPC we support such initiatives.