Brochures

We have two brochures: a general one and one for training. You can also generate PDFs from most pages on this website to support offline discussion.

New versions will arrive soon; unfortunately, there have been considerable delays.

Training

In the training brochure (version January 2012) all training modules are described in full.

Download our training brochure here.

General

In the general brochure (version May 2011) we explain both our consultancy and training services.

Download our general brochure here.

StreamHPC.Brochure.2010-01.EN.pdf

Basic concept: Hosts and devices

Time for some basic concepts of OpenCL. As I see a growing number of visitors to this page, I noticed that I have actually not written much about coding and the basics.

One of the first steps of an OpenCL program is selecting hosts and devices. If you program for a tablet, which has one chip and a screen, you don’t think in terms of several devices. And if you log in to a server, you assume there is one host: the one you logged into. If you have read my article on how to install all drivers on Ubuntu, you already have several clues. I added some tips and tricks, but not too many. If you know more about this subject yourself, please share it with others in the comments.
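To make the host/device split concrete, here is a minimal plain C++ sketch of the host side choosing one device. It assumes the device statistics have already been queried; in real OpenCL host code they would come from `clGetPlatformIDs`, `clGetDeviceIDs` and `clGetDeviceInfo` (for example `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY`). The `DeviceInfo` struct, the device names and the scoring rule are hypothetical and only serve as illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class DeviceType { CPU, GPU, Accelerator };

// Hypothetical record describing one OpenCL device. The host is the
// machine running this program; the devices are what it can offload to,
// possibly spread over several platforms (one per installed driver).
struct DeviceInfo {
    std::string platform;         // which vendor driver exposes the device
    std::string name;
    DeviceType type;
    std::uint32_t compute_units;  // cf. CL_DEVICE_MAX_COMPUTE_UNITS
    std::uint32_t clock_mhz;      // cf. CL_DEVICE_MAX_CLOCK_FREQUENCY
};

// Crude, hypothetical score: compute units times clock speed. It ignores
// memory bandwidth, vector width and data-transfer costs, which often matter more.
std::uint64_t score(const DeviceInfo& d) {
    return static_cast<std::uint64_t>(d.compute_units) * d.clock_mhz;
}

// Returns the index of the highest-scoring device in the list.
std::size_t pick_device(const std::vector<DeviceInfo>& devices) {
    assert(!devices.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < devices.size(); ++i) {
        if (score(devices[i]) > score(devices[best])) best = i;
    }
    return best;
}
```

On a laptop with both a CPU driver and a GPU driver installed, such a list holds entries from at least two platforms, which is exactly the situation the single-device mindset overlooks.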


A typical week

Primary and secondary tasks

Our main focus is programming and solving problems. That means everything that obstructs this focus needs to be cleared out of the way. This is simpler on paper than in reality, which is why companies hold different “faiths” on how to do it.

We start by clearly distinguishing primary from secondary tasks: in the long term, more time must be spent on primary tasks than on secondary ones. That “in the long term” is very important.

What we do every day and week:

  • Planning
    • Write issues
    • Make issue estimations
    • Prioritize issues
    • Bundle issues in epics
    • Pick issues for personal weekly milestones
  • Problem-solving
  • Coding and math
  • Learning
    • Reading books
    • Reading papers
    • Watching videos

Why so much emphasis on planning?

Planning takes a good amount of time, but it keeps us from spending too much time on dead ends. And spending time on dead ends is not a primary task at all. Planning also helps with designing better strategies: there is limited time for solving problems and coding software, so doing full-scope research is not going to work. As there is no way to efficiently build complex code without time estimates for the different approaches, planning skills provide the necessary foundation for becoming a senior coder.

We start training these skills as early as possible, so juniors are also asked to do all planning tasks. Initially this takes a good part of their valuable coding time, but that share quickly goes down and the first advantages become visible.

Style of project handling

Tools

We mostly use GitLab and Mattermost to share code and have discussions. This makes it possible to keep good track of each project: searching for what somebody said or coded two years ago is quite easy. Modern tools have changed the way we work a lot, so we have questioned and optimized everything that was presented as “good practice”.

We continuously look into new tools that can help us improve. Here too, the main focus is to reduce the time spent on secondary tasks, so we can spend more time on problem-solving.

Pull-style project management

The tasks are written down by the team, using the project-doc as input. All these tasks are put into the task-list of the project and estimated. Then each team member picks the tasks that are a good fit. There are always tasks that need to be pushed instead of pulled, but luckily that’s a relatively small part of all work.

All code (merge requests, MRs) is reviewed by one or two colleagues, chosen by the author. Even more important are the discussions in advance: the group can give more insight than any individual, and the developer can start the task well-prepared. The goal is not just to get the job finished, but to avoid being the one who wrote the code in which a bug is later found.

Code can contain comments from which Doxygen generates documentation automatically, so there is no need to copy function descriptions into a Word document. We introduced log-style documentation because neither the git history nor Doxygen explains why a certain decision was made. By keeping a logbook, a new team member can simply read these remarks and fully understand why the architecture is the way it is and what its limits are. We’ll discuss this in more detail later.
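As an illustration of what Doxygen covers and what it leaves to the logbook, here is a small C++ function documented with Doxygen’s `@brief`, `@param` and `@return` commands. The function is a hypothetical example, not code from one of our projects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/**
 * @brief Multiplies every element of a buffer by a constant, in place.
 *
 * Doxygen extracts this block into the generated reference documentation,
 * so the description lives next to the code instead of in a Word document.
 *
 * @param data   Values to scale; modified in place.
 * @param factor Multiplier applied to each element.
 * @return The number of elements that were scaled.
 */
std::size_t scale_in_place(std::vector<float>& data, float factor) {
    for (float& x : data) {
        x *= factor;
    }
    return data.size();
}
```

The comment documents what the function does. Why it works in place instead of returning a copy (for example, to avoid an extra allocation on large buffers) is the kind of decision that belongs in the logbook.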

These types of solutions describe how we work and how we differ from a corporate environment: no-nonsense and effective.

The week

If you worked here, what would your week look like in the first year? We say the first year specifically, as more complex projects may call for different approaches.

Monday weekly planning

Together with your team you pick up the issues for the week. The issues should already have estimates; otherwise these are made during that meeting. When your week is filled, you know what to do.

Monday weekly meeting

Every Monday we have a weekly meeting to share with everybody how the other projects are doing.

Mon-Fri: Daily standup

Retrospective of the previous day, and tuning of the day ahead.

Practice:

  • Tools
  • C/C++
  • GPGPU
  • Scrum

Friday closing

Weekly retrospective, cleaning up, writing notes on issues, etc.

Weekly customer meetings

Here we discuss the progress and anything blocking. The customer shares their progress, and together problems can be solved.

Many projects have a shared (high-level) issue-list, so the progress is continuously synced with the customer and communication is easy.

IWOCL 2017 Toronto call for talks and posters is open

The fifth International Workshop on OpenCL (IWOCL) will be held on 16-18 May 2017 in Toronto, Canada. The event kicks off with a full-day Advanced Hands-On OpenCL tutorial, followed by two days of conference: keynotes, academic papers, technical presentations, tutorials, poster sessions and table-top demonstrations.

The IWOCL 2017 Call for Submissions is now open: submit your abstract here. The deadline is at the beginning of February, so better submit in the coming month!

The call for IWOCL 2017 Annual Sponsors is also open; for that, contact the IWOCL organisation via this webform.

Every year there have been unique conversations with real influence on the OpenCL standard, and we heard about real-life development experience in various talks. If you missed truly technical talks at certain other GPU conferences, IWOCL is where you should go.

Let us do your peer-review

There are many research papers that claim enormous speed-ups using an accelerator. In our experience, a large part of such speed-ups comes from code modernisation (parallelisation and optimisation) rather than from the accelerator itself, which makes the claim misleading. That’s why we offer peer reviews of CUDA and OpenCL software at half our rate. The final costs depend on the size and complexity of the code.

We will profile your CPU and accelerator code on our machines and review the code. The results separate the effect of code modernisation from the effect of using the accelerator (GPU, Xeon Phi, FPGA). With this we hope to encourage researchers to pay more attention to code modernisation instead of relying on “miracle hardware”.

Don’t misunderstand: GPUs can still get an average of 8x speedup (or 700% speed improvement) over optimised code, which is still huge! But it’s simply not the 30-100x speed-up claimed in the slide at the right.
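The two ways of stating the same result relate as follows, with $T_{\text{before}}$ and $T_{\text{after}}$ the execution times before and after moving to the accelerator:

$$S = \frac{T_{\text{before}}}{T_{\text{after}}}, \qquad \text{improvement} = (S - 1) \times 100\%$$

$$S = 8 \;\Rightarrow\; (8 - 1) \times 100\% = 700\%$$

By the same formula, the claimed 30x to 100x would correspond to improvements of 2900% to 9900%.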

StreamComputing is 2 years old! A personal story.

More than two years ago, on 13 January 2010, I wrote my first blog post. Four months later StreamComputing (editor’s note: rebranded to StreamHPC in 2017) was both official and unknown. I want to share with you my personal story of how I started this company.

The push-factor

I wanted to create a company built around innovative projects, something I had hardly encountered until then. In the years before, I programmed parts of what I call A-to-B flows: software that is basically quite simple, but is tediously discussed as very, very complex.

“Complex” software

The complexity is not in the software, as you can see. It is in undocumented APIs, forgotten knowledge, knowledge in the heads of unknown people, bossy and demanding people who “kindly” ask for last-minute architecture changes, deadlines around promotion rounds, new deadlines due to board decisions, people afraid of being replaced once the software is finished, jealousy when another team builds version 2 of the software, etc. The rule of office software is therefore understandable:

Software is either unfinished,
or turned into a platform for unintended functionality.

The fun in office software is for the analyst, architect or manager; the developer just puts in his earphones and makes all the requested changes (hooray for services like Spotify). As I did not want to become a manager and wished to keep improving my development skills, I had to conclude I was on the wrong track.


Applied GPGPU-days Amsterdam 2013

December 2013: The videos are not ready yet, but a link will be put here.

Amsterdam, 20 June – Applied GPGPU-days in Amsterdam. Keep your agenda free for this event.

What can you do with GPUs to speed up computations? This year we can see various examples where OpenCL and CUDA have been used. We hope to answer whether you can use GPUs for your software, research or algorithm.

After the success of last year (fully booked with 66 attendees), we have now reserved a larger location with room for 100 people. The difference with last year is that we focus more on applications and less on technical aspects.

The program has been made public recently:

Title of talk | Company/Institute | Presenter
Introduction to GPGPU and GPU-architectures | StreamHPC | Vincent Hindriksen
Blender Cycles & Tiles: Enhancing user experience | AtMind bv | Monique Dewanchand & Jeroen Bakker
XeonPhi vs K20: The fight of the titans | SURFsara | Evghenii Gaburov
A real-time simulation technique for ship-ship and ship-port interaction | PMH bv | Jo Pinkster
CUDA Accelerated Neural Networks | LIACS | Ana Balevic
Efficient Reconstruction of Biological Networks via Transitive Reduction on GPUs | TU Eindhoven | Anton Wijs
Running Petsc on GPUs with an example from fluid dynamics | SURFsara | Thomas Geenen
Connected Component Labelling, an embarrassingly sequential algorithm | Leeuwarden University | Jaap van de Loosdrecht
Visualizing sound and vibrations using a GPU and a 1024-channel microphone array | TU Eindhoven | Wouter Ouwens
Gravitational N-body simulations on 1 to many GPUs | Leiden observatory | Jeroen Bédorf

A few demos will be shown.

For more information, and to find other events by the platform, see the Platform Parallel webpage.

Tickets are €75,-. If you are from a Dutch university or research institute affiliated with SURF, your ticket has been fully sponsored by SURFsara.

Associated events in the Netherlands

For the technical aspects (GPU-programming techniques, optimisation, etc) we have a special day: the GPU Dev Day 2013. More information on the Platform Parallel webpage. Date and place will be made public in June.

The first Khronos Meetup Benelux will take place just before the Applied GPGPU day, on 19 June in Amsterdam. More information on the meetup-page.

OpenCL potentials: Watermarked media for content-protection

HTML5 is the future, now that Flash and Silverlight are leaving the market to make way for HTML5 video. There is one big problem: it is hard to protect the content, and before you know it the movie is freely available. DRM is only a temporary solution and often ends in frustration for users who just want to watch the movie wherever they want.

If you look at e-books, you see a much better way to make sure PDFs don’t spread all over the web: personalisation. This could be done with images and videos too. The example at the right has a very obvious, clearly visible watermark (source), but there are many methods that are not easy to see, and thus easy to miss for people who want to clean the file. This gives watermarking a clear advantage over DRM, where it is obvious what has to be removed. Watermarks give buyers freedom of use. The only disadvantage is that ownership of a personalised video cannot be transferred.
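To illustrate how an invisible, personalised mark can be embedded, here is a minimal C++ sketch of least-significant-bit (LSB) watermarking, one of the simplest hidden-watermark techniques. It is a swapped-in example, not a method described in this article. It assumes raw 8-bit pixel data (`std::vector<std::uint8_t>`) and a short buyer ID as the mark; real video watermarks must survive compression and re-encoding, which plain LSB embedding does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hides each bit of `mark` in the least significant bit of one pixel byte,
// most significant bit of each character first. Changing only the lowest
// bit alters an 8-bit channel value by at most 1, which the eye cannot see.
// Returns false if the image is too small to hold the whole mark.
bool embed_lsb(std::vector<std::uint8_t>& pixels, const std::string& mark) {
    if (pixels.size() < mark.size() * 8) {
        return false;
    }
    std::size_t pos = 0;
    for (unsigned char c : mark) {
        for (int bit = 7; bit >= 0; --bit) {
            const unsigned b = (c >> bit) & 1u;
            pixels[pos] = static_cast<std::uint8_t>((pixels[pos] & 0xFEu) | b);
            ++pos;
        }
    }
    return true;
}

// Reads back `length` characters that embed_lsb wrote.
std::string extract_lsb(const std::vector<std::uint8_t>& pixels, std::size_t length) {
    assert(pixels.size() >= length * 8);
    std::string out;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < length; ++i) {
        unsigned c = 0;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c << 1) | (pixels[pos++] & 1u);
        }
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

Each pixel byte is handled independently, so this kind of per-pixel work could be parallelised naturally when personalising many videos.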


DirectCompute’s unpopularity

In the world of GPGPU there are currently 4 players: Khronos OpenCL, NVIDIA CUDA, Microsoft DirectCompute and PathScale ENZO. You probably know CUDA and OpenCL already (or start by reading more articles on this blog). ENZO is a 64-bit compiler serving a small niche market, and DirectCompute is built on top of CUDA/OpenCL or at least uses the same drivers.

Edit 2011-01-03: PathScale contacted me about my conclusions on ENZO. The reason not much is out there yet is that it is still in closed alpha. Expect to hear more from them about ENZO within the coming 3 months.

A while ago David Kanter wrote an article introducing OpenCL, in which he claimed on page 4 that DirectCompute will win over CUDA. I quote:

Judging by history though, OpenCL and DirectCompute will eventually come to dominate the landscape, just as OpenGL and DirectX became the standards for graphics.

I tweeted that I totally disagreed with him; in this article I will explain why.


What is OpenCL?

OpenCL (a trademark of Apple Inc.) is an open, royalty-free industry standard that makes much faster computations possible. The standard is controlled by the non-profit standards organisation Khronos. By using this technology with graphics cards (GPUs) or the extensions of modern processors, you can for example convert a video in 20 minutes instead of 2 hours.

Programming the GPU used to be a very difficult task reserved for specialised teams and universities, but since 2010 it has come within reach of more companies.

Below is a video that explains the differences between single-core, multi-core (starting at 1:27) and OpenCL (starting at 2:32).

http://www.youtube.com/watch?v=IEWGTpsFtt8

You can read more about the engineering ins and outs of the standard at http://www.khronos.org/opencl/.

How OpenCL works

OpenCL is an extension to existing languages. It makes it possible to specify a piece of code that is executed many times, each execution independent of the others. This code can run on various processors, not only the main one. There is also an extension for vector types (float2, short4, int8, long16, etc.), because modern processors support vector operations.
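A small C++ emulation can clarify how these vector type names work: the number is the lane count, so `int8` holds eight 32-bit `int` values, not one 8-bit integer. The `Vec` template below is a hypothetical stand-in for illustration only; in OpenCL C these types are built in, and the compiler can map operations on them onto the processor’s vector instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an OpenCL C vector type: N lanes of type T
// with element-wise arithmetic.
template <typename T, std::size_t N>
struct Vec {
    std::array<T, N> lane{};

    Vec operator+(const Vec& other) const {
        Vec result;
        for (std::size_t i = 0; i < N; ++i) {
            result.lane[i] = static_cast<T>(lane[i] + other.lane[i]);
        }
        return result;
    }

    Vec operator*(const Vec& other) const {
        Vec result;
        for (std::size_t i = 0; i < N; ++i) {
            result.lane[i] = static_cast<T>(lane[i] * other.lane[i]);
        }
        return result;
    }
};

// OpenCL C scalar sizes: short = 16 bit, int = 32 bit, long = 64 bit.
using float2 = Vec<float, 2>;
using short4 = Vec<std::int16_t, 4>;
using int8   = Vec<std::int32_t, 8>;   // eight ints, not an 8-bit int
using long16 = Vec<std::int64_t, 16>;
```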

Say, for example, that you need to calculate sin(x) for a large array of one million numbers. OpenCL detects which devices could compute this for you and gives some statistics for each device. You can pick the best device, or even several devices, and send the data to the device(s). Normally you would loop over the million numbers, but now you say something like: “Get me sin(x) of each x in array A”. When the computation is done, you copy the data back from the device(s) and you are finished.
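The difference between looping over the million numbers and asking for “sin(x) of each x” can be sketched in plain C++. The kernel is written for a single element and receives its index; a loop then stands in for the OpenCL runtime launching one work-item per element. This is an emulation for illustration only: the equivalent OpenCL C kernel is shown in the comment, and the host-side calls (context, buffers, `clEnqueueNDRangeKernel`) are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// The equivalent OpenCL C kernel would look like:
//   __kernel void sin_kernel(__global const float* in, __global float* out) {
//       size_t gid = get_global_id(0);
//       out[gid] = sin(in[gid]);
//   }
// Here the global id is passed in explicitly instead of queried.
void sin_kernel(const float* in, float* out, std::size_t gid) {
    out[gid] = std::sin(in[gid]);
}

// Stand-in for the runtime: one kernel invocation per element. A device
// may run these invocations in parallel and in any order, which is safe
// because no invocation reads another one's result.
void launch_sin(const std::vector<float>& in, std::vector<float>& out) {
    out.resize(in.size());
    for (std::size_t gid = 0; gid < in.size(); ++gid) {
        sin_kernel(in.data(), out.data(), gid);
    }
}
```

The design choice that matters is that `sin_kernel` knows only its own index; that independence is what lets a device spread the million invocations over its hundreds of cores.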

As the compute devices can do more in parallel, and OpenCL is better at describing independent functions, the total execution time is much lower than with conventional methods.

5 questions on OpenCL

Q: Why is it so fast?
A: Because many hands make light work, and the hundreds of little processors on a graphics card are the extra hands. But cooperation with the main processor remains important to achieve maximum output.

Q: Does it work on any type of hardware?
A: As it is an open standard, it can work on any type of hardware that targets parallel execution. This can be a CPU, GPU, DSP or FPGA.

Q: How does it compare to OpenMP/MPI?
A: Where OpenMP and MPI split loops over threads or servers and are CPU-oriented, OpenCL focuses on making threads aware of their position in the data and on using processor capabilities. There are several efforts to combine the two worlds.

Q: Does it replace C or C++?
A: No, it is an extension which integrates well with C, C++, Python, Java and more.

Q: How stable/mature is OpenCL?
A: OpenCL is currently at version 1.2 and is 3 years old. It has many predecessors, so the ideas behind it are quite a bit older than 3 years.

What does Khronos have to offer besides OpenCL and OpenGL?

The OpenCL standard comes from the not-for-profit industry consortium Khronos Group. But they do a lot more, like the famous OpenGL standard for graphics. The group’s focus has always been on multimedia and on getting the fastest results out of the hardware.

Now that open source and open standards are becoming more important, collaborations like the Khronos Group get more attention. At StreamHPC we are very happy with this trend, as these business models focus more on collaboration and getting things done than on making sure the customer can never leave.

Below is an overview of the most important APIs that Khronos has to offer.

OpenCL related

  • OpenCL: compute
  • WebCL: web compute
  • SPIR/SPIR-V: intermediate language for compute kernels, like those of OpenCL and OpenGL’s GLSL
  • SYCL: high-level language for OpenCL

OpenGL related

  • Vulkan: state-less graphics
  • OpenGL: graphics
  • OpenGL ES: embedded graphics
  • WebGL: web graphics
  • glTF: runtime asset format for WebGL, OpenGL ES, and OpenGL
  • OpenGL SC: Graphics for Safety Critical operations
  • EGL: interface between rendering APIs such as OpenGL ES and the underlying native platform window system, such as X.

Streaming input and output

  • OpenMAX: interface for multimedia codecs, platforms and hardware
  • StreamInput: interface for sensors
  • OpenVX: OpenCV-alternative, built for performance.
  • OpenKCam: interface for cameras and sensors

Others

One video called “OpenRoad” to show them all:

http://www.youtube.com/watch?v=ckD0op6OgMQ

Want to learn more? Feel free to ask in the comments, or check out https://www.khronos.org/

Software Performance is a Competitive Advantage

We are in the niche of GPGPU computing, where GPUs are programmed to efficiently run scientific and large-scale simulations, AI training/inference and other mathematical, compute-intensive software. As recognized experts, we are trusted by customers, mostly from the US and Europe, to speed up their software.

Our projects range from several person-weeks to fix software performance problems, to several person-years to build extensive high performance software and libraries.

Join a growing list of companies that trust us with designing and building their core software with performance in mind.

A selection of Projects

From latest to oldest (2014):

  • Speeding up a special-purpose camera on mobile phones [C++, OpenCL, Vulkan]. Increasing the frame rate from stuttering frames to a responsive video stream on a smartphone made it possible to use the camera in new application areas.
  • Speeding up Generative AI software on macOS [Objective-C++, Metal]. Using Mac Studios with M1 and M2 chips, we reached the theoretical maximum performance for offline Generative AI (making nice pictures).
  • Achieving 1 PFLOPS attention on a single H100 SXM [C++, CUDA]. We built the world’s first implementation of the attention algorithm to exceed 1 PFLOPS.
  • Writing a compiler test suite for a C++ kernel language [OpenCL, C, C++]. For a large vendor we provided an extensive suite of tests to make sure the compiler conforms to the specification. We made that update, which was a big change from 2.1 because of the addition of C++ kernels.
  • Porting GROMACS, OpenMM, AMBER and more to AMD MI100 GPUs [HIP, SYCL, C++, …]. AMD won various supercomputer contracts for its GPUs in 2022 and 2023, so it was crucial to make sure that popular CUDA-optimized software would shine on AMD MI100 GPUs. While we were optimizing the code, it also ran faster on Nvidia GPUs; this means the comparisons between Nvidia and AMD are fair and not influenced by one-sided optimizations. If you run one of these packages on your local supercomputer: you’re welcome. One example: Efficient molecular dynamics simulations on LUMI
  • Building the Khronos OpenCL SDK [OpenCL, C, C++]. It was always our wish to make OpenCL more than just the language, so we were happy to be awarded this project. GitHub
  • Speeding up pyPasWAS 3 to 5x [C, Python, OpenCL]. We boldly claimed that we could speed up this open-source software for DNA/RNA/protein sequence alignment and trimming, and so we did. The speedup depends on the data. Read more on the blog
  • Building multiple libraries for AMD [HIP, C++]. Several foundational libraries on the ROCm GitHub were built by us, and we still maintain them. This project is still active.
    • rocRAND [HIP, C++]. The world’s fastest random number generator (or second, depending on Nvidia’s response) is built for AMD GPUs, and it’s open source. With random numbers generated at several hundreds of gigabytes per second, the library makes it possible to speed up existing code numerous times. The code is often faster than Nvidia’s cuRAND and is therefore the preferred library to be used on any high-end GPU.
    • rocThrust – AMD’s optimized version of Thrust [HIP, C++]. Highly optimized for CDNA GPUs. Lots of CUDA software is Thrust-based, and it is now free of lock-in.
    • hipCUB – AMD’s optimized version of CUB [HIP, C++]. Highly optimized for CDNA GPUs. Porting CUB-based software to AMD is now a lot simpler. Both rocThrust and hipCUB build on the shared library rocPRIM, which unites many of the GPU primitives.
  • Porting a set of ADSL algorithms to an embedded special-purpose GPU [OpenCL, C, C++]. This allows central ADSL routers in large buildings to handle modern ADSL protocols.
  • Optimizing and extending the main image processing framework of a large photo hosting platform [CUDA, C++, AWS]. This project is still active. Here we make sure that the original photos are optimized on-the-fly for the current screen without anybody noticing, while also providing additional filters and features.
  • Flooding simulation [OpenCL, C++, MPI]. Software that simulates flooding of land, which we ported to multi-GPU on OpenCL and got a 35x speedup over MPI. Read more on the blog
  • Further speeding up CUDA-enabled quantum chemistry software [CUDA, C++]. TeraChem is general-purpose quantum chemistry software designed to run on NVIDIA GPU architectures. Our work added an extra 70% performance to the already optimized CUDA code.
  • Porting Manchester University’s UNIFAC to OpenCL on XeonPhi [OpenCL, C++, MPI]. Even though XeonPhi Knights Corner is not a very performant accelerator, we managed to get a 160x speedup, starting from single threaded code. Most of the speedup is due to clever code-optimizations and less due to low-level optimizations. Where OpenMP could get the single threaded code down to about 8 seconds, we brought it down to 0.062 seconds. Read more on the blog
  • Porting GROMACS from CUDA to OpenCL [CUDA, OpenCL, C, C++]. Until we ported the simulation software at the end of 2014, it had been CUDA-only. Porting all code manually took several man-months. You can now download the source, build it and run it on AMD/Intel hardware. All is open source, so you can see our code. Read more on the blog. The backend has since been deprecated in favor of SYCL.

We have helped many more companies become competitive. Some we could vaguely describe, and some we can’t mention. See below the programming languages we worked with, as not all show up in the above list.

Technologies we work with

Xeon Phi Knights Corner compatible workstation motherboards

Intel has assumed a lot when it comes to Xeon Phis. One assumption was that you would use it in dual-Xeon servers or workstations, and that you already have a professional supplier of motherboards and other computer parts. We can only guess why they’re not supporting non-professional enthusiasts who got the cheap Xeon Phi.

After browsing half the internet for an overview of motherboards, I eventually emailed Gigabyte, Asus and ASRock for information on desktop motherboards that support the blue thing. With the information I got, I could populate the list below. As usual, we share our findings with you.

A quote that applies here: “The main reason business grade computer supplies can be sold at a higher price is that the customers don’t know what they’re buying”. When I first heard it, I did not understand why customers would not be well-informed; now I do.

StreamComputing is 7 years!

As of 1 April we are 7 years old. Because of all the jokes on that day, this post comes a bit later.

Let me take you through our journey from a one-person company to what we are now. With pride I can say that (with ups and downs) StreamComputing (now rebranded to StreamHPC) has become a brand synonymous with (extremely) fast software, HPC, GPUs and OpenCL.

7 years of changes

Different services

After 7 years it’s also time for changes. Initially we worked solely on OpenCL-related services, mostly for GPUs. This is what we are currently doing:

  • HPC GPU computing: OpenCL, CUDA, ROCm.
  • Embedded GPU computing: OpenCL, CUDA, RenderScript, Metal.
  • Networked FPGA programming: OpenCL.
  • GPU-drivers testing and optimisation.
  • Software architecture optimisations.

While you see OpenCL a lot, our expertise in the vendor-specific CUDA (NVIDIA), ROCm (AMD), RenderScript (Google) and Metal (Apple) cannot be ignored. Hence we call ourselves “Performance Engineers” and not “GPU consultants” or “OpenCL programmers”.

From Fixers to Builders and getting new competition

Another change is that we have moved from fixing code afterwards to building software.

This has been a slow process, tied to growing confidence in performance engineering as an expert profession rather than a trick. We’re seeing new companies coming into the market and offering GPU computing next to their usual services. This is a sign of the market growing up.

We’re confident in growing further in our market, as we have the expertise to design fast software, while the newcomers have gained the expertise to write code that runs on the GPU, but with only a small speedup.

Community: OpenCL:PRO to OpenCL.org

There have been several times when we wanted to support the community more. The first attempt was OpenCL:PRO, which did not live long, as it was actually unclear to us what “the community” wanted.

In the end it was not that hard. Everybody who starts with OpenCL has the same problems:

  • Lack of convenience code, resulting in many, many wrappers and libraries that are incompatible.
  • Lack of practice projects.
  • Lack of overview on what’s available.

With OpenCL.org we aim to solve these problems together with the community. Everything is shared on GitHub, and anybody can join to complete the information. While our homepage had around 40 pages on these subjects, they only reflected our personal view or contained outdated information.

So we’re going to donate most of the OpenCL-related technical pages we’ve written over the years to the community.

There is much more to share – watch our blog, the OpenCLorg twitter and newsletter!

Different Logo

For those who remember: in 2010 the logo looked quite different. We still use the blocks in the background (like on our Twitter account), but since 2014 the colours and font are quite different. This change went along with the company growing up. The old logo is careful, while the new one is bold: now we’re more confident about our expertise and value.

Over the past 3 years the new logo has stayed the same and has fully become our identity.

Same kind of customers

It has been quite a journey! We could not have done it without all the customers we served over those 7 years.

Thank you!

Our offices

We’re expanding to more cities, to be closer to talent and our customers. The idea is to have multiple smaller offices instead of a few big ones. It came from a simple set of questions about what work would look like in 2030. The lines between offices would be shifting; not everything is defined by walls. So smaller offices nearby, with the flexibility to temporarily move to another city, would be much better suited to what is expected in 2030.

Each city has one or two senior developers who also act as managers and take the lead when the project complexity demands it.

HQ provides the main structure for onboarding, administration, sales and such. This way the different cities only have a few local things to take care of, so the focus can be on building great software and efficiently handling the projects.

EU – NL – Amsterdam

Koningin Wilhelminaplein 1 – 40601, 1062HG, Amsterdam, Netherlands

Amsterdam is the economic center of the Netherlands, a small country with 17 million inhabitants. It’s the home of HPC companies like Bright Computing and ClusterVision, and has a large IT workforce that also feeds the R&D demand of large international companies. As the number of companies settling here is still growing, Amsterdam is even planning to build a completely new city for 40 to 70 thousand people in the harbour area.

There are different sides to the city. When you think of Amsterdam as a tourist, you might think of the Anne Frank House, Gay Parade, Van Gogh Museum, the Red Light District, the canals, windmills and tulips. If you consider living here, think of the roughly 180 nationalities that live in the city, the 22 international schools and two universities, the vibrant nightlife and the many-villages-make-the-city atmosphere. Locals of all professions are fluent in English and there is a lively expat community.

You don’t need to live in Amsterdam, as there are several cities and villages nearby, each with a unique identity. As the Dutch infrastructure is of a high standard, Amsterdam is easy to reach by train (and car) from several nearby cities and villages. For instance, the train from Haarlem to the office takes 9 to 13 minutes, and from Leiden or Utrecht half an hour. Want to live by the sea? Zandvoort to the office is 25 minutes.

Expats (both single and with family) say they found it easy to build up a social life. For Europeans it’s very easy to move to Amsterdam, as there are no real borders in the EU.

EU – HU – Budapest

Radnóti Miklós u. 2, Budapest, 1137, Hungary

Two cities, Buda and Pest, each with its own character, together form Hungary’s capital of 1.75 million people, the ninth-largest city in the EU. The country (est. in 895) has almost 10 million inhabitants.

There is more high-tech industry than you might think. Hungary has one of the highest rates of filed patents, the 6th highest ratio of high-tech and medium high-tech output in the total industrial output, the 12th-highest research Foreign Direct Investment inflow, placed 14th in research talent in business enterprise and has the 17th-best overall innovation efficiency ratio in the world.

Walking through the city, you’ll find no average Hungarian. There is a lot of hidden creativity and a rich beer culture. The city has a unique, quietly vibrant atmosphere that makes you feel at home immediately.

EU – ES – Barcelona

Barcelona has better winter weather than Amsterdam and Budapest, and is a vibrant tech city. It hosts the famous Barcelona Supercomputing Center and is a strong tech hub.

Contenders

We’re researching multiple cities for starting a new office. Due to Covid, this research has been delayed considerably.

  • EU – NL – Utrecht
  • EU – NL – Eindhoven
  • EU – PL – Warsaw
  • EU – FR – Paris
  • EU – FR – Grenoble
  • EU – DE – Heidelberg
  • UK – Bristol

If you live in one of these cities and are good with GPUs, do get in contact. We start with these people:

  • An experienced developer who can manage projects
  • Three to four medior/senior developers
  • A temporary “location starter”
  • Optionally a sales-person

Mobile Processor OpenCL drivers (Q3 2013) + rating

For your convenience: an overview of all GPUs in ARM-based processors and their driver availability. Please let me know if something is missing.

I’ve added a rating to gently push the vendors to reach at least a 7. Vendors who think the rating does not reflect reality can contact me.

ZiiLabs

SDK-page@StreamHPC

Creative can deliver drivers if you commit to ordering ZMS-40 processors. Mail us for a contact at Creative. The minimum order size is unknown.

This processor can therefore only be used in custom devices.

[usr=4]

Vivante

SDK-page@StreamHPC

These GPUs are found in publicly available devices. Android drivers that work on Freescale processors are openly available and can be found here.

[usr=8]

Even though the processors are not that powerful, Vivante/Freescale offers the best support.

Qualcomm

SDK-page@StreamHPC

According to various sources, drivers are not shipped on devices. Android drivers are included in the SDK, though, which can be found here.

[usr=7]

The rating will go up when drivers are shipped publicly on phones and tablets.

ARM MALI

Samsung SDK-page@StreamHPC

There are lots of problems with the drivers for Exynos, which only seem to work on the Arndale board when the LCD is also ordered. Android drivers can be downloaded here.

[usr=5]

It is all in the execution: half-baked drivers won’t do. It is unclear who is to blame, but it has certainly influenced the creation of a new version of the Exynos 5, the Octa.

Imagination Technologies

SDK-page@StreamHPC

TI only delivers drivers under NDA. Samsung has one board coming up with OpenCL 1.1 EP drivers.

[usr=5]

The rating will go up when drivers from TI become available without obstacles, or when Samsung delivers what it failed to deliver with the previous Exynos 5.

Exciting times coming up

Mostly because of a power struggle between Google and the GPU vendors, there is some hesitation to ship OpenCL drivers on phones and tablets. Unfortunately, Google’s answer to OpenCL, RenderScript Compute, does not meet developers’ needs. Google’s official position is that it wants neither fragmentation nor code that is optimised for a specific GPU. Read between the lines and the answer is that Google wants vendor lock-in and therefore blocks the standard. Whatever the reason, OpenCL is being used as a sword to show who has a say about the future of Android: only the advertising company Google, or also the group of named processor makers and the various phone and tablet vendors?

In H2 2014 NVIDIA will ship CUDA drivers with its Tegra 5 GPUs, which completes the soap opera.

There are rumours that Apple will intervene and make OpenCL available on iOS. This would explain why Imagination and Qualcomm put so much effort into showing OpenCL results.

And always keep a close watch on POCL, the vendor-independent OpenCL implementation.

[bordered_box border_color='' background_color='#C1DAD6']

Need a programmer for any of the above devices? Hire us!

[/bordered_box]

AMD vs NVIDIA – Two figures that can tell a whole story

Update September ’13: AMD brings out its new “Volcanic Islands” GPUs with GCN 2.0 in October. Because of this, the HD 7970’s price has dropped to €250. This shakes up some of the things described in this article.

Update June ’14: It has become clear that the Titan is not a consumer device and should be categorised as a “Quadro for compute”. All consumer devices of both AMD and NVIDIA show relatively low GFLOPS for double precision.

Update July ’14: Graphs updated with GTX Titan Z and R9 290X.

AMD/ATI has always had the fastest GPU out there. Yes, there were many times when NVIDIA approached the throne, or even held the crown for a while (at least theoretically), but in the end it was Radeon that had the rightful claim.

Nevertheless, some things have changed:

  • AMD has focused more on the new architecture, making it easier to program while keeping the GFLOPS the same.
  • AMD is betting on its A-series APUs with integrated GPU.
  • NVIDIA has increased both memory bandwidth and GFLOPS at a steady pace.
  • NVIDIA has given double-precision performance a nitro boost.

With NVIDIA GTX Titan (see three of them in the image), NVIDIA snatched victory from the jaws of defeat.

I’m not saying you should jump to CUDA now; there is more to it than GFLOPS. We should also think of costs and of preventing vendor lock-in. Above all, I would like to show how unpredictable the market for accelerator processors is.

Let’s take a look at the figures. Continue reading “AMD vs NVIDIA – Two figures that can tell a whole story”

OpenCL error codes (1.x and 2.x)

Little Britain: “Compu’er says no”. (links to Youtube movie)

Knowing all error codes by heart helps when programming quickly, but it is not always the best option. Therefore I started to create a full list with extra information, taken from cl.h and the reference documentation.

The problem with many error codes is that they are sometimes context-dependent, which makes them quite useless for helping the programmer out. Also, some drivers return different error codes for the same problem, and the same function can return different errors depending on the OpenCL version. If you find problems, help make OpenCL better and give feedback.

Want it on your wall? You can easily copy these two tables into Excel or similar software and print them out.

Continue reading “OpenCL error codes (1.x and 2.x)”

Gedit OpenCL Syntax Highlighting

Update 17-06-2011: updated version of opencl.lang and added opencl_host.lang.

When learning a language it is nice to do it the hard way, so you take the default text editor provided with your OS. No colours, no help, no nothing: pure hard-core learning. But on the Gnome Linux desktop, the default editor Gedit is quite powerful without doing too much, and it has an official Windows port and an OS X Darwin port. It took just a few hours to understand how highlighting in Gedit works and to implement it. I got some nice help from the CUDA highlighter by Hüseyin Temucin (for showing the best way to extend the C highlighter) and the Vim OpenCL highlighter by Terence Ou (for all the reserved words). This is a work in progress; I will announce updates via Twitter.

Get it

Windows users first need to download Gedit for Windows. OS X users can check Darwin-ports. Then put the files opencl.lang (for .cl files) and opencl_host.lang (an extension of the C highlighting with OpenCL keywords) in the language-specs directory:

  • Linux: /usr/share/gtksourceview-2.0/language-specs/ (or ~/.local/share/gtksourceview-2.0/language-specs/ for local usage only)
  • Windows: C:\Program Files\gedit\share\gtksourceview-2.0\language-specs
  • OS X: /Applications/gedit.app/Contents/Resources/share/gtksourceview-2.0/language-specs/

Make sure all Gedit windows are closed so the configuration will be re-read, and then open a .cl file with Gedit. If you have opened .cl files as C or CUDA before, you have to set the highlighting to OpenCL manually (under View -> Highlighting). For host code you always need to set the highlighting manually to “OpenCL host”. You might want to associate .cl files with Gedit.

Alternatives

VIM: http://www.vim.org/scripts/script.php?script_id=3157

Notepad++: http://sourceforge.net/tracker/?func=detail&aid=2957794&group_id=95717&atid=612384

SciTE: http://forums.nvidia.com/index.php?showtopic=106156

StreamHPC is working on Eclipse support, and I understand work is also being done on NetBeans support. Let me know if there are more alternatives.

OpenCL at SC15 – the booths to go to

This year we’re unfortunately not at SuperComputing 2015, for reasons you will hear later. But we haven’t forgotten the people who are going and are trying to find their share of OpenCL. Below is a list of companies with a booth at SC15, assembled by the people of IWOCL and completed by us with some more background information.

Khronos

The first place to go is booth #285, where you can meet Khronos and hear where at SC15 you can see how OpenCL has risen over the years. More info here. Say hi from the StreamHPC team!

OpenCL on FPGAs

Altera | Booth: #462. Expected to have many OpenCL demos. See their program here. Several of their partners are spread around the floor, all expected to have OpenCL demos:

  • Reflex | Booth: #3115.
  • BittWare | Booth: #3010.
  • Nallatech | Booth: #1639.
  • Gidel | Booth: #1937.

Xilinx | Booth: #381. Expected to show their latest advancements in OpenCL. See their program here.

Microsoft | Booth: #1319. Microsoft Bing is accelerated using Altera and OpenCL. Ask them for some great technical details.

ICHEC | Booth: #2822. The Irish HPC centre works together with Xilinx using OpenCL.

Embedded OpenCL

ARM | Booth: #2015. Big on 64-bit processors, with several partners on the floor. Interesting to ask them about the OpenCL driver for their CPUs and about the latest MALI performance.

Huawei Enterprise | Booth: #173. Recently proudly showed the world their OpenCL-capable camera phones, which use ARM MALI.

HPC OpenCL

Below are the three companies that promise at least 1 TFLOPS DP per co-processor.

Intel | Booth: #1333/1533. Where Intel used to talk about OpenMP and forget about OpenCL, Altera has brought OpenCL back. Maybe they will share some plans about Xeon+FPGA, or about OpenCL support for the new Xeon Phi.

AMD | Booth: #727. HBM, HSA, Green500, HPC APU, 32GB GPUs and 2.2 TFLOPS performance: plenty to talk about with them. Also lots of OpenCL love.

NVIDIA | Booth: #1021. Every year they have given quite funny answers when asked why OpenCL is badly supported. Please do ask them this question again! The funniest answer wins something from us (to be decided).

Others

You’ll find OpenCL in many other places.

ArrayFire | Booth: #2229. Their library has an OpenCL backend.

IBM | Booth: #522. Now that Altera has joined Intel, IBM’s OpenPower is left with NVIDIA for accelerators. OpenCL could revive the initiative.

NEC | Booth: #313. The NEC group has accelerated PostgreSQL with OpenCL.

Send your photos and news!

Help us complete this post with news and photos. We’re sorry not to be there this year, so we need your help to make the OpenCL party complete. You can send them via email, Twitter or the comments below. Thanks in advance!

A list of Desktop GPU architectures

UPDATED in February 2017

Some optimisation tricks work really well on one architecture and are useless on others. And even with better drivers, older architectures need some help. In other words, it helps to know which architecture a GPU has. So here is some help from your friends at StreamHPC.

Below you’ll find a list of the architecture names of all OpenCL-capable GPU models of Intel, NVIDIA and AMD. For now it does not contain the professional lines, as we are first focusing on getting the general models right.

Please understand that it took a lot of time to gather the information below; normally we share such information only with our clients.

Continue reading “A list of Desktop GPU architectures”