Meet Vincent in Bay Area between 11 and 16 August

Posted by Vincent Hindriksen on 30 July 2018

Our managing director, Vincent Hindriksen, is in the San Francisco Bay Area from Saturday 11 up to Thursday 16 August 2018. He’ll be visiting existing customers, but there is still time left for other meetings.

Current schedule (excluding several unconfirmed meetings):

Saturday: social meetups
Monday: full
Tuesday: all day good availability
Wednesday: all day good availability
Thursday: morning good availability

Do you want to learn more about GPUs and how we can help you get there? Get in touch via our contact-page, and tell us the address and time at which you want to meet.

If you seek a job in GPUs, also get in contact! Stream HPC is growing quickly now, so this is a good moment to come on board and still make a difference. For job-talks the evenings are also available.

Scientific Visualisation of Molecules

Posted by Vincent Hindriksen on 31 October 2012 with 2 Comments

In many hard sciences the focus is on formulas and text, whereas images are mainly graphs or simplified representations of the matters researched. Beautiful visualisations are mainly artist’s impressions in popular media targeting hobby-scientists. When Cyrille Favreau made the first well-working version of his real-time GPU-accelerated raytracer, he saw potential in exactly this area: beautiful, realistic visualisations to be used in serious science. This resulted in software called IPV.

He chose to focus on rendering molecules of proteins and this article discusses raytracing in molecular sciences, while highlighting the features of the software.

This project has been discussed on GPU Science, but this article looks at the software from a slightly different perspective. If you don’t want to know how the software works and what it can do, scroll down for a download-link.

Continue reading “Scientific Visualisation of Molecules” →

We have been awarded the Khronos project to upgrade the OpenCL test suite to 2.2!

Posted by Vincent Hindriksen on 16 December 2016 with 2 Comments

Some weeks ago we started implementing the Compiler Test Suite for OpenCL 2.2. The biggest improvement of OpenCL 2.2 is C++ kernels, which were originally planned for 2.1. SPIRV 1.1 is another big improvement.

We are very happy to have a part in making OpenCL better! We find OpenCL C++ kernels very important, even if they have their limitations. Thanks to SPIRV 1.1 it gets easier to have more (unofficial) kernel languages next to C and C++, and to get SYCL. Also upgrading from 2.0 to 2.2 is rather easy thanks to the open source libclcxx.

Personally I found this project to also be very important for our internal knowledge building, as almost every function would be touched and discussed.

OpenCL 2.2 CTS RFQ has been awarded to StreamHPC

Dec 14th – Khronos issued a Request For Quote (RFQ) back in September 2016 to enhance and expand the existing OpenCL 2.1 conformance tests to create an OpenCL 2.2 test suite to be used to define conformance for OpenCL 2.2 implementations. The contract has been awarded to StreamHPC. StreamHPC is a software consultancy company specialized in performance tuned software development for CPU, GPU and FPGA. A large part of their clients hires them for their OpenCL expertise.

Improvements have already been added, bugs squashed and documentation improved. We hope to continue this in the coming months!

We’ll be ready in March. Hopefully the first implementations are ready by then, as there is a test suite ready to iron out any bug discovered. Which three OpenCL drivers do you think will be first to have OpenCL 2.2? Intel, AMD, NVidia, ARM, Imagination, Qualcomm, TI, Intel FPGA (Altera), Xilinx, Portable OpenCL or another?

21-23 August: OpenCL Training London

Posted by Vincent Hindriksen on 13 May 2013

From 21 to 23 August StreamHPC will give a 3-day training in OpenCL. Here you will learn how to develop OpenCL-programs.

A separate ticket for only the first day can be bought, as that day will be a crash-course in OpenCL. Module: basics.

The second and third day will be all about parallel-algorithm design, optimisation and error-handling. Module: optimisation, with several new subjects added.

The last part of the third day is reserved for special subjects, as requested by the attendees. Continue reading “21-23 August: OpenCL Training London” →

Get ready for conversions of large-scale CUDA software to AMD hardware

Posted by Vincent Hindriksen on 7 September 2016 with 1 Comment

In the past years we have been translating several types of software to AMD, targeting OpenCL (and HSA). The main problem was that manual porting limits the size of the to-be-ported code-base.

Luckily there is a new tool in town. AMD now offers HIP, which converts over 95% of CUDA, such that it works on both AMD and NVIDIA hardware. The remaining 5% is solving ambiguity problems that one gets when CUDA is used on non-NVIDIA GPUs. Once the CUDA-code has been translated successfully, software can run on both NVIDIA and AMD hardware without problems.
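To give a feel for why most of such a conversion is mechanical: many CUDA runtime calls have a HIP counterpart with the same arguments, so a large share of the work is renaming. The sketch below is a toy illustration in Python and not AMD's actual conversion tool. The mapping table holds only a handful of entries, and the helper name `toy_hipify` is made up for this example.

```python
import re

# A few CUDA runtime names and their HIP counterparts (far from complete).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Longest names first, so cudaMemcpyHostToDevice is not cut short by cudaMemcpy.
_PATTERN = re.compile(
    "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def toy_hipify(source: str) -> str:
    """Rename the known CUDA identifiers in `source` to their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


cuda_snippet = """#include <cuda_runtime.h>
cudaMalloc(&d_a, n_bytes);
cudaMemcpy(d_a, h_a, n_bytes, cudaMemcpyHostToDevice);
cudaFree(d_a);
"""
hip_snippet = toy_hipify(cuda_snippet)
print(hip_snippet)
```

Anything that is not a plain rename falls outside such a table, which is where manual work comes in.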

The target group of HIP are companies with older clusters, who don’t want to pay the premium prices for NVIDIA’s latest offerings. Replacing a single server with 4 Tesla K20 GPUs of 3.5 TFLOPS by 3 dual-GPU FirePro S9300X2 GPUs of 11 TFLOPS will give a huge performance boost for a competitive price.
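As a rough check of that claim, the quoted numbers can be put side by side. The text does not say whether the TFLOPS figures are per card or per server, so the sketch below computes both readings. These are theoretical peak numbers, and real application speed-ups will differ.

```python
# Figures quoted in the text; whether they are per card or per server is an assumption.
k20_cards, k20_tflops = 4, 3.5
s9300_cards, s9300_tflops = 3, 11.0

# Reading 1: the TFLOPS figures are per card.
per_card_ratio = (s9300_cards * s9300_tflops) / (k20_cards * k20_tflops)

# Reading 2: the TFLOPS figures are totals for the whole server.
per_server_ratio = s9300_tflops / k20_tflops

print(f"per-card reading:   {per_card_ratio:.2f}x peak throughput")
print(f"per-server reading: {per_server_ratio:.2f}x peak throughput")
```

Under either reading the new server has more than twice the peak throughput of the old one.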

The cost of making CUDA work on AMD hardware is easily paid for by the price difference when upgrading a GPU-cluster.

Continue reading “Get ready for conversions of large-scale CUDA software to AMD hardware” →

Tutorials

During our courses/trainings we will teach you the best of what you can find here.

We try to keep the following information as complete as possible, so please contact us if something is missing.

Learning OpenCL

Hands on OpenCL, by Simon McIntosh-Smith and Tom Deakin from the University of Bristol in the UK. It currently is the most up-to-date tutorial on OpenCL, including code for lab-sessions.
Bruno Jurkovski wrote a clear quickstart.
AMD introduction to OpenCL.
MacResearch playlist on Youtube. Code of episode 3 and 6. Zip of PDFs.
CMSoft’s complete OpenCL tutorial.
The Code Project has a series on OpenCL, episodes 1, 2, 3, 4, 5, 6, 7 and 8. By Rob Farber.
Dr.Dobb’s has a series called “CUDA, Supercomputing for the Masses”. It is CUDA-oriented, but you can learn a lot about GPGPU in general and on NVIDIA specific optimisations. Login to their site and then you can access parts 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 and 21. Registration is free.
AMD’s university program. This is loads of information!
NVIDIA’s OpenCL pages provide all you need to program on NVIDIA.
Enjalot’s adventures in OpenCL giving the basics in OpenCL and pyOpenCL.
StreamHPC’s basic concepts with various tips&tricks on OpenCL.
KISTI Supercomputing Learning Centre has a beginners course for OpenCL. Material including PDFs and code is available on SF.net.
OpenCL cookbook by Dhruba Bandopadhyay.
Anteru’s introduction to OpenCL, part #1, #2 and #3.

OpenCL Optimisation guides

Intel Xeon and XeonPhi
NVidia (CUDA, but same applies to OpenCL)
AMD GPUs and CPUs
ARM MALI T600
Altera FPGAs

Not available (yet):

Imagination PowerVR
Qualcomm Adreno
Xilinx FPGAs

University courses

OpenCL-based GPU-programming courses

Marcus Bannermann university course made for the university of Erlangen, Germany.
Advanced Parallel Programming is a course on parallel programming by professor John Cavazos of University of Delaware.
Programming for Performance is a course on parallel programming by Jonathan Eyolfson of University of Waterloo.
Manchester OpenCL tutorial wiki. Materials from previous courses and more.
University of Innsbruck GPU-programming using OpenCL by Juan J. Durillo PhD.
University of Waterloo Programming for Performance. Lecture notes and assignments.

Architectures

Berkeley University Computer Architecture and Engineering

Videos

AMD’s OpenCL introduction. Takes about an hour in total, slides are provided.
Harvard Lectures on GPGPU. One hour each.

Cases/Studies

AMD optimisation case study: Diagonal Sparse Matrix Vector Multiplication.
AMD optimisation case study: Simple reductions.

WebCL

WebCL is a new standard-to-be for OpenCL in the browser. Currently there are a few implementations, while Khronos is working on an official standard. WebCL is available on Firefox for Linux32, Windows32 and Windows64 by Nokia. Also available for Safari on OSX by Samsung. A Node.js-implementation is made by Motorola. Examples made for another implementation will probably not work.

Check Khronos’ WebCL page for more resources.

C/C++

Basic knowledge of C is needed to understand how to write kernels. Also many tutorials are in C++.

A little C primer.
C++ for Java-programmers.
C for Java-programmers.
http://www.stanford.edu/class/cs101/ in case you have never programmed. Don’t think about GPU-programming yet.

Basic OpenGL

Getting a grasp of OpenGL has advantages. Techniques for faster memory-operations in OpenGL have equivalents in OpenCL, giving reason to read on this subject.

Discussion about OpenGL Shaders.
OpenGL Samples from getting an empty screen till the famous teapot.

Is OpenCL coming to Apple iOS?

Posted by Vincent Hindriksen on 19 August 2011 with 3 Comments

Answer: No, or not yet. Apple tested Intel and AMD hardware for OSX, and not portable devices. Sorry for the false rumour; I’ll keep you posted.

Update: It seems that OpenCL is on iOS, but only available to system-libraries and not for apps (directly). That explains part of the responsiveness of the system.

On 13 August 2011 Apple asked the Khronos Group to test whether 7 unknown devices are conformant with OpenCL 1.1. As Apple uses OpenCL-conformant hardware by AMD, NVidia and Intel in their desktops, the first conclusion is that they have been testing their iOS-devices. A quick look at the list of iOS 5 capable devices gives the following potential candidates:

iPhone 3GS
iPhone 4
iPhone 5
iPad
iPad 2
iPod Touch 4th generation
Apple TV

If OpenCL comes to iOS soon (as it is already tested), iOS 5 would be the moment. iOS 5 processors are all capable of getting a speed-up by using OpenCL, so it is not a nonsense feature. This could speed up many features, among them media conversion, security enhancements and manipulation of data streams. Where now the cloud or the desktop has to be used, in the future it can be done on the device.

Continue reading “Is OpenCL coming to Apple iOS?” →

OpenCL potentials: Watermarked media for content-protection

Posted by Vincent Hindriksen on 23 November 2011

HTML5 has the future, now that Flash and Silverlight are abandoning the market and clearing the way for HTML5-video. There is one big problem: it is hard to protect the content, and before you know it the movie is on the free market. DRM is only a temporary solution and often ends in frustration for users who just want to see the movie wherever they want.

If you look at e-books, you see a much better way to make sure PDFs don’t get all over the web: personalizing. With images and videos this could be done too. The example here at the right has a very obvious, clearly visible watermark (source), but there are many methods which are not easy to see, and thus easier to miss by people who want to clean the file. It therefore has a clear advantage over DRM, where it is obvious what has to be removed. Watermarks give the buyers freedom of use. The only disadvantage is that the ownership of a personalised video cannot be transferred.
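One of the simplest hard-to-see methods is hiding an identifier in the least significant bits of pixel values. The Python sketch below is only an illustration of that idea and not a method proposed in this article. The function names and the 16-bit buyer ID are assumptions, and a watermark this naive would not survive re-encoding of the video.

```python
def embed_id(pixels, buyer_id, bits=16):
    """Hide buyer_id in the least significant bit of the first `bits` pixels."""
    if len(pixels) < bits:
        raise ValueError("image too small for the watermark")
    marked = list(pixels)
    for i in range(bits):
        bit = (buyer_id >> i) & 1
        marked[i] = (marked[i] & ~1) | bit  # clear the lowest bit, then set it
    return marked


def extract_id(pixels, bits=16):
    """Read the identifier back from the lowest bits."""
    return sum((pixels[i] & 1) << i for i in range(bits))


# 32 grey values in the range 0..255, made up for this example.
pixels = [120, 121, 119, 200, 201, 202, 50, 51] * 4
marked = embed_id(pixels, buyer_id=40961)
print(extract_id(marked))
```

Each pixel changes by at most one grey level, which is why the mark is hard to see. Every pixel is also processed independently of the others, which is the kind of work that maps well onto one OpenCL work-item per pixel.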

Continue reading “OpenCL potentials: Watermarked media for content-protection” →

Aparapi and GPU-code in Java

Aparapi is an open source framework used to write OpenCL-code in Java. It translates Java byte-code into OpenCL for AMD GPUs and all CPUs to get much faster performing code. Furthermore, Aparapi is also quite a fit for existing code (*). And there’s more: since late 2011 a stable version has been available and new features have been introduced.

(*) You can read more about the alpha-version of Aparapi in this blog post

AMD and Oracle have agreed to collaborate on implementing support for GPU-programming in Java. This means that it’s very likely that this upcoming implementation by Oracle will resemble Aparapi, which in turn also means that it would be safe to invest in this technology.

Below is an example of a simple vector-addition. As you see, the code is very clean.

StreamHPC helps you to find bottlenecks in your Java-code and increase performance by using several types of code optimisations and implementing heavy computations in Aparapi. We have a track-record of bringing down running time of batch-processes 2 to even 300 times. In case more performance is needed, Aparapi-code can be converted to pure OpenCL. We use libraries JOCL and JavaCL for this purpose.

If you want faster Java-code and/or want a training in Aparapi, contact us!

Installing and using Portable Computing Language (PoCL)

Posted by Vincent Hindriksen on 8 July 2013 with 1 Comment

Update August’13: 0.8 has been released

PoCL stands for Portable Computing Language and the goal is to make a full and open source implementation of OpenCL 1.2 for LLVM.

This is about installing and using PoCL on Ubuntu 64. If you want to put some effort to build it on Windows, you will certainly help the project. See also this TODO for version 0.8, if you want to help out (or want to know its current state). Not all functionality is implemented, but the project progresses using test-driven development – using the samples in the SDKs as a base.

Backends

They are eager for collaboration, so new backends can be added. From what I’ve seen this project is one of the best starts for new OpenCL-drivers. First because of the work that already has been done (implement by example), second because it’s an active open source project (continuous post-development), third because of the MIT-license (permits reuse within proprietary software). Here at StreamHPC we keep a close eye on the project.

On a normal desktop it has only one device and that’s the CPU. It has backends for several other types of CPUs (check ./lib/kernel in the source):

ARM
Cell SPU
Powerpc
Powerpc64
x86_64

Also the TCE libraries can be used as backend. The maturity of each backend differs.

More support is coming. For instance Radeon via the R600 project, but PoCL first needs to support LLVM 3.3 for that.

It is also a good start to build drivers for your own processor (contact us if you want our assistance in building such a backend; see Supporting OpenCL on your own hardware for some related info).

Continue reading “Installing and using Portable Computing Language (PoCL)” →

Internships: the self-driving vehicle – updated

Posted by Vincent Hindriksen on 9 March 2015

UPDATE: We now only offer thesis support (“externs”), for students who want to use OpenCL in their research, but don’t have such support at their university. For the rest, the below still applies.

From July there are several internships available here at StreamHPC, all around self-driving vehicles (or even self-flying drones). This means that with an interest in AI, embedded programming and sensors, you’re all set.

You can work as an intern for a period from 1 to 6 months, and combine it with your thesis. We will assist you with planning, thesis correction and technical support (especially OpenCL). There are also a few other startups in the building, whom you might like to talk with.

Your time will consist of literature studies, programming, testing, OpenCL-optimisations and playing. We’ll work with bikes and toy-cars, so no big cars that are expensive to crash. Study fields are road-location, obstacles, driving-style detection, etc.

If you want to do an internship purely to gain experience, we can offer you a combination of research and working for real customers.

Some targets:

Create a small test-car full of sensors:
- radar for distance
- multiple cameras
- laser
- other sensors, like touch
Programming an embedded board with OpenCL-capability.
Programming pointcloud algorithms in OpenCL.
Defining the location on the road, also in OpenCL. (taken)
Detecting pedestrians, signs.
Have fun creating this.

Please contact us and tell us your ideas and plan.

Building the HPC ecosphere in Amsterdam

Posted by Vincent Hindriksen on 17 April 2015

Here in Amsterdam a lot is going on around HPC. Besides StreamHPC, we have companies like Vancis, Netherlands eScience Centre and ClusterVision, the research institute for Dutch HPC, Surf SARA (hosting the Dutch supercomputer), and the very busy Amsterdam IX.

Here in Amsterdam we’re focused on building up more local companies around big compute and big data. I’d like to give two examples. One is Scyfer, an academic startup specialised in deep learning. They’ve developed algorithms to more efficiently train neural networks and help their customers find answers quicker. The second is Euvision Technologies, who developed unique computer vision solutions. Last year it was sold to Qualcomm, for tens of millions.

We welcome new companies to Amsterdam, to further build up the HPC-ecosphere. If you have a company and are seeking a good location, contact us to talk about HPC in Amsterdam. There are many opportunities to develop in Europe, and we’re open for partnerships in new markets.

If you want to start your own HPC-related startup, Amsterdam thinks of you! There are four steps to take:

Go to the Venture café on 30 April
Apply for the bootcamp
Become our neighbours
Build your own HPC startup

Ping me if you want advice on which preparations you need to make before you can make such a big decision. I like to have an open discussion, so please use the comment-area below for what you think of HPC in Amsterdam and building companies.

Computational Finance

StreamHPC has consulted at various small and big financial institutes to improve the performance of their software: From optimising databases and computations to introducing modern hardware solutions such as FPGAs and GPUs.

We worked for fintech company Tempus Energy to speed up their forecasting software so that it runs in under a minute. This speedup was necessary for them to improve their computational model so they could better serve their customers.

We can help you in developing high-performance and low-latency solutions as determined by your requirements. Please contact us to discuss what we can do for you.

Consultancy Partnerships

We have partnered with freelancers to easily scale up or include expertise.

Appilo

Israel. Expertise in OpenCL, CUDA, vision, C and C++.

Please contact us, if you want to have a partnership with StreamHPC. Please be very clear what synergy you seek.

Let’s meet at ISC in Frankfurt

Posted by Vincent Hindriksen on 13 June 2016

Vincent Hindriksen will be walking around at ISC from 20 to 22 June. With me I bring our latest brochure, some examples of great optimisations and some Dutch delicacies. We will also have some exciting news with an important partner, so stay tuned!

It will be a perfect time to discuss how StreamHPC can help you solve tough compute problems. Below is a regularly updated schedule of my time at ISC.

Get in contact to schedule a meeting.

If you’d like to talk technologies and bits&bytes, we’re trying to make a get-together – date&time TBD.

Demo: cartoonizer on an Altera Arria 10 FPGA

Posted by Vincent Hindriksen on 6 July 2017

It takes quite some effort to program FPGAs using VHDL or Verilog. For several years Intel/Altera has offered OpenCL-drivers, with the goal of reducing this effort. OpenCL-on-FPGAs reduced the required effort to a quarter of the time, while also making it easier to alter the specifications during the project. Exactly the latter was very beneficial when creating the demo, as the to-be-solved problem was vaguely defined. The goal was to make a video look like a cartoon using image filters. We soon found out that “cartoonized” is a vague description, and it took several iterations to get the right balance between blur, color-reduction and edge-detection. Continue reading “Demo: cartoonizer on an Altera Arria 10 FPGA” →
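To make the three ingredients concrete, here is a minimal Python sketch on a tiny grey-scale image. It is not the filter chain used in the demo: the 3x3 mean blur, the four grey levels, the threshold of 40 and all function names are assumptions chosen for illustration. The tunable parameters are exactly where the balancing mentioned above comes in.

```python
def box_blur(img):
    """3x3 mean filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) // len(window)
    return out


def reduce_colors(img, levels=4):
    """Quantise 0..255 grey values to `levels` evenly spaced bands."""
    step = 256 // levels
    return [[(v // step) * step for v in row] for row in img]


def edges(img, threshold=40):
    """Mark a pixel when it differs strongly from its right or lower neighbour."""
    h, w = len(img), len(img[0])
    return [[(x + 1 < w and abs(img[y][x] - img[y][x + 1]) > threshold) or
             (y + 1 < h and abs(img[y][x] - img[y + 1][x]) > threshold)
             for x in range(w)] for y in range(h)]


def cartoonize(img, levels=4, threshold=40):
    """Flatten the colours of the blurred image and draw edges in black."""
    flat = reduce_colors(box_blur(img), levels)
    edge = edges(img, threshold)
    return [[0 if edge[y][x] else flat[y][x] for x in range(len(img[0]))]
            for y in range(len(img))]


# 4x6 test image: dark left half, bright right half.
image = [[30, 32, 31, 220, 222, 221] for _ in range(4)]
result = cartoonize(image)
for row in result:
    print(row)
```

More blur or fewer levels gives flatter areas, and a lower threshold gives more outlines.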

Exposing OpenCL on Android: Q&A with Tim Lewis of ZiiLabs

Posted by Vincent Hindriksen on 28 July 2011

ZiiLabs has been offering an early access program for its OpenCL SDK since last year. This program was very selective in choosing developers and little news has been put on their webpage. Now that they are planning to make their Android NDK a standard component, it’s a good time to ask them some questions. GPGPU-consultant Liad Weinberger of Appilo also added a few questions.

The Q&A was held with Tim Lewis, director Marketing and Partner Relations of ZiiLabs, who has taken the time to give some insights into what we can expect around accelerated computations on Android. ZiiLabs was better known as 3DLabs and reinvented itself in 2009 (you can read the full history here). Like other companies in the ARM-industry they mostly design chips and let other parties manufacture devices using their schematics, drivers and software. Now to the questions.

Continue reading “Exposing OpenCL on Android: Q&A with Tim Lewis of ZiiLabs” →

SC15 news from Monday

Posted by Vincent Hindriksen on 16 November 2015 with 4 Comments

Warning: below is raw material, and needs some editing.

Today there was quite some news around OpenCL, and I’m afraid I can’t wait till later to have all news covered. Some news is unexpected, some is great. Let’s start with the great news, as the unexpected news needs some discussion.

Khronos released OpenCL 2.1 final specs

As of today you can download the header files and specs from https://www.khronos.org/opencl/. The biggest changes are:

C++ kernels (still separate source files, which is to be tackled by SYCL)
Subgroups are now a core functionality. This enables finer grain control of hardware threading.
New function clCloneKernel enables copying of kernel objects and state for safe implementation of copy constructors in wrapper classes. Hear all Java and .NET folks cheer?
Low-latency device timer queries for alignment of profiling data between device and host code.
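The last item deserves a small illustration. In OpenCL 2.1 a matching pair of device and host timestamps can be requested with clGetDeviceAndHostTimer. With such a pair, device-side profiling timestamps can be placed on the host timeline. The Python sketch below shows only that arithmetic: it makes no OpenCL calls, the function name and the numbers are made up, and it assumes a constant offset between the two clocks (so drift is ignored).

```python
def make_device_to_host(device_ns, host_ns):
    """Return a function mapping device timestamps onto the host timeline.

    device_ns and host_ns are one pair of timestamps taken at (nearly) the
    same moment. A single constant offset is assumed; clock drift is ignored.
    """
    offset = host_ns - device_ns
    return lambda t_device_ns: t_device_ns + offset


# Made-up numbers, in nanoseconds, for illustration only.
to_host = make_device_to_host(device_ns=5_000_000, host_ns=1_700_000_000)
kernel_start_host = to_host(5_250_000)
kernel_end_host = to_host(5_900_000)
print(kernel_start_host, kernel_end_host)
```

For long-running applications the pair would have to be sampled again from time to time, as the two clocks may drift apart.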

OpenCL 2.1 will be supported by AMD. Intel was very loud with support when the provisional specs got released, but gave no comments today. Other vendors did not release an official statement.

Khronos released SPIR-V 1.0 final specs

SPIR-V 1.0 can represent the full capabilities of OpenCL 2.1 kernels.

This is very important! OpenCL is not the only language anymore that is seen as input for GPU-compilers. Neither is OpenCL hostcode the only API that can handle the compute shaders, as also Vulkan can do this. Lots of details still have to be seen, as not all SPIRV compilers will have full support for all OpenCL-related commands.

With the specs the following tools have been released:

A bi-directional translator between LLVM and SPIR-V to enable flexible use of both intermediate languages in tool chains.
An OpenCL C to LLVM compiler that generates SPIR-V through the above translator, as Clang can compile OpenCL 1.2/2.0 C kernels.
A SPIR-V assembler and disassembler.

SPIRV will make many frontends possible, giving co-processor powers to every programming language that exists. I will blog more about SPIRV possibilities the coming year.

Intel claims OpenMP is up to 10x faster than OpenCL

The below image appeared on Twitter, claiming that OpenMP was much faster than OpenCL. Some discussion later, we could conclude they compared apples and oranges. We’re happy to peer-review the results, putting the claims in a full perspective where MKL and operation mode is mentioned. Unfortunately they did not react, as <sarcasm>we will be very happy to admit that for the first time in history a directive language is faster than an explicit language – finally we have magic!</sarcasm>

https://twitter.com/StreamHPC/status/666178711549583364

Left half is FFT and GEMM based, probably using Intel’s MKL. All algorithms seem to be run in a different mode (native mode) when using OpenMP, for which Intel did not provide OpenCL driver support.

We will get back later this week on Intel and their upcoming Xeon+FPGA chip, and whether OpenCL is the best language for that job. It is of course possible that they try to run OpenMP on the FPGA, but that would be a big surprise. Truth is that Intel doesn’t like this powerful open standard intruding the HPC market, where they have a monopoly.

AMD claims OpenCL is hardly used in HPC

Well, this is one of those claims that they did not really think through. OpenCL is used in HPC quite a lot, but mostly on NVidia hardware. Why not just CUDA there? Well, there is demand for OpenCL for several reasons:

Avoid vendor lock-in.
Making code work on more hardware.
General interest in co-processors, not one specific brand.
Initial code is being developed on different hardware.
…

Thing is that NVidia did a superb job in getting their processors in supercomputers and clouds. So OpenCL is mostly run on NVidia hardware, and that is the biggest reason why that company is so successful in slowing the advancement of the standard by rolling out upgrades 4 years later. Even though I tried to get the story out, NVidia is not eager to tell the beautiful love story between OpenCL and the NVidia co-processor, as the latter has CUDA as its wife.

Also at HPC sites with Intel XeonPhi, OpenCL gets love. Same here: Intel prefers to tell about their OpenMP instead of OpenCL.

AMD has few HPC sites, and indeed that is where OpenCL is used.

No, we’re not happy that AMD tells such things, only to promote its own new languages.

CUDA goes AMD and open source

AMD now supports CUDA! The details: they have made a tool that can compile CUDA to “HiP”. HiP is a new language without many details at the moment. Yes, I have the same questions as you are asking now.

Also Google joined in and showed progress on their open source implementation of CUDA. Phoronix is currently the best source for this initiative and today they shared a story with a link to slides from Google on the project. The results are great: “it is up to 51% faster on internal end-to-end benchmarks, on par with open-source benchmarks, compile time is 8% faster on average and 2.4x faster for pathological compilations compared to NVIDIA’s official CUDA compiler (NVCC)”.

For compiling CUDA in LLVM you need three parts:

a pre-processor that works around the non-standard <<<…>>> notation and splits off the kernels.
a source-to-source compiler for the kernels.
a bridge between the CUDA API and another API, like OpenCL.
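The first of these parts can be illustrated with a toy example. The Python sketch below rewrites simple kernel launches in the <<<…>>> notation into ordinary function calls. It is not how Google's compiler does it: `launch` is a hypothetical helper that a bridging layer would have to provide, and the regular expression only handles launches with exactly two launch parameters and no nested parentheses.

```python
import re

# Matches kernel<<<grid, block>>>(args) without nested parentheses in any part.
LAUNCH = re.compile(
    r"(\w+)\s*<<<\s*([^,>]+?)\s*,\s*([^,>]+?)\s*>>>\s*\(([^()]*)\)"
)


def rewrite_launches(source: str) -> str:
    """Turn kernel<<<grid, block>>>(args) into launch(kernel, grid, block, args)."""
    def to_call(m):
        kernel, grid, block, args = m.groups()
        args = args.strip()
        tail = f", {args}" if args else ""
        return f"launch({kernel}, {grid}, {block}{tail})"
    return LAUNCH.sub(to_call, source)


print(rewrite_launches("vec_add<<<blocks, 256>>>(a, b, c, n);"))
```

A real pre-processor needs a proper parser, as launches may also carry a shared-memory size and a stream, and arguments can contain arbitrary expressions.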

Google has done most of this and now focuses mostly on performance. The OpenCL community can use this project to make a complete CUDA-to-SPIRV compiler and use the rest to improve POCL.

Khronos gets a more open homepage

Starting today you can help keep the Khronos webpage up-to-date. Just submit a pull request at https://github.com/KhronosGroup/Khronosdotorg and wait until it gets accepted. This should help the pages stay current, as you can now improve the webpages in more ways.

More news?

AMD released HCC, a C++ language with OpenMP built-in that doesn’t compile to SPIRV.

There have been tutorials and talks on OpenCL, which I should have shared with you earlier.

Tomorrow another post with more news. If I forgot something on Sunday or Monday, I’ll add it here.

Accelerating an Excel Sheet with OpenCL

Posted by Vincent Hindriksen on 19 September 2016

One of the world’s most used pieces of software is far from performance optimised and there is hardly anything we can do about it. I’m talking about Excel.

There are various engine replacements which promise higher speeds, but those have the disadvantage that they’re still not fast enough with really heavy calculations. Another option is to use the much faster LibreOffice, but companies prefer ribbons over new software. The last option is to offer performance-optimised modules for the problematic parts. We created a demo a few years ago and revived it recently. Continue reading “Accelerating an Excel Sheet with OpenCL” →

About Us

Stream HPC is a software development company in parallel software for many-core processors. We provide professional software development services, training and consulting to help you increase compute performance in software while lowering hardware-costs.

We have 3 locations.

Stream HPC B.V. (Amsterdam)

Koningin Wilhelminaplein 1
1062 HG Amsterdam
Netherlands, Europe

phone: +31 854865760 (office) or +31 6 45400456 (cell)

Visit us in Amsterdam

Stream HPC Hungary Kft. (Budapest)

Science Park 
Irinyi József u. 4-20.
1117 Budapest
Hungary, Europe

Stream HPC Spain S.L. (Barcelona)

Plaza de Catalunya 1, 4th floor
Barcelona 08002
Spain, Europe

History

2010 – 2013: the freelancing years

The company started as a freelancing business, with one focus: Programming GPUs with OpenCL. It was tough, as back then the G in GPU stood for “Graphics only”.

The name was “StreamComputing”, which stands for a high-performance computer system that analyzes multiple data streams from many sources live. The main goal was to create software algorithms that analyze the data in real time as it streams in, to increase speed and accuracy in data handling and analysis, which was in line with that name.

2014: first hope

Four years later the first employee, Anca, was hired. Later that year the freelancing business was turned into a limited company. GPUs were increasingly seen as data-processors and trainings were the main income. Projects were still small, GPGPU was a world of early adopters and most time was invested in trainings.

First contact was made with AMD, now one of our biggest clients.

2015-2017: initial growth

Stream grew to a handful of employees, and we did projects for HSA foundation, Stanford, AMD, Zeiss, Nokia, Philips and many lesser known companies.

Trainings were still done, but were by far not the main source of income anymore. We tried some FPGA-work, but found that most promises were not implemented yet.

2017: a new name

We renamed the company to Stream HPC. There were several reasons. As we focused more on customers from Asia and North America, we needed the .com, which was unavailable. Getting the new name was quite a quest, but we got there: by customers we were often referred to as “Stream”, a business coach assured us that CPU-work would remain important and thus “HPC” was more important than “GPU”, and it was quite difficult to type streamcomputign correctly.

2017-2020: hitting all kinds of ceilings

The goal was to grow further, but this turned out to be more difficult than expected. All kinds of obstacles got in our way, and we even shrank in size once. With trainings, coaching, reading and persistence, we got to understand the hurdles and finally could implement solutions. Looking back it was easy.

2021: Stream HPC Hungary

Hungary started as a group of freelancers. We were very happy with the quality provided by our Hungarian colleagues, and that was enough reason to invest more. We opened the new office in Q3.

The company now turned into a group of companies, and all was set up to extend the group more easily.

We grew back to 15 people by the end of the year.

2022: Benchmark.io

At ISC Benchmark.io was started. To help our customers do better benchmarks, we put all our knowledge into a separate product. Due to high demand for our consultancy services, it is in private beta only.

2022: Stream HPC Spain

Barcelona was opened in Q3.

The estimation is to grow to 25-35 people by the end of the year.

Altera published their OpenCL-on-FPGA optimization guide

Posted by Vincent Hindriksen on 11 November 2013

Altera has just released their optimisation guide for OpenCL-on-FPGAs. It does not go into the howto’s of OpenCL, but assumes you have knowledge of the technology. Neither does it provide any information on the basics of Altera’s Stratix V or other FPGAs.

It is the first public optimisation document, so feedback sent directly to them is appreciated. Not aware what OpenCL can do on an FPGA? Watch the below video.

https://www.youtube.com/watch?v=p25CVFMc-dk

Subjects

The following subjects and optimisation tricks are discussed:

FPGA Overview
Pipelines
Good Design Practices
Avoid Pointer Aliasing
Avoid Expensive Functions
Avoid Work-Item ID-Dependent Backward Branching
Aligned Memory Allocation
Ensure 4-Byte Alignment for All Data Structures
Maintain Similar Structures for Vector Type Elements
Optimization of Data Processing Efficiency
Specify a Maximum Work-Group Size or a Required Work-Group Size
Loop Unrolling
Resource Sharing
Kernel Vectorization
Multiple Compute Units
Combination of Compute Unit Replication and Kernel SIMD Vectorization
Resource-Driven Optimization
Floating-Point Operations
Optimization of Memory Access Efficiency
General Guidelines on Optimizing Memory Accesses
Optimize Global Memory Accesses
Perform Kernel Computations Using Constant, Local or Private Memory
Single Work-Item Execution

Carefully compare these with CPU and GPU optimisation guides to be able to write more generic OpenCL code.

Download

You can download the document here.

If you have any questions on OpenCL-on-FPGAs, OpenCL, generic optimisations or Altera FPGAs, feel welcome to contact us.