OpenCL Potentials: Medical Imaging

Photo by Eugene Mah: http://www.flickr.com/photos/imabug/2946930401/

If you have ever seen a CT or MRI scanner, you might have noticed the full-sized computer next to it (especially next to the older models). A lot of processing power is needed to keep up with the data stream coming from the scanner, to reconstruct the data into a 3D image and to visualise it on a 2D screen. Luckily we have OpenCL to make this even faster: which doctor doesn’t want real-time, high-resolution results, and which patient doesn’t want to see the results on an Apple iPad or Samsung Galaxy Tab?

Architects, bankers and doctors have one thing in common: they get a better feel for the subject at hand if they can play with the data. OpenCL makes it possible to process data much faster and thus lets the specialist play with it. The interesting part of IT is that it is now present in every domain, which is why we are starting a new series: OpenCL Potentials.


Partner up with StreamHPC for Horizon 2020!

For those working in a research department at a company or university within the EU, Horizon 2020 might sound familiar.


For us this is an important programme and a source of possibilities for collaboration in the coming years. Our expertise in enabling ultra-fast computations, combined with your expertise, can make Europe more competitive. We are interested in applied GPGPU, in the commercialization of tools, in co-developing new software with SMEs, and in collaborating with universities based in the EU, Switzerland or Israel.

The fields Europe wants to focus on:

  • Micro- and nano-electronics; photonics
  • Nanotechnologies
  • Advanced materials
  • Biotechnology
  • Advanced manufacturing and processing

The development of these technologies requires multidisciplinary knowledge and a capital-intensive approach.

Each of these industries has opportunities for using GPUs and accelerators.

Contact us today for more information. We’ll have a lot to share!

Events are organised throughout Europe to inform universities and companies about the programme. Are you a Dutch university or company? Check the Dutch government’s website on the programme.

Creative industry

Desktops

When working with CAD software, we tend to need a lot of rendering (ray-traced animations) and make extensive use of photo editors. These workloads can be sped up enormously with OpenCL. If there is a small mistake in the final rendering, you no longer have to choose between ignoring the detail and going for quality: you can just re-render and still make the deadline. In most cases all employees need the extra processing power, so the best option is to upgrade the software. You can consult your software supplier for more information.

But the most interesting option is upgrading the hardware. On Apple computers, unfortunately, support for stream processors is lacking. We’ll just have to wait until NVidia and AMD listen to the growing group of OpenCL-demanding users on Apple computers. Contact us if you want to be the first to know when this becomes possible!

On PCs we can upgrade the computers with up to 4 stream processors, providing up to 5 TFLOPS of computing power. This can result in real-time rendering of normal-resolution images and a 50-times speed-up for high-resolution images. We don’t think efficiency will increase through fewer lost hours (creativity happens in the head while drinking coffee), but it will certainly increase the final quality, because more possibilities can be tried out and the creator gets more visual feedback.

Render farms

When rendering movies and other high-quality, high-resolution visual material, a single desktop might not be sufficient. Our default solution is a render farm (a cluster of at least 5 servers and 1 control computer) running DrQueue. We all know the stories of Pixar movies that took 3 years to finish rendering. With OpenCL this can be brought down to 3 or 4 months, even with higher demands. Most movies with fewer complex materials (such as hair) can actually be rendered faster than real time.

Training

Public trainings in Amsterdam

As Amsterdam is easy to reach from anywhere in Europe, we give most of our public trainings at our offices. See the list below for what is upcoming:

No events

In-company trainings globally

We at StreamHPC train IT-experts in OpenCL, CUDA and GPU directives world-wide. All trainings can be given in English or Dutch; on request printed materials can be translated into your local language.

We offer:

  • crash courses in GPUs, FPGAs and directives,
  • in-depth trainings, and
  • in-house trainings.

What customers said

“Normally you just get told to type in specific commands in some order, which you can find in the text books too. StreamHPC focused on teaching the backgrounds to get a mindset for GPU-programming and to better understand the hardware. After that I understood the SDK-examples much better.”

Power to the Vector Processor

Reducing energy-consumption is “hot”

After reading the article “Nvidia is losing on the HPC front” by The Inquirer, which mixes up the demand for low-power architectures with the other side of the market (the demand for high performance), I realised it is not widely understood that there are two markets using the same technology. Nvidia has also proven the article wrong, since the supercomputer “Nebulae” uses about half the watts per flop of the #1. How come? I quote The Register from an article of a year ago:

>>When you do the math, as far as Linpack is concerned, Jaguar takes just under 4 watts to deliver a megaflops at a cost of $114 per megaflops for the iron, while Nebulae consumes 2 watts per megaflops at a cost of $39 per megaflops for the system. And there is little doubt that the CUDA parallel computing environment is only going to get better over time and hence more of the theoretical performance of the GPU ends up doing real work. (Nvidia is not there yet. There is still too much overhead on the CPUs as they get hammered fielding memory requests for GPUs on some workloads.)<<

Nvidia is (and should be) very proud. But I’m already looking forward to hybrids becoming more common. They will really shake up the HPC market (as The Register agrees) by lowering the latency between GPU and CPU and by lowering energy consumption. But the bigger market is the mobile market.


“That is not what programmers want”

“I think you should be more explicit here in step two.” (original print)

This post is part of the series Programming Theories, in which we discuss new and old ways of programming.

When discussing the design of programming languages or the extension of existing ones, the question “What concepts can simplify the tasks of the programmer?” always triggers lots of interesting debates. When an effective solution is found, its inventors are cheered and a new language is born. Up to this point all seems fine, but the problem comes with the intervention of the status quo: C, C++, Java, C#, PHP, Visual Basic. Those languages want the new feature implemented the way their programmers expect it. But that would be like trying to bring the advantages of a motorcycle to a car without making the adjustments the car’s design requires.

I’m in favor of learning new concepts instead of doing new things the old way… but only when the new concept has proven to be better than the old way. The slow acceptance of, for example, functional languages tells a lot about how it goes in reality (with great exceptions like LINQ). That brings a lot of trouble when moving to multi-core. So, how do we get existing languages to change instead of just evolve?

High Level Languages for Multi-Core

Let’s start with a quote from Edsger Dijkstra:

Projects promoting programming in “natural language” are intrinsically doomed to fail.

In other words: a language can be too high level. A programmer needs the language to be able to effectively micro-manage what is being done. We speak of concerns for a reason. Still, the urge to create the highest programming language is strong.

Don’t get me wrong. A high-level language can be very powerful once its concepts work both ways. One way concerns the developer: does the programmer understand the concept and the contract of the command or programming style being offered? The other concerns the machine: can it be effectively programmed to run the command, or could a new machine be made to do just that? This two-sided contract is one of the reasons why natural languages are not fit for programming.

And we have also found out that binary programming is not fit for humans.

The cartoon refers to this gap between what programmers want and what computers want.


Intel OpenCL CPU-drivers 2013 beta with OpenCL 1.2 support

Screenshot from Intel’s “God Rays” demo

This article is still work-in-progress

Intel has just released version 2013 beta of its OpenCL CPU drivers. On the GPU side it supports OpenCL 1.1 (not 1.2, as on the CPU) on the Intel HD Graphics 4000/2500 of 3rd-generation Core processors (Windows only). The release notes mention support for Windows 7 and 8, but the download site only mentions Windows 8. Support under Linux is limited to 64-bit systems.

The release notes mention:

  • General performance improvements for many OpenCL* kernels running on CPU.
  • Preview Tool: Kernel Builder (Windows)
  • Preview Feature: support of kernel source code hotspots analysis with the Intel VTune™ Amplifier XE 2011 update 3 or higher.
  • The GNU Project Debugger (GDB) debugging support on Linux operating systems.
  • New OpenCL 1.2 extensions supported by the CPU device:
    • cl_khr_int64_base_atomics and cl_khr_int64_extended_atomics
    • cl_khr_fp16
    • cl_khr_gl_sharing
    • cl_khr_gl_event
    • cl_khr_d3d10_sharing
    • cl_khr_dx9_media_sharing
    • cl_khr_d3d11_sharing.
  • OpenCL 1.1 extensions that were changed in OpenCL 1.2:
    • Device Fission supports both OpenCL 1.1 EXT API’s and also OpenCL* 1.2 fission core features
    • Media Sharing supports the Intel 1.1 media sharing extension and also the 1.2 KHR media sharing extension
    • Printf extension is aligned with OpenCL 1.2 core feature.

Check the release notes for full information.
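An application that wants to use one of the extensions listed above is better off checking for it at run time than assuming a driver version. OpenCL reports a device’s extensions as one space-separated string (the `CL_DEVICE_EXTENSIONS` query of `clGetDeviceInfo`; in PyOpenCL, the `extensions` attribute of a device). The minimal Python sketch below matches whole names, so that a short name is not accidentally found inside a longer one. The sample string is made up for illustration and is not what Intel’s driver reports.

```python
from __future__ import annotations


def parse_extensions(extension_string: str) -> set[str]:
    """Split an OpenCL extension string (space-separated names) into a set."""
    # str.split() without arguments also tolerates double or trailing spaces.
    return set(extension_string.split())


def has_extension(extension_string: str, name: str) -> bool:
    """True only if `name` appears as a whole token, not as a substring."""
    return name in parse_extensions(extension_string)


# Hypothetical extension string, similar in form to what a driver returns.
sample = "cl_khr_fp64 cl_khr_int64_base_atomics cl_khr_fp16 cl_khr_gl_sharing "

for ext in ("cl_khr_fp16", "cl_khr_gl_event", "cl_khr_int64"):
    print(ext, has_extension(sample, ext))
```

A naive substring test such as `"cl_khr_int64" in sample` would report a match here, because the name is a prefix of `cl_khr_int64_base_atomics`; splitting into tokens avoids that.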

The drivers can be found at http://software.intel.com/en-us/articles/vcsource-tools-opencl-sdk-2013/. Installation is simple. For Windows there is an installer. On Linux, make sure you remove any previous version of Intel’s OpenCL drivers. On a Debian-based distribution, use the command ‘alien’ to convert the RPM to a DEB package, and make sure ‘libnuma1’ is installed. The package requires libc 2.11 or 2.12; more information on that later, as Ubuntu 12.04 ships libc6 2.15.


Porting code that uses random numbers


When we port software to the GPU or FPGA, testability is very important. Part of making the code testable is getting its functionality fully under control, and as you may have guessed, random numbers generated at run time need special attention.

In several past projects, new random numbers were generated on every run. Statistically the simulations were more correct, but this makes it impossible to be 100% sure the ported code is functionally correct, because two sources of variation are introduced: one due to the numbers being different and one due to differences in code and hardware.

Even if the combined variation is within the given limits, the two code bases can differ in functionality without anyone noticing. On top of that, it is hard to keep further optimisations under control, as they can lower the precision.

When porting, the stochastic correctness of the simulations is less important. Predictable outcomes should lead during the port.
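One way to make outcomes predictable is to derive every random number from a fixed seed and the index of the value, instead of from a generator whose state depends on the order of the calls. The original code and the port (where many work-items run in no particular order) can then draw exactly the same numbers, and their outputs can be compared element by element. The sketch below is our own illustration, not the method used in the customer projects: it uses the SplitMix64 output function as a hash of (seed, index), and all function names are hypothetical. It is meant for reproducible testing; whether its statistical quality suffices for the actual simulation is a separate question.

```python
from __future__ import annotations

MASK64 = (1 << 64) - 1
GOLDEN_GAMMA = 0x9E3779B97F4A7C15  # SplitMix64 state increment


def splitmix64_mix(z: int) -> int:
    """SplitMix64 output function: scrambles a 64-bit integer."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def random_at(seed: int, index: int) -> float:
    """Uniform double in [0, 1) that depends only on (seed, index)."""
    # The state a sequential SplitMix64 would reach after index+1 steps.
    state = (seed + (index + 1) * GOLDEN_GAMMA) & MASK64
    # Keep the top 53 bits so the value is exactly representable as a double.
    return (splitmix64_mix(state) >> 11) * 2.0**-53


def reference_run(seed: int, n: int) -> list[float]:
    """Stand-in for the original code: draws the values in order."""
    return [random_at(seed, i) for i in range(n)]


def ported_run(seed: int, n: int) -> list[float]:
    """Stand-in for the port: computes the values in another order,
    as independent work-items on a GPU might."""
    out = [0.0] * n
    for i in reversed(range(n)):
        out[i] = random_at(seed, i)
    return out


def max_abs_diff(a: list[float], b: list[float]) -> float:
    """Element-wise comparison instead of comparing only statistics."""
    return max(abs(x - y) for x, y in zip(a, b))


ref = reference_run(seed=42, n=1000)
port = ported_run(seed=42, n=1000)
print(max_abs_diff(ref, port))
```

Because `random_at` needs no shared state, an OpenCL kernel can compute the same function from its global ID using `ulong` arithmetic, which wraps modulo 2^64 just like the masking above. With identical inputs on both sides, any remaining difference points at the code or the hardware, not at the random numbers.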

Below are some tips we gave to these customers, and I hope they’re useful for you. If you have code to be ported, these preparations make the process quicker and more correct.

If you want to know more about the correctness of RNGs themselves, we discussed earlier this year that generating good random numbers on GPUs is not obvious.


Why use OpenCL on FPGAs?

Altera has just released the free ebook FPGAs for Dummies. One part of the book is devoted to OpenCL, so we’ll quote some extracts here from one of the chapters. The rest of the book is worth a read too; if you want to read it, just fill in the form on Altera’s webpage.

At StreamHPC we’re interested in OpenCL on FPGAs for one reason: many companies run their software on GPUs when they should be using FPGAs instead, and at the same time others stick to FPGAs and ignore GPUs completely. The main reason, we think, is that converting CUDA to VHDL, or Verilog to CPU intrinsics, is simply too painful. Another reason is the amount of investment already put into a certain technology. We believe that OpenCL can solve both of these issues. OpenCL is much more portable, and code can be moved to a new architecture in a relatively short time (if the developer is familiar with the project, the hardware and OpenCL). We are very familiar with the latter two, which means we’re used to getting new projects up and running.

Since both Altera and Xilinx have invested in OpenCL, code for the two vendors’ FPGAs has become more portable. Altera has a public SDK (and they’re loud and proud about it), while Xilinx offers it in their latest tools (although they’re unfortunately much quieter about it).

Now, let us go back to the quotes from the book that we wanted to share with you.

Andrew Moore describes OpenCL effectively in just a few sentences:

The need for heterogeneous computing is leading to new programming languages to exploit the new hardware. One example is the OpenCL first developed by Apple, Inc. OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of CPUs, GPUs, DSPs, FPGAs, and other types of processors. OpenCL includes a language for developing kernels (functions that execute on hardware devices) as well as application programming interfaces (APIs) that define and control the various platforms. OpenCL allows for parallel computing using task-based and data-based parallelism.

The author also shares some interesting insights into why OpenCL should be used on FPGAs:

FPGAs are inherently parallel, so they’re a perfect fit with OpenCL’s parallel computing capabilities. FPGAs give you an alternative to the typical data or task parallelism by offering a pipeline parallelism where tasks can be spawned in a push-pull configuration with each task using different data from the previous task with or without host interaction. OpenCL allows you to develop your code in the familiar C programming language but using the additional capabilities provided by OpenCL. These kernels can be sent to the FPGAs without your having to learn the low-level HDL coding practices of FPGA designers. Generally, there are several benefits for software developers and system designers to use OpenCL to develop code for FPGAs:

  • Simplicity and ease of development: Most software developers are familiar with the C programming language, but not low-level HDL languages. OpenCL keeps you at a higher level of programming, making your system open to more software developers.
  • Code profiling: Using OpenCL, you can profile your code and determine the performance-sensitive pieces that could be hardware accelerated as kernels in an FPGA.
  • Performance: Performance per watt is the ultimate goal of system design. Using an FPGA, you’re balancing high performance in an energy-efficient solution.
  • Efficiency: The FPGA has a fine-grain parallelism architecture, and by using OpenCL you can generate only the logic you need to deliver one fifth of the power of the hardware alternatives.
  • Heterogeneous systems: With OpenCL, you can develop kernels that target FPGAs, CPUs, GPUs, and DSPs seamlessly to give you a truly heterogeneous system design.
  • Code reuse: The holy grail of software development is achieving code reuse. Code reuse is often an elusive goal for software developers and system designers. OpenCL kernels allow for portable code that you can target for different families and generations of FPGAs from one project to the next, extending the life of your code.

Today, OpenCL is developed and maintained by the technology consortium Khronos Group. Most FPGA manufacturers provide Software Development Kits (SDKs) for OpenCL development on FPGAs.

You can continue here if you want to read more of this ebook. And of course, whenever you want to learn more, feel free to write to us, or follow the conversation on Twitter through our dedicated account: @OpenCLonFPGAs.

AMD is back!

For years we have been complaining on this blog about what AMD was lacking and what needed to be improved. As you might have concluded from the title of this blog post, there has been a lot of progress.

AMD is back! It will all come together at the beginning of 2017, but you’ll already see a lot of progress in the coming weeks and months.

AMD quietly recognised and solved various totally new problems in HPC, becoming the hidden innovator everybody needed.

This blog post gives an overview of how AMD managed to come back and what it took to get there. Their market cap supports this, as you can see.

AMD’s market cap is back at 2012 levels (source)


We’re looking for an intern to do the cool stuff: benchmarking and Linux wizarding

So, don’t let us retype your documents and blog posts, as that would make us your intern.

We have some embedded devices here that badly need attention. Some have had some private time on the test bench, but we have not yet shared anything about them on the blog. We simply need some extra hands to do this. Because it’s cool work, but admittedly a bit boring after several devices, it is the perfect job for an intern. Besides the benchmarking, we have some other Linux-related projects for you. You’ll get the average internship allowance in the Netherlands (in Dutch: “stagevergoeding”), lunch, a desk and a bunch of devices (aka toys for techies).

Like many companies in the Netherlands, we don’t care about how you were born, but about who you are as a person. We expect that you…

  • know everything about Linux administration, from servers to embedded devices.
  • know how to set up a benchmark.
  • document everything you do, not only the results.
  • speak and write Dutch and English.
  • have a great sense of humor! (Even if you’re the only one who laughs at your jokes.)
  • study in the EU, or can arrange the paperwork to get to the EU yourself.
  • have a place to live or crash in or near Amsterdam, or don’t mind the daily travelling. You cannot sleep in the office.

Together with your educational institute we’ll discuss the exact learning goals of the internship, and make a plan for a period of 3 to 6 months.

If you are interested, send a mail to jobs@streamhpc.com. If you know somebody who would be interested, please tell that person that we’re waiting for him/her! Tips and tricks on finding the right person are also very welcome.

Call for papers: SYCL workshop, 13-March-2016, Barcelona, Spain

A high-level language has been on OpenCL’s roadmap for years, to be started once the foundations were ready. And so, with OpenCL 2.0, SYCL was born.

To keep the pace high, a SYCL workshop is being organised. This week the call for papers opened; you can read it below.

1st SYCL workshop (SYCL’16) – co-located with PPoPP’16

Barcelona, Spain Sunday, 13th March, 2016

SYCL (sɪkəl – as in sickle) is a royalty-free, cross-platform C++ abstraction
layer that builds on the underlying concepts, portability and efficiency of
OpenCL, while adding the ease-of-use and flexibility of C++. For example, SYCL
enables single source development where C++ template functions can contain both
host and device code to construct complex algorithms that use OpenCL
acceleration, and then re-use them throughout their source code on different
types of data. SYCL has also been designed with resilience from the start, by
featuring, for example, a fall-back mechanism to automatically re-enqueue
kernels on different queues in case of a failure.

The SYCL Workshop aims to gather together SYCL’s users, researchers, educators
and implementors to encourage and grow a community of users behind the SYCL
standard, and related work in C++ for heterogeneous architectures. This will be
a half-day workshop. SYCL’16 will be held in Barcelona, 13 March 2016,
co-located with PPoPP 2016, HPCA 2016, CGO 2016 and LLVM 2016.

Travel Awards

Student authors who present papers in this workshop are eligible to apply for
travel awards. Further details will be announced after notification of
acceptance.

Important Dates

Submissions: 23rd November
Notification: 21st December
Final version: 24th January, 2016
Workshop: Sunday, 13th March, 2016

Submission Guidelines

All submissions must be made electronically through the conference submission
site, at https://easychair.org/conferences/?conf=sycl16.
Submissions may be one of the following:

  • Extended abstract: Two pages in standard SIGPLAN two-column conference
    format (preprint mode, with page numbers)
  • Short Paper: Four to six pages in standard SIGPLAN two-column conference
    format (preprint mode, with page numbers)

Submissions must be in PDF format and printable on US Letter and A4 sized
paper. All submissions will be peer-reviewed by at least two members of the
program committee. We will aim to give longer presentation slots to papers than
to extended abstracts. Conference papers will not be published, but made
available through the website, alongside the slides used for each presentation.
The aim is to enable authors to get feedback and ideas that can later go into
other publications. We will encourage questions and discussions during the
workshop, to create an open environment for the community to engage with.

Topics of interest include, but are not limited to:

  • Applications implemented using SYCL
  • C++ Libraries using SYCL
  • C++ programming models for OpenCL (C++AMP, Boost.Compute, …)
  • Other C++ applications using OpenCL
  • New proposals to the SYCL specification
  • Integration of SYCL with other programming models
  • Compilation techniques to optimise SYCL kernels
  • Performance comparisons between SYCL and other programming models
  • Implementation of SYCL on novel architectures (FPGA, DSP, …)
  • Using SYCL in fault-tolerant systems
  • Reports on SYCL implementations
  • Debuggers, profilers and tools

Organising Committee

Paul Keir, University of the West of Scotland (UK)
Ruyman Reyes, Codeplay Software Ltd, Edinburgh (UK)

Program Committee

Jens Breitbart, TU Munich
Alastair Donaldson, Imperial College London, UK
Christophe Dubach, University of Edinburgh, UK
Joel Falcou, LRI, Université Paris-Sud, France
Benedict Gaster, University of the West of England, UK
Vincent Hindriksen, StreamHPC, Netherlands
Christopher Jefferson, St. Andrews University, UK
Ronan Keryell, Xilinx, Ireland
Zoltán Porkoláb, ELTE, Hungary
Francisco de Sande, Universidad de La Laguna, Spain
Ana Lucia Varbanescu, University of Amsterdam, Netherlands
Josef Weidendorfer, TU Munich

Yes, we’re on the Program Committee as one of the few non-academics. We’re looking forward to reading your proposal!

If you have a blog, feel free to copy the above text and repost it.

8 reasons why SPIR-V makes a big difference

Of all the news that came out of GDC, I’m most eager to talk about SPIR-V. This intermediate language will make a big difference for the compute industry, and in this article I’d like to explain why. If you need a technical explanation of what SPIR-V is, I suggest you first read gtruc’s article on SPIR-V and then return here for an overview of the advantages.

Currently there are several shader and compute languages, which SPIR-V tries to replace or support. We have GLSL and HLSL for graphics shaders, and SPIR (without the V), OpenCL, CUDA and many others for compute.

If you have questions after reading this article, feel free to ask them in a comment or contact us directly.

The knowns and unknowns of the PEZY-SC accelerator at RIKEN

The Green500 is out, and an unknown processor takes the number one position with a huge improvement over last year. It powers a new supercomputer installed at RIKEN with an incredible 7 GFLOPS/Watt. Each processor board holds two Xeons, 4 PEZY-SC 1.4 accelerators and 128GB of DRAM, with a combined performance of about 6.2 TFLOPS. It has been designed for immersion cooling.

The second and third positions are also powered by the PEZY-SC, followed by last year’s winner with the AMD FirePro S9150, and a bit further down the rest (mostly NVidia Tesla). One constant is the CPU: Intel Xeon dominates. To my big surprise, there is no ARM64.

Image: Green500 top 5, June 2015

From the third to the first PEZY-SC installation there is an improvement of 13%. It seems the first two are of the new type, called “bricks”, while the third is the same system as last year. Compared with that system’s result from last year (4.945 GFLOPS/W), the first and third installations show improvements of 42% and 25% respectively. The 13% improvement over the previous version is interesting enough, but the 25% improvement on exactly the same system raises questions. It is probably due to compiler optimisations. As the November edition of the Green500 is much stricter, it will then become clear whether the rules were bent. Let’s hope it’s for real!

It supports OpenCL!

When new accelerators support OpenCL, they get accepted more easily. So it is very interesting that the PEZY-SC runs OpenCL. I asked at ISC and was told it supports a subset of OpenCL, but I could not find out which subset, nor could I get access to test it. It does mean that code that would run well on this machine is easy to port, and by that I mean the same “easy” Intel uses when explaining how easy it is to port OpenMP software to Xeon Phi: PEZY-specific optimisations and writing around the missing functionality would still take effort. That is the typical stuff we do at StreamHPC.

RIKEN Shoubu

Some information on “Shoubu” (“Iris” in Japanese), number 1 on the Green500. According to the Green500 it delivers 353.8 TFLOPS (at 50 kW, measured with an actual benchmark). On 25 June RIKEN announced that Shoubu has a theoretical peak of 2 PFLOPS. If the full machine was used for the Green500 run, the efficiency was only 18%!
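The 18% follows from dividing the measured benchmark result by the announced theoretical peak, assuming (as above) that the full machine was used for the run. The same two Green500 numbers also reproduce the efficiency of about 7 GFLOPS/Watt mentioned earlier:

```latex
\[
\frac{R_\text{measured}}{R_\text{peak}}
  = \frac{353.8\ \text{TFLOPS}}{2000\ \text{TFLOPS}} \approx 0.177 \approx 18\%,
\qquad
\frac{353.8 \times 10^{12}\ \text{FLOPS}}{50 \times 10^{3}\ \text{W}}
  \approx 7.08\ \text{GFLOPS/W}.
\]
```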

Images of the installation are included in the source below.

Source: http://www.exascaler.co.jp/wp-content/uploads/2015/06/20150625.pdf

An important part is Exascaler’s immersion-cooling technology; as I understand it, Exascaler is a spin-off of PEZY. I’m very curious what the AMD FirePro S9150 does with immersion cooling. I think we’ll have to do some frying at the office to find out.

PEZY-SC1.4 and PEZY-SC2

PEZY started with a multi-core processor of 512 cores, the PEZY-1. The PEZY-SC has 1024 cores and has had a few gradual upgrades – currently PEZY-SC 1.4 (“the brick”) is installed.

PEZY-SC specification:

Logic cores (PE): 1,024
Core frequency: 733 MHz
Peak floating-point performance: single 3.0 TFLOPS / double 1.5 TFLOPS
Host interface: PCI Express Gen3.0 x8 lanes x 4 ports (x16 bifurcation available)
JESD204B protocol support
DRAM interface: DDR4/DDR3 combo, 64-bit x 8 ports, max. bandwidth 153.6 GB/s
+ Ultra WIDE IO SDRAM (2,048-bit) x 2 ports, max. bandwidth 102.4 GB/s
Control CPU: ARM926 dual core
Process node: 28 nm
Package: FCBGA 47.5 mm x 47.5 mm, ball pitch 1 mm, 2,112 pins

Source: http://pezy.co.jp/en/products/pezy-sc.html
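Dividing the peak performance by the number of cores and the clock frequency shows what each processing element has to deliver per cycle. This is our own back-of-the-envelope inference from the specification above; PEZY does not state how these operations are issued (for example, whether a fused multiply-add is counted as two).

```latex
\[
\frac{3.0 \times 10^{12}\ \text{FLOP/s}}{1024 \times 733 \times 10^{6}\ \text{cycles/s}}
  \approx 4.0\ \text{FLOP per PE per cycle (single)},
\qquad
\frac{1.5 \times 10^{12}}{1024 \times 733 \times 10^{6}}
  \approx 2.0\ \text{(double)}.
\]
```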

The PEZY-SC2, which will have a staggering 4096 cores, is in development. Of course, efficiency has to go up (if the 18% is correct) to make this a good upgrade.

There is no promise on when the PEZY-SC2 will be announced, but it will certainly surprise us again when it arrives.

Starting with Altera OpenCL-on-FPGAs

Altera has been very busy adding resources and kicked off June by opening up its OpenCL program to the general public.
Only Stratix V devices are supported, but that could change later.

Below are all pages and PDFs concerning OpenCL I’ve found while searching Altera’s website.

Evaluation of CPUs, GPUs, and FPGAs as Acceleration Platforms

Altera wanted to know where they could compete with GPUs and CPUs. For a big company their comparisons are quite honest (for instance about their limited memory access speed), but they don’t tell everything, like the hours(!) of compilation time. The idea is that you develop on a GPU and, once the software works correctly, port it to the FPGA.

If you don’t have any experience working with their FPGAs, it’s best to ask around.

Image taken from Altera website.


When Big Data needs OpenCL

In the previous century, Big Data was an archive full of ring binders and folders that grew at the same pace each year. Now the definition is that it grows each year by as much as all previous years combined.
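Written as a formula, with D_n the total amount of data after year n: if each year adds as much as all previous years combined, the total doubles every year. This is a simplified reading of the definition above, not a measured growth rate.

```latex
\[
D_{n+1} = D_n + \underbrace{D_n}_{\text{growth in year } n+1} = 2\,D_n
\quad\Longrightarrow\quad
D_n = 2^{n} D_0,
\qquad\text{e.g. } D_{10} = 1024\,D_0 .
\]
```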

A few months ago SunGard named 10 Big Data trends transforming financial services. I have used their list as a base, but with my own focus: increased computation demands, not specific to this one market. This resulted in 7 general trends where Big Data meets/needs OpenCL.

Since the start of StreamHPC we have sought customers who could not process all their data in time. Back then Big Data was still a buzzword that was catching on, but it best describes this core business of ours.


Rapid Performance Assessment

You might have heard about the major speed-ups GPUs and FPGAs promise, but also that the speed-up depends a lot on the type of software or algorithm. Investing in OpenCL or CUDA can therefore feel risky: going in costs time and money, while staying out can give too much room to the competition. But if you want your customers to get the best experience without paying an unnecessarily high price, you’ll need to know what the return on your investment could be. This quick assessment helps you determine exactly that.

What we’ve done before

Most assessments answered the question “How much speed-up can I get using GPUs?”. Other questions were:

  • Does this algorithm work on this specific mobile processor?
  • Can we better use CUDA, OpenCL or OpenGL shaders for this algorithm?
  • Does the HPC code run best on a Tesla K40 or FirePro S9150?
  • How many weeks/months would it take to port all code?
  • How many GPUs do I need for under 1 second responses?
  • Does this code port to an FPGA?
  • Which OpenCL device best suits my algorithm: CPU, GPU, APU, DSP, FPGA or something else?

Is your question in the list?

Program

Within a week we can fully analyse your code, or two weeks if the codebase is large or complex. During the assessment we write/port/optimise code, to be able to support our conclusions with numbers.

After the assessment you get an overview of the hotspots, an indication of total speed-up when using OpenCL (or comparable technology), and the answers to your questions.
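How much total speed-up the hotspots can deliver depends on how much of the run time they account for. A common first estimate is Amdahl’s law: if a fraction p of the run time is accelerated by a factor s, the total speed-up is 1 / ((1 − p) + p / s). The Python sketch below applies it; the 90% profile and the speed-up factors are hypothetical, and this is a generic textbook estimate, not necessarily the exact method used in our assessments.

```python
def amdahl_speedup(fraction_accelerated: float, kernel_speedup: float) -> float:
    """Overall speed-up when only part of the run time is accelerated."""
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial = 1.0 - fraction_accelerated             # part that is not ported
    accelerated = fraction_accelerated / kernel_speedup
    return 1.0 / (serial + accelerated)


# Hypothetical profile: the hotspots take 90% of the run time.
for s in (10, 50, 1000):
    print(f"hotspots {s}x faster -> total {amdahl_speedup(0.9, s):.1f}x")
```

In this example even a very fast kernel cannot push the total past 10x, because the remaining 10% of the run time is untouched. That is why the overview of hotspots matters as much as the per-kernel speed-up.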

Preparations

Send a mail to contact@streamhpc.com for more information, and we’ll call you back to talk about your requirements. Please provide times when you want to be called back.


Funded PhD internships at StreamHPC

We have several wishes for 2017, and two of them involve writing code for the open-source community. Luckily, HiPEAC is interested in more collaboration between academia and industry and therefore funds PhD internships. There are 81 industrial PhD internships available, two of which are at StreamHPC.

What is this industrial PhD internship, you may ask? From the HiPEAC homepage:

The HiPEAC Industrial PhD Internship Programme offers PhD students a unique opportunity to experience the industrial research environment and to work on R&D projects solving real problems. To date the internship programme has resulted in several joint paper publications, patent applications and many students have been hired by the companies after completion of their PhDs.

 

The internships cover a 3-month period. Students should indicate when they will be available for an internship during 2016. When you apply for one of the internships, you must update your profile page including a link to your CV (preferably in PDF format).

Every intern receives €55 per day (€5000 for 3 months) plus travel expenses (maximum €500). The main goal is to gain experience: even if you don’t get a job after the internship, you tap into our network.


An example of real-world, end-user OpenCL usage

We ask all our customers whether we may publish their story on our website. For competitive reasons, this is often not possible. The people of CEWE Stiftung & Co. KGaA were kind enough to let one of their developers share his experience, after he followed an OpenCL training with us and we reviewed his code.

Enjoy his story about his experience, from the training until now!

This year, CEWE plans to port parts of the CEWE Photoworld program code to OpenCL. This software is used to create and order photo products such as the CEWE Photobook, CEWE Calendars and greeting cards, and has an installation base of about 10 million throughout Europe. It is built with Qt and runs on Windows, Mac and Linux.

 

In the next version, CEWE plans to speed up image effects such as the oil painting filter, making them more useful in the world of photo manipulation. Customers like to apply imaging effects to improve their photo products, to get even more individual results, to fix accidentally out-of-focus photos and so on.


Commodity and Open Standards – why OpenCL matters

This article actually discusses the question: is GPGPU a solution for the masses, or is it for niche products? For the latter, open standards matter a lot less, as you will read.

If you watch the video below on sales and marketing by Victor Antonio, you will understand what makes open standards so difficult: they push every company using the standard to focus on becoming the best. Survival of the fittest may be the basis of (true) capitalism and may deliver the best products, but the problem is that competing on price is not safe for the future of a company.

http://www.youtube.com/watch?v=SJ5QmW3LfN4

The key is specialisation, or creating unique value. The video below discusses this. The difference between “a feature” and “unique value” deserves a discussion of its own, one you really should have with your team about your own products.

How expensive is an operation on a CPU?

Programmers know the value of everything and the cost of nothing. I saw this quote a while back and loved it immediately. The quote by Alan Perlis was originally about LISP programmers, but it seems that only highly trained HPC programmers have really acquired this basic knowledge. In an interview with Andrew Richards of Codeplay I heard it from another perspective: programming languages were not designed in a time when cache was 100 times faster than memory. He argued that what is expensive and what isn’t should be exposed to the programmer. I agreed again, hence this post.

I think it is very clear that programming languages (and/or IDEs) need to be redesigned to cope with the hardware changes of the past five years. I discussed this in the articles “Separation of compute, control and transfer” and “Lots of loops”. But that does not seem to be enough.

So what are the costs of each operation (on CPUs)?

This article is meant to help you on your way and, above all, to make you aware. Note that it is incomplete and probably not valid for every kind of CPU.
