How we sped up a flooding simulation 35 times (from 32-core CPU to multi-GPU)

How water moves through an area, given a certain rate of inflow, can be fully simulated. We got a request to make such a simulation faster, as even moderate simulations already took too much time. As the customer wanted more detail, larger areas and more alternative scenarios computed, the existing performance did not suffice.

The code was already ported to MPI to scale to 8 cores. This code was used as a base for creating our optimised GPU-code. Using a single GPU we managed to get a 44 to 58 times speedup over a single CPU core, which is 5 to 7 times faster than MPI on 8 to 32 CPU cores.

For larger experiments we could increase the performance advantage over MPI-code from 7 times to a total of 35 times, using multiple GPUs.

We solved both the weak-scaling problem and the mapping on GPUs

If you include the 9x speedup of the initial performance-optimisation, the total is over 2600x. What used to take a year can now be done in about 3.5 hours. This clearly shows the importance of software performance engineering. Most code has already had some optimisations applied (just like here), and a further 5 to 7 times speedup is still quite achievable.
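As a rough check, the arithmetic behind these figures can be sketched as below. How the factors combine is an assumption of this sketch, not something the text states: it takes the upper single-GPU figure (58x over one CPU core), scales it by the multi-GPU gain over MPI (from 7x to 35x, so 5x), and multiplies by the 9x of the initial optimisation.

```python
# Illustrative arithmetic only: the way the factors are combined is an assumption.
single_gpu_vs_core = 58     # upper end of the 44 to 58 times single-GPU speedup
multi_gpu_gain = 35 / 7     # advantage over MPI grew from 7 times to 35 times
initial_optimisation = 9    # speedup of the initial performance-optimisation

total_speedup = single_gpu_vs_core * multi_gpu_gain * initial_optimisation

hours_per_year = 365 * 24
hours_after = hours_per_year / total_speedup

print(f"total speedup: {total_speedup:.0f}x")
print(f"one year of computing shrinks to about {hours_after:.1f} hours")
```

Under these assumptions the total lands just above 2600x and the run time a little under 3.5 hours, in line with the rounded figures above.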

Read below for some more details.

Disruptive Technologies

Steve Streeting tweeted a few weeks ago: “Remember, experts are always wrong about disruptive tech, because it disrupts what they’re experts in.” I’m happy that I evangelise and work with such a disruptive technology, and that it will take time until it is bypassed by other technologies. Those other technologies will probably be source-to-OpenCL-source compilers. At StreamHPC we therefore keep track of all these pre-compilers continuously.

Steve’s tweet triggered me, since the balance between stability and progress makes change quite hard (we see it all around us). Another reason was a statement I heard during the opening speech of Engineering World 2011 about “the cloud”, which went something like: “80% of today’s IT will be replaced by standardised cloud-solutions”. Most probably true; today any manager could and should click together his/her “data from A to B” report instead of buying an “oh, that’s very specialised and difficult” solution. On the other hand, companies try to keep their business alive as long as possible. It’s therefore an intriguing balance.

So I came up with the idea to play my own devil’s advocate and try to disrupt GPGPU. I think it’s important to see what can disrupt the current parallel-kernel-execution model of OpenCL, CUDA and the others.

OpenCL – the battle, part III

The first two parts, written about half a year ago, covered hardware companies and operating systems, and programming languages and software companies. Now we focus on what has driven NVIDIA and ATI/AMD for decades: games.

Disclaimer: this is an opinion-piece on the current market. We are strong supporters of OpenCL and of all companies which support it too. Since our advice on specific hardware in a consult will be based on the specific demands of the customer, it could differ from what you would expect based on the article below.

Games

Computer games are cool, if only because you can choose from so many different kinds. While Tetris will live forever, the latest games also have something to add: realistic physics simulation. And that’s what’s done by GPUs now. Nintendo has shown us that gameplay and good interaction are far more important than video-quality. The wow-factor of photo-realistic real-time rendering is not what it was years ago.
You might know the basics for falling objects: F = m*g (Force = Mass times Gravity-acceleration), and action = – reaction. If you drop some boxes, you as a human being can predict falling speed, interaction, rotation and a possible change of the centre of gravity from a still image. A computer has to do a lot more to detect collisions, but the idea is very doable on a fast CPU. A very well-known open source library for these purposes is Bullet Physics. It gets interesting when there are not just a few boxes, but thousands of them. Or when you walk through water or under a waterfall, see fire and smoke, break wood but bend metal, etc. The accelerometer of the iPod was a game-changer too in the demand for more realism in graphics. For an example of a “physics puzzle game” not using GPGPU, see World of Goo (with free demo); for the rest we talk more about high-end games. Of the current game-ready systems, PCs (Apple, Linux and Windows) have OpenCL support, the status of the Sony PlayStation 3 is somewhat vague, and the Xbox 360 has none.
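To make the falling-boxes idea concrete, here is a minimal sketch of one simulation step using F = m*g. It is an illustration only and not how Bullet Physics works internally: the `Box` class, the `step` function, the time step and the crude ground check are all assumptions made for this example, and boxes do not collide with each other. The point is that every box is updated independently, which is why thousands of them map well onto a GPU.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class Box:
    mass: float      # kg
    height: float    # metres above the ground
    velocity: float  # m/s, negative means falling


def step(box: Box, dt: float) -> Box:
    """Advance one box by dt seconds (hypothetical helper for this sketch)."""
    force = -box.mass * G                 # F = m*g, pointing down
    acceleration = force / box.mass       # a = F/m
    velocity = box.velocity + acceleration * dt
    height = box.height + velocity * dt
    if height <= 0.0:                     # crude ground collision: the box stops
        height, velocity = 0.0, 0.0
    return Box(box.mass, height, velocity)


# A thousand boxes of different mass, all dropped from 10 metres.
boxes = [Box(mass=1.0 + i, height=10.0, velocity=0.0) for i in range(1000)]

# Simulate one second in steps of 10 ms; no box needs data from another box.
for _ in range(100):
    boxes = [step(box, dt=0.01) for box in boxes]

print(f"lightest box: {boxes[0].height:.2f} m, heaviest box: {boxes[-1].height:.2f} m")
```

Collision detection between boxes is where the real cost sits, and that part is left out here on purpose.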

The picture is from Crysis 3, which does not use OpenCL, as far as we know.

video: OpenCL on Android

Michael Leahy spoke at AnDevCon ’13 about OpenCL on Android. Enjoy the overview!

Subjects (broadly):

  • What is OpenCL
  • 13 dwarfs
  • RenderScript
  • Demo

http://www.youtube.com/watch?v=XQCYWmYCJWo

Mr. Leahy is quite critical of Google’s recent decisions to try to block OpenCL in favour of their own proprietary RenderScript Compute (now mostly referred to as just “RenderScript”, as they failed to push its twin “RenderScript Graphics”, now replaced by OpenGL).

Around March ’13 I submitted a proposal to speak about OpenCL on Android at AnDevCon in November shortly after the “hidden” OpenCL driver was found on the N4 / N10. This was the first time I covered this material, so I didn’t have a complete idea on how long it would take, but the AnDevCon limit was ~70 mins. This talk was supposed to be 50 minutes, but I spoke for 80 minutes. Since this was the last presentation of the conference and those in attendance were interested enough in the material I was lucky to captivate the audience that long!

I was a little concerned about taking a critical opinion toward Google given how many folks think they can create nothing but gold. Afterward I recall some folks from the audience mentioning I bashed Google a bit, but this really is justified in the case of suppression of OpenCL, a widely supported open standard, on Android. In particular last week I eventually got into a little discussion on G+ with Stephen Hines of the Renderscript team who is behind most of the FUD being publicly spread by Google regarding OpenCL. One can see that this misinformation continues to be spread toward the end of this recent G+ post where he commented and then failed to follow up after I posted my perspective: https://plus.google.com/+MichaelLeahy/posts/2p9msM8qzJm

And that’s how I got in contact with Michael: we are both irritated by Google’s actions against our favourite open standards. Microsoft learned long ago that you should not block, only favour. But Google lacks the experience and believes they’re above the rules of survival.

Apparently he can dish out FUD, but can’t be bothered to answer challenges to the misinformation presented. Mr. Hines is also the one behind shutting down commentary on the Android issue tracker regarding the larger developer community’s ability to express their interest in OpenCL on Android.

Regarding a correction: given the information available at the time of the presentation, I mentioned that Renderscript is using OpenCL for GPU compute aspects. This was true for the Nexus 4 and 10 for Android 4.2 and likely 4.3; in particular the Nexus 10 using the Mali GPU from Arm. The N4 & N10 were initially using OpenCL for GPU compute aspects for Renderscript. Since then Google has been getting various GPU manufacturers to make a Renderscript driver that doesn’t utilize OpenCL for GPU compute aspects.

I hope you like the video and also understand why it remains important that we keep the discussion on Google + OpenCL active. We must remain focused on the long term and not simply accept what others decide for us.

Q&A with Adrien Plagnol and Frédéric Langlade-Bellone on WebCL

WebCL is a great technique to have compute-power in the browser. After WebGL, which gives high-end graphics in the browser, this is a logical step on the road towards the browser-only operating system (like Chrome OS, but more will follow).

Another way to look at technologies like WebCL, is that it makes it possible to lift the standard base from the OS to the browser. If you remember the trial of Microsoft’s integration of Internet Explorer, the focus was on the OS needing the browser for working well. Now it is the other way around, but it can be any OS. This is because the push doesn’t come from below, but from above.

Last year two guys from Lyon (South-France) got quite a lot of attention, as they wrote a WebCL-plugin. Their names: Adrien Plagnol and Frédéric Langlade-Bellone. Below you’ll find a Q&A with them on WebCL. Enjoy!

Will OpenCL work for me?

OpenCL can accelerate your software by multiple factors, but only if the data and the software are a good fit.

The same applies to CUDA and other GPGPU-methods.

Find out in 4 steps whether you can speed up your software with OpenCL.
[columns]
[one_half title=”1. Lots of repetitions”]
To find code that can be run in parallel, focus on loops that take a relatively large share of the run time. If an action needs to be done for each part of the input data, then the code certainly contains a lot of loops. You can go to the next step.

If data goes through the code from A to B in a straight line without many loops, then there is a very low chance that computing speed is the bottleneck. A faster network, better caching, faster memory and such should be looked into first.
[/one_half]
[one_half title=”2. No or few dependencies”]
If the iterations of a loop do not depend on the previous iteration, then you can go to the next step.

As low interdependencies do not matter for single-core software, this was not an important focus for developers even five years ago. Note that there are now many new algorithms which reduce loop-internal dependencies. If your software has been optimised for several processors or even a cluster, then the step to OpenCL is much smaller.

For example, search problems can be sped up by dividing the data between many threads. Even though the dependency is high within a thread, the dependency on the other threads is very low.
[/one_half]

[/columns]
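The difference between steps 1 and 2 can be sketched in a few lines. This is a minimal illustration in plain Python rather than OpenCL, and all function names are made up for the example. Python threads will not actually speed up this kind of work; the sketch only shows the structure of the loops: one where iterations are independent, one where each iteration needs the previous result, and a search that is split into chunks with hardly any dependency between them.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int) -> int:
    """Independent work: the result depends only on this one pixel."""
    return min(pixel + 40, 255)


def brighten_all(pixels):
    # No iteration needs the result of another, so the loop becomes a parallel map.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(brighten, pixels))


def running_total(values):
    # Each iteration needs the total of the previous one: a loop-carried dependency.
    # As written this has to run in order; parallelising it needs a redesigned algorithm.
    totals, acc = [], 0
    for value in values:
        acc += value
        totals.append(acc)
    return totals


def chunked_search(data, target, chunks=4):
    # Search problem: split the data and let every worker scan only its own chunk.
    size = max(1, (len(data) + chunks - 1) // chunks)
    parts = [(start, data[start:start + size]) for start in range(0, len(data), size)]

    def scan(part):
        start, values = part
        return [start + i for i, value in enumerate(values) if value == target]

    with ThreadPoolExecutor(max_workers=chunks) as pool:
        return [index for hits in pool.map(scan, parts) for index in hits]


print(brighten_all([0, 100, 250]))
print(running_total([1, 2, 3, 4]))
print(chunked_search([5, 1, 5, 2, 5, 3, 5, 4, 5], target=5))
```

Loops shaped like `brighten_all` and `chunked_search` are the ones to look for; loops shaped like `running_total` are candidates for one of the newer algorithms mentioned in step 2.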

[columns]

[one_half title=”3. High predictability to avoid branching”]

Computations need to be as predictable as possible to get the highest speed-up. That means the code within the loops needs to have no or few branches, so few statements like if, while or switch. This is because GPUs work best when all threads on a processor execute the same instructions. So if you have many threads which all do different things, then a CPU is still the best solution. As with reducing dependencies in step 2, redesigning the algorithm can in many cases result in well-performing GPU-code.

[/one_half]

[one_half title=”4. Low Data-transport overhead”]

In step 1 you looked for repeated computations. In this last step we look at the ratio between computations and data-size.

If the number of computations per data-chunk is high, then using the GPU is a good solution. The reason is that data needs to be transferred to and from the GPU, which takes time even at a throughput of 3 to 6 GB per second. A simple way to find out if a lot of computations are done is to look at the CPU-usage in the system monitor.

When the number of computations per data-chunk is low, doubling the speed is still possible when OpenCL is used on CPUs. See the technical explanation of how OpenCL on modern CPUs works and can even outperform a GPU.

[/one_half]
[/columns]
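Step 4 can be turned into a back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not a measurement tool: `gpu_estimate` is a hypothetical helper, the expected GPU speedup is a number you have to guess or measure, the data is assumed to travel once to the GPU and once back at the same size, and transfers are assumed not to overlap with computation.

```python
def gpu_estimate(data_bytes, cpu_seconds, expected_gpu_speedup, transfer_gb_per_s=3.0):
    """Return (estimated GPU time in seconds, whether that beats the CPU time)."""
    # Data goes to the GPU and the results come back: two transfers.
    transfer_seconds = 2 * data_bytes / (transfer_gb_per_s * 1e9)
    gpu_seconds = cpu_seconds / expected_gpu_speedup + transfer_seconds
    return gpu_seconds, gpu_seconds < cpu_seconds


# Heavy computation on 1.5 GB of data: the transfer is a small part of the total.
heavy_time, heavy_wins = gpu_estimate(1.5e9, cpu_seconds=10.0, expected_gpu_speedup=20)

# Light computation on the same data: the transfer alone takes longer than the CPU run.
light_time, light_wins = gpu_estimate(1.5e9, cpu_seconds=0.8, expected_gpu_speedup=20)

print(f"heavy: {heavy_time:.2f} s on GPU, worth it: {heavy_wins}")
print(f"light: {light_time:.2f} s on GPU, worth it: {light_wins}")
```

With the lower bound of 3 GB per second used here, moving 1.5 GB there and back costs about one second, so the computation has to be worth considerably more than that on the CPU before the GPU pays off.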


Does it fit?

Found out OpenCL is right for you? Contact us immediately and we can discuss how we can make your software faster. Not sure? Request a code-review or Rapid OpenCL Assessment to quickly find out if it works.

Do you think OpenCL is not the solution, but are you still processing data at the limits of your system? Feel free to contact us, as we can give you feedback for free on how to solve your problem with other techniques.

More to read on our blog

OpenCL is supported on many CPUs and GPUs. See this blog article for an extensive overview of hardware that supports OpenCL.

A list of application areas where OpenCL can be used can be found here.

Finally there is also a series on parallel programming theories, which explains certain theories behind OpenCL.

Do you have GPU-brains? A poster-initiative.

This is a message to GPU-programmers only.

It is a simple question, and has many answers: what are GPU-brains? How is it possible that your brain can code GPUs and only few friends and colleagues understand what you are doing? Is it thinking in parallel, focusing on one kernel and having the architecture in the back of your head? Is it simple loop-unrolling? Is it a web of thoughts? Is it just cool, as not many people can do it?

Free knowledge lunch at your location in the Netherlands/Belgium

Dutch only

Would you and your colleagues like to know more about terms such as GPGPU, OpenCL and massive multi-core? Do you want to prepare for a competitive market in which software can suddenly be several factors faster if programmed correctly?

Apart from the fact that we can hardly avoid them any more, multi-core processors with hundreds of cores offer several advantages. If software is written for many light processors instead of a few large ones, then scaling up is easier. This also explains why graphics cards work through a computational problem several factors faster than an ordinary processor. Another advantage is that flexible scaling makes energy saving simpler and better. The picture with the pipes was chosen on purpose: we started with a single core until the end of the ’90s, got multiple cores at the beginning of the 21st century, and now we are getting even more cores, think of 500 or more.

In at most one hour we give an overview of the new possibilities, based on practical examples that fit your sector. Are you running into the limits of data processing? And do you want to know whether the new types of processors can solve your problem? Then this is the opportunity to gather information. Others who received the same information earlier told us that beforehand they had no idea so many new possibilities had become available in the processor field, and that they could now make better sense of a lot of IT news around cloud computing and big data.

Interested? We are happy to visit you. Send an e-mail to kennislunch@StreamHPC.nl and we will contact you, or call 06-45400456 to make an appointment directly.

StreamHPC specialises in maximising the computing speed of data processing. We offer trainings in OpenCL and deliver custom accelerated software. We see this knowledge lunch as a pleasant way to get to know you; you are under no obligation to purchase services from us or our partners.

Code Review

Code reviews are one of the fastest ways to get the dev-team back on track and add performance to the code. We offer two types of code reviews, both safely under an NDA. This way you stay in control of the development, while getting expert-knowledge in.

A quick scan gives you an overview of the main ways to speed up the code and how it can be done.
This quick scan can be delivered in one week, if necessary, to give you the direction you may require in times of pressure.

Also, an extensive code review can provide all the necessary information for a redesigned architecture.

GPU-code (OpenCL, CUDA, Aparapi, and more)

Writing GPU-code and well-performing host-code can be tricky. The best method to learn CUDA or OpenCL is by doing. Nevertheless, you may need feedback sometimes to be sure you’re doing the right thing. We can check your code and give you a report with hands-on tips to make it optimal.

CPU-code (Java, C, C++ and more)

Much CPU-code, in languages like Java, C, C++ and C#, is written with functionality in mind, but not performance. Adding performance (cache-optimisation, memory-usage reduction, parallelisation of computations, adding OpenMP-threads, etc) is quite doable, but only when you know how. We can help you increase the performance of the software through feedback and clear steps.

Let us help you!

If you are interested in this service, request more information today and we will get back to you as soon as possible. Of course, you can also contact us via phone (+31 6 45400456), or e-mail (info@streamhpc.com).

Rapid Performance Assessment

You might have heard about the major speed-ups GPUs and FPGAs have promised, but also that this speed-up depends a lot on the type of software/algorithm. Investing in OpenCL or CUDA can therefore feel risky, since getting in costs time and money, while staying out can potentially give too much room to the competition. But if you want your customers to get the best experience without paying an unnecessarily high price, you’ll need to know what the return on your investment could be. With this quick assessment we will help you determine exactly that.

What we’ve done before

Most assessments answered the question “How much speed-up can I get using GPUs?”. Other questions were:

  • Does this algorithm work on this specific mobile processor?
  • Should we use CUDA, OpenCL or OpenGL shaders for this algorithm?
  • Does the HPC code run best on a Tesla K40 or FirePro S9150?
  • How many weeks/months would it take to port all code?
  • How many GPUs do I need for under 1 second responses?
  • Does this code port to an FPGA?
  • Which OpenCL device best suits my algorithm: CPU, GPU, APU, DSP, FPGA or something else?

Is your question in the list?

Program

Within a week we can fully analyse your code, or two weeks if the codebase is large or complex. During the assessment we write/port/optimise code, to be able to support our conclusions with numbers.

After the assessment you get an overview of the hotspots, an indication of total speed-up when using OpenCL (or comparable technology), and the answers to your questions.

Preparations

Send a mail to contact@streamhpc.com for more information, and we’ll call you back to talk about your requirements. Please provide times when you want to be called back.

[button text=”Contact form” url=”https://streamhpc.com/about-us/contact/” color=”red” target=”_self”]

Feedback & Privacy

This field is huge and ever-changing, which means that certain old posts might need updated information. Also, most of our team is made up of humans: a species famous for making all kinds of mistakes.

We care about what you say!

"Feedback is the breakfast of champions."
--Ken Blanchard
  • We are not native English speakers. Did we say something strange?
  • Is the site somewhat too technical, or is it missing some hard-core code examples?
  • Is some important information missing?
  • Is the publishing of the book or Eclipse-plugin taking too long?
  • Is the site too slow? (or broken in any other way?)
  • Do you have any compliments? We blush easily!

Tell us what you think by using the contact-page or sending an e-mail to feedback@streamhpc.com.

Privacy

We track the pages you visit with Piwik and Google Analytics, and we use this information to improve our webpage. For Google Analytics there is an opt-out tool, to be excluded from any webpage that uses it. For Piwik you can opt out by using the form below. If you opt out, please be so kind as to give us feedback on how we can improve our page.

Khronos OpenCL presentation at SIGGRAPH 2010

Here you find the videos uploaded by Khronos of their presentation about OpenCL. I added the time-line, so you can scroll to the more interesting parts easily. The presentations by Ofer Rosenberg of Intel and Cliff Woolley of NVIDIA were not uploaded (yet). Please note that for non-American people the speech of Affie Munshi is hard to follow; luckily his sheets explain most.

http://www.youtube.com/watch?v=BdZFtcQ2LYw

For the first two presentations the sheets can be downloaded from the Khronos-website. The time-line has the sheet-numbers mentioned.

0:00 [sheet 1] Presentation by the president of Khronos and chair of the session: Neil Trevett of NVIDIA.
0:06 [sheet 2] Welcome and a quick overview.
1:12 [sheet 3] The prizes for the attendees (not for us, online viewers).
1:40 [sheet 4] Overview of all members of Khronos. Khronos does not only take care of OpenCL, but also of the more famous OpenGL and projects like Collada.
2:26 [sheet 5] Processor Parallelism. CPUs are getting more parallel and GPUs more programmable. The overlapping area is called Heterogeneous Computing, and that is where OpenCL pops up.
3:10 [sheet 6] OpenCL timeline from version 1.0 to 1.1.
4:44 [sheet 7] OpenCL working group, with only 30 logos. He mentions missing logos, like the one from Apple.
5:18 [sheet 8] The Visual Computing Ecosystem, where OpenCL interoperability with other standards is shown. The talk is not complete, so I don’t know if he talks about DirectX.

OpenCL tutorial videos from Mac Research

A while ago macresearch.com ceased to exist, as David Gohara pulled the plug. Luckily the sources of a very nice tutorial were not lost, and David gave us permission to share his material.

Even if you don’t have a Mac, these almost 5-year-old materials are very helpful to understand the basics (and more) of OpenCL.

We also have the sources (chapter 4, chapter 6) and the collection of corresponding PDFs for you. All material is copyright David Gohara. If you like his style, also check out his podcasts.

Introduction to OpenCL

http://www.youtube.com/watch?v=oc1-y1V1TPQ

OpenCL fundamentals

http://www.youtube.com/watch?v=FrLqSgYyLQI

Building an OpenCL Project

http://www.youtube.com/watch?v=K7QiD74kMvU

Memory layout and Access

http://www.youtube.com/watch?v=oPE3ypaIEv4

Questions and Answers

http://www.youtube.com/watch?v=9rA6DypMsCU

Shared Memory Kernel Optimisation

http://www.youtube.com/watch?v=oFMPWuMso3Y

Did you like it? Do you have improvements on the code? Want us to share more material? Let us know in the comments, or contact us directly.

Want to learn more? Look in our knowledge base, or follow one of our trainings.

Professional and Consumer Media Software using OpenCL

More and more professional media software now has support for OpenCL. It is becoming a race in which you cannot afford to stay behind. If the competitor runs more than twice as fast on the same hardware, then you just can’t say “Sorry, you should buy NVIDIA hardware”. I expected this to happen, but could not tell in which industry it would move fastest. It seems to be fluid dynamics, video editing and photo editing.

AMD and Intel have mostly been selected as collaboration partners. Apple has been a main driver, especially with the introduction of their new Mac Pro with two high-end AMD FirePro GPUs.

Sony Catalyst Family

Sony released three new software packages to support video professionals in pre- and post-production.

This new family of products, Catalyst Browse (media management), Catalyst Prepare (video preproduction assistant) and Catalyst Edit (4K and Sony RAW video editing) has OpenCL support from the start.

Colorfront Express Dailies and On-Set Dailies

This software is an on-set dailies processing system (playback and sync, QC, colour grading, audio and metadata management).

The 2014 versions have OpenCL support in their transcoder plugin, Transkoder.

RED REDCINE-X PRO

REDCINE-X is a coloring toolset, integrated timeline, and post effects collection in a professional, flexible environment for your 4K or 5K .R3D files. RED has added support for OpenCL in build 22.

The Foundry Nuke Blink framework

As presented at GPUconf, The Foundry has opened their framework for running OpenCL kernels. It creates OpenCL-kernels (optimised for AMD or NVIDIA) from C++ Blink kernels.

NUKE studio is a node-based VFX, editorial and finishing studio. As with most products on this page, look for the “reel” to get a nice demo of its capabilities.

Magix Hybrid Video Engine

Video Deluxe and Movie Edit have both had OpenCL support since 2012, thanks to the new shared video engine.

http://www.youtube.com/watch?v=27M7vJIYR3c

Adobe CS6 creative suite

Adobe has entered the OpenCL market publicly. Premiere Pro (video editing) and Photoshop (photo editing) are two main products with advanced GPU-acceleration via OpenCL.

http://www.youtube.com/watch?v=F3LwNT1QUPQ

Video on GPU-effects on Premiere Pro CS5.

FAQ on GPU-acceleration on Photoshop CS6.

Sony Vegas Pro

Vegas Pro is a video editing software package for non-linear editing systems, and has had OpenCL support since version 10d. The consumer version (Sony Movie Studio) has OpenCL-support as well.

RealFlow Hybrido2 engine

RealFlow is fluid dynamics software, and its new engine Hybrido2 has supported OpenCL since this year. And you just have to love their commercial videos.

http://www.youtube.com/watch?v=Fj0err96BbQ

Autodesk Maya

Maya is a toolset that helps create and maintain the modern, open pipelines you need to address today’s challenging 3D animation, visual effects, game development and post-production projects. Since the 2013 version its physics simulations have been accelerated via Bullet and OpenCL.

http://www.youtube.com/watch?v=36bIdH6EBkM

ArcSoft SimHD and Sim3D engine

ArcSoft’s media-engines SimHD and Sim3D have had OpenCL support for several years and are used in several of their products.

http://www.youtube.com/watch?v=tvXyLKEeX2I

BlackMagic Design

BMD has two suites which use OpenCL, Resolve and Fusion. DaVinci was acquired in 2009 and EyeOn in 2014.

(DaVinci) Resolve

Resolve has real-time colour correction thanks to OpenCL.

http://www.youtube.com/watch?v=lfrudtCTwv0

(Eyeon) Fusion

Fusion is an image compositing software program created by eyeon Software Inc. It is typically used to create visual effects and digital compositing for film, HD and commercials.

It has used OpenCL since version 6.

Roxio Creator Suite

Roxio uses OpenCL for accelerated rendering in their suite. They were one of the first to implement OpenCL – I think already in 2010, before OpenCL was even cool.

Unfortunately they don’t provide much information, just a mention that they have support.

Apple Final Cut Pro and iMovie

Apple has support in Final Cut Pro X, Motion 5 and Compressor 4.

iMovie also works a lot faster when you have an OpenCL-capable Mac.

Blender Cycles & Bullet

You cannot find any demonstration of new video hardware without Big Buck Bunny, the short CG movie created with Blender.

Blender uses OpenCL in two parts: physics simulations (Bullet) and rendering (Cycles).

http://www.youtube.com/watch?v=QbzE8jOO7_0

Side Effects Houdini

Houdini is a procedural, node-based 3D animation and visual effects tool for film, broadcast, entertainment and visualisation production.

http://vimeo.com/46444204

DigitalFilmTools

There is support for OpenCL in zMatte, Composite Suite Pro and Film Stocks since Q4 2013.

zMatte is a keyer for blue and green screen composites. Composite Suite Pro is a collection of visual effects plug-ins. Film Stocks simulates color and black and white still photographic film stocks, motion picture film stocks and historical photographic processes.

OTOY OctaneRender 3

OctaneRender is a GPU-based, real-time 3D, unbiased rendering application. In March 2015 OTOY announced OctaneRender 3, which has full OpenCL support:

OpenCL support: OctaneRender 3 will support the broadest range of processors possible using OpenCL to run on Intel CPUs with support for out-of-core geometry, OpenCL FPGAs and ASICs, and AMD GPUs.

Below is a reel of OctaneRender 2 with CUDA. According to OTOY the performance on AMD and NVIDIA is comparable.

https://www.youtube.com/watch?v=gLSBVt0VQSI

SAM Alchemist XF

Alchemist XF supports format and framerate conversion from SD up to 4K for a wide variety of file formats at high speed.

More?

There is a lot more OpenCL-powered software coming up rapidly (we hear things). But we also missed (or accidentally forgot) software. Please help us complete this list and send us an email.

When Big Data needs OpenCL

Big Data in the previous century was the archive full of ring binders and folders, which would grow each year at the same pace. Now the definition is that it should grow each year by as much as in all previous years combined.

A few months ago SunGard named 10 Big Data trends transforming financial services. I have used their list as a base, with my own focus: on increased computation demands, and not specific to this one market. This resulted in 7 general trends where Big Data meets/needs OpenCL.

Since the start of StreamHPC we have sought customers who could not process all of their data in time. Back then Big Data was still a buzzword catching on, but it best describes this one core business.

Power to the Vector Processor

Reducing energy-consumption is “hot”

The article “Nvidia is losing on the HPC front” by The Inquirer mixes up the demand for low-power architectures with the other side of the market: the demand for high performance. It made me think that it is not that clear that there are two markets using the same technology. Nvidia has also proven the claim to be untrue, since the super-computer “Nebulae” uses almost half the watts per flop of the #1. How come? I quote The Register from an article of one year ago:

>>When you do the math, as far as Linpack is concerned, Jaguar takes just under 4 watts to deliver a megaflops at a cost of $114 per megaflops for the iron, while Nebulae consumes 2 watts per megaflops at a cost of $39 per megaflops for the system. And there is little doubt that the CUDA parallel computing environment is only going to get better over time and hence more of the theoretical performance of the GPU ends up doing real work. (Nvidia is not there yet. There is still too much overhead on the CPUs as they get hammered fielding memory requests for GPUs on some workloads.)<<

Nvidia is (and should be) very proud. But actually I’m already looking forward to when hybrids become more common. They will really shake up the HPC-market (as The Register agrees) by lowering latency between GPU and CPU and lowering energy-consumption. But the bigger market can be found in mobile.

Funded PhD internships at StreamHPC

We have several wishes for 2017 and two of them are to make code for the open source community. Luckily HiPEAC is interested in more collaboration between academia and industry and therefore funds PhD internships. There are 81 industrial PhD internships available and two are at StreamHPC.

What is this industrial PhD internship, you may ask? From the HiPEAC homepage:

The HiPEAC Industrial PhD Internship Programme offers PhD students a unique opportunity to experience the industrial research environment and to work on R&D projects solving real problems. To date the internship programme has resulted in several joint paper publications, patent applications and many students have been hired by the companies after completion of their PhDs.

 

The internships cover a 3-month period. Students should indicate when they will be available for an internship during 2016. When you apply for one of the internships, you must update your profile page including a link to your CV (preferably in PDF format).

Every intern receives €55 per day (€5000 for 3 months) + travel expenses (maximum €500). The main goal is to gain experience. Even if you don’t get a job after the internship, you tap into our network.

Events&Talks

StreamHPC gives talks at public and in-company events to explain what GPU-programming is, while focusing on the day’s theme.

You are welcome to attend these days, or you can request a talk about OpenCL and GPU-programming to be given at your event.

Agenda Talks

At the events in the list below Vincent Hindriksen will give a talk, or has given a talk.

[table id=1 /]

Reservations

For reservations and requests, please mail to events@streamhpc.com.

Agenda Events

We are visiting or have visited the following events. This is perfect if you want to have a quick discussion with us.

[table id=3 /]

ImageJ and OpenCL

For a customer I’m writing a plugin for ImageJ, a toolkit for image-processing and analysis in Java. Rick Lentz has written an OpenCL-plugin using JOCL. In the tutorial step 1 is installing the great OS Ubuntu, but that would not be the fastest way to get it going, and since JOCL is multi-platform this step should be skippable. Furthermore I rewrote most of the code, so it is a little more convenient to use.

In this blog-post I’ll explain how to get it up and running within 10 minutes with the provided information.

Do your (X86) CPU and GPU support OpenCL?

Does your computer have OpenCL-capable hardware? Read on and find out if your computer is compatible…

If you want to know what other non-PC hardware (phones, tablets, FPGAs, DSPs, etc) is running OpenCL, see the OpenCL SDK page.

If you only want to run OpenCL-software and have recent hardware, just read this paragraph. If you have recent drivers for your GPU, you can be sure OpenCL is already supported and you can run OpenCL-capable software. NVIDIA has supported OpenCL 1.1 since driver version 280.13, so if you need OpenCL 1.1, make sure you have this version or later. If you want to use Intel processors and you don’t have an AMD GPU installed, you need to download the Intel OpenCL runtime.

If you want to know if your X86 device is supported, you’ll find answers in this article.

Often it is not clear how OpenCL works on CPUs. If you have an 8-core processor with double threading, it is mostly understood that 16 pipelines of instructions are possible. OpenCL takes care of this threading, but also uses the parallelism provided by the SSE and AVX extensions. I talked more about this here and here. This means that an 8-core processor with AVX can compute 8 times 32 bytes (8*8 floats or 8*4 doubles) in parallel. You could see it as parallelism of parallelism. SSE was designed with multimedia-operations in mind, but offers enough to be used with OpenCL. The minimum requirement for OpenCL-on-a-CPU is SSE 4.2, though.
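The “parallelism of parallelism” arithmetic can be written out as a small sketch. The function name `simd_lanes` is made up for this example, and the result is a peak figure under the assumption that every core keeps one full vector register busy; it ignores double threading, clock speed and memory bandwidth.

```python
AVX_BYTES = 32   # 256-bit AVX registers
SSE_BYTES = 16   # 128-bit SSE registers
FLOAT_BYTES = 4
DOUBLE_BYTES = 8


def simd_lanes(cores: int, vector_bytes: int, element_bytes: int) -> int:
    """Values processed in parallel: cores times elements per vector register."""
    return cores * (vector_bytes // element_bytes)


print(simd_lanes(8, AVX_BYTES, FLOAT_BYTES))    # 8 cores * 8 floats  = 64
print(simd_lanes(8, AVX_BYTES, DOUBLE_BYTES))   # 8 cores * 4 doubles = 32
print(simd_lanes(8, SSE_BYTES, FLOAT_BYTES))    # 8 cores * 4 floats  = 32
```

So on the same 8-core processor, AVX doubles the number of floats handled in parallel compared to SSE.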

A question I often see is what to do if you have multiple devices. There is no OpenCL-package for all the available devices, so you then need to install drivers for each device. CPU-drivers are often included in the GPU-drivers.

Read on to find out exactly which processors are supported.
