The 12 latest Twitter Poll Results of 2018

Reading Time: 8 minutes

Via our Twitter channel we run various polls. We haven’t always shared the full background of these polls, so we’ve collected the polls of the past half year here. In case you wondered: there were no polls in the first half of the year.

As all-inclusive polls are unfocused (and thus difficult to answer), most polls are incomplete by design. Still, they can give insights, or at least invite comments.

The polls below have given us insight, and we hope they also give you insight into how our industry is developing. They are sorted by date, oldest first.

Interestingly, the percentage of votes per choice did not change much after about 30 votes. Even when a poll was retweeted by a large account, opinions kept the same distribution.

Is HIP (a clone of CUDA) an option?

AMD has worked on its implementation of CUDA for quite some time. It’s rather simple to do 80% of the compiler work, but then come the odd functions that might only exist for backwards compatibility. Add to that the libraries, which need to be optimised for AMD GPUs.

It’s July 2018 and porting software with the Python-based tool “hipify” takes a lot less time than when the tool was first created. But how is the current state perceived? AMD might see HIP as an option, but do others share that idea?
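
To give an idea of what such a port involves: the HIP runtime API mirrors CUDA’s almost one-to-one, so hipify mostly renames calls. Below is a minimal sketch of our own (not output of the actual tool), showing a vector addition written directly against the HIP runtime; the kernel and variable names are made up for illustration.

```cpp
// Minimal HIP sketch: the HIP runtime API mirrors CUDA's, so a port is
// largely a rename (cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, ...).
#include <hip/hip_runtime.h>
#include <vector>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
    float *da, *db, *dc;
    hipMalloc((void**)&da, n * sizeof(float));    // cudaMalloc in CUDA
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, a.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, b.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipLaunchKernelGGL(vector_add, dim3((n + 255) / 256), dim3(256), 0, 0,
                       da, db, dc, n);            // kernel<<<...>>>() in CUDA
    hipMemcpy(c.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```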

50% Always Nvidia with CUDA
11% Always AMD with HIP
39% The fastest GPU with HIP
28 votes

The fastest GPU is obviously the best solution, and that option would have scored much higher if it had said “The fastest GPU with CUDA”. I wanted to know how HIP was perceived, so I framed the choice as CUDA versus HIP. And as you can see, HIP is not seen as a good solution, which pushed votes towards Nvidia.

Out of scope: SYCL is the answer from the OpenCL side to offer a real alternative to CUDA.

Intel discontinues Xeon Phi

We’ve been a (sometimes loud) non-fan of the Xeon Phi, and did not shy away from sarcasm when benchmark results were discussed. Meanwhile various scientific papers with great info on the accelerator were written, and numerous HPC centres were happy to install the cards. We remained skeptical and assumed Intel gave those cards away for free (or even paid for the electricity costs). We were hopeful about the socketed Xeon Phi and were ready to invest in it (a complete system for developers was €4000), but Intel gave up and pulled the plug.

There is no better moment to ask what others think than when the whole product line is discontinued. To keep the focus on Intel, the question did not mention the competition.

44% It simply did not perform
11% Replaced by FPGAs
29% Replaced by 20+ core CPUs
16% No, it worked perfectly..
55 votes

The results clearly show that we were not the only ones who disliked the architecture.

The FPGA votes are unexpected, as we’ve seen even worse results on FPGAs when porting Xeon Phi code, even after taking a lot of time to fully optimise the code for the architecture. But indeed there are a (very) few cases where FPGAs are the better option. Likewise, there are people who said the Xeon Phi performed perfectly – simply cases we did not work on ourselves.

Multi-core CPUs are actually quite close to the Xeon Phi, as the architecture was sometimes described as nothing more than 72 Pentium cores. With the latest CPU architectures now offering 20+ cores, the gap has largely closed.

So please, dear researchers, don’t just write great articles about new accelerator cards – always (read: ALWAYS) benchmark your algorithms on various other (recent) GPUs/accelerators as well. No access to such machines? Ask Professor Simon McIntosh-Smith’s group and/or us to benchmark your code.

Shitty code is impossible to fix?

While searching for articles on code quality, I came across this statement. It’s not an exact quote, but I wanted to turn it into a clear statement.

91% Agree
9% Disagree
47 votes

Reddit source.

Most coders who have worked with so-called “shitty code” think that in most cases only a rewrite will do. So when the request is to make such code faster as if by magic, only the overclocking options are left. If that does not help, nothing short of a rewrite works.

Shitty code is not bad in itself. Delaying the rewrite is.

Intel on 10nm – Is there an escape route?

There were lots of rumours that Intel could not get their 10nm process working while the others could, even delaying their 10nm products to 2020. Serious news, if true. Given that Intel’s leadership had not been very strong for quite some years, the poll was also a bit about how people thought this would change in the short term.

Using another foundry would solve the problem that was predicted (and, at the time of writing, coming true): IBM, AMD, ARM – everybody getting a chance to grab a larger piece of the cake.

53% Patience till 2020
15% Surprise breakthrough
6% Use TSMC, GF or Samsung
26% Something else
34 votes

Patience is not the Intel I used to know, but neither are such delays in technical advancement.

Metal-only on macOS

So Apple kicked out OpenCL and was pretty clear about what developers should do: rewrite everything in Metal. As not even a porting tool was made available, this was quite unfriendly. So what did the people who would have to do the work think?

13% Yes
67% No
13% Yes, via Vulkan+MoltenVK
7% OSS, not by me
45 votes

Those who only had a few kernels could port them quickly and would not have much maintenance work. Those who found macOS important for their software would do the porting. Several put their hope in Vulkan on top of Metal. A large part simply objected.

Intel stuck on 10nm – should they skip 10nm?

Intel always had the technical advantage, but that era seems to be over. The rumour this time was that they realised themselves that they’re lagging and that the 10nm efforts had been stopped, but Intel soon officially reacted that this was false information.

A rumour is often started by unsatisfied (higher-up) employees, so we can at least conclude that this discussion has actually taken place.

While they’re being attacked from all sides, I’m a bit surprised they don’t seem to show strong leadership. All I see is this “AI software also runs on our processors” message, which doesn’t sound great to me. I love it when hardware companies understand that software is just as important, but when the hardware becomes secondary, I think something is very wrong.

48% Yes
14% No
38% It’s complicated
98 votes

Skipping 10nm is not as simple as “skipping 32nm” would have been, but I see the high number of yes-votes as a cry for Intel to get their act together.

Who’s the master of tools?

Nvidia is well known for its tools, but Intel, AMD and the others have not sat still. Due to the maximum of 4 options, I focused on server/desktop GPUs.

There are two ways of using tools: as a beginner, when you don’t really have an idea where to look, and as a senior, when working on a large project. When you are doing MPI+GPU, good tools are crucial.

67% Nvidia
30% AMD
0% Intel
3% Other
30 votes

As expected, Nvidia came out on top. AMD has built up trust with ROCm. Intel has quite a good suite, but got an unexpectedly low 0 votes. Is that because its products are paid for, or because Intel GPUs (and Xeon Phis) are not used in large installations (which is what the question was about)?

Anyway, Nvidia remains the example of how it should be done.

How should we share our knowledge?

Sharing our knowledge does not actually reduce the number of projects, and there is a good chance we’re educating a future colleague. We’re therefore confident we should keep sharing our knowledge. But what knowledge exactly? I think the techniques are the most important, then the language. Still, the language is a good part of the technique.

I was curious whether CUDA+HIP would get more votes than Nvidia alone, as there are good reasons to make code work on GPUs from both vendors. Second, I was curious how OpenCL would do in comparison. “Generic GPGPU” was added as an “other” option.

21% Nvidia CUDA + AMD HIP
48% OpenCL on GPUs+CPUs
22% Generic GPGPU
9% Nvidia CUDA only
58 votes

One remark was “SYCL + OpenCL”, which indeed is a good option.

As we started out with OpenCL, this could have influenced the number of votes. We do a good share of CUDA projects, but that’s not a well-known fact.

RedHat being bought by IBM

We’re not really fond of RedHat/CentOS. In projects that required this Linux, the time put into work-arounds and hacking was higher than in Windows projects, and far higher than in projects that required a modern Linux. We like to spend our time on the important things, not on OS-related problems. So when it became known that IBM had bought RedHat, the obvious question was “why?”. It was said the Cloud was important, but for that you would not need to buy a complete distribution. In the end the question also applies to the Cloud: are users happy with their Linux distribution?

15% Red Hat / CentOS
72% Debian / Ubuntu / Mint
11% Arch / Manjaro
2% Suse / OpenSuse
61 votes

Intel/Altera support their FPGAs only on RedHat/CentOS, so the results could be skewed by those who got used to it. The share is higher than we expected, as we’ve not spoken to any power user who likes the distribution or used a more positive description than “it does the job”.

How AMD’s offering is perceived

I got curious how AMD was perceived as a whole, and thus sent out a broad question.

3% Already far better
10% Truly competitive
18% Closing in fast
69% Still year(s) behind
73 votes

One reaction was clear: two years behind in gaming, a year behind in compute.

I had a follow-up question on AMD’s software, but they released their new CPU “Rome” and 7nm GPU “MI60” this week, so that was a bit of bad timing.

I hear a lot that most people don’t know what AMD is actually doing. Add to that the strong focus on CUDA by researchers (who got free GPUs from Nvidia) and the lack of strong messaging by AMD, and you get a lot of “I didn’t know”s.

A large part of this concerns AMD’s software offerings, and since a lot has changed there, this seems to be due to ineffective marketing. According to a message on a forum, HIP is a tool that converts CUDA to OpenCL, as AMD can only run OpenCL. As nobody corrected that claim, it seems to have become the common knowledge.

The news of “new CUDA version released!”

With CUDA 9 and 10 there was no “I can’t wait to get my hands on it”, except from starters. This was because hardly any new features were introduced. So the question came to mind how this was perceived. Had developers made peace with it?

13% Yes, vastly improved
40% Yes, but only a little
30% No, it’s all AI now
17% CUDA is sorta finished
30 votes

Framed like the question for AMD (“did you see a lot of improvements?”), there is one strong yes-answer and three more reserved answers. The strong yes I don’t really understand – I think it refers to the hardware and the Tensor Cores being programmable. Most chose “only a little progress”, which is a neutral answer. Then there is the complaint that Nvidia is shifting its focus to AI, and finally the “I’m OK with it, CUDA is sort of finished”.

The question is how GPU developers deal with it. Many are very smart people who like to learn new things, so what would be their next objective to learn? Another factor is that OpenMP and several other languages now have improved GPU support, which will cover the needs of a large group – especially in 3 to 4 years, when these languages have progressed even further.
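
As an illustration of how low the entry barrier has become, the sketch below shows OpenMP target offloading of a simple loop. It’s a generic example of ours (the function and variable names are made up), assuming a compiler built with offloading support, such as recent GCC or Clang.

```cpp
// Minimal sketch of OpenMP target offloading: the same loop can run on the
// host or be offloaded to a GPU without rewriting it in CUDA/HIP/OpenCL.
#include <cstddef>

void scale_add(const double* x, double* y, double a, std::size_t n) {
    // Map the arrays to the device and spread the loop over teams/threads.
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```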

Then there is AMD with HIP, their CUDA clone, which is helped a lot by a non-moving target.

For those who know AMD HIP, seen the improvements?

We’ve worked with AMD a lot to make high-performance libraries for their GPUs, and have seen the advancements made in their new compilers and drivers. But how was it perceived by others?

Upcoming polls

We’ve got various new questions we’d still like to ask you. We hope you’ll join in, so we can all get to know the industry a bit better, poll by poll.

We don’t work for the war-industry

Reading Time: 2 minutes

Last week we emphasized that we don’t work for the war industry. We did talk to a national army some years ago; that project never started, but we would probably have said no anyway. Recently we got a new request, felt uncomfortable about it and did not send a quote for the training.

This is because we like to think about the next 100 years, and investment in weapons is not something that would solve things for the long term.

To those who liked the tweet or wanted to: thank you for your support and for showing us we’re not standing alone here. Continue reading “We don’t work for the war-industry”

OpenCL Basics: Running multiple kernels in OpenCL

Reading Time: 1 minute

This series “Basic concepts” is based on GPGPU questions we get via email more than once, or that are not clearly explained in the books. What is obvious to one person is just what another is missing.

They say that learning a new technique is best done by playing around with working code and then trying to combine pieces. The idea is that when you have Stack-Overflowed and GitHubbed code together, you’ve created so many bugs by design that you’ll learn a lot by making it all work. When applying this to OpenCL, you quickly get into a situation where you want to run one .cl file and then another .cl file. Almost all beginner’s material discusses a single OpenCL file, so how do you do this elegantly?
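
To sketch the idea (a minimal example of ours, not the code from the full article): you simply build a separate cl_program per source and enqueue their kernels on the same in-order queue. The two kernels below are stand-ins for the contents of the two .cl files; error checking is omitted.

```cpp
// Minimal sketch: build two OpenCL programs (here inlined as strings instead
// of loading "one.cl" and "another.cl") and run a kernel from each in order.
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <vector>

static const char* srcA = "__kernel void fill(__global float* d) {"
                          "  d[get_global_id(0)] = 1.0f; }";
static const char* srcB = "__kernel void twice(__global float* d) {"
                          "  d[get_global_id(0)] *= 2.0f; }";

int main() {
    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
    cl_command_queue q = clCreateCommandQueue(ctx, device, 0, nullptr);

    const size_t n = 1024;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(float), nullptr, nullptr);

    // Each .cl source gets its own program and kernel object.
    cl_program progA = clCreateProgramWithSource(ctx, 1, &srcA, nullptr, nullptr);
    cl_program progB = clCreateProgramWithSource(ctx, 1, &srcB, nullptr, nullptr);
    clBuildProgram(progA, 1, &device, "", nullptr, nullptr);
    clBuildProgram(progB, 1, &device, "", nullptr, nullptr);
    cl_kernel kA = clCreateKernel(progA, "fill", nullptr);
    cl_kernel kB = clCreateKernel(progB, "twice", nullptr);
    clSetKernelArg(kA, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kB, 0, sizeof(cl_mem), &buf);

    // An in-order queue runs them one after the other.
    clEnqueueNDRangeKernel(q, kA, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
    clEnqueueNDRangeKernel(q, kB, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);

    std::vector<float> result(n);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, n * sizeof(float), result.data(),
                        0, nullptr, nullptr);     // result[i] == 2.0f
    return 0;
}
```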

Continue reading “OpenCL Basics: Running multiple kernels in OpenCL”

Start your GPU-career here

Reading Time: 2 minutes

GPUs have been our mysterious friends and known enemies for years, as they let us run code in expected and unexpected ways. GPUs have solved problems for many of our customers. GPUs evolve at such a high rate that they’ll remain important for years to come.

The problem is that programming GPUs is not an easy task. Where do you learn to program GPUs? We found these to be the main places:

  • Universities
  • Research centers
  • GPU vendors (AMD, Nvidia, Intel, Qualcomm, ARM)
  • Self-study

This is far from enough. Add to that that only a very select group learns the craft at a company. We’d like to change that, and we think now is the time we can deliver on this.

In January our internal training program will start with 4 to 8 developers. The focus is on fully understanding recent GPU architectures, CUDA and OpenCL. It will consist of lectures, workshops, discussions, paper reading and of course coding, for one month. The months after that will offer guidance, paper presentations, code reviews and time for self-study. The exact form will differ per person.

The hard side

The current measurable requirements are:

  • EU citizen or already having a working permit
  • Great at C/C++
  • High interest in algorithmic optimisations
  • Any performance improvement focus (e.g. Assembly, clean code) is a plus
  • Any GPU experience (e.g. OpenGL, DirectX, self-study) is a plus
  • High interest in performance
  • Willing to move to Amsterdam by January (or earlier)
  • Willing to work for Stream HPC for at least 2 years

The soft side

We’re looking for people who fit our culture and whom we think we can train. This means that the selection is based for a large part on “the spark”. Therefore the application starts with a speed date (sorry for not finding a better word for it): a 20-minute discussion about what we like and what we don’t. This can be done via phone, Skype or in person, during the evening, in the weekend or during your lunch break.

The fine print

To avoid people changing their minds after the training and returning to their old job, the training is only considered paid off after 2 years.

How to apply

Read about our company culture. Look at the jobs we have open; these describe the requirements after the training. Then write us a motivational letter: explain to us why this is exactly what you want, why you’re capable and why you’re a cultural fit. If you find it hard to write such a letter, then just start by answering the list of requirements. It’s a big bonus to share code (GitHub, GitLab, zip file). Send your email to jobs@streamhpc.com.

Other jobs

Feeling more senior? We have other jobs:

    What does it mean to work at Stream HPC?

    Reading Time: 6 minutes

    High performance computing on many-core environments and low-level optimisations are very important in large scientific projects nowadays. Stream HPC is one of the market’s more prominent clubs and is expanding substantially. As we often get asked what it is like to work at the company, we’d like to give you a little peek into our kitchen.

    Stream HPC’s DNA

    To understand our DNA, you need to know how the company got started. In 2010 the company was born out of deep boredom within the corporate IT workspace. Stream’s founder Vincent Hindriksen had to maintain a piece of software that often failed to process the daily reports. After documenting the internals and algorithms of the code by interviewing the key people and doing some reverse engineering, it was a lot easier to create effective solutions for the bugs in the software. After fixing a handful of bugs, there was simply a lot less to do other than reading books and playing online games.

    To avoid becoming a master of Sudoku, he spent the following three weeks rewriting all the code, using the freshly produced documentation. The 2.5 hours needed to process the data were reduced to 19 seconds – yes, the kick for performance optimisation was already there. For some reason it took well over 6 months to put the proof-of-concept into production, which was simply unbearable, as somebody had to make sure the old code kept being maintained for 40 hours a week.

    The reason to start the company was simple: to make intelligent use of time and provide software that is engineered for performance and maintainability. Lots of exciting projects for fantastic clients followed in the next 8 years that allowed us to broaden our expertise and build up confidence. GPUs were there at just the right time – without GPUs it would have probably been performance engineering on CPUs.

    Continue reading “What does it mean to work at Stream HPC?”

    Meet Vincent in Bay Area between 11 and 16 August

    Reading Time: 1 minute

    Our managing director, Vincent Hindriksen, will be in San Francisco’s Bay Area from Saturday 11 to Thursday 16 August 2018. He’ll be visiting existing customers, but there is still time left for new meetings.

    Current schedule (excluding several unconfirmed meetings):

    • Saturday: social meetups
    • Monday: full
    • Tuesday: all day good availability
    • Wednesday: all day good availability
    • Thursday: morning good availability

    Do you want to learn more about GPUs and how we can help you get there? Get in touch via our contact-page, and tell us the address and time where and when you’d like to meet.

    If you’re seeking a job in GPUs, also get in contact! Stream HPC is growing quickly now, so it’s a good moment to come on board and still make a difference. For job talks the evenings are also available.

    Help us find our future COO

    Reading Time: 2 minutes

    Is this a motto that goes with your personality? Then we want to talk with you.
    After 8 years, the time has come that we have had continuous growth for almost 2 years, instead of dealing with the usual peaks and lows of consultancy. I’d like your help in finding our future COO, who will help streamline this growth.
    You might have seen that there are hardly any new blog posts – now you know why. By helping us find that special person, you also help free up time for writing new blog posts again.
    If you know the perfect person for this job in Amsterdam, please let them know there is this unique company looking for her or him. Sharing this blog post would help a lot.
    You can find more information in this job-post:

    We all know that quality comes with attention to detail, but also that with growth the details are the first to be postponed. We seek help in handling daily operations during our growth. The most important tasks are:

    • Customer contact. You make sure communication with all our customers is regular and smooth, making them more engaged and happy with us.
    • Sales follow-up. You take over from pre-sales to discuss the needs of the potential customers they have been in contact with.
    • Team support. You help the development teams get even better by helping them solve their daily and long-term problems.

    The job is very broad, but it all revolves around a listening ear and getting things done.

    You have studied business administration or similar, and have a can-do attitude. You know how to work with technical people and are a real team player. You understand how to develop and engage group dynamics.

    Do you think this job was written for you? Then we would like to hear from you! Send an email to jobs@streamhpc.com with a motivational letter and a list of relevant experience.

    Thanks for helping out!

    If you got sent here, we hope to hear from you!

    How to speed up Excel in 6 steps

    Reading Time: 3 minutes

    After the last post on Excel (“Accelerating an Excel Sheet with OpenCL“), there have been various requests and discussions about how we do “the miracle”. Short story: we simply apply proper engineering tactics. Below I’ll explain how you too can speed up Excel, and when you actually have to call us.

    Excel is a special piece of software from a developer’s perspective. An important rule of software engineering is to keep functionality (code) and data separate. Excel mixes these two like no other, which actually goes pretty well in many cases, until the data gets too big or the computations too heavy. At that point you’ve reached Excel’s limits and need to solve the problem properly.

    Below are the steps to go through, most of which you can do yourself! Continue reading “How to speed up Excel in 6 steps”

    Call for speakers: IEEE eScience Conference in Amsterdam

    Reading Time: 2 minutes

    We’re on the program committee of the 14th IEEE eScience Conference in Amsterdam, organized by the Netherlands eScience Center. It will be held from 29 October to 1 November 2018, and the deadline for submitting abstracts is Monday 18 June.

    The conference brings together leading international researchers and research software engineers from all disciplines to present and discuss how digital technology impacts scientific practice. eScience promotes innovation in collaborative, computationally- or data-intensive research across all disciplines, throughout the research lifecycle.

    Continue reading “Call for speakers: IEEE eScience Conference in Amsterdam”

    Do you want to join StreamHPC?

    Reading Time: 1 minute

    As of this month, Stream has existed for 8 years. 8 full years of helping our customers with fast software. In Chinese numerology 8 is a very lucky number, and we notice that.

    Over the years we’ve kept our focus on quality, and that was a good decision. The only problem is that we don’t have enough time to write on the blog, to organise events or even to send the “monthly” newsletter. With over 200 drafts for the blog (subjects that really should be shared), we need extra people to help us out.

    Dear developers who are good with C, C++, OpenCL/CUDA and algorithms: please take a look at the following vacancies. I know you frequent our blog.

    We’re also seeking an all-rounder to support daily operations, which includes management, customer contact, team support, etc.

    See below for more details.

      We’re looking forward to your application! We accept both remote and Amsterdam-based applicants.

      Selecting Applications Suitable for Porting to the GPU

      Reading Time: 5 minutes

      Assessing software is never comparing apples to apples

      The goal of this article is to explain which applications are suitable to be ported to OpenCL and run on a GPU (or multiple GPUs). It does so by showing the main differences between GPUs and CPUs, and by listing features and characteristics of problems and algorithms that can make use of the highly parallel architecture of a GPU and thus simply run faster on graphics cards. Additionally, there is a list of issues that can decrease the potential speed-up.

      It does not try to be complete, but tries to focus on the most essential parts of assessing if code is a good candidate for porting to the GPU.

      GPU vs CPU

      The biggest difference between a GPU and a CPU is how they process tasks, because they serve different purposes. A CPU has a few (usually 4 or 8, but up to 32) “fat” cores optimized for sequential serial processing, like running an operating system, Microsoft Word, a web browser etc., while a GPU has thousands of “thin” cores designed to be very efficient when running hundreds of thousands of similar tasks simultaneously.

      A CPU is very good at multi-tasking, whereas a GPU is very good at repetitive tasks. GPUs offer much more raw computational power than CPUs, but they would completely fail to run an operating system. Compare it to 4 motorcycles (CPU) or 1 truck (GPU) delivering goods: when the goods have to be delivered to customers throughout the city, the motorcycles win; when all goods have to be delivered to a few supermarkets, the truck wins.

      Most problems need both processors to deliver the best combination of system performance, price and power. The GPU does the heavy lifting (the truck brings goods to distribution centres) and the CPU does the flexible part of the job (the motorcycles do the fine-grained deliveries).

      Assessing software for GPU-porting fitness

      Software that does not meet its performance requirement (time taken versus time available) is always a potential candidate for being ported to a GPU. Continue reading “Selecting Applications Suitable for Porting to the GPU”

      DOI: Digital attachments for Scientific Papers

      Reading Time: 3 minutes

      Ever seen a claim in a paper that you disagreed with or were triggered by, and then wanted to reproduce the experiment? Good luck finding the code and data used in the experiments.

      When we want to redo the experiments of a paper, it starts with finding the code and data used. A good start is GitHub or the homepage of the scientist. GitLab, Bitbucket, SourceForge or the personal homepage of one of the researchers could also be places to look. Emailing the authors is often only an option if the university homepage mentions one, and we’re not surprised to get no reaction at all. If all that doesn’t work, then implementing the pseudo-code and creating our own data might be the only option – and it’s uncertain whether that will support the claims.

      So what if scientific papers had an easy way to connect to digital objects like code and data?

      Here the DOI comes in.

      Continue reading “DOI: Digital attachments for Scientific Papers”

      Learn about AMD’s PRNG library we developed: rocRAND – includes benchmarks

      Reading Time: 3 minutes

      When CUDA kept its dominance over OpenCL, AMD introduced HIP – a programming language that closely resembles CUDA. Now it doesn’t take months to port code to AMD hardware; more and more CUDA software converts to HIP without problems. Even really large and complex code bases take a few weeks at most, and we found that the problems solved along the way also made the CUDA code run faster.

      The only problem is that the CUDA libraries need HIP equivalents before all CUDA software can be ported.

      This is where we come in. We helped AMD build a high-performance Pseudo-Random Number Generator (PRNG) library, called rocRAND. Random number generation is important in many fields, from finance (Monte Carlo simulations) to cryptography, and from procedural generation in games to providing white noise. For some applications it’s enough to have some data, but for large simulations the PRNG is the limiting factor. Continue reading “Learn about AMD’s PRNG library we developed: rocRAND – includes benchmarks”
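
      To give an impression of host-side usage: rocRAND’s C API follows the same create/generate/destroy pattern as cuRAND. The sketch below is written from memory as an assumption about the exact names and header location, so check the rocRAND documentation before relying on it.

      ```cpp
      // Rough sketch of rocRAND's host API (create/generate/destroy pattern,
      // mirroring cuRAND); verify names and signatures against the rocRAND docs.
      #include <hip/hip_runtime.h>
      #include <rocrand/rocrand.h>
      #include <vector>

      int main() {
          const size_t n = 1 << 20;
          float* d_data = nullptr;
          hipMalloc((void**)&d_data, n * sizeof(float));

          rocrand_generator gen;
          rocrand_create_generator(&gen, ROCRAND_RNG_PSEUDO_XORWOW);
          rocrand_set_seed(gen, 1234ULL);
          rocrand_generate_uniform(gen, d_data, n);   // fill the buffer on the GPU

          std::vector<float> h_data(n);
          hipMemcpy(h_data.data(), d_data, n * sizeof(float), hipMemcpyDeviceToHost);

          rocrand_destroy_generator(gen);
          hipFree(d_data);
          return 0;
      }
      ```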

      GPU and FPGA challenge for MSc and PhD students

      Reading Time: 3 minutes

      While going through my email, I found out about the third “HiPEAC Student Heterogeneous Programming Challenge”. Unfortunately the deadline was last week, but I just got an email: if you register by this weekend (17 September), you can still join.

      EDIT: if you joined, be sure to comment in early November on how it was. This will hopefully motivate others to join next year. Continue reading “GPU and FPGA challenge for MSc and PhD students”

      The single-core, multi-core and many-core CPU

      Reading Time: 3 minutes

      Multi-core CPU from 2011

      CPUs are now split up in 3 types, depending on the number of cores: single (1), multi (2-8) and many (10+).

      I find it more important now to split them into these three types, as the types of problems each solves are very different. Based on these problem differences, I even expect the gap in core count between multi-core and many-core CPUs to grow.

      Below the three types of CPUs are discussed, followed by a small discussion of the many-core processors we see around. Continue reading “The single-core, multi-core and many-core CPU”

      HPC centre EPCC says: “Better software, better science”

      Reading Time: 2 minutes

      The University of Edinburgh houses the HPC centre EPCC. Neelofer Banglawala wrote about a programme which funds the development and improvement of scientific software, and also discussed the results.

      Many of the 10 most used application codes on ARCHER have been the focus of an eCSE project. Software with more modest user bases have improved user uptake and widened their impact through eCSE-funded work. Furthermore, performance improvements can lead to tens of thousands of pounds of savings in compute time.

      Saving tens of thousands of pounds is certainly worth the investment. This also means more users can work on the same supercomputer, thus reducing waiting times. Continue reading “HPC centre EPCC says: “Better software, better science””

      Demo: cartoonizer on an Altera Arria 10 FPGA

      Reading Time: 2 minutes

      It takes quite some effort to program FPGAs using VHDL or Verilog. For several years now, Intel/Altera has provided OpenCL drivers, with the goal of reducing this effort. OpenCL-on-FPGAs reduced the required effort to a quarter of the time, while also making it easier to alter the specifications during the project. Exactly the latter was very beneficial when creating the demo, as the problem to be solved was vaguely defined. The goal was to make a video look like a cartoon using image filters. We soon found out that “cartoonized” is a vague description, and it took several iterations to get the right balance between blur, color-reduction and edge-detection. Continue reading “Demo: cartoonizer on an Altera Arria 10 FPGA”

      CPU Code modernisation – our hidden expertise

      Reading Time: 2 minutes

      You’ve seen the speedups possible on GPUs. We secretly know that many of these techniques would also work on modern multi-core CPUs. If after the first optimisations the GPU still gets an 8x speedup, the GPU is the obvious choice. When it’s 2x, would the better choice be a bigger CPU or a bigger GPU? Currently the GPU is chosen more often.

      Now that AMD and Intel have 28+ core CPUs, the answer to that question might lean towards the CPU. With a CPU that has 32 cores and 256-bit vector computations via AVX2, 32 double4 values can be processed each clock cycle. An older 16-core CPU with 128-bit vector units could only work on 16 double2’s, a quarter of that performance. Actual performance compared to peak performance is comparable to GPUs here.
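
      To make the vector part concrete, here is a minimal sketch of processing four doubles per instruction with 256-bit AVX intrinsics (the function name and the use of FMA are our own choices for illustration; compile with e.g. -mavx2 -mfma).

      ```cpp
      // Minimal sketch: y[i] = a * x[i] + y[i], four doubles per instruction
      // using 256-bit AVX registers and a fused multiply-add.
      #include <immintrin.h>
      #include <cstddef>

      void daxpy_avx(double a, const double* x, double* y, std::size_t n) {
          const __m256d va = _mm256_set1_pd(a);      // broadcast a into all 4 lanes
          std::size_t i = 0;
          for (; i + 4 <= n; i += 4) {
              __m256d vx = _mm256_loadu_pd(x + i);   // load 4 doubles
              __m256d vy = _mm256_loadu_pd(y + i);
              vy = _mm256_fmadd_pd(va, vx, vy);      // va * vx + vy (FMA)
              _mm256_storeu_pd(y + i, vy);
          }
          for (; i < n; ++i)                         // scalar tail
              y[i] = a * x[i] + y[i];
      }
      ```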

      New training dates for OpenCL on CPUs and GPUs!

      Reading Time: 1 minute

      OpenCL remains a popular programming language for accelerators, from embedded to HPC. Good examples are consumer software and embedded devices. With Vulkan potentially getting OpenCL support in the future, the number of supported devices will only increase.

      For multi-core CPUs and GPUs we now have monthly training dates for the rest of the year:

      The minimum number of participants is two. On request, the location and date can be changed.

      The first day of the training is the OpenCL Foundations training, which can be booked separately.

      For more information call us at +31854865760.

      IWOCL 2017 slides and proceedings now available

      Reading Time: 1 minute

      A month ago IWOCL (the OpenCL workshop) and DHPCC++ (C++ for GPUs) took place. Meanwhile many slides and posters have been published online. As of today, 23 talks have slides available.

      The proceedings are available via the ACM Digital Library. This requires an ACM Digital Library subscription of $198 if your company/university does not have access yet.

      IWOCL 2018 will be in Edinburgh (Scotland, UK), 15-17 May 2018 (provisional).

      Bug fixing the MESA 3D drivers

      Reading Time: 3 minutes

      Most of our projects are around performance optimisation, but we clean up bugs too. This is because you can only speed up software when certain types of bugs are cleared out. A few months ago we got a different type of request: whether we could solve bugs in MESA 3D that appear in games.

      Yes, we wanted to try that and got a list of bugs to solve. And as you can read, we were successful.

      Below you’ll find a detailed description of one of the 5 bugs we solved by digging deep into the different games and the MESA 3D drivers. At the end of the blog post you’ll find the full list, with links to the issues in MESA’s bug tracker. Continue reading “Bug fixing the MESA 3D drivers”