European HPC Magazines

If one thing can be said about Europe, it is that it is quite diverse. Each country solves or fails to solve its own problems individually, while European goals are not always well cared for. Nevertheless, you can notice things changing. One of the areas where things have changed is HPC. HPC has always been a well-interconnected research field in Europe (with its centre at CERN), but there is a catch-up going on in the European commercial market. The whole of Europe has set new goals for better collaboration between companies and research institutes, with programs like Horizon 2020. This means it becomes necessary to improve interconnections among much larger groups.

In most magazines HPC is a section within a broader scope. This is also very important, as it introduces HPC to more people. Here, I'd like to concentrate on the magazines that focus on HPC. There are mainly two available: Primeur Magazine and HPC Today.

Primeur Magazine

The Netherlands-based Primeur Magazine has been around for years, with HPC news from Europe, a video channel, a knowledge base, a calendar and more. Issues of past weeks can be read online (for free), but news can also be delivered via a weekly e-mail (a paid service, with prices ranging from €125 to €4000 per company/institute, depending on size).

They focus on being a news channel for what is going on in the HPC world, both in the EU and the US. Don't forget to follow them on Twitter.

HPC Today

Update: the magazine changed its name from HPC Magazine to HPC Today.

With several editions (Americas, Europe and France), websites and TV channels, the France-based HPC Today brings actionable coverage of HPC and Big Data news, technologies, uses and research. Subscriptions are free, as the magazine is paid for by advertisements. They balance their articles by targeting both people who deeply understand malloc() and people who just want to know what is going on. Their readers are developers and researchers from both the academic and private sectors.

With the change to HPC Today, the content has changed slightly according to requests from readers: less science, more HPC news. For the rest it's about the same.

To get an idea of how they're doing, check the partners of HPC Today: Teratec, ISC events and the SC conference.

Other European HPC sources

Not all information around the web is nicely bundled in a PDF. Below is a small list to get you started.

InSiDE

The German national supercomputing centres HLRS, LRZ and NIC publish the online magazine InSiDE (Innovatives Supercomputing in Deutschland) twice a year. The articles are available in HTML and PDF. It gives a good overview of what is going on in Germany and Europe. There is no way to subscribe via e-mail, so it is best to put a reminder in your calendar.

e-IRG

The e-Infrastructure initiative‘s main goal is to support the creation of a political, technological and administrative framework for an easy and cost-effective shared use of distributed electronic resources across Europe.

e-IRG is not a magazine, but it is a good place to start finding information about HPC in Europe. Their knowledge base is very useful when trying to get an overview of what there is in Europe: projects, country statistics, computing centres and more. They collaborate closely with Primeur Magazine, so you may see some overlap in the information.

PRACE Digest

The mission of PRACE (Partnership for Advanced Computing in Europe) is to enable high-impact scientific discovery, as well as engineering research and development across all disciplines, to enhance European competitiveness for the benefit of society. PRACE seeks to achieve this mission by offering world-class computing and data-management resources and services through a peer-review process.

The PRACE Digest appears twice a year as a PDF.

More?

Did we miss an important news-source or magazine? Let us know in the comments below!

Will OpenCL work for me?

OpenCL can accelerate your software by multiple factors, but… only if the data and the software are fit for it.

The same applies to CUDA and other GPGPU-methods.

Get to know if you can speed up your software with OpenCL in 4 steps.
[columns]
[one_half title=”1. Lots of repetitions”]
The main way to find code that can run in parallel is to look for loops that take up a relatively large share of the run time. If an action needs to be done for each part of the data input, then the code certainly contains a lot of loops. If so, you can go to the next step.

If data goes through the code from A to B in a straight line without many loops, then there is a very low chance that computing speed is the bottleneck. In that case a faster network, better caching, faster memory and the like should be looked into first.
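
To make it concrete, here is a minimal sketch of a candidate loop (in, out, a and b are hypothetical names):
[raw]

// the same independent operation is applied to every element:
// a classic candidate for OpenCL
for (int i = 0; i < n; i++) {
    out[i] = a * in[i] + b;   // no iteration depends on another
}

[/raw]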
[/one_half]
[one_half title=”2. No or few dependencies”]
If the loops carry no dependencies on the previous iteration, then you can go to the next step.

As low interdependencies do not matter for single-core software, this was not an important developer's focus even five years ago. Note that there are many new algorithms now which decrease loop-internal dependencies. If your software has already been optimised for several processors or even a cluster, then the step to OpenCL is much smaller.

For example, search problems can be sped up by dividing the data between many threads. Even though the dependency within each thread is high, the dependency on the other threads is very low. A minimal sketch of the difference is shown below.
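
A sketch with hypothetical arrays, using a running sum as the dependent case:
[raw]

// hard to parallelise: each iteration needs the previous result
sum[0] = in[0];
for (int i = 1; i < n; i++) {
    sum[i] = sum[i-1] + in[i];   // loop-carried dependency
}

// easy to parallelise: iterations are independent
for (int i = 0; i < n; i++) {
    out[i] = in[i] * in[i];      // no loop-carried dependency
}

[/raw]
(A running sum can in fact be parallelised with a different algorithm, a parallel prefix sum, which illustrates the remark above about redesigned algorithms.)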
[/one_half]

[/columns]

[columns]

[one_half title=”3. High predictability to avoid branching”]

Computations need to be as predictable as possible to get the highest speed-up. That means the code within the loops should have no or few branches: code without statements like if, while or switch. This is because GPUs work best when the whole processor does the same thing. If you have many threads which all do different things, then a CPU is still the best solution. As with decreasing dependencies in step two, in many cases redesigning the algorithm can result in well-performing GPU code.
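As an illustration (a sketch; x and y are hypothetical buffers), a branch can often be replaced by a built-in like OpenCL's fmax() or select():
[raw]

// branchy: work-items in the same wavefront/warp may diverge
if (x[i] > 0.0f) { y[i] = x[i]; } else { y[i] = 0.0f; }

// branch-free equivalent, using a built-in
y[i] = fmax(x[i], 0.0f);

[/raw]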

[/one_half]

[one_half title=”4. Low Data-transport overhead”]

In step 1 you looked for repeated computations. In this last step we look at the ratio between computations and data-size.

If the number of computations per data chunk is high, then using the GPU is a good solution. A simple way to find out whether a lot of computations are done is to look at the CPU usage in the system monitor. The reason is that data needs to be transferred to and from the GPU, which takes time even at 3 to 6 GB per second of throughput.

When the number of computations per data chunk is low, a doubling of speed is still possible when OpenCL is used on CPUs. See the technical explanation of how OpenCL on modern CPUs works and can even outperform a GPU.
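
A rough worked example with made-up numbers: transferring 1 GB to the GPU at 4 GB/s costs about 250 ms each way. If the kernel then computes for only 50 ms, the transfers dominate and a discrete GPU probably won't pay off; if it computes for 5 seconds, the transfer overhead is noise.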

[/one_half]
[/columns]


Does it fit?

Found out that OpenCL is right for you? Contact us immediately and we can discuss how to make your software faster. Not sure? Request a code review or a Rapid OpenCL Assessment to quickly find out whether it works.

Do you think OpenCL is not the solution, but are you still processing data at the limits of your system? Feel free to contact us, as we can give you free feedback on how to solve your problem with other techniques.

More to read on our blog

OpenCL is supported on many CPUs and GPUs. See this blog article for an extensive overview of hardware that supports OpenCL.

A list of application areas where OpenCL can be used is written down here.

Finally, there is also a series on parallel programming theories, which explains some of the theory behind OpenCL.

Using async_work_group_copy() on 2D data

When copying data from global to local memory, you often see code like below (1D data):
[raw]

if (get_local_id(0) == 0) {        // let a single work-item fill the local buffer
    for (int i = 0; i < N; i++) {
        data_local[i] = data_global[offset + i];
    }
}
barrier(CLK_LOCAL_MEM_FENCE);      // the whole work-group waits until the copy is done

[/raw]
This can be replaced with an asynchronous copy using the function async_work_group_copy, which results in more manageable and cleaner code. The function behaves like an asynchronous version of the memcpy() you know from C.

[raw]

event_t async_work_group_copy(__local gentype *dst,
                              const __global gentype *src,
                              size_t num_gentypes,  // number of elements, not bytes
                              event_t event);

event_t async_work_group_copy(__global gentype *dst,
                              const __local gentype *src,
                              size_t num_gentypes,  // number of elements, not bytes
                              event_t event);

[/raw]

The Khronos registry describes async_work_group_copy as providing asynchronous copies between global and local memory, plus a prefetch from global memory. This makes it much easier to hide the latency of the data transfer. In the example below, you effectively get free time to do the do_other_stuff() – this results in faster code.
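
For the 1D snippet at the top of this post, the replacement is a single call plus a wait – a sketch, with N, offset, data_local and data_global as defined above:
[raw]

event_t e = async_work_group_copy(data_local, &data_global[offset], N, 0); // 0: create a new event
// ... independent work can be done here ...
wait_group_events(1, &e);

[/raw]
Note that, unlike the guarded loop, this call must be reached by all work-items in the work-group with the same arguments.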

As I could not find good code snippets online, I decided to clean up and share some of my code. Below is a kernel that uses a patch of size (offset*2+1) and works on 2D data, flattened to a float array. You can use it for standard convolution-like kernels.

The copy is executed on work-group level, so there is no need to write code that makes sure it's only executed by one work-item.

[raw]

// 'offset' is assumed to be defined at compile-time, e.g. via the build option "-D offset=2"
kernel void using_local(const global float* dataIn, local float* dataInLocal) {
    event_t event = 0;  // 0 lets the first call create a new event
    const int dataInLocalWidth = (offset*2 + get_local_size(0));

    // copy the patch row by row; passing the event chains the copies together
    for (int i=0; i < (offset*2 + get_local_size(1)); i++) {
        event = async_work_group_copy(
             &dataInLocal[i*dataInLocalWidth],
             &dataIn[(get_group_id(1)*get_local_size(1) - offset + i) * get_global_size(0)
                 + (get_group_id(0)*get_local_size(0)) - offset],
             dataInLocalWidth,
             event);
    }
    do_other_stuff(); // code that you can execute for free
    wait_group_events(1, &event); // waits until the copy has finished
    use_data(dataInLocal);
}

[/raw]

On the host (C++), the most important part:
[raw]

cl::Buffer cl_dataIn(*context, CL_MEM_READ_ONLY|CL_MEM_HOST_WRITE_ONLY, sizeof(float)
          * gsize_x * gsize_y);
cl::LocalSpaceArg cl_dataInLocal = cl::Local(sizeof(float) * (lsize_x+2*offset)
          * (lsize_y+2*offset));
queue.enqueueWriteBuffer(cl_dataIn, CL_TRUE, 0, sizeof(float) * gsize_x * gsize_y, dataIn);
cl::make_kernel<cl::Buffer, cl::LocalSpaceArg> kernel_using_local(
          cl::Kernel(*program, "using_local", &error));
cl::EnqueueArgs eargs(queue, cl::NullRange, cl::NDRange(gsize_x, gsize_y),
          cl::NDRange(lsize_x, lsize_y));
kernel_using_local(eargs, cl_dataIn, cl_dataInLocal);

[/raw]
This should work. Some prefer to declare the local memory inside the kernel, but I prefer not to fix the size at (JIT) compile time.

This code might not be optimal if you have special tricks for handling the outer border. If you see any improvement, please share it via the comments.

PRACE Spring School 2014

On 15 – 17 April 2014 a 3-day workshop around HPC is being organised. It is free, and focuses on bringing industry and academia together.

Research Institute for Symbolic Computation (RISC) / Johannes Kepler University Linz
Kirchenplatz 5b (Castle of Hagenberg)
4232 Hagenberg
Austria

The PRACE Spring School 2014 will take place on 15 – 17 April 2014 at the Castle of Hagenberg in Austria. The PRACE Seasonal School event is hosted and organised jointly by the Research Institute for Symbolic Computation / Johannes Kepler University Linz (Austria), IT4Innovations / VSB-Technical University of Ostrava (Czech Republic) and PRACE.

The 3-day program includes:

  • A 1-day "HPC usage for Industry" track, bringing together researchers and attendees from industry and academia to discuss the variety of applications of HPC in Europe.
  • Two 2-day tracks on software engineering practices for parallel & emerging computing architectures, and deep insight into solving multiphysics problems with Elmer on large-scale HPC resources, with lecturers from industry and PRACE members.

The PRACE Spring School 2014 programme offers a unique opportunity to bring users, developers and industry together to learn more about efficient software development for HPC research infrastructures. The program is free of charge (not including travel and accommodations).

Applications are open to researchers, academics and industrial researchers residing in PRACE member countries, and European Union Member States and Associated Countries. All lectures and training sessions will be in English. Please visit http://prace-ri.eu/PRACE-Spring-School-2014/ for more details and registration.

At StreamHPC we support such initiatives.

Our Team

StreamHPC is one of the best-known companies in GPU computing (OpenCL/CUDA/HIP/SYCL), but we are also very active in embedded development, algorithm design, and technologies like graphics (OpenGL/Vulkan), machine learning, and HPC (OpenMP/MPI).

We are distributed mainly between Amsterdam, Budapest and Barcelona.

The developers, the heart of the company

The company consists of highly skilled developers and low-level performance engineers. We mostly manage ourselves, but always with the help of the group. This way we gain influence by showing ownership.

Each employee regularly shares their experience and checks the work of colleagues, to keep standards high. This results in faster deliveries and higher-quality code, for which we've often been complimented.

Want to work at StreamHPC too? Check our jobs-page.

The Leads

The senior team deals with new directions/markets/strategies, trains the employees and makes sure the project teams are enabled. We use Holacracy and EOS to lead our company.

  • HR: Berrak Bas
  • Consultancy + Projects: Adel Johar
  • Marketing + Sales: Vincent Hindriksen
  • General strategy: Vincent Hindriksen
  • IT: Robin Voetter and Balint Soproni
  • Operations: Maurizio Campese
  • Finance: shared
  • Legal: shared
  • Open standards: shared
  • Offices: shared
  • Products: shared

The shared roles do not need to be filled right now, as they are done together or outside the company. When these become full-time roles, we will open them up and publish them on our jobs page.

Hire the experts

On average our pipeline is full for the next 3-6 months, but we always reserve time for shorter projects (of at most a month).

Call +31 854865760 or mail to info@streamhpc.com or fill in the contact form to have a chat on how we can solve your software performance problems or do your software development.

OpenCL.org internship/externship

Our internship lives up to its description: it's a rather complex homepage which should look good on your CV (if you manage to build it).

Want to help build an important website? OpenCL.org's components have been designed and partly built, but a lot of work still needs to be done. We're seeking an intern (or "extern" when not in Amsterdam) who can help us build the site. This internship is not about GPUs!

To complete the tasks, the following is required:

  • Technical expertise:
    • HTML5, CSS
    • PHP
    • Javascript
    • jQuery
    • Node.js
    • Mediawiki
    • XSLT
  • Can-do mentality
  • Able to plan own work
  • Good communication-skills
  • Available for 3 to 6 months

We don't expect you to know all these tools, so we will guide you in learning new tools and techniques. Write us an "email of interest" to info@streamhpc.com, describing what you can do and what your objectives for an internship would be.

We're looking forward to seeing your letter!

We don’t work for the war-industry

Last week we emphasised that we don't work for the war industry. We did talk to a national army some years ago, but even though the project never started, we would probably have said no. Recently we got a new request, felt uncomfortable, and did not send a quote for the training.

https://twitter.com/StreamHPC/status/1055121211787763712

This is because we like to think about the next 100 years, and investment in weapons is not something that would solve things for the long term.

To those who liked the tweet or wanted to: thank you for your support, which shows us we're not standing alone here. Continue reading "We don't work for the war-industry"

Rebranding the company name from StreamComputing to StreamHPC

The name StreamComputing has been used since 2010 and is now widely known in the GPU-computing industry. But the name has three problems: we don't have the .com domain, it does not directly show what we do, and the name is quite long.

Some background on the old domain name

While the initial focus was Europe, for years 95% of our projects have been done for customers outside the Netherlands, and over 50% outside Europe – with the .eu domain we don't show our current international focus.

But that's not all. The name sticks well in academics, as they're more used to longer names – just try to read a book on chemistry. The alternative names I tested were not well received, for various reasons. Just like "fast" is associated with fast food, computing is not directly associated with HPC, so "fast computing" simply gets weird. Since several customers referred to us as Stream, it made much sense to keep that part of the name.

Not a new begin, but a more focused continuation

Stream HPC defines more what we are: we build HPC software. The name combines the well-known "Stream" with our diverse specialisation:

  • Programming GPUs with CUDA or OpenCL.
  • Scaling code to multiple CPUs and GPUs
  • Creating AI-based software
  • Speeding up code and optimizing for given architectures
  • Code improvement
  • Compiler tests and compiler building (LLVM)

With the HPC focus we were better able to improve ourselves. We have put a lot of time into professionalising our development environment and project management, by implementing suggestions from current customers and our friends. We were already used to working fully independently and being self-managed, but now we were able to standardise more of it.

The rebranding process

Rebranding will take some time, as our logo and name are in many places. For the legal part we will take some more time, as we don't want to get into problems with, for example, all the NDAs. Email will keep working on both domains.

We will contact all organizations we’re a member of over the coming weeks. You can also contact us, if you read this.

StreamComputing will never really go away. The name was with us for 7 years and stands for growing with the rise of GPU computing.

Ask your question

Do you have a question? We are happy to answer all your questions on any subject discussed at this website.

Due to spam floods, we removed the form.

info@streamhpc.com

We try to answer your question within 24 hours.

Do you have our GPU DNA?

This is the first question, to warm up: Python programmers are often users of GPU libraries, not the builders of those libraries.

In January 2019 I gave a talk about culture in the company, which I want to share with you. It was intended to trigger discussions on what environment fits somebody, with examples from other companies. The nice part was that it became clear that the culture of a company like CodePlay is very similar, except they work on different things (compilers). The same goes for departments of larger companies we work with or know well.

Important: all answers are based on what my colleagues answered. So most of us are cat people, but I wouldn't say that defines a GPU developer. I hope it still gives you an understanding of our perspective on what defines a GPU dev in just a few minutes, while also giving you more than enough to think about.

Continue reading “Do you have our GPU DNA?”

Optimized AI

At Stream HPC we optimise the performance of software, such that data is processed in less time. For deep learning this is also important once the models have been built. Optimising a model algorithmically or finding a new approach is fully in the domain of AI, but computationally optimising the model's throughput takes a specialism that can be found at Stream HPC.

We have built in-house tools and processes to find and solve compute bottlenecks in any type of software. One of these, benchmark.io, we sell commercially. We also wrote foundational libraries for AMD GPUs that are used by software like TensorFlow and PyTorch, which means we know how optimised each piece of such a library is – also for Nvidia's version of the libraries.

Why performance is important

Your business goals are implemented by your AI models and software. If the models can be trained faster, if inference can be done with less energy, if training costs go down, if more models can run at the same time – all of these influence how well your business goals are attained.

The reason we started offering this service is that progress in AI can be so opaque that "throwing more engineers at the problem" seems to become the solution where many AI projects end up. We think control can be regained by careful benchmarking and by focusing on removing bottlenecks.

Would this work for you?

No two AIs are the same. We'd like to understand where your bottlenecks are. If compute optimisations are not the right direction for you, then we'll advise you where to go next.

Contact us to initiate the conversation

LEAP-conference call for papers

Building bridges in a new industry

Embedded processors have always had a focus on low energy use. Now a combination of Moore's law, the frequency wall and multi-processor developments has made it possible for these processors to compete in completely new market segments, most notably due to impressive advancements in graphics IP.
We are now looking at four groups who are interested in learning from each other:

  • The embedded processor market
  • The FPGA market
  • The HPC and server market
  • The GPGPU market

And to answer the question: how can we get more out of low-energy processors by looking at other industries?

The goal of the LEAP conference is to bring these four groups together: creating windows to each other and paving roads over the newly constructed bridges. This makes it one of a kind. Half of the conference is focused on quality information sharing and the other half on networking. For more information, check the website of the LEAP conference. StreamHPC is a co-organiser.

Call for papers is now open! Update: the programme is now filled!

Continue reading “LEAP-conference call for papers”

Four conferences that will interest you


(if you get to Palo Alto, Manchester, Karlsruhe and Copenhagen)

We're supporters of open standards and open discussions. When those two come together, we melt. Therefore I'd like to share four hot conferences with you: IWOCL (Palo Alto, SF, USA), EMiT (Manchester, UK), ParallelCon (Karlsruhe, Germany) and the GPGPU-day 2015 (Copenhagen, Denmark).

I'll be at all of these conferences and am happy to meet you.

This post was shared first via the newsletter. Subscribe here.

Continue reading “Four conferences that will interest you”

Building the HPC ecosphere in Amsterdam

Here in Amsterdam a lot is going on around HPC. Besides StreamHPC, we have companies like Vancis, the Netherlands eScience Centre and ClusterVision, the Dutch HPC research institute SURFsara (hosting the Dutch supercomputer), and the very busy Amsterdam IX.

Here in Amsterdam we're focused on building up more local companies around big compute and big data. I'd like to give two examples. One is Scyfer, an academic startup specialised in deep learning; they've developed algorithms to train neural networks more efficiently and help their customers find answers quicker. The second is Euvision Technologies, who developed unique computer-vision solutions; last year it was sold to Qualcomm for tens of millions.

We welcome new companies to Amsterdam, to further build up the HPC ecosphere. If you have a company and are seeking a good location, contact us to talk about HPC in Amsterdam. There are many opportunities to develop in Europe, and we're open to partnerships in new markets.

If you want to start your own HPC-related startup, Amsterdam thinks of you! There are four steps to take:

  1. Go to the Venture café on 30 April
  2. Apply for the bootcamp
  3. Become our neighbours
  4. Build your own HPC startup

Ping me if you want advice on the preparations you need to make before taking such a big decision. I like to have an open discussion, so please use the comment area below to share what you think of HPC in Amsterdam and of building companies here.

The 8 reasons why our customers had their code written or accelerated by us

Making software better and faster.

In the past six years we have helped various customers solve their software performance problems. While each project has been very different, there have been 8 reasons to hire us as performance engineers. These can be categorised into three groups:

  • Reduce processing time
    • Meeting timing requirements
    • Increasing user efficiency
    • Increasing the responsiveness
    • Reducing latency
  • Do more in the same time
    • Increasing simulation/data sizes
    • Adding extra functionality
  • Reduce operational costs
    • Reducing the server count
    • Reducing power usage

Let’s go into each of these. Continue reading “The 8 reasons why our customers had their code written or accelerated by us”

Embedded

Embedded is an industry often combined with image processing, computer vision or machine learning. The goal is to have performant computing on batteries.

At StreamHPC we have often helped speed up algorithms, as faster software also means doing the same work with less power.

See the tab "low power" under the menu "technologies" for the hardware architectures we master.

All OpenCL SDKs now in our Knowledge Base

For those who haven't seen the latest addition to our knowledge base: we have added a list of (almost) all available OpenCL SDKs. You can find it in the menu under "Knowledge Base" -> "SDKs…".

This list shows how important OpenCL is becoming, as developers can now write compute-intensive parallel software for CPUs, GPUs, ARM-based accelerators and even FPGAs. This growth in OpenCL devices is very exciting and important news, and that's why it has got its own section on the site.

The current list is (in random order):

Currently looking into:

  • Intel Xeon Phi
  • Nintendo Wii U dev
  • Sony Playstation 4 Orbis
  • Vivante
  • Xilinx
  • NVidia GPUs
  • Qualcomm

The SDK of NVIDIA is on the second list, which you maybe did not expect. We have to wait until they make an official statement on what they are going to do with CUDA and OpenCL.

While you are there, also check the other parts of the Knowledge Base:

  • What is… -> Explanations of terminology. Put your requests in a comment.
  • Events & Talks -> A list of events which StreamHPC attends, gives talks at or helps organise. Interesting for both managers and engineers.
  • Self Study -> The part of the site most visited after the blog. This is for engineers who want to start learning GPU programming.

This section will be updated and extended continuously with information not available anywhere else.

StreamHPC has been in the OpenCL business since 2010, as one of the few. We have been the most visible and well-known OpenCL specialist ever since.

OpenCL in the Clouds

Buzz-words are cool; they are loosely defined and are actually shaped by the many implementations that use the label. Like Web 2.0, which is cool JavaScript for one person and interaction for another. Now we have cloud computing, which is cluster computing with "something extra". More than a year ago clouds were in the data centre, but now we even have "private clouds". So how do we incorporate GPGPU? A cluster with native nodes to run our OpenCL code with pre-distributed data is pretty hard to maintain, so what are the other solutions?

Distributed computing

Folding@home now has support for OpenCL, to add the power of non-NVIDIA GPUs. While in clusters the server commands the clients what to do, here the clients ask the server for jobs. The disadvantage is that the clients are written for a specific job and are not really flexible enough to take different kinds of jobs. There are several solutions for this code-distribution problem, but the approach remains unsuitable for smaller problems and small clusters.

Clusters: MPI

The project SHOC (Scalable HeterOgeneous Computing) is a collection of benchmark programs testing the performance and stability of systems using computing devices with non-traditional architectures for general-purpose computing, and the software used to program them. While it is only a benchmark, it can be of great use when designing a cluster. Beyond that, I only found CUDA MPI solutions, which have not been ported to OpenCL yet.

Also check out Hoopoe, a cloud-computing service to run your OpenCL kernels in their cloud. It seems to be limited to .NET and to have better support for CUDA, but it is a start. In Europe there is a start-up offering a rent model for OpenCL computation time; please contact us if you want to get in contact with them.

Clusters: OpenMP

MOSIX has added a "Many GPU Package" to their cluster-management system, so it now allows applications to transparently use cluster-wide OpenCL devices. When "choosing devices", not only the local GPU pops up, but also all GPUs in the cluster.
It works disk-less, in the sense that no files are copied to the computation clients and everything stays in memory. Disk-less computation is an advantage when cloud computers are not fully trusted. Take note that on most cloud computers the devices need to be virtualised (see the next section).

Below is its layered model, VCL being the “Virtual OpenCL Layer”.

They have chosen to base it on OpenMP; while the kernels don't need to be altered, some OpenMP code needs to be added. They are very happy to report that it takes much less code to use OpenMP instead of MPI.

You can see that a speed-up between 2.19 and 3.29 on 4 nodes is possible. We see comparable cluster speed-ups in an old cluster study. The actual speed-up on clusters depends mostly on the amount of data that needs to be transferred.

The project refers to a project called remote CUDA, which only works with NVIDIA GPUs.

Device Virtualisation

Currently there is no good device virtualisation for OpenCL. The gVirtuS project currently only supports CUDA, but they claim it is easily rewritten for OpenCL. The code needs to be downloaded with a Mercurial client (comparable to Git, and in the repositories of most Linux distributions):
> hg clone http://osl.uniparthenope.it/hg/projects/gvirtus/gvirtus gvirtus
Or download it here (dated 7-Oct-2010).

Let me know when you have ported it to OpenCL! Actually, gVirtuS does not do the whole trick, since you need to divide the host devices between the different guest OSes. Luckily there is an extension which provides sharing of devices, called device fission. More about this later.

We can all agree that a lot still needs to be done in this area of virtualised devices to get OpenCL into the cloud. If you can't wait, you can theoretically use MOSIX locally.

Afterword

A cloud is the best buzz-word to market a scalable solution that overcomes the limitations of internet-connected personal devices. I personally think the biggest growth will be in personal clouds, where companies have their own in-house cloud servers (read: clusters); people just want a feeling of control, comparable with preferring a daily traffic jam over public transport. Nevertheless, shared clouds have potential when it comes to computation-intensive jobs which do not need to be done all year round.

The projects presented here are a start towards having OpenCL power at a larger scale for more demanding cases. Since one desktop PC stuffed with high-end video cards now gives us more power at our fingertips than a 4-year-old supercomputer cluster, there is still time.

Please send your comment if I missed a project or method.

AMD Hawaii power-management fix on Linux

The new Hawaii-based GPUs from AMD (Radeon R9 2xx, FirePro W9100 and FirePro S9150) have a lot of improvements, one being the new OverDrive 6 (AMD's version of NVIDIA GPU Boost). The problem is that it's not yet supported in the Linux drivers, so you get too low performance – this will probably be solved in the next driver version. Luckily there is od6config, made by Jeremi M Gosney.

Follow the steps below to get the GPU running at normal speed.

  1. Download the zip or tar.gz from http://epixoip.github.io/od6config/ and unpack.
  2. Go to the directory where you unpacked the archive.
  3. run:
    make
  4. run:
    sudo make install
  5. check if it’s needed to fix the power management:
    od6config --get clocks,temp,fan
  6. if the values are too low, run:
    od6config --autofix --set power=10
  7. check if it worked:
    od6config --get clocks,temp,fan

Only OverDrive6 devices are set, devices using OverDrive5 will be ignored.

A PowerTune value of 10 was what we found convenient, but you might find better values for your case. There are several more options, which are described on the homepage of od6config. Note that you need to run "od6config --autofix --set power=10" again on each reboot.
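
One way to automate that is a cron job – a sketch, assuming od6config was installed to /usr/local/bin by "make install":
[raw]

# root crontab (sudo crontab -e): re-apply the PowerTune value at every boot
@reboot /usr/local/bin/od6config --autofix --set power=10

[/raw]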

Remember that it's third-party software, so no guarantees from us and no "you killed my GPU" to us.

Intel’s OpenCL SDK examples for GCC

Update august 2012: There is a new post for the latest Linux examples.

Note: these patches won’t work anymore! You can learn from the patches how to fix the latest SDK-code for GCC and Linux/OSX.

Code examples are not bundled with the Linux OpenCL SDK 1.1 beta. Intel's focus is primarily Windows, so Visual Studio seems to be the logical target. I just prefer GCC/LLVM, which you can get to work with all OSes. After spending some time finding the alternatives for MS-specific calls, I think I managed. Since ShallowWater uses DirectX and is quite extensive, I did not create a patch for that one – sorry for that.

I had a lot of trouble getting the BMP export to work, because serialisation of the struct added an extra short. Feedback (such as a correct BMP export of a file) is very welcome, since I am not sure the colours are correct. For the rest: most warnings are removed and it just works – tested with g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2 on 64 bit (llvm-g++-4.2 seems to work too, but not fully tested).

THE PATCHES ARE PROVIDED AS IS – NO WARRANTIES!

Continue reading “Intel’s OpenCL SDK examples for GCC”