What is Khronos as of today?

The Khronos Group is the organization behind APIs like OpenGL, Vulkan and OpenCL. More than one hundred companies are members, and together they decide what next year’s phone, camera, computer or media device will be capable of.

We’re at the right, near the bottom.

We work mostly with OpenCL, but you have probably noticed that we work with OpenGL, Vulkan and SPIR too. Khronos currently maintains the following APIs:

  • COLLADA, a file-format intended to facilitate interchange of 3D assets
  • EGL, an interface between Khronos rendering APIs such as OpenGL ES or OpenVG and the underlying native platform window system
  • glTF, a file format specification for 3D scenes and models
  • OpenCL, a cross-platform computation API
  • OpenGL, a cross-platform computer graphics API
  • OpenGL ES, a derivative of OpenGL for use on mobile and embedded systems, such as cell phones, portable gaming devices, and more
  • OpenGL SC, a safety critical profile of OpenGL ES designed to meet the needs of the safety-critical market
  • OpenKCam, an advanced camera control API
  • OpenKODE, an API for providing abstracted, portable access to operating system resources such as file systems, networks and math libraries
  • OpenMAX, a layered set of three programming interfaces of various abstraction levels, providing access to multimedia functionality
  • OpenML, an API for capturing, transporting, processing, displaying, and synchronizing digital media
  • OpenSL ES, an audio API tuned for embedded systems, standardizing access to features such as 3D positional audio and MIDI playback
  • OpenVG, an API for accelerating processing of 2D vector graphics
  • OpenVX, a hardware acceleration API for computer vision applications and libraries
  • OpenWF, APIs for 2D graphics composition and display control
  • OpenXR, an open and royalty-free standard for virtual reality and augmented reality applications and devices
  • SPIR, an intermediate compiler target for OpenCL and Vulkan
  • StreamInput, an API for consistently handling input devices
  • Vulkan, a low-overhead computer graphics API
  • WebCL, a JavaScript binding to OpenCL within a browser
  • WebGL, a JavaScript binding to OpenGL ES within a browser on any platform supporting the OpenGL or OpenGL ES graphics standards

Too few people understand how unique the organization is: the biggest processor vendors discuss collaborations and how to move the market, while normally they are the fiercest competitors. Without Khronos the world would have looked totally different.

Improving FinanceBench for GPUs Part II – low hanging fruit

We found a finance benchmark for GPUs and wanted to show we could speed its algorithms up. Like a lot!

Following the initial work of porting the CUDA code to HIP (see the previous article), we made significant progress in picking the low-hanging fruit in the kernels and in tackling potential structural problems outside the kernels.

Additionally, since the last article, we’ve been in touch with the authors of the original repository. They’ve even invited us to update their repository too. For now it will be on our repository only. We also learnt that the group’s lead, professor John Cavazos, passed away 2 years ago. We hope he would have liked that his work has been revived.

Link to the paper is here: https://dl.acm.org/doi/10.1145/2458523.2458536

Scott Grauer-Gray, William Killian, Robert Searles, and John Cavazos. 2013. Accelerating financial applications on the GPU. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units (GPGPU-6). Association for Computing Machinery, New York, NY, USA, 127–136. DOI:https://doi.org/10.1145/2458523.2458536

Improving the basics

We could have chosen to rewrite the algorithms from scratch, but first we need to understand the algorithms better. Also, with the existing GPU-code we can quickly assess what the problems of the algorithms are, and see if we can get to high performance without too much effort. In this blog we show these steps.


Hello

Welcome to the webpage of Stream HPC. We’re a company in Europe that works on solving the most difficult HPC problems, with an emphasis on scaling to GPUs and clusters. We have built up experience in speeding up software, designing performance-oriented architectures, writing maintainable low-level code, selecting the best hardware for the job, and building benchmarks. Above all, we’re a customer-oriented company: we want our clients to feel in control, while we do the heavy lifting.

The company is multi-cultural and designed to be a safe space for everybody on our team – from LGBT+ to Asperger’s, we focus on making our differences our strengths. As you can read in the job self-assessment, we have 4 main strengths:

  • CPU development: algorithms, low-level code, architectures for CPU-based software. This includes clusters.
  • GPU development: algorithms, low-level code, architectures for GPU-based software. This includes graphics programming
  • Problem-solving: get from full understanding to full exploration quickly.
  • Self-managed teams: we don’t hire managers, but provide frameworks.

Our customers are all around the world, but especially in North America, Western Europe and East Asia. We have built a lot of high-performance software that runs on everything from edge computers to supercomputers. See “What we do” for examples.

Our offices are in:

  • Amsterdam
  • Budapest
  • Barcelona

If you want to know more, feel free to get in contact.

See this page for Netherlands/Belgium, Hungary or Spain.

MPI in terms of OpenCL

OpenCL is a member of a family of Host-Kernel programming language extensions. Others are CUDA, IMPC and DirectCompute/AMP. The family is characterised by a separate function or set of functions, referred to as kernels, which are prepared and launched by the host to run in parallel. Added to that are deeply integrated language extensions for vectors, which give an extra dimension to parallelism.
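To make the host-kernel split concrete, here is a minimal sketch in plain Python. It only emulates the model and is not the OpenCL API: the names `vector_add_kernel` and `launch` are made up for this illustration, and a thread pool stands in for the device that would normally run the work-items.

```python
# Plain-Python emulation of the host-kernel model; this is NOT the OpenCL API.
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(gid, a, b, out):
    """The 'kernel': the work done by one work-item with global id `gid`."""
    out[gid] = a[gid] + b[gid]


def launch(kernel, global_size, *args):
    """The 'host' side: start one kernel instance per global id.

    A real runtime would schedule the work-items on the device;
    here a thread pool (hypothetical stand-in) plays that role.
    """
    with ThreadPoolExecutor() as pool:
        # list() waits for all work-items and re-raises any exception
        list(pool.map(lambda gid: kernel(gid, *args), range(global_size)))


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

launch(vector_add_kernel, len(a), a, b, out)
print(out)  # [11.0, 22.0, 33.0, 44.0]
```

The point of the sketch is the division of labour: the kernel only knows its own index, while the host decides how many instances to start and with which data.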

Apart from the vectors, there is much overlap between Host-Kernel languages and parallel standards like MPI and OpenMP. As MPI and OpenMP have focused for years on how to make software parallel, they could give you an image of how OpenCL (and the rest of the family) will evolve. They also show how MPI’s main concept, message passing, could be done with OpenCL, and moreover how OpenCL could be integrated into MPI/OpenMP.

At the right you see bees doing different things, which is easy to parallelise with MPI, but currently doesn’t have the focus of OpenCL (when targeting GPUs). But it is actually very easy to do this with OpenCL too, if the hardware supports it, as CPUs do.


Help write the book “Numerical Computations with GPUs”

There is an interesting book coming up: “Numerical Computations with GPUs” – a book explaining various numerical algorithms with code in CUDA or OpenCL.

edit: At the moment there are 21 articles to be included in the book.

edit 2: book should be out in July

edit 3: Order via Springer International or Amazon US.
TOC:

  • Accelerating Numerical Dense Linear Algebra Calculations with GPUs.
  • A Guide to Implement Tridiagonal Solvers on GPUs.
  • Batch Matrix Exponentiation.
  • Efficient Batch LU and QR Decomposition on GPU.
  • A Flexible CUDA LU-Based Solver for Small, Batched Linear Systems.
  • Sparse Matrix-Vector Product.
  • Solving Ordinary Differential Equations on GPUs.
  • GPU-based integration of large numbers of independent ODE systems.
  • Finite and spectral element methods on unstructured grids for flow and wave propagation problems.
  • A GPU implementation for solving the Convection Diffusion equation using the Local Modified SOR method.
  • Pseudorandom numbers generation for Monte Carlo simulations on GPUs: Open CL approach.
  • Monte Carlo Automatic Integration with Dynamic Parallelism in CUDA.
  • GPU-Accelerated computation routines for quantum trajectories method.
  • Monte Carlo Simulation of Dynamic Systems on GPUs.
  • Fast Fourier Transform (FFT) on GPUs.
  • A Highly Efficient FFT Using Shared-Memory Multiplexing.
  • Increasing parallelism and reducing thread contentions in mapping localized N-body simulations to GPUs.

 


Guest-blog: Accelerating sequential machine vision algorithms with OpenMP and OpenCL

Guest-blogger Jaap van de Loosdrecht wants to share his thesis with you. He leads the Centre of Expertise in Computer Vision department at NHL University of Applied Sciences and runs his own company, and still managed to study and write an MSc thesis. The thesis is interesting because it extensively compares OpenCL with OpenMP, especially in chapters 7 and 8.

For those who are interested, my thesis “Acceleration sequential machine vision algorithms using commodity parallel hardware” is available at www.vdlmv.nl/thesis.

Keywords: Computer Vision, Image processing, Parallel programming, Multi-core CPU, GPU, C++, OpenMP, OpenCL.

Many other related research projects have considered using one domain-specific algorithm to compare the best sequential implementation with the best parallel implementation on a specific hardware platform. This work was distinctive because it investigated how to speed up a whole library by parallelizing the algorithms in an economical way and executing them on multiple platforms. This work has:

  • Examined, compared and evaluated 22 programming languages and environments for parallel computing on multi-core CPUs and GPUs.
  • Chosen to use OpenMP as the standard for multi-core CPU programming and OpenCL for GPU programming.
  • Re-implemented a number of standard and well-known algorithms in Computer Vision using both standards.
  • Tested the performance of the implemented parallel algorithms and compared the performance to the sequential implementations of the commercially available software package VisionLab.
  • Evaluated the test results with a view to assessing:
    • Appropriateness of multi-core CPU and GPU architectures in Computer Vision.
    • Benefits and costs of parallel approaches to implementation of Computer Vision algorithms.

Using OpenMP it was demonstrated that many algorithms of a library could be parallelized in an economical way and that adequate speedups were achieved on two multi-core CPU platforms. With a considerable amount of extra effort, OpenCL was used to achieve much higher speedups for specific algorithms on dedicated GPUs.

At the end of the project, the choice of standards was re-evaluated including newly emerged ones. Recommendations are given for using standards in the future, and for future research and development.

Algorithmic improvements are suggested for Convolution and Connected Component Labelling.

Your feedback and/or questions are welcome.

If you put comments here, I’ll make sure Jaap van de Loosdrecht gets to see them and answers your questions on the subjects discussed in his thesis.

The knowns and unknowns of the PEZY-SC accelerator at RIKEN

The Green500 is out and one unknown processor takes the number one position with a huge improvement over last year. It is a new supercomputer installed at RIKEN with an incredible 7 GFLOPS/Watt. It is powered by the processor boards shown at the right: two Xeons, 4 PEZY-SC 1.4 accelerators and 128GB DRAM, which have a combined performance of about 6.2 TFLOPS. It has been designed for immersion cooling.

The second and third positions are also powered by the PEZY-SC. After that we find last year’s winner, the AMD FirePro S9150, and a bit further down the rest (mostly NVidia Tesla). One constant is the CPU: Intel Xeon takes most of the spots. To my big surprise there is no ARM64.

[Image: Green500 top 5, June 2015]

From the third to the first PEZY-SC installation there is an improvement of 13%. It seems the first two are the new type, called “bricks”, while the third is the same as last year. Compared with that supercomputer’s result from last year (4.945 GFLOPS/W), the improvements are 42% and 25% respectively. The 13% improvement over the previous version is interesting enough, but the 25% improvement on exactly the same system raises questions. It is probably due to compiler optimisations. As the November edition of the Green500 is much stricter, it will become clear whether the rules were bent – let’s hope it’s for real!

It supports OpenCL!

When a new accelerator supports OpenCL, it gets accepted more easily. So it is very interesting that the PEZY-SC runs OpenCL. I asked at ISC and was told that it supports a subset of OpenCL, but I could not put my finger on which subset, nor could I get access to test it. It does mean that code that would run well on this machine is easy to port. And by that I mean the same “easy” Intel uses when explaining the ease of porting OpenMP software to Xeon Phi: PEZY-specific optimisations and working around the missing functionality would still take effort – the typical stuff we do at StreamHPC.

RIKEN Shoubu

Some information on “Shoubu” (Japanese for “iris”), the number one on the Green500. According to the Green500 it reaches 353.8 TFLOPS (based on 50 kW, using an actual benchmark). On 25 June RIKEN announced that Shoubu is 2 PFLOPS (theoretical). If the full machine was used for the Green500, then the efficiency was only 18%!
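The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch using the numbers quoted above; the variable names are made up for the illustration, and the efficiency figure rests on the same assumption as in the text, namely that the full 2 PFLOPS machine was used for the benchmark.

```python
# Figures quoted in the text
measured_tflops = 353.8       # Green500 benchmark result
power_kw = 50.0               # power drawn during the benchmark
theoretical_tflops = 2000.0   # 2 PFLOPS, as announced by RIKEN

# 1 TFLOPS = 1000 GFLOPS and 1 kW = 1000 W
gflops_per_watt = (measured_tflops * 1000.0) / (power_kw * 1000.0)

# ASSUMPTION: the full machine was used for the Green500 run
efficiency = measured_tflops / theoretical_tflops

print(f"{gflops_per_watt:.2f} GFLOPS/W")         # 7.08 GFLOPS/W
print(f"{efficiency:.1%} of theoretical peak")   # 17.7% of theoretical peak
```

So the 7 GFLOPS/Watt and the roughly 18% follow directly from the three published numbers.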

Below are some images of the installation.

[Images: three photos of the Shoubu installation]

Source: http://www.exascaler.co.jp/wp-content/uploads/2015/06/20150625.pdf

An important part is Exascaler’s immersion technology; as I understood it, Exascaler is a spin-off of PEZY. I’m very curious what the AMD FirePro S9150 does when it uses immersion cooling – I think we have to do some frying at the office to find out.

PEZY-SC1.4 and PEZY-SC2

PEZY started with a multi-core processor of 512 cores, the PEZY-1. The PEZY-SC has 1024 cores and has had a few gradual upgrades – currently PEZY-SC 1.4 (“the brick”) is installed.

PEZY-SC Specification:

  • Logic Cores (PE): 1,024
  • Core Frequency: 733MHz
  • Peak Performance (Floating Point): Single 3.0TFlops / Double 1.5TFlops
  • Host Interface: PCI Express GEN3.0 x8Lane x 4Port (x16 bifurcation available), JESD204B Protocol support
  • DRAM Interface: DDR4, DDR3 combo 64bit x 8Port, Max B/W 1533.6GB/s, + Ultra WIDE IO SDRAM (2,048bit) x 2Port, Max B/W 102.4GB/s
  • Control CPU: ARM926 dual core
  • Process Node: 28nm
  • Package: FCBGA 47.5mm x 47.5mm, Ball Pitch 1mm, 2,112pin

Source: http://pezy.co.jp/en/products/pezy-sc.html
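The peak numbers in the specification can be related to the core count and the clock frequency. The sketch below does that in Python. Note that the number of floating point operations per core per cycle is not in the specification: the values 2 (double) and 4 (single) are an assumption, chosen because they reproduce the quoted peaks, and `peak_tflops` is a helper made up for this illustration.

```python
cores = 1024          # Logic Cores (PE), from the specification
frequency_hz = 733e6  # Core Frequency, from the specification


def peak_tflops(flops_per_cycle_per_core):
    """Theoretical peak = cores x frequency x operations per cycle per core."""
    return cores * frequency_hz * flops_per_cycle_per_core / 1e12


# ASSUMPTION: 2 double-precision and 4 single-precision operations
# per core per cycle; these values are not given in the specification.
print(f"double: {peak_tflops(2):.2f} TFLOPS")  # double: 1.50 TFLOPS
print(f"single: {peak_tflops(4):.2f} TFLOPS")  # single: 3.00 TFLOPS
```

The same formula shows why a jump in core count alone is not enough: measured performance also depends on how much of this theoretical peak the software can actually reach.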

Development on PEZY-SC2 is ongoing, which will have a staggering 4096 cores. Of course efficiency has to go up (if the 18% is correct) to make this a good upgrade.

There is no promise on when the PEZY-SC2 will be announced, but it will certainly surprise us again when it arrives.

About Us

Stream HPC is a software development company specialised in parallel software for many-core processors. We provide professional software development services, training and consulting to help you increase compute performance in software while lowering hardware costs.

We have 3 locations.

Stream HPC B.V. (Amsterdam)

Koningin Wilhelminaplein 1 – 40601
1062 HG Amsterdam
Netherlands, Europe

phone: +31 854865760 (office) or +31 6 45400456 (cell)

Visit us in Amsterdam

Stream HPC Hungary Kft. (Budapest)

Science Park
1117 Budapest
Irinyi József u. 4-20.
Hungary, Europe

Stream HPC Spain S.L. (Barcelona)

Plaza de Catalunya 1, 4th floor
Barcelona 08002
Spain, Europe

History

2010 – 2013: the freelancing years

The company started as a freelancing business, with one focus: programming GPUs with OpenCL. It was tough, as back then the G in GPU stood for “Graphics only”.

The name was “StreamComputing”, which refers to a high-performance computer system that analyzes multiple data streams from many sources live. The main goal was to create software algorithms that analyze data in real time as it streams in, to increase speed and accuracy in data handling and analysis, and the name was in line with that.

2014: first hope

Four years later the first employee, Anca, was hired. Later that year the freelancing business was turned into a limited company. GPUs were increasingly seen as data processors, and trainings were the main source of income. Projects were still small, GPGPU was a world of early adopters, and most time was invested in trainings.

First contact was made with AMD, now one of our biggest clients.

2015-2017: initial growth

Stream grew to a handful of employees, and we did projects for HSA foundation, Stanford, AMD, Zeiss, Nokia, Philips and many lesser known companies.

Trainings were still done, but were by far not the main source of income anymore. We tried some FPGA-work, but found that most promises were not implemented yet.

2017: a new name

We renamed the company to Stream HPC. There were several reasons. As we focused more on customers from Asia and North America, we needed the .com, which was unavailable. Getting the new name was quite a quest, but we got there: customers often referred to us as “Stream”, a business coach assured us that CPU-work would remain important and thus “HPC” was more important than “GPU”, and it was quite difficult to type streamcomputign correctly.

2017-2020: hitting all kinds of ceilings

The goal was to grow further, but this turned out to be more difficult than expected. All kinds of obstacles got in our way, and at one point we even shrank in size. With trainings, coaching, reading and persistence, we got to understand the hurdles and could finally implement solutions. Looking back it was easy.

2021: Stream HPC Hungary

Hungary started as a group of freelancers. We were very happy with the quality provided by our Hungarian colleagues, and that was enough reason to invest more. We opened the new office in Q3.

The company had now turned into a group of companies, and everything was set up to extend the group more easily.

We grew back to 15 people by the end of the year.

2022: Benchmark.io

At ISC Benchmark.io was started. To help our customers do better benchmarks, we put all our knowledge into a separate product. Due to high demand for our consultancy services, it is in private beta only.

2022: Stream HPC Spain

Barcelona was opened in Q3.

We expect to grow to 25-35 people by the end of the year.

Contact us

Thank you for your interest in our company and services. We will try to answer your question within 24 hours.

There are three ways to get in contact:


    See ‘about us‘ for the address and other business-specific information.

    New grown-ups on the block

    There is one big reason StreamHPC chose OpenCL, and that is (future) hardware support. I talked about NVIDIA versus AMD a lot, while knowing others would join soon. AMD is correct when they say the future is fusion: hybrid computing with a single chip holding both CPU- and GPU-cores, sharing the same memory and interconnected at high speed. Merging the technologies would also make much higher memory bandwidths possible for the CPU. Let us briefly look at which products from experienced companies will appear on the OpenCL-stage.


    OpenCL on the CPU: AVX and SSE

    When AMD came out with CPU-support I was the last one to be enthusiastic about it, comparing it to feeding chicken-food to oxen. Now CUDA has CPU-support too, so what was I missing?

    This article is a quick overview of OpenCL on CPU-extensions, but expect more to come when the hybrid x86 processors actually hit the market. Besides ARM, IBM already has them too; an upcoming article will say more about their POWER architecture, to give it the attention it deserves.

    CPU extensions

    SSE/MMX started in the 90’s, extending the IBM-compatible x86 instruction set and being able to do an add and a multiplication in one clock-tick. I still remember the discussion in my student-flat that the MP3s I could produce in only 4 minutes on my 166MHz PC just had to be of worse quality than the ones which were encoded in 15 minutes. No, the encoder I “found” on the internet made use of SSE-capabilities. Currently we have reached SSE5 (by AMD), and Intel introduced a new extension called AVX. That’s a lot of abbreviations! MMX stands for “MultiMedia Extension”, SSE for “Streaming SIMD Extensions”, with SIMD being “Single Instruction Multiple Data”, and AVX for “Advanced Vector Extension”. This actually sounds very interesting, since we saw SIMD and vectors on the GPU too. Let’s go into SSE (1 to 4) and AVX – both fully supported on the new CPUs by AMD and Intel.
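To get a feeling for what “Single Instruction Multiple Data” means in numbers, the sketch below computes how many values fit in one vector register. It uses the register widths of SSE (128 bit) and AVX (256 bit); the function name `lanes` is made up for this illustration, and the result is only the theoretical number of elements per instruction, not a measured speed-up.

```python
# Width of one vector register, in bits
REGISTER_BITS = {"SSE": 128, "AVX": 256}

# Size of one element, in bits (doubles need SSE2 or later)
ELEMENT_BITS = {"float": 32, "double": 64}


def lanes(extension, element):
    """Number of elements one SIMD instruction can process at once."""
    return REGISTER_BITS[extension] // ELEMENT_BITS[element]


for ext in REGISTER_BITS:
    for elem in ELEMENT_BITS:
        print(f"{ext}: {lanes(ext, elem)} x {elem} per instruction")
```

This is the same idea as the vector types in OpenCL: a `float4` maps naturally onto one SSE register and a `float8` onto one AVX register.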


    Feedback & Privacy

    This field is huge and ever-changing, which means that certain old posts might need updated information. Also, most of our team is made up of humans: a species famous for making all kinds of mistakes.

    We care about what you say!

    "Feedback is the breakfast of champions."
    --Ken Blanchard
    • We are not native English speakers. Did we say something strange?
    • Is the site somewhat too technical or is it missing a bit of hard-core code examples?
    • Is some important information missing?
    • Is the publishing of the book or Eclipse-plugin taking too long?
    • Is the site too slow? (or broken in any other way?)
    • Do you have any compliments? We blush easily!

    Tell us what you think by using the contact-page or sending an e-mail to feedback@streamhpc.com.

    Privacy

    We track the pages you visit with Piwik and Google Analytics, and we use this information to improve our webpage. For Google Analytics there is an opt-out tool, which excludes you from tracking on any webpage that uses it. For Piwik you can opt out by using the form below. If you opt out, please be so kind as to give us feedback on how we can improve our page.

    Dutch: Free knowledge morning about the new generation of processors

    Heard somewhere that graphics cards can nowadays be used for heavy computations? Heard during a coffee chat about vector processors as a complement to scalar processors? Then it is time to get an overview of the major changes in the processor field, so you can better steer your organisation on innovation.

    See https://streamhpc.com/education/gratis-kennislunch/ for a one-hour explanation on location.

    Who is this knowledge morning for?

    Companies for which speed is important and which have to process large amounts of data. For example computing centres, R&D departments, financial institutions, developers of medical software, algorithm developers and vision companies. Investors with high-tech companies in their portfolio can also be brought up to date on current developments for free.

    You don’t need a technical background, but you will not be bored if you speak bits & bytes. We ask you to indicate your background, so that we can add the right details to the programme.

    What is the programme?

    In the first hour you will hear how the current processor market has changed compared to a few years ago, and which new software development methods have been introduced. After that you get an overview of the new solutions that are available and how they relate to the existing ones. This gives you enough insight to determine whether they are applicable within your company. The hour ends with what StreamHPC can do for you, but also what you can do yourself.

    In the second hour we discuss several use cases and there is time for questions. The use cases discussed depend on the backgrounds of those present; think of, for example, Monte Carlo, physics, enzyme activity, matrix computations and neural networks.

    When?

    If there are at least 10 registrations, a date will be set.

    If there is direct interest within your company, StreamHPC can come to you to give this presentation, adapted to your background. Please contact us for that.

    Rapid Performance Assessment

    You might have heard about the major speed-ups GPUs and FPGAs have promised, but also about the fact that the speed-up depends a lot on the type of software/algorithm. Investing in OpenCL or CUDA can therefore feel risky, since going in costs time and money, while staying out can potentially give too much space to the competition. But if you want your customers to get the best experience without paying an unnecessarily high price, you’ll need to know what the return on your investment could be. With this quick assessment we will help you determine exactly that.

    What we’ve done before

    Most assessments were on answering the question “How much speed-up can I get using GPUs?“. Other questions were:

    • Does this algorithm work on this specific mobile processor?
    • Is it better to use CUDA, OpenCL or OpenGL shaders for this algorithm?
    • Does the HPC code run best on a Tesla K40 or FirePro S9150?
    • How many weeks/months would it take to port all code?
    • How many GPUs do I need for under 1 second responses?
    • Does this code port to an FPGA?
    • Which OpenCL device best suits my algorithm: CPU, GPU, APU, DSP, FPGA or something else?

    Is your question in the list?

    Program

    Within a week we can fully analyse your code, or two weeks if the codebase is large or complex. During the assessment we write/port/optimise code, to be able to support our conclusions with numbers.

    After the assessment you get an overview of the hotspots, an indication of total speed-up when using OpenCL (or comparable technology), and the answers to your questions.

    Preparations

    Send a mail to contact@streamhpc.com for more information, and we’ll call you back to talk about your requirements. Please provide times when you want to be called back.


    StreamHPC flirts with ARM

    With the launch of the Twitter channel @OpenCLonARM we now officially show a strong interest in ARM for compute. And we are not the only ones, as the channel already has 80 followers (60 in 1.5 days, and 12 retweets of the welcome message).

    ARM has made tremendous progress in both technology and market share. With ARM-64 and with companies like NVidia (and maybe AMD) in the field, x86 seems to be getting a real competitor. This could happen because computers have been fast enough for a few years now, and are no longer replaced by a faster one, but by a smaller one (tablet, phone) or supplemented with an extra one. By the rules of the market, current technologies are replaced by the ones that meet those other needs. ARM is fast (enough), flexible in design, very cheap, low-power and passively cooled. The biggest remaining obstacle seems to be a standard for a docking station to connect your mobile, tablet or watch to a keyboard, mouse and large screen.

    OpenCL is perfect for ARM, as it provides the compute power for intensive computations not already covered by hardware support. In the world of x86 this interests high-performance and big-data companies, whereas on ARM it interests a wider audience. Without OpenCL you can already watch HD video; with OpenCL you can also encode video to MP4. This year you will certainly hear more about new possibilities of OpenCL on ARM.

    What do you think? Why does Intel not sell IP to ARM-companies, as many technologies could be reused? Could Intel be the next ARM as an IP-seller, or will they stay the defender of x86 for many years to come?

    streamhpc.com is not affiliated with ARM.

    Events&Talks

    StreamHPC gives talks at public and in-company events to explain what GPU-programming is, while focusing on the day’s theme.

    You are welcome to attend these days, or you can request a talk about OpenCL and GPU-programming to be given at your event.

    Agenda Talks

    At the events in the list below Vincent Hindriksen will give a talk, or has given a talk.

    Date | Location - Language | Description | Link to program | Type
    6 November 2013 | Nijkerk - English | Aparapi and Project Sumatra: using GPGPU in Java | NLJUG J-Fall 2013 | Registration and NLJUG-membership required
    31+31 October 2013 | Cambridge (UK) - English | Using the GPU for Physics computations via OpenCL | Mosaic3DX | Registration required
    4 October 2012 | Amsterdam - English | Introduction to OpenCL on mobile processors, to make hackers & founders think of new ideas for products which were never possible before. | Hackers and Founders (Amsterdam, NL) Meetup | Free
    20 September 2012 | English | Ongoing work on the OpenCL plugin for Eclipse - presented remotely. I cancelled this, because my work has not advanced enough. | PTP User-Developer Workshop Sept 18-20, Chicago | Registration required.
    28 June 2012 | Amsterdam - English | GPGPU-day organised by StreamHPC. No talks by us, but by many interesting speakers from many Dutch universities. | Platform Parallel NL | Registration required. Free for students and researchers in the Netherlands.
    20 June 2012 | Delft - English | Industry-session at HPDC'12 (session 4). "Parallel Programming for the Masses" about how to walk the road forward while on the road of legacy. | HPDC'12 | Registration required.
    15 June 2012 | Ede - Dutch | Talk at SDN about how to use GPU-programming in .NET, including introduction to GPU-programming. | SDN | Registration required. Free for SDN-members.
    25 May 2012 | Amsterdam - English | In-company talk about parallel and GPU programming to decrease power usage. | | Internal

    Reservations

    For reservations and requests, please mail to events@streamhpc.com.

    Agenda Events

    We are visiting or have visited the following events. This is perfect if you want to have a quick discussion with us.

    Date | Location | Description | Link to program
    20-22 January 2014 | Vienna, Austria | A premier forum for experts in computer architecture, programming models, compilers and operating systems for embedded and general-purpose systems. | HiPEAC
    12+13 May 2014 | Bristol, UK | An annual meeting of OpenCL users, researchers, developers and suppliers to share OpenCL best practise, and to promote the evolution and advancement of the OpenCL standard. | IWOCL
    20 June 2013 | Amsterdam, Netherlands | All about GPGPU in the Netherlands | Applied GPGPU-days
    21+22 May 2013 | London, UK | All about HPC-techniques on low-power devices | LEAP-conference
    26-28 February 2013 | Nürnberg, Germany | 873 exhibitors from 37 countries. Will focus on the processors with high-end compute-capabilities. | Embedded World'13 Exhibition
    21-23 January 2013 | Berlin, Germany | International conference on high-performance and embedded architectures and compilers. | HiPEAC'13
    18 December 2012 | Paris, France | Cancelled. Meetup by Paris HPC group. This talk will be about most efficient (GFLOPS / Watt) processors existing today. | Very High Performance/Watt Processors: the Road to Exascale
    17 December 2012 | Amsterdam, Netherlands | The e-BioGrid project is part of the BiG Grid project to establish an e-infrastructure for life sciences. | e-BioGrid and beyond
    13 December 2012 | Brussels, Belgium | All around GPUs, FPGAs and upcoming architectures. | Symposium on Personal High-Performance Computing
    5 + 6 November 2012 | Amsterdam, Netherlands | Big data event around Smart Systems, Cloud Computing, Mobile and Social Media. (I did a pitch talk here) | Perfect Storm Europe
    29 November 2012 | Eindhoven, Netherlands | Altera explains how FPGAs with an ARM-core can work for many types of problems. | Altera SoC FPGA event
    24 - 26 October 2012 | Eindhoven, Netherlands | Talks around multiscale science. | Opening Symposium Eindhoven Multiscale Institute
    26 September 2012 | Amsterdam, Netherlands | All about grid computing in the Netherlands. | BiG Grid and beyond

    Ask your question

    Do you have a question? We are happy to answer all your questions on any subject discussed at this website.

    Due to spam floods, we removed the form.

    info@streamhpc.com

    We try to answer your question within 24 hours.

    Code Review

    Code reviews are one of the fastest ways to get the dev-team back on track when the code needs more performance. We offer two types of code reviews, both safely under an NDA. This way you stay in control of the development, while getting expert knowledge in.

    A quick scan gives you an overview of the main ways to speed up the code and how it can be done.
    This quick scan can be delivered in one week, if necessary, to give you the direction you may require in times of pressure.

    Also, an extensive code review can provide all the necessary information for a redesigned architecture.

    GPU-code (OpenCL, CUDA, Aparapi, and more)

    Writing GPU-code and well-performing host-code can be tricky. The best way to learn CUDA or OpenCL is by doing. Nevertheless, you sometimes need feedback to be sure you’re doing the right thing. We can check your code and give you a report with hands-on tips to make it optimal.

    CPU-code (Java, C, C++ and more)

    A lot of CPU-code, in languages like Java, C, C++ and C#, is written with functionality in mind, not performance. Adding performance (cache-optimisation, memory-usage reduction, parallelisation of computations, adding OpenMP-threads, etc) is quite doable, but only when you know how. We can help you increase the performance of the software through feedback and clear steps.

    Let us help you!

    If you are interested in this service, request more information today and we will get back to you as soon as possible. Of course, you can also contact us via phone (+31 6 45400456), or e-mail (info@streamhpc.com).

    21-23 August: OpenCL Training London

    From 21 to 23 August StreamHPC will give a 3-day training in OpenCL. Here you will learn how to develop OpenCL-programs.

    A separate ticket for only the first day can be bought, as that day will be a crash course in OpenCL (module “basics”).

    The second and third day will be all about parallel-algorithm design, optimisation and error handling (module “optimisation”, with several new subjects added).

    The last part of the third day is reserved for special subjects, as requested by the attendees.