We sponsor HiPEAC again this year

HiPEAC is an academia-oriented, three-day, international conference on HPC, compilers and processors. Last year it was held in Vienna; this year it is in Amsterdam – where StreamHPC is also based. That was an extra reason to go for a silver sponsorship, besides the fact that I find this conference very important.

Compilers have the job of doing magic. Last year I got nice feedback on my request that compilers tell developers where in the code the compiler struggles – effectively slapping the developer instead of trying to solve it with more magic. I also learned a lot about compilers in general, listened to GPGPU talks, discussed HPC, and most of all: met a lot of very interesting people.

Why should you come too? I’ll give you five reasons:

  • Learn about compilers and GPU-techniques, in depth.
  • Have great discussions about the latest and greatest research, before it’s news.
  • Meet great people who create the compilers you use (or the reverse).
  • Visit Amsterdam, Netherlands – I can be your guide. Flights are cheap.
  • Only spend €400 for the full 3-day programme and a unique dinner with 500 people – compare that to SC14 and GTC!

If you are seeking a job in HPC, compilers or GPGPU, you should really come over. We’re there, and several other sponsors are looking for new employees too.

See the tracks at HiPEAC, which have a lot more GPU-oriented talks than last year. I have marked a few selections from the list in bold.

Monday

  • Opening address
  • William J. Dally, Challenges for Future Computing Systems
  • Euro-TM: Final Workshop of the Euro-TM COST Action
  • Session 1. Processor Core Design
  • CS²: Cryptography and Security in Computing Systems
  • IMPACT: Polyhedral Compilation Techniques
  • MCS: Integration of mixed-criticality subsystems on multi-core and manycore processors
  • EEHCO: Energy Efficiency with Heterogeneous Computing
  • INA-OCMC: Interconnection Network Architecture: On-Chip, Multi-Chip
  • WAPCO: Approximate Computing
  • SoftErr: Mitigation of soft errors: from adding selective redundancy to changing the abstraction stack
  • Session 2. Data Parallelism, GPUs
  • James Larus, It’s the End of the World as We Know It (And I Feel Fine)
  • ENTRE: EXCESS & NANOSTREAMS
  • SiPhotonics: Exploiting Silicon Photonics for energy-efficient high-performance computing
  • HetComp: Heterogeneous Computing: Models, Methods, Tools, and Applications
  • Session 3. Caching
  • Session 4. I/O, SSDs, Flash Memory
  • Student poster session / Welcome reception

Tuesday

Don’t forget to meet us at the industrial poster-sessions.

  • Rudy Lauwereins, New memory technologies and their impact on computer architectures
  • Thank you HiPEAC
  • Session 5. Emerging Memory Technologies
  • EMC²: Mixed Criticality Applications and Implementation Approaches
  • ADEPT: Energy Efficiency in High-Performance and Embedded Computing
  • MULTIPROG: Programmability Issues for Heterogeneous Multicores
  • WRC: Reconfigurable Computing
  • TISU: Transfer to Industry and Start-ups
  • HiStencils: High-Performance Stencil Computations
  • MILS: Architecture and Assurance for Secure Systems
  • Programmability: Programming Models for Large Scale Heterogeneous Systems
  • Industrial Poster Session
  • INNO2015: Innovation actions in Advanced Computing CFP
  • Session 6. Energy, Power, Performance
  • DCE: Dynamic Compilation Everywhere
  • EUROSERVER: Green Computing Node for European Micro-servers
  • PolyComp: Polyhedral Compilation without Polyhedra
  • HiPPES4CogApp: High-Performance Predictable Embedded Systems for Cognitive Applications
  • Industrial Session
  • Session 7. Memory Optimization
  • Session 8. Speculation and Transactional Execution
  • Canal tour / Museum visit / Banquet

Wednesday

  • Burton J. Smith, Resource Management in PACORA
  • HiPEAC 2016
  • Session 9. Resource Management and Interconnects
  • PARMA-DITAM: Parallel Programming and Run-Time Management Techniques for Many-core Architectures + Design Tools and Architectures for Multi Core Embedded Computing Platforms
  • ADAPT: Adaptive Self-tuning Computing System
  • PEGPUM: Power-Efficient GPU and Many-core Computing
  • HiRES: High-performance and Real-time Embedded Systems
  • RAPIDO: Rapid Simulation and Performance Evaluation: Methods and Tools
  • MemTDAC: Memristor Technology, Design, Automation and Computing
  • DataFlow, Computing in Space: DataFlow SuperComputing
  • IDEA: Investigating Data Flow modeling for Embedded computing Architectures
  • TACLe: Timing Analysis on Code-Level
  • EU Projects Poster Session
  • Session 10. Compilers
  • HIP3ES: High Performance Energy Efficient Embedded Systems
  • HPES: High Performance Embedded Systems
  • Session 11. Concurrency
  • Session 12. Methods (Simulation and Modeling)

Hope to see you there!

AMD now leads the Green500

With SC14 behind us, there are a few things I’d like to share with you. I’d like to start with the biggest win for OpenCL: AMD powering the most power-efficient GPU cluster.

A few months ago I wrote a theoretical article on how to build the cheapest and greenest supercomputer that could enter the Top500 and Green500. There I showed that AMD would theoretically win on both GFLOPS per dollar and GFLOPS per Watt. Last week I learned that a large cluster has actually been built in Germany, and it now leads the Green500 (GFLOPS/Watt). It is powered by Intel Ivy Bridge CPUs and an FDR InfiniBand network, and accelerated by air-cooled(!) AMD FirePro S9150 GPUs, as can be seen in the Green500 report of November. The score: 5.27 GFLOPS per Watt, mostly thanks to AMD’s surprise act: extremely efficient SGEMM and DGEMM.


The first NVIDIA Tesla-based system on the list is at #3, with 4.45 GFLOPS per Watt for a liquid-cooled system. If the AMD FirePro S9150 were oil- or water-cooled, the system could go over 6 GFLOPS per Watt. I’m expecting such a system on the Green500 of June. The PEZY-SC (#2 on the list) is a very interesting, unexpected newcomer to the field – I’ll share more with you later, as I heard it supports OpenCL.

The price metric

The cluster at the GSI Helmholtz Centre has around 1.65 double-precision PFLOPS (theoretical). Let’s do the same calculation as with the 150 TFLOPS system, using the latest prices and taking only the accelerator part (a small calculation sketch in C follows the three lists below).

640 x AMD FirePro S9150.

  • 2.53 TFLOPS * 640 = 1.62 PFLOPS (I rounded down to 2.0 TFLOPS in the other article)
  • US$ 3300. Total price: $2.112M. Price per PFLOPS: $1.304M
  • 235 Watt * 640 = 150 kWatt (excluding network, CPU, etc)

640 x NVIDIA Tesla K40x

  • 1.42 TFLOPS * 640 = 0.91 PFLOPS
  • US$ 3160 (the price came down a lot due to the introduction of the K80!). Total price: $2.022M. Price per PFLOPS: $2.225M
  • 235 Watt * 640 = 150 kWatt

640 x Intel XeonPhi 7120P

  • 1.21 TFLOPS * 640 = 0.77 PFLOPS
  • US$ 3450. Total price: $2.208M. Price per PFLOPS: $2.85M
  • 300 Watt * 640 = 192 kWatt
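
To make the comparison easy to redo with newer prices, here is a minimal sketch in C of the calculation above. The card counts, peak numbers and list prices are simply the figures quoted in the lists; swap in current ones as needed.

    #include <stdio.h>

    /* Accelerator-only comparison, using the figures quoted above:
     * dp_tflops = theoretical double-precision TFLOPS per card,
     * price_usd = list price per card, watt = TDP per card. */
    struct accel { const char *name; double dp_tflops; double price_usd; double watt; };

    int main(void) {
        const int cards = 640;
        const struct accel list[] = {
            { "AMD FirePro S9150",    2.53, 3300.0, 235.0 },
            { "NVIDIA Tesla K40",     1.42, 3160.0, 235.0 },
            { "Intel Xeon Phi 7120P", 1.21, 3450.0, 300.0 },
        };
        for (int i = 0; i < 3; ++i) {
            double pflops = list[i].dp_tflops * cards / 1000.0; /* total PFLOPS */
            double musd   = list[i].price_usd * cards / 1e6;    /* total price, M$ */
            double kwatt  = list[i].watt      * cards / 1000.0; /* total power, kW */
            printf("%-22s %.2f PFLOPS  $%.3fM  $%.2fM per PFLOPS  %.0f kW\n",
                   list[i].name, pflops, musd, musd / pflops, kwatt);
        }
        return 0;
    }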

So it’s pretty clear why GSI chose AMD: $0.92M or $1.55M less per PFLOPS for the same compute. Also note that more FLOPS per accelerator is important for lowering overhead.

What to expect from June’s Green500

Next year Nvidia will probably come with Maxwell-based Teslas, which will probably do very well in the Green500. Intel has its new Xeon Phi coming, but it’s a very new architecture and no samples have arrived yet – I would be surprised to see it on the list, as Intel has over-promised for too long now. Besides bringing surprises, Intel’s other strengths are its vast collaborations and strong fanbase – over the past years I have heard the most ridiculous explanations for why such an underperforming accelerator was chosen instead of a FirePro or Tesla – so Intel is certainly aiming for a rampage (based on hope). AMD did not disclose any information on a new version of the S9150 (something like an S9200 or S9250).

Then there are the dual-GPU cards, whose main advantage is lower energy usage. The K80 just arrived, but the numbers don’t add up yet – we’ll have to see when samples arrive. AMD did not say anything about a successor to the S10000, but it will probably arrive next year – no ETA. Intel has not done dual-chip cards so far. Systems with such cards can be built more compactly, as 4 GPUs per node is becoming the standard.

Another important change will be CPUs with an embedded GPU being used in clusters, where plain Intel Xeons now rule the world. Intel’s Iris Pro line and AMD’s new Carrizo APU could certainly become more popular, as more complex code can be accelerated very well by such processors. We’ll also see more 64-bit ARM processors – hopefully with a GPU on board. I’ll handle this subject in a separate article, as OpenCL could be a big enabler for easy offloading.

Based on the information currently available to me, Nvidia is aiming for Maxwell-based Teslas, AMD for the S9150 and its dual-GPU variant, and Intel for nothing yet (aiming for November 2015). It’ll be exciting to see HPC reach 6+ GFLOPS/Watt as a standard – I find that more important than building the biggest cluster.

OpenCL helps you select hardware from that year’s winner instead of being locked in to that year’s loser. Meanwhile, at StreamHPC we will keep building OpenCL-based software to help our customers pick that winner.

Let’s meet at ISC in Frankfurt

I, Vincent Hindriksen, will be walking around at ISC from 20 to 22 June. With me I bring our latest brochure, some examples of great optimisations and some Dutch delicacies. We will also have some exciting news with an important partner – stay tuned!

It will be a perfect time to discuss how StreamHPC can help you solve tough compute problems. Below is a regularly updated schedule of my time at ISC.

Get in contact to schedule a meeting.

If you’d like to talk technologies and bits & bytes, we’re trying to organise a get-together – date & time TBD.

VectorFabrics: 2014 will be parallel


Toolmaker VectorFabrics sent out nine predictions for this year in their newsletter. I’d like to share them with you.

Nine predictions for 2014 that prove the programming landscape is changing

It is not hard to predict that this year will see a lot of activity around multicores and manycores. 2014 will be the year that software has to catch up with highly concurrent hardware. So we expect to see some major changes in how people view multicore programming:

  1. Neither Intel, AMD nor Qualcomm releases any new single-core processors in 2014. Therefore, it is less and less acceptable to release purely sequential applications.
  2. Intel releases a 15-core Xeon. On a typical 4-socket motherboard your OS sees 120 available logical cores. OpenMP is the preferred programming paradigm on such a platform for data-intensive shared-memory calculations. You have to deal with performance bottlenecks, including Amdahl’s law (see the small sketch after this list), cache performance and memory bandwidth issues.
  3. Next to ARM big.LITTLE systems, 2014 sees the first true octacore cell phones and tablets. At that point, it becomes painfully clear that applications need changes to benefit from so many cores. Both true octacore and big.LITTLE processors see very little adoption in mobile devices as long as software that can benefit is missing.
  4. At least one major mobile phone vendor loses market share because their hardware may be good, but the software (especially the web browser) cannot utilize the hardware to the max.
  5. Both the Xbox One and PlayStation 4 feature AMD Jaguar octa-core processors with GPGPU-capable graphics. Very few games will use all that compute power, and customers will wonder why they should upgrade, as the performance difference with their existing console is not that big.
  6. Two great open standards, OpenACC and OpenMP, see a nice boost in adoption thanks to upcoming support in the latest open-source compilers. Clang 3.5 features OpenMP 4.0 support, and GCC 4.9 receives OpenACC support.
  7. In the mobile space, offloading to GPUs is hot, as new architectures such as the Qualcomm Adreno 420, Nvidia Tegra K1 and Imagination PowerVR Series 6 all allow offloading. Managing and programming the offloading will remain a problem: OpenCL? OpenACC? CUDA? RenderScript?
  8. Major desktop applications jump on the compute-offloading bandwagon to win performance, using either OpenCL or OpenACC.
  9. Programmability of the next-gen Intel Xeon Phi (Knights Landing) will raise some eyebrows. This 2015 chip will have 72 Atom Silvermont cores with local memory, cache and up to 384 GB shared memory. This is outside the comfort zone of most programmers.
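
Amdahl’s law, mentioned in prediction 2, caps the speedup no matter how many cores you add. Below is a minimal sketch in C; the 95% parallel fraction is an assumption for illustration, not a measured figure.

    #include <stdio.h>

    /* Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
     * where p is the parallel fraction of the program and n the core count. */
    static double amdahl_speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / (double)n);
    }

    int main(void) {
        const double p = 0.95;                 /* assumed parallel fraction */
        const int cores[] = { 4, 16, 60, 120 };
        for (int i = 0; i < 4; ++i)
            printf("%3d cores: %4.1fx speedup\n", cores[i], amdahl_speedup(p, cores[i]));
        /* Even at 120 cores the speedup tops out around 17x, which is why the
         * serial parts, cache behaviour and memory bandwidth matter so much. */
        return 0;
    }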

You guessed right: their tool has to do with parallel programming.

What are your predictions for 2014?

Learning both OpenCL and CUDA

Be sure to read Taking on OpenCL where I’ve put my latest insights – also for CUDA.

The two¹ “camps”, OpenCL and CUDA, both claim you should learn their language first, after which the other will be easy to learn. I’m from the OpenCL camp, so I say you should learn OpenCL first, but with a strong emphasis on understanding the hardware architecture. If I had chosen CUDA I would have said the opposite, so in other words it does not matter which one you learn first. But psychology tells us that you will probably like the first language more, since that is where you discovered the magic; also, most people do not like learning a second language that is much alike and does not add a real difference. Most programmers just want to get the job done, and both camps know that. Be aware of that.

NVIDIA is very good at marketing its products; AMD has – to put it modestly – a lower budget for GPGPU marketing. As a programmer you should be aware of this difference.

The possibilities of OpenCL are larger than those of CUDA, because of task-parallel programming and support for far more architectures. On the other side, CUDA is much more user-friendly and has a lot of convenience built in.
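
To give a feel for how alike the two kernel languages are, here is a minimal vector-add kernel in OpenCL C (essentially C); it is an illustrative sketch, not code from either SDK. The CUDA differences are noted in the comments – the real divergence is on the host side, where OpenCL requires explicit platform/context/queue setup while the CUDA runtime hides most of it.

    /* Minimal OpenCL C kernel: element-wise vector addition.
     * The CUDA version differs only in keywords: __kernel becomes __global__,
     * the __global qualifiers are dropped, and get_global_id(0) becomes
     * blockIdx.x * blockDim.x + threadIdx.x. */
    __kernel void vec_add(__global const float *a,
                          __global const float *b,
                          __global float *c,
                          const unsigned int n)
    {
        const unsigned int i = get_global_id(0); /* one work-item per element */
        if (i < n)                               /* guard against padded global size */
            c[i] = a[i] + b[i];
    }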

Continue reading “Learning both OpenCL and CUDA”

Neil Trevett on OpenCL

The Khronos Group gave some talks on their technologies in Shanghai, China, on the 17th of March 2012. Neil Trevett made some interesting remarks on NVidia’s position on OpenCL that I would like to share with you. Neil Trevett is both an important member of Khronos and an employee of NVidia. To be more precise, he is Vice President of Mobile Content at NVidia and the president of Khronos. I think we can take his comments seriously, but we must be careful, as they are mixed with his personal opinions.

Regular readers of the blog have seen that I am not enthusiastic at all about NVidia’s marketing, but that I am a big fan of their hardware. And I am very positive they are bold enough to position themselves very well in the fast-changing markets of the upcoming years. Having said that, let’s go to the quotes.

All quotes are from this video. The best you can do is start at 41:50 and watch till 45:35.

http://www.youtube.com/watch?v=_l4QemeMSwQ

At 44:05 he states: “In the mobile space I think CUDA is unlikely to be widely adopted”, and explains: “A party API in the mobile industry doesn’t really meet market needs”. He then continues with his vision on OpenCL: “I think OpenCL in the mobile is going to be fundamental to bring parallel computation to mobile devices” and then “and into the web through WebCL”.

Also interesting, at 44:55: “In the end NVidia doesn’t really mind which API is used, CUDA or OpenCL. As long as you get to use great GPUs”. He ends with a smile, as “great GPUs” refers to NVidia’s of course. 🙂

At 45:10 he lays out NVidia’s plans for HPC, before getting back to the topic: “NVidia is going to support both [CUDA and OpenCL] in HPC. In mobile it’s going to be all OpenCL”.

At 45:23 he repeats his statements: “In the mobile space I expect OpenCL to be the primary tool“.

Continue reading “Neil Trevett on OpenCL”

ZiiLabs Tablet

[infobox type=”information”]

Need a ZiiLabs ZMS-40 programmer? Hire us!

[/infobox]

Intel has bought ZiiLabs, but you can still order the ZMS-40.

ZiiLabs has an early access program for OpenCL on their StemCell processor, the 100-Core ZMS-40. It could do more than 20 GFLOPS/Watt, but no official numbers have been released.

It consists of:

  • ZMS-40 powered tablet
  • OpenCL compiler (no information if it is cross or native)
  • Code samples

Read more about their program at http://www.ziilabs.com/products/software/opencl.php. Also check the information on the ZMS-40 to see what the processor is capable of. Here are a few characteristics:

  • Quad 1.5 GHz ARM Cortex-A9 MP Cores
  • 96x fully-programmable StemCell Media Processing cores
  • 58 GFlops StemCell compute power

Social Media: Facebook, LinkedIn and Twitter


Facebook

We have a presence on Facebook via a company page: StreamHPC

Also check out Khronos’ OpenCL fanpage to hear more news on OpenCL.

LinkedIn

Via http://www.linkedin.com/company/StreamHPC you can follow company-specific news. It carries news comparable to the newsletter.

Twitter

You can also follow us on Twitter. We have several accounts:
[columns]
[one_half title=”General”]
StreamHPC
Our main account. Everything GPGPU, OpenCL and extreme software performance.
OpenCL:Pro
With a focus on jobs and internships.
OpenCLHPC
On OpenCL usage in HPC.
WebCLNews
On the current state of WebCL.
OpenCLGuru
To answer your questions on OpenCL – at your service.
[/one_half]
[one_half title=”Hardware specific”]

OpenCLonAMD
On the current state of OpenCL on AMD processors.
OpenCLonARM
On the current state of OpenCL on ARM processors.
OpenCLonFPGAs
On the current state of OpenCL on FPGAs.
OpenCLonDSPs
On the current state of OpenCL on DSPs.
OpenCLonRISC
On the current state of OpenCL on RISC processors.
[/one_half]
[/columns]
We hope you enjoy our Twitter channels! If you have suggestions, just tweet us!

Disruptive Technologies

Steve Streeting tweeted a few weeks ago: “Remember, experts are always wrong about disruptive tech, because it disrupts what they’re experts in.” I’m happy that I evangelise and work with such a disruptive technology, and it will take time until it is bypassed by other technologies. Those other technologies will probably be source-to-OpenCL-source compilers. At StreamHPC we therefore keep track of all these pre-compilers continuously.

Steve’s tweet triggered me, since the balance between stability and progression makes changes quite hard (we see it all around us). Another reason was a statement I heard during the opening speech of Engineering World 2011 about “the cloud”, which went something like: “80% of today’s IT will be replaced by standardised cloud solutions”. Most probably true; today any manager could and should click together his or her “data from A to B” report instead of buying an “oh, that’s very specialised and difficult” solution. But on the other side, companies try to keep their existing business alive as long as possible. It’s therefore an intriguing balance.

So I came up with the idea to play my own devil’s advocate and try to disrupt GPGPU. I think it’s important to see what can disrupt the current parallel-kernel-execution model of OpenCL, CUDA and the others.

Continue reading “Disruptive Technologies”

Hello

Welcome to the webpage of Stream HPC. We’re a company in Europe that works on solving the most difficult HPC problems, with an emphasis on scaling to GPUs and clusters. We have built up experience in speeding up software, designing performance-oriented architectures, writing maintainable low-level code, selecting the best hardware for the job, and building benchmarks. Above all, we’re a customer-oriented company, as we want our clients to feel in control while we do the heavy lifting.

The company is multi-cultural and designed to be a safe space for everybody on our team – from LGBT+ to Asperger’s, we focus on making our differences our strengths. As you can read in the job self-assessment, we have four main strengths:

  • CPU development: algorithms, low-level code, architectures for CPU-based software. This includes clusters.
  • GPU development: algorithms, low-level code, architectures for GPU-based software. This includes graphics programming.
  • Problem-solving: get from full understanding to full exploration quickly.
  • Self-managed teams: we don’t hire managers, but provide frameworks.

Our customers are all around the world, but especially in North America, Western Europe and East Asia. We have built a lot of high-performance software that runs on anything from edge computers to supercomputers. See “What we do” for examples.

Our offices are in:

  • Amsterdam
  • Budapest
  • Barcelona

If you want to know more, feel free to get in contact.

See this page for Netherlands/Belgium, Hungary or Spain.

What is Khronos as of today?

The Khronos Group is the organization behind APIs like OpenGL, Vulkan and OpenCL. Over one hundred companies are members and together decide what next year’s phone, camera, computer or media device will be capable of.


We work mostly with OpenCL, but you have probably noticed that we work with OpenGL, Vulkan and SPIR too. Currently Khronos maintains the following APIs:

  • COLLADA, a file-format intended to facilitate interchange of 3D assets
  • EGL, an interface between Khronos rendering APIs such as OpenGL ES or OpenVG and the underlying native platform window system
  • glTF, a file format specification for 3D scenes and models
  • OpenCL, a cross-platform computation API.
  • OpenGL, a cross-platform computer graphics API
  • OpenGL ES, a derivative of OpenGL for use on mobile and embedded systems, such as cell phones, portable gaming devices, and more
  • OpenGL SC, a safety critical profile of OpenGL ES designed to meet the needs of the safety-critical market
  • OpenKCam, Advanced Camera Control API
  • OpenKODE, an API for providing abstracted, portable access to operating system resources such as file systems, networks and math libraries
  • OpenMAX, a layered set of three programming interfaces of various abstraction levels, providing access to multimedia functionality
  • OpenML, an API for capturing, transporting, processing, displaying, and synchronizing digital media
  • OpenSL ES, an audio API tuned for embedded systems, standardizing access to features such as 3D positional audio and MIDI playback
  • OpenVG, an API for accelerating processing of 2D vector graphics
  • OpenVX, Hardware acceleration API for Computer Vision applications and libraries
  • OpenWF, APIs for 2D graphics composition and display control
  • OpenXR, an open and royalty-free standard for virtual reality and augmented reality applications and devices
  • SPIR, an intermediate compiler target for OpenCL and Vulkan
  • StreamInput, an API for consistently handling input devices
  • Vulkan, a low-overhead computer graphics API
  • WebCL, a JavaScript binding to OpenCL within a browser
  • WebGL, a JavaScript binding to OpenGL ES within a browser on any platform supporting the OpenGL or OpenGL ES graphics standards

Too few people understand how unique the organization is: the biggest processor vendors discuss collaborations and how to move the market, while they are normally the fiercest competitors. Without Khronos it would have been a totally different world.

Molybdenite and graphene to the helping hand?

The rabbit in “The Last Mimzy” was very special. What material was it made of?

You might have read about molybdenite a few months ago. It is more efficient than graphene, which is in turn more efficient than good old silicon, most notably energy-wise. The magazine Nature had an article on it, which is summarised by Physorg, so check it out. The claim is that it is 100,000 times more efficient than silicon (and more efficient than the already very promising graphene). This fan-free silicon replacement would be a major disaster for the cooling industry!

But what would change for us? We are now on the verge of moving to ARM (a move started by the smartphone and tablet industry), but is all this needed if energy costs drop to something comparable to the cost of keeping ice cream cold on the North Pole of 20 years ago? This technique would give huge potential to Fusion chips, which now have a long way to go in solving the heat problem. But since it would take several years (and thus decades in hi-tech years) to get these chips on the market, no assumptions about market share can be made based on what will happen in a few years.

Low-power ARM and Molybdenite X86

So this is European ARM (and its licensees around the world) versus the US-based Intel and AMD. The sarcastic joke a few friends and I make is that the economic fight of the past 20–30 years between the US and the EU is actually about who has the money to hire the most Asian engineers to develop the revolutionising devices. But as long as the US and the EU keep believing that they alone make up the whole competition – despite being only a massive 12% of the world population – my framing won’t be too far off the facts.

Since batteries don’t evolve as fast as processors, the power problem needed to be attacked differently. A major reason for choosing ARM is that it uses less energy than X86, just as LCD/TFT is being replaced by e-ink and organic LEDs, and memory in portable devices is non-volatile.

If we get a big reduction in power for CPU and memory, then the efficiency of the architecture becomes less of a problem. Then Intel and AMD can re-enter the market, but with much more powerful devices. Until then, ARM licensees like NVIDIA and ImgTec have a better position when it comes to near-future devices. As I expected, more tablet manufacturers are coming up with docking stations to replace the PC with a tablet. AMD and Intel will have to keep surprising us (and probably protect their market) in the coming years to avoid losing to ARM. In other words: it will be exciting to see what the consumer market looks like in the coming years and which companies deal in it. When thinking about these years, keep in mind what Windows XP has taught us: computers are fast enough for what the average Joe wants to do with them. Hey, I use my laptop for OpenCL and the big screen; for the rest I use my mobile phone.

Hybrid chips

While I did not see it as a serious problem last year, the heat problem for a GPU+CPU on one chip is quite a challenge. Waiting for molybdenite or graphene chips to mature would be like digging your own grave. Each step forward will result in two new products: one which is more power- and/or heat-efficient, and one which is more powerful. Since the competition from ARM companies is heavy, the chance is bigger that the focus will be on more powerful hybrid CPUs. As I stated above, the losses are in the low-power area. Intel and AMD are very aware of this challenge.

Have you checked the differences between DirectX 10 and 11 games? Just look at the growing number of discussions arguing there is no need to support DirectX 11, because 10 is good enough. Here too, the demand is for the same graphics quality for less money on more portable devices. Hybrid CPUs will eat into the GPU market for sure.

ARM processors are hybrid processors. That’s all I’ll say, so you can – in combination with everything stated above – draw your own conclusions. I was very surprised NVIDIA started targeting ARM with their high-end GPUs, but was this really a bad idea?

Device vs Data-centre

A reduction in the energy costs of processors will shrink the power used by headless servers in the data centre enormously. The internet costs loads of energy, both the transport and the servers – this will reduce the server part of the energy-consumption sum by quite some factor. All positive news.

But if all this becomes true – chips no longer use much energy, and mobile internet and other radios actually take the most – what will happen to the cloud? Will you upload your video to get it processed, or put your mobile in the sun to charge it while you wait a shorter period?

Current developments, future needs

We need arithmetic, media processing and input/output; we all have that. We need long battery life, a good screen and a fast way to input our data and commands; we get more of that each day. But the heat production of silicon limits a lot, so we get the perfect electronic device the moment we can replace silicon. Getting rid of the heat could give us square chips, with challenges like reinventing the socket and multi-multi-layer designs.

So the question to you: will the sequel to The Last Mimzy (you know, the movie with the molybdenite rabbit) feature a logo of Intel, AMD, ARM or another company?

The entanglement of Bitcoins and compute-capabilities

Every now and then I read stories on Bitcoins (Wikipedia article), as GPUs are used a lot to “mine” Bitcoins. The community has some extensive benchmarks, and their discussions give me insights into specific parts of accelerators like GPUs. This group is also very forward-looking when it comes to accepting new techniques. Today something changed: they are a bank now. I’d like to share one of the thoughts I had about this.

If you look at various types of currencies, you see they all have various goals (trade, power, resources, energy, property, etc.). The inequalities and differences are even more important than the amounts. Various currencies are entangled with a certain goal or resource, but nothing is strongly entangled with technology. Here is where Bitcoins come in…

Bitcoins are entangled with compute-power – a current benchmark for technological progress.

In this article I’d like to share how the tech economy and Bitcoins are entangled, seen from the perspective of computing. I left out a lot of the “rules of economics” and hope you can fill these in yourself – the text below is just there to guide you through the thought process. Disagreement is only good – we all learn from it.

Continue reading “The entanglement of Bitcoins and compute-capabilities”

Happy New Year!

About a year ago this site was launched, and half a year ago StreamHPC became official as a company at the Chamber of Commerce. It has been a year of hard work, but it all started after seeing the cover of a book about bore-outs. The result is here, with a growing number of visitors from all over the world (from 62 countries since 23-Dec-2010) and new Twitter followers every week. Now some mixed news for 2011:

  • We are soon going to release a few plugins for Eclipse, both free and paid, to simplify your development.
  • 2011 will be the year of hybrid processors (Intel SandyBridge and AMD Fusion), which will make OpenCL much more popular.
  • 2011 is also going to be the year of the smart-phone (prognosis: in 2011 more smart-phones will be sold than PCs). So even more OpenCL-potential.
  • On 31-Dec-2010 we migrated the site to a faster server, to reduce waiting times online as well.
  • The book will be released in parts, to avoid more delays.
  • There will be around ten (short) articles published in January. Both developers and managers will be served.
  • Our goal is to expand. We have shown you our vision, but we want to show you more.

In a few words: 2011 is going to be exciting! We wish all our readers, business-partners, friends, family and (new) customers a super-accelerated 2011!

StreamHPC – we accelerate your computations

An introduction to Grid-processors: Parallella, Kalray and KnuPath

We have been talking about GPUs, FPGAs and CPUs a lot, but there are more processors that can solve specific problems. This time I’d like to give you a quick introduction to grid-processors.

Grid-processors are different from GPUs. Where a many-core GPU gets its strength from computing lots of data in parallel (SIMD, data parallelism), a grid-processor is able to have each core do something different (MIMD, task-based parallelism). You could say that a grid-processor is a multi-core CPU where the number of cores is at least 16 and the cores are only connected to their neighbours. The difference with full-blown CPUs is that the cores are smaller (like on a GPU) and thus use less power. The companies themselves categorise their processors as DSPs, or Digital Signal Processors, but most popular DSPs only have 1 to 8 cores.

For the context, there are several types of bus-configurations:

  • single bus: like the PCIe-bus in a PC or the iMX6.
  • ring bus: like the Xeon Phi up to Knights Corner, and the Cell processor.
  • star bus: a central communication core with the compute-cores around.
  • full mesh bus: each core is connected to each core.
  • grid bus: all cores are connected to their direct neighbours; messages hop from core to core (see the small sketch after this list).
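
As a rough illustration of why the topology matters, here is a small sketch (with made-up core coordinates) that counts the hops a message needs on a grid bus; on a full mesh every pair of cores is one hop apart, at the price of far more wiring.

    #include <stdio.h>
    #include <stdlib.h>

    /* On a hypothetical 4x4 core grid a message hops from neighbour to
     * neighbour, so the hop count is the Manhattan distance between cores. */
    static int grid_hops(int x0, int y0, int x1, int y1) {
        return abs(x1 - x0) + abs(y1 - y0);
    }

    int main(void) {
        /* corner-to-corner on a 4x4 grid */
        printf("grid bus : %d hops\n", grid_hops(0, 0, 3, 3)); /* prints 6 */
        printf("full mesh: 1 hop\n");
        return 0;
    }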

Each of these bus types has its advantages and disadvantages. Grid-processors get great performance (per Watt) with:

  • video encoding
  • signal processing
  • cryptography
  • neural networks

Continue reading “An introduction to Grid-processors: Parallella, Kalray and KnuPath”

We more than halved the FPGA development time by using OpenCL


Over the past year we developed and fine-tuned a project setup for FPGA development that is much faster than any other method, including other high-level languages for making FPGA-based systems.

How we did it

OpenCL makes it easy to use the CPU and GPU and their tools. Our CPU and GPU developers would design software with FPGAs in mind, after which the FPGA developer took over and finalised the project. As we have expertise in the very different phases of such a project, we could be much more effective than when sticking to traditional methods.

The bonus

It also works on CPUs and GPUs. It has to be said that the code hasn’t been fully optimised for CPUs and GPUs – this can be done in a separate project. In case a decision has to be made on which hardware to use, our solution has the least risk and gives the most answers.

Our Unique Selling Points

For the FPGA market our USPs are clear:

  • We outperform traditional FPGA development companies in time-to-market and price.
  • We can discuss problems at the hardware level, the software level and the algorithm level. This contrasts with traditional FPGA houses, where there are fewer such bridges.
  • Our software also works on CPUs and GPUs for no additional charge.
  • The latencies of the resulting project are very comparable.

We’re confident we can make a difference in the FPGA market. If you want more information or want to discuss, feel free to contact us.

OpenCL Developer support by NVIDIA, AMD and Intel

There was a guy at Microsoft who understood IT very well while being a businessman: “Developers, developers, developers, developers!”. You saw it again in the mobile market and now with OpenCL. Normally I watch his yearly speech to see which products they have brought into their own ecosphere, but the developers speech is one to watch over and over, because he is so right about this! (I don’t recommend the house remixes, because those stick in your head for weeks.)

Since OpenCL needs to be optimised for each platform, it is important for these companies that developers start developing for their platform first. StreamComputing is developing a few different Eclipse plugins for OpenCL development, so we were curious what was already out there. Why not share all findings with you? I will keep this article updated – note that this article does not cover which features are supported by each SDK.

Continue reading “OpenCL Developer support by NVIDIA, AMD and Intel”

SDKs

!!!THESE PAGES WILL BE MOVED TO OPENCL.ORG!!!

OpenCL is growing fast and various architectures now support compute-acceleration. This means that you have a lot of choice to find the right solution for your algorithm.

Working

Possibly in the (near) future

Currently we are looking into:

  • Game Consoles
    • Nintendo Wii U dev – only vague rumours.
    • Sony Playstation 4 Orbis – strong rumours.
  • Movidius – has internal builds, but will only release on customer’s request.
  • Texas Instruments – support on C66x multicore DSPs (PDF source) and on their ARM-chips.
  • ST-Ericsson
 If you have more information, let us know.

Abandoned

 

Useful peripherals

When working with various devices, you might find the below tips useful.

ARM


When working with those small cute computers, three things come in handy:

  • An HDMI switch (or a monitor with multiple HDMI inputs).
  • A small keyboard+mouse combination which uses Bluetooth or only one USB port. I use a small Logitech keyboard for this.
  • A network switch with enough free ports. Even though most boards have WiFi, good wired internet proves itself valuable.

Copyright

All content, media, theme and blogs are copyright 2010-2012 StreamHPC and Vincent Hindriksen, unless otherwise stated. For questions about using material for your own business, blog or personal usage, please contact us to ask for permission. We protect our copyrights by any means necessary.

OpenCL is a trademark of Apple Inc.

We work a lot with open source software, such as WordPress and Eclipse. The brochures are created with Inkscape and svgslides. We believe that the base of innovation should be for everybody, so everybody can build on top of that. That’s why you get all the information from the blog for free, as sharing information could give us necessary information back in return.

Used photos

Many photos link to their origin and sometimes tell a story or show the webpage of an artist. The images below are not linked, as they are used in the slider.

All other photos and images are bought from paid services: 123RF and Big Stock Photo. Please contact us if you would like to know the source of a certain photo or image.

Online Tutorials are here


We’re going online with our presentations and tutorials. This makes it easier to reach more people and makes our trainings more flexible.

We’re starting with short introductory trainings, but we have bigger plans. Keep an eye on our events (shared on Twitter, LinkedIn, this blog and the newsletter) to see what the offerings are. And you’re very welcome to join!

On 4 October (new date) there will be a free two-hour OpenCL 101. Target timezones are eastern America and Europe.

Agenda Online OpenCL 101

  • Introductions (20 minutes)
    • StreamHPC
    • GPUs and parallelism
    • OpenCL
  • By example: Getting started with OpenCL (30 minutes)
  • By example: Porting a simple program to OpenCL (30 minutes)
  • Q&A in parallel (30 minutes). Ask us any question, for instance:
    • General OpenCL.
    • OpenCL on GPUs.
    • OpenCL on FPGAs.
    • What algorithms work well with GPUs, CPUs and FPGAs.
    • StreamHPC services.
  • The next steps (5 minutes).
  • Closing words (5 minutes).

Read more here…

Tutorial server

You can already test if the tutorial server works for you by looking around in our demo room. The tutorial itself will be in another room. Use your own name and password “ap“.


See you soon!

Computer Vision

Computing demands in computer vision are high, and real-time processing with low latency is often desirable. Computer vision can greatly benefit from parallelization, as higher processing speeds can improve object-recognition rates, while FPGA solutions may reduce energy demands or support the perception of lag-free processing. At StreamHPC, we have supported several customers in optimizing their software to run on a lower power budget and at a higher speed. We can support you with dedicated solutions based on GPUs or FPGAs to meet your demands.