Let’s approach the concept of programming through looking at the brain, the code and the computer.

The idea of a program lives in the brain of a programmer. The way to get the program to the computer is using a system process called coding. When the program coded on the computer and the program embedded as an idea in the brain are alike, the programmer is happy. When over time the difference between the brain-version and the computer-version grows, then we go for a maintenance phase (although this is still this mostly from brain to computer).

When the coding-language or important coding-paradigms change, something completely different happens. In such case the program in the brain is updated or altered. Humans are not good at that, or at least not many textbooks discuss how to change from one model to another.

In this article I want to discuss one of these new coding-paradigm: dependencies in parallel software.

Iterative programming learning path

Iterative or ‘and then’-programming is the most used form of programming. This method of making chronological software is very close to our understanding, as questions like what happened next? are in our nature. If you start learning programming, most of the times you begin with this. Your first program after Hello World can probably end up being a huge file with one long serialized story. This program contains the basic functions we try to learn: computations, loops, string-operations, screen-output, etc. The moment we advance in programming, we get choices to structure our code: functions, sub-routines, templates, etc. Since this gives code that can be readable enough, most of the time left is put in learning new functions or adding libraries with more functions (the specializations) which looks more like the first learning phase.

How proud I was when I made my first BASIC-program on an Apple IIe at age 8! I didn’t listen to my dad’s feedback. I cared only about what was on the screen and did not care about the code. Who wants to clean the room when there’s so much to play with?!

These days all starting programmers make their first program on a multi-core machine with a powerful GPU. Still 99% follows the same learning-path. My message here is that the second phase of learning should be altered to add multi-core programming.

Find my 2 cents bellow…

Queues and false dependencies

An average program is a serialized version of the idea in the programmer’s head: readable and fast running on pre-2005 computers. The problem becomes clear when you want to port to a parallel processor: programming is filtering unimportant details and adding glue to stick everything together. The filtering causes information on dependencies to get lost and the glue adds hints to the compiler which are not there. This creates false dependencies and causes jobs that will wait unnecessarily. Compilers are very good in finding dependencies in code and optimizing it for parallel execution, but for the best result it needs to know those missing pieces and to have the glue marked as such. If it doesn’t know, it makes errors in guessing which depends on… false dependencies.

It is like the above image of the queues on airports. If there would be one service-desk and one security-port, then nobody in the queue needs to know where their plane will take off. But as there are many service-desks and several security-checkpoints, all passengers need to know and communicate is their destination.

We still program as if there were only one processor, and we assume that one processor will just get more cores to take care of the commands queue. Assuming several cores while programming is a start, but so much more can be achieved!

Step 1: assume a very wide bus

If you think of a small corridor where each command has to go through, you limit the software to be parallelized. You can better think there is a wide square where all commands can freely walk (like the airport example). So if there is no dependency, you can dump all commands in it at once. Don’t code any order.

Telling what needs to be done next should be avoided when possible, so the scheduler has more jobs to choose from. The more jobs the scheduler can choose from, the more temporary locks can be avoided. The less temporary locks (and thus empty queues), the faster the software runs on parallel processors.

In OpenCL you would try to define kernels that process data independently. This search for non-dependencies and actually avoid communication (if possible) is a core-task of designing an OpenCL program.

Step 2: Draw a dependency-graph

When looking at the functional description in a requirements-document, think of first making a dependency-graph before doing other pre-coding steps. Write each group of related functionality on a post-it, and order them such that you get an overview which post-it depends data-wise on which others. You’ll probably end up in a line that widens and narrows, or in a web (with a main line). The conclusions you can draw from such graph are that certain functionality has a lot of time and others are under pressure.

Enriching the graph can be done by adding what data is transferred between post-its, how long a certain post-its will take (use different colors), where user-input is needed, etc. What exact meta-data you should add, depends heavily on what you are making.

The magic that happens here is that software gets into your brain differently. You’ll start thinking in time-constraints, user-dependencies and parallelism. All subjects that become easier when you think of anti-order.

In OpenCL this is managed in the event_wait_list when enqueueing tasks onto the command_queue. All tasks are thrown on the queue, given their dependencies. We only care about correct results, not the order the scheduler chooses.

Just don’t forget ‘the rest’

Getting in control of the functional dependencies is a huge step in making better parallel software, but memory-management, bandwidth-maximization, cache-usages, etc. could be more important and sometimes conflict with what is described above. So when assuming an endless bus, this is solely meant to free your mind from constraints.

We will deal a bit more with this later in this series. But always remember, practice makes the master.

Happy programming!

A postface

I think programming languages can also suffer from globalization (one size fits all). We don’t need a homogeneous answer, as it has been pushed for for decades now. We need the complexity of a heterogeneous world to breed new ideas and solutions. Forcing new paradigms to fit into existing ones kills progress. We should go work on how to make it all more transparent to others, and how to deal with different answers from different people with different backgrounds.

I see OpenCL as a language to describe solutions, not the magic solution that solves everything.

If you enjoyed this post, you will most definitely like the rest of the series on programming theories. In time, they will be extended with much more examples and will be compiled in a book.

Feel free to suggest new subjects!

StreamHPC communications

Avoiding false dependencies in only two steps

Iterative programming learning path

Queues and false dependencies

Step 1: assume a very wide bus

Step 2: Draw a dependency-graph

Just don’t forget ‘the rest’

A postface

Related Posts

Targetting various architectures in OpenCL and CUDA

How to speed up Excel in 6 steps

N-Queens project from over 10 years ago

Taking on OpenCL

StreamHPC communications

Iterative programming learning path

Queues and false dependencies

Step 1: assume a very wide bus

Step 2: Draw a dependency-graph

Just don’t forget ‘the rest’

A postface

Related Posts

Targetting various architectures in OpenCL and CUDA

How to speed up Excel in 6 steps

N-Queens project from over 10 years ago

Taking on OpenCL

Discover more from StreamHPC