In the first part of our two-part blog series, we have discussed how parallel computing applications can best use pseudo-random number generators (PRNGs) so as to benefit from parallel computing speedups, without negatively impacting the statistical properties of the random numbers generated. We have argued that index-based PRNGs (e.g., from the Random123 library), which do not maintain any state but instead take an index and a key as input and return the random number corresponding to the index in its random output sequence, provide numerous benefits in parallel programming. This week, we discuss both an application of index-based PRNGs and another important aspect of PRNGs in parallel environments: that of reproducing the same output among different parallel implementations or among a parallel and a serial implementation. We focus on the latter and consider reproducibility for the purpose of verification, which – as we have stated last week – is an important matter for our customers.
Reproducibility may be simple if random numbers are only used in the initialization phase of the application. In this case, it may be sensible to record the set of random numbers generated in the serial code, write it to a file and use this file as input to a table-based approach for random-number generation in the parallel code (as discussed in Part 1 of our blog). Indeed, this is a convenient approach for both us and our customers as we do not have to deal with PRNG implementation internals and our customer can independently verify the correctness of the parallel implementation. We have used this for several of our projects.
More complex scenarios such as stochastic simulations and Monte Carlo applications require a continuous, parallel generation of random numbers. In this case, reproducibility at first seems challenging considering the concurrent and thus unpredictable order in which parallel code may invoke PRNGs. Indeed, it may be impossible to achieve when using traditional, seed-based PRNGs. This is because we may need to access entries in the PRNG’s output sequence in an arbitrary order rather than sequentially. Essentially, we need to define a one-to-one mapping between random numbers generated in the serial code and those used in the parallel version and then be able to ask the PRNG for the first entry, the second entry, and so on, in both implementations. The random-access capability is exactly what index-based PRNGs provide. The main challenge lies in the definition of the mapping.
We may consider two options for defining the one-to-one mapping between random numbers generated in a serial code and those created in a parallel implementation. The summary of our findings is this:
- Serial-to-parallel mapping: This amounts to an emulation of the serial random number generation in the parallel code. It generally requires minimal or no changes in the serial, original code, but may be increasingly difficult or impossible to define as the complexity of the code grows.
- Parallel-to-serial mapping: This emulates the parallel random number generation in the serial code. It is usually easier to define than the serial-to-parallel mapping but requires deeper changes in the serial code.
We explain below what we mean by that.
Consider a simple scenario where we have some serial code that generates a new set of random numbers in each iteration of a loop using a traditional seed-based PRNG. A parallel implementation may operate by executing each iteration via a single work item. Using traditional PRNGs, we may equip each work item with its own PRNG seed.
- Using a serial-to-parallel mapping, we may then record the PRNG state at each iteration of the loop in the serial code and initialize the PRNG state in each work item of the parallel code similarly to a table-based approach – this does not require any changes in the serial code (other than temporarily for recording the random numbers). Alternatively, we may replace the PRNG in the serial code with an index-based one and use a single, static counter for the PRNG, which we increment with each invocation of the PRNG – this is a minimal change in the serial code. We reflect the index-based PRNG in the parallel code and initialize each work item’s counter with w·n, where w is the work item index and n denotes the size of the set of random numbers created per loop. In both cases, the serial and the parallel code should produce identical outputs.
- For a parallel-to-serial mapping, it is easier to assume that the parallel implementation uses index-based PRNGs. Let’s assume that each work item uses indices of the form w·n+j for the j-th item in the set of random numbers generated. Then the serial code is adapted to use the same PRNG and computes indices as WI(i)·n+j, where WI(i) returns the work item index corresponding to the i-th iteration of the loop. Again, the serial and parallel implementations should then produce identical outputs.
A proper mapping becomes increasingly difficult to define when the code complexity grows. However, it is usually easier to map the parallel random number generation to the serial code. Indeed, consider the case where some random numbers are generated in a data-dependent manner, i.e., only if some condition on the input data is fulfilled. Then it is impossible to give a predefined mapping from the serial to the parallel code. We therefore prefer the serial-to-parallel mapping whenever it is easy to define (as we don’t risk or minimize the risk of introducing bugs in the original code by changing the PRNG generation) but resort to the parallel-to-serial mapping for more difficult cases.