As CUDA kept its dominance over OpenCL, AMD introduced HIP, a programming language that closely resembles CUDA. Porting code to AMD hardware no longer takes months, and more and more CUDA software converts to HIP without problems. Even really large and complex code bases take a few weeks at most, and we found that the problems solved along the way also made the CUDA code run faster.

The only problem is that every CUDA library needs a HIP equivalent before all CUDA software can be ported.

Here is where we come in. We helped AMD build a high-performance pseudo-random number generator (PRNG) library called rocRAND. Random number generation is important in many fields, from finance (Monte Carlo simulations) to cryptography, and from procedural generation in games to providing white noise. For some applications it is enough to have some random data, but for large simulations the PRNG is the limiting factor.

The library provides the most-used PRNGs and QRNGs (quasi-random number generators), selected based on what we found on GitHub. Several of them can also be found in cuRAND:

XORWOW
MRG32k3a
Mersenne Twister for Graphic Processors (MTGP32)
Philox (4×32, 10 rounds)
Sobol32
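
To give a feel for what one of these generators does internally, here is a minimal CPU sketch of XORWOW, following the algorithm from Marsaglia's "Xorshift RNGs" paper. The class name `Xorwow` and the method names are our own for illustration, and the default state values are the ones from the paper. This is not the seeding scheme or the GPU implementation used by rocRAND or cuRAND.

```python
MASK32 = 0xFFFFFFFF  # keep all arithmetic in 32 bits, as on the GPU


class Xorwow:
    """Illustrative xorwow: five xorshift state words plus a Weyl counter."""

    def __init__(self, x=123456789, y=362436069, z=521288629,
                 w=88675123, v=5783321, d=6615241):
        # Default state taken from Marsaglia's paper (an assumption for
        # this sketch; real libraries derive the state from a seed).
        self.x, self.y, self.z, self.w, self.v, self.d = x, y, z, w, v, d

    def next_u32(self):
        """Return the next 32-bit unsigned integer."""
        t = self.x ^ (self.x >> 2)
        # Shift the state words down by one position.
        self.x, self.y, self.z, self.w = self.y, self.z, self.w, self.v
        self.v = (self.v ^ (self.v << 4) ^ t ^ (t << 1)) & MASK32
        # The Weyl counter adds a constant on every call.
        self.d = (self.d + 362437) & MASK32
        return (self.d + self.v) & MASK32

    def next_float(self):
        """Map the integer output to a float in (0, 1]."""
        return (self.next_u32() + 1) / 4294967296.0


if __name__ == "__main__":
    gen = Xorwow()
    print([gen.next_u32() for _ in range(4)])
```

On a GPU every thread would keep its own copy of this small state, which is why the way states are initialized per thread matters so much for both speed and statistical quality.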

If you’re familiar with PRNGs, you will see that there is an option from each of the most important families of generators. This makes it easy to port software that uses cuRAND. But that’s not all.

rocRAND is faster than cuRAND in most cases

rocRAND works on Nvidia hardware too, and in most cases it is faster than cuRAND.

Here we compare the generation of normally distributed floats by rocRAND on the AMD Radeon R9 Nano, rocRAND on the GTX 1080 and cuRAND on the GTX 1080. Professional-grade GPUs, like the AMD MI25, are much faster, but the point here is to show that the library written for AMD GPUs is faster than Nvidia’s own library.
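
For readers wondering what "normally distributed floats" involves on top of a plain PRNG: the generator produces uniform numbers, which are then transformed. One common transform is Box-Muller, sketched below on the CPU. The function names `box_muller` and `normal_floats` are hypothetical, Python's `random.Random` stands in for the uniform generator, and we do not claim this is exactly how rocRAND or cuRAND implement it.

```python
import math
import random


def box_muller(u1, u2):
    """Turn two uniform samples into two independent standard normals.

    u1 must lie in (0, 1] so that log(u1) is defined; u2 lies in [0, 1).
    """
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_floats(n, mean=0.0, stddev=1.0, seed=1234):
    """Generate n normally distributed floats from a uniform source."""
    rng = random.Random(seed)  # stand-in for any uniform PRNG
    out = []
    while len(out) < n:
        u1 = 1.0 - rng.random()  # random() is in [0, 1), so u1 is in (0, 1]
        u2 = rng.random()
        z0, z1 = box_muller(u1, u2)
        out.append(mean + stddev * z0)
        out.append(mean + stddev * z1)
    return out[:n]  # drop the spare value when n is odd


if __name__ == "__main__":
    print(normal_floats(4))
```

Each pair of normals costs a logarithm, a square root and two trigonometric calls, so this benchmark exercises arithmetic as well as memory bandwidth.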

Benchmarks

Before the optimization phase, on the AMD R9 Nano and the Nvidia GTX 1080, rocRAND was on par with cuRAND.

This is after the optimizations, where AMD GPUs get the upper hand due to higher bandwidth memory:

As you can see, it’s preferable to use the library for Nvidia-only projects as well.

Doing your own benchmarks

On the GitHub page of rocRAND you will find instructions to benchmark the library on your own hardware. Note that the library has been tuned for all recent AMD GPUs and Nvidia GTX GPUs, not for Tesla GPUs. Also, the code does not work on CPUs or Intel GPUs.
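
If you want to compare your own measurements with other generators, it helps to convert timings into throughput. The sketch below shows one way to do that on the CPU. The helper names `throughput` and `time_generator` are hypothetical and not part of rocRAND's benchmark tooling. On a GPU you would also have to synchronize the device before stopping the timer, because kernel launches are asynchronous.

```python
import random
import time


def throughput(n_samples, bytes_per_sample, seconds):
    """Return (GSamples/s, GB/s) for one timed generation run."""
    gsamples = n_samples / seconds / 1e9
    gbytes = gsamples * bytes_per_sample
    return gsamples, gbytes


def time_generator(generate, n_samples, trials=5):
    """Return the best wall-clock time in seconds over several trials."""
    generate(n_samples)  # warm-up run, not timed
    best = float("inf")
    for _ in range(trials):
        start = time.perf_counter()
        generate(n_samples)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 1_000_000
    # Python's built-in generator stands in for the library under test.
    seconds = time_generator(lambda k: [random.random() for _ in range(k)], n)
    gs, gb = throughput(n, 8, seconds)  # 8 bytes per double
    print(f"{gs:.4f} GSamples/s, {gb:.4f} GB/s")
```

Taking the best of several trials after a warm-up run reduces the influence of one-time start-up costs on the reported number.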

More on random numbers on our blog

Want to know more about random numbers? We have written about the subject before.

https://streamhpc.com/blog/2016-03-18/random-numbers-in-parallel-computing-generation-and-reproducibility-part-1/

https://streamhpc.com/blog/2016-08-17/random-numbers-parallel-computing-generation-reproducibility-part-2/

https://streamhpc.com/blog/2016-08-18/porting-code-that-uses-random-numbers/

Need a tailored RNG?

When you know the exact restrictions you have for your project, we can:

further tune the library to be even faster, or
add special characteristics (e.g. less cyclic), or
port other PRNGs to the GPU.

We did not put these hacks in the official code, because we could then no longer guarantee correct output for general use. If you need an RNG tailored to your specific needs, we are the team that can build it.

Get in touch with the GPU Library Specialists today.

4 thoughts on “Learn about AMD’s PRNG library we developed: rocRAND – includes benchmarks”

Nikolay Polyarniy 30 November 2017

Nice! Why the code doesn’t work on CPUs and Intel GPUs? Because of wavefronts/warps?

P.S. there is mistype in “AMD GTX GPUs”
- StreamHPC 30 November 2017
  
  It’s written for HIP that only works with GPUs supporting ROCm (AMD) and CUDA (Nvidia). Also, for CPUs the design would be slightly different to get maximum performance.
  
  Thanks, fixed it.
Timur Magomedov 2 December 2017

Why is it written on HIP not OpenCL? I thought HIP is for porting CUDA applications to newest AMD GPUs but why was it used for new software? Does HIP have any advantages over OpenCL?
- StreamHPC 12 December 2017
  
  HIP and OpenCL have different goals. For this project the goal was to support (scientific) code that is written in CUDA and should be quickly ported to AMD GPUs. AMD’s Hipify-tools is currently being updated to support this new rocRAND-library.
  A good library for OpenCL-code is https://www.thesalmons.org/john/random123/
