Tom Smeding's "blog"


Generating randomness on a GPU

It seems the disappearance of Nathan Reed's blog post was temporary, and it's up again. That means the below is an inferior retelling of his post; please see there.

When doing simulations on a GPU, e.g. for raytracing, one very quickly needs lots and lots of random numbers. There is a huge number of fine random number generators, but on a GPU you often have a very particular problem: you need lots of random numbers in a large array, generated in parallel. This is not something that conventional PRNG's like the xorshift family are suitable for, since these are inherently sequential: to generate the next random number, it needs the state produced when generating the previous one.

You may try to solve this by initialising the RNG many times, each time with a different seed. However, RNG's are not built for this: the random sequences generated for all those seeds will be different, probably, but they will still be quite correlated. This is because the random generators are normally constructed to go deep, not wide.

This is an insight I learned from reading a post on Nathan Reed's blog, which has now unfortunately disappeared. (In fact, its disappearance is the very reason for writing this post.) However, there is a copy on the Internet Archive here! He astutely realises that RNG's are made to go deep, meaning that the random sequence generated for any particular seed will have good properties when seen on its own, and not to go wide: sequence values at similar depth for different seeds are not very independent. However, hash functions are made to go wide: the outputs for different "seeds" are meant to be competely different and independent, while less emphasis is sometimes laid on the quality of the behaviour when going deep (i.e. when iterating the hash).

This means that when you need many random values in an array, in parallel, what you should do is take a hash function and apply it to the integers 1..n for whatever n you need. The results will be nice. Then, since hash functions are generally more expensive to compute than PRNG's, if you want more of those arrays, iterate mapping the PRNG over that array.

Nathan further recommends the following hash function, which is not a cryptographic hash function but is nothing short of magical when applied for the right purpose: the Wang hash. The version on Nathan's blog is somewhat outdated; there is an improved one to be found on the (also defunct) website of Thomas Wang himself (Internet Archive copy):

uint wang_hash(uint key) {
    // Taken from Thomas Wang's website
    key = ~key + (key << 15);   // key = (key << 15) - key - 1;
    key = key ^ (key >>> 12);
    key = key + (key << 2);
    key = key ^ (key >>> 4);
    key = key * 2057;   // key = key + (key << 3) + (key << 11);
    key = key ^ (key >>> 16);
    return key;

This is a small, fast hash function that works well when applied in the above scheme. As a PRNG, he suggests a LCG or an Xorshift generator; I think it's safest to use Xorshift. An example implementation is here:

uint rand_xorshift() {
    // Taken from Nathan Reed's blog post; originally from Marsaglia
    rng_state ^= (rng_state << 13);
    rng_state ^= (rng_state >> 17);
    rng_state ^= (rng_state << 5);
    return rng_state;

For neat images and a visual indication that this setup does generate nice random numbers, see Nathan's post linked above.