ICS 46 Spring 2022
Notes and Examples: Randomness


What is randomness and why do we want it?

The word random and its cousins, like randomness, are thrown around quite a lot in day-to-day speech, but, from a computing perspective, randomness actually has a somewhat more precise definition than we may normally be accustomed to using. We say that a sequence of values is random if we cannot deduce any sort of pattern that describes it. Statistics has much to say about what constitutes randomness — for example, how varied do the values of a sequence have to be? — and the deep specifics are beyond the scope of our interest at the moment, but what's important is that we can obtain a sequence of values that is varied and unpredictable when we need it.

You might wonder why you would ever want something like this in the first place. What good is a sequence whose values you can't predict? It turns out that such a sequence can solve a diverse set of interesting and important problems. A few examples follow.

  - Games often need unpredictability: shuffling a deck of cards, rolling dice, or having opponents behave in ways a player can't anticipate.
  - Simulations of real-world phenomena often model events that each occur with some probability, which requires a supply of random values to drive them.
  - Randomized algorithms make random choices internally to achieve good expected performance regardless of their input.
  - Cryptography depends on keys and other secrets that an attacker has no feasible way to guess.

The need is clear. The question is what kind of random sequences we might want when we're solving real problems and how we obtain them in a C++ program. A fairly recent version of the C++ standard, C++11, introduced a new library for generating random sequences of values. The new library, being superior in (more or less) every way to its ancient predecessor from the Standard C Library, is worth our time to learn a little bit about. But, first, we need to understand more about how computers generate randomness. Like a lot of things, it helps to understand the problem before you investigate the solution.


Entropy

First things first: If we want to generate a sequence of random values, where do we get it from? It's actually not as simple as it sounds. If you think about the programs you've written, you may have noticed that they're mainly deterministic, which is to say that they always do the same thing given the same inputs in the same situation. Algorithms tend to be this way; we tell a computer exactly what we want done, and exactly how we want it done, and the computer does it that way. But when we want varied behavior, that's a tougher nut to crack, unless we have the right tools.

What we need is a source of entropy: a sequence of bits that is unpredictable — that is, if we picked any bit in that sequence, it would have an equal probability of being a 0 or a 1, and even if we saw a huge number of those bits, it would give us no way to predict what the next bit would be. This is actually easier said than done. Where do we get this magical source of entropy from? The answer, in practice, is varied. Some operating systems gather entropy by observing aspects of their internal operation — mouse movements, network traffic, hard drive movement, and so on — that would be difficult to predict. Some computer hardware gathers it by observing other ambient factors, like tracking small fluctuations in temperature or other physical effects over time.
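On most platforms, C++ exposes exactly this kind of operating-system entropy source through std::random_device, which we'll return to later in these notes. A minimal sketch of tapping it directly:

#include <iostream>
#include <random>

int main()
{
    // On most implementations, std::random_device draws from an operating
    // system entropy source like the ones described above.
    std::random_device device;

    // Each call produces an unsigned integer's worth of hard-to-predict
    // bits, consuming some of the available entropy in the process.
    for (int i = 0; i < 5; ++i)
    {
        std::cout << device() << std::endl;
    }

    return 0;
}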

The problem, though, is that our sources of entropy are limited. In a given space of time, there are only so many operations that our operating systems can observe, and there are only so many meaningful measurements of physical effects that can be made. So, unfortunately, our primary sources of entropy might not be enough to supply us with the randomness we need. If we're running a high-powered simulation that requires many megabits of randomness every second, we may run out of entropy, which means we either have to run our simulation more slowly, or we have to be more clever about how we obtain the randomness we're looking for.


Pseudorandomness

There's an important detail worth considering, which sounds philosophical but is actually vital. Do we need a sequence that is actually random, or will the appearance of randomness be enough? If I chose a ridiculously long sequence of n bits and you had no reasonable way to use the first n − 1 bits to guess what the nth bit would be, would you care whether or not I used a deterministic algorithm to produce it? In practice, the answer is generally "no." What we want is the ability to generate values so that they appear to be random; we want them to be varied, and we want it to be essentially impossible to guess what the next one will be, if all you've seen are the values that have been generated so far. Whether they're coming directly from a source of entropy or from a straightforward algorithm is irrelevant for most uses.

You might wonder, though, how a deterministic algorithm — one that always yields the same outputs given the same inputs — could ever generate anything but a predictable sequence of results. The answer lies in how we choose the algorithm, and also in how we start the sequence.

Suppose the first input to our algorithm comes from a legitimate source of entropy, such as the ones described in the previous section; we'll call this value the seed. Now suppose that our algorithm is designed in such a way that it will generate a sequence that has the following properties:

  - Its values are varied and pass statistical tests of randomness, even though they're generated deterministically.
  - The sequence continues for a very long time before it begins to repeat.
  - Without knowing the seed, seeing the values generated so far gives no practical way to predict the next one.

Such algorithms are known as pseudorandom generators, because while their output isn't truly random, it nonetheless passes statistical tests of randomness. As long as you seed them with a legitimate source of entropy, they can generate fairly long sequences of random values without the sequence repeating, and since you couldn't easily guess the seed (assuming your entropy source was legitimate), you wouldn't be able to guess any of the generated values, either. How to design such an algorithm is well beyond the scope of this course, but, fortunately, we can stand on the shoulders of giants and use well-known algorithms that have already been designed for this purpose, such as the Mersenne Twister or a linear congruential generator.
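To make this concrete, here is a minimal sketch of a linear congruential generator, about the simplest pseudorandom generator there is. Each value is computed from the previous one by the recurrence next = a * previous + c (mod 2^32); the constants below are a commonly published choice (they appear in Numerical Recipes), though production-quality generators are designed with far more care.

#include <cstdint>

// A minimal linear congruential generator.  Each value is computed from
// the previous one by the recurrence  next = a * previous + c  (mod 2^32).
// The constants below are a commonly published choice; the seed should
// come from a genuine source of entropy.
class LinearCongruentialGenerator
{
public:
    explicit LinearCongruentialGenerator(std::uint32_t seed)
        : current{seed}
    {
    }

    std::uint32_t next()
    {
        // Unsigned 32-bit arithmetic wraps around on overflow, which
        // gives us the "mod 2^32" for free.
        current = 1664525u * current + 1013904223u;
        return current;
    }

private:
    std::uint32_t current;
};

Given the same seed, an object of this class will always produce the same sequence, which is exactly why the seed needs to come from somewhere an observer can't predict.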


Distributions

So far, we've developed a nice set of ideas for generating a long pseudorandom sequence of bits:

  1. Using a (quite possibly limited) source of entropy, seed a pseudorandom generator.
  2. Use the pseudorandom generator to generate the sequence of bits we need.
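In C++ (previewing the library we'll discuss shortly), those two steps take only a couple of lines. The sketch below uses std::mt19937, the standard library's Mersenne Twister engine, and prints a few raw 32-bit chunks of its output:

#include <iostream>
#include <random>

int main()
{
    // Step 1: seed a pseudorandom generator (an "engine") using our
    // limited source of entropy.  We consume entropy only this once.
    std::random_device device;
    std::mt19937 engine{device()};

    // Step 2: the engine can now produce as many pseudorandom bits as we
    // like, 32 bits at a time, without drawing on our entropy any further.
    for (int i = 0; i < 5; ++i)
    {
        std::cout << engine() << std::endl;
    }

    return 0;
}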

There is one more problem we need to solve, though. In practice, we generally don't want just a long sequence of bits; instead, we want a sequence of values that has additional properties. For example, we might want:

  - Integers within a given range (say, 1 through 6 to simulate rolls of a die), with each value in the range equally likely.
  - Real numbers between 0 and 1, spread uniformly across that interval.
  - Values that cluster around an average, following a normal ("bell curve") distribution.

So when we're solving actual problems that involve randomness, what we really want is a distribution of random values that satisfies our actual needs. And that's actually tricky to get right; taking a sequence of random bits and turning it into a sequence of the kinds of values described above, if done improperly, will yield results that are biased or just plain incorrect.
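A classic example of getting this wrong is the "modulo trick". Suppose our generator hands us uniformly distributed 32-bit values (the hypothetical bits parameter below) and we want die rolls from 1 to 6; the obvious one-liner is subtly biased, because 2^32 is not evenly divisible by 6:

#include <cstdint>

// BIASED: maps a uniformly distributed 32-bit value onto a die roll from
// 1 to 6 using the modulo operator.  Since 2^32 = 6 * 715827882 + 4, the
// rolls 1 through 4 each arise from 715827883 of the possible inputs,
// while 5 and 6 each arise from only 715827882 of them, so 1 through 4
// come up slightly more often.  The bias is tiny for six outcomes, but it
// grows with the size of the desired range, and it never disappears.
int biasedDieRoll(std::uint32_t bits)
{
    return static_cast<int>(bits % 6) + 1;
}

A correct conversion has to account for the uneven leftover, for example by discarding inputs from the tail of the range and asking for fresh bits; this is the kind of detail a well-written distribution handles for us.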

Ideally, we'd have a function we could call for this purpose, one that would take a sequence of random bits and turn it into the distribution of random values that we actually need.


Putting it all together in C++11 and later

All of these concepts that we've been talking about describe the various parts of the <random> library that was added in the C++11 standard. Once you know how these pieces fit together, there are only a few minor details left to get right. In C++11:

  - Sources of entropy are represented by the std::random_device class, whose objects can be called like functions to produce hard-to-predict values.
  - Pseudorandom generators are called engines; std::default_random_engine is an implementation-selected default, and specific algorithms are also available by name, such as std::mt19937 for the Mersenne Twister.
  - Distributions, such as std::uniform_int_distribution, convert the raw bits produced by an engine into a sequence of values with the properties we actually want.

So, if you want to generate a pseudorandom sequence of rolls of a single six-sided die, you could do something like this.

#include <iostream>
#include <random>

...

// Creates an object that lets us tap into our source of entropy.  Note
// that we only want to do this once and then use it sparingly.  Once created,
// the device can be called like a function.
std::random_device device;

// Now that we want to generate a pseudorandom sequence of bits, we seed a
// random engine using our random_device.  While we wouldn't want to use the
// random_device over and over again, we can use it to seed a pseudorandom
// generator, and then let the algorithm's properties of seeming randomness
// take over from there.
std::default_random_engine engine{device()};

// Finally, we need a distribution, so we can specify what kinds of values we
// actually want.  A uniform_int_distribution is one that generates integer
// values between a given minimum and maximum (inclusive), so, for example,
// the one below generates values between 1 and 6 (inclusive), with each of
// those possible values being equally likely.
std::uniform_int_distribution<int> distribution{1, 6};

// Now that we have all of the pieces set up, we're ready to generate our
// sequence of die rolls.  Notice that the distribution can also be called
// like a function whenever we want our next value, and that we pass the
// engine as a parameter.  When the distribution needs more pseudorandom
// bits, it asks the engine for them, so we don't have to worry about that
// part of it; we simply say "Give us numbers between 1 and 6" and we'll
// get a non-biased, uniformly-distributed sequence of numbers.
for (int i = 0; i < 1000; ++i)
{
    std::cout << distribution(engine) << " ";
}

std::cout << std::endl;

Remember what's really going on here; this isn't something you just want to copy and paste without understanding what it does. When you're solving problems that involve randomness, be sure you're clear on what each of these parts actually does. If, for example, you did all of this — create a random device, then an engine, then a distribution — every time you generated a new value, then you'd essentially be using the random device every time, and you'd quickly run out of meaningful entropy.
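One common way to keep the pieces straight is to create the engine once and pass it by reference to whatever code needs random values. The helper below is a hypothetical sketch of that pattern, not something from the library:

#include <random>

// A hypothetical helper illustrating the pattern: the caller creates the
// engine once (seeded from a std::random_device, as above) and passes it
// by reference, so every call advances the same pseudorandom sequence
// rather than re-seeding a brand new one.  Distributions, unlike engines,
// are cheap to construct, so creating one per call costs little.
int rollDie(std::default_random_engine& engine)
{
    std::uniform_int_distribution<int> distribution{1, 6};
    return distribution(engine);
}

With that in place, a caller would set up the device and engine once and then call rollDie(engine) as many times as it needs.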


Additional, in-depth information

For a fuller explanation of the C++11 library for generating random sequences of values, you can also check out the original paper that describes it:

    Walter E. Brown, Random Number Generation in C++11: A Walkthrough (WG21 paper N3551)
    https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3551.pdf