Biblyon the Great

This zine is dedicated to articles about the fantasy role-playing game Gods & Monsters, and other random musings.

Gods & Monsters Fantasy Role-Playing

Beyond here lie dragons

Are my dice random?

Jerry Stratton, August 7, 2006

Histogram of d20 rolls

Looks fair to me. Does it look fair to you?

Gamers are a cowardly and superstitious lot. If any superhero decided to strike fear into our hearts, they would probably take on the mantle of an unlucky die. I’ve known gamers who will smash dice after bad rolls and even bury them.

The problem with eyeballing to determine whether a die is biased is two-fold. First, true randomness doesn’t look random. As humans we like to see patterns, and true randomness will look like it has patterns. Second, we tend to remember outstanding events more than others. We don’t remember the boring rolls, just the rolls that anger or delight us.

In our recent games, I’ve been playing rather than guiding, and because I’m a typically superstitious gamer, I have a special set of dice for playing. It’s a nice-looking set of honey-colored dice, and I’d never used that set until now. The d20 in that set appeared to be rolling a lot of ones. We all noticed it, and after we noticed, I started keeping track. Despite all those ones, I didn’t want to have to get rid of this die!

I started keeping track of the numbers after we decided it might be biased. Part of the point of running the statistics is that outstanding numbers tend to be remembered where bland numbers are not, so starting on an outstanding number is itself a bias. Here are the numbers that the suspect d20 generated over the last five gaming nights:

  1. 7, 18, 9, 5, 17, 20, 20, 15, 10, 9, 1, 10, 13, 5, 1, 10, 8, 5, 3, 20, 11, 14, 19, 5, 9, 13, 14, 10, 1, 15, 19, 20, 10, 14
  2. 9, 13, 8, 13, 7, 17, 13, 17, 11, 17, 14, 2, 14
  3. 6, 1, 1, 14, 19, 20, 1, 13, 20, 1, 16, 1, 15, 5
  4. 12, 9, 17, 10, 11, 13, 17, 9, 3, 1, 2, 14, 8, 17, 12, 2, 11, 13, 17, 10, 4, 6, 19, 5, 11, 6, 1, 5
  5. 12, 10, 7, 19, 1, 13, 10, 5, 1, 11, 9, 20, 17, 15, 15, 12, 8, 1, 13, 1, 5, 8, 20, 1, 11, 5, 11, 20, 20, 12

Eyeballing it, I’m seeing several ones and twenties in there. It looks pretty random to me. I’d guess that the average is right about where it’s supposed to be. But how can I tell for sure that this die is or is not biased?

The common way of testing dice for bias (besides counting up the numbers or, worse, relying on a memory of outstanding events) is the chi-square test. Free statistical software from the R Project for Statistical Computing will help us perform this test without having to do all of the math ourselves. R is basically a scripting language specifically for statistics. It is hellaciously complicated, but for our purposes we can get by with a few lines.
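As a first taste of those few lines, here is a sketch of the earlier point that true randomness looks streaky. Flip a fair coin 200 times and measure the longest run of identical results; streaks of six or more in a row are routine in genuinely random data, even though they look like patterns. (The seed here is arbitrary, chosen only to make the example reproducible.)

```r
# True randomness looks streaky: find the longest run of identical
# results in 200 fair coin flips.
set.seed(42)                  # arbitrary seed, just for reproducibility
flips = sample(c("heads", "tails"), 200, replace = TRUE)
runs = rle(flips)             # run-length encoding: the length of each streak
max(runs$lengths)             # longest streak; usually 6 or more in 200 flips
```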

What I need first is the frequency of each result. That is, 1 appeared fifteen times, 2 appeared three times, 3 appeared twice, etc. Download R and paste in the following lines:

  • session1 = c(7, 18, 9, 5, 17, 20, 20, 15, 10, 9, 1, 10, 13, 5, 1, 10, 8, 5, 3, 20, 11, 14, 19, 5, 9, 13, 14, 10, 1, 15, 19, 20, 10, 14)
  • session2 = c(9, 13, 8, 13, 7, 17, 13, 17, 11, 17, 14, 2, 14)
  • session3 = c(6, 1, 1, 14, 19, 20, 1, 13, 20, 1, 16, 1, 15, 5)
  • session4 = c(12, 9, 17, 10, 11, 13, 17, 9, 3, 1, 2, 14, 8, 17, 12, 2, 11, 13, 17, 10, 4, 6, 19, 5, 11, 6, 1, 5)
  • session5 = c(12, 10, 7, 19, 1, 13, 10, 5, 1, 11, 9, 20, 17, 15, 15, 12, 8, 1, 13, 1, 5, 8, 20, 1, 11, 5, 11, 20, 20, 12)
  • rolls = c(session1, session2, session3, session4, session5)
  • frequency = table(rolls)
  • frequency
rolls
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
15  3  2  1 10  3  3  5  7  9  8  5 10  7  5  1  9  1  5 10 

Looks like fifteen ones and ten twenties. As long as the ones outnumber the twenties, I’m sure the die isn’t biased!

The “c” concatenates, or combines, multiple items. First, I’m concatenating each session’s rolls into a list for that session, and then I’m combining those lists into one big list of all rolls, called “rolls”. I could just as well have put all of the individual rolls into “rolls” directly, but keeping your data in appropriate bins can sometimes help you avoid mistakes.

Now that I have the frequency of each result, I need the probability that each result will show up, assuming an unbiased die. For a 20-sided die, that probability is one in twenty for each of the twenty possible results.

  • probability = rep(1/20, 20)

The “rep” repeats an item a specific number of times. Here, I am repeating the first number (one twentieth) a number of times equal to the second number (that is, twenty times). Now that I have the frequencies and the probabilities, I can run the chi-square test:


  • chisq.test(frequency, p=probability)
  • Chi-squared test for given probabilities
  • data: frequency
    • X-squared = 46.2101, df = 19, p-value = 0.0004628

The “df” or “degrees of freedom” is 19. It is simply the number of possible results (20) minus one: a die that only had one possible result would have zero degrees of freedom.

When the chi-square is compared to the degrees of freedom, it generates a “p-value” of .0004628. What does this mean? This is the chance that a truly random die will produce results even more extreme than the results we’re seeing. We would expect random dice to produce more extreme results about .04628% of the time. That’s not very often! This die is looking pretty biased.
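You can also get that p-value directly from the chi-square statistic and the degrees of freedom, without re-running the whole test, using R’s built-in pchisq function:

```r
# The p-value is the upper tail of the chi-square distribution: the
# chance of a statistic at least this large with 19 degrees of freedom.
pchisq(46.2101, df = 19, lower.tail = FALSE)   # 0.0004628
```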

In general, if the p-value is greater than .1, there is no evidence that the die is biased: a fair die would produce results at least this extreme more than one out of every ten tries. The smaller the p-value, the stronger the evidence that the die is biased. If the p-value falls below .05, that is good evidence that the die is biased, and if it falls below .01, that is very strong evidence.

If there isn’t enough data for the test to be reliable, R will also give a warning:

  • Warning message:
  • Chi-squared approximation may be incorrect in: chisq.test(frequency, p = probability)

It is important to continue tracking the die’s rolls until you have enough data to draw a real conclusion. At the very least, you should keep tracking until

  1. each number comes up at least once;
  2. most numbers come up at least five times.

How many of the results came up five or more times, compared to how many did not?

  • enough = ifelse(frequency>=5, "enough", "not enough")
  • table(enough)
enough
    enough not enough 
        13          7 

The ifelse function applies the test to each item in the first argument’s list (frequency, in this case). Here, it tests whether each item is greater than or equal to 5. If an item passes the test, it gets “enough” at its position in the new list; if it does not (its frequency is less than five), it gets “not enough” at its position in the new list.

So thirteen numbers occur five or more times. I want to see at least eleven numbers (i.e., more than half of them) occur five or more times before I start trusting these results. Really, I’d like to see all of them occur five or more times, but if the die is truly biased that’s likely to take forever.

Whatever criteria you choose, you should choose them before starting the test rolls. The absolute minimum number of rolls is five times the number of sides on the die; with fewer, you can’t satisfy the second criterion above even on a perfectly fair die. Ten times the number of sides ought to be more than enough. I’m going for seven times the number of sides, which means I need 140 rolls. At 119 rolls currently, I’m very close.

Normally, it would be back to the game for me. Everyone knows that rolls made for game purposes are completely different from rolls made outside of the game! But the numbers coming out of these calculations indicate such a skew that I don’t think it’s possible for the next 21 rolls to make any difference. I’m going to make an exception and roll 200 times on my desktop.

Retest

Last of 200 d20 rolls

The last of 200 rolls. Could they have been any more bland?

Once you come up with one set of results, you’ll want to verify them. The scientific method, after all, is all about predictability. If this die is biased, it will continue to be biased outside of the game. I’ll roll 200 times—ten times per side—to see if this result remains. At the same time, I’ll roll another die from my grab-bag for comparison. I picked out a clear orange die, and literally rolled the honey die and the orange die together for all 200 throws. The things I will do for this blog!

Let’s look at the orange die first:

  • orange = c(4, 5, 19, 4, 6, 16, 20, 4, 6, 5, 13, 3, 1, 13, 2, 17, 9, 14, 10, 8, 18, 10, 4, 9, 9, 4, 20, 7, 16, 13, 12, 4, 17, 6, 11, 3, 19, 9, 19, 2, 6, 19, 18, 17, 20, 9, 4, 6, 6, 16, 5, 4, 14, 13, 8, 16, 18, 17, 5, 18, 9, 11, 14, 8, 18, 8, 16, 8, 6, 20, 20, 9, 20, 3, 1, 13, 8, 10, 2, 4, 2, 15, 3, 4, 14, 7, 6, 16, 18, 7, 5, 1, 10, 5, 4, 19, 18, 9, 6, 7, 2, 3, 15, 4, 3, 3, 10, 12, 6, 7, 1, 5, 20, 8, 8, 2, 7, 15, 2, 8, 18, 16, 7, 6, 19, 14, 10, 4, 9, 1, 17, 13, 6, 17, 4, 7, 7, 20, 8, 4, 20, 19, 17, 13, 7, 17, 18, 3, 1, 9, 4, 7, 9, 16, 6, 3, 1, 14, 10, 2, 8, 10, 14, 5, 3, 11, 12, 16, 19, 4, 7, 18, 16, 6, 6, 12, 19, 15, 13, 16, 18, 11, 14, 17, 19, 6, 16, 18, 18, 10, 6, 11, 18, 10, 20, 12, 15, 2, 2, 9)
  • orangefrequency = table(orange)
  • orangefrequency
orange
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
 7 10 10 17  8 17 12 11 12 10  5  5  8  8  5 12  9 14 10 10 

That certainly looks pretty random, and the chi-square test agrees.


  • probability = rep(1/20, 20)
  • chisq.test(orangefrequency, p=probability)
  • Chi-squared test for given probabilities
  • data: orangefrequency
    • X-squared = 22.4, df = 19, p-value = 0.2648

Over a quarter of the time we would expect to get results more extreme than these.

Now, what about the honey d20?

  • honey = c(8, 19, 15, 6, 3, 15, 4, 1, 14, 17, 16, 8, 3, 15, 15, 3, 10, 5, 1, 8, 13, 8, 1, 16, 13, 20, 4, 1, 11, 10, 6, 18, 10, 1, 6, 4, 11, 2, 19, 9, 18, 13, 20, 19, 6, 13, 20, 4, 5, 19, 3, 5, 15, 1, 18, 20, 7, 8, 16, 11, 4, 18, 7, 9, 17, 15, 7, 5, 19, 11, 18, 7, 18, 4, 18, 18, 12, 12, 2, 9, 1, 5, 15, 15, 10, 17, 3, 2, 14, 3, 1, 13, 7, 5, 1, 12, 8, 16, 8, 10, 11, 18, 9, 16, 6, 11, 4, 11, 16, 10, 5, 9, 10, 10, 10, 17, 15, 10, 1, 19, 20, 13, 16, 13, 11, 5, 9, 4, 7, 19, 7, 13, 7, 14, 20, 19, 17, 3, 2, 10, 13, 10, 19, 18, 8, 11, 12, 7, 2, 11, 5, 8, 11, 5, 1, 20, 9, 10, 9, 11, 20, 15, 14, 4, 2, 15, 17, 13, 5, 4, 1, 7, 5, 9, 7, 17, 15, 19, 20, 19, 15, 20, 1, 4, 1, 5, 9, 18, 18, 9, 13, 1, 4, 20, 2, 1, 18, 3, 16, 10)
  • honeyfrequency = table(honey)
  • honeyfrequency
honey
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
16  7  8 12 13  5 11  9 11 14 12  4 11  4 13  8  7 13 11 11 


  • probability = rep(1/20, 20)
  • chisq.test(honeyfrequency, p=probability)
  • Chi-squared test for given probabilities
  • data: honeyfrequency
    • X-squared = 21.6, df = 19, p-value = 0.3046

Even with that run of ones towards the end, which once again left more ones than any other result, this is a perfectly reasonable set of rolls: 30% of the time we would expect results more extreme than this test set of 200. That’s a far cry from the previous results for this die. What does this mean?

Perhaps this confirms the hypothesis that dice rolled in the game are different than dice rolled out of the game. Or, perhaps I’m psychic—I know you’re thinking that. Or it could be that I just happened upon a biased-looking set of rolls while writing this article. It would be just my luck.

Because the first results are not obviously repeatable, I should run some more tests. My arm is tired, however, so I’m not going to. I may do it later on—or ask someone else to do it—after I send the Adventure Guide these results.

Charting the results

Line chart of d20 rolls

You can also create line charts of your results.

Above, we showed the frequencies of each roll with table(rolls), or in our case just “frequency” since we did frequency=table(rolls). But we can also plot the results on a chart.

The simplest plot is created using:

  • plot(frequency)

This will create a chart very much like the lead-in histogram in this article. The full line for creating that chart is:

  • plot(frequency, type="h", col="light green", lwd=5, col.axis="red", col.lab="dark green", xlab="d20 rolls")

This gives us light green data lines five pixels wide, red axis numbering, dark green axis labels, and a bottom label of “d20 rolls”.

In this latter case I specify a type of “h”, or histogram. That generally makes the most sense for charting dice results. But you might also use:

  • plot(frequency, type="l")

This changes to a line chart. Other types include “p” for points, “b” or “o” for both lines and points, and “s” or “S” for stairs.

Statistics

You can also get the minimum and maximum frequency with range(frequency) or the minimum and maximum roll with range(rolls). And sum(frequency) will give you the total number of rolls, because each entry in frequency is the number of times a result came up. Add them together, and that’s the total number of rolls.

You can get the average using mean(frequency) or mean(rolls). If you want the median, you can use median(frequency) or median(rolls). The median is the “middle” number, if you lay out all of the numbers in a row from lowest to highest. If you’re familiar with quantiles, you can use quantile(frequency) or quantile(rolls).

In each case, running it on “frequency” gets you the stats for the number of times numbers come up, and running it on “rolls” gets you the stats for the numbers that were rolled. For example, on the data above, median(rolls) tells me that the median roll was 11; that (or 10) is what we would expect. The mean for the above rolls is 10.47; that’s very close to the 10.5 we’re expecting. If the in-game rolls are biased, it doesn’t appear to be affecting the median or average.
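Putting those together on the in-game data looks like this. (The session data is repeated here so these lines can be run on their own.)

```r
# The five sessions of in-game rolls, combined into one list
rolls = c(7, 18, 9, 5, 17, 20, 20, 15, 10, 9, 1, 10, 13, 5, 1, 10, 8, 5, 3, 20,
          11, 14, 19, 5, 9, 13, 14, 10, 1, 15, 19, 20, 10, 14,
          9, 13, 8, 13, 7, 17, 13, 17, 11, 17, 14, 2, 14,
          6, 1, 1, 14, 19, 20, 1, 13, 20, 1, 16, 1, 15, 5,
          12, 9, 17, 10, 11, 13, 17, 9, 3, 1, 2, 14, 8, 17, 12, 2, 11, 13, 17,
          10, 4, 6, 19, 5, 11, 6, 1, 5,
          12, 10, 7, 19, 1, 13, 10, 5, 1, 11, 9, 20, 17, 15, 15, 12, 8, 1, 13,
          1, 5, 8, 20, 1, 11, 5, 11, 20, 20, 12)
frequency = table(rolls)
range(rolls)     # lowest and highest roll: 1 and 20
sum(frequency)   # total number of rolls: 119
mean(rolls)      # average roll: about 10.47
median(rolls)    # middle roll: 11
```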

Arbitrary categories

You can use cut to divide the rolls (or frequencies) into arbitrary categories. For example, if you want to count up the number of good rolls and bad rolls, you first have to decide what makes a good roll and a bad roll. Let’s say that rolling five or less is a good roll, and rolling 16 or more is a bad roll.

  • cutoffs = c(0, 5, 15, 20)
  • qualities = c("Good", "Boring", "Bad")
  • rollquality = cut(rolls, cutoffs, qualities)
  • table(rollquality)
rollquality
  Good Boring    Bad 
    31     62     26 

The cutoffs are basically a range. There will always be one more cutoff number than cutoff name. Here, we’re saying that the first category is for numbers greater than zero and up to five. The second category is for numbers greater than five and up to 15. The third category is for numbers greater than 15 and up to 20.

This gives a result of 31 good rolls (1, 2, 3, 4, and 5) and 26 bad rolls (16, 17, 18, 19, 20). There are 62 rolls in between, which we’re calling “boring” rolls.
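cut is the general tool, but for a quick count you don’t need categories at all: R treats TRUE as 1 when summing, so you can sum a logical test directly. A small sketch on some made-up rolls:

```r
# Summing a logical test counts the items that pass it
examplerolls = c(1, 5, 6, 15, 16, 20)
sum(examplerolls <= 5)    # 2 good rolls (the 1 and the 5)
sum(examplerolls >= 16)   # 2 bad rolls (the 16 and the 20)
```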

If you wanted to categorize the rolls more precisely, you might use:

  • cutoffs = c(0, 3, 7, 13, 17, 20)
  • qualities = c("Great", "Good", "Boring", "Bad", "Awful")
  • rollquality = cut(rolls, cutoffs, qualities)
  • table(rollquality)
rollquality
 Great   Good Boring    Bad  Awful 
    20     17     44     22     16 

If you want to verify those categories, or if you want to see the rolls in order for some other reason, you can type:

  • sort(rolls)
  [1]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  2  2
 [19]  3  3  4  5  5  5  5  5  5  5  5  5  5  6  6  6  7  7
 [37]  7  8  8  8  8  8  9  9  9  9  9  9  9 10 10 10 10 10
 [55] 10 10 10 10 11 11 11 11 11 11 11 11 12 12 12 12 12 13
 [73] 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 15 15
 [91] 15 15 15 16 17 17 17 17 17 17 17 17 17 18 19 19 19 19
[109] 19 20 20 20 20 20 20 20 20 20 20

You can see 16 rolls that are 18, 19, or 20 (awful rolls) and 20 rolls that are 1, 2, or 3 (great rolls).

You can use sort on tables as well:

  • sort(frequency)
rolls
 4 16 18  3  2  6  7  8 12 15 19  9 14 11 10 17  5 13 20  1 
 1  1  1  2  3  3  3  5  5  5  5  7  7  8  9  9 10 10 10 15 

This shows which rolls were rolled least often and which were rolled most often.

Examining computer-generated rolls

We can have R generate some random rolls for us.

  • rolls = sample(20, 200, replace=TRUE)
  • frequency = table(rolls)
  • frequency
rolls
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
 6  8 12  9 13 12  8  7 12 11 13 11  8 10  9 10  8 10  7 16 

First, generate 200 rolls from 1 to 20. We have to specify replace=TRUE because by default, results taken from a sample are not replaced. That is, R assumes something like a deck of cards rather than a die: when you pull a card from a deck, you won’t see that card again on subsequent pulls. With dice, however, you will see that number again on subsequent rolls.
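You can see the deck-of-cards behavior for yourself. Without replace=TRUE, R deals each value at most once, and refuses to draw more values than the die has sides. (This is just a sketch; the exact wording of the error varies by R version.)

```r
# Without replacement: like dealing from a 20-card deck
hand = sample(20, 20)       # all twenty values, each appearing exactly once
length(unique(hand))        # 20: no repeats are possible
# sample(20, 200)           # error: sample larger than the population
rolls = sample(20, 200, replace = TRUE)   # with replacement: repeats allowed
length(rolls)               # 200 rolls, with many repeated values
```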

When you look at the frequency, make sure that each number from 1 to 20 shows up at least once. In the list above, each number shows up at least once so I’m good to go. Since your list will be different from mine, you’ll need to double-check that all 20 entries show up before going to the next step.

  • probabilities = rep(1/20, 20)
  • chisq.test(frequency, p=probabilities)

My p-value is 0.8856, entirely consistent with an unbiased d20.

Always remember that the chi-square test is all about distribution. It requires that the results (in our case, the die rolls) be independent of each other; if they are related, the test is simply invalid.

Faked d6 rolls histogram

These d6 rolls were created completely non-randomly, ensuring that each number appears exactly the same number of times.

An example of a completely non-random set of results with “random” distribution would be if our computer die roller remembered the last roll, and always switched up to the next option, wrapping around to the bottom once it reaches the highest result. This hypothetical computer-generated d6 would give us 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, etc. Roll it 30 times and you will get each number five times. We can do this in R:

  • rolls = c(rep(c(1:6), 5))
  • frequency = table(rolls)
  • plot(frequency)
  • probabilities = rep(1/6, 6)
  • chisq.test(frequency, p=probabilities)

Giving us a p-value of 1.0! Which is obviously true: 100% of the time, a truly random die will produce results at least as extreme as these. We know these results are non-random, because we know that each “roll” is related to the previous one. If a computer program is doing something behind the scenes to ensure an even distribution of numbers, the chi-square test will not necessarily show us this. Fortunately, this kind of behind-the-scenes manipulation is not something we have to worry about with physical dice, unless Einstein was wrong and God is teasing us.

Plotting a 3d6 roll

As long as we’re playing around, we can look at the “normal distribution” that shows up in early gaming manuals, usually when discussing 3d6 rolls.

I’m far from an expert in R, having learned it specifically for this article, but the easiest way that I could find to generate 3d6 rolls in R is to generate three lists of d6 rolls and add them together.

  • samplesize = 50
  • roll1 = sample(6, samplesize, replace=TRUE)
  • roll2 = sample(6, samplesize, replace=TRUE)
  • roll3 = sample(6, samplesize, replace=TRUE)
  • rolls = roll1+roll2+roll3
  • frequency = table(rolls)
  • plot(frequency, type="l", xlab="3d6 rolls")

Change the samplesize from 50 to 500, 5000, 50000, and even 500000, and watch as the line chart smooths out into the “normal distribution” or “bell curve”. As you can see, the difference between 500 rolls and 500,000 is obvious. The more rolls you make, the closer you get to the normal distribution curve. You may want to have a grad student do the actual 500,000 rolls, however.
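If you don’t want to retype the commands for each sample size, a short loop will draw all four charts in one window. (par(mfrow=...) splits the window into a grid of charts; this is just one way to do it.)

```r
# Draw 3d6 frequency charts for four sample sizes, side by side
par(mfrow = c(2, 2))   # a 2x2 grid of charts
for (samplesize in c(50, 500, 5000, 50000)) {
  rolls = sample(6, samplesize, replace = TRUE) +
          sample(6, samplesize, replace = TRUE) +
          sample(6, samplesize, replace = TRUE)
  plot(table(rolls), type = "l", xlab = paste(samplesize, "3d6 rolls"))
}
```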

Bell Curves for 3d6

With a carpal tunnel-inducing 500,000 rolls, the distribution looks like the standard normal distribution curve.
