Software Testing Blog

Spot the defect: randomness

Today on Ask The Bug Guys I’m going to turn things around a bit and ask you to find and explain the bug.

Suppose we want to generate a series of pseudo-random integers between 1 and 6 (inclusive) to simulate rolling a fair die. In the class library there is the useful `random.Next(min, max)` method which does precisely that. (You have to remember to call it as `random.Next(1, 7)` because the lower bound is inclusive and the upper bound is exclusive.) But I can do way better than that. Here is my super-awesome implementation:

```using System;
static class RandomExtensions
{
public static int RollDie(this Random random)
{
double d1 = random.NextDouble(); // Always less than 1.0
double d2 = d1 * 6;              // Always less than 6.0.
string s1 = d2.ToString();
char c = s1[0];           // '0' through '5'
string s2 = c.ToString(); // "0" through "5"
int i1 = int.Parse(s2);   // 0 through 5
int i2 = i1 + 1;          // 1 through 6
return i2;
}
}
```

I know what you are thinking: wow, Eric, just… wow, that is some insanely great code right there. Such an elegant and powerful solution with no room for improvement.

But I digress. I’m not at all concerned about correctness because this code is super awesome, but I am a bit concerned that this might be a biased die, so I’m going to do some tests. I’ll roll the die six million times, and I should get about a million of each number, right?

```class Program
{
static void Main()
{
var random = new Random();
int[] counts = new int[7]; // 0 through 6
for (int i = 0; i < 6000000; i += 1)
counts[random.RollDie()] += 1;
foreach (int count in counts)
Console.WriteLine(count);
}
}
```

The expected behavior of this program is to print zero followed by six numbers each within a percent or so of 1000000. Perhaps it is needless to say: that is not its actual behavior.

There is a serious bug — other than the sheer ludicrousness of not using `Next(min, max)` like a sensible person — in this implementation somewhere. Can you spot the defect without running the code and predict the behavior of this program? Leave your thoughts in the comments, and don’t read the comments if you don’t want spoilers.

1. Kurt Bachtold says:

I had to run it 🙁 I never would have guessed that!

2. pc says:

I’m not a C# developer and haven’t checked the documentation, but I’m guessing it has something to do with double’s toString implementation. Likely, numbers below a certain threshold will be converted in scientific notation (2.3e-2 or the like), and so taking the first character isn’t going to give the integral portion of the double.

3. Jesse McGrew says:

Could it be that int.Parse throws because c isn’t always a digit?

4. CoF says:

Last example:
``` number = .000000001; // Displays 1E-09. Console.WriteLine(number.ToString()); ```

5. Franck JEANNIN says:

The program will crash. ToString can use scientific notation like “8E-10” so the first character is not always less than 6.

6. Roman says:

In some cases Double.ToString() output will be in exponent format that is why first char will be 1.

7. Chris says:

Well, the line `string s1 = d2.ToString();` _assumes_ that you get some result like:

1.3874398742
0.2384792342
3.9475498574

But you can easily get numbers _very close_ to zero, which would output using exponent notation, for example:

1.18073749923182E-06
4.00615346757982E-05
3.5634721264073E-05

Then the rest of the code takes the first character (in the above examples, it would be “1”, “4”, “3”). In the best case scenario, this would bias your results (the above _should_ be treated like zero, but instead they incorrectly increase 1, 4, and 3)

But in reality, you’re just as likely to get results like these:

8.90661031422513E-05
9.18013975451707E-05
7.23050907590916E-05

Which would give you 8, 9, 7. The rest of the code in turn would return “random” values of 9, 10, and 8 respectively despite you explicitly asking for numbers between 1 and 6 inclusive. Naturally, this could _easily_ introduce errors later in your code. A great example is the testing code you provided. The line:

counts[random.RollDie()] += 1;

Would result in an IndexOutOfRangeException when you tried to increment the 9th index of your 6-length array.

1. Chris says:

Woops, I’ll admit it. I found this by running the code; I had skimmed over your challenge to spot it without running it! I hadn’t noticed that aspect until I reread your question after posting. Feel free to disregard my answer. 🙂

8. Stuart says:

Is this a “char.ToString() produces the ASCII value of the character” thing?

9. String formatting of binary floating-point will truncate precision by default, and may result in a value one higher than expected.

Also, there’s a bug in your accessing of the counts array, which I believe is unintentional. Should be “counts[random.RollDie() – 1]”.

1. Oops. Never mind about the counts[] comment. It’s been a long Wednesday masquerading as a Monday…

10. dlev says:

Your comments are all very convincing, but I *think* the issue would be that the `ToString()` call on the [0, 6) `double` value is a) dependent on the current culture, and therefore b) may end up generating a representation of the number in scientific notation. If so, then the numbers in [0, 1) are actually going to be distributed roughly evenly amongst 1-9, and so your distribution will be “super weird” (the technical term).

Right or wrong, looking forward to reading the follow-up.

1. True, but it’s the consequences of that super-weird distribution that cause the program to crash.

11. Mike says:

Tip1: It’s not a locale issue.

Tip2: It’s not a rounding issue.

12. Alex Godofsky says:

I’m betting `Double.ToString` doesn’t print at full precision and so will print “6” for some doubles strictly less than 6.

13. Weeble Wilson says:

I think d2.ToString() will produce strings like “8.73425e-02”, meaning that the program is very likely to fail with an index out of bounds exception when it produces a number greater than 6. Even if that were fixed, I’m also concerned there might be values of d1 very close to 1.0 that when multiplied by 6 and printed will round to “6.0”.

14. Andy Lamb says:

Is it a floating point arithmetic error resulting in d2 >= 6.0 and as a result i2 can equal 7?

15. Nick North says:

I’ll dip my toe in the water… I think the problem is with d2.ToString(). This will use scientific notation for the output if d2 is sufficiently small, meaning s1 may be something like “9.4E-07”, which will yield a result of 9.

16. My guess would be that d1 * 6 performs integer multiplication, not double multiplication, and hence would either be 0 or 6, and nothing in-between?

1. Nope. int times double converts the int to double, not the double to int. A common error is to divide one int by another and assign the result to double — 22 / 7 converted to double is 3.0, not 3.14… But that’s not the problem here.

17. Not obvious! Here’s at least one problem: Double.ToString() can return a number in scientific notation, e.g.,

(double) 1e-15).ToString() == “1E-15”

This seems behavior seems to kick in for d2 < 1E-4. So 1 of every 60,000 times, you'd be sampling something like the first significant digit of d2 (i.e., http://en.wikipedia.org/wiki/Benford%27s_law). But that's too little of a difference to be picked up in 6,000,000 samples, so perhaps there's another problem…

18. Dan says:

This line can sometimes produce 6.0 because of double precision rounding.
double d2 = d1 * 6; // Always less than 6.0.
This will cause program to crash with index out of range.

19. Nick DeVore says:

Is the bug in your loop? The documentation for Random states that when you use the parameterless Random() constructor, it uses the system clock as the seed. But the system clock doesn’t detect time differences that are less than 15 milliseconds, and when the random is accessed within a tight loop, there is the high probability that the same “random” number will be generated for a certain number of subsequent calls.

1. Mike C says:

This is true if you’re constructing several Random() objects within the same 15 milliseconds, but not if you’re only constructing one. Those several objects would have the same seed and produce the same sequence, but wouldn’t produce the same number on sequential calls any more than a Random() object constructed with a specified seed would. (It’s random, the same number on sequential calls is possible, although extremely unlikely in the case of Random.NextDouble().

2. This is the most commonly reported problem with Random, but it’s not the problem today.

20. Sam Swain says:

It will almost certainly crash in the test rig with a counts[] bounds fail:
Values below a certain (small) threshold will be returned from ToString in exponential form with the first character being the first significant digit which could be any digit (it’s random) and not 0. Thus char c = s1[0]; can generate ‘0’ through ‘9’, i2 becoming 1 through 10 pushing the index out of range.

21. Sylvarking says:

I’m guessing it has something to do with the conversion from `double` to `string`. Perhaps many numbers between 0 and 1 are rendered using exponential notation, with the end result being not enough 1s and too many of the other numbers.

22. Shawn says:

The test program will throw an out of range exception if the random number is small enough that ToString decides to format it using scientific notation.

23. Copy paste below as is in Linqpad to see the problem. Double has higher precision than int and that is the root of problem here.

void Main()
{
var random = new Random();
int[] counts = new int[7]; // 0 through 6
for (int i = 0; i < 6000000; i += 1)
try{
counts[random.RollDie()] += 1;
}catch(Exception ex)
{
RandomExtensions.PrintVars();
throw;
}
foreach (int count in counts)
Console.WriteLine(count);
}

static class RandomExtensions
{
public static int RollDie(this Random random)
{
ClearVars();
double d1 = random.NextDouble(); // Always less than 1.0
D1 = d1;
double d2 = d1 * 6; // Always less than 6.0.
D2 = d2;
string s1 = d2.ToString();
S1 = s1;
char c = s1[0]; // '0' through '5'
string s2 = c.ToString(); // "0" through "5"
S2 = s2;
int i1 = int.Parse(s2); // 0 through 5
I1 = i1;
int i2 = i1 + 1; // 1 through 6
return i2;
}
static void ClearVars(){
D1 = 0D;
D2 = 0D;
S1=string.Empty;
S2 = string.Empty;
I1 = 0;
}
public static void PrintVars(){
D1.Dump("D1");D2.Dump("D2");S1.Dump("S1");S2.Dump("S2");I1.Dump("I1");
}
public static double D1;
public static double D2;
public static string S1;
public static string S2;
public static int I1;
}

24. Doesn’t double.ToString() use scientific notation? So for instance 0.0084615 would be displayed as 8.4615e+03, meaning s1[0] = 8, then i2 = 9 — not even in the requested range!

1. …well the comments were still empty when I first loaded the page, so I’m totally gonna count this as a win. 😉

1. I had accidentally set the comments to “always require manual approval”. Fixed now!

25. You should get something like following from the snippet –

D1
1.50431878934815E-05

D2
9.02591273608893E-05

S1
9.02591273608893E-05

S2
9

I1
9

26. Kirill says:

counts[0]…

27. Douglas says:

The string representation of very small numbers is nE-m.
So s1[0] is NOT always ‘0’ to ‘5’.

1. Douglas says:

Then I thought to try using:
s1 = d2.ToString(“F”);

But that can be rounded up to 6. Using the “F9” format specifier seems to work, though I can’t prove it.

1. Douglas says:

I can prove that it will not work with “F9” as it rounds up with a simple test using the maximum value of Random.NextDouble

double v = 0.99999999999999978; // Random.NextDouble maximum
Console.WriteLine((v * 6).ToString(“F9”));

Prints 6.000000000.

28. JW says:

(That was my guess upon reading the code, but I had to run the code to get the details, alas.)

I only know this from some badly written VB.NET code I was working through recently where I was changing from using double to decimal and had to deal with some odd conversions in the financial calculations.

The issue is because doing .ToString() on a double has the nice side effect of printing the number in scientific notation if the number is close enough to 0. Such that 0.00000789 becomes 7.89E-6 when you do .ToString() on it. This causes the value to come back outside the range of 1-6 such that 7, 8 and 9 are value values as well. Trying to update these indexes in the array would cause an index out of bounds for the array exception.

30. tore says:

d2.ToString(); could format with an exponent expression I assume.
So 0.00000888 -> “888e-5” or something similar..

31. random.RollDie() starts at 1 not 0 which I suppose is fine since you declared the array to be of size 7. The other thing.. I could not figure out without running the code =/

32. Kirill says:

Aha, is it again related with floating point arithmetic and double d2 = d1 * 6; // Always less than 6.0. <- is not true and could gives you 6

33. Mike C says:

I’d guess it’s due to very small values (approaching 0) being represented by scientific notation when you get to string s1 = d2.ToString();. At that point, the first digit can be anything. Ironically, using a standard numeric format (such as “F”) could introduce a rounding error on the other end, where values approaching 6 might get rounded up to 6, unless you specify a large enough number of decimal places (such as “F20”), but then there’s a performance impact.

34. Ian Menzies says:

I definitely did not spot the defect until I ran it. Even after I ran it, I didn’t quite get what was going wrong until I threw a whole bunch of WriteLines all over the place (I was running the code on .NET Pad).

35. Justin Laster says:

Seeing the .ToString() on those numbers makes me feel like there’s a pit inside my stomach.

In fact, after typing that out, I checked MSDN (RTFM!) and I think it actually gives us exactly what may be happening here:

The very last example as seen tells us that calling .ToString() on a double *may* return a special format for “exponential (scientific) notation.” without *any* arguments.

The format is seen as such:

// Displays 1E-09.

I’m guessing since we’re just pulling the first character out of the “stringified” double, that number could be anything from 1-9 *IF* the library *happens* to use that notation.

Basically, if you call ToString() on a double, and it happens to decide to use that specific notation, you could get a number, say 8 which doesn’t fit in our counts[] array. You’ll get a nice element out of bounds or something.

Why the library defaults to this behavior as laid out on that page (assuming it is correct) without any explicit invocation and how it decides to that is questionable, in my opinion — but that’s another discussion.

This may be extremely unlikely and culture dependent — the MSDN documentation doesn’t state under what circumstances this happens — so you may not have even run into it. If that isn’t it, it probably certainly is a bug you’d want to fix in your super awesome code :).

36. Jean Hominal says:

Here is my guess: after the `ToString` call, the numbers in question may end up in scientific notation if they are small enough.
Meaning that 1 will happen less often than expected, while 10 will apparently be a possible value on a 6-face die.

37. Robert Sharp says:

It’s a sneaky bug though. Out of 6 million, only 41 rolls were out of bounds when I ran it (once I made the array large enough to handle the output).

1. And another few dozen were scientific notation that ended up producing 2 through 5 anyways.

38. Arseny Kapoulkine says:

The followup should be “Here, I fixed it: s1 = d2.ToString(“f”)… Or did I?”

39. Daniel Brückner says:

I also expected being able to crash `Int23.Parse()` by going to the number format setting in the control panel, disable leading zeros and replace the decimal separator with my favorite character. Now `(-1.2).ToString()` returns foo1bar2 on my system but `Double.ToString()` keeps ignoring my leading zero setting.

my first thought when seeing the code was culture dependent ToString too. I think that comes from Resharper always warning me about it.

40. `int i2 = (int)d2 + 1;`

Problem solved \o/ ;p

41. Jorge says:

Sorry if someone has said it!
Here is my shot

double d2 = d1 * 6;

This seems very wrong…
If d1 is < 1 then you can get 0.1 and that * 6 will still be 0.
So essentially you have around 83% of having a number between 1 and 5 instead of 80% (0.83*6 = 4.98)
Also tested the probability of having a 1 and 0 and that was totally skewed from the "normal" dice probabilities.

Eric, when are you posting the solution? :p

42. Ariel Ben-Yehuda says:

And the lesson is – don’t use doubles where you can’t handle inaccuracy unless you are planning to run your program through a proof-checker and your name is Dan J. Bernstein.

43. My answer was that the code was using a culture based version of ToString and maybe there’s a culture that formats numbers weirdly.

Looking at the actual answer, I was sorta correct.

44. If I’m not mistaken, this could be solved by using a minimum format specifier of `F`. That will always produce an output of `0.XY` in this case, where X is the number we care about. We can’t use `F1` due to rounding which could skew results.

1. Oh, except that would break for the upper bound of 0.99, which would round to 1. What format string could be used in this case? `R`?

1. Ah, but can 5.9999999999999991 be the result of multiplying random.NextDouble() by six?

🙂

1. Sorry, I missed that remark on MSDN. But we can have the same effect with 0.99999999999999978 or even 0.99999999999999967:
``` 0.99999999999999978 * 6 ==> 5.9999999999999982 5.9999999999999982.ToString() ==> 6 5.9999999999999982.ToString("R") ==> 5.9999999999999982 ```

45. CodesInChaos says:

> other than the sheer ludicrousness of not using Next(min, max) like a sensible person

It’s not like Next(max) or Next(min, max) are free of biases either.
My favourite example is `Next(1431655765) % 2` which contrary to expectation doesn’t output 0 and 1 with roughly the same probability.

Considering this and the difficulty of producing a unique 31 bit seed (random 31 bit values collide frequently), I’d argue that a sensible person should drop `System.Random` altogether.

46. Les Noland says:

What I don’t understand is why you would have gone through all those gyrations with strings and characters to begin with. Why not just cast the double to an int?

``` public static int RollDie(this Random random) { double d1 = random.NextDouble(); // Always less than 1.0 double d2 = d1 * 6; // Always less than 6.0. int i1 = (int)d2; int i2 = i1 + 1; // 1 through 6 return i2; } ```

– Les

1. The program as given is completely ridiculous; there are many, many better ways to solve this problem and you’ve found one of them. However, the program was not entirely contrived; I’ve seen real code that tried to examine the first character of the string form of a double and had bugs as a result.