Software Testing Blog

Spot the defect: randomness

Today on Ask The Bug Guys I’m going to turn things around a bit and ask you to find and explain the bug.

Suppose we want to generate a series of pseudo-random integers between 1 and 6 (inclusive) to simulate rolling a fair die. In the class library there is the useful random.Next(min, max) method which does precisely that. (You have to remember to call it as random.Next(1, 7) because the lower bound is inclusive and the upper bound is exclusive.) But I can do way better than that. Here is my super-awesome implementation:

using System;
static class RandomExtensions
{
    public static int RollDie(this Random random)
    {
        double d1 = random.NextDouble(); // Always less than 1.0
        double d2 = d1 * 6;              // Always less than 6.0.
        string s1 = d2.ToString();
        char c = s1[0];           // '0' through '5'
        string s2 = c.ToString(); // "0" through "5"
        int i1 = int.Parse(s2);   // 0 through 5
        int i2 = i1 + 1;          // 1 through 6
        return i2;
    }
}

I know what you are thinking: wow, Eric, just… wow, that is some insanely great code right there. Such an elegant and powerful solution with no room for improvement.

But I digress. I’m not at all concerned about correctness because this code is super awesome, but I am a bit concerned that this might be a biased die, so I’m going to do some tests. I’ll roll the die six million times, and I should get about a million of each number, right?

class Program
{
    static void Main()
    {
        var random = new Random();
        int[] counts = new int[7]; // 0 through 6
        for (int i = 0; i < 6000000; i += 1)
            counts[random.RollDie()] += 1;
        foreach (int count in counts)
            Console.WriteLine(count);
    }
}

The expected behavior of this program is to print zero followed by six numbers each within a percent or so of 1000000. Perhaps it is needless to say: that is not its actual behavior.

There is a serious bug — other than the sheer ludicrousness of not using Next(min, max) like a sensible person — in this implementation somewhere. Can you spot the defect without running the code and predict the behavior of this program? Leave your thoughts in the comments, and don’t read the comments if you don’t want spoilers.

  1. I’m not a C# developer and haven’t checked the documentation, but I’m guessing it has something to do with double’s toString implementation. Likely, numbers below a certain threshold will be converted in scientific notation (2.3e-2 or the like), and so taking the first character isn’t going to give the integral portion of the double.

  2. The program will crash. ToString can use scientific notation like “8E-10” so the first character is not always less than 6.

  3. Well, the line `string s1 = d2.ToString();` _assumes_ that you get some result like:

    1.3874398742
    0.2384792342
    3.9475498574

    But you can easily get numbers _very close_ to zero, which would output using exponent notation, for example:

    1.18073749923182E-06
    4.00615346757982E-05
    3.5634721264073E-05

    Then the rest of the code takes the first character (in the above examples, it would be “1”, “4”, “3”). In the best case scenario, this would bias your results (the above _should_ be treated like zero, but instead they incorrectly increase 1, 4, and 3)

    But in reality, you’re just as likely to get results like these:

    8.90661031422513E-05
    9.18013975451707E-05
    7.23050907590916E-05

    Which would give you 8, 9, 7. The rest of the code in turn would return “random” values of 9, 10, and 8 respectively despite you explicitly asking for numbers between 1 and 6 inclusive. Naturally, this could _easily_ introduce errors later in your code. A great example is the testing code you provided. The line:

    counts[random.RollDie()] += 1;

    Would result in an IndexOutOfRangeException when you tried to increment the 9th index of your 6-length array.

    1. Woops, I’ll admit it. I found this by running the code; I had skimmed over your challenge to spot it without running it! I hadn’t noticed that aspect until I reread your question after posting. Feel free to disregard my answer. 🙂

  4. String formatting of binary floating-point will truncate precision by default, and may result in a value one higher than expected.

    Also, there’s a bug in your accessing of the counts array, which I believe is unintentional. Should be “counts[random.RollDie() – 1]”.

  5. Your comments are all very convincing, but I *think* the issue would be that the ToString() call on the [0, 6) double value is a) dependent on the current culture, and therefore b) may end up generating a representation of the number in scientific notation. If so, then the numbers in [0, 1) are actually going to be distributed roughly evenly amongst 1-9, and so your distribution will be “super weird” (the technical term).

    Right or wrong, looking forward to reading the follow-up.

  6. I’m betting Double.ToString doesn’t print at full precision and so will print “6” for some doubles strictly less than 6.

  7. I think d2.ToString() will produce strings like “8.73425e-02”, meaning that the program is very likely to fail with an index out of bounds exception when it produces a number greater than 6. Even if that were fixed, I’m also concerned there might be values of d1 very close to 1.0 that when multiplied by 6 and printed will round to “6.0”.

  8. I’ll dip my toe in the water… I think the problem is with d2.ToString(). This will use scientific notation for the output if d2 is sufficiently small, meaning s1 may be something like “9.4E-07”, which will yield a result of 9.

    1. Nope. int times double converts the int to double, not the double to int. A common error is to divide one int by another and assign the result to double — 22 / 7 converted to double is 3.0, not 3.14… But that’s not the problem here.

  9. Not obvious! Here’s at least one problem: Double.ToString() can return a number in scientific notation, e.g.,

    (double) 1e-15).ToString() == “1E-15”

    This seems behavior seems to kick in for d2 < 1E-4. So 1 of every 60,000 times, you'd be sampling something like the first significant digit of d2 (i.e., http://en.wikipedia.org/wiki/Benford%27s_law). But that's too little of a difference to be picked up in 6,000,000 samples, so perhaps there's another problem…

  10. This line can sometimes produce 6.0 because of double precision rounding.
    double d2 = d1 * 6; // Always less than 6.0.
    This will cause program to crash with index out of range.

  11. Is the bug in your loop? The documentation for Random states that when you use the parameterless Random() constructor, it uses the system clock as the seed. But the system clock doesn’t detect time differences that are less than 15 milliseconds, and when the random is accessed within a tight loop, there is the high probability that the same “random” number will be generated for a certain number of subsequent calls.

    1. This is true if you’re constructing several Random() objects within the same 15 milliseconds, but not if you’re only constructing one. Those several objects would have the same seed and produce the same sequence, but wouldn’t produce the same number on sequential calls any more than a Random() object constructed with a specified seed would. (It’s random, the same number on sequential calls is possible, although extremely unlikely in the case of Random.NextDouble().

  12. It will almost certainly crash in the test rig with a counts[] bounds fail:
    Values below a certain (small) threshold will be returned from ToString in exponential form with the first character being the first significant digit which could be any digit (it’s random) and not 0. Thus char c = s1[0]; can generate ‘0’ through ‘9’, i2 becoming 1 through 10 pushing the index out of range.

  13. I’m guessing it has something to do with the conversion from double to string. Perhaps many numbers between 0 and 1 are rendered using exponential notation, with the end result being not enough 1s and too many of the other numbers.

  14. The test program will throw an out of range exception if the random number is small enough that ToString decides to format it using scientific notation.

  15. Copy paste below as is in Linqpad to see the problem. Double has higher precision than int and that is the root of problem here.

    void Main()
    {
    var random = new Random();
    int[] counts = new int[7]; // 0 through 6
    for (int i = 0; i < 6000000; i += 1)
    try{
    counts[random.RollDie()] += 1;
    }catch(Exception ex)
    {
    RandomExtensions.PrintVars();
    throw;
    }
    foreach (int count in counts)
    Console.WriteLine(count);
    }

    static class RandomExtensions
    {
    public static int RollDie(this Random random)
    {
    ClearVars();
    double d1 = random.NextDouble(); // Always less than 1.0
    D1 = d1;
    double d2 = d1 * 6; // Always less than 6.0.
    D2 = d2;
    string s1 = d2.ToString();
    S1 = s1;
    char c = s1[0]; // '0' through '5'
    string s2 = c.ToString(); // "0" through "5"
    S2 = s2;
    int i1 = int.Parse(s2); // 0 through 5
    I1 = i1;
    int i2 = i1 + 1; // 1 through 6
    return i2;
    }
    static void ClearVars(){
    D1 = 0D;
    D2 = 0D;
    S1=string.Empty;
    S2 = string.Empty;
    I1 = 0;
    }
    public static void PrintVars(){
    D1.Dump("D1");D2.Dump("D2");S1.Dump("S1");S2.Dump("S2");I1.Dump("I1");
    }
    public static double D1;
    public static double D2;
    public static string S1;
    public static string S2;
    public static int I1;
    }

    1. Then I thought to try using:
      s1 = d2.ToString(“F”);

      But that can be rounded up to 6. Using the “F9” format specifier seems to work, though I can’t prove it.

      1. I can prove that it will not work with “F9” as it rounds up with a simple test using the maximum value of Random.NextDouble

        double v = 0.99999999999999978; // Random.NextDouble maximum
        Console.WriteLine((v * 6).ToString(“F9”));

        Prints 6.000000000.

  16. Your faith in ToString() will be your undoing…

    (That was my guess upon reading the code, but I had to run the code to get the details, alas.)

  17. I only know this from some badly written VB.NET code I was working through recently where I was changing from using double to decimal and had to deal with some odd conversions in the financial calculations.

    The issue is because doing .ToString() on a double has the nice side effect of printing the number in scientific notation if the number is close enough to 0. Such that 0.00000789 becomes 7.89E-6 when you do .ToString() on it. This causes the value to come back outside the range of 1-6 such that 7, 8 and 9 are value values as well. Trying to update these indexes in the array would cause an index out of bounds for the array exception.

  18. Aha, is it again related with floating point arithmetic and double d2 = d1 * 6; // Always less than 6.0. <- is not true and could gives you 6

  19. I’d guess it’s due to very small values (approaching 0) being represented by scientific notation when you get to string s1 = d2.ToString();. At that point, the first digit can be anything. Ironically, using a standard numeric format (such as “F”) could introduce a rounding error on the other end, where values approaching 6 might get rounded up to 6, unless you specify a large enough number of decimal places (such as “F20”), but then there’s a performance impact.

  20. I definitely did not spot the defect until I ran it. Even after I ran it, I didn’t quite get what was going wrong until I threw a whole bunch of WriteLines all over the place (I was running the code on .NET Pad).

  21. Seeing the .ToString() on those numbers makes me feel like there’s a pit inside my stomach.

    In fact, after typing that out, I checked MSDN (RTFM!) and I think it actually gives us exactly what may be happening here:

    http://msdn.microsoft.com/en-us/library/3hfd35ad(v=vs.110).aspx

    The very last example as seen tells us that calling .ToString() on a double *may* return a special format for “exponential (scientific) notation.” without *any* arguments.

    The format is seen as such:

    // Displays 1E-09.

    I’m guessing since we’re just pulling the first character out of the “stringified” double, that number could be anything from 1-9 *IF* the library *happens* to use that notation.

    Basically, if you call ToString() on a double, and it happens to decide to use that specific notation, you could get a number, say 8 which doesn’t fit in our counts[] array. You’ll get a nice element out of bounds or something.

    Why the library defaults to this behavior as laid out on that page (assuming it is correct) without any explicit invocation and how it decides to that is questionable, in my opinion — but that’s another discussion.

    This may be extremely unlikely and culture dependent — the MSDN documentation doesn’t state under what circumstances this happens — so you may not have even run into it. If that isn’t it, it probably certainly is a bug you’d want to fix in your super awesome code :).

  22. Here is my guess: after the `ToString` call, the numbers in question may end up in scientific notation if they are small enough.
    Meaning that 1 will happen less often than expected, while 10 will apparently be a possible value on a 6-face die.

  23. It’s a sneaky bug though. Out of 6 million, only 41 rolls were out of bounds when I ran it (once I made the array large enough to handle the output).

  24. I also expected being able to crash Int23.Parse() by going to the number format setting in the control panel, disable leading zeros and replace the decimal separator with my favorite character. Now (-1.2).ToString() returns foo1bar2 on my system but Double.ToString() keeps ignoring my leading zero setting.

    1. my first thought when seeing the code was culture dependent ToString too. I think that comes from Resharper always warning me about it.

  25. Skipping all the comments without reading…
    Sorry if someone has said it!
    Here is my shot

    double d2 = d1 * 6;

    This seems very wrong…
    If d1 is < 1 then you can get 0.1 and that * 6 will still be 0.
    So essentially you have around 83% of having a number between 1 and 5 instead of 80% (0.83*6 = 4.98)
    Also tested the probability of having a 1 and 0 and that was totally skewed from the "normal" dice probabilities.

    Eric, when are you posting the solution? :p

  26. And the lesson is – don’t use doubles where you can’t handle inaccuracy unless you are planning to run your program through a proof-checker and your name is Dan J. Bernstein.

  27. If I’m not mistaken, this could be solved by using a minimum format specifier of F. That will always produce an output of 0.XY in this case, where X is the number we care about. We can’t use F1 due to rounding which could skew results.

          1. Sorry, I missed that remark on MSDN. But we can have the same effect with 0.99999999999999978 or even 0.99999999999999967:

            0.99999999999999978 * 6 ==> 5.9999999999999982
            5.9999999999999982.ToString() ==> 6
            5.9999999999999982.ToString("R") ==> 5.9999999999999982

  28. > other than the sheer ludicrousness of not using Next(min, max) like a sensible person

    It’s not like Next(max) or Next(min, max) are free of biases either.
    My favourite example is `Next(1431655765) % 2` which contrary to expectation doesn’t output 0 and 1 with roughly the same probability.

    Considering this and the difficulty of producing a unique 31 bit seed (random 31 bit values collide frequently), I’d argue that a sensible person should drop `System.Random` altogether.

  29. What I don’t understand is why you would have gone through all those gyrations with strings and characters to begin with. Why not just cast the double to an int?


    public static int RollDie(this Random random)
    {
    double d1 = random.NextDouble(); // Always less than 1.0
    double d2 = d1 * 6; // Always less than 6.0.
    int i1 = (int)d2;
    int i2 = i1 + 1; // 1 through 6
    return i2;
    }

    – Les

    1. The program as given is completely ridiculous; there are many, many better ways to solve this problem and you’ve found one of them. However, the program was not entirely contrived; I’ve seen real code that tried to examine the first character of the string form of a double and had bugs as a result.

Leave a Reply

Your email address will not be published. Required fields are marked *