Software Testing Blog

Increment in C#, C and C++

(Throughout this post, everything applies equally well to -- as it does to ++, and this article only considers side effect ordering in single-threaded programs.)

One of the bug questions I get most frequently from new C# developers is:

I’m porting some C code to C# and it doesn’t work right in C#. The code is M(x++, ++x); and it does something different in C# than in C. Is this a bug in the C# compiler?

No, it’s a bug in the original C code. Or, at least, a bug waiting to happen.

C does not define the order in which side effects such as increments are observed to occur. Or rather, the C standard defines a number of points in the code called “sequence points” and requires that side effects happen in order with respect to sequence points, but otherwise they may be arbitrarily reordered. In the statement given, the call to M and the semicolon are sequence points; the side effects of the two increments can happen in any order, but they must both happen before control passes to M, and the side effects of M must happen before the next statement. Other than that, the compiler is free to reorder as it sees fit. (Strictly speaking, because x is modified twice with no sequence point in between, the C standard leaves the behavior of this statement entirely undefined, so the orderings discussed in this post are only some of the permitted outcomes.)

In short, the rules in C for x++ are:

  • x is only evaluated once
  • x must be an lvalue
  • temp = x;
  • incremented = temp + 1;
  • x = incremented; happens any time before the next sequence point
  • the result is the value of temp

The rules in C for ++x are exactly the same except that the final rule is replaced by “the result is the value of incremented”.

The rules for both in C# are exactly the same as the rules in C, except that the second-last rule is replaced by “happens immediately”. (Also, C# programmers would say “variable”, not “lvalue”, and C# has special rules for applying the increment operator to properties that we are going to ignore.)
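The six steps can be written out as ordinary functions. This is a minimal sketch in C++ that follows the C# rule of storing immediately; the names post_increment and pre_increment are made up for illustration and are not part of either language.

```cpp
// Hypothetical helpers that spell out the steps for an int variable.
// The store to x happens immediately, as C# requires.
int post_increment(int& x) {      // models x++
    int temp = x;                 // read the variable once
    int incremented = temp + 1;   // compute the new value
    x = incremented;              // store it right away
    return temp;                  // the result is the original value
}

int pre_increment(int& x) {       // models ++x
    int temp = x;                 // read the variable once
    int incremented = temp + 1;   // compute the new value
    x = incremented;              // store it right away
    return incremented;           // the result is the new value
}
```

The two functions differ only in their return statement, which is the whole difference between the postfix and prefix forms.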

C# also requires that method call arguments be computed left to right; C does not require any order.

So in C# what must happen here, assuming that x starts off as zero, is:

  • x++ assigns 1 to x and produces 0
  • ++x assigns 2 to x and produces 2
  • M(0, 2) is called

In C it would be perfectly legal for the compiler to generate code like this:

  • ++x assigns 1 to x and produces 1
  • x++ assigns 2 to x and produces 1
  • M(1, 1) is called

Notice that in both cases so far, x at least ends up as two. The following ordering is stranger, but still legal in C:

  • x++ produces 0 …
  • ++x assigns 1 to x and produces 1
  • … and assigns 1 to x
  • M(0, 1) is called

It is very strange to increment a variable twice and have one of the increments be lost, but such a compiler would be legal!
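To make the three orderings concrete, here is a sketch that spells each one out as a sequence of separate statements, so the code itself is well-defined C++. The Outcome type and the function names are hypothetical; they only simulate what a compiler might choose to do with M(x++, ++x) when x starts at zero.

```cpp
// Outcome is a hypothetical helper type used only for this illustration.
struct Outcome {
    int first;    // value passed for x++
    int second;   // value passed for ++x
    int final_x;  // value of x when M is called
};

// Left to right with immediate stores: the order C# requires.
Outcome left_to_right() {
    int x = 0;
    int first = x;        // x++ produces 0 ...
    x = first + 1;        // ... and assigns 1 to x
    x = x + 1;            // ++x assigns 2 to x ...
    int second = x;       // ... and produces 2
    return {first, second, x};
}

// Right to left: ++x is evaluated before x++.
Outcome right_to_left() {
    int x = 0;
    x = x + 1;            // ++x assigns 1 to x ...
    int second = x;       // ... and produces 1
    int first = x;        // x++ produces 1 ...
    x = first + 1;        // ... and assigns 2 to x
    return {first, second, x};
}

// Deferred store: the store belonging to x++ is delayed until after ++x.
Outcome deferred_store() {
    int x = 0;
    int first = x;            // x++ produces 0 ...
    int pending = first + 1;  // ... but its store of 1 is held back
    x = x + 1;                // ++x assigns 1 to x ...
    int second = x;           // ... and produces 1
    x = pending;              // the delayed store writes 1 again
    return {first, second, x};
}
```

In the last function the second store overwrites a value that was already 1, which is how one of the two increments gets lost.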

You cannot expect that your C program will do the same thing when you port it directly to C#, but that’s because you can’t expect that your C program will do the same thing when you use a different C compiler either. Some C compilers have the same behavior as C#, but they are not required to and many do not. If you write your C programs so that it does not matter what choices the compiler makes, you’ll have a better chance of writing a portable, understandable and correct program.
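One way to remove the compiler's choices is to give each increment its own statement. The sketch below is written in C++ but the same shape works in C; M is a hypothetical stand-in for the function in the question, and call_in_fixed_order is an illustrative name.

```cpp
// Hypothetical stand-in for M; it just combines its arguments so the
// values that were passed can be observed.
int M(int a, int b) { return a * 10 + b; }

int call_in_fixed_order(int& x) {
    int first = x++;    // a complete statement: x is updated before the next line
    int second = ++x;   // a second complete statement
    return M(first, second);  // the arguments no longer have side effects
}
```

Starting from x equal to zero, this calls M(0, 2) and leaves x as 2 on any conforming compiler, which matches what C# does with the original expression.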

Now let’s bring C++ into it, and talk about operator overloading. C# and C++ do operator overloading of ++ completely differently, so this is also a common source of bugs when porting code.

C++ allows great latitude in how you implement an increment operator and describing all of them would take too long. Let’s stick to the mainstream; the way you typically overload ++ in C++ is to follow this pattern:

class D
{
  ...
  D& operator++() {
    // [somehow mutate this]
    return *this;
  }
  D operator++(int)
  {
    // [make a copy of current state]
    // [mutate this]
    // return the copy
  }
};

I think even fans of C++ would admit that this is quite bizarre. First off, the “int” is just a marker that means “this one is the postfix version”, which is a bit of an odd convention. What is somewhat more odd is: the operator bodies are expected to both mutate their internal state and return the correct value. Moreover, the prefix operator typically returns an lvalue, not an rvalue. The long and short of it is: in C++ you are expected to implement all the semantics of ++ in your overloaded operator, because the compiler basically treats a usage of ++ on an lvalue of type D as a call to the overloading method.
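As a concrete instance of the pattern, here is a minimal sketch with the placeholders filled in for a simple counter. Counter and its members are hypothetical names chosen for illustration.

```cpp
// A hypothetical type that follows the usual C++ increment pattern.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}

    // Prefix: mutate this object and return it as an lvalue.
    Counter& operator++() {
        ++value_;
        return *this;
    }

    // Postfix: the unused int parameter only marks this as the postfix form.
    Counter operator++(int) {
        Counter old = *this;  // copy the current state
        ++*this;              // reuse the prefix operator for the mutation
        return old;           // return the copy by value
    }

    int value() const { return value_; }

private:
    int value_;
};
```

Both operators have to mutate the object and produce the right result themselves; the compiler does none of that work for them.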

The C# approach is considerably simpler and based on the observation that the only part that is unique to a given type is the step where the incremented value is computed.  To overload an increment operator in C# you simply provide a method that the compiler can call when it needs to execute that step:

public static MyNumber operator++(MyNumber original)
{
  return original + one; // whatever that is; this is the custom logic
}

The overloaded operator method does not mutate the original. Rather, it takes a value and returns the next value. The mutation of the original variable is handled by the compiler in the fifth step of the list above (x = incremented). When the C# compiler sees an overloaded ++ operator it goes through the same six steps as before; it just replaces the fourth step (incremented = temp + 1) with a call to the helper method to do the addition. Thus C# does not require separate methods for prefix and postfix increment; remember, the only difference between them is whether the original or incremented value is returned.
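The division of labor can be modeled in C++ for comparison. This is only a sketch of the behavior described above, not what the C# compiler actually emits; next_value plays the role of the user-supplied operator, and compiler_postfix and compiler_prefix are hypothetical names for the steps the compiler wraps around it.

```cpp
// A hypothetical value type.
struct MyNumber {
    int n;
};

// Plays the role of the user-written C# operator++: it does not touch
// its argument, it only computes the next value.
MyNumber next_value(MyNumber original) {
    return MyNumber{original.n + 1};
}

// The steps the compiler performs for variable++.
MyNumber compiler_postfix(MyNumber& variable) {
    MyNumber temp = variable;                 // read the variable once
    MyNumber incremented = next_value(temp);  // the only user-supplied step
    variable = incremented;                   // the compiler does the store
    return temp;                              // postfix yields the original
}

// The steps the compiler performs for ++variable.
MyNumber compiler_prefix(MyNumber& variable) {
    MyNumber temp = variable;
    MyNumber incremented = next_value(temp);
    variable = incremented;
    return incremented;                       // prefix yields the new value
}
```

One pure function serves both forms, which is why a single operator declaration is enough in C#.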

Because C++ and C# handle this completely differently, this is a common source of bugs when porting C++ code to C#. But more importantly it means that a user-defined ++ operator in C# should never be used to mutate the state of the operand. In C#, doing ++ on a variable mutates the variable, not the object that a variable of reference type refers to. If that variable is of reference type, do not use ++ on it unless you expect it to be referring to a different object afterwards. Working against the tool by doing something like this on a reference type:

public static MyDatabaseCursor operator++(MyDatabaseCursor original)
{
  original.Advance();
  return original;
}

is not idiomatic in C#. This should instead be something like:

public static MyDatabaseCursor operator++(MyDatabaseCursor original)
{
  return new MyDatabaseCursor(original.row + 1);
}
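The practical difference between the two versions shows up with the postfix form. The sketch below models it in C++, using std::shared_ptr to stand in for a C# reference; Cursor, next_mutating, next_fresh and postfix are all hypothetical names, and the postfix function imitates the steps the C# compiler performs.

```cpp
#include <memory>

// A hypothetical cursor; CursorRef stands in for a C# reference to it.
struct Cursor {
    int row;
};
using CursorRef = std::shared_ptr<Cursor>;

// Mutating style: advances the object and returns the same reference.
CursorRef next_mutating(CursorRef original) {
    original->row += 1;
    return original;
}

// Non-mutating style: leaves the original alone and returns a new object.
CursorRef next_fresh(CursorRef original) {
    return std::make_shared<Cursor>(Cursor{original->row + 1});
}

// The postfix steps, with the user-supplied operator passed in as next.
template <typename Next>
CursorRef postfix(CursorRef& variable, Next next) {
    CursorRef temp = variable;  // remember the original reference
    variable = next(temp);      // store whatever the operator returns
    return temp;                // postfix yields the original reference
}
```

With next_mutating, the reference returned as the “old” value and the variable both point at the same, already advanced, object, so the old position is gone. With next_fresh, the old reference still sees the original row and the variable refers to a new object.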

More generally, though, the whole thing is not idiomatic in C#. Please do not make “cute” operator overloads, like overloading ++ on a purchase order to mean “add another stapler to the order”. The point of operator overloading is to allow you to write mathematical logic using mathematical symbols, not to write obscure shorthands for business logic that have only a tangential connection to mathematics.  The best practice here is to restrict overloading ++ to types that are logically numbers of some sort.

For more thoughts on this subject, see http://ericlippert.com/tag/precedence/.

  1. “In C it would be perfectly legal for the compiler to generate code like this:”

    Actually, because the C program contains two modifications of the same lvalue (x) without an intervening sequence point, the program is completely undefined and could do absolutely anything at all, including producing completely bogus values for x. For example, it could first calculate the value for “x++” (0) and also determine that it will have to assign 1 to x; next calculate the value for “++x” (1) and assign 1 to x, and finally assign the previously calculated value to x, leaving x to be 1 after the two increments. Or it could use the calculated value after the “x++” to calculate the result of “++x” (2) but still assign the temporary to x, yielding the call M(0, 2) but leaving x as 1 at the end.

    Or it could format your hard disk, because undefined behavior.

    “I think even fans of C++ would admit that this is quite bizarre.”

    Not really, no.

    “First off, the ‘int’ is just a marker that means ‘this one is the postfix version’, which is a bit of an odd convention.”

    Yes, it’s a bit odd. But the justification in “Design & Evolution of C++” makes sense.

    “What is somewhat more odd is: the operator bodies are expected to both mutate their internal state and return the correct value.”

    That’s not odd. That’s perfectly consistent with how every other operator in C++ is overloaded.

    “Moreover, the prefix operator typically returns an lvalue, not an rvalue.”

    This isn’t odd either. Prefix increment yields an lvalue for primitives in C++ (++++i is legal, if inadvisable), so it makes sense that the overloaded version does too.

    “The long and short of it is: in C++ you are expected to implement all the semantics of ++ in your overloaded operator, because the compiler basically treats a usage of ++ on an lvalue of type D as a call to the overloading method.”

    Just as operator overloading works for every other operator in C++. An idea that gives us expression templates and embedded DSLs.

    “The C# approach is considerably simpler”

    True when it comes to the implementation, but in a way it’s also less intuitive: the operator++ function doesn’t do what the ++ operator does. There are two similarly weird operators in C++: operator new and operator ->. Both are that way because they have to deal with arguments that can’t easily be expressed as parameters (a type in the former case, an identifier in the latter).
    Also, it’s less efficient: what’s the point in having an in-place operation if it is not implemented in-place under the hood?

    1. Well there’s a perfectly fine idiom in C++ for how to implement the post-increment in terms of pre-increment and a copy without duplicating any work.

      T operator++(int) {
        T old = *this; // this creates a copy
        ++*this;
        return old;
      }

      Which I don’t find that odd – the “int” marker always feels off, but I’m not sure how you’d want to change the syntax to deal with this better.

  2. “C# also requires that method call arguments be computed left to right”

    Would it be worth noting the difference between the latest version of C# and previous versions when it comes to named arguments? My understanding is that prior to C# 5 it evaluated named arguments first and then the positional arguments, but with C# 5 it’s now necessarily left-to-right. Is it left-to-right as it’s named in the function or the call? If I gave function(int a = 1, int b = 1) and I call function(b: c(), a:d()), what is the order of the evaluation? I did a quick test and it appears it’s left-to-right as called, not as declared:

    > Func<int, int, int> function = (a, b) => a + b;
    > Func<int> f1 = () => { Console.WriteLine("f1"); return 1; };
    > Func<int> f2 = () => { Console.WriteLine("f2"); return 2; };
    > function(arg2: f2(), arg1: f1())
    f2
    f1
    3

    That might be worth calling out.

    1. The intention was always that argument side effects would happen in textually left-to-right order. In C# 4, which added the feature, there were some bugs in the code generator which could cause some side effects to occur in the wrong order. To my knowledge those bugs were all fixed for C# 5.

  3. Ok, not an intelligent comment like the others (maybe delete it after reading?), just pointing out a typo:

    “referring to a difference reference afterwards”

    difference -> different

  4. For what it’s worth, I always implemented «// [mutate this]» in the postincrement operator as «++*this;», to eliminate some otherwise duplicated code. You could implement pre in terms of post instead, but it’s unlikely to be as efficient, except in the simplest of cases.

    Also, nitpick: In both C++ operators, the thing being mutated is *this. Bad Things will happen if you manage to mutate this.

  5. The reason I put this into the category of “you shouldn’t be doing that in the first place” is because it’s dangerous to write expressions that include operands with side effects. The problem is that the order in which the operands of individual operators are evaluated is undefined. As with many of the things that are left “undefined” in C and C++, this is to allow compilers to optimize the code without unnecessary constraints. I would then argue that this isn’t really a difference between the C++ and C# languages. It just happens to be a difference in the undefined behavior from different compiler implementations.

    1. I want to start by saying that I totally agree with your conclusion: you shouldn’t be doing that in the first place in C, C++ or C#. (Or Java or JavaScript or…) In all those languages, a sufficient reason to avoid those idioms is that it is confusing to the reader of the code. You don’t need a better reason than that! But in C/C++ you have the additional reason that it is undefined behaviour. In more modern languages like C#, Java, and so on, order of side effects in single-threaded programs is defined behaviour, and not just the whim of the compiler writer.
