Software Testing Blog

Ontogeny, phylogeny and virtual methods

Today on Ask The Bug Guys, a question I get occasionally, particularly from C++ programmers learning C#:

I’ve heard that it’s a bad idea to call a virtual method from a constructor in C++ (or C#). Does the same advice apply to C# (or C++)?


Great question. Calling a virtual method from a constructor in an unsealed class is a bad practice in both C++ and C# (and also in Java, though I won’t discuss the Java details today) but the reasons why are subtly different.

There is a now-discredited hypothesis of biology that the development of an individual organism is a replay in miniature of the large-scale evolutionary history of the species. That is, in utero the foetus first resembles a single-celled organism, then a fish, and so on. This hypothesis is usually summarized as “ontogeny recapitulates phylogeny”, and it turns out to be false for biological systems. However, it is true in C++! Typically when writing a C++ program you write a base class first, then a derived class, then a more derived class, and so on; this is the phylogeny of the class hierarchy. That process is then repeated in miniature when an instance of the most derived class is created; the ontogeny is: while the base constructor is running, the object behaves like an instance of the base class, then when the derived constructor is running the object behaves like an instance of the derived class. And when the constructors are all finished the object behaves like an instance of the most derived class.

Let’s look at an example:

#include <iostream>
class B
{
public:
  virtual void M()
  {
    std::cout << "B" << std::endl;
  }
  B()
  { 
    M();
  }
};
class D : public B
{
public:
  virtual void M()
  {
    std::cout << "D" << std::endl;
  }
  D() : B()
  {
    M();
  }
};
int main(int argc, char* argv[])
{
  new D(); // B - D
  return 0;
}

The constructor for D first invokes the constructor for B, and while that constructor runs, the virtual method slot for M still refers to B::M! The ontogeny of the object is recapitulating the phylogeny of the class hierarchy. When the B constructor completes, the virtual slot is rewritten to refer to D::M, which is then invoked by the D constructor.

By contrast, C# does not have this idea that the object progressively takes on different types during its construction. In C# an object is of its most-derived type from the moment the memory allocator creates storage for an instance of a class. And thus the seemingly-equivalent program in C# has different behavior:

using System;
class B
{
  public virtual void M()
  {
    Console.WriteLine("B");
  }
  public B()
  { 
    M();
  }
}
class D : B
{
  public override void M()
  {
    Console.WriteLine("D");
  }
  public D() : base()
  {
    M();
  }
}
class P
{
  static void Main()
  {
    new D(); // D - D
  }
}

When the B constructor calls M, the virtual slot already refers to D.M; it never refers to B.M in an instance of D.

Clearly this is a bit of a “gotcha” for new C# programmers who are used to the way C++ does it (or vice versa), but why is this a bad practice in both languages?

Because it is confusing, surprising and dangerous. In C# programs we have a situation where code in B is calling a method of D before D’s constructor has run! That method might depend for its correctness or safety on some initialization that is performed in D’s constructor. One imagines something like:

class D : B
{
  // foo and bar are fields assigned in D's constructor (not shown).
  public override void M()
  {
    Console.WriteLine(this.foo + this.bar);
  }
}

where foo and bar are initialized in D’s constructor. If the initialization code has not run yet then the fields will still have their default values, which could be completely wrong. Such a bug could go unnoticed for a long time. C++ doesn’t have this problem because a method in D is not called before D’s constructor runs, but that is hardly better. Many C++ developers are unaware of this unusual feature of C++, and it could be surprising to them. One might reasonably expect that an overridden virtual method will always call the most-overriding method, not the least-overriding method. If the most-overriding version of a method performs a security check that the base class does not then there could again be a serious, hard-to-spot bug in the program.
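The C++ half of that argument can be made concrete with a small sketch. All of the names here (Resource, Guarded, Check, ConstructAndCheck) are hypothetical and chosen only for illustration; the point is that the call made inside the base constructor reaches the base version of Check, so a stricter override in the derived class is silently skipped during construction.

```cpp
#include <string>
#include <vector>

// Hypothetical names throughout. Each Check() appends its name to a
// caller-supplied log so we can see which version actually ran.
class Resource
{
public:
  explicit Resource(std::vector<std::string>& log) : log_(log)
  {
    // While Resource's constructor runs, the object is still a Resource,
    // so this binds to Resource::Check even when a Guarded is being built.
    Check();
  }
  virtual ~Resource() = default;
  virtual void Check() { log_.push_back("Resource::Check"); }
protected:
  std::vector<std::string>& log_;
};

class Guarded : public Resource
{
public:
  explicit Guarded(std::vector<std::string>& log) : Resource(log) {}
  // Imagine this override performing a stricter security check.
  void Check() override { log_.push_back("Guarded::Check"); }
};

// Builds a Guarded, then calls Check() once more after construction.
std::vector<std::string> ConstructAndCheck()
{
  std::vector<std::string> log;
  Guarded g(log);  // base constructor runs Resource::Check
  g.Check();       // fully constructed: runs Guarded::Check
  return log;
}
```

Calling ConstructAndCheck() yields a log whose first entry is "Resource::Check" and whose second is "Guarded::Check": the stricter check never ran while the object was being built.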

Either way, it’s a good idea to avoid this pattern in both languages. In particular, remember that in C# methods ToString, GetHashCode and Equals are virtual; try to avoid calling them in constructors of unsealed classes.
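One way to avoid the pattern, assuming construction can be routed through a factory function, is to move the virtual call into a separate step that runs only after the most-derived constructor has finished. The following is a minimal C++ sketch of that idea, not a definitive design; Widget, Button, Create, PrepareForUse and Describe are all hypothetical names.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical names throughout: Widget, Button, Create, PrepareForUse.
class Widget
{
public:
  virtual ~Widget() = default;
  // Runs after the most-derived constructor, so overrides see full state.
  virtual void PrepareForUse() { description_ = Describe(); }
  const std::string& Description() const { return description_; }

  // The only way for client code to build a Widget-derived object.
  template <typename T, typename... Args>
  static std::unique_ptr<T> Create(Args&&... args)
  {
    std::unique_ptr<T> w(new T(std::forward<Args>(args)...));
    w->PrepareForUse();  // virtual dispatch now reaches T's overrides
    return w;
  }

protected:
  Widget() = default;  // no virtual calls in the constructor
  virtual std::string Describe() const { return "widget"; }

private:
  std::string description_;
};

class Button : public Widget
{
  friend class Widget;  // lets Widget::Create reach the protected constructor
protected:
  explicit Button(std::string label) : label_(std::move(label)) {}
  std::string Describe() const override { return "button:" + label_; }
private:
  std::string label_;
};
```

With this sketch, Widget::Create<Button>("OK") returns an object whose Description() is "button:OK", because Describe is called only once label_ has been initialized. If PrepareForUse throws, the std::unique_ptr destroys the half-prepared object. The cost is that every derived class has to keep its constructors non-public and befriend the factory.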

  1. And it can get worse than that. Calling a virtual method in a constructor is relatively easy to spot (especially when using ReSharper, which flags virtual invocations in constructors). Consider a constructor that passes “this” to another class (which, by itself, might be an anti-pattern already). This other object might have no idea that the object it just received isn’t fully constructed yet, potentially calling virtual methods which then again operate on uninitialized state.

  2. Unfortunately, with the advent of so many technologies relying on dynamic proxies, this kind of virtual call in the constructor is becoming unavoidable. Specifically, calling virtual setters on your virtual properties.

    You have to do this all the time, for example, in Entity Framework. Fortunately, the “not fully initialized” problem isn’t so much a factor when the virtual call you’re making is doing the initializing, but it’s still troubling to be forced to break what used to be a very hard-and-fast rule on a regular basis.

  3. One difficulty with constructors in .NET is that some kinds of things need to be “attached” to other objects. Ideally, such attachment would occur between the execution of the most derived constructor and the exposure of the newly-constructed object to client code, but there’s no clean mechanism for that. Either construction must be limited to factory methods (which every derived class must implement for itself) and such methods must all call a `PrepareForUse` method, or else the base constructor must perform the necessary attachment. The factory method approach allows better semantics than are possible with constructors (e.g. cleanup methods can be called if `PrepareForUse` throws an exception) but requiring every derived type to implement the same wrapper code is somewhat ugly.

    I wish languages like VB.NET and C# would allow specification of things which need to happen before the base-constructor call, and things which need to happen after. C# runs field initializers before the base constructor call, but allows no mechanism for them to make use of constructor-parameter values. VB.NET runs field initializers after the base constructor, which is for some things disadvantageous, but which allows them to make use of the object under construction. Each approach has merit; it’s too bad that languages don’t offer more flexibility to use whichever approach is more helpful in each situation.

  4. Heck, consider the code generated by the WinForms designer. Among other things, the InitializeComponent method sets properties on the form (or user control) class, and some of those property setters are virtual. And neither the compiler nor ReSharper will warn you about it, because those virtual calls aren’t happening directly from the constructor, but instead from another method (InitializeComponent) that’s called *by* the constructor.

    The ability to call virtual methods from the constructor is occasionally useful in C# (and in Delphi, which has the same semantics), but it definitely is very subtle and easy to miss.

  5. You don’t need an extra class to break things apart.
    It is enough to call a non-virtual method that does call the virtual one – and you’re done!
    Imagine this (almost making sense) design:

    using System.Transactions;

    public class Base
    {
        public Base(Foo fooReference)
        {
            Initialize(fooReference);
        }

        // Non-virtual, but it forwards to a virtual method.
        protected void Initialize(Foo fooReference)
        {
            using (new TransactionScope()) // let's make sure the initialization always happens in a transaction
            {
                InitializeImpl(fooReference);
            }
        }

        // An override in a derived class runs before that class's constructor body.
        protected virtual void InitializeImpl(Foo fooReference)
        {
            _foo = fooReference;
        }

        private Foo _foo;
    }

  6. “This other object might have no idea that the object it just received isn’t fully constructed yet, potentially calling virtual methods which then again operate on uninitialized state.”

    Or worse, calling virtual methods which don’t even exist.

    It takes some doing — the C++ compiler in VS2013 (and probably others, but I didn’t test) is smart enough to resolve virtual calls at compile time if it can, resulting in link errors when you try to call an invalid virtual function — but if you leak the object reference from a base constructor of a class with a pure virtual function, then at runtime you can wind up attempting to call a virtual function that doesn’t even exist.

    I guess in some ways this is actually a better outcome. After all, your program crashes instead of proceeding with uninitialized state. But one can easily imagine a scenario where this crash doesn’t occur in testing, but only after the program is deployed. E.g. the leaked reference is used in a different thread, where a race exists between initialization and usage of the virtual function.

    In other words, the advice here isn’t simply to not call virtual methods from within the constructor. As Axel points out, one really should avoid allowing the object reference to leave the constructor _at all_, at least if there are any virtual members, but also even if the class simply isn’t sealed (in C#). The mere opportunity to call a virtual member before construction is completed can lead to problems regardless.

  7. > *As Axel points out, one really should avoid allowing the object reference to leave the constructor _at all_*

    Yep, that means that creating a thread inside of the object’s constructor, passing “this” as an argument for the thread’s function (all this in the initializer list), and then calling thread.start() inside the constructor’s body DOESN’T WORK if someone decides to inherit this class.

  8. Eric, how long have you been waiting to use the phrase “ontogeny recapitulates phylogeny” in a technical post? Have you perchance read the fine book “Discarded Science” by John Grant? It’s an enjoyable read, and the first place I encountered the concept.
