Development Testing Blog

How does locking work in C#?

(This article is also available in Chinese.)

A question I’ve gotten from several readers recently is “how does the lock statement actually work in C#?” It appears to be somewhat magical; if you didn’t already have a lock statement, how would you implement it?

Before I go on, I should make one thing very clear: this article is going to be chock full of lies. The mechanisms that the CLR actually uses to implement locks are quite a bit more complicated than the oversimplified sketch I’m presenting here. The intended takeaway here is not that you understand precisely what the CLR does, but rather that you understand how a very simple lock could be built out of even simpler parts.

Let’s start by clearly describing what a lock statement does. It is documented by the specification as having two main properties. First, a lock statement takes a reference to an object which may be “acquired” by a thread; such an object is called a “monitor” for historical reasons. Only one thread may acquire a particular monitor at one time. If a second thread attempts to acquire a particular monitor while a first thread is holding it, the second thread blocks until such time as the first thread releases the lock and the second thread can acquire the monitor. The question then is how this behaviour can be implemented without locking, since that’s what we’re trying to implement. Second, the C# specification states that certain special side effects in multithreaded programs are always observed to be ordered in a particular way with respect to locks; we won’t discuss this aspect of locking in this article.
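To make the "only one thread at a time" guarantee concrete, here is a small hypothetical example (the names are invented for illustration) in which ten threads each increment a shared counter; without the lock, the read-increment-write sequence would race and increments would be lost:

```csharp
using System;
using System.Threading;

static class Program
{
    // Hypothetical shared state, guarded by a monitor object.
    static readonly object countMonitor = new object();
    static int count = 0;

    static void Main()
    {
        // Ten threads each increment the counter 100,000 times.
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 100000; j++)
                {
                    // Only one thread at a time may hold countMonitor,
                    // so the read-increment-write below is never interleaved.
                    lock (countMonitor) { count = count + 1; }
                }
            });
            threads[i].Start();
        }
        foreach (Thread t in threads)
            t.Join();
        Console.WriteLine(count); // 1000000, never less
    }
}
```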

Once more, before I go on I want to clarify a few other differences between what I’m presenting today and reality. In the real C# language you can lock on any instance of a reference type; we won’t consider that. In the real C# language, the same thread can acquire the same monitor twice:

void M()
{
  lock(this.someMonitor) { N(); }
}

void N()
{
  lock(this.someMonitor) { whatever }
}

We won’t consider that either. And in the real C# language, there are more operations you can perform on monitors than just acquiring and releasing them, such as waiting and pulsing. I’m not going to discuss any of these today; remember, this is a pile of lies intended to get across the idea that locks are built out of more fundamental parts. The real implementation is far more complex than what I’ll present here.

OK, now that we’ve got that out of the way, the next thing to discuss is what the lock statement actually means in C#. When you say

void N()
{
  lock(this.someMonitor) { whatever }
}

the C# compiler does a very simple transformation of that code into:

void N()
{
  object monitor = this.someMonitor;
  System.Threading.Monitor.Enter(monitor);
  try
  {
    whatever
  }
  finally
  {
    System.Threading.Monitor.Exit(monitor);
  }
}

As you can see, Enter and Exit are the names in the .NET framework for the “acquire” and “release” operations on a monitor.

Here we have our first major lie. That was the code generated before C# 4, and it has some reliability problems. See my 2009 article on this subject for the real codegen.
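For the curious, the modern codegen looks roughly like the following sketch (the real compiler uses an unnameable temporary rather than a local called lockTaken); it uses the overload of Monitor.Enter that takes a ref bool, which guarantees the flag is set if and only if the monitor was actually acquired, even if an exception tears through the Enter:

```csharp
void N()
{
  object monitor = this.someMonitor;
  bool lockTaken = false;
  try
  {
    System.Threading.Monitor.Enter(monitor, ref lockTaken);
    whatever
  }
  finally
  {
    // Only release the monitor if we actually acquired it.
    if (lockTaken)
      System.Threading.Monitor.Exit(monitor);
  }
}
```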

To illustrate how a lock is not magical, I need to show how to implement those enter and exit methods without using locks. So again, let’s simplify. Instead of any object, let’s make a specific class of monitor objects to illustrate how it might work. Here’s a naive and broken implementation:

sealed class MySimpleMonitor // Broken!
{
  private bool acquired;
  public MySimpleMonitor()
  {
    acquired = false;
  }
  public void Enter()
  {
    // Yield until acquired is false
    while(acquired) { Thread.Sleep(1); }
    acquired = true;
  }
  public void Exit()
  {
    if (!acquired) throw new Exception("Bogus exit dude!");
    acquired = false;
  }
}

It should be clear why this implementation is unacceptable. Suppose we have:

class C
{
  private MySimpleMonitor monitor = new MySimpleMonitor();
  public void Foo()
  {
    monitor.Enter();
    try
    {
      whatever
    }
    finally
    {
      monitor.Exit();
    }
  }
}

Threads A and B both call Foo, and both call Enter. Both discover that acquired is false, both skip the loop, both set acquired to true, and both enter the body of the lock. How do we fix this problem?

The method we need is int Interlocked.CompareExchange(ref int variable, int newValue, int oldValue). This method takes an integer variable (by reference) and two values. If the variable is equal to oldValue then it is set to newValue; otherwise it is left unchanged. Regardless of whether the variable is changed, its original value is returned. All of this is done atomically. This is the magic building block that we need to build a monitor:

sealed class MySimpleMonitor
{
  private const int Available = 0;
  private const int Taken = 1;
  private int state;
  public MySimpleMonitor()
  {
    state = Available;
  }
  public void Enter()
  {
    while(true)
    {
      // If the state is Available then set it to Taken.
      int original = Interlocked.CompareExchange(ref state, Taken, Available);
      // Was it originally Available? Then we took it!
      if (original == Available)
        return;
      // It was not Available so it must have been Taken. We need to block.
      // This call means "yield the rest of my time to any other thread";
      // hopefully the thread that has the lock will call Exit.
      Thread.Sleep(1);
    }
  }
  public void Exit()
  {
    // If we're exiting, we'd better be in the Taken state.
    int original = Interlocked.CompareExchange(ref state, Available, Taken);
    if (original == Available)
      throw new Exception("Bogus exit dude!");
    // We must have been in the Taken state, so we're now in the Available state.
  }
}
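The behaviour of CompareExchange itself can be demonstrated in isolation (a standalone snippet, separate from the monitor):

```csharp
using System;
using System.Threading;

static class CompareExchangeDemo
{
    static void Main()
    {
        int state = 0;

        // state equals 0, so it is atomically set to 1;
        // the original value, 0, is returned.
        int first = Interlocked.CompareExchange(ref state, 1, 0);
        Console.WriteLine($"{first} {state}"); // 0 1

        // state is now 1, not 0, so nothing changes;
        // the current value, 1, is returned.
        int second = Interlocked.CompareExchange(ref state, 1, 0);
        Console.WriteLine($"{second} {state}"); // 1 1
    }
}
```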

Now if you’re reading carefully you should at this point be protesting that the question has been thoroughly begged. We have implemented an exceedingly simple monitor, yes, but we’ve just moved the magic atomicity from Enter into Interlocked.CompareExchange! How then is this implemented atomically?

By the hardware! x86 processors have an instruction, CMPXCHG, which performs an atomic compare-and-exchange; on multiprocessor machines the LOCK prefix makes it atomic across processors. Interlocked.CompareExchange can be thought of as a thin wrapper around a machine code routine that uses this instruction. How the hardware achieves an atomic compare and exchange is up to the chip maker. Ultimately all the magic in any computer program comes down to the hardware eventually. (Of course such an instruction has not existed in every chipset since the invention of monitors in the 1970s. Implementing atomic compare-and-exchange on hardware that lacks such an instruction is an interesting challenge, but well beyond the scope of this article. On modern hardware we rely on these sorts of instructions.)

The monitor implementation I’ve presented here would work, but it is extremely inefficient compared to real monitors; the real implementation does not simply sit in a loop calling CompareExchange and Thread.Sleep(1). Suppose we have a hyperthreaded processor — that is, we have two threads of execution going in one physical processor. Suppose further one thread has acquired the monitor and its lock body is extremely short: on the order of nanoseconds. If the second thread has the bad luck to ask for the monitor a couple nanoseconds before the monitor is going to be released by the first thread, then the second thread ends up ceding its time to any other thread in the system. What would be better is for it to burn those couple nanoseconds doing no real work and try again; this avoids the cost of the context switch to another thread.

The .NET Framework gives you multiple tools you could use to build a more sophisticated waiting strategy: Thread.SpinWait puts the processor into a tight loop allowing you to wait a few nanoseconds or microseconds without ceding control to another thread. Thread.Sleep(0) cedes control to any ready thread of equal priority or keeps going on the current thread if there is none. Thread.Yield cedes control to any ready thread associated with the current processor. And as we’ve seen Thread.Sleep(1) cedes control to any ready thread of the operating system’s choice. By carefully choosing a mix of these calls and doing performance testing in realistic conditions you could build a high-performance implementation, and of course this is what the CLR team has actually done.
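As a sketch of how such a mixed strategy might look, the Enter method above could be rewritten along these lines; the thresholds and spin counts here are invented for illustration and would need real performance testing to tune:

```csharp
public void Enter()
{
  int attempts = 0;
  while (Interlocked.CompareExchange(ref state, Taken, Available) != Available)
  {
    attempts++;
    if (attempts < 10)
      Thread.SpinWait(20);  // burn a few nanoseconds; the lock may free up soon
    else if (attempts < 20)
      Thread.Yield();       // cede to a ready thread on this processor
    else
      Thread.Sleep(1);      // cede to any ready thread in the system
  }
}
```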

So that’s it; an extremely simple monitor can be built out of an atomic test-and-set of a flag that indicates whether the monitor is taken or available, and a strategy for waiting that gives good performance in the (hopefully unlikely!) event that the monitor cannot be acquired immediately. We’d need to add a small amount of extra gear to allow the same monitor to be taken in “nested” scenarios, but perhaps you can see how that could be done from this sketch. In order to make a monitor that is robust in the face of thread abort exceptions we’d need even more gear, but again, it could all be built out of judicious uses of CompareExchange. You might have also noticed that our oversimplified implementation is in no way “fair”: if there are ten threads waiting for a monitor there is no guarantee that the one that has been waiting the longest gets it; this could cause “thread starvation” in practice. And finally, in the real CLR any object of reference type can be used as a monitor. The exact details of how the CLR does so efficiently are beyond the scope of this article; suffice to say that the CLR’s implementation is heavily optimized for the case that a monitor is (1) used only for locking, and (2) is never contended.
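To make the "nested" scenario a little more concrete, here is one way the extra gear might look. This is a sketch built on the assumption that a managed thread id is never zero, which lets zero serve as the "available" sentinel; a real reentrant lock would of course be considerably more sophisticated:

```csharp
using System;
using System.Threading;

sealed class MyReentrantMonitor
{
    private const int Available = 0; // managed thread ids are never zero
    private int owner;               // thread id of the owner, or Available
    private int recursionCount;

    public void Enter()
    {
        int me = Environment.CurrentManagedThreadId;
        if (Volatile.Read(ref owner) == me)
        {
            recursionCount++;        // we already own it; just count the nesting
            return;
        }
        while (Interlocked.CompareExchange(ref owner, me, Available) != Available)
            Thread.Sleep(1);
        recursionCount = 1;
    }

    public void Exit()
    {
        if (Volatile.Read(ref owner) != Environment.CurrentManagedThreadId)
            throw new InvalidOperationException("Bogus exit dude!");
        if (--recursionCount == 0)
            Volatile.Write(ref owner, Available);
    }
}

static class Demo
{
    static void Main()
    {
        var m = new MyReentrantMonitor();
        m.Enter();
        m.Enter();  // same thread, so this does not block
        m.Exit();
        m.Exit();
        Console.WriteLine("nested acquire and release succeeded");
    }
}
```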


As always, if you have questions about a bug you’ve found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

  1. I think it’s worth at least mentioning that the CLR also uses a Win32 synchronization mechanism (currently, an event) in addition to spinning and yielding. In fact, in most scenarios where you hold a lock for more than a few milliseconds, entering a wait state by waiting on a kernel object is going to be favorable to spinning in a CAS loop.

    :: Sasha

    1. What do you mean by a “CAS loop”? And when you wrote “kernel object”, does that mean the blocking happens in kernel mode?
      Thanks
