Software Testing Blog

Can I skip the lock when reading an integer?

(This article is available in Chinese. 这篇文章可在中国)


UPDATE: This post raised a number of questions; a follow-up post is here, so please read that when you’re done this one.


Today’s question comes via my fellow Coverity employee Ian, who passes it along from a third party developer:

Here is a greatly simplified version of our code:

public class TestLock
{
  private object threadLock = new object();
  private int value = 0;
  public void Start()
  {
    lock (threadLock) { value = 100; }
  }
  public void Finish()
  {
    lock (threadLock)
    {
      if (value != 0 )
        value = 0;
    }
  }
  public void Increment()
  {
    lock (threadLock)
    {
      if (value != 0) value += 10;
    }
  }
  public int GetValue()
  {
    return value; // No lock!
  }
}

The start, finish and increment operations are all either mutating or non-atomic operations, or both, so they are locked. But the get-value operation is guaranteed by the C# specification to be atomic, so we figure that we can safely remove the lock. Is this correct?

The short answer is: Please do not elide the lock regardless of its correctness. It’s dangerous and the benefit is absurdly tiny compared to the risk.

The slightly longer answer is: The fact that you’re asking the question in the first place means that you don’t have a strong enough understanding of the memory model of C# in order to correctly write low-lock code, so it should be avoided. In fact, I’m not a big fan of multithreaded code in the first place. If you can avoid writing multithreaded code, do so. If you cannot, then use the highest level of abstraction available to you: use tools like the Task Parallel Library to efficiently manage worker threads on your behalf. If you cannot do that, then at least avoid sharing memory across threads. If you cannot do that, then at least lock all the memory that you do share across threads. Writing low-lock code should be the last thing you consider doing, not the first.

The attentive reader will note that neither of those two “answers” actually answered the question. It’s time for the verbose answer!

First off, let’s talk about atomicity.

You are correct that the C# specification promises that reads and writes of 32 bit integer fields are atomic. That is, they cannot be “torn”. A write to a 64 bit integer variable can be treated as two independent writes to two adjacent 32 bit variables, and therefore two threads that are both trying to assign values to those two adjacent variables at the same time can race. It is possible to end up with a value that is half from the first thread and half from the second thread, which is very confusing. The 32, 16, and 8 bit integer types and the bool and char types are all guaranteed to be atomic for individual reads and writes, provided that the developer has not done something silly like force the integer to be misaligned.

You are also correct that more complex operations, such as test-and-set or increment are not atomic, and therefore need to be protected against races.

So the unlocked read will at least be atomic in the sense that we never get a “torn” value out. Now, if we had something like:

    lock (threadLock)
    {
        value += 5;
        DoSomething();
        value += 5;
    }

then the unlocked access is immediately suspicious because it can read a value after the first increment but before the second increment, and that might be unexpected. Let’s assume that you are not in that situation.

Now let’s talk about locks. Locks provide two guarantees in C#.

First, there is the obvious purpose of a lock. The locked object is a monitor; only one thread may have a given monitor locked at a time. A thread which attempts to take a monitor that has been acquired by another thread will block until it can obtain access.

There is a second guarantee of a lock that is more subtle. The C# compiler, JIT compiler and CPU are all permitted to make optimizations provided that the optimization is undetectable by the current thread. In particular, reads and writes of variables may be re-ordered or eliminated entirely provided that doing so is not observable to the current thread. For example, consider the program fragment:

someInteger = someOtherInteger;
someIntegerJustChanged = true;

Given that there is no code between the two statements there is no way for the thread that runs this code to detect what order the mutations happen in. It would be perfectly legal for these two mutations to be re-ordered if the CPU decided that it was more efficient to do so. The current thread cannot tell the difference, but another thread could observe the mutations to happen in the “wrong” order. (I note that on x86-based hardware this particular reordering of writes is actually never observed; those CPUs do not perform this optimization. There is no requirement that all C# programs be run only on x86-based hardware!)

A lock introduces a restriction on these optimizations. In particular: no read may be moved before the beginning of a lock statement, and no write may be moved after the end of a lock statement. Let’s compare these two pieces of code:

  public int GetValue()
  {
    return value; // No lock!
  }

vs

  public int GetValue()
  {
    lock(threadLock) { return value; }
  }

What’s the difference?

In the latter version the CPU may not move the read backwards past the beginning of the lock, and may not move any write that happened before the method was entered to a point after the end of the lock.

In the former, the CPU may as an optimization choose to move the read arbitrarily far “backwards in time” with respect to other reads and writes on this thread provided that doing so is not detectable by the current thread. But that fact could be observed by another thread, and since we know that this program definitely reads memory locations on multiple threads, I am worried. Perhaps some other thread is expecting that the read of this value will be observed to occur in a particular order with respect to the write of some other value. You said that this is a greatly simplified version of the real program, and therefore we cannot reason from the correctness of this program to the correctness of the original program! There are a number of scenarios:

Scenario one: The authors of the code do not know that their program is correct without the lock, and do not have a solid customer-impacting performance reason to elide the lock, but are doing so anyways just for the heck of it. Fortunately for them, by sheer luck their code does not actually have a customer-impacting bug, and equally fortunately, by sheer luck no future edit to the code will introduce such a bug. Also, no future update to the C# compiler, JIT compiler or CPU optimization will ever introduce a bug either.

Scenario two: The original program has a subtle bug, or a bug will be introduced in the future via a code change, compiler update, and so on. Some other thread assumes that the unlocked read will not be moved backwards in time with respect to some other read or write, but in fact this optimization is not only permitted, but will be performed. This fact becomes observable on a thread that then behaves incorrectly as a result. The code therefore has a bug that is difficult to detect and practically impossible to reproduce.

Scenario three: The authors of the original code, who we recall are asking the question about thread safety to begin with, are actually asking a rhetorical question. They fully understand all the ramifications of every possible CPU reordering and have analyzed all those possible reorderings to determine that there is never a bug in their program under any of those conditions. Moreover, they wish to avoid the roughly 10 or 20 nanosecond penalty of the uncontested lock for performance reasons, because that one-hundred-millionth of a second penalty both observable by and unacceptable to their users. And finally, the authors will carefully review every single change ever made to this program in the future with an eye to the possible bugs introduced by reordering optimizations. Avoiding that 10 ns penalty is worth the enormous cost in expert code reviews and the high risk of getting it wrong.

My guess is that the developers are 99% likely to be in scenario one, 0.99% likely to be in scenario two, and 0.01% likely to be in scenario three. I also hazard a guess that the developers have no solid customer-impacting performance reason to elide that lock. Is the expected benefit of a 10 ns savings that no customer will notice really worth the cost, that by chance some impossible-to-detect-and-reproduce bug is in this program? I doubt it. Therefore my recommendation would be to always lock every access to a field that can be shared across threads, even if you think that eliding the lock is 99.99% likely to be harmless. The nano-scale benefit is simply not worth the macro-scale costs of verifying that the elision is correct, and certainly not worth the risk that the code is incorrect now or might become incorrect in the future. Code which avoids locks is extremely difficult to reason about, so make your life easier: if you can, don’t write shared memory code to begin with. If you have to, use locks consistently.


UPDATE: A number of commenters have asked if marking the field volatile magically prevents reordering bugs. The specific problem with that proposal is that volatile reads and locks do not have the same semantics. A volatile read can be moved backwards in time with respect to a volatile write, and the x86 processor will actually do so, but a read, volatile or otherwise, cannot be moved backwards in time past the beginning of a lock. The more general problem is: we have a toy example that is greatly simplified from the real code, and therefore we don’t know what invariants the real code relies upon. Trying to deducing whether a real gun is safe by examining a toy gun is a dangerous proposition.

Next time on ATBG I’ll give an example where every variable is volatile, but eliding locks on some of them can produce an unexpected violation of what appears to be a program invariant.


As always, if you have questions about a bug you’ve found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

  1. I’d love to see a programming language which makes it impossible to share memory across threads by accident. I’m not sure what that would look like, but I think it’s a problem worth trying to solve.

    In C#, it’s impossible to guarantee this property for any class field. It’s possible to avoid sharing fields across threads, of course, but I can’t label a field with “this must only be accessed on the thread that instantiated the class”.

    Again, I’m not sure how exactly this guarantee would be enforced by the compiler, but wouldn’t it be great if that was possible.

    1. There are a bunch of ways to do it. Here are a few off the top of my head:

      Make every object be immutable and every variable be assigned exactly once, and therefore you get thread safety for free. No writes means no races.

      Affinitize certain objects to certain threads; this is the “apartment model” where an object may be only accessed by the thread that creates it. The runtime can detect objects being acccessed on the wrong thread and produce an exception.

      Eliminate the idea of threads altogether and instead make the process the unit of concurrency. Code which wishes to modify an object in another process must use a well-defined inter-process-communication layer to send a message to the object that requests that it modify itself.

      Make thread categories — UI thread, worker thread, etc — first-class concepts in the language. When you declare a member, you also declare what kinds of thread it can run on. If a method declared as main thread only accesses a field declared as worker thread only then that’s a type system error, and vice versa.

    2. JavaScript has a very interesting model in this regard: there are just no such feature as shared memory. You can do things in other threads through special objects called ‘workers’, but any data passing between a ‘worker’ and the main UI thread is necessarily copied by the runtime.

    3. Rust uses its semantics to ensure that you have to jump through serious hoops to achieve data sharing. In Rust, a piece of data either has a unique owner who is the only one who can access it (and ownership can be transferred to another thread, but then the original thread no longer has access), or the data is shared but immutable. There are special ways to circumvent this stuff, but only within an unsafe block.

      I believe Erlang simply doesn’t have shared memory between its lightweight processes.

  2. Eric, I was halfway through reading your reply and wondering when you were going to say “… much like Erlang”. Immutable everything, single assignment, process-centric with shared-nothing message passing between them.

  3. One thing that you don’t make clear enough (in my opinion) is that in certain situations the compiler+JIT may even eliminate the read completely!
    E.g. if the trivial getter is used inside a loop, it may get inlined and then the value may be kept inside a CPU register rather than read from memory each time. As surprising as it may seem, this would be perfectly legal. The coder may be surprised to notice that the loop doesn’t seem to “see” value changes from another thread. This may even become pathological if value is used as an exit condition for the loop, e.g. a simple spin-wait: while (GetValue() == 0);. This can become an infinite loop with no way out, even if another thread changes value.

    For completeness sake you could say that — all other atomicity considerations aside — the read may be “safe” by turning it into a VolatileRead, that cannot be eliminated by the compiler nor the runtime.

    1. Good point, but unfortunately making the read volatile without a lock is not sufficient to ensure that it is not moved! Though the sort of caching you describe is not legal with a volatile read, on x86 even a volatile read may be moved backwards in time with respect to volatile writes. Perhaps I should show an example of how that can happen.

      1. Hi Eric. Really enjoyed the article. I too had the thought about using volatile to solve the problem and would really appreciate an example of how it wouldn’t necessarily be sufficient, provided that you have the time :) .

      2. You’re right that in theory a volatile read may move up before a previous volatile write on x86.

        But not in this case, since the writes are inside locks, and locks imply full fences.

        So I still think that in this case — considering only the value field and not any other potential operation depending on it — you’d be good with the volatile read.

        1. Correct, in this toy example the writes are in locks. My concern is that the customer who asked the question gave me a simplified version of their program, not the real version. One of the reasons it is so difficult to write multithreaded code in any language is because it often requires knowledge of the details of the entire program. Two parts might each function well independently but their combination has a behaviour that is unexpected. (Think about two independently tested components that take out nested locks in opposite orders, for example.)

          Are there writes in the rest of the customer’s real program that should not be re-ordered with respect to this read? I don’t know; I can’t see that program. It’s therefore a bad idea to say that their real program is safe just because the toy version looks good.

    2. Yes! So, to ensure you’re in scenario three, you need to know exactly how the compiler you are using generates its MSIL. You need to know exactly how every version of the runtime that your user base uses or will use will execute that MSIL on every CPU that they use or will use. Then, along with detailed knowledge of every re-ordering optimisation, every register windowing optimisation and any other funky trick those CPUs’ microcode might pull, you can work out whether your property will be in a register or not, and how reads and writes may be re-ordered.

      Then you need to work out whether you’ve got a bug or not.

      If you don’t do this, you don’t even know if one thread will see writes made by another thread at all, let alone in what order.

      This, incidentally, makes the C# memory model equivalent to the Java one, which makes me happy to be on familiar ground.

      Personally, I just use the lock, and regard it as a bug if I don’t.

  4. I would also like to add the hint of additional mechanisms:

    - A lock() blocks always, reading and writing. If you do not need reading blocks, consider using a ReaderWriterLock (also SlimReaderWriterLock) instead, with it you can block concurrent writings, but allow multiple readings.

    - If you have no other code that dealing with a basic types then the Interlocked Class Family allows you to read/write/excahnge/compare them in a threadsave way without locking, and this is much faster.
    You can use them with Int32, Int64, IntPtr, Object, Single, Double. See http://msdn.microsoft.com/de-de/library/system.threading.interlocked(v=vs.110).aspx

    1. Reader/writer locks are a good idea but should be justified by real performance numbers. If you in fact have a small number of readers and reading is fast then the additional overhead entailed by the reader/writer lock can be significant.

  5. Strange, I was under the impression that adding the volatile keyword to the private field was enough to make the compiler emit a MemoryBarrier and prevent reordering around this field.

    1. Volatile barriers are half-barriers; locks are full barriers.

      That means that the compiler, jitter or CPU can re-order volatile reads with respect to volatile writes. And in fact the x86 CPU can do so.

      I’ll give an example next time.

  6. One thing I’ve found frustrating is that there is no “Interlocked.Read” (or “Interlocked.Write”!). Yes, you can do it by setting up an extra variable and doing an Interlocked.Exchange, but there’s a fairly wide range of programming cases where someone is simply setting, incrementing, and decrementing a simple int.

    The Interlocked operations give you ordering guarantees (as far as I know) and they guarantee atomicity (which isn’t a big plus on 32 bit reads and writes). Atomicity aside, they would may the code clearer (I’m protecting this read/write), and provide the ordering guarantee.

    1. The Thread.VolatileRead/Write methods introduce a full fence. They could in theory each introduce a half fence, but they don’t, and there are people who depend on the full fence now, so this can’t change. The Volatile.Read/Write methods introduce a half fence.

  7. I think, that the 10 ns penalty is not whole price of the lock statement. Bigger problem is possibility of a deadlock, which is quite a commonplace problem than a theoretical reordering. At least I have fought in my career lot of deadlocks and no reordering bug.

    1. Good point, though a deadlock is unlikely if the lock only covers a variable modification. Deadlocks become likely when locks do something complicated, so try to keep lock bodies as simple as possible.

      1. I Agree. Sometimes however a method looks very simple, but a deadlock is still possible: “lock(threadLock) { _someObject.DoSomething(value); }” Problem with that code is that we don’t see what is happening inside DoSomething, especially when there was used dependency injection.

  8. While we are on the subject of potential hazards of eliding code for minor optimizations, is there any reason why

        lock (threadLock)
        {
          if (value != 0 )
            value = 0;
        }
    

    can’t be reduced to :

        lock (threadLock)
        {
            value = 0;
        }
    

    The only scenario I can think of where that could be a problem, is if “value” represented a memory-mapped I/O port. But I’d think in .NET that would require an “unsafe” and/or a “pin” at the very least.

  9. I’m confused, should we stop using volatile running flags?

    volatile bool running = true;
    while (running) { /* code*/ }

  10. assuming the following code:

    int iError = 0;
    try{
      [...] //do something
      iError = 1;
      [...] // do something
      iError = 2;
      [...] // do some other stuff
      iError = 3;
      [...] // final stuff
    } catch (Exception) {
      throw new EMyException(iError);
    }
    

    Assuming the code only does some minor initialization but exceptions might be raised anyway (let’s say a Thread Abort Exception…).
    Is this code safe or can the assignments be reordered by the compiler?

    1. Let me repeat the primary rule: an optimization may not change the behaviour of a program as observed from that thread. Since iError does not appear to be a variable that is being read or written on another thread, questions of whether the writes can be observed to be re-ordered from another thread are somewhat pointless.

  11. the CPU may as an optimization choose to move the read arbitrarily far “backwards in time” with respect to other reads and writes on this thread provided that doing so is not detectable by the current thread. so you say that this may be a bad thing. Could you give any example when this could be a bad thing? I could not think of any even imaginary situation. How this can actually be observed by other thread? So if the initial thread somehow uses the value, it will inflict write. So for the state of thread to become observable it is necessary to produce a write, enforcing a read for sure. So between there is a read in a code and write the value is undefined state (like quantum superposition) and observing it (by writing) you fix it.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Current day month ye@r *