Software Testing Blog

Some enumerator bounds checks are omitted

This week’s episode of ATBG requires that you first understand the previous episode, so please read that first if you haven’t already. I’ll wait.

Welcome back. Reader Xi Chen notes that the mutable struct implementation of the list enumerator mentioned last time does implement IEnumerator<T> and IEnumerator, and asks why it is that the Current property of the struct has two different implementations, depending on whether the property is accessed through the nongeneric interface, the generic interface, or directly on the struct. Thanks for the great question!

The struct is essentially little more than a reference back to the list being enumerated, the index of the current element, and a local cache of the current element. (There is also some gear in there to detect the bug where a collection is mutated while being iterated, but that is a topic for yet another day.) The implementation of the Current property exposed by the struct simply returns the this.current field which is the cache. The generic interface method does the same; there is no explicit implementation of it. The explicit interface implementation of the nongeneric version by contrast is much more complicated:

Object System.Collections.IEnumerator.Current {
  get {
    if( index == 0 || index == list._size + 1) {
      ThrowHelper.ThrowInvalidOperationException(
        ExceptionResource.InvalidOperation_EnumOpCantHappen);
    }
    return Current;
  }
}

To know for sure why the authors of this struct made the choices they did we’d have to ask them, but I can make some educated guesses as to what is going on here. (An interesting side point here is that apparently the index is one-based, not zero-based as is customary, but that’s not germane to the point of today’s episode.)

The key facts here are (1) lists get enumerated very frequently, so the overhead has to be low, and (2) 99.99999% of the time, the Current property is going to be accessed inside a foreach loop that is talking directly to the unboxed struct. The compiler ensures that the loop does not step outside the bounds of the list. Therefore the complicated test and its exception-throwing consequence would the vast majority of the time merely add bulk to the code, or prevent it from being inlined altogether. That would be a large performance penalty for no real gain, aside from finding bugs in those exceedingly rare cases where a user gets the enumerator “manually” and does not realize that it is a mutable struct, as in the previous episode.

However, if the user is using the non-generic IEnumerator interface then we know that they do not care about the performance or type safety benefits of generics. The enumerator has already been boxed, the value returned by the Current property is likely to be boxed, and therefore the additional cost of the bounds check and conditional exception is not that large. Since the cost of the additional safety is trivial and the likelihood that it will catch a real bug is much higher, it seems reasonable to add a check.

What is not particularly clear to me is why the generic interface’s method is not implemented with a check; there doesn’t seem to me to be a compelling performance reason to avoid the check in that case. To answer that question we’d have to ask the implementers.

Finally, I note that in the absense of hard empirical evidence, all musings about performance are of course mere chitchat. The BCL team takes correctness, safety and performance all very seriously and I know that they would not have chosen performance over safety unless they had good reason to believe that the savings was real, and that the vast majority of the time the dangerous practice would not end up biting the developer. When writing your own code, make sure you’re not choosing a small performance gain over a large decrease in safety; these sorts of decisions should be driven by data, not by anecdotal examples as shown here.


As always, if you have questions about a bug you’ve found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

  1. I feel that your story is slightly off.

    For me, the checks done by IEnumerator.Current are a matter of backwards compatibility.

    The 3.5 documentation for the IEnumerator.Current property specifies that accessing the property when the enumerator is placed before the first element (index==0) or after the last (index==list._size+1) will result in an InvalidOperationException being thrown.

    The existence of that piece of documentation implies that existing implementations of that interface could be relied on to throw that exception.

    In that situation, writing new implementations of IEnumerator that did not respect that could break instances of existing, “working but incorrect” code that consumed that interface. (However, we can note that the “Exceptions” block disappears from the documentation of that property after .NET 4.0 – at that point, the IEnumerator interface is probably of interest mostly to implementors of IEnumerator<T>, who would probably appreciate the contract being relaxed in that way).

    On the other hand, the IEnumerator<T>.Current property has never made any promise of throwing an exception – meaning that an implementation is authorized to return default(T) in these buggy cases.

    I feel that the performance reason that you mention (“call can be inlined”) was a good enough reason to adopt that new behavior on the enumerator structure. And once that behavior was in for the structure, consistency (why introduce a subtle trap for people to fall into when they replace List with IList?) would in my opinion be a good enough reason to use that new behavior on the enumerator interface.

  2. The MSDN documentation of IEnumerator.Current states that “Current also throws an exception if the last call to MoveNext returned false, which indicates the end of the collection.”

    In IEnumerator this comment is gone – they changed the contract for some reason. Current is “undefined” under these conditions and may or may not throw.

    Curiously, the C# compiler does not adhere to IEnumerator’s documentation when generating iterators using yield.

  3. When designing an interface, there will often be scenarios which should never occur if the interface is used properly, but could occur in case of improper usage. Some such scenarios would represent things the caller will almost certainly know about [e.g. calling `Current` before the first `MoveNext`]; some should represent things the caller might not know about [e.g. calling `MoveNext` after a delegate invoked by the caller has modified the collection without the caller's knowledge]. Requiring that an exception be thrown in every invalid usage case may in some cases require an implementation to do more work than would otherwise be necessary [e.g. if an enumeration which is supposed to return 101, 102, 103, etc. it would be easier to have `Current` report a value which starts at 100 and is incremented on every `MoveNext`, than to have a special-case exception which gets thrown if called before the first `MoveNext`] while adding little value. Worse, usage cases which would be invalid with some collections might be valid and useful for others. How useful would `ConcurrentDictionary.GetEnumerator` be, for example, if any modification to the collection was required to invalidate the enumerator?

    In designing an interface, I think a better approach would be to specify the allowable consequences of various invalid usages; one of the allowable consequences should generally be to throw an exception specifically related to that improper usage, but the exception should only be required if behavior couldn’t be guaranteed to match one of the others.

    For example, `Current` might specify that if called before the first `MoveNext`, it may legitimately return any value the implementation sees fit or throw `InvalidOperationException`. If no exception is thrown, the call must have no side-effects. If the exception is thrown, the call may either have no side-effects, or may invalidate the enumerator such all future calls to `MoveNext` and `Current` will also throw `InvalidOperationException`.

    Likewise, with regard to enumerating a collection which is modified, a useful rule might be: “Any modification to a collection will be allowed to arbitrarily invalidate the enumerator, unless the particular collection’s contract specifies otherwise; once invalidate, `MoveNext` must throw an exception; `Current` may or may not, subject to conditions below. Actions upon a collection are required to abide by certain conditions until the first exception is thrown by either `MoveNext` or `Current`.

    - Enumeration of any collection must report exactly once any items which were present throughout enumeration.

    - Enumeration of any collection must report no more than once any item which was added or deleted throughout enumeration.

    - For purposes of these requirements, an item is modified, or which is deleted and re-added, may be regarded as one which was deleted and a different one which was added [and may thus be enumerated zero, one, or two times].

    - Multiple successful calls to `Current` without an intervening call to `MoveNext` must return the same item, even if the collection is modified, but a call to `Current` after a modification may throw an exception.

    - If `Reset` is used to make multiple passes through an enumeration, no guarantee is made as to whether they will yield the same data, or whether calling `Reset` upon an invalidated enumerator will restart it. If `Reset` is implemented but cannot restart an invalidated enumerator, it should throw `InvalidOperationException`.

    I expect Microsoft didn’t feel like going into such a level of detail when they wrote the specs for `IEnumerable/IEnumerable[T]/IEnumerator/IEnumerator[T]`, though such specifications are helpful in deciding what clients should or should not be entitled to rely upon.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Current day month ye@r *