Software Testing Blog

Why does my code not crash?

For a bit of a change of pace, today on ATBG I’ll talk about mostly C and C++, with a little Java and C# thrown in at the end.

A very common question I see on StackOverflow in the “C” and “C++” tags is “here’s a clearly buggy program that I wrote; why does it not AV / segfault / crash when I run it?” (What Unix calls a segmentation fault, Windows calls an access violation (AV); though there are some subtle differences, for the purposes of this article they are the same thing. I’ll just say AV throughout.) Often it is something simple like:

  int* p = malloc(2 * sizeof(int));
  if (p != NULL)
  {
    int i = p[4]; // Oops -- but why no AV?

The short answer is: your program has undefined behaviour, and undefined behaviour means that literally anything can happen. If you are lucky, your program’s undefined behaviour will be something pleasant, like an AV that takes down the entire process. This sounds unpleasant, but really it is the best thing that can happen to you. Such a crash is waving a big red flag that says that there is a fatal bug in your program, and that will make it easier to find and fix. Undefined behaviour could also include, oh, I don’t know, maybe secretly sending your password to hostile attackers and not telling anyone about it for years on end. That’s clearly worse than crashing.

AVs are therefore highly desirable, so it makes sense that people who write buggy programs would like an AV to happen, to facilitate removing the undesired undefined behaviour. However, the question as asked is difficult to answer; any question of the form “why is the world not the way I want it to be?” is vague. The world isn’t the way you want it to be because you didn’t implement the world; someone else did and if you want to know why they made the choices they did, you’ll have to ask them.

A better way to phrase the question is in its “what?” form: What causes a program with undefined behaviour to AV? This form of the question doesn’t require psychoanalyzing the motivations of others, it just requires describing the world.

To understand what causes an AV we’ll need a very basic sketch of how virtual memory works on modern hardware. This sketch will deliberately gloss over many important details; do not treat it as a tutorial on modern memory management. In particular I am going to completely blur the line between what services are provided by the operating system and which are provided by the hardware; I’ll treat everything as coming from the OS.

Let’s imagine a modern 32 bit operating system. Each process is given a 32 bit address space of its very own. You can think of this as an array with 232 bytes in it, numbered from 0 through 232-1. That’s four billion bytes, which might be more physical memory than is available, and in all likelihood the process is going to use only a few million of those bytes for any particular purpose. (A process could also allocate more memory than fits into address space, but that’s a subject for another day.)

The way we solve both these problems is we split up virtual address space into units called pages; on most machines a page is 212 bytes, so there are 220 – about a million – pages in the virtual address space. The first page of address space has addresses 0 through 4095, the second page is 4096 through 8191, and so on.

The operating system then provides APIs which deal with pages. Suppose you wish to allocate two contiguous pages of memory and write values into them. The operating system must do two things. First it must somehow allocate actual physical storage for that memory; perhaps it puts it on disk or perhaps it puts it in RAM, at its discretion. Second, it must map those two pages of memory to two contiguous unused pages in the process address space. The process maintains a list of which of the million pages are in use and which are free. If there is a free block of the appropriate size then the operating system marks those pages as allocated in the virtual address space, and it gives you back the address of the start of that block.

Let me make sure this is clear: the operating system only deals with memory in 4KB chunks. You are likely used to using malloc or new or some other allocator that gives you a block of memory of arbitrary size. The implementation of malloc probably asks the operating system for pages, and then it further divides those pages up as it sees fit.

Now that we have all this background I can describe under what circumstances an AV happens. Suppose you try to dereference a pointer in order to read from the address in the pointer. What happens behind the scenes? The operating system looks at the top 20 bits of the pointer and from that it knows what page of virtual address space the pointer was on. The operating system then checks to make sure that page is valid. If it is then the operating system knows whether that memory is physically backed by RAM or disk, and if so, where in RAM or disk it is, and the OS does whatever is necessary to get you the value associated with that memory. If however the page in the top 20 bits is invalid then the operating system knows that you are trying to access a memory page that you have never allocated, and makes an AV.

This gives us enough background information to sensibly answer some more targeted questions:

Why does dereferencing a null pointer always result in an AV?

The operating system never marks the very first page as valid, and null is almost always implemented as having address zero. Therefore the operating system will always AV when you try to read or write a null pointer.

When I dereference, let’s say two elements past the end of a valid array, and I get an AV, what caused the AV?

The array happened to end near the end of a valid page and the next page was not valid. Adding the size of two elements to the end of the array happened to give a pointer whose top 20 bits indicated that it was on an invalid page, so the operating system made an AV.

When I dereference, let’s say two elements past the end of a valid array, and I do not get an AV, why didn’t I get an AV?

Either the array ended in the beginning or middle of a valid page and the address two elements after its end was still on a valid page, or the array ended at the end of a valid page, but the page after it happened to also be valid. You read memory that as far as the rules of C are concerned, you had no right to read, but the operating system doesn’t know that. From its perspective your process asked for a page, it got one, and then it read from it; the operating system thinks you’re doing just fine.

Does all of the above apply just the same when I use unsafe pointers in C#?

Yep. Unsafe pointers are unsafe if you use them improperly. Hence the name “unsafe pointer”. They work just the same: if you dereference one that happens to be on a valid page, you get no AV, even if you happened to obtain that pointer through bogus arithmetic.

When I use memory-safe references in C# or Java, for example when I dereference an array, I always get an exception when I dereference past the end of an array. Since the operating system might not detect that the address is invalid, how do I consistently get an exception?

Magic!

No, not magic. Since the operating system does not reliably provide this service, the CLR and JVM runtimes do; how precisely they do so is an implementation detail of the runtime.


As always, if you have questions about a bug you’ve found in a C, C++, C# or Java program that you think would make a good episode of ATBG, please send your question along with a small reproducer of the problem to TheBugGuys@Coverity.com. We cannot promise to answer every question or solve every problem, but we’ll take a selection of the best questions that we can answer and address them on the dev testing blog every couple of weeks.

  1. I am curious how this check by the OS of every dereferenced pointer affects performance. Does this mean that every time you dereference something it takes 3-4 times more time than it would if this check was not done?

    1. @Stilgar – There is no check by the OS, it’s done by the CPU hardware as a part of the virtual to physical address translation.

    2. Stilgar, to answer your question you must think about how memory works in modern hardware and operating systems. The performance problems associated with memory access are not in the mapping from the virtual address space to RAM; the performance problem in memory is either (1) missing the CPU cache, which gates performance on the RAM access speed plus the significant delay due to there being actual space between the RAM and the CPU — the speed of light is large but finite — or (2) missing RAM entirely and having to go to page file, which gates performance on spinning a freakin’ metal disk.

      Table lookups are extremely fast; it’s not the problem. Poor data locality is the problem.

      1. Thanks. So what you are saying is that in theory there is a cost but in practice the bottleneck is much more severe and is entirely in another area.

        1. To be more specific, the CPU’s Memory Management Unit takes virtual addresses and *in parallel* translates them into physical addresses, using the map setup by the OS. (The OS has to know how to lay out the maps so the hardware can understand them, so the C struct used by the kernel exactly maps to the underlying hardware representation).

          The CPU has a cache dedicated to this mapping – the Translation Lookaside Buffer. Part of the out of order pipelining process may be translating addresses in parallel to current execution, but the TLB isn’t infinite and part of the overhead of switching processes (aka “context switch”) is flushing the user mode process’ portion of the TLB and switching to a different address space. That’s also why most OSes have the division between kernel and user address spaces – so every single system call/API call doesn’t flush the TLB… The kernel’s half can just stay mapped all the time.

          If the mapping is not present *or* certain bits are set then the MMU asks the CPU to raise an interrupt which allows the OS to take control and respond. This is part of what Ring 0 vs Ring 3 actually are at the hardware level – the CPU knows a process is a ring 3 (user mode) process so if it attempts to write to memory mapped in the kernel’s area that triggers an interrupt. The OS installs an interrupt handler during boot that basically says “jump to address X”, which is inside the kernel’s address space. That handler can decide to quickly dismiss the interrupt, raise an exception, or go ahead and do a full process switch to a kernel process (swapping the active virtual address maps).

          Trying to modify the interrupt handler from user mode triggers a CPU fault that again gives the OS control. One of the other faults besides AV is during a fault because the memory pages in the map are valid, but not mapped to physical memory anywhere – they are paged out to disk. In that case the OS loads them from disk, maps them to physical memory, then clears the fault and resumes execution of the original process. If the OS screws up and triggers a secondary fault during this handling it can cause major bad things to happen – usually the memory holding critical routines like this are prevented from being paged out or unmapped by the OS. IIRC The OS can install a double fault handler which usually just BSODs the machine because it means something is majorly corrupt, and I think a triple fault just makes the CPU reset but I’m not 100% certain about that.

          It’s a lot more complicated than that… Read up on it if you are interested.

    3. This “check” is not so much a separate and removable safety check that the computer does on your behalf, it’s a necessary side-effect of the fact that your process is running in a virtual address space, which necessitates that your program’s pointer be translated into something real in order to be used.

      The only way to avoid this check would be to not translate between a virtual and a real address at all. This would mean that neither the OS or the hardware could provide services like a pagefile, that the program would need to be aware of what area of the address space it can safely use (making it impossible to load two programs if they made use of the same values in memory), the program would need to be aware of how much physical RAM is present on the machine, and be recompiled if that changed. There would be no memory protection between processes.

      That’s quite a lot to sacrifice.

      Also, the mapping is extremely fast. Extracting the top twenty bits out of a pointer, or combining them back in again, don’t need to use integer math operations since pages are a power of two in size. These operations can be represented directly in the wiring on the chip in the hardware. To split out the top twenty bits, you just lay those traces out one way on the chip, and lay the other twelve out another.

  2. And that’s why it’s recommended to use GFlags, or other memory safety tools when performing dynamic security testing. Doing so, these bugs can be detected, and even they can cause AVs in the test environment.

  3. It might be interesting to see more details on how GFlags / Application Verifier and tools of their ilk work.

    Someone did explain their interpretation to me, and IIRC those tools drop in “guard blocks” all over the place, and if anything touches those blocks you get magic fireworks that signal the end of the universe.

    1. There are several techniques:

      1. One is making each allocation on its own page, and unmapping any page when the allocation is freed. This obviously wastes a ton of memory, but any attempt to read or write outside the page will immediately fault. Can’t catch small overruns though.

      2. Guard regions or markers. This inserts extra padding bytes in-between every allocation, usually set to some specific pattern. Then on the next allocation or after free you can check these regions to make sure they are unmodified.

      3. For stacks you can setup guard regions then in the prologue/epilogue of each function call you can check those guard markers to make sure they haven’t been modified. Like #2, this is more about catching invalid writes but doesn’t help with reads.

      There are a lot more techniques than just these, including ones that use compressed tables of valid allocations and check every pointer access (read or write) against the table to determine if the memory was validly allocated. IIRC one really clever one I saw only had a ~7-12% performance penalty at runtime and used the fact that allocations are aligned to more quickly lookup the allocation info in the map, along with static analysis to elide runtime checks from known-safe operations.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Current day month ye@r *