Memory stomp allocator for Unreal Engine 4.

As I have been working on the memory tracking feature one of the issues I had to deal with is memory stomps. They are hard to track even with the debug allocator provided by Unreal Engine 4. I will go over what are the cases to handle and the actual implementation.

Symptoms of a memory stomp.

The symptoms of a memory stomp could be clearly evident as an explicit error about a corrupted heap, or as complex as unexpected behavior without any crash at all which is why they are so hard to catch. This is exacerbated by the fact that any performance-aware allocator won’t actually request pages to the OS on every allocation but rather request multiple pages at once and assign addresses as necessary. Only when they run out of available pages will they request new pages to the OS. In that situation the OS won’t be able to do anything to let us know that we messed up (by for example, throwing an access violation exception). Instead execution continues as usual but behavior isn’t what is expected since we could be effectively be operating with memory that is unrelated to what we want to read or write. Just as an example, I had to deal with the case where some CPU particles were behaving in a way that they would end up at origin and color change every now and then. After looking for a logic error, I was able to determine that the issue had nothing to do with the code changing the color or transform but rather a memory overrun on unrelated code which ended up in the same pool in the binned allocator. Another example is when pointers get overwritten. If you don’t see that the pointer itself was written somewhere else, rather than having the data that is being pointed to corrupt, you may waste some time. Depending on the nature of the code accessing the data pointer it may or may not crash. Still, the symptom would be similar to that of overwritten data.

Types of memory stomps.

A memory stomp could be defined as doing different type of operations with memory that are invalid. All these invalid operations are hard to catch due to their nature. They are:

  • Memory overrun. Reading or writing off the end of an allocation.
static const size_t NumBytes = 1U << 6U;
uint8_t * const TestBytes = new uint8_t[NumBytes];
// Memory overrun:
static const size_t SomeVariable = 7U;
TestBytes[1U << SomeVariable] = 1U;
delete [] TestBytes;
  • Memory underrun. Reading or writing off the beginning of an allocation.
static const size_t NumBytes = 1U << 6U;
uint8_t * const TestBytes = new uint8_t[NumBytes];
// Memory underrun:
TestBytes[-128] = 1U;
delete [] TestBytes;
  • Operation after free. Reading or writing an allocation that was already free.
static const size_t NumBytes = 1U << 6U;
uint8_t * const TestBytes = new uint8_t[NumBytes];
delete [] TestBytes;
// Operation after free:
TestBytes[0] = 1U;

One confusion that does raise up is if dereferencing a null pointer could be considered a memory stomp. Given that the behavior when dereferencing null is undefined, it wouldn’t be possible to say that it is effectively a memory stomp. So to keep things simple, I rely on the target-platform OS to deal with that case.

How it works.

The stomp allocator work by using the memory protection features in the different operating systems. They allow us to tag different memory pages (which are usually 4KiB each) with different access modes. The access modes are OS-dependent but they all offer four basic protection modes: execute, read, write, and no-access. Based on that the stomp allocator allocates at least two pages, one page where the actual allocation lives, and an extra page that will be used to detect the memory stomp. The allocation requested is pushed to the end of the last valid page in such way that any read or write beyond that point would cause a segmentation fault since you would be trying to access a protected page. Since people say that a picture is worth a thousand words, here is a reference diagram: MallocStompDiag

Diagram not to scale. The memory allocation information + the sentinel only accounts for 0.39% of the two pages shown.

As it is visible, there is a waste of space due to the fact that we have to deal with full pages. That is the price we have to pay. But the waste is limited to a full page plus the waste in the space of the first valid page. So the stomp allocator isn’t something that you would have enabled by default, but it is a tool to help you find those hard to catch issues. Another nice benefit is that this approach should work fine on many platforms. As a pull request to Epic I’m providing the implementation for Windows, Xbox One, Linux and Mac. But it can be implemented on PlayStation 4 (details under NDA) and perhaps even on mobile platforms as long as they provide functionality similar to what’s provided with mprotect or VirtualProtect. Another aspect for safety is the use of a sentinel to detect an underrun. The sentinel is the last member of the AllocationData which is shown as “Memory allocation information” which is necessary to be able to deallocate the full allocation on free, and to see how much information to copy on Realloc. When an allocation is freed then the sentinel is checked to see if it is the value expected (which in my code it is 0xdeadbeef or 0xdeadbeefdeadbeef depending if it is a 32-bit or 64-bit build). If the sentinel doesn’t match the expected value then there was an underrun detected and the debugger will be stopped. But that will only work for very specific cases where the sentinel is overwritten. So changing the layout with a different mode would help with this issue. Here is the diagram: MallocStompDiagUnder

Diagram not to scale.

This basically flips where the no-access page is set. This allows finding underruns that happen as long as it writes before the “Memory allocation information” shown in the diagram. That piece of data is small (the size depends on the architecture used to compile). The sentinel is still present to deal with small underrun errors. The underrun errors that manage to go below that point will actually hit the no-access page which will trigger the response we want from the OS. The last aspect is that any attempt to read or write from memory already free will fail as it happens. A performance-aware allocator (such as the binned allocator) would add those pages to a free list and return them when someone else request memory without actually making a request to the OS for new pages. That is one of the reasons why they are fast. The stomp allocator will actually decommit the freed pages which makes any read or write operation in those pages invalid.