Memory allocation and tracking.

After that optimization interlude, let’s go back to memory allocation and tracking. As I mentioned last time I would like to go over the different requirements of the memory allocation and tracking features necessary for shipping a AAA game without having a horrible time doing memory optimizations.

Solution Requirements

Most AAA games have a lot of resource that go in and out of system and video memory constantly. This is more exacerbated in sandbox games, and less so in stadium games (such as NBA 2K15). In any case, the solution needs to be fast and provide proper data related to all allocations happening. A fast solution that doesn’t provided proper tracking facilities won’t do, and a solution with proper tracking facilities but it is slow won’t do either. Both aspects are important in equal measure. At the same time, ALL allocations must go through it which implies that client code nor third party libraries won’t allocate memory on their own and global new and delete operators should be overridden.

Tracking Information

The solution must provide relevant allocation tracking information. The information must be as general as how much memory is used overall, as specific as what was the address returned for a specific allocation, and everything in between. Any allocation must have relevant tracking information that is defined when it was allocated and can be used as a reference for programmers to detect issues.

General Information

The general information provided should be extremely simple. It should provide the following items:

  • Allocated bytes.
  • Number of allocations.
  • Peak allocated bytes.
  • Peak number of allocations.

Allocations Grouping

Just like there are different groups/teams focused on different areas of a game, allocations should also be grouped. Some of the groups are rendering, gameplay, UI, audio, etc. The different groups have different memory allocation patterns and requirements. As such, it is a good idea to group allocations based on that criteria because it provides the following benefits:

  • Optimal allocation setup. Not all groups have the same allocation needs so it is better to allow allocation setup per group. That means that for example, not all groups need mutexed allocators, not all groups may use the same small block allocator, etc.
  • Budget tracking and enforcement. It is essential to have each group have their own amount of ram that they can keep track of, and systems programmers can assign in agreement with the different groups. Basically this will guarantee that they all have the their fair share and that everybody keeps memory usage under control.
  • Easier to detect corruption issues. Since all allocations have a group assigned with a proper allocation setup, it isn’t that hard to solve corruption issues or issues during allocation. The grouping provide good initial context.
  • Better performance. Since mutexes are not required in all the groups or allocators that cost can be avoided. On groups that require mutexes there is also less contention since there isn’t a single mutexed allocator (such as the global allocator) to lock. It is also possible to make trade-offs between absolute performance, and peak memory usage when it comes to defining how memory is allocated.

Allocation Naming

All allocations should be “named” in order to have proper ways to recognize the different allocations. What the name implies is up to whoever requests the memory, and you may enforce naming conventions, but that tag should be available in order to track memory allocations. For the sake of performance, these tags should only be available on non-shipping builds.

Allocation Scoping

The solution must allow the stacking of scopes per thread to provide more context to allocations. This provides better context than simply using the callstack of the allocation, and it is far cheaper that grabbing the callstack. On Unreal an example might be where are scope is created during UObject creation so all allocations happening for that UObject are nested under that scope. All non-scoped allocation would still belong to a global scope. Here is an example of a scope stack with the allocation as a leaf node and its relevant data:

Main Thread												Pointer				Bytes	Group
	Global Scope
		UGameEngine::Init
			/Game/Maps/LandscapeMap.umap
				AddToWorld
					PersistentLevel
						ConstructObject
							FPhysXAllocator::allocate	0x000000000b093fe0	131720	Physics

Allocation flagging

Allocations may provide optional flags that depending on the allocator used it will have a certain meaning. Some flags examples are:

  • Lifetime flags. Provides hints as to how long this allocation is expected to live. This allows the allocator to be more clever when allocating memory to reduce memory fragmentation.
  • Clear allocation flag. Let the allocator know that you want the memory allocated cleared to zero before returning.

Performance

The solution must provide acceptable performance even in non-shipping builds with the tracking features enabled. And by acceptable that means that the frame time should never go above 50ms per frame with tracking features enabled. When it goes above that threshold people start trying to avoid using the tracking features and that’s the slippery slope you will have to recover from at the worst possible time, shipping time. Also performance hit and general overhead on shipping builds should be slim to none.

Allocations Grouping

To implement the best possible allocation scheme without exposing the complexities to the client code, it make sense to define multiple allocators per group. Those allocators would be called in order, which ever allocator succeeds at allocating the memory returns the allocation. So for example under a normal group three allocators could be added:

  • Static Small Block Allocator (SSBA). This could be a static small block allocator that doesn’t grow and accepts allocations up to 256 bytes.
  • Dynamic Small Block Allocator (DSBA). This could be a dynamic small block allocator that grows as necessary and accepts allocations up to 1024 bytes.
  • Standard Allocator (SA). Standard OS allocator that allocates any size allocation.

So a request for 1032 bytes would try the SSBA first, the DSBA, and finally it will request the memory to the SA. It is also perfectly fine to just have one allocator if the allocator you use provides the functionality you need. For example, you may use jemalloc which already handles different sized allocations by itself with the proper locking mechanisms.

Conclusion

With all this data we ought to be able to create the proper facilities for memory allocation and tracking. Next time we will delve into the API that we would eventually implement. If you think I forgot any other item please don’t hesitate to comment.

Advertisements