Limitations of memory tracking features in Unreal Engine 4.

Since June last year I have been working on a AAA game based on Unreal Engine 4 as a contractor. One of the big issue on most AAA titles, and which certainly bit me on this project, is memory consumption. It is a tough challenge to provide the best experience to the end-user while doing so within the constraint of the current available memory. And even more so when working in a sandbox game which is my case. But while the Unreal Engine 4 does provide some facilities for tracking memory, I have come to see that they are not up to the task when it comes to big sandbox games. But first let’s dig into the two means provided by the engine, memory report commands and MallocProfiler.

Notes: For this post I will assume that you have read the blog post by Ben Zeigler called “Debugging and Optimizing Memory” on the official Unreal Engine blog. Also, this article is based on what’s available in Unreal Engine 4.6.

Memory report commands

The memory reports commands are a bunch of different console commands that allow you to see the memory usage in general. All those commands are bundled together in a single very convenient command that’s “Memreport –full”. Behind the back it executes the following commands:

Mem FromReport
obj list -alphasort
rhi.DumpMemory
LogOutStatLevels
ListSpawnedActors
DumpParticleMem
ConfigMem
r.DumpRenderTargetPoolMemory
ListTextures -alphasort
ListSounds -alphasort
ListParticleSystems -alphasort
obj list class=SoundWave -alphasort
obj list class=SkeletalMesh -alphasort
obj list class=StaticMesh -alphasort
obj list class=Level -alphasort

All these command rely on three mechanisms for gathering data: global allocator stats, memory tracking stat define, and each object’s UObject::GetResourceSize().

Global allocator stats

LogMemory: Platform Memory Stats for WindowsNoEditor
LogMemory: Process Physical Memory: 722.09 MB used, 945.73 MB peak
LogMemory: Process Virtual Memory: 732.14 MB used, 1379.78 MB peak
LogMemory: Physical Memory: 9624.43 MB used, 24551.27 MB total
LogMemory: Virtual Memory: 1026.31 MB used, 8388608.00 MB total
LogMemory:
LogMemory: Memory Stats:
LogMemory: FMemStack (gamethread) current size = 0.00 MB
LogMemory: FPageAllocator (all threads) allocation size [used/ unused] = [0.00 / 0.56] MB
LogMemory: Nametable memory usage = 1.54 MB

This is the most basic information. The depth of the data provided is very limited but it is the first basic information to look at when assessing memory issues. The first four lines is data that comes from GlobalMemoryStatusEx and GetProcessMemoryInfo which are completely generic, and it’s data that you can already see in the Task Manager.

The last three lines are specific bits of memory which usually don’t take that much but it is still useful to see. The FMemStack size is the memory used by the linear stack allocator. That allocator is a singleton and it is used for multiple things such as storing data for rendering composition passes, shadows information, etc.

The second line refers to the allocation size done by the page allocator. This allocator it also store data internally split at in normal page size allocations and small page size allocations. The statistics provided include both normal page size and small page size allocations. The FMemStack uses the FPageAllocator so they are related.

The last entry just shows the size of the table that stores all the FName entries. Since FNames are used throughout the engine it can be a sizable piece of memory, especially in sandbox games with lots of different UObjects.

Memory tracking stat defines

The Unreal stat system provides the following macros to create stat entries for memory tracking:

#define DECLARE_MEMORY_STAT(CounterName,StatId,GroupId)
#define DECLARE_MEMORY_STAT_POOL(CounterName,StatId,GroupId,Pool)
#define DECLARE_MEMORY_STAT_EXTERN(CounterName,StatId,GroupId, API)
#define DECLARE_MEMORY_STAT_POOL_EXTERN(CounterName,StatId,GroupId,Pool, API)

#define INC_MEMORY_STAT_BY(StatId,Amount)
#define DEC_MEMORY_STAT_BY(StatId,Amount)
#define SET_MEMORY_STAT(StatId,Value)

#define INC_MEMORY_STAT_BY_FName(Stat, Amount)
#define DEC_MEMORY_STAT_BY_FName(Stat,Amount)
#define SET_MEMORY_STAT_FName(Stat,Value)

These macros allow you to define, set, increase, and decrease each memory tracking stat entry. But the fault of this approach is that it needs to be increased and decreased properly, and in a place that it is actually near where the memory is allocated or freed. For example, let’s look at one example in the engine

void FPrimitiveSceneInfo::RemoveFromScene(bool bUpdateStaticDrawLists)
{
	check(IsInRenderingThread());

	// implicit linked list. The destruction will update this "head" pointer to the next item in the list.
	while(LightList)
	{
		FLightPrimitiveInteraction::Destroy(LightList);
	}

	// Remove the primitive from the octree.
	check(OctreeId.IsValidId());
	check(Scene->PrimitiveOctree.GetElementById(OctreeId).PrimitiveSceneInfo == this);
	Scene->PrimitiveOctree.RemoveElement(OctreeId);
	OctreeId = FOctreeElementId();

	IndirectLightingCacheAllocation = NULL;

	DEC_MEMORY_STAT_BY(STAT_PrimitiveInfoMemory, sizeof(*this) + StaticMeshes.GetAllocatedSize() + Proxy->GetMemoryFootprint());

	if (bUpdateStaticDrawLists)
	{
		RemoveStaticMeshes();
	}
}

In that piece of code we can see how STAT_PrimitiveInfoMemory is decreased, but not anywhere near where the actual memory was freed. The memory was freed inside the allocator defined for the StaticMeshes array and the scene proxy, and all of that was triggered by remove an element from the octree. If someone makes changes to the memory usage of the octree then this stat would reflect wrong memory consumption which leads to wrong decisions when optimizing memory usage. The same happens if FPrimitiveSceneInfo changes, especially when new containers are added to the class.

The process of having up to date allocation data by means of stat defines is very error prone. This data does get out of date, it gets written wrong, and it doesn’t actually track memory, just estimates of what the programmer thinks it could consume memory. And the last mechanism, the use of UObject::GetResourceSize() has the same issues.

Memory tracking with UObject::GetResourceSize()

Objects:

                                         Class  Count  NumKBytes  MaxKBytes  ResKBytes ExclusiveResKBytes
                            AIPerceptionSystem      1          0K          0K          0K          0K
                                AISense_Damage      1          0K          0K          0K          0K
                               AISense_Hearing      1          0K          0K          0K          0K
                            AISense_Prediction      1          0K          0K          0K          0K
                                 AISense_Sight      1          0K          0K          0K          0K
                                  AISense_Team      1          0K          0K          0K          0K
                                 AISense_Touch      1          0K          0K          0K          0K
                                      AISystem      1          0K          0K          0K          0K
                                  AmbientSound     21         15K         15K          0K          0K
               AnimNotify_PlayParticleEffect_C     11          1K          1K          0K          0K
                                  AnimSequence     78      53453K      53453K      53333K      53333K
                        AnimSingleNodeInstance     85        241K        242K          0K          0K
                                 ArrayProperty    729         85K         85K          0K          0K
                           AssetObjectProperty      2          0K          0K          0K          0K

In the ideal situation this function provides memory usage of a specific UObject. The function is defined this way:

/**
 * Returns the size of the object/ resource for display to artists/ LDs in the Editor. The
 * default behavior is to return 0 which indicates that the resource shouldn't display its
 * size which is used to not confuse people by displaying small sizes for e.g. objects like
 * materials
 *
 * @param	Type	Indicates which resource size should be returned
 * @return	Size of resource as to be displayed to artists/ LDs in the Editor.
 */
virtual SIZE_T GetResourceSize(EResourceSizeMode::Type Mode)
{
	return 0;
}

It should just take a call to the specific UObject::GetResourceSize() and we should get the data which is extremely useful data to have. It tries to answer the question “what UObjects do I need to optimize in the scene?”, “which UObject-based classed do I need to optimize?” and questions of that nature. But again, this is as good as the implementation written for it. This is another mechanism that gets outdated, that is implemented wrong (for example, by assuming the data is stored with an object and not just pointers to the data), or that is just empty. For example, one may ask “which skeletal mesh in my scene should I optimize?” Let’s look at the implementation in the engine as of Unreal Engine 4.6:

SIZE_T USkeletalMesh::GetResourceSize(EResourceSizeMode::Type Mode)
{
	return 0;
}

So that’s not good. This mechanism is useless or outdated. So to fix this I had to write something like this:

SIZE_T USkeletalMesh::GetResourceSize(EResourceSizeMode::Type Mode)
{
	SIZE_T ResSize = sizeof(USkeletalMesh);

	ResSize += Materials.GetAllocatedSize();
	ResSize += SkelMirrorTable.GetAllocatedSize();

	for (int32 i = 0, n = LODInfo.Num(); i < n; ++i)
	{
		ResSize += LODInfo[i].GetResourceSize(Mode);
	}
	ResSize += LODInfo.GetAllocatedSize();

#if WITH_EDITORONLY_DATA
	for (int32 i = 0, n = OptimizationSettings.Num(); i < n; ++i)
	{
		ResSize += OptimizationSettings[i].GetResourceSize(Mode);
	}
	ResSize += OptimizationSettings.GetAllocatedSize();
#endif

	for (int32 i = 0, n = MorphTargets.Num(); i < n; ++i)
	{
		ResSize += MorphTargets[i]->GetResourceSize(Mode);
	}
	ResSize += MorphTargets.GetAllocatedSize();

	ResSize += RefBasesInvMatrix.GetAllocatedSize();

	ResSize += ClothingAssets.GetAllocatedSize();

#if WITH_APEX_CLOTHING
	for (int32 i = 0, n = ClothingAssets.Num(); i < n; ++i)
	{
		ResSize += ClothingAssets[i].GetResourceSize(Mode);
	}
#endif

	ResSize += CachedStreamingTextureFactors.GetAllocatedSize();

	ResSize += Sockets.GetAllocatedSize();

	return ResSize;
}

And then again, you have to check that the called GetResourceSize() functions actually return the proper value. For example ClothingAssets is an array of FClothingAssetData but that struct didn’t have a GetResourceSize() implementation (probably because that isn’t a UObject in itself but we still need the memory usage of that resource). And also, this implementation will get outdated, and perhaps I missed some data when I implemented it. You just can rely on this to get proper memory usage data.

MallocProfiler

The malloc profiler is a completely different approach. Basically this is a feature in the engine that allows to keep track of all memory allocated through new and delete operators as well as anything that happens through the global memory allocator. It collects basic stats such as memory operations, modules loader, etc. but also the actual memory operations (allocations, frees, reallocations) together with the callstack for each. The grabbing of the callstack isn’t optional since the callstack is the only piece of data that differentiates one allocation from another. But the fact that it has to capture the callstack makes it incredibly slow to use in-game, and it generates a huge output file. For example, I have seen capture files of ~15GBs, and each frame took more than one second (yes, second, not millisecond) while the game was running.

The data can be visualized with a custom tool written in C# called MemoryProfiler 2. This is a extremely slow and inefficient tool in terms of memory. A 15GB capture takes close to 30 minutes to open and more that 80GBs of system memory. Again, the issue isn’t only related to the tool but also the fact that there is just a sheer amount of data generated, and that is made worse by the fact that the opening of the data is single threaded with the code structured in a way that makes it hard to multithread. But even if the tool was faster, and the data was compressed properly, it still wouldn’t help that much because the information is limited. Lets look at the data:
MemoryProfiler2
Here we see the most relevant tab in terms of memory usage by resources because it visualizes the different allocations as a call tree. In the screenshot the allocations related to the USkeletalMeshes serialization allocations are visible. The issue is that if you have many USkeletalMeshes, you can’t find out which ones of those should be optimized. You just know that serializing all the USkeletalMeshes takes a certain amount of memory. This doesn’t provide really useful data to tackle memory optimizations except in the cases where you are doing general optimizations (such as rearranging the data inside USkeletalMesh to reduce compiler padding).

Next steps

After looking at the limitations, it is obvious that there is a place in the engine for a better way to do and track allocations. We need something that would cover up the whole spectrum of allocations, that would allow us to keep track of them, but at the same time do so without making it impossible for the game to run fast enough when it’s enabled. Next time I will start digging into a solution for that and be more explicit about the requirements.

Advertisements