Last time I talked about the implementation of MallocTracker and its integration with Unreal Engine 4. While I made a good effort to explain the internals of MallocTracker, one thing that was clearly lacking was how to interpret the output it generates. In this post I will talk about MallocTrackerView and look at the output it produces.
When allocations are not tagged and scopes are not defined, the output generated by MallocTracker is not that meaningful, but some context is still provided. Let’s look at three allocations as text:
```
0x0000000063b40000,Main Thread,Unknown,127471600,GlobalScope,UnnamedAllocation
0x000000008bda0000,Main Thread,Unknown,123401431,GlobalScope,UnnamedAllocation
0x00000000ba8d83d0,Main Thread,Unknown,48,GlobalScope,UnnamedTArray
```
As you can see there is one line per allocation, and each line consists of several comma-separated fields. The format for each line is:
Address, Thread name, Group name, Bytes, Scope Stack, Name
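For illustration, here is a minimal sketch of how one of those lines could be parsed. The `Allocation` type and parsing rules are my own assumptions for this post, not MallocTracker’s actual code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Allocation:
    # Field names follow the CSV layout above; the class itself is my own
    # illustration, not part of MallocTracker.
    address: int
    thread: str
    group: str
    size: int
    scopes: tuple   # scope stack, outermost first
    name: str

def parse_line(line: str) -> Allocation:
    # Assumes none of the fields contain commas, which holds for the
    # examples in this post.
    address, thread, group, size, scopes, name = line.strip().split(",")
    return Allocation(int(address, 16), thread, group, int(size),
                      tuple(scopes.split("|")), name)
```

The scope stack is split on `|` since, as we will see below, scopes are written innermost-last separated by that character.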
So even with the allocations completely untagged, there is still the thread context, which might be useful. But let’s see what we get when we use the UObject MallocTracker scope support, add a scope in FArchiveAsync::Precache(), and tag the single allocation done in that function with the filename of the loaded file:
```
0x0000000063b40000,Main Thread,UObject,127471600,GlobalScope|HierarchicalInstancedStaticMeshComponent|HierarchicalInstancedStaticMeshComponent,../../../../dev/UnrealEngine/PZ4.8/../../../Unreal/KiteDemo/Content/Maps/GoldenPath/Height/Height_x2_y2.umap
0x000000008bda0000,Main Thread,Systems,123401431,GlobalScope|FArchiveAsyncPrecache,../../../../dev/UnrealEngine/PZ4.8/../../../Unreal/KiteDemo/Content/KiteDemo/Environments/Landscape/Materials/M_Grass_Landscape_01_INST_HOLE.uasset
0x00000000ba8d83d0,Main Thread,Unknown,48,GlobalScope|Material|M_CustomDepthBillboard,UnnamedTArray
```
With the proper tagging the data is far more meaningful. We now know that a 121.56 MiB allocation was made for the HierarchicalInstancedStaticMeshComponent UObject that was loading the Height_x2_y2.umap file from disk. The same quality of output is available for the other two allocations. But while the output is clearly useful, it is pretty much impossible to deal with the huge number of allocations in an actual game this way. The Kite demo, after loading everything, has over 600k allocations, ranging from 16 bytes to 121.56 MiB, spread across 18 different threads.
One sensible thing to do is to import that data into a spreadsheet such as Excel, which can already deal with the CSV output generated by MallocTracker. That is very useful when looking for specific allocations, such as the biggest ones. But in terms of navigation it is still hard to deal with, since there are many duplicates that could be collapsed (meaning allocations with the same name, size, group and thread but different addresses). So it is necessary to make something custom. Enter MallocTrackerView.
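That collapsing step is simple to sketch. Assuming the CSV rows have already been split into tuples, dropping the address from the key folds the duplicates together (this is my own illustration of the idea, not MallocTrackerView’s actual code):

```python
from collections import Counter

def collapse(allocations):
    """Fold together allocations that differ only by address.

    Each allocation is an (address, thread, group, size, scope_stack, name)
    tuple; the address is dropped from the key so duplicates collapse into
    a single entry with a count.
    """
    return Counter(alloc[1:] for alloc in allocations)
```

Every entry in the result is one row the viewer needs to show, with a count instead of a wall of identical lines.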
MallocTrackerView is a tool I’m writing in C# to visualize the data generated by MallocTracker better than I could with Excel or a text editor. This is my first piece of code written from scratch in C#; in the past I only had to make relatively minor changes to C# tools, such as an optimization for Unreal Engine’s MemoryProfiler2. Initially my idea was to write it in Python, since I had already used it while doing some research for Electronic Arts and I thought it was a good fit. The reason I decided against it is that Unreal Engine 4 doesn’t use Python at all; its only presence seems to be in third-party libraries. Using C++ was also a possibility, but integrating a proper UI library would have been too much work. So, considering that there are already a bunch of tools written in C# within the engine’s depot, I decided to go with C#. The first thing I had to consider once I had chosen C# was that I would need a TreeView control with support for columns, since the idea was to have three columns:
- Name. This would be the thread name, scope name or allocation name depending on where you are in the hierarchy.
- Size. Accumulated size of the allocations under that node.
- Count. Accumulated number of allocations done under that node. Since there can be multiple allocations with the same name under the same thread and scope stack, it is possible that a leaf node has a count higher than one.
Since the default C# TreeView control didn’t properly support that, and I really wasn’t keen on writing my own solution (after all, one of the reasons to use C# was to focus on the actual tool rather than the infrastructure), I did a quick search to see how other people had solved the need for a similar control. As usual, Stack Overflow pointed me to the solution, which was to use ObjectListView. After doing a bit more research it seemed like people were happy with it, so I decided to use it.
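The three columns above map naturally onto a tree where every node accumulates size and count as allocations are inserted along the path thread → scopes → allocation name. A minimal sketch of that accumulation, under my own assumptions about the data model (MallocTrackerView’s real structures may differ):

```python
class Node:
    """A node in the allocation tree: thread -> scopes -> allocation name."""
    def __init__(self):
        self.children = {}  # child name -> Node
        self.size = 0       # accumulated bytes of all allocations under this node
        self.count = 0      # accumulated number of allocations under this node

def insert_allocation(root, path, size):
    """Add one allocation along `path`, accumulating size and count on every node."""
    node = root
    node.size += size
    node.count += 1
    for part in path:
        node = node.children.setdefault(part, Node())
        node.size += size
        node.count += 1
    return node  # the leaf; its count can exceed one for duplicate names
```

Note how a leaf ends up with a count higher than one exactly when several allocations share the same name under the same thread and scope stack, matching the Count column described above.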
MallocTrackerView right now is very much alpha, but in its current state it is fairly useful because it covers the most relevant use case that isn’t covered by a text editor or Excel: looking at the data in terms of scopes. Let’s go over it by opening a MallocTracker dump taken in the Kite demo:
That’s how the UI looks after opening a MallocTracker dump. The root nodes in the treeview are the different threads. As you go deeper into the tree, the scopes are shown, as well as the allocations. Here is an example:
Here we are looking at the allocations done in the ScotsPineMedium_Billboard_Mat material UObject from the main thread. This is where we can see the usefulness of having such a system available: we now know the specific allocations related to that UObject, correctly accounting for the actual memory allocated rather than relying on hand-written GetResourceSize() implementations and serialization numbers. By comparison, this is the info you get from the obj list command for ScotsPineMedium_Billboard_Mat:
Even better, we can filter the allocations by scope name to see related allocations together. Here is the output when looking at the allocations that contain the scope name “ScotsPineMedium”:
And you can dig even deeper by combining filters; here I filter by allocation group as well as by allocation name:
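The combined filtering shown above boils down to an AND over whichever criteria are set. A sketch of the idea, with my own (hypothetical) parameter names rather than the tool’s actual filter implementation:

```python
def matches(alloc, scope_substr=None, group=None, name_substr=None):
    """Return True when the allocation passes every filter that is set.

    `alloc` is a (thread, group, size, scope_stack, name) tuple, with
    scope_stack as a tuple of scope names from outermost to innermost.
    """
    thread, alloc_group, size, scope_stack, name = alloc
    if scope_substr is not None and not any(scope_substr in s for s in scope_stack):
        return False
    if group is not None and alloc_group != group:
        return False
    if name_substr is not None and name_substr not in name:
        return False
    return True
```

Rebuilding the tree from only the matching allocations is then just a matter of re-running the insertion over the filtered set.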
I think this clearly shows what can be done with MallocTracker enabled, even if only three allocations were properly tagged.
I made a pull request to integrate MallocTracker into Unreal Engine 4, and it was rejected by Epic. I think it is worth going over their reasons for not integrating it, because someone else might have the same doubts. Let’s go over their concerns as shown in the response to the pull request:
- It’s too much of a support burden to maintain those tags. I think this concern comes from a lack of understanding of how the system works. As a programmer you don’t have to tag the allocations. It is better to tag them, but if you don’t, you still have the scopes to provide context to the allocations, as you have seen in the previous examples. If you decide to tag an allocation (and you can use MallocTrackerView to determine which allocations to tag), then you don’t have to worry about that allocation ever again. The tagging is a change to a single line of code (be it an FMemory call, a container constructor, etc.) that you don’t need to maintain. What is a burden is maintaining all the UObject GetResourceSize() functions, so I beg to differ. It is also worth noting that this kind of tagging isn’t unproven at scale: I have seen it used throughout the code of full AAA games without this concern ever being relevant.
- It most likely won’t be used by anyone at a higher level. This assumes that non-engine people currently have a good grasp of the memory usage of their game code and assets. While it is true that the MemReport command does offer some insights, it is still not enough, and it is certainly harder to visualize than using MallocTracker with MallocTrackerView.
- At a lower level it usually doesn’t provide enough information. The only tool provided by the engine that does a better job is MallocProfiler together with the MemoryProfiler2 tool. But the issues with it are that it is basically unusable in any memory-constrained environment (particularly consoles), its performance is completely unacceptable to the point that the game becomes unplayable, and many of the allocations have the exact same callstack even though they refer to completely different UObjects. MallocTracker, instead, runs with pretty much the same performance as having it disabled, has a considerably lower memory overhead (at least an order of magnitude smaller; for example, it needs 35.3 MiB to store 614,145 allocations), and it provides proper information even for allocations done to create UObjects from Blueprints.
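As a back-of-the-envelope check, the figures quoted above work out to roughly 60 bytes of tracking metadata per live allocation:

```python
# Figures quoted above: 35.3 MiB of tracker memory for 614,145 allocations.
tracker_bytes = 35.3 * 1024 * 1024
allocation_count = 614_145

# Roughly 60 bytes of tracking overhead per live allocation.
overhead_per_allocation = tracker_bytes / allocation_count
```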
In the end it is fine to have this pull request rejected; after all, Epic can’t accept whatever is sent their way. But I happen to think that having this in the engine would be rather useful for everybody. To Epic’s credit, given the open nature of the engine, anybody can ask me for the changes and get to use this even if Epic themselves won’t integrate it. That is much better than anything you could do with Unity or some other closed engine.