The importance of profiling automation

Needless to say, one of the basic pillars of immersion in a game is performance, perceived as smoothness and responsiveness. A game with terrible performance breaks the immersion that could otherwise have been there. But even though we understand that as an industry, we still don't seem to put constant effort into making sure that pillar in our games is solid. A few months ago I went to Sony's PlayStation Devcon 2017, where I got to talk to many fellow rendering programmers as well as tools developers. One of the things that caught my attention was that even though most (if not all) of the people I talked to were working on big AAA games or technology, few of them showed any interest in profiling automation. This was also evident during a rendering round table where I was pretty much the only one asking for metrics output from Sony's tools to use in automation. I thought that perhaps the work and knowledge on automation were left to SQEs (Software Quality Engineers), but when I asked, many told me they had few or no automation tools to check for performance regressions. That is alarming because AAA games tend to be huge beasts with a lot of people creating content and writing features, but you rarely have as many people as you need to keep performance in check as the game evolves. In the end we seem to rely on heroic efforts at the end of the project, or even on patches after launch. So my idea with this short piece is to explain why profiling automation is important, and to describe some of the features required to get better automation.

(Video: a game with terrible performance on release.)

The performance of a game is one of those things that seems to become critical only at the end of a project. That goes hand in hand with the constant worry about "premature optimization", which in practice is far more often overdue optimization, and which serves as a justification for poor architectural choices and poor use of the available resources. But even if you think you don't have the time to focus on performance while you are trying to add new features, it is always important to see the performance impact of the code you add. Many programmers think they can get by on assumptions alone, but unfortunately most if not all programmers (myself included) make mistakes when assessing performance without profiling data. For some reason unknown to me, programmers are fully aware of the limitations of estimating tasks during a project, yet they are very confident in their understanding of what the performance issues on a project are, or of how the code they just submitted impacts them.

Personal experience.

My perceptions are based on the fact that before joining EA Vancouver I was a consultant, mostly contracted to improve performance. My clients knew they had performance issues they needed to fix, but they couldn't assign an employee to the task, nor hire a new full-time employee just to fix it. Having done that type of work for so long, I told my clients that I would work on whatever they asked for (I always signed staff-augmentation-type contracts), but I always recommended that they assign me a task to just do profiling. Thankfully every single client agreed, though they also told me that they already knew what the issues were and what to look for. Instead I would profile CPU, GPU, and IO without looking specifically for the performance issues the client wanted fixed. More often than not, the profiling data showed that the biggest performance issues were completely different from what they thought. That repeated experience only reinforced my view that as developers we are just as good at estimating performance as we are at estimating tasks in a project. But that wasn't the only issue common among my different clients. Many of them blamed other teams or technology for their performance issues, and when they knew nobody else was to blame, they often found out too late about performance regressions they had introduced themselves. I can't even count the number of times someone had to be sent on a scavenger hunt to find the change or changes that caused a performance regression in the last week or month.

Solution.

What is the solution to this issue? Profiling. But there are two distinct approaches to it. One is to change the culture of the programmers so they profile every change they make (even when they don't think it touches performance-sensitive code), and at the same time change the culture of management to make sure it gets done, with a proper allocation of time for the task. That's pretty much impossible to do, especially for teams within a game that don't share the same performance-focused mindset. The odds of getting that done on a rendering or systems team are considerably higher than on the UI team. But even on the rendering team it is hard to allocate the time, and it is even harder for non-technical management to understand the importance of doing it. The other approach is to move away from manual profiling and processes and instead rely on automation. Automated performance runs get rid of the issue by providing data as often as your hardware infrastructure allows. In essence, a good automation system with a good dashboard does the high-level profiling an engineer would have done, and also makes the low-level profiling data available to the engineers. At the same time, the dashboard can provide high-level performance numbers to technical management. With that data in hand, they can ask each team about performance issues and start planning and allocating time to get them fixed.

Defining profiling smoke.

Before you even look at tools and the metrics you want to gather, it is critical that your team defines what a profiling run is. This matters because if the runs are not consistent in terms of what the game is doing while being profiled, it will be extremely hard to pick performance trends out of the noise in the profiling data. Worse, people will not trust the data, since they know a performance spike might go away on its own by the time the next automated profiling run happens. Obviously each team within a game will consider different profiling scenarios important, so any automated profiling must cover those scenarios. But it is also important that the profiling smoke tests not be segmented per team; instead, a single smoke should cover all the different scenarios. The value of doing that is that, since the different subsystems in a game run concurrently, many issues arise at the interaction points of subsystems owned by different teams. For example, the UI team might think the scenario they need to test is a set of UI-heavy screens, but perhaps a change in UI impacts performance during normal gameplay even though only a few UI elements are on screen for the HUD. The other important aspect is determinism. Most games are not fully deterministic, but the profiling smoke should be as deterministic as possible within the constraints of the game. Any engineer should be able to clearly correlate the data of two different profiling smoke runs, and that is only possible with an acceptable level of determinism on the runs. A minimal sketch of the kind of knobs a smoke might lock down follows.
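
As a rough illustration, here is a minimal sketch of forcing determinism in a profiling smoke, assuming the engine lets you inject these settings; SmokeConfig, RunProfilingSmoke, and the recording path are hypothetical names for illustration, not from any particular engine.

```cpp
// Hypothetical configuration for a deterministic profiling smoke.
#include <cstdint>
#include <string>

struct SmokeConfig {
    uint64_t rngSeed;              // fixed seed so gameplay randomness repeats per run
    double fixedDeltaTime;         // fixed timestep decouples simulation from frame rate
    std::string inputRecording;    // recorded inputs replace live player input
    bool disableDynamicResolution; // avoid feedback loops that change GPU load between runs
};

void RunProfilingSmoke() {
    SmokeConfig config;
    config.rngSeed = 0x5EEDULL;                        // same seed on every automated run
    config.fixedDeltaTime = 1.0 / 60.0;                // simulate at a fixed 60 Hz step
    config.inputRecording = "smokes/full_game.inputs"; // hypothetical recording path
    config.disableDynamicResolution = true;

    // ... boot the game with this config, play back the recording,
    // and export the per-frame profiling data for the automation system ...
}
```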

Data to gather and process.

The next step is to determine what type of data to gather and how the gathering process itself impacts the performance of the profiling smoke. The level of detail can range from capturing instruction-level counters at a high rate, all the way up to very high-level timers that just record how long a section of the game took to execute (for example, the time to run through a whole level). The benefit of the low-level approach is that the engineer has access to specific details to tackle a problem, but the amount of data generated is huge and it is bound to impact the performance of the profiling smoke itself. The high-level approach has the benefit of reducing the amount of data to gather and process, and its impact is very light, but it doesn't offer the engineer much information beyond "this ran slower/faster than before". After doing this for a while I have found the happy medium to be timestamped markers on a per-frame basis. The data they offer is good enough to pinpoint specific issues (assuming there are enough markers), the model works just as well on the CPU as on the GPU, and it tends to be reasonable to manage in terms of cycles and memory spent per frame. There are multiple tools that can generate and visualize this kind of data:

  • Intel GPA
  • Microsoft PIX
  • SN Systems Razor
  • RAD Game Tools Telemetry

Unfortunately, to the best of my knowledge only Intel VTune and RAD's Telemetry support exporting the profiling data (feel free to correct me in the comments). Any system that can't export its data won't work for this. If that's the case, then you will have to implement the CPU and GPU markers on your own. This allows you to export the data for the automation system, and the same markers can also feed an in-game profiler if desired. A minimal sketch of such markers follows.
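
To make the idea concrete, here is a minimal sketch of per-frame timestamped CPU markers with an export path, assuming a single thread for brevity; Marker, MarkerBuffer, and ProfileScope are hypothetical names, and a real implementation would also need per-thread buffers plus GPU timestamp queries resolved a few frames later.

```cpp
// A sketch of per-frame timestamped CPU markers (hypothetical names).
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Marker {
    const char* name; // expected to be a static string, so storing the pointer is safe
    uint64_t begin;   // nanoseconds from a monotonic clock
    uint64_t end;
};

struct MarkerBuffer {
    std::vector<Marker> markers;

    static uint64_t Now() {
        using namespace std::chrono;
        return duration_cast<nanoseconds>(
            steady_clock::now().time_since_epoch()).count();
    }

    // Dump the frame's markers as CSV so the automation system can ingest them.
    void ExportFrame(std::FILE* out, uint64_t frameIndex) {
        for (const Marker& m : markers) {
            std::fprintf(out, "%llu,%s,%llu,%llu\n",
                         (unsigned long long)frameIndex, m.name,
                         (unsigned long long)m.begin,
                         (unsigned long long)m.end);
        }
        markers.clear();
    }
};

MarkerBuffer gMarkers; // single-threaded for brevity; a real engine needs per-thread buffers

// RAII scope that records begin/end timestamps for a named marker.
struct ProfileScope {
    explicit ProfileScope(const char* name) : index(gMarkers.markers.size()) {
        gMarkers.markers.push_back({name, MarkerBuffer::Now(), 0});
    }
    ~ProfileScope() {
        gMarkers.markers[index].end = MarkerBuffer::Now();
    }
    size_t index;
};

void UpdatePhysics() {
    ProfileScope scope("Physics"); // nested scopes simply push more markers
    // ... simulation work ...
}
```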

One thing to keep in mind is that the data will come from full runs instead of a few frames. This means that whatever system you implement (or product you purchase) must be able to cope with a considerable amount of data. Even with a modest number of CPU and GPU markers per frame, you still generate a sizable amount of data. Just as an example, in one of the profiling smokes I run at work I can capture 45 minutes of data with around 120 markers per frame. A full capture contains around 20 million markers, which means that just storing the beginning and end 64-bit timestamps (with no name) for each marker generates ~320 megabytes of uncompressed data. At the same time, whatever tool you use to generate metrics from the source data must also cope with that amount of data, especially if you want a quick turnaround. Once you have all the data, you can generate the relevant metrics that will be the "electrocardiogram" of the game's performance. The metrics you decide to display will be highly dependent on the needs of the game, but here are a few ideas (a small aggregation sketch follows the list):

  • General
    • CPU frame time.
      • Breakdown per subsystem or team.
    • High-water CPU frame time.
    • High-water number of consecutive CPU spikes.
    • Number of allocations per frame.
    • Loading
      • Time to load a level.
      • Time to leave a game and go back to the main menu.
      • High-water memory use during loading phases.
  • Rendering
    • GPU frame time.
    • High-water GPU frame time.
    • High-water number of consecutive GPU spikes.
    • High-water video memory used.
    • Number of missed v-syncs.
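
As an illustration of turning per-frame data into a few of these metrics, here is a small aggregation sketch over per-frame CPU times; FrameStats and ComputeFrameStats are hypothetical names, and the spike threshold is an assumption the caller supplies rather than a number from this article. The same aggregation works unchanged for GPU frame times.

```cpp
// Hypothetical aggregation of per-frame times into dashboard metrics.
#include <vector>

struct FrameStats {
    double averageMs;         // average frame time across the run
    double highWaterMs;       // high-water frame time
    int maxConsecutiveSpikes; // high-water number of consecutive spikes
};

FrameStats ComputeFrameStats(const std::vector<double>& frameTimesMs,
                             double spikeThresholdMs /* e.g. 33.3 for a 30 Hz target */) {
    FrameStats stats{0.0, 0.0, 0};
    int consecutive = 0;
    double totalMs = 0.0;
    for (double ms : frameTimesMs) {
        totalMs += ms;
        if (ms > stats.highWaterMs)
            stats.highWaterMs = ms;
        // Count runs of frames over budget; reset when a frame is back under.
        consecutive = (ms > spikeThresholdMs) ? consecutive + 1 : 0;
        if (consecutive > stats.maxConsecutiveSpikes)
            stats.maxConsecutiveSpikes = consecutive;
    }
    if (!frameTimesMs.empty())
        stats.averageMs = totalMs / frameTimesMs.size();
    return stats;
}
```

A dashboard can then plot these values per automated run, so a regression shows up as a step in the trend line that can be traced back to the range of changelists submitted between the two runs.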

Conclusion.

When everything is said and done, the whole team will know that whatever they do will be profiled. That means performance won't just quietly vanish as features are added. In the worst-case scenario you will still have performance issues, but you will have the ability to either revert a changelist or decide how the performance will be recovered in order to keep that new piece of code in the depot. Another huge benefit is that this workflow helps put performance in the minds of the developers, rather than just getting a feature implemented at any cost. So in the end, technical management gets assured visibility into performance as the game evolves, the engineers have the proper profiling data when technical management complains, and the rest of production knows whether what they plan to deliver to the end user will actually be achieved in terms of performance.
