Performance issues in XCOM 2

Hi! My name is Alexander, and I lead the graphics programmers at Gaijin on CRSED and Enlisted. Sometimes, in my spare time, I dig into how graphics work in other games and find interesting things there.

I recently decided to figure out why XCOM 2 runs slowly on my laptop. While studying the game's rendering, I found a number of places that could be sped up without much difficulty. I turned the results of this small investigation into a video: https://www.youtube.com/watch?v=CuPPc2Z8lTk

Below is a transcript of this video.


Good day!

You’ve probably played XCOM 2, or at least heard of it. It came out in 2016 and was built on Unreal Engine 3.5. As a game overall, I liked it: addictive gameplay, a nice picture, an interesting story.

The only problem I ran into was low FPS, especially in close-up shots. On the base and on the tactical map the problem is less noticeable. The average FPS was around 25-30. I wondered whether the game was squeezing all the available power out of my laptop's GTX 1050, or whether it could do better. Below I will show 6 optimizations that could help the developers improve this game's performance.

Capturing frames

For the graphics analysis I used RenderDoc 1.12. It captured several frames without any problems, which I then examined: one from the menu, one on the base, one on the tactical map, and one of a shot being fired.

They all share the same performance problems. The passes shown here (screenshot below) are sequences of draw calls that use the same render targets, i.e. the textures the results are drawn into.

Fat G-buffer

The first optimization is about reducing the size of the G-buffer. Filling the G-buffer is the longest pass in the frame (> 16 ms). This is visible both in the per-pass timings and in the overall timeline.

In total, the G-buffer consists of 5 textures in RGBA16F format, i.e. each texture has four 16-bit channels holding floating-point numbers.

At 1080p all of this takes about 80 MB of video memory, which is not that much for modern video cards. The problem is that all these textures have to be filled. Writing to a texture is much more expensive than reading from it, so sampling many textures is normal, but writing to many render targets is not so good.
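The "about 80 MB" figure can be checked with a few lines of Python. This assumes a 1920×1080 render target and decimal megabytes; the per-texture value is what the text rounds to 16 MB.

```python
# Back-of-the-envelope G-buffer size, assuming a 1920x1080 render target.
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_RGBA16F = 4 * 2  # 4 channels x 16 bits (2 bytes) each


def texture_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed render target in bytes."""
    return width * height * bytes_per_pixel


per_texture = texture_bytes(WIDTH, HEIGHT, BYTES_PER_RGBA16F)
gbuffer_total = 5 * per_texture  # five RGBA16F targets

print(f"one RGBA16F target: {per_texture / 1e6:.1f} MB")   # 16.6 MB
print(f"5-target G-buffer: {gbuffer_total / 1e6:.1f} MB")  # 82.9 MB
```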

So, the G-buffer contains the following textures:

  1. Color of emissive (i.e. light-emitting) materials (the alpha channel of this texture is unused).

  2. Albedo, i.e. the surface color without lighting (the alpha channel contains ambient occlusion).

  3. Normals (the alpha channel stores the ID of one of 4 material types).

  4. Material parameters (metal color + roughness).

  5. Additional normals for anisotropic materials (the alpha channel holds translucency, a parameter describing how much light passes through the surface).

The fourth channel could be dropped from the emissive texture, which would shrink it from 16 MB to 12 MB.

The albedo texture could well be stored as four 8-bit channels holding normalized values (i.e. numbers from 0 to 1). This would halve the texture, down to 8 MB.

The normals are stored uncompressed. They can be compressed when written, reducing the amount of data, and unpacked when read. [You can read more here]… This does add some shader work, but it significantly reduces the amount of data.
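The text links to an article rather than naming a specific scheme, so the sketch below uses octahedral encoding, one common way to store a unit normal in two numbers. It illustrates the idea and is not necessarily what the linked article or the game uses. It is written in Python to show the math; in a renderer this would live in the G-buffer shader.

```python
import math


def _sign_not_zero(v: float) -> float:
    """Like sign(), but returns +1 for zero so folded values never collapse."""
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(n: tuple[float, float, float]) -> tuple[float, float]:
    """Map a unit normal to two values in [-1, 1] (octahedral encoding)."""
    x, y, z = n
    l1 = abs(x) + abs(y) + abs(z)
    x, y, z = x / l1, y / l1, z / l1  # project onto the octahedron |x|+|y|+|z| = 1
    if z < 0.0:
        # Fold the lower hemisphere outward over the upper one.
        x, y = ((1.0 - abs(y)) * _sign_not_zero(x),
                (1.0 - abs(x)) * _sign_not_zero(y))
    return x, y


def oct_decode(e: tuple[float, float]) -> tuple[float, float, float]:
    """Inverse of oct_encode: recover a unit normal from two values."""
    x, y = e
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        # Undo the fold for points that came from the lower hemisphere.
        x, y = ((1.0 - abs(y)) * _sign_not_zero(x),
                (1.0 - abs(x)) * _sign_not_zero(y))
    length = math.sqrt(x * x + y * y + z * z)
    return x / length, y / length, z / length


print(oct_encode((0.0, 0.0, -1.0)))  # (1.0, 1.0)
```

Stored in two 16-bit channels, the two encoded values replace the three raw components.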

The material ID takes only 4 different values, so it packs neatly into 2 bits. Suppose we move these two bits into the material parameters texture. Then the normals need only 2 channels of 16 bits each: just 8 MB at my screen resolution.
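A minimal sketch of the 2-bit packing. It assumes the roughness channel of the material parameters texture is stored as a 16-bit integer value (UNORM or UINT) rather than a 16-bit float, leaving 14 bits for roughness. Which channel receives the bits and the 14/2 split are assumptions for illustration, not details taken from the game.

```python
MATERIAL_BITS = 2                       # 4 material types fit in 2 bits
ROUGHNESS_BITS = 16 - MATERIAL_BITS     # 14 bits left for roughness
ROUGHNESS_MAX = (1 << ROUGHNESS_BITS) - 1


def pack_roughness_material(roughness: float, material_id: int) -> int:
    """Pack roughness in [0, 1] and a material ID in 0..3 into one 16-bit integer."""
    if not 0 <= material_id < (1 << MATERIAL_BITS):
        raise ValueError("material_id must fit in 2 bits")
    r = min(max(roughness, 0.0), 1.0)        # clamp to the representable range
    q = round(r * ROUGHNESS_MAX)             # quantize roughness to 14 bits
    return (material_id << ROUGHNESS_BITS) | q


def unpack_roughness_material(packed: int) -> tuple[float, int]:
    """Split the 16-bit value back into (roughness, material_id)."""
    material_id = packed >> ROUGHNESS_BITS
    roughness = (packed & ROUGHNESS_MAX) / ROUGHNESS_MAX
    return roughness, material_id


print(pack_roughness_material(1.0, 3))  # 65535
```

Quantizing roughness to 14 bits loses at most half a step (about 3e-5), which is far below what is visible.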

We leave the material parameters texture unchanged, except that the material ID is now encoded into it.

The last texture holds the parameters for translucent materials. Its first 3 components form a unit vector, which means they too can be encoded as 2 numbers. That leaves 3 channels. Moreover, translucent materials are not emissive, at least not in the frames I captured. So this texture can be merged with the emissive texture, and it no longer costs any extra memory.
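The merge relies on the material ID to tell how a texel should be interpreted. Below is a minimal sketch: `TRANSLUCENT_ID` is a hypothetical value (the game's actual material numbering is unknown), and the normal is assumed to be already encoded into two numbers.

```python
from typing import NamedTuple, Optional

# Hypothetical: which of the 4 material IDs marks a translucent surface.
TRANSLUCENT_ID = 3


class SharedTexel(NamedTuple):
    emissive: tuple[float, float, float]
    normal_oct: Optional[tuple[float, float]]
    translucency: float


def encode_shared(material_id: int,
                  emissive_rgb: tuple[float, float, float],
                  normal_oct: Optional[tuple[float, float]],
                  translucency: float) -> tuple[float, float, float]:
    """Return the 3 channels written to the shared emissive/translucency target."""
    if material_id == TRANSLUCENT_ID:
        # Translucent surfaces are assumed non-emissive, so the channels
        # hold the encoded normal and the translucency instead.
        return (normal_oct[0], normal_oct[1], translucency)
    return tuple(emissive_rgb)


def decode_shared(material_id: int,
                  texel: tuple[float, float, float]) -> SharedTexel:
    """Interpret the 3 channels according to the material ID."""
    if material_id == TRANSLUCENT_ID:
        return SharedTexel((0.0, 0.0, 0.0), (texel[0], texel[1]), texel[2])
    return SharedTexel(tuple(texel), None, 0.0)
```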

In total, we need 12 MB for emissive and translucency, 8 MB for albedo, 8 MB for normals, and 16 MB for material parameters: only 44 MB, almost half the original. I think this would greatly speed up the G-buffer fill pass.
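The same budget expressed per pixel, assuming a 1920×1080 target. The 44 MB above rounds each full RGBA16F target to 16 MB; with exact pixel counts the totals shift slightly, but the ratio to the original stays the same.

```python
WIDTH, HEIGHT = 1920, 1080

# (channels, bits per channel) for each render target.
CURRENT_LAYOUT = [(4, 16)] * 5  # five RGBA16F targets
PROPOSED_LAYOUT = {
    "emissive / translucency": (3, 16),
    "albedo + AO": (4, 8),
    "normals (2-component encoding)": (2, 16),
    "material params + 2-bit material ID": (4, 16),
}


def bytes_per_pixel(layout) -> int:
    """Sum the bytes every render target writes for a single pixel."""
    return sum(channels * bits // 8 for channels, bits in layout)


current_bpp = bytes_per_pixel(CURRENT_LAYOUT)
proposed_bpp = bytes_per_pixel(PROPOSED_LAYOUT.values())
print(current_bpp, proposed_bpp)                              # 40 22
print(f"{WIDTH * HEIGHT * proposed_bpp / 1e6:.1f} MB")        # 45.6 MB
print(f"{proposed_bpp / current_bpp:.0%} of the original")    # 55% of the original
```

Since every pixel touched by the G-buffer pass writes all of these bytes, the per-pixel figure is also a rough proxy for the write bandwidth saved.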

Objects missing from the depth prepass

Another optimization that could reduce the amount of data written to the G-buffer is more aggressive use of the prepass. A prepass renders the scene into the depth buffer beforehand, so that during the G-buffer pass pixels that fail the depth test are discarded and G-buffer pixels are overwritten less often. The current prepass already helps, but the results could be better.
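A toy model of a single pixel shows why objects left out of the prepass still cause overdraw. It assumes fragments arrive in draw order, the G-buffer pass uses a LESS_EQUAL depth test with depth writes left on, and the object names are hypothetical. Real hardware adds early-Z and hierarchical-Z details that this ignores.

```python
def gbuffer_writes(fragments: list[tuple[str, float]], prepass_ids: set[str]) -> int:
    """Count G-buffer writes for one pixel.

    fragments: (object_id, depth) in draw order, smaller depth = closer.
    prepass_ids: objects that were also rendered in the depth prepass.
    """
    # After the prepass, the depth buffer holds the nearest depth of prepass objects.
    nearest = min((d for obj, d in fragments if obj in prepass_ids),
                  default=float("inf"))
    writes = 0
    for _obj, depth in fragments:
        if depth <= nearest:   # LESS_EQUAL test passes -> G-buffer is written
            nearest = depth    # depth writes stay enabled in this sketch
            writes += 1
    return writes


# Four hypothetical objects covering the same pixel, drawn back to front.
pixel = [("A", 0.9), ("B", 0.7), ("C", 0.5), ("D", 0.3)]
print(gbuffer_writes(pixel, set()))                 # 4: no prepass at all
print(gbuffer_writes(pixel, {"A", "B"}))            # 3: near objects missing from the prepass
print(gbuffer_writes(pixel, {"A", "B", "C", "D"}))  # 1: everything in the prepass
```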

When the G-buffer is written, some pixels are redrawn up to 24 times.

Judging by the driver calls, there are no copies of the depth texture and no CPU reads of it between the prepass and the G-buffer pass. So, in theory, all the geometry drawn into the G-buffer could also be drawn in the prepass, which would make the G-buffer pass even faster. Given that it is the longest pass in the whole frame, this optimization would not be superfluous.

No instancing is used

Let’s leave per-pixel optimizations and turn to geometry. As you may have noticed (note the DrawIndexed calls in the previous screenshot), objects are drawn strictly one at a time. This is because drawing uses DrawIndexed instead of DrawIndexedInstanced, which can draw several identical objects in a single call.

And there are many identical objects here. Without going into the details of how individual draw calls are issued, in what order, and how the video card executes them, I want to point out that instancing would require far fewer calls to DirectX functions, which means fewer commands sent to the video card. That alone could raise FPS.
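A sketch of the draw-call arithmetic. Grouping objects that share a mesh and material turns one DrawIndexed call per object into one DrawIndexedInstanced call per group, with the group size passed as the call's InstanceCount argument (per-instance data such as transforms would go into an instance buffer). The scene contents are made up for illustration.

```python
from collections import Counter


def instanced_batches(objects: list[tuple[str, str]]) -> Counter:
    """Map each (mesh, material) pair to the number of instances to draw."""
    return Counter(objects)


# Hypothetical scene: (mesh, material) for every visible object.
scene = ([("crate", "wood")] * 40 + [("barrel", "metal")] * 25
         + [("car", "paint")] * 10)

batches = instanced_batches(scene)
print(len(scene))    # 75 DrawIndexed calls, one per object
print(len(batches))  # 3 DrawIndexedInstanced calls, one per group
for (mesh, material), count in batches.items():
    print(mesh, material, "InstanceCount =", count)
```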

Level of Detail

The last optimization related to drawing the scene is a level of detail (LOD) system. There is no point in drawing detailed geometry if it is far away and covers only a couple of dozen pixels.

First, subpixel triangles slow down rendering. You can read more in this article… Second, it makes no practical sense: for example, of the almost one thousand triangles in this object, we will only see a couple of dozen.
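One common way to drive LOD selection is the object's projected size on screen. The sketch below estimates the height in pixels of a bounding sphere and picks a LOD index from pixel thresholds. The thresholds, the 60° field of view, and the sphere sizes are hypothetical values, not taken from the game.

```python
import math


def projected_height_px(radius: float, distance: float,
                        vertical_fov_deg: float, screen_height_px: int) -> float:
    """Approximate on-screen height (in pixels) of a bounding sphere."""
    half_fov = math.radians(vertical_fov_deg) / 2
    # The visible height at this distance is 2 * distance * tan(half_fov);
    # the sphere's diameter covers 2 * radius of it.
    return radius / (distance * math.tan(half_fov)) * screen_height_px


def select_lod(size_px: float, thresholds_px: tuple[float, ...] = (200, 60, 20)) -> int:
    """Return 0 for the most detailed mesh; larger indices are coarser LODs."""
    for lod, threshold in enumerate(thresholds_px):
        if size_px >= threshold:
            return lod
    return len(thresholds_px)  # coarsest LOD for tiny objects


for d in (5.0, 50.0, 200.0):
    size = projected_height_px(1.0, d, 60.0, 1080)
    print(f"distance {d}: {size:.1f} px -> LOD {select_lod(size)}")
```

In practice the thresholds would be tuned per asset, often with some hysteresis so objects do not flicker between LODs.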

Using less detailed geometry could significantly reduce the number of triangles drawn, which would naturally speed up rendering.

Full Screen SSAO (Screen Space Ambient Occlusion)

The second longest pass after the G-buffer fill is computing the SSAO texture. It takes 8 to 10 ms, and the problem with this pass is that it runs at full screen resolution.

As I said in my stream about GTAO, such effects are best computed at half the screen resolution. The folks at Activision Blizzard managed to fit AO into half a millisecond… They measured it on PlayStation 4 and I measured on a laptop, so comparing the timings this way is not entirely fair. Still, note that my video card has 2.5 times fewer GFLOPS, while the game's AO calculation is 20 times slower than in the Activision Blizzard article. Overall, I think it is safe to conclude that the full-screen AO pass can be sped up significantly.
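Putting the quoted figures side by side: even after crudely dividing out the 2.5× GFLOPS difference, a 6 to 8× gap remains, and dropping to half resolution alone would shade a quarter as many pixels. This ignores memory bandwidth and algorithmic differences, so it is only a rough plausibility check.

```python
def normalized_slowdown(measured_ms: float, reference_ms: float,
                        gflops_ratio: float) -> float:
    """How much slower a pass is after crudely compensating for raw GFLOPS."""
    return measured_ms / reference_ms / gflops_ratio


def pixel_ratio(width: int, height: int) -> float:
    """Full-resolution pixels per half-resolution pixel."""
    return (width * height) / ((width // 2) * (height // 2))


# Figures from the text: 8-10 ms in XCOM 2, ~0.5 ms reference, ~2.5x fewer GFLOPS.
for ms in (8.0, 10.0):
    print(f"{ms} ms -> {ms / 0.5:.0f}x raw, "
          f"{normalized_slowdown(ms, 0.5, 2.5):.1f}x normalized")
print(pixel_ratio(1920, 1080))  # 4.0
```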

Depth of Field

The last obvious bottleneck is depth of field. XCOM takes a very interesting approach to this effect: 3 million triangles are drawn, each corresponding to one pixel of a half-resolution texture.

The triangle's position is chosen based on the depth of its pixel, and the triangle is drawn into either the left or the right half of the output texture. In this way the original image is split in two by depth.

The sheer number of subpixel triangles is most likely why this draw call takes so long. The problem is that for a triangle covering only one pixel, the pixel shader still runs for 4 pixels (a 2×2 quad). For those interested in the details, I again recommend this article.
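A simplified model of that cost: if each small triangle lands in its own 2×2 quad, the pixel shader runs 4 times per triangle, and the lanes for uncovered pixels are helper invocations whose results are discarded. Applied to the 3 million triangles mentioned above, assuming for simplicity that every one of them covers exactly one pixel:

```python
QUAD_SIZE = 4  # pixel shaders run on 2x2 quads


def quad_invocations(num_triangles: int, pixels_per_triangle: int = 1) -> tuple[int, int]:
    """Return (total invocations, helper invocations).

    Simplified model: each triangle covers pixels_per_triangle pixels inside
    a single 2x2 quad, and no two triangles share a quad.
    """
    if not 1 <= pixels_per_triangle <= QUAD_SIZE:
        raise ValueError("a triangle in one quad covers 1 to 4 pixels")
    total = num_triangles * QUAD_SIZE
    helpers = num_triangles * (QUAD_SIZE - pixels_per_triangle)
    return total, helpers


print(quad_invocations(3_000_000))  # (12000000, 9000000): 75% of the work is wasted
```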

To speed this algorithm up, a compute shader could be used instead. Then the shader runs exactly once per texel.
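Below is a CPU sketch of the compute-shader variant. Each iteration of the inner loop stands in for one compute thread, so every half-resolution texel is processed exactly once. The near/far criterion (a single hypothetical `focus_depth` threshold) is a simplification; the game's actual rule for choosing the left or right half is not known from the captures. In HLSL, the loop body would become the shader body, dispatched with one thread per texel.

```python
def split_by_depth(color, depth, focus_depth):
    """Split a half-res image into near (left) and far (right) halves.

    color, depth: H x W nested lists. Returns an H x 2W image where
    texels nearer than focus_depth go to the left half and the rest to
    the right half; texels nothing was written to stay None.
    """
    h, w = len(depth), len(depth[0])
    out = [[None] * (2 * w) for _ in range(h)]
    # Each (y, x) iteration corresponds to one compute-shader thread.
    for y in range(h):
        for x in range(w):
            dst_x = x if depth[y][x] < focus_depth else x + w
            out[y][dst_x] = color[y][x]
    return out


print(split_by_depth([["a", "b"]], [[0.2, 0.8]], 0.5))  # [['a', None, None, 'b']]
```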
