Making Grass in Unity with GPU Instancing

Good afternoon! I want to share some experience of optimizing rendering with GPU Instancing.

The problem is roughly this: a mobile game in which one of the elements is a field of grass. Photorealism is not required; the style is low poly. But the player must be able to interact with the grass, in our case to mow it. We will use Unity 2021.2.7 with URP, although nothing here is tied to that exact version.

The end result can be seen in the GIF.

Preparations

First, let’s create our low poly blade of grass. Since the camera never rotates in the game, you can save a little and use only two triangles:

A static mesh is unlikely to look like grass, so we need something that resembles wind. We will do this in a shader, and for that the UVs must be prepared properly: the lower vertices should sit at the bottom of the texture and the top one at the top. Roughly like this:

Now let’s make a Shader Graph Sub Graph, which we will use in the experiments that follow:

A few explanations. The inputs are several parameters that control the wind, plus the world position of the vertex. To calculate the wind offset, we run the position through Simple Noise so that the blades of grass in the scene sway out of sync. Then, using the UV and a Lerp, we determine how much of that offset should be applied to a given vertex.
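For readers who prefer code to graphs, the idea can be sketched on the CPU side roughly like this. This is only an illustration under my own assumptions, not a transcription of the actual graph: the WindSketch class, the remapping of the noise value to the range [-1, 1], and the parameter names are all hypothetical.

```csharp
using System;

public static class WindSketch {
    // Linear interpolation, same meaning as the Lerp node.
    private static float Lerp(float a, float b, float t) => a + (b - a) * t;

    // noise:        value in [0, 1], assumed to come from a noise function
    //               sampled at the vertex's world position (plus time).
    // uvY:          vertical UV of the vertex (0 at the root, 1 at the tip).
    // windStrength: maximum horizontal displacement (hypothetical parameter).
    public static float Offset(float noise, float uvY, float windStrength) {
        // Assumption: remap [0, 1] to [-strength, +strength] so blades sway both ways.
        var sway = (noise * 2f - 1f) * windStrength;
        // The root (uvY = 0) stays in place, the tip (uvY = 1) gets the full sway.
        return Lerp(0f, sway, uvY);
    }
}
```

Because the noise input depends on the world position, two neighbouring blades get different values and therefore move out of phase.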

Now we can carry out our experiments.

I should note right away that all performance measurements are made on a Xiaomi Redmi 8A, which is far from the fastest device.

The brute-force option

Before optimizing anything, we must make sure there is a need for it. Let’s see how the SRP Batcher copes with the number of blades of grass that suits us visually.

First, let’s make a full-fledged shader:

Now let’s start sowing the field:

_startPosition = -_fieldSize / 2.0f;
_cellSize = new Vector2(_fieldSize.x / GrassDensity, _fieldSize.y / GrassDensity);

var grassEntities = new Vector2[GrassDensity, GrassDensity];
var halfCellSize = _cellSize / 2.0f;

// Place one blade per cell, jittered by up to half a cell in each direction.
for (var i = 0; i < grassEntities.GetLength(0); i++) {
  for (var j = 0; j < grassEntities.GetLength(1); j++) {
    grassEntities[i, j] =
      new Vector2(_cellSize.x * i + _startPosition.x, _cellSize.y * j + _startPosition.y)
      + new Vector2(Random.Range(-halfCellSize.x, halfCellSize.x),
                    Random.Range(-halfCellSize.y, halfCellSize.y));
  }
}
_abstractGrassDrawer.Init(grassEntities, _fieldSize);

We generate a uniform field with a small amount of randomness. The field is flat, so a Vector2 is enough for each coordinate. Next, we pass this array of coordinates to the “drawer”. For now, it simply instantiates the required number of GameObjects.

public override void Init(Vector2[,] grassEntities, Vector2 fieldSize) {
  _grassEntities = new GameObject[grassEntities.GetLength(0), grassEntities.GetLength(1)];
  for (var i = 0; i < grassEntities.GetLength(0); i++) {
    for (var j = 0; j < grassEntities.GetLength(1); j++) {
      _grassEntities[i, j] = Instantiate(_grassPrefab,
      new Vector3(grassEntities[i, j].x, 0.0f, grassEntities[i, j].y), Quaternion.identity);
    }
  }
}

We get this picture:

Visually, we have achieved what we wanted. (In the real game the camera angle is different, there are decorations, a ground texture and so on, and the grass is sown only in the right areas, so it looks better; but the number of blades is the same: 15-20k.) The frame rate, however, clearly suffers, so it is time to think about optimization.

GPU Instancing

After reading the official documentation, it becomes clear that this is exactly what we need. Let’s try to use it.

In the material inspector of our grass, you can find the “Enable GPU Instancing” checkbox and turn it on. Unfortunately, it is not that simple. If we run the scene and look at the Frame Debugger, we see that the SRP Batcher is still doing the work, not GPU Instancing. The two technologies are incompatible, and the SRP Batcher takes priority, so we need to disable it somehow. There are several ways to do this.

1. Turn off in URP settings

This can hardly be called a recommended method. No wonder the checkbox was removed from the inspector in recent versions of Unity, although you can still find it through the Debug mode of your URP asset’s inspector. It does not suit us anyway, because besides grass there are many other things in the game for which the SRP Batcher works well.

2. Make the shader incompatible

This method is not the most convenient, because you have to edit the shader itself. With Shader Graph, you first have to generate the shader code and then edit that, and the operation must be repeated every time the graph changes. The documentation then says: “Add a new material property declaration into the shader’s Properties block. Don’t declare the new material property in the UnityPerMaterial constant buffer”. For some reason this did not work in my case and the shader remained SRP Batcher compatible, so I simply commented out the buffer declaration:

//CBUFFER_START(UnityPerMaterial)
float4 _MainTexture_TexelSize;
half _WindShiftStrength;
half _WindSpeed;
half _WindStrength;
//CBUFFER_END

Despite the inconvenience of this method, I chose it, simply because the instantiation mechanism and the prefabs themselves stay the same as in the first experiment.

3. Make the renderer incompatible

Perhaps the easiest way. It also helps when the same shader has to be used both with the SRP Batcher and with GPU Instancing. It is enough to assign a MaterialPropertyBlock to the object’s renderer, because a MaterialPropertyBlock is not compatible with the SRP Batcher. MaterialPropertyBlock can also be of practical use: say, if we wanted to drive the out-of-sync swaying of the grass with our own parameters instead of random noise, or to tint a blade of grass when you tap on it.
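A minimal sketch of the third way might look like the component below. It is an illustration, not code from the project: the GrassTint class and the "_BaseColor" property name are my assumptions, so substitute whatever property your shader actually exposes.

```csharp
using UnityEngine;

// Hypothetical helper: attach it to an instance of the grass prefab.
public class GrassTint : MonoBehaviour {
    // Assumed property name; replace it with a property of your own shader.
    private static readonly int ColorProperty = Shader.PropertyToID("_BaseColor");

    private MaterialPropertyBlock _block;
    private Renderer _renderer;

    private void Awake() {
        _renderer = GetComponent<Renderer>();
        _block = new MaterialPropertyBlock();
    }

    // Assigning a property block makes this renderer fall out of the SRP Batcher.
    public void SetTint(Color color) {
        _renderer.GetPropertyBlock(_block);   // keep any values that were set earlier
        _block.SetColor(ColorProperty, color);
        _renderer.SetPropertyBlock(_block);
    }
}
```

Keep in mind that whether blades with different values still end up in one instanced draw call depends on the property being declared as a per-instance property in the shader.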

After any of these manipulations, we can see that instancing is now working:

You can take measurements:

Graphics API

You can also draw a mesh using GPU Instancing directly through the Graphics API, without the need to use a GameObject. There are 3 methods for this.

1. Graphics.DrawMeshInstanced

The easiest way. We pass the mesh, the material and an array of matrices. It does not require a custom shader, but it has a limit of 1023 instances per call. That is not enough for us.
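If you did want to stay with this method, one option would be to split the instances into several calls. The sketch below shows only the splitting logic in plain C#; the InstanceBatching class is hypothetical and was not part of my project.

```csharp
using System;
using System.Collections.Generic;

public static class InstanceBatching {
    // Per-call instance limit of Graphics.DrawMeshInstanced mentioned above.
    public const int MaxInstancesPerCall = 1023;

    // Yields (Start, Count) ranges that together cover `total` instances.
    public static IEnumerable<(int Start, int Count)> Split(
            int total, int maxPerCall = MaxInstancesPerCall) {
        if (maxPerCall <= 0) throw new ArgumentOutOfRangeException(nameof(maxPerCall));
        for (var start = 0; start < total; start += maxPerCall) {
            yield return (start, Math.Min(maxPerCall, total - start));
        }
    }
}
```

In Unity, each range would be copied into a Matrix4x4 array of at most 1023 elements and passed to Graphics.DrawMeshInstanced together with its count. For 20,000 blades that means 20 draw calls per frame, plus the copying, which is why I did not go this way.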

2. Graphics.DrawMeshInstancedIndirect

Has no limit on the number of instances; the count is now passed through a ComputeBuffer. The positions, however, can no longer be passed directly and must be supplied to the material as a buffer, which in turn requires changes to the shader.

3. Graphics.DrawMeshInstancedProcedural

Similar to the previous one, but used when the number of instances is known from the script. This is the one we will use.

First, let’s transfer the positions of our blades of grass to the material:

_positionsCount = _positions.Count;
_positionBuffer?.Release();
if (_positionsCount == 0) return;
// Stride: one float2 (two 4-byte floats = 8 bytes) per blade of grass.
_positionBuffer = new ComputeBuffer(_positionsCount, sizeof(float) * 2);
_positionBuffer.SetData(_positions);
_instanceMaterial.SetBuffer(PositionsShaderProperty, _positionBuffer);

Next is the drawing itself, which should be called every frame:

private void Update() {
  if (_positionsCount == 0) return;
  Graphics.DrawMeshInstancedProcedural(_instanceMesh, 0, _instanceMaterial,
	  _grassBounds, _positionsCount,
	  null, ShadowCastingMode.Off, false);
}

If we try to use our previous material, all the thousands of blades of grass will be drawn in one place, because the shader must now take the coordinates from the buffer. So we modify the shader:

We need two Custom Function nodes.

We need the first one so that we can use unity_InstanceID, with which we will pull the position of the instance from the buffer. I found the solution somewhere on the internet and cannot give a complete theoretical justification at the moment. I would be grateful for links with explanations.

#pragma instancing_options procedural:ConfigureProcedural
Out = In;

The second function extracts the position from the buffer.

#if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
StructuredBuffer<float2> PositionsBuffer;
#endif

float2 position;

void ConfigureProcedural () {
	#if defined(UNITY_PROCEDURAL_INSTANCING_ENABLED)
	position = PositionsBuffer[unity_InstanceID];
	#endif
}

void ShaderGraphFunction_float (out float2 PositionOut) {
	PositionOut = position;
}

void ShaderGraphFunction_half (out half2 PositionOut) {
	PositionOut = position;
}

Now let’s take measurements:

In addition to good performance, this method lets you use a Compute Shader to calculate the values in the buffer. For example, you can move the calculations for grass flying across the scene from C# to a shader. For example, like this.

Some more optimization

The camera in our game does not rotate, but it can move along the Z axis. So we will add a rough semblance of culling. Since we placed the grass on a notional grid, we can work out approximately which part of the array of blades of grass we need to display at the moment.

In the class of our field, we will track the camera and calculate the boundaries.

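private void Update() {
	// Transform.hasChanged stays true until it is reset manually.
	if (_camera.transform.hasChanged) {
		_camera.transform.hasChanged = false;
		UpdateCameraCells();
	}
}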

private Vector3 Raycast(Vector3 position) {
	var ray = _camera.ScreenPointToRay(position);
	_plane.Raycast(ray, out var enter);
	return ray.GetPoint(enter);
}

private void UpdateCameraCells() {
	if (!PerformCulling) {
		_abstractGrassDrawer.UpdatePositions(Vector2Int.zero, new Vector2Int(GrassDensity, GrassDensity));
		return;
	}
	var bottomLeftCameraCorner = Raycast(Vector3.zero);
	var topLeftCameraCorner = Raycast(new Vector3(0.0f, Screen.height));
	var topRightCameraCorner = Raycast(new Vector3(Screen.width, Screen.height));
	var bottomLeftCameraCell = new Vector2Int(
		Mathf.Clamp(Mathf.FloorToInt((topLeftCameraCorner.x - _startPosition.x) / _cellSize.x), 0,
			GrassDensity - 1),
		Mathf.Clamp(Mathf.FloorToInt((bottomLeftCameraCorner.z - _startPosition.y) / _cellSize.y), 0,
			GrassDensity - 1));

	// The upper bound is exclusive in UpdatePositions, so clamp to GrassDensity, not GrassDensity - 1.
	var topRightCameraCell = new Vector2Int(
		Mathf.Clamp(Mathf.FloorToInt((topRightCameraCorner.x - _startPosition.x) / _cellSize.x) + 1, 0,
			GrassDensity),
		Mathf.Clamp(Mathf.FloorToInt((topRightCameraCorner.z - _startPosition.y) / _cellSize.y) + 1, 0,
			GrassDensity));
	_abstractGrassDrawer.UpdatePositions(bottomLeftCameraCell, topRightCameraCell);
}

In the instanced drawer, we add a buffer update.

public override void UpdatePositions(Vector2Int bottomLeftCameraCell, Vector2Int topRightCameraCell) {
	_positions.Clear();
	for (var i = bottomLeftCameraCell.x; i < topRightCameraCell.x; i++) {
		for (var j = bottomLeftCameraCell.y; j < topRightCameraCell.y; j++) {
			_positions.Add(_grassEntities[i, j]);
		}
	}

	_positionsCount = _positions.Count;
	_positionBuffer?.Release();
	if (_positionsCount == 0) return;
	_positionBuffer = new ComputeBuffer(_positionsCount, sizeof(float) * 2); // 8 bytes per float2
	_positionBuffer.SetData(_positions);
	_instanceMaterial.SetBuffer(PositionsShaderProperty, _positionBuffer);
}

In the GameObject drawer, we enable or disable the grass objects instead.

public override void UpdatePositions(Vector2Int bottomLeftCameraCell, Vector2Int topRightCameraCell) {
  for (var i = 0; i < _grassEntities.GetLength(0); i++) {
    for (var j = 0; j < _grassEntities.GetLength(1); j++) {
      _grassEntities[i, j].SetActive(i >= bottomLeftCameraCell.x && i < topRightCameraCell.x && j >= bottomLeftCameraCell.y &&
      	j < topRightCameraCell.y);
    }
  }
}

Well, let’s take measurements of our most performant option.

And this is how it looks at ~30 fps, which is acceptable for many games.

I would also like to note the noticeable memory savings from not using GameObjects. And the more instances we want, the greater this difference will be.

Pitfalls

Unfortunately, instancing is not supported on all devices. You can check this via SystemInfo.supportsInstancing. For such devices, you can fall back to the good old GameObjects, simply with a reduced number of them.
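The choice itself can be sketched as a small pure function. Everything here is hypothetical (the GrassDrawMode enum, the GrassFallback class and the fallbackScale parameter); it only illustrates the idea of switching the drawer and thinning out the grass.

```csharp
using System;

public enum GrassDrawMode { Instanced, GameObjects }

public static class GrassFallback {
    // supportsInstancing: in Unity, pass SystemInfo.supportsInstancing here.
    // fullDensity:        blades per side of the grid on capable devices.
    // fallbackScale:      assumed tuning factor in (0, 1] for weaker devices.
    public static (GrassDrawMode Mode, int Density) Choose(
            bool supportsInstancing, int fullDensity, float fallbackScale) {
        if (supportsInstancing) {
            return (GrassDrawMode.Instanced, fullDensity);
        }
        // Fewer GameObjects, but never less than one blade per side.
        var reduced = Math.Max(1, (int)(fullDensity * fallbackScale));
        return (GrassDrawMode.GameObjects, reduced);
    }
}
```

For example, Choose(SystemInfo.supportsInstancing, GrassDensity, 0.5f) would keep the full grid on capable devices and halve the density per side otherwise.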

And of course, Samsung distinguished itself, at least with the Galaxy A50. SystemInfo.supportsInstancing returns true there (which is apparently correct, since no exceptions are thrown), but something goes wrong with the buffer: all the grass on this device is stubbornly drawn at a single point. So I had to add a hacky workaround check. But we can talk about that another time, if anyone is interested.

Conclusion

In this article I did not aim for a deep comparison and analysis of the differences between the SRP Batcher and GPU Instancing, and instancing itself is described rather superficially. Rather, I wanted to show where it can be useful and how it helped solve one specific problem.

The project from the article can be found here.

Fair criticism is welcome. Thank you for your attention!
