Cascadeur: predicting a character's pose from six points


We want to outline, in general terms, our first results in applying deep learning to character animation in our program Cascadeur.

While working on Shadow Fight 3, we accumulated a lot of combat animation: about 1100 movements with an average duration of about 4 seconds. We had long thought that this could be a good dataset for training a neural network.

We once noticed that when animators make their first sketches of ideas on paper, they need to draw literally just a stick figure to imagine the character's pose. We reasoned that if an experienced animator can read a pose from such a simple drawing, a neural network could quite possibly handle it too. From this observation a simple idea was born. Let's take only 6 key points from each pose: the wrists, ankles, pelvis and base of the neck. If the neural network knows only the positions of these points, can it predict the rest of the pose, that is, the positions of the character's 37 remaining points?

How to set up the training was clear from the very beginning: as input, the network receives the positions of 6 points from a specific pose; as output, it gives the positions of the remaining 37 points, which we compare with their positions in the original pose. As the loss function, one can use least squares on the distances between the predicted and the original positions of the points.
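As a rough illustration of such a least-squares comparison, here is a minimal sketch in plain Python. The function name `pose_loss` and the representation of a pose as a list of `(x, y, z)` tuples are assumptions made for this example, and averaging over the points is one possible normalization, not necessarily the one we used.

```python
def pose_loss(predicted, target):
    """Mean squared Euclidean distance between matching points.

    predicted, target: lists of (x, y, z) tuples of equal length
    (hypothetical representation, chosen for this sketch).
    """
    if len(predicted) != len(target):
        raise ValueError("poses must contain the same number of points")
    total = 0.0
    for p, t in zip(predicted, target):
        # Squared distance between one predicted point and its original.
        total += sum((pc - tc) ** 2 for pc, tc in zip(p, t))
    return total / len(predicted)
```

For a real network the same quantity would be computed over all 37 predicted points and averaged over a batch of poses.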

For the training dataset, we had all the character movements from Shadow Fight 3. Taking the pose from every frame gave us about 115,000 poses. But this set was specific: the character almost always faced along the X axis, and the left leg was always in front at the start of a movement. To solve this problem, we artificially expanded the dataset by generating mirrored poses and by randomly rotating each pose in space. This also allowed us to grow the dataset to two million poses. We used 95% of the dataset for training the network and 5% for tuning parameters and testing.
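The augmentation and split described above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: that Y is the vertical axis, that mirroring means negating Z and swapping left and right points, that rotations are only around the vertical axis, and the names `mirror_pose`, `rotate_pose_about_up`, `augment`, `split_dataset`, `swap_pairs` and `copies`.

```python
import math
import random

def mirror_pose(pose, swap_pairs):
    """Reflect a pose left-to-right (assumes Z is the lateral axis).

    swap_pairs: (left_index, right_index) pairs, e.g. the two wrists,
    whose roles are exchanged after the reflection.
    """
    mirrored = [(x, y, -z) for x, y, z in pose]
    for left, right in swap_pairs:
        mirrored[left], mirrored[right] = mirrored[right], mirrored[left]
    return mirrored

def rotate_pose_about_up(pose, angle):
    """Rotate a pose around the vertical axis (assumed to be Y)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]

def augment(poses, swap_pairs, copies, rng):
    """Return `copies` randomly rotated versions of each pose and its mirror."""
    result = []
    for pose in poses:
        for variant in (pose, mirror_pose(pose, swap_pairs)):
            for _ in range(copies):
                angle = rng.uniform(0.0, 2.0 * math.pi)
                result.append(rotate_pose_about_up(variant, angle))
    return result

def split_dataset(poses, train_fraction, rng):
    """Shuffle and split into a training part and a held-out part."""
    shuffled = list(poses)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With `train_fraction=0.95` the second part plays the role of the 5% set aside for tuning and testing.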

We took a fairly simple neural network architecture: a fully connected five-layer network with the activation function (SELU) and the initialization method from Self-Normalizing Neural Networks. No activation is used on the last layer. With 3 coordinates for each point, we get an input layer of 6 * 3 elements and an output layer of 37 * 3 elements. We searched for the optimal architecture of the hidden layers and settled on four hidden layers of 300, 400, 300 and 200 neurons (five layers counting the output), although networks with fewer hidden layers also produced good results. L2 regularization of the network parameters was also very useful: it made the predictions smoother and more continuous.
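To make the layer sizes concrete, here is a minimal sketch of such a network's forward pass in plain Python, without any deep learning framework. It assumes the standard SELU constants and a zero-mean normal initialization with variance 1 / fan-in, as recommended for self-normalizing networks. The function names are made up for this example, and training and L2 regularization are left out.

```python
import math
import random

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

# 6 input points and 37 output points, 3 coordinates each.
LAYER_SIZES = [6 * 3, 300, 400, 300, 200, 37 * 3]

def selu(x):
    return SELU_SCALE * (x if x > 0.0 else SELU_ALPHA * (math.exp(x) - 1.0))

def init_layer(n_in, n_out, rng):
    """Weights drawn from N(0, 1 / n_in), biases set to zero."""
    std = math.sqrt(1.0 / n_in)
    weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def init_network(sizes, rng):
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def forward(network, inputs):
    """Run one flattened pose (18 numbers) through the network."""
    activations = list(inputs)
    last = len(network) - 1
    for index, (weights, biases) in enumerate(network):
        outputs = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        # SELU on hidden layers, no activation on the output layer.
        activations = outputs if index == last else [selu(v) for v in outputs]
    return activations

def parameter_count(sizes):
    """Number of weights and biases for the given layer sizes."""
    return sum(a * b + b for a, b in zip(sizes, sizes[1:]))
```

With these sizes, `parameter_count(LAYER_SIZES)` comes to 328,911 weights and biases, which is a small network by current standards.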

A neural network with these parameters predicts the positions of the points with an average error of 3.5 cm. This is quite a large error, but the specifics of the problem must be taken into account. For one set of input values, there may be many possible output values. Therefore, the neural network eventually learned to output the most probable, averaged predictions. However, when the number of input points was increased to 16, the error halved, which in practice yielded a very accurate prediction of the pose.
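The pull toward averaged predictions follows from the squared-error loss itself: when several different targets correspond to the same input, the single guess with the smallest mean squared error is their average. A tiny sketch with made-up numbers (the heights below are hypothetical, not taken from our dataset):

```python
def mean_squared_error(guess, samples):
    return sum((guess - s) ** 2 for s in samples) / len(samples)

# Hypothetical elbow heights (cm) that all occur with the same six input points.
elbow_heights = [100.0, 110.0, 150.0]
average = sum(elbow_heights) / len(elbow_heights)

# The average scores better than any of the poses that actually occurred.
best_real = min(mean_squared_error(h, elbow_heights) for h in elbow_heights)
average_error = mean_squared_error(average, elbow_heights)
```

Here `average_error` is lower than `best_real`, even though the averaged height matches none of the real poses.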

At the same time, the neural network cannot produce a completely correct pose that preserves the lengths of all bones and the correct joint connections. Therefore, we additionally run an optimization process that aligns all the rigid bodies and joints of our physical model.
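The optimization over the physical model is beyond the scope of this post, but the underlying problem can be illustrated with a much simpler stand-in: walking the skeleton from root to tip and rescaling each bone to its known length. This is only a sketch of the idea, not the solver we use. The function name and the `bones` and `lengths` structures are hypothetical.

```python
import math

def enforce_bone_lengths(points, bones, lengths):
    """Rescale each bone to its target length, keeping its direction.

    points:  list of (x, y, z) tuples predicted by the network
    bones:   (parent_index, child_index) pairs ordered from root to tip
    lengths: target length of each bone, in the same order as `bones`
    """
    fixed = [tuple(p) for p in points]
    for (parent, child), length in zip(bones, lengths):
        px, py, pz = fixed[parent]
        cx, cy, cz = fixed[child]
        dx, dy, dz = cx - px, cy - py, cz - pz
        current = math.sqrt(dx * dx + dy * dy + dz * dz)
        if current == 0.0:
            continue  # direction is undefined, leave the child where it is
        k = length / current
        fixed[child] = (px + dx * k, py + dy * k, pz + dz * k)
    return fixed
```

A single pass like this moves child points freely, so it can also shift the six input points. A fuller solver would need to keep those pinned and respect joint limits as well.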

In practice, the results were quite convincing; you can see them in our video. But there is also a peculiarity that comes from the training dataset being combat animation from a fighting game with weapons. For example, the network tends to assume that the character stands with one shoulder turned toward the enemy, as in a fighting stance, and turns the feet and head accordingly. And when you stretch out the character's arm, the hand is oriented not as for a punch but as for a sword strike.

The Banzai Games team is looking for a deep learning researcher. Read more about the vacancy here.
