Seven Talented AI @ Unity Interns of 2020, Part 1

We invite prospective students of the course “Unity Game Developer. Basic” to attend an open webinar on the topic “2D Puzzle Platformer”.

In the meantime, we are sharing a translation of some interesting material.


Each summer, AI @ Unity recruits a group of high-caliber technical interns to advance our mission of empowering Unity developers with AI and machine learning tools and services. Last summer was no exception, and the AI @ Unity team was delighted to welcome 24 talented interns. This series focuses on seven research and engineering interns from the ML-Agents and Game Simulation teams: Yanchao Sun, Scott Jordan, PSankalp Patro, Aryan Mann, Christina Guan, Emma Park, and Chingiz Mardanov. Read on to learn about their experiences and accomplishments during their internships at Unity.

In the summer of 2020, we recruited 24 interns into the AI @ Unity organization; here we highlight seven of them. What is particularly remarkable is that all seven projects were experimental in nature, which undoubtedly helped us push the boundaries of our products and services. All seven projects described below are expected to land in our products in the coming months as new features that will surely delight our users.

The seven interns whose projects are reviewed in this series were part of the ML-Agents and Game Simulation teams.

  • The ML-Agents team is an applied research team that develops and maintains the ML-Agents Toolkit, an open-source project. The ML-Agents Toolkit enables games and simulations built in Unity to serve as training environments for machine learning algorithms. Developers use ML-Agents to train in-game AI or to create character behaviors using deep reinforcement learning (RL) or imitation learning (IL), avoiding the tedium of traditional hand-coded approaches. Besides the documentation on GitHub, you can read more about ML-Agents in this blog post and research article.

  • The Game Simulation team is a product engineering team whose mission is to enable game developers to test and balance their games by executing many runs in the cloud in parallel. Game Simulation launched earlier this year, and you can learn more about it in the studies on this topic that we have published with our partners iLLOGIKA and Furyion.

As Unity grows, so does our internship program. The AI @ Unity internship program will expand to 28 positions in 2021. In addition, we are hiring in other locations, including Orlando, San Francisco, Copenhagen, Vancouver, and Vilnius, in a variety of roles ranging from software development to machine learning research. If you are interested in our 2021 internship program, you can submit your application here (we advise you to keep an eye on this link, as we will publish additional internship openings in the coming weeks). Now we invite you to enjoy the many and varied projects of our talented interns from the summer 2020 cohort!

Yanchao Sun (ML-Agents): Transfer Learning

In most cases, a behavior trained with RL works well in the environment in which it was trained, but in a similar, slightly modified environment it can fail noticeably. As a result, even a small tweak to the game’s dynamics forces us to discard the previously trained behavior and train everything from scratch. In the summer of 2020, I developed a new transfer learning algorithm specifically tailored to the incremental game development process.

Problem: Game development is incremental; RL is not

Game development happens in stages: a game usually starts as a simple prototype and gradually becomes more complex. RL, however, is not incremental, and training takes time. Using ML-Agents in game development can therefore be very costly, as it can take a developer hours or even days to see how an RL agent reacts to a change. While some approaches, such as domain randomization, can make the learned policy more general, they only cover game variations that can be clearly specified before training and cannot adapt to arbitrary future changes to the game.
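To make that limitation concrete, here is a minimal sketch, in Python, of how domain randomization is typically set up. The parameter names and ranges are hypothetical and not taken from ML-Agents; the point is that every variation must be declared before training starts, so a change introduced later in development is never covered.

```python
import random

# Hypothetical illustration of domain randomization: every variation must be
# enumerated as a sampling range *before* training begins, which is why the
# technique cannot cover changes introduced later in development.
RANDOMIZATION_RANGES = {
    "ball_scale": (0.5, 2.0),    # assumed environment parameters
    "gravity":    (-12.0, -8.0),
    "friction":   (0.1, 1.0),
}

def sample_environment_parameters():
    """Draw one set of environment parameters for the next training episode."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in RANDOMIZATION_RANGES.items()}

# Each episode resets the simulation with a freshly sampled configuration.
# A parameter not listed above (e.g. a new sensor added next month) is never
# varied, so the trained policy never adapts to it.
episode_config = sample_environment_parameters()
print(episode_config)
```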

Solution: Separating Representation and Dynamics

Intuitively, a small tweak to the game’s dynamics, or to how the agent senses the game, should leave the rest of the game largely unchanged. For example, giving an agent an improved sensor with a higher resolution changes how it observes the world, but not how the world works. Following this logic, we developed a new transfer learning approach that extracts basic features of the environment that can be carried over to the next iteration of the same environment. A transfer learning algorithm uses the knowledge gained from solving one problem to speed up learning on a different but related problem; transferring knowledge about the unchanged aspects of a problem can significantly accelerate learning in the new setting.

Our work proposes a model that separates the agent’s observation representation from the environment dynamics. When the observation space changes but the dynamics do not, we reuse and freeze the dynamics model. Likewise, when the dynamics change but the observation space does not, we reuse and freeze the observation encoder. In both cases, the transferred parts act as regularizers for the parts of the model being retrained.
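Below is a minimal PyTorch-style sketch of this separation, assuming hypothetical module names and layer sizes; it is not the ML-Agents implementation. The model is split into an observation encoder and a dynamics predictor, and the half that still matches the new environment is frozen so that it regularizes the half being retrained.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps raw observations to a latent state representation."""
    def __init__(self, obs_size, latent_size):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_size, 64), nn.ReLU(),
                                 nn.Linear(64, latent_size))

    def forward(self, obs):
        return self.net(obs)

class DynamicsModel(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_size, action_size):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_size + action_size, 64), nn.ReLU(),
                                 nn.Linear(64, latent_size))

    def forward(self, latent, action):
        return self.net(torch.cat([latent, action], dim=-1))

def freeze(module: nn.Module):
    """Stop gradient updates for a transferred component."""
    for p in module.parameters():
        p.requires_grad = False

# Scenario from the text: observations change (new sensor) but dynamics do not.
# Sizes are placeholders chosen for illustration.
old_dynamics = DynamicsModel(latent_size=16, action_size=2)  # trained on the old environment
new_encoder = Encoder(obs_size=45, latent_size=16)           # retrained for the new observations
freeze(old_dynamics)                                          # reuse and freeze the dynamics model
optimizer = torch.optim.Adam(new_encoder.parameters(), lr=3e-4)
```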

To test our method (which will be available in a future version of ML-Agents), we chose the 3DBall and 3DBallHard environments from the ML-Agents Toolkit. These environments have the same dynamics but different observation spaces. We also added an extra penalty on the agent’s actions, so the agent’s goal is to balance the ball using the least amount of energy. To test a change in observations, we first trained a policy on 3DBall and then transferred the model-based part when training on 3DBallHard. Compared with the standard Soft Actor-Critic algorithm and model-based single-task learning, the transfer learning method achieves the highest reward. We also evaluated a change in dynamics by increasing the size of the ball. The results show that transfer learning outperforms the single-task methods and is more stable!

Scott Jordan (ML-Agents): Task Parameterization and Active Learning

To use RL, a developer defines the desired behavior by specifying a reward function, a scalar function whose value indicates how desirable a particular outcome is. Writing a reward function can be challenging, and it often needs to be adjusted to accurately reflect the developer’s intent. This trial-and-error workflow can become extremely costly because of how much data RL needs. During my internship, I explored ways to improve this process using task parameterization and active learning algorithms.

Problem: Reward functions need fine-tuning

Consider an agent tasked with traveling to a given location. Most likely, the agent will learn to run to the target location as quickly as possible. For the developer, this behavior may be undesirable; it may make the game scenario too difficult for a human player. So the developer modifies the reward function to penalize the agent for moving too quickly, retrains the agent with the new reward function, and observes the learned behavior. This process repeats until the developer is satisfied, and it can become extremely costly as the agent’s tasks grow more complex.
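As a concrete illustration of this loop, here is a small Python sketch of such a reward function after the developer has bolted on a speed penalty. The coefficient and distance threshold are assumptions for illustration, not values from the text; each time the behavior disappoints, a constant like SPEED_PENALTY is tweaked and the agent is retrained.

```python
import numpy as np

SPEED_PENALTY = 0.05  # hand-tuned: adjust, retrain, observe, repeat...

def reward(agent_pos, target_pos, agent_velocity):
    """Reward reaching the target, penalize moving too fast."""
    distance = np.linalg.norm(target_pos - agent_pos)
    reached_bonus = 1.0 if distance < 0.5 else 0.0
    speed_cost = SPEED_PENALTY * np.linalg.norm(agent_velocity)
    return reached_bonus - speed_cost

# Example call with placeholder positions and velocity.
r = reward(np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([2.0, 0.0]))
print(r)
```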

Solution: Task parameterization and active learning

To eliminate the need to pin down the exact objective up front, we use a parameterized definition of the agent’s task. In this setup, the agent’s task has parameters that can be set or sampled. For the example from the previous paragraph, the agent’s task is to move to a given location, and the speed at which it does so is a parameter. Instead of specifying a single behavior to train, the developer specifies a range of behaviors, for example, moving to a target location at different speeds. After training, the task parameters can be adjusted to best reflect the desired behavior. The ML-Agents Toolkit already includes a Walker agent example whose task is to walk at a variable speed. We also created a variable-speed version of Puppo that can be told how fast to run while playing fetch, with a head-height parameter that teaches Puppo to stand up on its hind legs.
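A minimal sketch of what a parameterized task might look like in Python, assuming hypothetical parameter names (target_speed, target_head_height) chosen for illustration; the actual Walker and Puppo implementations are not shown in the text.

```python
import numpy as np

def parameterized_reward(agent_velocity, head_height, target_speed, target_head_height):
    """Reward tracking a commanded speed and head height; names are illustrative."""
    speed_error = abs(np.linalg.norm(agent_velocity) - target_speed)
    height_error = abs(head_height - target_head_height)
    return -speed_error - height_error

# During training, the task parameters are sampled so that a whole family of
# behaviors is learned; afterwards the developer simply dials in the values
# they want instead of retraining.
task_params = {"target_speed": np.random.uniform(0.5, 3.0),
               "target_head_height": np.random.uniform(0.2, 1.0)}
print(task_params)
```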

Parameterized tasks are useful for learning multiple behaviors, but deciding which parameter values to train on is not trivial. The naive approach is to sample parameters at random, but this can be inefficient for a number of reasons. We chose a smarter alternative: active learning (Da Silva et al., 2014). Active learning selects which task parameters to train on so as to maximize the agent’s expected improvement during training. This lets the agent master the full range of task parameters with fewer samples. Below, we compare active learning with uniformly random parameter sampling on the Puppo task from the previous example, with head-height and variable-speed parameters.
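The following is a deliberately simplified Python sketch of the idea, not the method of Da Silva et al. (2014): it approximates “expected improvement” by the recent change in return for each candidate parameter value and always trains on the value with the largest estimated progress. All names and numbers are illustrative.

```python
import random
from collections import defaultdict

param_bins = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]   # candidate target speeds (assumed)
recent_returns = defaultdict(list)             # per-bin history of episode returns

def learning_progress(bin_value):
    """Absolute change between the two most recent returns seen for this bin."""
    history = recent_returns[bin_value]
    if len(history) < 2:
        return float("inf")   # unexplored bins are tried first
    return abs(history[-1] - history[-2])

def choose_task_parameter():
    """Pick the parameter value with the highest estimated learning progress."""
    return max(param_bins, key=learning_progress)

def record_result(bin_value, episode_return):
    recent_returns[bin_value].append(episode_return)

# Toy training loop; the rollout itself is replaced by a random placeholder.
for episode in range(100):
    speed = choose_task_parameter()
    episode_return = random.gauss(-speed, 1.0)
    record_result(speed, episode_return)
```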


Learn more about the course “Unity Game Developer. Basic”.

Sign up for the open lesson “2D Puzzle Platformer”.
