Turbolift – a tool for large-scale refactoring

Skyscanner's systems are hardly small-scale. Our website and app are used by millions of travelers every month, and we handle mind-boggling request volumes using a microservice architecture that is far from small in itself. Collectively, we have several hundred microservices and microsites (web applications that support a specific portion of our site), backed by hundreds of AWS Lambda functions and libraries. Each of these is stored in its own GitHub repository, which has some benefits in terms of separation of concerns, but comes at a cost: when the same change needs to be made in all of these repositories, how do we make it?

Most of our microservices use shared libraries, so rolling out a security patch, a resiliency improvement, or new monitoring functionality (for example) is often a relatively straightforward library update performed by the Dependabot service.

However, not every change can be made in a library. Despite our best efforts, we still have boilerplate configuration and code that need to be improved from time to time. And while we are reducing the number of repositories where possible (including by merging repositories when it makes sense), we still have many repositories left.

We need to be able to make fairly complex changes simultaneously across dozens or even hundreds of repositories.

For a long time we developed our own internal system called Codelift. It was primarily a batch processing system that ran overnight, applying a Python change script to each of hundreds of repositories and opening pull requests (PRs) against other teams' repositories for every resulting change. As it turned out, it is very difficult to write a script that works reliably across all repositories. The main bottleneck was the need for skilled people to test these change scripts, and the scripts themselves often needed several rounds of adjustment to get past the inevitable glitches. Codelift was gradually decommissioned, but the need for it remained.
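To make the batch model concrete, here is a minimal sketch of what a change script of this kind might look like. It is not Codelift's actual code: the names `bump_timeout` and `apply_to_all`, the `config.ini` file name, and the setting being edited are all hypothetical. What it illustrates is that a single script has to anticipate every repository's quirks, and any repository it does not anticipate ends up in the failure list.

```python
import re
import tempfile
from pathlib import Path


def bump_timeout(repo: Path) -> bool:
    """Hypothetical change: raise a timeout in config.ini.

    Returns True if the file was modified, False if nothing matched.
    """
    config = repo / "config.ini"
    text = config.read_text()  # raises FileNotFoundError if the repo has no config.ini
    updated = re.sub(r"^timeout = 5$", "timeout = 30", text, flags=re.MULTILINE)
    if updated == text:
        return False
    config.write_text(updated)
    return True


def apply_to_all(repos, change):
    """Apply `change` to every repo and sort the repo names by outcome."""
    report = {"changed": [], "unchanged": [], "failed": []}
    for repo in repos:
        try:
            key = "changed" if change(repo) else "unchanged"
        except OSError:
            # The script did not anticipate this repository's layout.
            key = "failed"
        report[key].append(repo.name)
    return report


if __name__ == "__main__":
    # Three throwaway "repositories" standing in for real clones.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        samples = [("svc-a", "timeout = 5\n"), ("svc-b", "timeout = 30\n"), ("svc-c", None)]
        for name, body in samples:
            (root / name).mkdir()
            if body is not None:
                (root / name / "config.ini").write_text(body)
        print(apply_to_all(sorted(root.iterdir()), bump_timeout))
```

In the demo, `svc-a` is changed, `svc-b` already has the new value, and `svc-c` has no `config.ini` at all, so it lands under `failed`. With hundreds of real repositories, that last category is where the rounds of script adjustment come from.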

The emergence of Turbolift

Turbolift is a rethink of how mass changes are made, based on what we learned from Codelift.

  • Previously, to write a change script that worked reliably in Codelift, engineers had to clone many or even all of the affected repositories locally just to test the change. But if engineers are going to clone the repos anyway, why not make that part of the process?

  • Writing change scripts in Python had its limitations: sometimes the easiest way to implement a change is to run a shell command, or a more specialized refactoring tool such as codemod or comby. Sometimes it is preferable to open an editor or an IDE, which is cumbersome but the surest way. Sometimes the simplest option is an automated change that works for 95% of the repositories, followed by manual fixes in the few repositories that need them.

  • Having a change script is only useful in itself if you plan to repeat the same bulk refactoring. In many cases we can say with full confidence that a change is a one-off. And while it is important to record what we did, the record does not have to take the form of a reusable script.

  • One of the subtle problems with Codelift was that all of its PRs came from a bot user, while the owners of the Codelift system were expected to review each change thoroughly, and this became a serious bottleneck. We realized that PRs against other teams' repositories are best raised by the engineer who is actually responsible for the change. That gives clear ownership, makes feedback easier, and removes the need for a whole team of intermediaries.

Turbolift automates the most tedious stages of this process: forking and cloning repositories in bulk and creating PRs, without adding any friction to making the actual changes themselves. Engineers can validate, modify, and test their changes directly with whatever tools they need, which gives them far more leverage than submitting a script to a batch system and waiting for results.
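The per-repository part of that workflow can be sketched in a few lines. This is an illustration only, not Turbolift's implementation or its command-line interface: `run_in_each` is a hypothetical helper, and the sketch assumes the repositories have already been cloned into local directories. It runs one shell command inside every clone, using a thread pool so the clones are processed concurrently, and separates the clones where the command succeeded from those that need a manual look.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_in_each(clones, command, max_workers=8):
    """Run `command` (a shell string) inside every clone directory.

    Returns (succeeded, needs_attention): two sorted lists of directory names.
    A clone counts as succeeded when the command exits with status 0.
    """
    def run(clone: Path) -> bool:
        result = subprocess.run(
            command, shell=True, cwd=clone, capture_output=True, text=True
        )
        return result.returncode == 0

    clones = list(clones)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(run, clones))  # results keep the input order

    succeeded = sorted(c.name for c, ok in zip(clones, outcomes) if ok)
    needs_attention = sorted(c.name for c, ok in zip(clones, outcomes) if not ok)
    return succeeded, needs_attention
```

Because the command is just a shell string, the same loop works for a one-line `sed`, a refactoring tool, or a test run. The `needs_attention` list is the handful of repositories an engineer would then open in an editor and fix by hand before raising PRs.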

There are obvious downsides to cloning onto developer machines: it takes time and disk space. But, in our opinion, the reduced cognitive load on engineers more than offsets these costs.

Turbolift started out as a hastily written set of bash scripts, but it quickly proved its usefulness. Now that we have rewritten it in Go, tidied it up, and released it as an open source tool, I would like to share it with you. Compared to the original version, Go has helped make the tool more user-friendly and easier to maintain in the long run. We have many ideas for developing it further, and we welcome any suggestions on how to improve it.

If you start working with Turbolift, we advise you to pay special attention to the needs of the engineers who review your PRs, especially if you are creating many of them. The README file for the project contains several guidelines that we developed internally to help change authors keep that review burden within reasonable limits.

How Turbolift helped us

  • As an internal SSL certificate approached its expiry date, our production platform support team used Turbolift to raise PRs against hundreds of repos that referenced the expiring certificate.

  • Turbolift is used by our web support team to standardize versions and test libraries on our microsites.

  • Our production platform support team used Turbolift to fix a bug that had been introduced in a code template and then replicated across multiple repositories.

  • Regional teams were able to clean up and update the repository metadata files that track ownership and other information. Keeping these files up to date used to be routine but necessary manual work whenever teams were renamed or repositories changed owners.

In total, over the past three months, we have used Turbolift to raise over 1200 internal PRs against other teams' repositories. Each of them represents a resolved issue or a corrected technical flaw that would otherwise have required a hand-crafted PR. We hope that engineers at Skyscanner and elsewhere will take full advantage of the simplified workflow when making large-scale changes.
