Experience in optimizing high-load systems

Hello! My name is Innokenty Kornilov. I am a lead development engineer at Bercut, where I have been working since 2013.

I develop and support the “Autoassembly” system, which dates back to 2007. It is a configuration management system that automates building, versioning and releasing software, and a key tool for keeping software development and release consistent across our company. Thanks to “Autoassembly”, we can manage the build process of components and systems, track the version and readiness status of products, and ensure their correct delivery to customers.

This article covers how we overcame technical difficulties that arose at various stages of the project and describes the key decisions that shaped our approach to working on the system.

I have held different roles on the project, from analyst and architect to developer, tester and team leader. I also take part in commercial projects such as Business Rules Engine, Mobile Number Portability, Service Profile Management and others.

Key features of the Autoassembly system

“Autoassembly” ensures the unified process of software development and release adopted by Bercut, providing the following capabilities:

  • Ensures the required quality and efficiency of development, implementation and maintenance of our solutions.

  • Uniquely identifies all artifacts used or created during the project.

  • Ensures the release of maintainable solutions without our participation.

  • Identifies and monitors the status of design artifacts at any given time, including their suitability and degree of completion.

  • Manages the assembly procedure of components and systems to ensure reproducibility, efficiency and necessary control.

  • Manages changes, ensuring the necessary integrity and traceability for configurations of design artifacts at all stages of development, delivery and maintenance.

    If you specify the exact version identifier of the system/component, you can always get exact instructions for assembling this version, and using these instructions:

  1. Build an identical copy from scratch from source code on “clean” equipment and obtain a full set of accurate and up-to-date operational and technical documentation for this version;

  2. Reproduce any change for any of the project artifacts from version to version, including changes to all artifacts included in this one.

  • Provides a unified development workflow and unified versioning of artifacts.

    Versioning enforced by the process rules out problems such as overwritten tags in Git. “Autoassembly” controls and supports this process. Bercut has a strict backwards compatibility policy for the components it develops. A version change within the first two digits guarantees users that the component remains functional, does not require regression testing, and contains only bug fixes.

  • Ensures consistency in artifact configurations. There is no need to create your own build scripts, as we use centralized compilers and build engines. There is no “zoo” of different approaches to assembly in different teams. A unified development and assembly process eliminates situations where some specialist has left and no one knows how to assemble a component. Assembly is always guaranteed.

  • Declaratively describes the configuration of components and systems.

    Bercut has many types of components that are built for various operating systems with various compilers, from specific hardware platforms to NodeJS. “Autoassembly” provides a single declarative configuration, which greatly simplifies onboarding new employees and moving between projects.

  • Builds our solutions for various platforms and compilers.

Platforms:

WIN32, WIN64, SOLARIS 9, SOLARIS 10, SOLARIS 10 x64, LINUX POWER PC, LINUX SUSE, JAVA, MULTIPLATFORM, LINUX SUSE x32, RHEL 5.5, RHEL 5.5 x32, RHEL 7, CENTOS 7, ALMA 9

Compilers:

DELPHI 6.0, DELPHI 10.0, DELPHI 10.3, DELPHI 11.0, DELPHI 11.1, MSVC 6.0, MSVC 8.0, MSVC 17.0, GCC 3.0, GCC 3.2, GCC 3.3, JDK 14, JDK 15, GCC 4.1, SUNC 5.8, LDC 2.0, Tab 1.4, Tab 1.6, Tab 2.0, Tab 2.1, JDK 16, JDK 17, JDK 18, JDK 19, JDK 130, JCDK 2.1, Liferay 5.1, NetBeans 6.5, NetBeans 8.2, Open ESB Studio 3.1, GCC 4.8.3, WSDL 3.1, WSDL 3.2, WSDL 3.3, EXTENSION*.

*EXTENSION is an extension point that allows you to build artifacts using any external builders, compilers and command line tools such as Gradle, Maven, Angular CLI, npm, Sencha Cmd and others.

  • Provides code reuse. Bercut has established a clear division between the artifacts it produces: components and systems. We assemble our products like construction kits, forming systems from components, and each artifact can have its own life cycle. We use a single registry and artifact repository. When developing new components and systems, we actively use previously created artifacts.

  • Provides simultaneous work with different version control systems.
    Historically, “Autoassembly” has been able to work with several version control systems at the same time (StarTeam, SVN, Bitbucket, GitLab, etc.).

  • Automates routine tasks: managing branches in the version control system (VCS) when releasing new versions of artifacts, creating labels for releases, and many other operations.

  • Provides tracking of component usage. We can clearly track which artifacts use each component and which systems include it. This is especially important when an error is found in a component, because we can determine exactly which systems it is integrated into and therefore which systems need to be reissued or updated for the customer.

  • Supports local builds: developers can build a component or system on their own machines, with a result guaranteed to be identical to a centralized build through the “Autoassembly” portal.

  • Can be used for escrow procedures: based on shipped releases, it creates a repository with source code from which identical releases can be built at the customer’s site.

  • Allows you to make changes to an artifact built several years ago, rebuild it and get an exact copy with the corrections made.

    Even if team members leave, any other developer can easily join in, make the necessary changes, build the artifact, and be guaranteed to get the same product as before, only taking into account the changes made.

  • Collects metrics and analyzes code quality.

  • Provides the develop, pre-release and release life cycle.

    From the development stage, the artifact is moved to pre-release status, a frozen snapshot that cannot be rebuilt. An artifact in pre-release status is submitted for testing, and if the tests are successful, the product is released.

  • Provides mechanisms for releasing patches.

    Allows you to release a minor patch for an already released version of a system.

System Description

What is “Autoassembly” from a technical point of view? It consists of the main server and build agents. Each operating system is served by a separate build agent.

Initially, on the main server, in addition to the system core and portal, there was a PostgreSQL database and a build agent.

There are several types of artifacts in “Autoassembly”, such as components and systems. Our products and solutions are component-based: we do not write every new product from scratch but develop individual components responsible for specific functionality. Previously developed components are reused, with new components building on existing ones.

A system is a final product that consists of components and may include other systems. The release build of the system is shipped to the customer.

“Autoassembly” is also a unified company-wide process for developing and releasing software. Any artifact in the configuration management system has its own unique version. Artifacts include system and component versions, builds, hot fixes, and so on. Each build has its own readiness status. “Autoassembly” rules out the situation in which a customer receives artifacts with the same version but built from different versions of the source code.

Components and systems are configured according to a single principle, regardless of the operating systems and compilers the artifact is built for or the language it is written in. The developer simply specifies how the artifact should be built, and “Autoassembly” does the rest.

To create a new component in “Autoassembly”, the developer goes to the portal, creates the component, and specifies its name and version, the compilers and operating systems, and the components it depends on that are needed to build it. When a component is created, “Autoassembly” automatically generates the path in the version control system where its source code will be stored. Systems are created in the same way.

When a developer submits a component for a build (via the GUI, the IDE or a commit), the task lands in the general “Autoassembly” queue. Along with the component being built, the components that use it are also queued, unless they are in pre-release or release status.
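The queueing rule can be illustrated with a minimal Python sketch. The component names, the `DEPENDS_ON` and `STATUS` tables and the `build_queue` function are hypothetical and are not taken from the real system. The sketch also assumes that traversal stops at frozen artifacts, since a component that is not rebuilt gives its own dependents no reason to be rebuilt.

```python
from collections import deque

# Hypothetical data: component -> components it depends on.
DEPENDS_ON = {
    "core": [],
    "net": ["core"],
    "billing": ["core", "net"],
    "portal": ["billing"],
}
# Hypothetical life-cycle status of each component.
STATUS = {"core": "develop", "net": "develop", "billing": "release", "portal": "develop"}


def build_queue(submitted, depends_on, status):
    """Return the submitted component followed by the dependents to rebuild."""
    # Invert the graph: component -> components that use it directly.
    used_by = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            used_by[dep].append(name)

    queue, seen, pending = [submitted], {submitted}, deque([submitted])
    while pending:
        current = pending.popleft()
        for dependent in used_by[current]:
            if dependent in seen:
                continue
            seen.add(dependent)
            # Frozen artifacts (pre-release, release) are never rebuilt,
            # and (by assumption) nothing behind them is rebuilt either.
            if status[dependent] == "develop":
                queue.append(dependent)
                pending.append(dependent)
    return queue
```

With this sample data, `build_queue("core", DEPENDS_ON, STATUS)` returns `["core", "net"]`: `billing` is skipped because it is already released, and `portal` is not reached because it depends on `core` only through `billing`.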

The queue displays build tasks submitted by all developers and informs about the predicted completion time of builds.

After a component is built, the resulting build, which has its own unique version, is uploaded to the internal “Autoassembly” repository. The component can then be used to build other components or systems.

Database optimization: indexes and splitting into separate machines

With the increase in the number of artifacts and the number of assembly tasks, the process began to slow down.

To increase performance and reduce the load on the central server, we decided to:
• move the build agent off the central server onto a separate machine;
• move the database to a separate machine;
• virtualize some physical machines and add resources to them.

We also analyzed the existing indexes and identified those that were never used in queries or were used too rarely. We removed them to reduce the cost of table updates, because every index must be updated when rows are added, changed or deleted, which slows those operations down.

Then, based on statistical analysis of the most frequent and “heavy” queries, we created new indexes. We relied mainly on B-tree indexes, which support efficient range searches and significantly reduce search time by cutting the number of pages that have to be read. The goal was to replace sequential table scans with index scans wherever possible. In parallel with the work on indexes, we reviewed and optimized the SQL queries, rewriting some of them, especially in the parts of the logic that made multiple calls to the database. Multiple small queries were replaced with queries that return all the necessary data at once.

Together, these measures doubled the build speed of systems and components.
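Both ideas, replacing a sequential scan with an index scan and replacing many small queries with one, can be tried out in a small sketch. It uses SQLite from the Python standard library as a stand-in so that it runs without a server; the production database is PostgreSQL, where `EXPLAIN` plays the same role. The `builds` table, its columns and the index name are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builds (id INTEGER PRIMARY KEY, component_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO builds (component_id, status) VALUES (?, ?)",
    [(i % 100, "done") for i in range(10_000)],
)

QUERY = "SELECT id, status FROM builds WHERE component_id = ?"


def plan(sql, params):
    """Return the planner's description of how it will execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (42,))  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_builds_component ON builds (component_id)")
plan_after = plan(QUERY, (42,))   # the planner now searches the new index


def statuses_one_by_one(component_ids):
    """Many small queries: one round trip per component."""
    return {cid: conn.execute(QUERY, (cid,)).fetchall() for cid in component_ids}


def statuses_batched(component_ids):
    """One query that returns the rows for all requested components."""
    marks = ",".join("?" * len(component_ids))
    rows = conn.execute(
        f"SELECT component_id, id, status FROM builds WHERE component_id IN ({marks})",
        list(component_ids),
    ).fetchall()
    grouped = {cid: [] for cid in component_ids}
    for cid, build_id, status in rows:
        grouped[cid].append((build_id, status))
    return grouped
```

Printing `plan_before` and `plan_after` shows the switch from a table scan to a search on `idx_builds_component`. The two `statuses_*` functions return the same data, but the batched version issues a single query regardless of how many components are requested.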

Scaling using agent pooling and parallel builds

At the next stage, “Autoassembly” continued to expand its functionality:

  • Along with SVN, integration with Bitbucket was added;

  • Integration with Jira was added;

  • Added the formation of a build history, which, regardless of the VCS used, showed code changes within each build, each release.

  • The list of supported operating systems grew. At that time, “Autoassembly” could build artifacts for different versions and bit widths of operating systems such as Solaris SPARC, RHEL, Windows, Linux PowerPC, Linux SUSE, CentOS, etc. In total there were more than 12 combinations of OS and bit width. “Autoassembly” also supported more than 30 versions of various compilers.

The number of products released grew significantly and, with it, the load on the system.

Building each component is a complex task that involves not one but several build processes. For example, for some types of components, two builds are produced for each operating system and compiler specified in the configuration: a debug build and a main build. If the artifact is intended for five operating systems and each requires builds with two compilers, then, counting the debug variants, a single build produces 20 binary files.
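The arithmetic behind this figure is a simple product of the configuration dimensions. A short sketch makes it explicit; the platform names are borrowed from the list earlier in the article, while the compiler names are placeholders.

```python
from itertools import product

# Hypothetical configuration of one component.
operating_systems = ["WIN64", "RHEL 7", "CENTOS 7", "ALMA 9", "LINUX SUSE"]
compilers = ["compiler A", "compiler B"]  # two compilers per OS, as in the example
build_types = ["debug", "main"]

# Every (OS, compiler, build type) combination yields one binary.
build_matrix = list(product(operating_systems, compilers, build_types))
print(len(build_matrix))  # 5 * 2 * 2 = 20
```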

In addition, building a component also triggers builds of the components that depend on it, if they need to be rebuilt and are in the Develop status. The rebuilding of dependent components was first introduced for C++ applications. Submitting a single component can lead to hundreds of builds that keep all associated artifacts consistent and up to date.

Analysis revealed two significant problems; solving them sped up the entire process several times over. The first bottleneck was that an artifact was built for its operating systems sequentially: the build for the next OS began only after the build for the previous one had finished. We implemented functionality that parallelizes this process, which made building one artifact for several OSes more than 4 times faster.

The second problem concerned building different artifacts for the same operating system simultaneously.

As mentioned above, the system allocated one build agent per OS. As the workload grew, one agent was no longer enough. Parallelizing builds on one host was not a complete solution either: when 3–4 components were built simultaneously, the build time of each component increased several-fold. As a solution, support for a pool of agents per OS was added. This made it possible to build several artifacts for the same OS simultaneously on different agents.
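The difference between the sequential and the parallel scheme can be sketched as follows. The `build_for_os` function is a hypothetical stand-in for dispatching a build to an agent and waiting for it; the sleep only simulates build time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def build_for_os(artifact, os_name):
    """Stand-in for a real build: dispatch to an agent and wait for the result."""
    time.sleep(0.1)  # simulated build time
    return f"{artifact}-{os_name}.bin"


def build_sequential(artifact, os_names):
    # The build for the next OS starts only after the previous one has finished.
    return [build_for_os(artifact, name) for name in os_names]


def build_parallel(artifact, os_names):
    # One worker per target OS; results come back in the order of os_names.
    with ThreadPoolExecutor(max_workers=len(os_names)) as pool:
        return list(pool.map(lambda name: build_for_os(artifact, name), os_names))
```

Both functions return the same list of binaries. In the sequential version the total time is the sum of the per-OS build times, while in the parallel version it is close to the time of the slowest single build.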

A load balancing mechanism was also added that selects the least loaded agent from the pool. Builds for one operating system began to run in parallel on different hosts, in 10 threads. This sped up processing of the build queue by more than 10 times.
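A per-OS agent pool with least-loaded selection might look like the sketch below. The `Agent` and `AgentPool` classes and the host names are invented for illustration. The sketch measures load as the number of running builds, which is an assumption, and it is single-threaded; a real pool would have to guard its counters with a lock.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    host: str
    active_builds: int = 0


class AgentPool:
    """Pool of build agents serving one operating system."""

    def __init__(self, hosts):
        self.agents = [Agent(host) for host in hosts]

    def acquire(self):
        """Pick the agent with the fewest running builds and reserve a slot."""
        agent = min(self.agents, key=lambda a: a.active_builds)
        agent.active_builds += 1
        return agent

    def release(self, agent):
        """Free the slot once the build has finished."""
        agent.active_builds -= 1


# One pool per operating system (hypothetical host names).
pools = {"RHEL 7": AgentPool(["rhel7-a", "rhel7-b", "rhel7-c"])}
```

Successive calls to `pools["RHEL 7"].acquire()` spread builds across the three hosts, and an agent that finishes early becomes the preferred candidate again after `release`.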

New challenges, integration with GitLab and process optimization

New challenges have arisen.

New requirements from production had to be implemented. Integration with GitLab was added, and main development moved to this version control system, while some systems and components remained in SVN and Bitbucket. Large-scale migrations of projects from Bitbucket and SVN to GitLab were carried out. Continuous integration and delivery (CI/CD) functionality was added, so that any push in GitLab could trigger the build of the corresponding components, their dependencies and the systems that include these components. “Autoassembly” gained functionality for automatically shipping releases to customers. To interact with external systems, the RabbitMQ message broker was added, and automatic quality analysis of the source code of the built components was set up. We also developed an integration with IntelliJ IDEA, which allows you to manage the state and builds of features and artifacts directly from the IDE.

Both the number of artifacts in the system and the number of links between them increased. After analyzing the operating logic, we found a new point for optimization. When an artifact is built for several platforms, a dependency tree of the artifact is constructed for the build. Since an artifact can have a great many dependencies, constructing such a tree can be a very expensive operation.

We added an in-memory cache for the dependency tree. During a build, the dependency tree is now constructed once and then reused when building for the various operating systems. This cache allowed us to speed up builds tens of times.

Previously, build agents used compilers located on the main server. To improve performance, the compilers were placed directly on each build agent, and a special program was developed to synchronize them and keep them up to date. Agents also have RAM disks, which are sections of RAM used as an ultra-fast alternative to a hard drive. Processes that require frequent and intensive disk access were moved to them, which sped up the build process further.

Conclusion

At the moment, “Autoassembly” is a high-load system used simultaneously by hundreds of developers who build their artifacts every day. For example, over the past month more than 10,000 tasks were submitted for building. The “Autoassembly” repository currently stores more than 1,000,000 artifacts, and this number continues to grow.

It has gone through many changes in recent years, and there are new challenges ahead. Our key focuses for the near future are:

  1. Optimizing database queries and improving indexes. We will continue to improve database performance to speed up data processing and improve overall system speed.

  2. Implementation of database sharding. This will help distribute the load more effectively, avoid bottlenecks when data volumes grow, and increase system fault tolerance.

  3. Using Redis for caching. This will reduce the access time to key information and reduce the load on the database, which will speed up the process of assembling and processing artifacts.

  4. Dynamic pool of build agents with containerization using Docker and Kubernetes. This will allow you to dynamically increase the number of agents depending on the current load on the system, providing flexibility and scalability of the pool to handle peak tasks.

  5. Transition to an event-driven architecture with RabbitMQ. This will allow the system to respond to events in real time, improve parallel processing of tasks and increase the flexibility of interaction between components, which will provide better scalability and resistance to high loads.

  6. Integration with domestic services. We plan to expand integration with domestic services by including project management systems, repositories, task schedulers, etc.

    With each new stage, “Autoassembly” will continue to evolve and to provide configuration and build management in our company.
