Automating Code Quality Control (CQC) on CI

The other day, an internal Aventus IT conference took place. And I was there, drinking beer… just kidding. I gave a talk on automating code quality control, and I would like to present you an article based on the materials of my talk.

At the conference there were two excellent talks about tests on the project. I was glad to hear some of my own ideas echoed in them. For example, about the advantages of running tests on CI instead of on a local machine. It turns out that I'm not the only one.

For my part, I want to talk about code quality control beyond tests. I have tried to make the talk as practical as possible, though not without some necessary digressions. In this article I will start with a few perhaps banal things and from them move on to less obvious and, at the same time, important ones.

Automation motivation

First, let's figure out why this is needed.

The software product has a customer, so first of all let's answer what the business needs: the business needs a system that performs business tasks. And one of the business requirements is that this system must work stably. The business essentially doesn't care how many crutches we prop it up with; it doesn't care about the beauty of the code. The main thing that interests it is that the system must work… and at the lowest possible cost. And this is where we find common ground.

How can we ensure the quality of a software product? It all starts with manual control:

It takes a lot of time, and time means corresponding salaries. A code review should ideally be carried out by a specialist stronger than the one who wrote the code: if we set aside the training function of cross-review, the main task of a review is to find errors and weaknesses in the proposed solution. The same goes for QA, who are unreasonably underrated. Ideally, a QA specialist should understand the logic of the system better than the developer, because he verifies the developer's work. That requires the corresponding qualifications, and the corresponding salary. It's expensive!

What happens if you automate code review? We are not talking here about tests, which are alive and must constantly change along with the project. Many checks need to be configured only once, after which they keep paying off. All that remains is to pay for the server time of the CI runner. How much does a month of machine time allocated to a runner on a CI project cost? 50, 100, 200, 300, 500 dollars? The support team can give more accurate numbers. Of course, this is not the whole cost; however, it is much cheaper than the time of two or three employees (QA + dev). Let's take an estimated $5,000 for all participants in the process.

Now take the difference between these numbers and multiply it by 12 months and by at least 2-3 years. Then count it over 10 years; many projects live even longer. The figure is impressive.

What we are setting up

Being a lead, I first of all started saving my own time on code reviews and set up the corresponding scripts on my project.

PHP CS Fixer (code style checker)

We start by checking the code style.

Why is this important? Although we write code only once, we read it many, many times. And the faster a developer understands someone else's code, the faster he will complete his task. For that, the code of different developers should look as uniform as possible.

There is an opinion:

Code maintainability, including readability, is more important than the optimality of algorithms!

Firstly, the days of mainframes are gone, when buying additional hardware was much more expensive than hiring a team of developers writing machine code.

Secondly, most of any project is not high-load. And if we do run into a load problem, we can always add hardware first and then optimize performance in a targeted way. This keeps the overall cost of creating and owning the system lower.

It is important to adhere to standards here. A project on Symfony means we enable the rules adopted in the Symfony components. This immediately removes the argument "I think it's prettier this way." And when all projects adhere to a standard approach, the entry threshold for new developers on a project is lower.

How do you simplify introducing it into an existing project?

  1. See which violations are few in number and fix them right away. The fixer can do this automatically.

  2. Temporarily disable rules that are difficult to satisfy immediately.

  3. Use a baseline. The fixer doesn't support this out of the box, so I created a package that provides the functionality: https://github.com/Aeliot-Tm/php-cs-fixer-baseline. It lets you softly exclude files from scanning until they are modified, which is especially useful on projects with a large code base. As a bonus, checks on CI become faster, because only new and modified files are checked.
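Putting it together, a minimal .php-cs-fixer.dist.php with the Symfony rule set mentioned above might look like this (a sketch for PHP CS Fixer v3; the paths and the extra rule are illustrative):

<?php
// .php-cs-fixer.dist.php: a minimal sketch
$finder = PhpCsFixer\Finder::create()
    ->in(__DIR__ . '/src')
    ->in(__DIR__ . '/tests');

return (new PhpCsFixer\Config())
    ->setRules([
        '@Symfony' => true,               // the rule set adopted in the Symfony components
        'declare_strict_types' => true,   // an illustrative extra rule
    ])
    ->setRiskyAllowed(true)               // declare_strict_types is a "risky" rule
    ->setFinder($finder);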

Code sniffer

squizlabs/php_codesniffer

An alternative to the previous style-checking tool. We use it on the project because it has useful rules that the fixer lacks: for example, a line-length limit.
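A minimal phpcs.xml enforcing a line-length limit might look like this (a sketch; the 120-character limit is an illustrative choice):

<?xml version="1.0"?>
<ruleset name="project">
    <file>src</file>
    <rule ref="Generic.Files.LineLength">
        <properties>
            <property name="lineLength" value="120"/>
            <property name="absoluteLineLimit" value="0"/>
        </properties>
    </rule>
</ruleset>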

PHPStan (static code analysis)

It looks for logic errors in the code: for example, data-type mismatches in operations (when we compare warm with soft), null-pointer access and much more.

An important parameter is "level": something like a set of rule presets. The higher the number, the more rules are included (in current PHPStan releases the levels go from 0 to 9).
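In the config it is a single parameter. A minimal phpstan.neon sketch:

parameters:
    level: 8
    paths:
        - src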

Has many plugins. We use:

  • phpstan/phpstan-doctrine – checks DQL, determines return types of repository magic methods, etc.

  • phpstan/phpstan-symfony – allows you to correctly determine the types returned from the container and more.

  • phpstan/phpstan-phpunit – improves mock checking.

They can be enabled either manually in the main config or via phpstan/extension-installer.
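With manual connection, the includes section of phpstan.neon looks roughly like this (the extension.neon/rules.neon files are the ones the packages ship):

includes:
    - vendor/phpstan/phpstan-doctrine/extension.neon
    - vendor/phpstan/phpstan-doctrine/rules.neon
    - vendor/phpstan/phpstan-symfony/extension.neon
    - vendor/phpstan/phpstan-phpunit/extension.neon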

There is also a subtlety with their settings.

For phpstan/phpstan-doctrine you must provide a file that returns an ObjectManager:

doctrine:
   objectManagerLoader: ../../tests/object-manager.php
   allowNullablePropertyForRequiredField: true
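The loader itself is a small script that boots the kernel and returns Doctrine's ObjectManager. A sketch, assuming a standard Symfony skeleton:

<?php
// tests/object-manager.php

use App\Kernel;
use Symfony\Component\Dotenv\Dotenv;

require dirname(__DIR__) . '/vendor/autoload.php';

(new Dotenv())->bootEnv(dirname(__DIR__) . '/.env');

$kernel = new Kernel('test', true);
$kernel->boot();

return $kernel->getContainer()->get('doctrine')->getManager();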

phpstan/phpstan-symfony requires an application loader and the path to the container's XML cache file:

symfony:
   console_application_loader: ../../tests/console-application.php
   container_xml_path: ../../var/cache/test/App_KernelTestDebugContainer.xml
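The application loader is similar (again a sketch for a standard skeleton):

<?php
// tests/console-application.php

use App\Kernel;
use Symfony\Bundle\FrameworkBundle\Console\Application;

require dirname(__DIR__) . '/vendor/autoload.php';

$kernel = new Kernel('test', false);

return new Application($kernel);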

YAML file linter

There are a lot of YAML files on the project, so you can't get anywhere without a linter. We use the yamllint console utility, and in order not to waste time installing it on every run, we built our own Docker image with it.
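Its configuration lives in a .yamllint file in the project root, after which the run is simply `yamllint .`. A minimal config sketch (the concrete limits are a matter of taste):

extends: default

rules:
    line-length:
        max: 120
    indentation:
        spaces: 4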

TWIG file linter

If you render anything from Twig files, do not forget to check their syntax with the linter:

bin/console lint:twig

Checking the consistency of translation files

Is your project multilingual? Then how do you check the translations?

There is a standard Symfony command that covers some of the needs:

bin/console debug:translation --all uk

But it can't do everything. For example, it does not compare several languages against each other in search of missing values. Besides, on older versions of Symfony it was completely unsuitable for CI, because it did not return error codes when problems were detected.

If the translations are in YAML files, and as a rule they are, then install the package https://github.com/Aeliot-Tm/translation-maintain and enjoy life. I wrote it myself at one point. And set up the command for CI:

bin/console aeliot_trans_maintain:lint:yaml base file_transformed

It has various ready-made rules and rule sets. As an additional bonus, the package provides a number of extra features for managing translation files at the development stage, including normalization of file structure and automatic translation of missing values into the desired language. You can read more in the article on Habré: https://habr.com/ru/articles/555954/

Container linter

Most Symfony projects keep configuration files in YAML format. The YAML file linter will catch only syntax errors. To check for logical errors in the definition of parameters and services, you need a container linter. It comes with Symfony out of the box:

bin/console lint:container

Checking the Database Schema

In Symfony projects, the database structure is usually described with attributes/annotations on entities, and migrations are then created to apply structure changes. Unfortunately, not every developer creates migrations with the standard commands, and things can go wrong in many ways:

A developer can use doctrine:migrations:generate and add SQL statements manually. Or he may forget to create a migration, or forget to commit it. Or he may work in one branch and run its migrations, then switch to another branch and create a new migration there, forgetting to roll back the previous one; afterwards, while removing the extra statements from the new migration, he can delete something necessary.

Over time, you will almost certainly accumulate a desynchronization between the database structure described in the entities and the actual database structure. How do you avoid missing this on prod? As you have already guessed, we set up a command for CI:

bin/console doctrine:schema:validate -n -v

Here the two flags are responsible for displaying more detailed information about errors and for disabling interactive mode, which is required for correct operation on CI.

What if your database has tables that are not managed by Doctrine? Set up schema filtering, and then Doctrine will do everything you need!
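Filtering is configured with a regular expression in the Doctrine bundle config. A sketch, assuming the unmanaged tables share a legacy_ prefix:

# config/packages/doctrine.yaml
doctrine:
    dbal:
        schema_filter: ~^(?!legacy_)~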

PHPUnit

What would we do without it? I won't dive deep, I'll just note a few points:

Firstly, final classes can be mocked perfectly well. You just need the dg/bypass-finals package: one line in the bootstrap file and off you go.

BypassFinals::enable();
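In context, the whole bootstrap file can stay this small (a sketch):

<?php
// tests/bootstrap.php

use DG\BypassFinals;

require dirname(__DIR__) . '/vendor/autoload.php';

// strips the "final" keyword from loaded classes on the fly, so they can be mocked
BypassFinals::enable();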

I won't get into the debate here about whether the final keyword is necessary, nor into the discussion of creating single-use interfaces solely for the sake of tests (read: test code in production). Just accept the fact that final classes can be mocked.

Not for free, of course: PhpStorm will complain and PHPStan will be unhappy (baseline to the rescue), but it will work.

Secondly, we wrap all database operations in transactions and roll the changes back automatically using dama/doctrine-test-bundle. Just install the package and add the extension to the config, and you won't need to take care of this in setUp and tearDown of tests.

<extensions>
    <extension class="DAMA\DoctrineTestBundle\PHPUnit\PHPUnitExtension"/>
</extensions>

Thirdly, if you want to play with test-coverage control, you will need an additional package, https://github.com/Aeliot-Tm/phpunit-codecoverage-baseline, because PHPUnit out of the box can calculate test coverage but cannot compare the previous state with the new one.

Checking composer files

We remember about self-control: composer's own files should be checked too, because developers are people and may forget to commit some of the changes. Composer's built-in command helps control this:

composer validate --strict

Keeping track of unused dependencies

My policy is that dead code should be removed: it is a source of bugs, and reading it wastes time. The same goes for packages (libraries). During the project's life cycle we may start using some libraries and later stop using them, and a developer cannot always track and remove dependencies from the composer files in time. Therefore we install the icanhazstring/composer-unused package. It controls not only installed packages but also the declared requirements for PHP extensions.

Declaring dependencies explicitly

To avoid getting into trouble and losing a component you actually use when updating packages, install the package https://github.com/maglnet/ComposerRequireChecker.

If this package is not installed and configured, any package update is like walking barefoot through a minefield with a gun to your temple. While solving a business problem, a developer may use a package that is already installed, but only as an implicit (transitive) dependency, and after an update that package may disappear from the system. It is good if the project breaks right away. But what if the dependency was used in a part of the project not covered by tests? That is where the joyful bugs shout: "Users, we are coming for you" 🙂
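A typical run (assuming the tool is installed as a PHAR, which its authors recommend) looks like this:

php composer-require-checker.phar check composer.json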

Checking composer dependencies for security vulnerabilities

Already at the installation stage, packages should be checked for known vulnerabilities. To do this, install roave/security-advisories as a dev dependency.

But this is not enough. We also need to check the dependencies in use for known vulnerabilities after installation. To do this, install the package: enlightn/security-checker

And set up the script:

vendor/bin/security-checker security:check composer.lock

Scanning the container for vulnerabilities

We check the security of the Docker container's components with the trivy utility. We do this when building the base image, and then on a schedule, or additionally in the main project, so that the results are always before our eyes.

https://github.com/aquasecurity/trivy

Creating tickets using TODO comments in code

TODOs in code are a great tool, but without proper control they turn into garbage. To avoid this, some projects simply prohibit them.

How does it usually go? A developer adds a TODO to the code, and only he knows about it. Or, even worse, the project prohibits adding such comments at all: either fix it right away or create a proper ticket. Creating a ticket takes a long time, so the developer does nothing at all.

And then the lead comes to the business and asks for time to fix some ephemeral technical debt. And no one can answer the question "how much is there?", since no one has counted it.

And again, a package of mine: https://github.com/Aeliot-Tm/todo-registrar

With it, a ticket appears in JIRA (or GitHub Issues) for each issue and gets estimated during grooming. And then the usual management magic happens to it. This greatly simplifies managing technical debt.

There is an article on Habré where everything is described in detail: https://habr.com/ru/articles/832994/

We monitor the removal of outdated TODOs

We install the package: https://github.com/staabm/phpstan-todo-by

And we configure a separate PHPStan run for it, so that this job is controlled by an isolated set of rules.
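A separate config for such a run might look roughly like this (a sketch; the paths are illustrative, and the extension.neon include is the one the package ships):

# scripts/phpstan/todo.neon
includes:
    - ../../vendor/staabm/phpstan-todo-by/extension.neon

parameters:
    level: 0
    paths:
        - ../../src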

Automating control of architectural decisions

There are two ways:

  1. Custom rules for PHPStan

  2. Installing the PHPAT package, which is essentially an add-on to PHPStan and provides a simplified interface for defining your own rules.
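A rule in PHPAT is an ordinary PHP method. A sketch (the layer namespaces are illustrative):

<?php

use PHPat\Selector\Selector;
use PHPat\Test\Builder\Rule;
use PHPat\Test\PHPat;

final class ArchitectureTest
{
    // the domain layer must not depend on the infrastructure layer
    public function test_domain_is_independent(): Rule
    {
        return PHPat::rule()
            ->classes(Selector::inNamespace('App\Domain'))
            ->shouldNotDependOn()
            ->classes(Selector::inNamespace('App\Infrastructure'));
    }
}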

Section results

And that's not all that can be configured on CI! There is an opinion:

The quality of a project's code can be approximated by the number of rules that are checked automatically before a developer's code gets into the main code base.

Good practices in setting up scripts

Creating a single configuration point for dev-tools

Ideally, scripts should run on the local machine the same way as on CI. You can write instructions, but a long command with many parameters is inconvenient for local use. Moreover, during the application's development life cycle you may need to adjust script settings or move something around; add to that working in different versions (branches) of the code. This is where the need for a single entry point arises. After a series of experiments, the choice fell on composer scripts. This way we have a single entry point for both CI and local development, with a stable list of commands across versions of the code, plus a Makefile to simplify calling the Docker container from outside.
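In composer.json it boils down to something like this (a sketch; the script names and config paths are purely illustrative):

{
    "scripts": {
        "cs-check": "php-cs-fixer fix --dry-run --diff --config=scripts/php-cs-fixer/config.php",
        "cs-fix": "php-cs-fixer fix --config=scripts/php-cs-fixer/config.php",
        "phpstan": "phpstan analyse -c scripts/phpstan/phpstan.neon",
        "db-validate": "bin/console doctrine:schema:validate -n -v"
    }
}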

For example, when the number of dev-script configs in the project root grew beyond reason, we moved them to a separate folder, and for developers nothing changed. Then we began switching to PHAR files for installing the same tools instead of composer packages, and again nothing changed for the developer: he runs composer cs-fix and carries on, regardless of which branch he is currently in.

Moving configs for dev scripts to a separate folder

Most tools rely by default on a config in the project root; this simplifies invoking the tool, since you don't have to pass the path to the config. But over time there get to be a lot of configs, especially when you have to split scripts into several invocations with different settings to speed up the pipelines. In our project they live in a scripts folder with subfolders inside.

Using “inheritance” in job settings

The YAML syntax supported by GitLab CI allows you to put the common properties of jobs into a separate block and inherit from it. This greatly improves readability and reduces the maintenance burden by cutting down on copy-paste. Adding a new job then turns into a couple of lines:

'Composer check obvious requiring of packages':
   <<: *run_tests
   script:
       - composer require-check


'Composer check unused dependencies':
   <<: *run_tests
   script:
       - composer unused -- --output-format=gitlab


'Doctrine database Schema Validation':
   <<: *run_tests_integration
   script:
       - composer db-validate

Here you can see that in CI we use the same single point for scripts. As a result, any changes and reconfigurations of these scripts do not force us to reconfigure Gitlab jobs.

Reducing dependency-hell using PHAR/Phive

Each dev tool can bring its own dependencies. The more dependencies in a composer.json file, the more difficult it is to update them to new versions; in the end this can make installing certain components impossible. The problem can be solved by installing dev tools as PHAR files. The easiest way to do this is with Phive, which is something like composer, only for installing PHAR files.

Important note! By default, Phive installs packages into a project as symlinks and without the .phar extension. In this case PhpStorm does not understand what the file contains, and syntax highlighting in tool configs does not work. Therefore, I recommend always correcting the installation parameters in the phive.xml file.
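For example (a sketch; the version numbers are illustrative): copy="true" makes Phive copy the file instead of symlinking it, and location gives it a .phar extension that PhpStorm understands.

<?xml version="1.0" encoding="UTF-8"?>
<phive xmlns="https://phar.io/phive">
    <phar name="php-cs-fixer" version="^3.49" installed="3.49.0" location="./tools/php-cs-fixer.phar" copy="true"/>
</phive>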

The speed of pipelines on CI

It is very important how quickly your pipelines run. How long may they take, and what time is considered normal?

In the literature I have come across statistics saying that 15 minutes per pipeline is acceptable. I would say that no more than a minute and a half is acceptable. And I have arguments:

Firstly, a burning prod. Business does not like such words. Our task as engineers is to calmly take all the risks into account. With automatic checks configured, it will not be possible to push a fix into production until the checks have passed; skipping them is dangerous. And the longer the pipelines run, the longer the downtime in case of a screw-up. We want to reduce such risks.

Secondly, stop tormenting developers with running checks locally! Objectively, not everyone has a very powerful computer, and it is not always possible to provide suitable hardware. Running all the checks locally can take a long time, during which the developer can do nothing. When CI checks are fast, it becomes more convenient to check the code on the server: it's quick, and having pushed the code, the developer can immediately do something else useful. That saves 10-30 minutes on each push, and over a year a substantial reserve accumulates. Plus, we gain the good habit of always pushing to the server, which keeps the code safe and promptly accessible to the whole team.

Thirdly, competition for resources. The more developers push code into the repository at the same time, the longer you wait for the runner's resources to free up for your checks. This matters during a mass push. If your runner handles 30 parallel jobs and one pipeline contains 10 of them, it is enough for three developers to push, and everyone else is waiting in line. As a result, 15 minutes per pipeline turn into 30 or 45. If your pipeline finishes in a minute, then even if it stretches to 2-3 minutes, nobody will really feel it.

Fourthly, cost reduction. The faster the checks run, the fewer resources need to be allocated to the runners, and the less developer time is lost waiting for automated checks to finish.

Optimizing container assembly

The first thing to do is optimize the container build. I won't go into detail now; let's just look at the general optimization plan.

Let's take PHP as an example.

First of all, the so-called layered build. It speeds up pipelines well thanks to layer caching.

The next stage is dividing the image into parts:

  1. Installation of the base modules (everything used in production, before the code is copied in).

  2. A dev layer: things like Xdebug and the like. A separate image based on the previous one.

  3. Installation of vendors, with and without dev dependencies.

The first two change rarely, so they are moved to a separate repository and versioned manually. The third is versioned automatically from the hash of composer.json and phive.xml, and we store these layers in the main repository in two variants: with and without dev dependencies. The version of the third layer is changed in the code and committed to the same branch automatically within the same pipeline. It may sound a little confusing, but the container build time for most merge requests is a matter of seconds.

So in most cases we use a ready-made container and waste no time building it.
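Schematically, the split looks roughly like this (a heavily simplified Dockerfile sketch; the image names and the extension-installer helper are illustrative):

# 1) base image: PHP plus all production modules, versioned manually in a separate repository
FROM registry.example.com/php-base:1.4 AS base

# 2) dev image on top of it: debugging tools such as Xdebug
FROM base AS dev
RUN install-php-extensions xdebug

# 3) vendors layer: rebuilt only when composer.json / composer.lock change
FROM base AS vendors
WORKDIR /app
COPY composer.json composer.lock ./
RUN composer install --no-dev --no-scripts --prefer-dist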

The same goes for other containers: wherever you manage modules and global settings.

Warm up the cache once for all jobs in the pipeline

All that remains is to copy the code into the container and warm up the cache. To do this, we created a separate step responsible for building the artifacts:

'Build Common Artifacts':
    stage: build-artifacts
    image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
    artifacts:
        untracked: true
        expire_in: 7 days
    script:
        - bin/decode_files "." "${IV}" "${KEY}"
        - bin/console cache:clear
    variables:
        APP_ENV: "test"

Next, in the base template for test jobs, we specify a dependency on this job:

.run_tests: &run_tests
    stage: tests
    image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
    needs:
        - job: 'Build Common Artifacts'
          artifacts: true

Marking jobs as interruptible

GitLab has a great instruction for jobs:

interruptible: true

Its essence is that if a job has not finished yet, the next push to the branch will interrupt its execution. This frees up resources and speeds up the pipelines.

We split jobs and parallelize them

  • We split the tests into presets

  • Parallelization is a great thing, especially for tests in combination with Paratest, which allows you to run individual tests in parallel processes. This further reduces the overall pipeline time (see the sketch after this list).

  • Although PHPStan works in parallel threads according to the number of available processor cores, this is sometimes not enough. We split the single PHPStan job into several, checking different rule sets, and possibly different parts of the project, in parallel. This also allows fine-tuning: in which folders the code is fully checked, and in which it is only scanned for correct analysis of the dependencies used.
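For Paratest, on CI it comes down to roughly one command (a sketch; the process count and suite name are assumptions to match your runner and phpunit.xml):

vendor/bin/paratest --processes=8 --testsuite=unit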
