How to introduce a static code analyzer in a legacy project and not demotivate the team

PVS-Studio protects the programmer’s sleep

Trying a static code analyzer is easy. But implementing it, especially in the development of a large old project, takes skill. With the wrong approach, the analyzer can add work, slow down development, and demotivate the team. Let’s talk briefly about the right way to integrate static analysis into the development process and start using it as part of CI/CD.

Introduction

Recently my attention was caught by the article “Getting Started With Static Analysis Without Overwhelming the Team”. On the one hand, it is a good article worth getting to know. On the other hand, it seems to me that it does not give a complete answer on how to painlessly introduce static analysis into a project with a lot of legacy code. The article says that you can accept the technical debt and work only with new code, but there is no answer to what to do with that technical debt afterwards.

Our PVS-Studio team offers its own view on this topic. Let’s look at how the problem of introducing a static code analyzer arises in the first place, how to overcome it, and how to gradually and painlessly eliminate the technical debt.

Issue

Running a static analyzer and seeing how it works is usually easy [1]. You may see interesting errors in the code, or even frightening potential vulnerabilities. You can even fix something, but then many programmers give up.

All static analyzers produce false positives. This is inherent to the static code analysis methodology, and nothing can be done about it. In the general case it is an unsolvable problem, as confirmed by Rice’s theorem [2]. Machine learning algorithms will not help either [3]. If even a person cannot always say whether a given piece of code is wrong, you should not expect it from a program :).

False positives are not a problem if the static analyzer is already configured:

  • Irrelevant rule sets are disabled;
  • Individual irrelevant diagnostics are disabled;
  • If we are talking about C or C++, macros containing specific constructs that trigger useless warnings at every expansion site are marked up;
  • Custom functions that act like system ones (your own analogues of memcpy or printf) are annotated [4];
  • Individual false positives are suppressed pointwise with special comments;
  • Etc.
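As a sketch of the pointwise suppression mentioned in the list above: PVS-Studio recognizes a trailing `//-Vnnn` marker comment as a request not to report diagnostic `Vnnn` on that line. The code below is an invented illustration (the structure, the function, and the chosen diagnostic number are my own examples, not taken from any real project); the marker matters only to the analyzer, not to the compiler.

```cpp
#include <cstring>

struct Point { int x; int y; };

// Hypothetical example: a deliberate byte-wise copy that an analyzer might
// flag; the trailing //-Vnnn marker tells PVS-Studio to stay quiet on this
// particular line (the diagnostic number is purely illustrative).
Point clone(const Point &src) {
    Point dst;
    std::memcpy(&dst, &src, sizeof(Point)); //-V598
    return dst;
}
```

Because the marker lives next to the code it silences, it survives refactoring better than a separate exclusion list and is visible in code review.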

In this case, we can expect a low false positive rate, on the order of 10-15% [5]. In other words, roughly 9 out of 10 analyzer warnings point to a real problem in the code, or at least a “strong code smell”. You must admit that such a scenario is extremely pleasant, and the analyzer becomes a real friend of the programmer.

Warnings

In reality, on a large project the initial picture is completely different. The analyzer generates hundreds or thousands of warnings on the legacy code. It is impossible to quickly understand which of these warnings are relevant and which are not. Sitting down to deal with all of them is irrational, since the main work would stop for days or weeks. Typically, a team cannot afford such a scenario. There would also be a huge number of diffs polluting the change history. And mass editing of so many code fragments at once inevitably introduces new typos and errors.

And most importantly, such a heroic battle against warnings makes little sense. Since the project has been working successfully for many years, most of the critical errors in it have already been fixed. Yes, those fixes were expensive: the team had to debug, receive negative user feedback about bugs, and so on. A static analyzer would have helped fix many of those errors quickly and cheaply, at the code-writing stage. But at this point they are fixed one way or another, and the analyzer mostly detects non-critical errors in old code. That code may not be used at all, or used very rarely, and an error in it may have no noticeable consequences. Perhaps somewhere a button’s shadow is the wrong color, but that does not stop anyone from using the product.

Of course, even non-serious errors are still errors. And sometimes a real vulnerability can hide behind one. However, dropping everything to spend days or weeks on defects that barely manifest themselves seems like a dubious idea.

Programmers look, and look, and look at all these warnings on old, working code… And they think: we can do without static analysis; let’s go write new useful functionality instead.

In their own way, they are right. They believe they must first somehow get rid of all these warnings, and only then can they benefit from the regular use of the code analyzer. Otherwise, new warnings will simply drown among the old ones, and nobody will pay attention to them.

The analogy with compiler warnings applies here. It is not for nothing that keeping the number of compiler warnings at 0 is recommended. If there are 1000 warnings, then when they become 1001, no one will notice, and it is unclear where to look for that newest warning.

Incorrect implementation of static analysis

The worst thing in this story is when, at this moment, someone at the top forces the team to use static code analysis. This only demotivates them, since from their point of view it is additional bureaucratic complexity that just gets in the way. No one will look at the analyzer’s reports, and all use will be only “on paper”. That is, formally the analysis is built into the DevOps process, but in practice nobody benefits from it. We have heard detailed stories like this from visitors at conference booths. Such an experience can discourage programmers from static analysis tools for a long time, if not forever.

Implementation and elimination of technical debt

In fact, there is nothing complicated or scary about introducing static analysis even into a large old project.

CI/CD

Moreover, the analyzer can be made part of the continuous development process right away. For example, the PVS-Studio distribution includes utilities for conveniently viewing the report in the format you need and for notifying the developers who wrote the problematic code. For those more interested in launching PVS-Studio from CI/CD systems, I recommend the corresponding documentation section and our series of articles on the topic.

But let’s get back to the issue of the large number of false positives in the first stages of adopting code analysis tools.

Fixing existing technical debt and working with new warnings

Modern commercial static analyzers let you study only the new warnings that appear in new or modified code. Implementations of this mechanism differ, but the essence is the same. In the PVS-Studio static analyzer, this functionality works as follows.

To quickly start using static analysis, we suggest that PVS-Studio users employ the mass warning suppression mechanism [6]. The general idea is as follows. The user runs the analyzer and gets many warnings. Since the project has been developed for many years and is alive, evolving, and making money, the report will most likely not contain many warnings pointing to critical defects. In other words, critical bugs have already been fixed one way or another, by more expensive means or thanks to customer feedback. Accordingly, everything the analyzer finds now can be considered technical debt, which it is impractical to try to eliminate immediately.

You can tell PVS-Studio to consider all these warnings irrelevant (postponing the technical debt for later), and it will no longer show them. The analyzer creates a special file where it stores information about the as-yet-uninteresting warnings, and from then on PVS-Studio issues warnings only for new or changed code. Moreover, it is all implemented smartly: if, for example, an empty line is added at the top of a source file, the analyzer understands that nothing has really changed and stays silent. This markup file can be committed to the version control system. The file is large, but that is not a problem, since it rarely needs to be updated and committed again.

Now all programmers will see warnings related only to new or changed code. Thus, you can start using the analyzer, as they say, the very next day. The technical debt can be returned to later, gradually fixing errors and tuning the analyzer.

So, the first problem with introducing the analyzer into a large old project is solved. Now let’s figure out what to do with the technical debt.

Bug Fixes and Refactoring

The simplest and most natural approach is to set aside some time to review the mass-suppressed analyzer warnings and deal with them gradually. In some places you should fix errors in the code; in others, refactor so as to tell the analyzer the code is not problematic. A simple example:

if (a = b)

Most C++ compilers and analyzers complain about such code, since there is a high probability that the programmer actually meant to write (a == b). But there is a tacit agreement, usually noted in the documentation: if there are additional parentheses, the code is assumed to have been written deliberately, and there is no need to warn. For example, the PVS-Studio documentation for diagnostic V559 (CWE-481) explicitly states that the following line is considered correct and safe:

if ((a = b))
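For a compilable illustration, here is the same idiom inside a loop condition, where assignment-in-condition is genuinely intentional. The function and its logic are invented for this sketch; only the double-parentheses idiom itself comes from the text above.

```cpp
// The extra parentheses in ((a = b)) signal to both the compiler and the
// analyzer that the assignment inside the condition is deliberate.
int halving_steps(int b) {
    int a = 0;
    int steps = 0;
    while ((a = b)) {   // intentional: loop while b is still nonzero
        b /= 2;
        ++steps;
    }
    return steps;
}
```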

Another example: is the break forgotten in this C++ code or not?

case A:
  foo();
case B:
  bar();
  break;

The PVS-Studio analyzer will issue warning V796 (CWE-484) here. Perhaps this is not a mistake, in which case you should give the analyzer a hint by adding the [[fallthrough]] attribute or, for example, __attribute__((fallthrough)):

case A:
  foo();
  [[fallthrough]];
case B:
  bar();
  break;

One might say that such code changes do not fix errors. True, but they bring two benefits. First, they rid the analyzer report of false positives. Second, the code becomes more understandable for the people maintaining it. And that is very important! For this alone it is worth carrying out minor refactoring to make the code clearer and easier to maintain. If the analyzer cannot tell whether the break is needed, it will be unclear to fellow programmers too.
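A self-contained version of the fall-through pattern may make this concrete. The enum, values, and function are invented for illustration; the point is how the C++17 [[fallthrough]] attribute documents intent for both the analyzer and future maintainers.

```cpp
enum class Event { A, B, C };

// State A deliberately falls through to B; [[fallthrough]] makes that
// intent explicit instead of leaving readers to guess.
int handle(Event e) {
    int result = 0;
    switch (e) {
    case Event::A:
        result += 1;
        [[fallthrough]];   // A also runs B's handling on purpose
    case Event::B:
        result += 10;
        break;
    case Event::C:
        result += 100;
        break;
    }
    return result;
}
```

With the attribute in place, compilers with implicit-fallthrough warnings enabled and analyzers alike stay silent on this switch.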

In addition to fixing errors and refactoring, you can pointwise suppress clearly false analyzer warnings, and some irrelevant diagnostics can be disabled entirely. For example, some consider V550 warnings about comparing float/double values pointless, while others find them important and worth studying [7]. It is up to the development team to decide which warnings are relevant and which are not.
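For teams that do decide V550-style warnings are worth acting on, the usual fix is tolerance-based comparison instead of exact `==`. A minimal sketch follows; the helper name and the default tolerance are my own choices, not anything prescribed by PVS-Studio.

```cpp
#include <cmath>
#include <algorithm>

// Compare doubles with a relative tolerance; exact == on floating-point
// values is what diagnostics about float/double comparison warn against.
bool almost_equal(double a, double b, double rel_eps = 1e-9) {
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_eps * scale;
}
```

Note that a relative tolerance breaks down near zero (the scale collapses), so code comparing against zero may need an absolute epsilon instead; which policy is right is exactly the kind of decision the team has to make.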

There are other ways to suppress false warnings; for example, macro markup was already mentioned above. All of this is described in more detail in the documentation. The most important thing is to understand that if you approach false positives gradually and systematically, there is nothing scary about them. The vast majority of uninteresting warnings disappear after configuration, leaving only the places that really require careful study and some code changes.

Also, we always help our clients set up PVS-Studio if any difficulties arise. Moreover, there have been cases when we ourselves eliminated false warnings and corrected errors [8]. Just in case, I decided to mention that such extended cooperation is also possible :).

Ratchet method

There is another interesting approach to gradually improving code quality by eliminating static analyzer warnings. The idea is that the number of warnings is only allowed to decrease.

Ratchet

The number of warnings the static analyzer generates is recorded. The quality gate is then configured so that only code that does not increase that number can be committed. As a result, a process of steadily decreasing the warning count is launched, driven by analyzer tuning and error fixing.

Even if someone wants to cheat a little and decides to pass the quality gate not by eliminating warnings in their own new code but by improving someone else’s old code, that is not a problem. The ratchet still rotates in one direction, and the number of defects will gradually decrease. Even those unwilling to fix their own new defects will still have to improve something in the neighboring code. At some point the easy ways to reduce the warning count run out, and the moment comes when real errors get fixed.
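The ratchet can be sketched as a tiny gate function. This is a hypothetical helper, not part of PVS-Studio or any particular CI system: the build fails if the warning count grows, and the stored baseline tightens whenever the count drops.

```cpp
// Returns true if the build passes the quality gate. The baseline only
// ever moves down: "the ratchet rotates in one direction".
bool ratchet_gate(long &baseline, long current_warnings) {
    if (current_warnings > baseline)
        return false;                 // new code added warnings: fail the build
    if (current_warnings < baseline)
        baseline = current_warnings;  // tighten the baseline for the next run
    return true;
}
```

In a real pipeline the baseline would live in a file under version control and the current count would come from the analyzer report; this sketch only captures the one-way invariant.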

This methodology is described in more detail in a very interesting article by Ivan Ponomarev, “Implement static analysis into the process, rather than look for bugs with it”, which I recommend to anyone interested in improving code quality.

The author of that article also has a talk on this topic: “Continuous Static Analysis”.

Conclusion

I hope that after this article readers will be friendlier toward static analysis tools and will want to introduce them into their development process. If you have any questions, we are always ready to advise users of our static analyzer PVS-Studio and help with its adoption.

There are other typical doubts about whether a static analyzer can really be convenient and useful. I tried to dispel most of them in the article “Reasons to introduce the PVS-Studio static code analyzer into the development process” [9].

Thank you for your attention and come download and try the PVS-Studio analyzer.

Sitelinks

  1. Andrey Karpov. How to quickly see interesting warnings generated by the PVS-Studio analyzer for C and C++ code?
  2. Wikipedia. Rice’s Theorem.
  3. Andrey Karpov, Victoria Khanieva. Using machine learning in static analysis of program source code.
  4. PVS-Studio. Documentation. Additional diagnostic settings.
  5. Andrey Karpov. Characteristics of the PVS-Studio analyzer using the EFL Core Libraries example, 10-15% of false positives.
  6. PVS-Studio. Documentation. Mass suppression of analyzer messages.
  7. Ivan Andryashin. About how we tested static analysis on our project of a training simulator of endovascular surgery.
  8. Pavel Eremeev, Svyatoslav Razmyslov. How PVS-Studio Team Improved Unreal Engine Code.
  9. Andrey Karpov. Reasons to introduce PVS-Studio static code analyzer into the development process.

If you want to share this article with an English-speaking audience, then please use the link to the translation: Andrey Karpov. How to introduce a static code analyzer in a legacy project and not to discourage the team.
