The code analyzer is wrong, long live the analyzer

Foo (std :: move (buffer), line_buffer - buffer.get ());

It is bad to combine many actions in one expression of the C ++ language, since such code is hard to understand, hard to maintain, and it is also easy to make a mistake in it. For example, create a bug by combining different actions when evaluating function arguments. We agree with the classic recommendation that the code should be simple and straightforward. And now let’s consider an interesting case, when formally the PVS-Studio analyzer is wrong, but from a practical point of view, the code should still be changed.

Argument Evaluation Order

What will be told now is a continuation of the old story about the order of calculation of arguments, which we wrote about in the article “The depth of a rabbit hole or an interview in C ++ at PVS-Studio”.

The brief essence is as follows. The order in which function arguments are evaluated is unspecified behavior. The standard does not specify the exact order in which compiler developers must evaluate the arguments. For example, left to right (Clang) or right to left (GCC, MSVC). Prior to the C ++ 17 standard, when side effects occurred when evaluating arguments, this could lead to undefined behavior.

With the advent of the C ++ 17 standard, the situation has changed for the better: now the calculation of the argument and its side effects will begin to be performed only from the moment all the calculations and side effects of the previous argument are performed. However, this does not mean that now there is no room for error.

Consider a simple test program:

#include <cstdio>
int main()
{
  int i = 1;
  printf("%d, %dn", i, i++);
  return 0;
}

What will this code print? The answer still depends on the compiler, version, and mood. Depending on the compiler, both “1, 1” and “2, 1” can be printed. Indeed, using the Compiler Explorer, I get the following results:

  • program compiled with Clang 11.0.0, produces “1, 1”.
  • program compiled with GCC 10.2, produces “2, 1”.

There is no undefined behavior in this program, but there is an unspecified behavior (order of evaluation of arguments).

Code from CSV Parser project

Let’s go back to the code snippet from the CSV Parser project, which I mentioned in the article “Checking the collection of header-only C ++ libraries (awesome-hpp)”.

The parser and I know that arguments can be evaluated in a different order. Therefore, the analyzer, and after it myself, considered this code to be erroneous:

std::unique_ptr<char[]> buffer(new char[BUFFER_UPPER_LIMIT]);
....
this->feed_state->feed_buffer.push_back(
    std::make_pair<>(std::move(buffer), line_buffer - buffer.get()));

PVS-Studio warning: V769 The ‘buffer.get ()’ pointer in the ‘line_buffer – buffer.get ()’ expression equals nullptr. The resulting value is senseless and it should not be used. csv.hpp 4957

In fact, we are both wrong and there is no mistake. We will talk about the nuances further, but for now let’s start with a simple one.

So, let’s see why it is dangerous to write code like this:

Foo(std::move(buffer), line_buffer - buffer.get());

I think you can guess the answer. The result depends on the order in which the arguments are evaluated. Consider this in the following synthetic code:

#include <iostream>
#include <memory>   

void Print(std::unique_ptr<char[]> p, ptrdiff_t diff)
{
    std::cout << diff << std::endl;
} 

void Print2(ptrdiff_t diff, std::unique_ptr<char[]> p)
{
    std::cout << diff << std::endl;
} 

int main()
{
    {
        std::unique_ptr<char[]> buffer(new char[100]);
        char *ptr = buffer.get() + 22;
        Print(std::move(buffer), ptr - buffer.get());
    }
    {
        std::unique_ptr<char[]> buffer(new char[100]);
        char *ptr = buffer.get() + 22;
        Print2(ptr - buffer.get(), std::move(buffer));
    }
    return 0;
}

Let’s use the Compiler Explorer again and see the output of this program compiled by different compilers.

Clang compiler 11.0.0. Result:

23387846
22

The GCC Compiler 10.2. Result:

22
26640070

We expect the result, and you can’t write like that. This is what the PVS-Studio analyzer is actually warning about.

I would like to put an end to this, but everything is a little more complicated. The fact is that we are talking about passing arguments by value, and when instantiating a function template std :: make_pair everything will be different. Let’s continue to dive into the nuances and find out why PVS-Studio is wrong in this case.

std :: make_pair

Check out the cppreference site and see how the function template has changed. std :: make_pair

Until C ++ 11:

template
std :: pair make_pair (T1 t, T2 u);

Since C ++ 11, until C ++ 14:

template
std :: pair make_pair (T1 && t, T2 && u);

Since C ++ 14:

template
constexpr std :: pair make_pair (T1 && t, T2 && u);

As you can see, once upon a time std :: make_pair took arguments by value. If at that time there were std :: unique_ptr, the above code was indeed incorrect. Whether this code would work or not was a matter of luck. In practice, of course, such a situation would never arise, since std :: unique_ptr appeared in C ++ 11 as a replacement std :: auto_ptr

Let’s go back to our time. Starting with the version of the C ++ 11 standard, the constructor started using move semantics.

There is a subtle point here that std :: move does not actually move anything, it just converts the object to rvalue-link. This will allow std :: make_pair pass pointer to new std :: unique_ptrleaving nullptr in the original smart pointer. But this pointer transfer won’t happen until we get inside. std :: make_pair… By that time, we have already calculated line_buffer – buffer.get (), and all will be well. That is, the function call buffer.get () cannot return nullptr at the moment when it is calculated, regardless of when exactly it happens.

Sorry for the complicated description. The bottom line is that this code is perfectly valid. And in fact, the PVS-Studio static analyzer gave a false alarm in this case. However, our team is not sure that we should rush to make changes to the analyzer’s logic for such situations.

The king is dead, long live the king!

We figured out that the triggering described in articleturned out to be false. Thanks to one of our readers who drew our attention to the implementation feature std :: make_pair

However, this is the case when we are not sure if it is worth improving the analyzer’s behavior. The point is that this code is too convoluted. You must admit that what the code we have analyzed does not deserve such a detailed investigation, which dragged on a whole article. If this code requires so much attention, then it is very bad code.

It is appropriate here to recall the article “False positives are our enemies, but may still be your friends“The publication is not ours, but we agree with it.

This is perhaps the very case. The warning may be false, but it points to a better place to refactor. It is enough to write something like this:

auto delta = line_buffer - buffer.get();
this->feed_state->feed_buffer.push_back(
  std::make_pair(std::move(buffer), delta));

Or, in this situation, you can make the code even better by using the method emplace_back:

auto delta = line_buffer - buffer.get();
this->feed_state->feed_buffer.emplace_back(std::move(buffer), delta);

This code will create the final object std :: pair in the container “in place”, bypassing the creation of a temporary object and its transfer to the container. By the way, the PVS-Studio analyzer suggests making such a replacement by issuing a warning V823 from a set of rules for code micro-optimizations.

The code will become unambiguously simpler and clearer to any reader and analyzer. There is no merit in cramming as many actions as possible into one line of code.

Yes, in this case it was lucky and there is no error. But it is unlikely that the author, when writing this code, kept everything that we discussed in his head. Most likely, it was luck that played. And another time you may not be lucky.

Conclusion

So, we figured out that there is no real error. The analyzer generates a false positive. Perhaps we will remove the warning just for such cases, but perhaps not. We’ll think about it later. After all, this is a rather rare case, and code where arguments are evaluated with side effects is dangerous in general, and it is better to avoid it. It is worth refactoring at least for preventive purposes.

View code:

Foo(std::move(buffer), line_buffer - buffer.get());

easy to break by changing something elsewhere in the program. Code like this is hard to maintain. And it is also unpleasant in that it can give a false feeling that everything is working correctly. In fact, this is just a coincidence, and everything can break if you change the compiler or optimization settings.

Make your code easier!

If you want to share this article with an English-speaking audience, please use the translation link: Andrey Karpov. The Code Analyzer is wrong. Long live the Analyzer !.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *