Checking Clang 11 with PVS-Studio

PVS-Studio: Still worthy!

From time to time we have to write articles about checking the next version of a compiler. It is not interesting. However, as practice shows, if this is not done for a long time, people begin to doubt whether the PVS-Studio analyzer deserves the title of a good catcher of bugs and potential vulnerabilities. Perhaps the new compiler already knows how to do this? Yes, compilers do not stand still. However, PVS-Studio is also developing, again and again demonstrating the ability to find errors even in the code of such high-quality projects as compilers.

Time to double-check the Clang code

Honestly, I took the article as a basis for this text “Checking the GCC 10 compiler using PVS-Studio“. So if it seems to you that you’ve already read some paragraphs somewhere, then you don’t think :).

It’s no secret that compilers have their own built-in static code analyzers, and they are also evolving. Therefore, from time to time we write articles about how the PVS-Studio static analyzer can find errors even inside compilers and that we are eating bread for a reason :).

In fact, you cannot compare classical static analyzers with compilers. Static analyzers are not only about finding errors in the code, but also a developed infrastructure. For example, this is integration with systems such as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI, GitLab CI / CD, Jenkins, Visual Studio. These are advanced mechanisms for mass suppression of warnings, which allows you to quickly start using PVS-Studio even in a large old project. This is a notification distribution. And so on and so forth. However, it is still the first question that is asked: “Can PVS-Studio find something that compilers cannot find?” This means that we will again and again write articles about checking these compilers themselves.

Let’s get back to the topic of checking the Clang project. There is no need to dwell on this project and tell what it is. In fact, not only the code of Clang 11 itself was tested, but also the LLVM 11 library on which it is built. From the point of view of this article, it makes no difference whether a defect is found in the compiler or library code.

The Clang / LLVM code seemed to me much clearer than the GCC code. At least all these terrible macros are missing, and modern features of the C ++ language are actively used.

Despite this, the project is large, and without preliminary settings of the analyzer it is still very tedious to view the report. Basically, half-false positives interfere. By semi-false positives, I mean situations when the analyzer is formally right, but there is no sense in warning. For example, a lot of such positives are issued for unit tests and generated code.

Example for tests:

Spaces.SpacesInParentheses = false;               // <=
Spaces.SpacesInCStyleCastParentheses = true;      // <=
verifyFormat("Type *A = ( Type * )P;", Spaces);
verifyFormat("Type *A = ( vector<Type *, int *> )P;", Spaces);
verifyFormat("x = ( int32 )y;", Spaces);
verifyFormat("int a = ( int )(2.0f);", Spaces);
verifyFormat("#define AA(X) sizeof((( X * )NULL)->a)", Spaces);
verifyFormat("my_int a = ( my_int )sizeof(int);", Spaces);
verifyFormat("#define x (( int )-1)", Spaces);

// Run the first set of tests again with:
Spaces.SpacesInParentheses = false;               // <=
Spaces.SpaceInEmptyParentheses = true;
Spaces.SpacesInCStyleCastParentheses = true;      // <=
verifyFormat("call(x, y, z);", Spaces);
verifyFormat("call( );", Spaces);

The analyzer warns that the variables are assigned the same values ​​that they already contain:

  • V1048 The ‘Spaces.SpacesInParentheses’ variable was assigned the same value. FormatTest.cpp 11554
  • V1048 The ‘Spaces.SpacesInCStyleCastParentheses’ variable was assigned the same value. FormatTest.cpp 11556

Formally, the analyzer returned the correct response, and this is the code fragment that should be simplified or corrected. At the same time, it is clear that, in fact, everything is fine and that there is no point in editing something either.

Another example: the analyzer issues a huge number of warnings to the auto-generated Options.inc file. You can see the “code sheet” in the file:

The code

And PVS-Studio sends warnings to all this:

  • V501 There are identical sub-expressions to the left and to the right of the ‘==’ operator: nullptr == nullptr Options.inc 26
  • V501 There are identical sub-expressions to the left and to the right of the ‘==’ operator: nullptr == nullptr Options.inc 27
  • V501 There are identical sub-expressions to the left and to the right of the ‘==’ operator: nullptr == nullptr Options.inc 28
  • and so on for each line …

All this is not scary. All this can win: disable checking for unnecessary files, mark up some macros and functions, suppress certain types of alarms, and so on. It is possible, but doing this as part of the task of writing an article is not interesting. Therefore, I did exactly the same as in the article about the GCC compiler. I studied the report until I had 11 interesting code examples to write an article. Why 11? I thought that since the Clang version is 11, then let the fragments be 11 :).

11 suspicious code snippets

Fragment N1, modulo division by one

% 1 - modulo division by 1

Cool mistake! I love these!

void Act() override {
  ....
  // If the value type is a vector, and we allow vector select, then in 50%
  // of the cases generate a vector select.
  if (isa<FixedVectorType>(Val0->getType()) && (getRandom() % 1)) {
    unsigned NumElem =
        cast<FixedVectorType>(Val0->getType())->getNumElements();
    CondTy = FixedVectorType::get(CondTy, NumElem);
  }
  ....
}

PVS-Studio warning: V1063 The modulo by 1 operation is meaningless. The result will always be zero. llvm-stress.cpp 631

In order to get a random value of 0 or 1, modulo division is used. But apparently this value 1 confusing, and people make the classic pattern of error, dividing by one, although it is necessary to divide by two. Operation X% 1 does not make sense, since the result always gives 0… Correct code variant:

if (isa<FixedVectorType>(Val0->getType()) && (getRandom() % 2)) {

Diagnostics V1063, which recently appeared in PVS-Studio, is outrageously simple, but, as you can see, it works.

As we know, compiler developers look at what we are doing and borrow our best practices. There is nothing wrong with that. We are pleased that PVS-Studio is the engine of progress… We set the time after how much the same diagnostics will appear in Clang and GCC :).

Fragment N2, typo in the condition

class ReturnValueSlot {
  ....
  bool isNull() const { return !Addr.isValid(); }
  ....
};

static bool haveSameParameterTypes(ASTContext &Context, const FunctionDecl *F1,
                                   const FunctionDecl *F2, unsigned NumParams) {
  ....
  unsigned I1 = 0, I2 = 0;
  for (unsigned I = 0; I != NumParams; ++I) {
    QualType T1 = NextParam(F1, I1, I == 0);
    QualType T2 = NextParam(F2, I2, I == 0);
    if (!T1.isNull() && !T1.isNull() && !Context.hasSameUnqualifiedType(T1, T2))
      return false;
  }
  return true;
}

PVS-Studio warning: V501 There are identical sub-expressions to the left and to the right of the ‘&&’ operator:! T1.isNull () &&! T1.isNull () SemaOverload.cpp 9493

Check twice ! T1.isNull ()… This is an obvious typo, and the second part of the condition should check the variable T2

Fragment N3, potential out of bounds of the array

std::vector<Decl *> DeclsLoaded;

SourceLocation ASTReader::getSourceLocationForDeclID(GlobalDeclID ID) {
  ....
  unsigned Index = ID - NUM_PREDEF_DECL_IDS;

  if (Index > DeclsLoaded.size()) {
    Error("declaration ID out-of-range for AST file");
    return SourceLocation();
  }

  if (Decl *D = DeclsLoaded[Index])
    return D->getLocation();
  ....
}

PVS-Studio warning: V557 Array overrun is possible. The ‘Index’ index is pointing beyond array bound. ASTReader.cpp 7318

Suppose the array contains one element, and the value of the variable Index is also equal to one. Condition (1> 1) is false, and the result will be an array overrun. Correct check:

if (Index >= DeclsLoaded.size()) {

Fragment N4, the order of evaluation of arguments

void IHexELFBuilder::addDataSections() {
  ....
  uint32_t SecNo = 1;
  ....
  Section = &Obj->addSection<OwnedDataSection>(
      ".sec" + std::to_string(SecNo++), RecAddr,
      ELF::SHF_ALLOC | ELF::SHF_WRITE, SecNo);
  ....
}

PVS-Studio warning: V567 Unspecified behavior. The order of argument evaluation is not defined for ‘addSection’ function. Consider inspecting the ‘SecNo’ variable. Object.cpp 1223

Note that the argument SecNo used twice, and incremented along the way. It’s just impossible to say in what order the arguments will be evaluated. Therefore, the result will vary depending on the compiler version or compilation settings.

Let me explain this with a synthetic example:

#include <cstdio>
int main()
{
  int i = 1;
  printf("%d, %dn", i, i++);
  return 0;
}

Depending on the compiler, both “1, 2” and “2, 1” can be printed. Using the Compiler Explorer, I get the following results:

  • a program compiled with Clang 11.0.0, issues: eleven.
  • a program compiled with GCC 10.2, issues: 2, 1.

Interestingly, for this simple case, the Clang compiler issues a warning:

<source>:6:26: warning:
unsequenced modification and access to 'i' [-Wunsequenced]
printf("%d, %dn", i, i++);

Apparently, in real conditions, this warning did not help. Diagnostics is either disabled, as it is inconvenient for practical use, or the compiler was unable to warn about a more complex case.

Fragment N5, strange retest

template <class ELFT>
void GNUStyle<ELFT>::printVersionSymbolSection(const ELFFile<ELFT> *Obj,
                                               const Elf_Shdr *Sec) {

  ....
  Expected<StringRef> NameOrErr =
      this->dumper()->getSymbolVersionByIndex(Ndx, IsDefault);
  if (!NameOrErr) {
    if (!NameOrErr) {
      unsigned SecNdx = Sec - &cantFail(Obj->sections()).front();
      this->reportUniqueWarning(createError(
          "unable to get a version for entry " + Twine(I) +
          " of SHT_GNU_versym section with index " + Twine(SecNdx) + ": " +
          toString(NameOrErr.takeError())));
    }
    Versions.emplace_back("<corrupt>");
    continue;
  }
  ....
}

PVS-Studio warning: V571 Recurring check. The ‘if (! NameOrErr)’ condition was already verified in line 4666. ELFDumper.cpp 4667

The second check duplicates the first and is redundant. Perhaps the second check can simply be removed. But this is most likely a typo, and a different variable should be used in the second condition.

Snippet N6, dereferencing a potentially null pointer

void RewriteObjCFragileABI::RewriteObjCClassMetaData(
  ObjCImplementationDecl *IDecl, std::string &Result)
{
  ObjCInterfaceDecl *CDecl = IDecl->getClassInterface();

  if (CDecl->isImplicitInterfaceDecl()) {
    RewriteObjCInternalStruct(CDecl, Result);
  }

  unsigned NumIvars = !IDecl->ivar_empty()
  ? IDecl->ivar_size()
  : (CDecl ? CDecl->ivar_size() : 0);
  ....
}

PVS-Studio warning: V595 The ‘CDecl’ pointer was utilized before it was verified against nullptr. Check lines: 5275, 5284. RewriteObjC.cpp 5275

During the first check, the pointer CDecl always boldly dereferenced:

if (CDecl->isImplicitInterfaceDecl())

And only from the code written below it becomes clear that this pointer can be null:

(CDecl ? CDecl->ivar_size() : 0)

Most likely, the first check should look like this:

if (CDecl && CDecl->isImplicitInterfaceDecl())

Snippet N7, dereferencing a potentially null pointer

bool
Sema::InstantiateClass(....)
{
  ....
  NamedDecl *ND = dyn_cast<NamedDecl>(I->NewDecl);
  CXXRecordDecl *ThisContext =
      dyn_cast_or_null<CXXRecordDecl>(ND->getDeclContext());
  CXXThisScopeRAII ThisScope(*this, ThisContext, Qualifiers(),
                              ND && ND->isCXXInstanceMember());
  ....
}

PVS-Studio warning: V595 The ‘ND’ pointer was utilized before it was verified against nullptr. Check lines: 2803, 2805. SemaTemplateInstantiate.cpp 2803

A variation of the previous error. It is dangerous to dereference a pointer without first checking it if its value is obtained using a dynamic cast. Moreover, from the code below, you can see that such a check is needed.

Fragment N8, the function continues execution despite an error state

bool VerifyObject(llvm::yaml::Node &N,
                  std::map<std::string, std::string> Expected) {
  ....
  auto *V = llvm::dyn_cast_or_null<llvm::yaml::ScalarNode>(Prop.getValue());
  if (!V) {
    ADD_FAILURE() << KS << " is not a string";
    Match = false;
  }
  std::string VS = V->getValue(Tmp).str();
  ....
}

PVS-Studio warning: V1004 The ‘V’ pointer was used unsafely after it was verified against nullptr. Check lines: 61, 65. TraceTests.cpp 65

Pointer V may be zero. This is clearly an erroneous condition, which is even reported. However, after that, the function continues execution as if nothing had happened, which will lead to dereferencing of this very null pointer. Perhaps they forgot to interrupt the function, and the correct option should look like this:

auto *V = llvm::dyn_cast_or_null<llvm::yaml::ScalarNode>(Prop.getValue());
if (!V) {
  ADD_FAILURE() << KS << " is not a string";
  Match = false;
  return false;
}
std::string VS = V->getValue(Tmp).str();

Snippet N9, typo

const char *tools::SplitDebugName(const ArgList &Args, const InputInfo &Input,
                                  const InputInfo &Output) {
  if (Arg *A = Args.getLastArg(options::OPT_gsplit_dwarf_EQ))
    if (StringRef(A->getValue()) == "single")
      return Args.MakeArgString(Output.getFilename());

  Arg *FinalOutput = Args.getLastArg(options::OPT_o);
  if (FinalOutput && Args.hasArg(options::OPT_c)) {
    SmallString<128> T(FinalOutput->getValue());
    llvm::sys::path::replace_extension(T, "dwo");
    return Args.MakeArgString(T);
  } else {
    // Use the compilation dir.
    SmallString<128> T(
        Args.getLastArgValue(options::OPT_fdebug_compilation_dir));
    SmallString<128> F(llvm::sys::path::stem(Input.getBaseInput()));
    llvm::sys::path::replace_extension(F, "dwo");
    T += F;
    return Args.MakeArgString(F);       // <=
  }
}

PVS-Studio warning: V1001 The ‘T’ variable is assigned but is not used by the end of the function. CommonArgs.cpp 873

Notice the end of the function. Local variable T changes but is not used in any way. Most likely, this is a typo and the function should end with the following lines of code:

T += F;
return Args.MakeArgString(T);

Fragment N10, divisor is zero

typedef int32_t si_int;
typedef uint32_t su_int;

typedef union {
  du_int all;
  struct {
#if _YUGA_LITTLE_ENDIAN
    su_int low;
    su_int high;
#else
    su_int high;
    su_int low;
#endif // _YUGA_LITTLE_ENDIAN
  } s;
} udwords;

COMPILER_RT_ABI du_int __udivmoddi4(du_int a, du_int b, du_int *rem) {
  ....
  if (d.s.low == 0) {
    if (d.s.high == 0) {
      // K X
      // ---
      // 0 0
      if (rem)
        *rem = n.s.high % d.s.low;
      return n.s.high / d.s.low;
    }
  ....
}

PVS-Studio warnings:

  • V609 Mod by zero. Denominator ‘dslow’ == 0.udivmoddi4.c 61
  • V609 Divide by zero. Denominator ‘dslow’ == 0.udivmoddi4.c 62

I don’t know if this is a mistake or some clever idea, but the code is very strange. There are two ordinary integer variables, and one is divisible by the other. Interestingly, this only happens if both variables are zero. What does all this mean?

Fragment N11, Copy-Paste

bool MallocChecker::mayFreeAnyEscapedMemoryOrIsModeledExplicitly(....)
{
  ....
  StringRef FName = II->getName();
  ....
  if (FName == "postEvent" &&
      FD->getQualifiedNameAsString() == "QCoreApplication::postEvent") {
    return true;
  }

  if (FName == "postEvent" &&
      FD->getQualifiedNameAsString() == "QCoreApplication::postEvent") {
    return true;
  }
  ....
}

PVS-Studio warning: V581 The conditional expressions of the ‘if’ statements situated alongside each other are identical. Check lines: 3108, 3113. MallocChecker.cpp 3113

The code snippet was copied but not changed in any way. The second snippet should either be removed or modified to begin performing some useful check.

Conclusion

As a reminder, to check open source projects, you can use here is this free license option… By the way, there are other options for free licensing PVS-Studio, including even for closed projects. They are listed here: “Free licensing options for PVS-Studio“. Thank you for attention.

Our other articles about compiler checking

If you want to share this article with an English-speaking audience, please use the translation link: Andrey Karpov. Checking Clang 11 with PVS-Studio.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *