GPT-3 Found 213 Security Vulnerabilities… Or It Didn't

This text is an extended commentary on the article “I found 213 security vulnerabilities in the codebase using GPT-3”.

To understand what is being discussed, please first read Chris Koch's article “I found 213 security vulnerabilities in the codebase using GPT-3”. I wrote a long comment on it, then wanted to write another one, so I decided it would be better to collect all my thoughts in this separate post.

I do not share the author's enthusiasm. Our own experiments showed much more modest and ambiguous results: see “Is ChatGPT good at finding bugs in code?”

It seems to me that GPT-3 fascinated the author so much that he credits it with correct answers even where there are none. This may explain why the article claims there are almost no false positives. If you do not want to notice them, they will not be there 🙂

Why am I skeptical? The author most likely shows the best and most impressive examples of GPT-3's output. You will agree that he is unlikely to have picked weak ones :). Yet even these hand-picked success stories contain false positives that the author did not notice.

Let’s take the first example.

int main(int argc, char **argv) {
    printf(argv[1]);  // user-controlled string is used as the format
    return 0;
}

In general, I agree with the second warning:

Format string vulnerability: The program does not check the format of the user input, which could lead to a format string attack.

Even here, though, the wording can be criticized. Validating the input is not required; alternatively, you can simply call printf differently. The generated warning is clearly inferior to the documentation of classic static analyzers, for example V618. Well, okay, let's look at the first warning, which is more interesting.
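A short aside on "calling printf differently" before moving on. The sketch below is only an illustration: format_user_text is a hypothetical helper name, not something from the original article. It passes the untrusted text as an argument to "%s" instead of using it as the format string.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original article): copies untrusted
   text into out verbatim. The text is an argument to "%s", never the
   format itself, so sequences such as "%x" or "%n" are not interpreted. */
int format_user_text(char *out, size_t out_size, const char *user_text) {
    return snprintf(out, out_size, "%s", user_text);
}
```

The same idea applies to direct output: printf("%s", argv[1]) or fputs(argv[1], stdout) prints the string without interpreting it.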

Unvalidated user input: The program does not check the length of the user input, which could lead to a buffer overflow attack.

In my opinion, this is a false positive. The real defect is that the number of arguments (the argc variable) is never checked, so reading argv[1] can go beyond the valid arguments: with no arguments it is a null pointer, and with argc equal to 0 it is outside the array. GPT-3, instead, starts philosophizing about buffer overflows. You can, of course, say that this is the same thing… But then you might as well just say “there is a mistake here”: if there is one, you got lucky, and if not, well, sorry :). When do programmers talk about buffer overflows? When they mean working with null-terminated strings or incorrect use of functions such as strcat, memcpy, and so on.
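To make the difference concrete, here is a minimal sketch of the check that is actually missing. The helper name first_argument is an assumption made for illustration; the original code contains nothing like it.

```c
#include <stddef.h>

/* Hypothetical helper: returns argv[1] only when the caller actually
   supplied it, otherwise NULL. */
const char *first_argument(int argc, char **argv) {
    if (argc < 2 || argv == NULL) {
        return NULL;  /* no first argument: do not touch argv[1] */
    }
    return argv[1];
}
```

Note that nothing here concerns the length of the input, which is what the GPT-3 warning talks about.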

Okay, maybe that was an unconvincing false positive. Let’s now look at the code from the third example and the warning:

fp = fopen(filename,"r"); 
if(fp == NULL)
{
  printf("\nCan't open file or file doesn't exist.");
  exit(0);
}

Unchecked return value: The return value of the fopen() function is not checked, which could lead to a null pointer dereference.

The first version of the article said that GPT-3 was right here. A postscript was added later saying that this is a false positive: the return value is checked on the very next line. Thanks to the attentive readers who pointed out the inaccuracy. In general, take a closer look, and all the magic of GPT-3's useful messages disappears. Now I'll spoil the impression even more :).

In the same third example:

char OOBR_stack = buff3[size3+100];
char OOBR_heap = buff4[100];

Uninitialized memory access: The OOBR_stack and OOBR_heap variables are accessed without being initialized, which could lead to undefined behavior.

Complete nonsense. Both variables are initialized right here, so they cannot be called uninitialized in any way. The actual problem is that the initializing expressions read outside the array bounds, but that is a completely different error, and GPT-3 said nothing about it. GPT-3 is also wrong about the uninitialized variables OOBR_stack and OOBR_heap being accessed: they are not used at all.

All in all, this warning looks smart and helpful at first glance, but in essence it completely misses the point. Most likely, the same is true of the other warnings that are not shown in the article.

By the way, the same example contains at least two more errors that GPT-3 is silent about.
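For comparison, a warning about the real defect would point at the index, not at initialization. A minimal sketch of a bounds-checked read follows, assuming a hypothetical helper read_at and a caller that knows the buffer size; neither is part of the original example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reads buf[index] only if index is inside the
   buffer of the given size. Returns false and leaves *out untouched
   when the index is out of bounds. */
bool read_at(const char *buf, size_t size, size_t index, char *out) {
    if (buf == NULL || out == NULL || index >= size) {
        return false;
    }
    *out = buf[index];
    return true;
}
```

With such a check, a read like buff3[size3+100] would be rejected instead of touching memory past the end of the array.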

free(buff1);          // <=
if (size1/2==0){
  free(buff1);        // <=
}
else{
  if(size1 == 123456){
    buff1[0]='a';     // <=
  }
}

First, the memory may be freed a second time. Second, the code may write to a buffer that has already been freed. In general, the more closely you look at the code, the more flaws you find in GPT-3's warnings.

P.S. It is too pretentious to call everything a vulnerability. What the article covers are just errors. Perhaps some of them are potential vulnerabilities, but no more than that. Only when a defect can actually be exploited is it a vulnerability. Otherwise, it is just a bug, and any application has thousands of them :). I know for sure that such bugs are everywhere: with the help of PVS-Studio, we have found more than 15,000 bugs in open-source projects. But we are more modest and in no hurry to call them vulnerabilities.
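One common way to defuse both defects is to clear the pointer when freeing it. This is only a sketch of that general technique, not a fix taken from the original article, and release_buffer is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: frees *p and clears it, so a repeated call
   becomes a harmless free(NULL), and later code can check the pointer
   for NULL before writing through it. */
void release_buffer(char **p) {
    if (p != NULL) {
        free(*p);
        *p = NULL;
    }
}
```

After release_buffer(&buff1), a second release is a no-op, and a guard such as if (buff1 != NULL) in front of buff1[0]='a' prevents the write to freed memory.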

If you want to share this article with an English-speaking audience, please use the translation link: Andrey Karpov. GPT-3 detected 213 Security Vulnerabilities… Or it did not.
