4 Common Pitfalls in Working with Strings in C and C++ and How to Avoid Them

C and C++ programs typically work with two kinds of strings:

  • Arrays of characters terminated by a null character ('\0', also called NUL). These are called C-style strings.

  • The std::basic_string class template provided by the C++ standard library. For example, the widely used std::string is an alias of std::basic_string<char>, and std::wstring is an alias of std::basic_string<wchar_t>.

Of course, over the long history of C++, many custom string classes have appeared. They were written by large and small projects. Most often, this was due to the imperfection of the string class in the standard library. Here are a few examples that are still alive today:

However, with the new C++ standards, the std::basic_string class has become much nicer, and now we see a reversal of the trend: projects are moving back to the standard library string class. You can read more about this here: C++ Russia 2017: Anton Polukhin, How not to do it: C++ bicycle building for professionals.

A bit of history

First, I would like to consider a famous vulnerability of a closely related kind, Heartbleed, in which a length supplied by the sender was trusted without being checked. It was disclosed in April 2014 in the OpenSSL cryptographic library. Heartbleed allowed attackers to read fragments of server memory, including sensitive data such as passwords and private keys. This was done by manipulating Heartbeat Extension requests in the TLS protocol.

The attack algorithm looked like this:

  1. Sending a Heartbeat Request: When a client and server communicate using the TLS protocol, the client can send a Heartbeat Extension request. The request contains a payload and the length of that payload, and the server is expected to echo the same payload back.

  2. Manipulating the request length: An attacker could intentionally set the length field to a value larger than the payload actually sent. For example, a client could claim a payload of 500 bytes while sending only 10 bytes.

  3. The server returns data from memory: When the server received such a request, it attempted to send back to the client the specified number of bytes of data from its memory. However, due to an error in checking the length of the data, the server could accidentally send not only the requested data, but also additional fragments of memory that were beyond the expected data.

  4. Extracting sensitive data: An attacker could send such requests multiple times, each time receiving different pieces of random data from the server's memory. This data could include passwords, private keys, session information, and other sensitive information.
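
The core of the bug in step 3 can be sketched as a missing comparison between the claimed length and the number of bytes actually received. The following is a simplified illustration, not OpenSSL's actual code: the two-byte length prefix layout and the name make_heartbeat_reply are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified request layout (not the real TLS record format):
//   bytes 0..1  payload length claimed by the sender, big-endian
//   bytes 2..   payload bytes actually sent
// Fills `reply` with the echoed payload and returns true, or returns false
// if the request is malformed.
bool make_heartbeat_reply(const std::uint8_t* request, std::size_t request_size,
                          std::vector<std::uint8_t>& reply)
{
    assert(request != nullptr);
    if (request_size < 2) {
        return false;  // too short to contain the length field
    }
    const std::size_t claimed = (static_cast<std::size_t>(request[0]) << 8) | request[1];
    const std::size_t received = request_size - 2;

    // The essential check: never echo more bytes than were actually received.
    // Without it, copying `claimed` bytes would read past the end of the request.
    if (claimed > received) {
        return false;
    }
    const std::uint8_t* payload = request + 2;
    reply.assign(payload, payload + claimed);
    return true;
}
```

With this check in place, a request that claims 500 bytes but carries only 10 is rejected instead of being answered with whatever happens to lie next to it in memory.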

Heartbleed was discovered independently by Google's security team and by researchers at Codenomicon. They found that OpenSSL's implementation of the TLS Heartbeat Extension did not check the claimed payload length against the data actually received, which allowed clients to obtain data from the server's memory, including sensitive information.

The researchers alerted OpenSSL and provided details of the vulnerability, after which a patch was released to fix it. The vulnerability was widely covered in the media and the security community and became one of the best-known and most serious vulnerabilities in history. The discovery of Heartbleed triggered a large-scale update campaign for servers and systems using OpenSSL. The vulnerability description itself with the code can be viewed here.

Now let's turn to practical examples that will demonstrate common vulnerabilities associated with the use of strings in C and C++.

Pitfall 1: Buffer Overflow

This vulnerability occurs when a string is copied or written to a fixed-size buffer without proper checking of the string's length. If the string exceeds the buffer's size, it can overwrite adjacent memory areas, including important data or function return addresses.

Let's write an example like this:

#include <cstring>

void process_string(const char* source)
{
    char buffer[10];
    strcpy(buffer, source);
}

int main()
{
    const char* input = "This is a long string that exceeds the buffer size.";
    process_string(input);
    return 0;
}

The input string is longer than buffer. When strcpy is executed, it copies the entire input string into buffer and writes past its end. The consequences can be severe, from program crashes to the execution of malicious code.

A commonly recommended fix is to use length-limited string functions, such as strncpy. Let's try rewriting this fragment:

#include <iostream>
#include <cstring>

void process_string(const char* input)
{

    char buffer[10];
    strncpy(buffer, input, sizeof(buffer));
    std::cout << "Processed string: " << buffer << std::endl;
}

int main()
{
    const char* input = "This is a long string that exceeds the buffer size.";
    process_string(input);
    return 0;
}

The process_string function uses strncpy to copy the input string into a 10-character buffer. However, strncpy does not null-terminate the buffer if the source string is as long as or longer than the buffer (there are custom functions which provide such a guarantee). This can also lead to unexpected behavior and unpredictable results.

In this case, if you pass a long string to the process_string function, strncpy will only copy part of the string to the buffer, but will not append a null character to the end. This may result in incorrect operation of functions that expect null-terminated strings.
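
A function with the termination guarantee mentioned above might look like the following minimal sketch. The name copy_truncated and its signature are assumptions made for this example, not part of any standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies at most dst_size - 1 characters from src and
// always writes a null terminator. Returns true if the whole source fit,
// false if it had to be truncated.
bool copy_truncated(char* dst, std::size_t dst_size, const char* src)
{
    assert(dst != nullptr && src != nullptr && dst_size > 0);
    const std::size_t src_len = std::strlen(src);
    const std::size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';  // terminator is written in every case
    return n == src_len;
}
```

Returning whether truncation happened lets the caller decide if a shortened string is acceptable or should be treated as an error.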

An option to fix this is to use std::string as a safer alternative to C-Style strings. Let's try to combine both options (both std::string and strncpy):

#include <iostream>
#include <string>
#include <cstring>

void process_string(const std::string& input)
{
    char buffer[10];
    std::string str = input.substr(0, 10);
    strncpy(buffer, str.c_str(), sizeof(buffer));
    std::cout << "Processed string: " << buffer << std::endl;
}

int main()
{
    std::string input = "This is a long string that exceeds the buffer size.";
    process_string(input);
    return 0;
}

This version is still not correct. substr(0, 10) returns up to 10 characters, and strncpy with sizeof(buffer) equal to 10 copies all of them, leaving no room for the null terminator. When input has 10 or more characters, the buffer is again not null-terminated, and printing it reads past its end. (When input is shorter, strncpy pads the rest of the buffer with null characters, so that case works.)

A more reliable fix is to use the copy method of the std::string class, which copies at most the specified number of characters into the buffer and returns how many it copied. We also increase the buffer size by 1 and write a null terminator right after the copied characters:

#include <iostream>
#include <string>

void process_string(const std::string& input)
{
    char buffer[11];
    size_t copied = input.copy(buffer, sizeof(buffer) - 1);
    buffer[copied] = '\0';
    std::cout << "Processed string: " << buffer << std::endl;
}

int main()
{
    std::string input = "This is a long string that exceeds the buffer size.";
    process_string(input);
    return 0;
}
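
If the fixed-size char array is not required by some C API, the truncation can stay entirely inside std::string, which removes the manual terminator handling altogether. A minimal sketch, where first_chars and demo_first_chars are hypothetical names chosen for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns at most max_chars characters from the start
// of input.
std::string first_chars(const std::string& input, std::size_t max_chars)
{
    // substr clamps the count to the available length, so short inputs are safe.
    return input.substr(0, max_chars);
}

// Usage sketch with self-checking assertions.
void demo_first_chars()
{
    const std::string input = "This is a long string that exceeds the buffer size.";
    assert(first_chars(input, 10) == "This is a ");
    assert(first_chars("short", 10) == "short");
}
```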

Pitfall 2: Tainted Data

Tainted data is data that comes from an untrusted source, such as user input. Taint analysis tracks how such data flows through a program in order to find places where it is used without proper checks.

Let's look at a simplified version of the Remote Code Execution vulnerability. For example, we have an application that processes requests, including sending commands to the system.

#include <iostream>
#include <string>
#include <cstdlib>

void processRequest(const std::string &input) {
    std::string command = "ls " + input;

    std::cout << "Command run: " << command << std::endl;
    system(command.c_str());
}

int main() {
    std::string userInput;
    std::cout << "Enter directory or file name: "
      << std::flush;
    std::getline(std::cin, userInput);  // read the whole line, including spaces

    processRequest(userInput);

    return 0;
}

An attacker might, for example, enter the following:

; rm -rf --no-preserve-root /

In this case, the shell first runs ls and then, because of the ;, runs rm -rf --no-preserve-root /, which attempts to recursively delete every file on the system, starting at the root directory. This type of attack can lead to the execution of malicious code on the server or system and is a serious security threat.

To prevent this, user input must be strictly validated and escaped before being used in system operations. Safer alternatives, such as the exec family of functions, which take the arguments as a separate list and do not invoke a shell, can help avoid such vulnerabilities. By the way, an interesting vulnerability of this kind was recently found in Windows as well.
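
Strict validation can be sketched as an allow-list check that runs before the input gets anywhere near a command line. The function names and the exact set of permitted characters below are assumptions made for illustration; a real application would choose its own policy (this one, for instance, deliberately rejects path separators).

```cpp
#include <cassert>
#include <string>

// Hypothetical allow-list check: accept only names made of ASCII letters,
// digits, '.', '_' and '-', and reject names that start with '-' so they
// cannot be mistaken for a command-line option.
bool is_plain_file_name(const std::string& name)
{
    if (name.empty() || name.front() == '-') {
        return false;
    }
    for (unsigned char c : name) {
        const bool allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                             (c >= '0' && c <= '9') ||
                             c == '.' || c == '_' || c == '-';
        if (!allowed) {
            return false;  // shell metacharacters, spaces, '/', etc. end up here
        }
    }
    return true;
}

// Usage sketch with self-checking assertions.
void demo_validation()
{
    assert(is_plain_file_name("report_2024.txt"));
    assert(!is_plain_file_name("; rm -rf --no-preserve-root /"));
    assert(!is_plain_file_name("-la"));
}
```

Listing what is allowed, rather than trying to enumerate every dangerous character, tends to fail safely when an unexpected input shows up.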

Pitfall 3: Format String Vulnerability

This is a type of vulnerability in software code that occurs when string formatting functions are used incorrectly. The vulnerability occurs when user input is incorrectly processed as a format string in functions such as:

  • fprintf

  • printf

  • sprintf

  • snprintf

  • vfprintf

  • vprintf

  • vsprintf

  • vsnprintf

A format string vulnerability can have serious consequences, such as memory overwriting, disclosure of sensitive data, and arbitrary code execution, including remotely. This is because formatting functions take a format string that specifies the types and order of the arguments to read or write. An attacker who controls that string can inject conversion specifiers that the function will interpret against arguments that were never passed.

Here are some format options that can be used, and their consequences if used incorrectly:

  • “%x”: reads data from the stack

  • “%s”: reads a C character string from process memory

  • “%n”: writes an integer (the number of characters output so far) to a location in the process's memory

An example of a format string vulnerability looks like this:

#include <stdio.h>

int main(int argc, char** argv) 
{
    char buffer[100];
    sprintf(buffer, argv[1]);
    printf(buffer);
    return 0;
}

Here, user input is passed to sprintf as the format string without any validation or restriction, and the result is then passed to printf as a format string again. If an attacker supplies conversion specifiers such as “%s” or “%x”, the functions read arguments that were never passed, which can crash the program or leak data from the stack and other memory. In addition, sprintf does not limit how much it writes, so long input can overflow the 100-byte buffer.

To prevent format string vulnerabilities, never pass user input as the format argument: use a constant format string and pass the input as data, for example printf("%s", input). It also helps to use safer formatting facilities (e.g. std::cout, std::stringstream, std::format in C++20 and above) and to validate user input. Length-limited functions such as snprintf are recommended as well: they let you specify the maximum output length, which prevents buffer overflows, although they do not by themselves make a user-controlled format string safe.
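
As a minimal sketch of this advice, the function below keeps the format string constant and passes the untrusted text as an argument, using snprintf to bound the output. The name format_greeting and the 64-byte buffer size are arbitrary choices for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a greeting with untrusted text passed as an argument to a fixed
// format string, so any '%' in user_text is copied literally instead of
// being interpreted as a conversion specifier.
std::string format_greeting(const char* user_text)
{
    assert(user_text != nullptr);
    char buffer[64];
    // snprintf writes at most sizeof(buffer) bytes, terminator included,
    // so overly long input is truncated rather than overflowing the buffer.
    std::snprintf(buffer, sizeof(buffer), "Hello, %s!", user_text);
    return std::string(buffer);
}
```

Calling format_greeting("%x %s %n") simply produces the text "Hello, %x %s %n!", because the specifiers arrive as data rather than as part of the format.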

You can read more about this vulnerability in this article.

Pitfall 4: std::string::npos

The std::string and std::string_view classes provide methods for searching for specific characters or character sequences within a string. If the search is successful, these functions return the position of the first character that matches the specified pattern. If the search fails, the functions return the static constant npos. Let's look at some code:

#include <string>

auto foo(std::string str) noexcept
{
  return str.find("42");
}

One of the most common mistakes is to treat the return value of a string search function as a bool. The code fragment above invites exactly this: if a caller writes if (foo(s)), the condition is false when "42" is found at position 0 and true when it is not found at all, because npos is the largest value of the size type and therefore nonzero. The result of a search function should instead be compared with std::string::npos.

Here is a list of these functions:

  • find

  • rfind

  • find_first_of

  • find_last_of

  • find_first_not_of

  • find_last_not_of

The corrected code snippet would look like this:

#include <string>

auto foo(std::string str) noexcept
{
  return str.find("42") != std::string::npos;
}
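
The difference between the two versions can be checked directly. In the sketch below, contains and demo_npos are hypothetical names used for illustration; the assertions document what find returns in the two edge cases.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports whether pattern occurs anywhere in text.
bool contains(const std::string& text, const std::string& pattern)
{
    return text.find(pattern) != std::string::npos;
}

// Usage sketch with self-checking assertions.
void demo_npos()
{
    const std::string text = "42 is the answer";

    // Match at index 0: find returns 0, which converts to false.
    assert(text.find("42") == 0);
    assert(static_cast<bool>(text.find("42")) == false);

    // No match: find returns npos, a nonzero value that converts to true.
    assert(text.find("43") == std::string::npos);
    assert(static_cast<bool>(text.find("43")) == true);

    // Comparing against npos gives the intended answers.
    assert(contains(text, "42"));
    assert(!contains(text, "43"));
}
```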

Conclusions

Our examples have shown that careful handling of strings in C and C++ is a prerequisite for the correct operation and protection of applications. Incorrect handling of string data can lead to critical vulnerabilities and security threats. Therefore, to create applications resistant to attacks, strict validation, safe functions and methods, and user input escaping are required. Modern approaches to string processing help improve the safety, integrity, and reliability of your code.
