C++17 – std::string_view and no copying

Purpose std::string_view is to avoid copying data that already belongs to something and requires only an immutable representation. As you might have guessed, this post will focus on performance.

Today we will talk about one of the main features of C ++ 17.

I’m assuming you already have a basic understanding of std::string_view. If not, you can read my previous post first. “C++17 – What’s new in the library“. A C++ string is like a thin wrapper that stores its data on the heap. So when you’re dealing with strings in C and C++, memory allocation is the most mundane thing. Let’s deal with this.

Small string optimization

Very soon you will see why I called this paragraph small string optimization.

// sso.cpp

#include <iostream>
#include <string>

void* operator new(std::size_t count){
    std::cout << "   " << count << " bytes" << std::endl;
    return malloc(count);
}

void getString(const std::string& str){}

int main() {

    std::cout << std::endl;

    std::cout << "std::string" << std::endl;

    std::string small = "0123456789";
    std::string substr = small.substr(5);
    std::cout << "   " << substr << std::endl;

    std::cout << std::endl;

    std::cout << "getString" << std::endl;

    getString(small);
    getString("0123456789");
    const char message []= "0123456789";
    getString(message);

    std::cout << std::endl;

}

I have overloaded the global operator new in lines 6-9. This way you will be able to see which operation is causing the memory allocation. Come on, it’s easier than a steamed turnip. Of course, lines 19, 20, 28, and 29 cause memory allocation. Now let’s look at the output:

What the …? I said that strings store their data on the heap. But this is only true if the length of the string exceeds some implementation dependent size. This size for std::string is 15 in MSVC and GCC and 23 in Clang.

This means that small strings are actually stored directly in the string object itself. Therefore, memory allocation is not required.

Okay, from now on, my strings will always be at least 30 characters so I don’t have to worry about optimizing small strings. Let’s start from the beginning, but this time with longer lines.

No memory allocation required

Now back to radiant std::string_view. Unlike std::string, std::string_view does not allocate memory. And here’s the proof:

// stringView.cpp

#include <cassert>
#include <iostream>
#include <string>

#include <string_view>

void* operator new(std::size_t count){
    std::cout << "   " << count << " bytes" << std::endl;
    return malloc(count);
}

void getString(const std::string& str){}

void getStringView(std::string_view strView){}

int main() {

    std::cout << std::endl;

    std::cout << "std::string" << std::endl;

    std::string large = "0123456789-123456789-123456789-123456789";
    std::string substr = large.substr(10);

    std::cout << std::endl;

    std::cout << "std::string_view" << std::endl;

    std::string_view largeStringView{large.c_str(), large.size()};
    largeStringView.remove_prefix(10);

    assert(substr == largeStringView);

    std::cout << std::endl;

    std::cout << "getString" << std::endl;

    getString(large);
    getString("0123456789-123456789-123456789-123456789");
    const char message []= "0123456789-123456789-123456789-123456789";
    getString(message);

    std::cout << std::endl;

    std::cout << "getStringView" << std::endl;

    getStringView(large);
    getStringView("0123456789-123456789-123456789-123456789");
    getStringView(message);

    std::cout << std::endl;

}

Again. Memory allocation occurs on lines 24, 25, 41, and 43. But what happens in the corresponding calls on lines 31, 32, 50, and 51? No memory allocation occurs!

It’s impressive. You can even be sure that this is a significant performance boost, because memory allocation is a very expensive operation. This performance gain is especially noticeable when you create substrings from existing strings.

O(n) vs. O(1)

std::string And std::string_view both contain a method substr. Method std::string returns a substring, and the method std::string_view returns a substring representation. It doesn’t sound so exciting, but there is a significant difference between these methods. std::string::substr has linear complexity, and std::string_view::substr – constant complexity. This means that the performance of the operation over std::string directly depends on the size of the substring, and the performance of the operation on std::string_view – does not depend.

Now I’m really curious. Let’s do a simple performance comparison.

// substr.cpp

#include <chrono>
#include <fstream>
#include <iostream>
#include <random>
#include <sstream>
#include <string>
#include <vector>

#include <string_view>

static const int count = 30;
static const int access = 10000000;

int main(){

    std::cout << std::endl;

    std::ifstream inFile("grimm.txt");

    std::stringstream strStream;
    strStream <<  inFile.rdbuf();
    std::string grimmsTales = strStream.str();

    size_t size = grimmsTales.size();

    std::cout << "Grimms' Fairy Tales size: " << size << std::endl;
    std::cout << std::endl;

    // случайные значения
    std::random_device seed;
    std::mt19937 engine(seed());
    std::uniform_int_distribution<> uniformDist(0, size - count - 2);
    std::vector<int> randValues;
    for (auto i = 0; i <  access; ++i) randValues.push_back(uniformDist(engine));

    auto start = std::chrono::steady_clock::now();
    for (auto i = 0; i  < access; ++i ) {
        grimmsTales.substr(randValues[i], count);
    }
    std::chrono::duration<double> durString= std::chrono::steady_clock::now() - start;
    std::cout << "std::string::substr:      " << durString.count() << " seconds" << std::endl;

    std::string_view grimmsTalesView{grimmsTales.c_str(), size};
    start = std::chrono::steady_clock::now();
    for (auto i = 0; i  < access; ++i ) {
        grimmsTalesView.substr(randValues[i], count);
    }
    std::chrono::duration<double> durStringView= std::chrono::steady_clock::now() - start;
    std::cout << "std::string_view::substr: " << durStringView.count() << " seconds" << std::endl;

    std::cout << std::endl;

    std::cout << "durString.count()/durStringView.count(): " << durString.count()/durStringView.count() << std::endl;

    std::cout << std::endl;

}

Before I present the results to you, let me say a few words about my performance test. The main idea of ​​this performance test is to take a large file as std::string and create many substrings with std::string And std::string_view. Here I’m specifically interested in how long it will take to create these substrings.

I used Grimm’s Fairy Tales as my long file. What else could I use? Line grimmTales (line 24) contains the insides of the file. I fill std::vector<int> on line 37 with the number of access (10’000’000) values ​​in the range [0, размер – количество – 2] (line 34). And now the performance test itself begins. I create in lines 39 to 41 access fixed length substring count. count is 30. Therefore, the optimization of small strings will not get in the way. I do the same on lines 47 to 49 using std::string_view.

Here are the results. Below you can see the file length, indicators for std::string::substr And std::string_view::substr, and the relationship between them. I used GCC 6.3.0 as the compiler.

For strings of 30 characters

Just out of curiosity. Indicators without optimization.

But now to more important indicators. GCC with full optimization.

Optimization doesn’t really matter if std::stringbut we see a big difference in the case std::string_view. Creating a substring with std::string_view about 45 times faster than using std::string. What is this if not a reason to use std::string_view?

For strings of other sizes

Now I’m even more interested. What happens if I play with the size count substrings? Of course, all indicators for maximum optimization. I rounded them up to the 3rd decimal place.

I’m not surprised, the numbers reflect guarantees of complexity std::string::substr as opposed to std::string_view::substr. The complexity of the former depends linearly on the size of the substring; the second does not depend on the size of the substring. Ultimately, std::string_view is radically superior std::string.


An open lesson will take place tonight, where we will figure out which basic algorithms are included in the STL. During the lesson, we will get acquainted with the search and sorting algorithms. You can sign up on the course page “C++ Developer. Professional”.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *