C++17 – std::string_view and no copying
Purpose std::string_view
is to avoid copying data that already belongs to something and requires only an immutable representation. As you might have guessed, this post will focus on performance.
Today we will talk about one of the main features of C ++ 17.
I’m assuming you already have a basic understanding of std::string_view
. If not, you can read my previous post first. “C++17 – What’s new in the library“. A C++ string is like a thin wrapper that stores its data on the heap. So when you’re dealing with strings in C and C++, memory allocation is the most mundane thing. Let’s deal with this.
Small string optimization
Very soon you will see why I called this paragraph small string optimization.
// sso.cpp
#include <iostream>
#include <string>
void* operator new(std::size_t count){
std::cout << " " << count << " bytes" << std::endl;
return malloc(count);
}
void getString(const std::string& str){}
int main() {
std::cout << std::endl;
std::cout << "std::string" << std::endl;
std::string small = "0123456789";
std::string substr = small.substr(5);
std::cout << " " << substr << std::endl;
std::cout << std::endl;
std::cout << "getString" << std::endl;
getString(small);
getString("0123456789");
const char message []= "0123456789";
getString(message);
std::cout << std::endl;
}
I have overloaded the global operator new
in lines 6-9. This way you will be able to see which operation is causing the memory allocation. Come on, it’s easier than a steamed turnip. Of course, lines 19, 20, 28, and 29 cause memory allocation. Now let’s look at the output:
What the …? I said that strings store their data on the heap. But this is only true if the length of the string exceeds some implementation dependent size. This size for std::string
is 15 in MSVC and GCC and 23 in Clang.
This means that small strings are actually stored directly in the string object itself. Therefore, memory allocation is not required.
Okay, from now on, my strings will always be at least 30 characters so I don’t have to worry about optimizing small strings. Let’s start from the beginning, but this time with longer lines.
No memory allocation required
Now back to radiant std::string_view
. Unlike std::string
, std::string_view
does not allocate memory. And here’s the proof:
// stringView.cpp
#include <cassert>
#include <iostream>
#include <string>
#include <string_view>
void* operator new(std::size_t count){
std::cout << " " << count << " bytes" << std::endl;
return malloc(count);
}
void getString(const std::string& str){}
void getStringView(std::string_view strView){}
int main() {
std::cout << std::endl;
std::cout << "std::string" << std::endl;
std::string large = "0123456789-123456789-123456789-123456789";
std::string substr = large.substr(10);
std::cout << std::endl;
std::cout << "std::string_view" << std::endl;
std::string_view largeStringView{large.c_str(), large.size()};
largeStringView.remove_prefix(10);
assert(substr == largeStringView);
std::cout << std::endl;
std::cout << "getString" << std::endl;
getString(large);
getString("0123456789-123456789-123456789-123456789");
const char message []= "0123456789-123456789-123456789-123456789";
getString(message);
std::cout << std::endl;
std::cout << "getStringView" << std::endl;
getStringView(large);
getStringView("0123456789-123456789-123456789-123456789");
getStringView(message);
std::cout << std::endl;
}
Again. Memory allocation occurs on lines 24, 25, 41, and 43. But what happens in the corresponding calls on lines 31, 32, 50, and 51? No memory allocation occurs!
It’s impressive. You can even be sure that this is a significant performance boost, because memory allocation is a very expensive operation. This performance gain is especially noticeable when you create substrings from existing strings.
O(n) vs. O(1)
std::string
And std::string_view
both contain a method substr
. Method std::string
returns a substring, and the method std::string_view
returns a substring representation. It doesn’t sound so exciting, but there is a significant difference between these methods. std::string::substr
has linear complexity, and std::string_view::substr
– constant complexity. This means that the performance of the operation over std::string
directly depends on the size of the substring, and the performance of the operation on std::string_view
– does not depend.
Now I’m really curious. Let’s do a simple performance comparison.
// substr.cpp
#include <chrono>
#include <fstream>
#include <iostream>
#include <random>
#include <sstream>
#include <string>
#include <vector>
#include <string_view>
static const int count = 30;
static const int access = 10000000;
int main(){
std::cout << std::endl;
std::ifstream inFile("grimm.txt");
std::stringstream strStream;
strStream << inFile.rdbuf();
std::string grimmsTales = strStream.str();
size_t size = grimmsTales.size();
std::cout << "Grimms' Fairy Tales size: " << size << std::endl;
std::cout << std::endl;
// случайные значения
std::random_device seed;
std::mt19937 engine(seed());
std::uniform_int_distribution<> uniformDist(0, size - count - 2);
std::vector<int> randValues;
for (auto i = 0; i < access; ++i) randValues.push_back(uniformDist(engine));
auto start = std::chrono::steady_clock::now();
for (auto i = 0; i < access; ++i ) {
grimmsTales.substr(randValues[i], count);
}
std::chrono::duration<double> durString= std::chrono::steady_clock::now() - start;
std::cout << "std::string::substr: " << durString.count() << " seconds" << std::endl;
std::string_view grimmsTalesView{grimmsTales.c_str(), size};
start = std::chrono::steady_clock::now();
for (auto i = 0; i < access; ++i ) {
grimmsTalesView.substr(randValues[i], count);
}
std::chrono::duration<double> durStringView= std::chrono::steady_clock::now() - start;
std::cout << "std::string_view::substr: " << durStringView.count() << " seconds" << std::endl;
std::cout << std::endl;
std::cout << "durString.count()/durStringView.count(): " << durString.count()/durStringView.count() << std::endl;
std::cout << std::endl;
}
Before I present the results to you, let me say a few words about my performance test. The main idea of this performance test is to take a large file as std::string
and create many substrings with std::string
And std::string_view
. Here I’m specifically interested in how long it will take to create these substrings.
I used Grimm’s Fairy Tales as my long file. What else could I use? Line grimmTales
(line 24) contains the insides of the file. I fill std::vector<int>
on line 37 with the number of access (10’000’000) values in the range [0, размер – количество – 2] (line 34). And now the performance test itself begins. I create in lines 39 to 41 access
fixed length substring count
. count
is 30. Therefore, the optimization of small strings will not get in the way. I do the same on lines 47 to 49 using std::string_view
.
Here are the results. Below you can see the file length, indicators for std::string::substr
And std::string_view::substr
, and the relationship between them. I used GCC 6.3.0 as the compiler.
For strings of 30 characters
Just out of curiosity. Indicators without optimization.
But now to more important indicators. GCC with full optimization.
Optimization doesn’t really matter if std::string
but we see a big difference in the case std::string_view
. Creating a substring with std::string_view
about 45 times faster than using std::string
. What is this if not a reason to use std::string_view
?
For strings of other sizes
Now I’m even more interested. What happens if I play with the size count
substrings? Of course, all indicators for maximum optimization. I rounded them up to the 3rd decimal place.
I’m not surprised, the numbers reflect guarantees of complexity std::string::substr
as opposed to std::string_view::substr
. The complexity of the former depends linearly on the size of the substring; the second does not depend on the size of the substring. Ultimately, std::string_view
is radically superior std::string
.
An open lesson will take place tonight, where we will figure out which basic algorithms are included in the STL. During the lesson, we will get acquainted with the search and sorting algorithms. You can sign up on the course page “C++ Developer. Professional”.