UB or not UB: duplicate the std::vector element

In this article we will find out whether, from the point of view of the C++ language standard, it is possible to duplicate a std::vector element with a trivial push_back call. Answering a simple question, we will be faced with more interesting ones: what is the internal world of a vector, how iterators “fade” during reallocation, what restrictions add safety guarantees regarding exceptions…

Perhaps the most common container from the C++ standard library is std::vector. The linear arrangement of data in memory gives us pleasant bonuses: access to an arbitrary element in constant time, fast sequential passage due to the efficient use of the cache. The other side of the coin is expensive insertion into an arbitrary place (in linear time) and invalidation of iterators during reallocations and more. The latter especially “pleases” beginners, who benefit from harmless-looking functions like add And erase demons begin to jump out below UB (Undefined Behavior – unpredictable program behavior as a result of violation of the agreement fixed in the language standard).

void add(std::vector<int> vec) {
  for (int v: vec) { // UB
    if (/*некоторое условие*/)
      vec.push_back(42);
  }
}

void erase() {
  std::vector vec = {1, 2, 3};
  auto two = std::next(vec.begin());
  std::erase(vec, 1);
  std::cout << *two; // UB
}

If the problems in the code above are not obvious, just go to cppreference in the documentation for vector::push_back() or any other method whose call potentially results in reallocation. After that, look under the spoiler below, where the insertion process is illustrated with a visual animation.

The problem of disability on the fingers

This is a good, normal situation when understanding the error (and its cause) becomes obvious after a few minutes of reading the documentation. And as an opposite example, hoping that the problems above are understandable and do not cause interest, I propose to jointly consider the following question: is such code correct:

void duplicateLatest(std::vector<int>& vec) {
  assert(!vec.empty());
  vec.push_back(vec.back());
}

A similar entry caught my eye when I was looking for a bug in someone else’s code, which occasionally crashed in the absence of sanitizers. The internal bell rang: “stop, isn’t this UB”? Overload used push_back(const T& v) accepts an argument v by reference, which means the implementation of the vector will have to create a new element from another one located in its own buffer. What if by the time the copy is constructed, the internal buffer has already been implemented? Intuition suggests that similar code in various versions has already been written many times and everything should work, but who in the C++ world relies on intuition…

As a starting point, let's look at description methods for changing the vector, we see what is expected:

Remarks: Causes reallocation if the new size is greater than the old capacity. Reallocation invalidates all the references, pointers, and iterators referring to the elements in the sequence, as well as the past-the-end iterator.

And at first glance, this is enough to confidently state: the code above will lead to UB if by the time of the call push_back assertion in progress size() == capacity(). Every developer knows that bicycles are most often unproductive, but educational and fun. To study the issue from the inside, let's start by trying to write a minimal implementation that meets the standard push_back, in which the problem will be reproduced. In this case, we will come to the answer to the question, and the answer itself will be in the next section.

Broken implementation

For example, such a naive insertion operation at the end will make the result of the entire program undefined:

 1 //_buf - сырой указатель на внутренний буфер, _capacity - его вместимость
 2 void push_back_realloc_v1(const T& value, size_t next_capacity) {
 3   T* next_buffer = allocate(next_capacity);
 4   T* place_for_value = std::uninitialized_move(begin(), end(), next_buffer);
 5   deallocate(_buf, size());
 6   new (place_for_value) T(value); // потенциальное use-after-free
 7   std::swap(_buf, next_buffer);
 8   std::swap(_capacity, next_capacity);
 9 }
10
11 void push_back(const T& value) {
12   if (size() == capacity()) {
13     size_t next_capacity = size() * growth_rate + 1;
14     push_back_realloc_v1(value, next_capacity);
15   } else {/*тривиальное добавление в существующий буфер*/}
16   // увеличение счетчика длины или указателя на конец данных
17 }

Here, if the buffer is exhausted, we end up in push_back_realloc_v1, where first of all we allocate a new one, of some guaranteed larger size. Then we move all the elements from the old one into it and free the old internal buffer _buf challenge deallocate (we will assume that in implementing the latter we not only ask the allocator to take over the memory that has become unnecessary, but also do not forget to go through all size() elements and call their destructors). Next we try to create a copy value and place it after the last element in the new buffer and… Houston, we have a problem. If value was stored in our original vector, then at this moment we access memory that has already been freed. The program is broken. Before updating our vector with a pointer to the internal buffer and _capacity we won't get there anymore.

Done, have you written a correct implementation that leads to UB in the case of interest? No, there are still enough blunders here: if T there is no moving constructor, the code simply won’t build. If there is and it throws an exception on line 4, the memory will leak. But from the point of view of our experiment, it is much more serious that they forgot about requirement strict security assurance standard regarding exceptions:

Remarks: If an exception is thrown while inserting a single element at the end and T is Cpp17CopyInsertable or is_nothrow_move_constructible_v is true, there are no effects. Otherwise, if an exception is thrown by the move constructor of a non-Cpp17CopyInsertable Tthe effects are unspecified.

Disclaimer for cases not related to our overload push_back (when the copy constructor is unavailable and the move constructor throws an exception) is not very interesting. But the first requirement is fatally violated. Perhaps the safety of the original example follows implicitly from this statement? Let's try to provide strong exception guarantee. What could go wrong now? If we fail to allocate memory (line 3), everything is fine: we did not have time to spoil the vector data. In line 4 the situation is more interesting: if one of the moving constructors throws an exception, the original state of the vector cannot be restored. Calls to destructors and deallocation (5) guarantee no exceptions (unless you have strange objects with non-noexcept destructors, they have no place in standard containers). Finally, line (6) can throw a copy constructor, in which case the user of the container shouldn't notice any side effects either. Correcting:

 1 void push_back_realloc_v2(const T& value, size_t next_capacity) {
 2   T* next_buffer = allocate(next_capacity);
 3   {
 4     T* out = next_buffer;
 5     auto _ = gsl::finally([&]{ deallocate(next_buffer, out - next_buffer); });
 6 
 7     for (auto in = begin(); in != end(); ++in, ++out) {
 8       new (out) T(std::move_if_noexcept(*in));
 9     }
10     new (out) T(value); // потенциальное use-after-move
11     std::swap(_buf, next_buffer);
12     std::swap(_capacity, next_capacity);
13     out = next_buffer + size();
14   }
15 }

This option is a little more complicated and much better. Using a RAII wrapper over a memory block next_buffer (in order not to bloat the example code is taken final_action from GSL). An object _ at the end of his life (exiting area 3-14) guarantees a call to the one transferred to gsl::finally lambdas. Now filling the new buffer is based on a standard function move_if_noexcept: if T non-throwing move constructor, get an rvalue reference and call the move constructor, otherwise copy from the lvalue reference. An exception may still be thrown when calling the copy constructor (lines 8 and 10), in which case our unnamed guard will be thrown before the function exits _ by using deallocate will ensure that the destructors of all already copied objects are called out - next_buffer pieces and freeing the temporary buffer. If all the constructors have worked and execution has passed line 10, there will be no more exceptions, it’s time to change the temporary and main buffer pointers. In line 13 we recall that existence _ is coming to an end and is about to be called deallocate. Trying to calculate out - next_buffer will now lead to UB, because next_buffer now a pointer to the old buffer, and out – still like new. Therefore, as payback for the syntactic sugar final_actionwe need to update the value out solely for the correct determination of the number of elements in the freed buffer.

Great, we can more or less confidently say that we implemented it for the experimental class bad_vector method push_back, which took into account all the requirements for it, clearly listed on the corresponding cppreference.com page (and in the standard). Now let's reproduce the situation where push_back breaks our example. Let's assume that

  • at the class T there is a nothrow move constructor

  • the moment came when vec.size()==vec.capacity()

  • wanted to duplicate your element vec.push_back(vec.back())

Under these conditions, when dragging elements from the old buffer to the new one, the move constructors will be called. It is important that each of them generally has the right to pull the insides out of the object. At the moment of movement, we light the fuse, and explosion an interesting effect occurs on line 10 when we try to construct a copy value from the moved object. The standard imposes a rather vague, in my opinion, requirement validity to objects from std in after move state and clarifiesthat the invariants declared by the container/algorithm must be preserved:

Cpp17MoveConstructible requirements: <...> Note 1:rv must still meet the requirements of the library component that is using it. The operations listed in those requirements must work as specified whether rv has been moved from or not.

As long as an object of a user-defined type does not touch the standard library, from the point of view of the main language document, the moving constructor/assignment operator has the right to do any indecent things with it (not required even the ability to call a destructor on a moved object). When we place this object in a standard container (more precisely, we start using some methods), the same requirements arrive Cpp17MoveInsertable and Cpp17MoveAssignable and the like. However, the requirements are relatively weak: for example, for a vector it is enough to ensure some correct behavior of copy and move constructors and a few other functions.

It follows from this that UB immediately at the time of the call push_back we won’t get it, but the contents of the vector will generally become inadequate and user invariants may break. For example, as a result of executing the code below, we get a vector of two elements, the first of which is a pointer to one, and the second is nullptr, because constructed from displaced shared_ptr.

bad_vector<std::shared_ptr<int>> vec;
vec.push_back(std::make_shared<int>(1));
vec.push_back(vec.back());

By the way, you can completely break itbad_vector::push_back, separately handling the case where the moving constructor throws an exception but the copying constructor does not. In this thread we switch to push_back_realloc_v1in which instead of moving we use copying (std::uninitialized_copy) and get a full-fledged UB when accessing freed memory in full compliance with the strong exception guarantee:

constexpr bool copy_ctor_noexcept = noexcept(T(*std::declval<T*>()));
constexpr bool move_ctor_noexcept = noexcept(T(std::declval<T>()));
if constexpr (copy_ctor_noexcept && !move_ctor_noexcept) {
  /*привет, use-after-free*/
}

UB or not UB

We were successfully able to write a bad implementation. Let's now see how a priori good ones work: let's look under the hood push_back performed clang, gcc And msvc. A surprise awaits here: the data access pattern does not correspond to ours and is strictly repeated in each library: allocation of a new buffer, construction of the last (inserted) element, copying or moving elements from the old buffer and its deallocation. The obvious conclusion is that bad_vector unlike its adult counterparts, it still does not meet the standard: something was missed. Once again, we carefully go through the containers section in the standard:

  • strict guarantee exception safety does not prohibit the possibility of a “broken implementation” of push_back in itself, although it makes it inconvenient;

  • For vec.emplace(pos, args) explicitly statedthat the arguments can be references to values ​​from vec. For push_back (and emplace_back) there is no similar mark;

  • For vec.insert(pos, begin, end) it is no less clearly stated that the range to be inserted should not belong to the same container;

  • comments to vec.push_back

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *