Implementing an Efficient Tuple with C++26

The world has seen many amateur implementations std::tupleand selling your bikes is probably a really effective way of learning: you can hardly say that you really understand something if you can’t explain how it works.

Many inquisitive minds have wondered for decades: how is it implemented? std::tuplehow can I implement my tuple (tuple)? [1]

And many answers have been given to these questions and many articles have been written ([2]). However, I dare say that they all have one thing in common. critical flaw! More specifically, they all basically consider only one (and at the same time ineffective) way of implementation: using multiple inheritance or recursive instantiation, which in turn has many of its own disadvantages, the main one being inefficient use of memory.

While modern C++ allows you to implement a tuple much more simply (without an abundance of boilerplate) and more efficiently.

Design notes

The cornerstone of this idea of ​​effective implementation is simple byte array (hereinafter referred to as storage), the size of which will be sufficient to store all members of the tuple. Of course, it imposes on us the requirement to think about alignment: members are of different types, which have different alignment requirements, and we will have to take this into account when placing them in storage.

But how exactly should we place members to get maximum memory efficiency?

Let's assume we have a type tuple<char, double, int> — the naive version of its placement in memory (see the figure and listing below), in which we simply arrange the members (taking into account alignment) one after another in their original order, is clearly not suitable for us.

Figure 1 - Naive variant of tuple<char, double, int> placement in memory” title=”Figure 1 – Naive variant of tuple<char, double, int> placement in memory” width=”5844″ height=”728″ src=”https://habrastorage.org/getpro/habr/upload_files/686/f6b/025/686f6b0258eb97563e276c417c007e42.png”/></p><p><figcaption>Figure 1 – Naive variant of tuple<char, double, int> placement in memory</figcaption></p></figure>
<details class= Listing 1 – Naive way to place a tuple in memory (if you don't feel comfortable looking at the pictures)
struct tuple  /* size: 24, align: 8 */
{
  char a;                         /* offset: 0, size: 1
  char __padding[7];                            size: 7 */
  double b;                       /* offset: 8, size: 8 */
  int c;                          /* offset: 16, size: 4
  char __padding[4];                            size: 4 */
};

It requires 24 bytes of memory while the same members can be placed much more efficiently. Thus, the following variant is optimal and requires only 16 bytes:

Figure 2 - One of the optimal options for placing a tuple<char, double, int> in memory” title=”Figure 2 – One of the optimal options for placing a tuple<char, double, int> in memory” width=”4168″ height=”728″ src=”https://habrastorage.org/getpro/habr/upload_files/43c/bce/0b4/43cbce0b4b4360877a815036bd712f5f.png”/></p><p><figcaption>Figure 2 – One of the optimal options for placing a tuple<char, double, int> in memory</figcaption></p></figure>
<details class= Listing 2 – One of the optimal options for placing a tuple in memory
struct tuple  /* size: 16, align: 8 */
{
  double b;                       /* offset: 0, size: 8 */
  char a;                         /* offset: 8, size: 1
  char __padding[3];                            size: 3 */
  int c;                          /* offset: 12, size: 4 */
};

But how can we find the optimal placement of types in memory for any combination of types? There is an algorithm for this: it is enough to simply sort the types by their alignment requirements in descending order. In our case, since alignof(double) > alignof(int) > alignof(char)it will look like this:

Figure 3 - Optimal placement of tuple<char, double, int> in memory” title=”Figure 3 – Optimal placement of tuple<char, double, int> in memory” width=”4164″ height=”728″ src=”https://habrastorage.org/getpro/habr/upload_files/4c0/a9b/9fe/4c0a9b9fe28587e17ef3347304c35dea.png”/></p><p><figcaption>Figure 3 – Optimal placement of tuple<char, double, int> in memory</figcaption></p></figure>
<details class= Listing 3 – Optimal placement of tuple in memory
struct tuple  /* size: 16, align: 8 */
{
  double b;                       /* offset: 0, size: 8 */
  int c;                          /* offset: 8, size: 4 */
  char a;                         /* offset: 12, size: 1
  char __padding[3];                            size: 3 */
};

We got the same 16 bytes – this is the minimum size that our tuple can have, taking into account the alignment requirements.

Taking into account the above considerations, we can already implement our tuple in pseudocode:

template <typename... T>
struct tuple {
    constexpr tuple(T&&... args) {
      /* Инициализируем хранилище */
      1) Отсортировать типы T по их требованиям к выравниванию
      2) Для каждого T:
        2.1) Разместить его на требуемом месте в массиве
        2.2) Инициализировать его соответствующим значением args
    }

    template <std::size_t I>
    constexpr auto& get() noexcept {
      1) Отсортировать типы T по их требованиям к выравниванию,
         сохраняя информацию об их исходном индексе (относительно T...)
         IOriginal
      3) Получить тип TOriginal, имевший исходный индекс IOriginal
      2) Получить информацию о смещении (=offset) внутри storage для
         TOriginal
      return *std::launder(reinterpret_cast<TOriginal*>(storage + offset))
    }
private:
    alignas(T...) std::byte storage[(sizeof(T) + ...)];
};

This sounds like something that requires complex template magic: “sort the types” sounds scary (even though, for our purposes, we can actually reduce it to sorting regular objects). And it does require complex template magic if we're talking about C++11, C++14, and even C++17.

But since then the possibilities have expanded significantly constexprand the language also gained such nice features as pack indexing (cppreference). So in reality everything will look quite simple and clear. Let's proceed to the real implementation.

Implementation

As a basis, we will take our previous listing (from which, of course, we will remove all pseudo-code) (godbolt):

#include <cstddef>

template <typename... T>
struct tuple {
  constexpr tuple(T&&... args) { }
  
  template <std::size_t I>
  constexpr auto& get() noexcept { }
private:
  alignas(T...) std::byte storage[(sizeof(T) + ...)];
};

// Corner case: пустой tuple
// Проще всего реализовать его как специализацию
template <>
struct tuple<> { };

The first thing we need to implement a constructor is a helper type MemberInfowhich we will use to store information about each T (each member of our tuple): its original index in T...alignment, size and offset relative to &storage[0] (in other words, where is this member in storage will be located).

And along with this we implement the method get_members_infofilling in compile time MemberInfo for everyone T (godbolt):

// ...

#include <utility>
#include <array>
#include <algorithm>

template <typename... T>
struct tuple {
  // ...
private:
  struct MemberInfo {
    std::size_t original_idx;
    std::size_t align;
    std::size_t sizeof_;
    std::size_t offset;
  };

  template <std::size_t... I>
  consteval static auto get_members_info(std::index_sequence<I...> idx) {
    // Для каждого T сохраняем его исходный индекс относительно T...,
    // alignof и sizeof
    std::array<MemberInfo, sizeof...(T)> res = {
      MemberInfo { I, alignof(T), sizeof(T), 0 }...
    };
    // Сортируем полученный массив по требуемым выравниваниям, чтобы
    // получить оптимальное размещение
    std::sort(res.begin(), res.end(), [](const auto& lhs, const auto& rhs) {
      return lhs.align > rhs.align;
    });
    // Считаем смещение относительно &storage[0] для каждого из членов
    for (std::size_t idx = 1, size = res.size(); idx != size; ++idx) {
      res[idx].offset = res[idx - 1].offset + res[idx - 1].sizeof_;
    }
    return res;
  }
  
  consteval static auto get_members_info() {
    return get_members_info(std::make_index_sequence<sizeof...(T)>());
  }
  // ...
};

// ...

Now it is not difficult to implement the constructor itself (godbolt):

// ...

#include <new>

template <typename... T>
struct tuple {
  constexpr tuple(T&&... args) {
    initialize_storage(std::make_index_sequence<sizeof...(T)>(), std::forward<T>(args)...);
  }
  // ...
private:
  // ...
  template <std::size_t... I>
  constexpr auto initialize_storage(std::index_sequence<I...> idx, T&&... args) {
    // Получаем всю необходимую нам информацию о членах
    constexpr static auto info = get_members_info();
    // Размещаем каждый член с помощью placement new в storage
    // Обратите внимание: мы не можем написать T...[I] или args...[I],
    // так как мы отсортировали члены по их выравниваниям
    /// и на I-ой позиции в info может оказаться совсем другой член.
    // Именно поэтому мы сохраняем в info original_idx
    (
      new (storage + info[I].offset)
      T...[info[I].original_idx](args...[info[I].original_idx]),
      ...
    );
  }
  // ...
};

// ...

And we can already start covering this code with tests: in particular, we will make sure that our tuple is really optimal in terms of memory consumption (by comparing its size with the size of a similar one). std::tuple) (godbolt):

#include <tuple>
#include <cassert>
#include <string>

// ...

int main() {
  // Все эти assertы выполняются без ошибок
  tuple<int, double> x1(1, 2.0);
  assert(sizeof(x1) == sizeof(std::tuple<int, double>{}));
  
  tuple<double, int> x2(2.0, 1);
  assert(sizeof(x2) == sizeof(std::tuple<double, int>{}));
  
  tuple<double, char, int, std::string> x3(2.0, 'A', 1, "ABC");
  assert(sizeof(x3) == sizeof(std::tuple<double, char, int, std::string>));
}

All that remains is to implement the method get<I>() (along with tests for it) (godbolt):

// ...

template <typename... T>
struct tuple {
  // ...
  template <std::size_t I>
  constexpr auto& get() noexcept {
    constexpr static auto info = get_members_info();
    // Нас интересует один конкретный член: с original_idx == I
    // В частности, нас интересует только его offset — он нам нужен для
    // того, чтобы найти этот член в storage
    constexpr auto it = std::find_if(info.begin(), info.end(), [](const auto& element) {
      return element.original_idx == I;
    });
    if constexpr (it == info.end()) {
      // Если его не оказалось — пользователь указал некорректный индекс
      // Выводим красивое сообщение об ошибке
      static_assert(false, "get<I>() access out of bounds");
    } else {
      // Иначе, пользуясь сохраненным offset, находим этот член в storage и
      // возвращаем его
      constexpr auto type = *it;
      return *std::launder(reinterpret_cast<T...[I]*>(storage + type.offset));
    }
  }
  // ...
};

// ...

int main() {
  // Все эти assertы выполняются без ошибок
  tuple<int, double> x1(1, 2.0);
  assert(x1.get<0>() == 1 && x1.get<1>() == 2.0);
  // ...

  tuple<double, int> x2(2.0, 1);
  assert(x2.get<0>() == 2.0 && x2.get<1>() == 1);
  // ...
  
  tuple<double, char, int, std::string> x3(2.0, 'A', 1, "ABC");
  assert(x3.get<0>() == 2.0 && x3.get<1>() == 'A' && x3.get<2>() == 1 && x3.get<3>() == "ABC");
  // x3.get<4>();
  // ^ Код выше при раскомментировании упадет с красивым сообщением
  // ...
}

And, of course, since we placed objects in memory manually (using placement new), we also need to manually in the destructor ~tuple call destructors for all allocated objects. By analogy with initialize_storagethis can be implemented as follows (godbolt):

// ...

template <typename... T>
struct tuple {
  // ...
  constexpr ~tuple() {
    deinitialize_storage(std::make_index_sequence<sizeof...(T)>());
  }
private:
  // ...
  template <std::size_t... I>
  constexpr auto deinitialize_storage(std::index_sequence<I...> idx) {
    // Получаем всю необходимую нам информацию о членах
    constexpr static auto info = get_members_info();
    // Вызываем деструктор каждого из членов, стараясь при этом
    // не запутаться в индексах
    (
      std::launder(
        reinterpret_cast<T...[info[I].original_idx]*>(
          storage + info[I].offset
        )
      )->~T...[info[I].original_idx](),
      ...
    );
  }

Now we are great! Our homemade tuple

  1. Surpasses std::tuple by memory efficiency. Take a look at the test results below for yourself (godbolt).

// ...

int main() {
  // ...

  // Наш самодельный tuple blazing fast и уделывает
  // стандартный тупль в некоторых кейсах, так как std::tuple
  // не оптимизирует размещение членов в памяти
  assert(sizeof(tuple<char, double, int>) == 16);
  assert(sizeof(std::tuple<char, double, int>) == 24);
}
  1. Damn fast. Thanks to the use constexpr And consteval Most of the lines we write won't even appear in the binary, as they will be executed at compile time.

Now you know how to easily and effectively implement tuple in an interview!

Published with the support of C++ Moscow

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *