Just don't copy it

What I’m going to talk about in this article is so trivial that any developer, even a beginner, already knows it – I really really hope so. However, the unreviewed code that comes in shows that people have done and continue to do something similar:

bool LoadAnimation(str::string filename);
void DrawLines(std::vector<Points> path);
Matrix RotateObject(Matrix m, Angle angle);
int DrawSprite(Sprite sprite);

What do these functions have in common? Argument by value. And every time such a functionality is called in code, a copy of the input data is created in its temporary context and passed inside the function. And you can also forgive rarely called code, such as loading animations, or rely on the compiler that optimizations will work and it will destroy data copying, but as luck would have it, most often this approach to development destroys only performance and fps.


Any optimization must be approached only! after analyzing them in the profiler, copies, as it turned out, may not be an expensive operation. For example, this depends on the size of the object, so the compiler does an excellent job of passing objects up to 32 bytes by value, of course there are costs, but they are very insignificant and are not caught in benchmarks. The vendor can screw up “something in the platform and the compiler” that copying 32kb from special memory areas will be faster than adding a pair of numbers. And in the game itself, hot code optimization, let's be honest, is often not the biggest problem in overall performance. But dynamic memory allocation can present many surprises, especially when used thoughtlessly.

But even if the overhead is small, is there any point in wasting CPU cycles when you can avoid it? These “lost 2-3%” of smeared perf, which do not glow even in the profiler, are very difficult to catch later, and even more difficult to repair.

Hidden location on lines
#include <string>
#include <numeric>

size_t PassStringByValueImpl(std::string str) {
    return std::accumulate(str.begin(), str.end(), 0, [] (size_t v, char a) { 
        return (v += (a == ' ') ? 1 : 0);
    });
}

size_t PassStringByRefImpl(const std::string& str) {
    return std::accumulate(str.begin(), str.end(), 0, [] (size_t v, char a) { 
        return (v += (a == ' ') ? 1 : 0);
    });
}

const std::string LONG_STR("a long string that can't use Small String Optimization");

void PassStringByValue(benchmark::State& state) {
    for (auto _ : state) {
        size_t n = PassStringByValueImpl(LONG_STR);
        benchmark::DoNotOptimize(n);
    }
}
BENCHMARK(PassStringByValue);

void PassStringByRef(benchmark::State& state) {
    for (auto _ : state) {
        size_t n = PassStringByRefImpl(LONG_STR);
        benchmark::DoNotOptimize(n);
    }
}
BENCHMARK(PassStringByRef);

void PassStringByNone(benchmark::State& state) {
    for (auto _ : state) {
        size_t n = 0;
        benchmark::DoNotOptimize(n);
    }
}
BENCHMARK(PassStringByNone);

QuickBench: https://quick-bench.com/q/f6sBUE7FdwLdsU47G26yPOnnViY

Hidden location on arrays
size_t SumValueImpl(std::vector<unsigned> vect)
{
    size_t sum = 0;
    for(unsigned val: vect) { sum += val; }
    return sum;
}

size_t SumRefImpl(const std::vector<unsigned>& vect)
{
    size_t sum = 0;
    for(unsigned val: vect) { sum += val; }
    return sum;
}

const std::vector<unsigned> vect_in = { 1, 2, 3, 4, 5 };

void PassVectorByValue(benchmark::State& state) {
    for (auto _ : state) {
        size_t n = SumValueImpl(vect_in);
        benchmark::DoNotOptimize(n);
    }
}
BENCHMARK(PassVectorByValue);

void PassVectorByRef(benchmark::State& state) {
    for (auto _ : state) {
        size_t n = SumRefImpl(vect_in);
        benchmark::DoNotOptimize(n);
    }
}
BENCHMARK(PassVectorByRef);

void PassVectorByNone(benchmark::State& state) {
    for (auto _ : state) {
        size_t n = 0;
        benchmark::DoNotOptimize(n);
    }
}
BENCHMARK(PassVectorByNone);

QuickBench: https://quick-bench.com/q/GU68xgT0r97eYaCKxMzm9bXJei4

The compiler is still smarter

In cases where the copied object takes up less than a couple of tens of bytes in size, we will not notice any difference at all from passing it by reference, to the point that the generated assembly code will be “almost the same”.

There is copying, but it does not affect
struct Vector{
    double x;
    double y;
    double z;
};

double DotProductObjectImpl(Vector a, Vector p){
    return (a.x*p.x + a.y*p.y + a.z*p.z);
}

double DotProductRefImpl(const Vector& a, const Vector& p){
    return (a.x*p.x + a.y*p.y + a.z*p.z);
}

void DotProductObject(benchmark::State& state) {
    Vector in = {1,2,3};
    for (auto _ : state) {
        double dp = DotProductObjectImpl(in, in);
        benchmark::DoNotOptimize(dp);
    }
}
BENCHMARK(DotProductObject);

void DotProductRef(benchmark::State& state) {
    Vector in = {1,2,3};
    for (auto _ : state) {
        double dp = DotProductObjectImpl(in, in);
        benchmark::DoNotOptimize(dp);
    }
}
BENCHMARK(DotProductRef);

void DotProductNone(benchmark::State& state) {
    for (auto _ : state) {
        size_t n = 0;
        benchmark::DoNotOptimize(n);
    }
}
BENCHMARK(DotProductNone);

QuickBench: https://quick-bench.com/q/drlH-a9o4ejvWP87neq7KAyyA8o

In this example, we of course know the size of the structure, and the example is very simple. On the other hand, if passing by reference is clearly no slower than passing by value, using const& will be “such best as we can”. And the transfer of primitive types by const& and doesn't affect anything at all compilation with flag /Ox

And since there are no advantages, write something like this const int &i makes no sense, but some still write.

Reserve my vector

Arrays have a huge advantage over other data structures that often overrides any conveniences of other containers: their elements follow each other in memory. We can discuss for a long time the impact of the used algorithm on the operating time and how this can affect performance, but we don’t have anything faster than processor caches, and the more elements are in the cache, the faster the most banal search will work. Any access outside the L1 cache immediately increases the operating time.

But when working with vectors (dynamic arrays), many people forget or don’t remember what’s under the hood. And there, if the allocated space runs out, and it was, for example, allocated for 1 (one) element, then:

  1. A new memory block that is larger is allocated.

  2. All elements that were saved in the new block are copied.

  3. The old memory block is deleted

All these operations are costly, very fast, but still costly. And they happen under the hood and are not visible:
– if you don’t mess around, look at the code of the standard library
– do not look at the location profiler
– trust the code written by the vendor (although here you have to accept things as they are)

Use reserve, Luke
static void NoReserve(benchmark::State& state) 
{
  for (auto _ : state) {
    // create a vector and add 100 elements
    std::vector<size_t> v;
    for(size_t i=0; i<100; i++){  v.push_back(i);  }
    benchmark::DoNotOptimize(v);
  }
}
BENCHMARK(NoReserve);

static void WithReserve(benchmark::State& state) 
{
  for (auto _ : state) {
    // create a vector and add 100 elements, but reserve first
    std::vector<size_t> v;
    v.reserve(100);
    for(size_t i=0; i<100; i++){  v.push_back(i);  }
    benchmark::DoNotOptimize(v);
  }
}
BENCHMARK(WithReserve);

static void CycleNone(benchmark::State& state) {
  // create the vector only once
  std::vector<size_t> v;
  for (auto _ : state) {
    benchmark::DoNotOptimize(v);
  }
}
BENCHMARK(CycleNone);

QuickBench: https://quick-bench.com/q/OuiFp3VOZKNKaAZgM_0DkJxRock

And finally, an example of how you can kill a perf and get a drop of up to 10 FPS out of the blue, just by moving the mouse across the playing field. I won’t name the engine, the bug has already been fixed. If you find a mistake, write in the comments 🙂

bool findPath(Vector2 start, Vector2 finish) {
   ...

   while (toVisit.empty() == false) 
   {
      ...

      if (result == OBSTACLE_OBJECT_IN_WAY)
      {
         ...

         const std::vector<Vector2> directions{ {1.f, 0.f}, {-1.f, 0.f}, {0.f, 1.f}, {0.f, -1.f} };
         for (const auto& dir : directions)
         {
            auto nextPos = currentPosition + dir;
            if (visited.find(nextPos) == visited.end())
            {
               toVisit.push({ nextPos, Vector2::DistanceSq(center, nextPos) });
            }
         }
      }
   }
}

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *