Boss, it's all gone

Errors by C++ programmers are a separate form of art, a seemingly simple language, but as soon as you get distracted by a cup of coffee, the compiler starts throwing out a sheet of warnings in half with errors, and sometimes it looks more like ancient Egyptian writing than a normal output. You yourself have probably encountered nullptr dereferencing more than once or confused (= and ==) due to an oversight. Often the cause of an error is laziness or inattention, or fatigue – it’s not in vain that the superstitions “don’t commit on a Friday night”, “don’t code in a state of altered consciousness” or “avoid coding while on a coffee stupor” appeared, well, this is when you’ve had three or four mugs of coffee and went to carry the good code left and right.

In general, the article was planned about how you can make fun of the switch operator – I had a recording of a conversation from a St. Petersburg meetup that took place several years ago and where those present were asked to write “impossible” but working code with a switch. And while searching for this entry, I came across a file with a description of “Friday” bugs and commits, which in a sober mind would hardly have slipped into the master in the middle of the week. And there was also an article about errors, “Fedya, game,” where a respected habra reader guessed the cause of various errors, obvious or not. Therefore, I decided to continue in the same style, question and answer, and I invite you to also participate in guessing possible reasons. All examples are from real projects, games, engines and Friday commits.

C++ is unforgiving, but that's where its charm lies. In most of the examples given, the style and names of the parameters have been preserved, only the comments have been cleaned up a little so as not to scorch the office.


0x10

Ignoring development rules and reinventing wheels in most cases ends in a bug. So here, the engine sometimes crashed when rendering UI widgets, but more often it simply spoiled other widgets that were entirely made on POD types.

std::vector<int> Widget::DrawElements(Element* p, int n) {
    std::vector<int> result;
    result.reserve(n);
    for (int i = 0; i < n; ++i) {
    	bool r = DrawElement(p[i]);
        result.push_back(r);
    }
    ...
    return result;
}

void Widget::Draw(int m) {
    constexpr int n = 10;
    Element a[n] = {}; // локальный буффер, связано с отложенным рендером виджетов
    // copy elements
    // ...
    DrawElements(a, m);   
    // ...
}
Where's the mistake?

Here we made a small error in the Widget::Draw function, which led to data corruption and, less often, a program crash. The “pointer, quantity” style interface leaves the DrawElements() function virtually no way to protect itself from errors of going beyond the array limits; there was another array lying there, which took the blow. If we could check indexes for array out-of-bounds, the error would have been detected much earlier. And so the error lived in the code for about six months, simply because the widget usually had less than 10 elements.

0x11

When porting the Dagor engine to the Nintendo Switch platform, we had to create a layer to work with the file system. After another Friday, the game began to crash after several loads in random places with OOM (out of memory), and the reason turned out to be just one line. This should be simple.

void OsFile::ropen(const NxString &name) {
    NxFile f = NxFileOpen(name, "r");
    // ...
    if (something) return;   
    // ...
    NxFileClose(f);
}
Where is the mistake?

Line 5 was added at some point, after fixing a bug with unsupported file types. But even a slow increase in resource consumption can eventually exhaust their availability.

A memory leak is any allocation of a resource that has not been freed. For example, if an object was allocated on the heap and the pointer to that object was lost, such resources can no longer be cleaned up. So on a Nintendo Switch console you can simultaneously open no more than 256 files per process, then the kernel descriptors run out and when you open the next file the game crashes somewhere in the bowels of musl, but it also happened that open files took up all the available memory and the game crashed honest OOM, but in a random place.

This rule should not be understood as a mandatory requirement to release all allocated resources when the program terminates. In some cases, you can rely on the operating system to automatically clean up resources, such as closing files or freeing memory when a process terminates

0x12

A newbie came to the department and was given small bugs to familiarize himself with the code base. And to make it not so boring, we were allowed to refactor the code a little in the dev branch, so that there was a history of commits and an understanding of how onboarding is progressing. On Tuesday, a complaint arrived from QA that the editor has become an order of magnitude slower to load levels – small ones still load normally, but large ones have become straight for about twenty minutes. And only in the QA department, everything else is fine. Here we need to explain a little, the testing tool we used also formatted the level configs before approval. QA suffered for a day, and decided to write to the programmers, who, hot on their heels, found this:

void lower(xstring &s) {
    for (int i = 0; i < strlen(s); ++i) {
    	s[i] = tolower(s[i]);
    }
}
Where's the mistake?

And before the fix it was like this, it’s also not great, but no one complained about the speed:

for (int i = 0, size = (int)strlen(s); i < size; ++i) {
	s[i] = tolower(s[i]);
}

The condition uses the expression i < strlen(s). This expression will be evaluated on each iteration of the loop, which means the strlen function must iterate over the string each time to determine its length. Although the content of the string changes, tolower is not expected to affect the length of the string, so it is better to keep the length of the string outside of the loop to avoid unnecessary overhead on each iteration. Now imagine what happens when the formatter tries to process a 40MB level config.

0x13

We sent another patch for boxing, but it doesn’t pass the tests. Microsoft at first simply froze and rejected the builds, then they sent compliance for forgotten sensitive data. What the hell are so forgotten?

Subject: Urgent Security Alert: Unused Encryption Keys Found in Memory
Hey Team,
Hope you're all doing great! During our recent testing sessions, we came across a serious issue regarding some leftover encryption keys in memory tied to our Secure Layered Login (SLL) system. It looks like these keys were overlooked, and if we don't deal with them quickly, they could potentially expose sensitive data to unauthorized access, the approval pipeline will be moved to pre-stage policy after one week.

Было: CL125332
void XBL::UserLogin::requestKey() {
    char secretKeyPwa[MAX];
    // ...
    // something work
    // ...
    memset(secretKeyPwa, 0, sizeof buffer);
    ENSURE(secretKeyPwa[0] == 0);
}
<->
Cтало: СL127499
void XBL::UserLogin::requestKey() {
    char secretKeyPwa[MAX];
    // ...
    // something work
    // ...
    memset(secretKeyPwa, 0, sizeof buffer);
    assert(secretKeyPwa[0] == 0);
}
Where's the mistake?

For those who read the previous article, this error will seem familiar. And, yes, don’t you think the compiler doesn’t care about your sensitive data. The ENSURE() macro compared the value from a variable with zero, which made the memset function work. The assert() macro does nothing in release, and the compiler also removed the memset() call and removed sensitive data clearing logic.

0x14

Wonderful Monday morning, a bug arrives from the designers that the entity at the level has ceased to determine the presence of a right and left hand in the skeleton, bones and locators were obtained from there to attach the hands to the corresponding places on the weapon. Or she determined that there were hands, but confused them in places, which caused various graphic glitches, such as the left hand on the butt, etc.

using namespace std;

struct UnitComponent {
    string name;
    int number;

    UnitComponent(string nm, int num) : name(nm), number(num) {}
    virtual bool operator==(const UnitComponent& a) const {
        return name == a.name && number == a.number;
    }
    // ...
};

struct UnitHand : public UnitComponent {
    bool left_hand;

    UnitHand(string nm, int num, char ch) : UnitComponent(nm, num), left_hand(ch) {}
    virtual bool operator==(const UnitHand& a) const {
        return UnitComponent::operator==(a) && left_hand == a.left_hand;
    }
    // ...
};
Where's the mistake?

The reason is also trivial: colleagues refactored their code and decided, instead of explicit comparisons of component attributes, to write comparison operators and overload them in the necessary classes. But they just forgot that you have to be very careful with such an overload, otherwise everyone risks comparing the parameters of the base class instead of their own. Of course there are ways to force
The comparison operator works across the entire inheritance hierarchy, but such naive approaches will work with errors. Try playing with this code in onlinegdb.

int main() {
    UnitComponent comp{ "test", 1 };
    UnitHand hand{ "test", 1, true };
    printf("comp == hand (%d)\n", comp == hand);    // compares name and number, ignores unit's left_hand
    UnitHand leg{ "test", 1, false };
    printf("hand == leg (%d)\n", hand == leg);   // compares name, number, and left_hand
    UnitComponent &hand_ref = hand;
    UnitComponent &leg_ref = leg;
    printf("hand_ref == leg_ref (%d)\n", hand_ref == leg_ref);  // ???

    return 0;
}

0x15

It happens that even Rockstar renderers make mistakes, who, in pursuit of perfection, write all sorts of things on Friday evenings. And then on Monday morning, a hundred changes will be reversed. In theory, the texture pool should contain empty textures that would be taken from there as objects are created. But something went wrong, and the engine crashed on the first call to the texture.

struct TextureBase {
    virtual void update() = 0;
    std::shared_ptr<int> sp;
};

struct TextureDX12 : public TextureBase {
    void update() override {}
};

void init(std::vector<TextureDX12> &a) {
    memset(a.data(), 0, sizeof(TextureDX12) * a.size());
}
Where's the mistake?

Attempts to trick the mechanism for creating an instance of a type usually end in failure. The class constructor must create a fully initialized object and there is no need for additional initialization. And using memcpy/memset to change the data of a class that is not trivially copyable leads to undefined behavior. And this is one of the most common mistakes when working with non-trivial data types, which leads to memory corruption. The error happened because one of my colleagues made a new texture class for DX12 and it was initially without vtbl, and another colleague highlighted the base class and added virtuality. Both commits formally passed the review without errors, but when they overlapped each other, it turned out to be garbage.

0x16

Another example of inattention or an accidental coincidence, the game did not have enough memory on the iPhone6, where the effect pool had to be cut. As a result, on busy scenes we got graphic artifacts in random places. But what’s funny is that the game did not crash, and sometimes these bugs were not noticed at all.

std::array<Particle, 2048> particles;
std::array<ParticleShader, 2048> shaders;

void updateEffects() {
    for (int i = 0, size = particles.size(); i < size; ++i) {
        //
        if (/* something */) ++i; 
        //
        ... some logic
    }
}
Where's the mistake?

Cycle management should allow you to correctly understand what is happening inside. Changing the loop counter both in the iteration expression and inside the loop body led to i going beyond the array boundaries and memory corruption in shaders (in rare cases), which consequently led to graphical artifacts. There was no such problem on PC because the buffer was larger and never filled completely. on mobile, the shader data was corrupted and looked bad if rendered. But more often the update of some effects was simply skipped.

0x17

You can mess up even in three lines, especially if the code has become multi-threaded; in some cases, a separate unit on the stage did not move. Investigation of the bug showed that Mover (this is the component that was responsible for navigating the unit around the stage) did not receive updates, so it was not in the pool of movers. Will you find a mistake?

Mover MoverPool::acquire() {
    static int init = 0;
    if (!init) {
        init = 1;
        create_pool();
    }

    return create_mover();
}
Hidden text

Although the init variable works perfectly in a single-threaded environment, in a multi-threaded environment two static variables lead to data races, which causes undefined behavior. In a multi-threaded application, if multiple threads access these static variables simultaneously, they may modify them in unpredictable sequences. As a result, we received two pools of movers, the majority submitted to one, and one loser submitted to the second.

0x18

You may think that you have fixed the problem, but it returns after some time. For example, in this simple bug, sometimes there were extra units left on the level, and the engine asserted this in the log. But since this all happened while the level was unloading, no one paid much attention. The person who was responsible for this system came up with a brilliant fix. When they stopped passing tests, he removed or added [volatile] and the tests started working. There are a dozen commits in history with such fixes.

[volatile] int free_slots = max_slots;

PathFinder* use()
{
    if (int n = free_slots--) return &finders[n];
}
Where's the mistake?

In C++, the volatile keyword does not enforce atomicity, does not synchronize across threads, and does not prevent instruction shuffling (either at the compiler or hardware level). It simply has nothing to do with competition. A colleague was just lucky, there’s no other way to describe it, probably by tinkering with the code in this place, he shifted the point of the error and the tests didn’t catch it for some time. The correct fix was to use atomic.

0x19

Not exactly a mistake, but it also spoiled my nerves. The price of such a code plus a few seconds to load the level. A little information on the bug; the unit has components that it loads into itself at the start of the level. It can be anything – a hand, a gun, a grenade and some other attachment, a vision system, etc. Objects can have dozens and hundreds of such components, and there can also be tens and hundreds of objects themselves on the stage – from decals and puddles, to cars and buildings.

std::map<ComponentId, Component> components;

bool ComponentVec::add(Component c) {
    ...

    if (components.find(c.id()) == components.end()) {
        components.insert({c.id(), c});
    }

    auto &compc = components[c.id()];
    ...
}
Where's the mistake?

And the mistake here is that the search in the component map is performed THREE times in the worst case, the first on find(c.id()), the second on insert() if there are no components. And the third one in the index access operator.

0x1a

And then the memory in the particle manager was slowly leaking. Can you find the culprit?

void Particle::CreateVerties(size_t p)
{
  std::shared_ptr<Vertex> ss(new Point[p * 3]); // each p is 3 vertex
  ....
}
Where's the mistake?

It will probably be in the comments 🙂

0х1b

An offensive copy-paste and an offensive mistake, the engine played only the first sound specified for the entity. But since there were usually 1 sound, rarely 2 or more, the error lived on for a long time.

int Entity::update_sounds(...)
{
  sounds.lock();
  sounds.commit();
  ...
  int i, size = sounds.size();
  for(i = 0; i < size; ++i); {
    sounds[i].pre_update(dt);
  }

  for(i = 0; i < size; ++i); {
    sounds[i].update(dt);
  }
  ...
  sounds.unlock();
}
Where's the mistake?

Please note that the for loop does not do any work, but because the variable i is moved outside the loop, it has a valid value of 0. Only the first sound in the array is updated. And the code below would have been copied and the error would have moved there.

0x1c

Another common, stupid, but difficult to detect error. Models are assigned a shader index, in most cases everything worked, but in rare situations the model was rendered with graphical errors. The shader index cannot be greater than 65k (shader cache size). There were no errors on Ps5, but on Ps4 the models had errors, and the game did not crash because the renderer worked normally with deferred shaders.

using AssignShaderCB = std::function<int()>;

void LoadModel() {
    BaseModel *model = ... 
    ...

    model.setShader([&] (Model* m) {
        const uint16_t shaderIndex = dummyShader();  
        return [&] () {
            shaderIndex = createShader(m);
            return shaderIndex;
        };
    });
}
Where's the mistake?

setShader() set a callback to get the shader index; on ps5, due to the faster cpu and pre-assembled shaders, this callback was called immediately and returned the correct index, which was still on the stack. On ps4, the callback was delayed during heavy load and called later, but the stack where it existed no longer existed and garbage was returned. And since any index within the cache was valid, the models did not receive the shaders they were supposed to. Banal dangling reference.

0x1d

And then there was some wild coincidence of circumstances, the moon was in the wrong phase, a solar storm and a witches’ Sabbath in the next office. But the code passed the tests and even worked for some time.

void BaseBehavior::ChaseEnemy(Human *h) {
    ...
    HumanType enemy = h->type();
    if ( enemy == HumanType::Corpse) {
        return;
    }

    if ( (enemy =! HumanType::Friend) || (enemy == HumanType::Follower)) {
        ...
    }
}
Where's the mistake?

The author confused != and =! and it turned out that the enemy variable was assigned !HumanType::Friend, which was non-zero, the result was 0. And participation in a logical expression hid the error and only the right side was checked. The code passed the tests and was identified only during manual review.

0x1e

In one fairly well-known engine, for a long time there was its own function for converting a number into a hex, apparently out of the box it was not satisfactory with the speed or code style. Until one of the maintainers, in a fit of unbridled refactoring, decided that there were too many zeros in this function and uploaded this commit, bypassing the tests, so what – the changes are the most banal.

до "фикса"
const char buffer[] = "abcdef\0000123456789";

после "фикса"
const char buffer[] = "abcdef\0123456789";
Where's the mistake?

Only the author of the “fix” forgot that \0xx in the middle of a line is interpreted as a number, and the unfortunate placement of digits after such a zero led to the fact that they became part of this number, i.e. it was \000 – line termination, but now \012 – new line. But it broke the algorithm, and they quickly caught it.

0x1f

Complex, this function was used in a well-known physics engine to search for an arbitrary volume defined by several planes, potentially any number of planes. It worked perfectly if there were less than 16 planes, and could fall if there were more. It should work quickly, because it was often called in runtime to check collisions between objects.

// NOTE! convex volume created in local space, you need apply xform from owner for correct world positions
template<typename T, typename E, typename V>
inline bool convex_volume_from_planes(const T &clip_planes, E &out_edges, V *out_vertices) {
    const u32 num_planes        = clip_planes.size();
    ...

    for (u32 i = 0; i < num_planes; ++i) {
        pair<vec, vec> * plane_egdes = alloca (16 * sizeof(vec) * 2);
        u32 next_plane_start_index = out_vertices ? 0 : (i + 1); // if we want to collect tris from planes, it should check all intersections
                                                                 // if we need only edges we can skip some checks, because edges was found on previous step
        for (u32 j = next_plane_start_index; j < num_planes; ++j) {
            vec line_orig, line_dir;
            ...

            vec *plane_vectices = alloca(sizeof(vec) * 3); // maybe expanded to 12 later
            ...
        }
    }
}
Where's the mistake?

In general, memory allocations in a loop are bad; they are three times worse if they are allocations on the stack, because there memory is only allocated, but not returned. The operation of this algorithm in difficult cases selected all the stack memory and crashed the engine.

That's all! Bad tickets to everyone.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *