Uncertain behavior can lead to time travel


The languages ​​C and C ++ are notorious for large areas on maps that are marked with the warning “dragons live here”, And more formally, we are talking about undefined behavior

When faced with undefined behavior, anything can happen. For example, a variable can be both true and false at the same time… John Regehr’s blog has a couple of interesting examples undefined behavior, and several winners the competition announced in the comments for the craziest object code generated by the compiler as a result of undefined behavior.

Consider the following function:

int table[4];
bool exists_in_table(int v)
{
    for (int i = 0; i <= 4; i++) {
        if (table[i] == v) return true;
    }
    return false;
}

What does this have to do with time travel, you ask? Wait a little patience.

First of all, in the loop control circuitry, you may have noticed an offset by one error. As a result, the function will read one more element following the final element of the data array in tablebefore the cycle ends. The classic compiler doesn’t really care about that. It will simply generate the code to read the element out of bounds of the array (despite the fact that this is a violation of the language syntax rules), and will return trueif in memory outside of the array, there would be a suitable value.

On the other hand, a postclassical compiler could do the following analysis:

  • When i is 4, undefined behavior occurs in the code. Since undefined behavior allows me to do whatever I want, I can ignore this case entirely and assume that i will never be equal to four. (If the guess is wrong, something unexpected will happen, but that’s okay, because undefined behavior allows me to act unpredictably.)

  • The case where i is 5 is impossible, because in order for this to happen, we must first go through i = 4, which, as we are allowed to assume, may not happen.

Thus, the postclassical compiler can optimize the function like this:

bool exists_in_table(int v)
{
    return true;
}

So, this already looks strange. The function has been optimized to almost nothing due to undefined behavior. Note that even if the value is not present in table (including even invalid fifth Element), the function is all will return true

We can now take postclassical behavior one step further: because the compiler can assume that undefined behavior will never occur (after all, if it does, the compiler can do anything), for example, it can exploit undefined behavior for optimization purposes.

int value_or_fallback(int *p)
{
 return p ? *p : 42;
}

The above function takes a pointer to an integer and either returns the specified value or (if the pointer is null) returns the fallback value 42. So far, everything is fine.

Now, for debugging purposes, let’s add this line to the function.

int value_or_fallback(int *p)
{
 printf("The value of *p is %dn", *p);
 return p ? *p : 42;
}

The added line contains a bug: it dereferences the pointer pwithout first checking whether it is equal to null… However, this small bug has massive consequences. The postclassical compiler optimizes the function because it sees that the null pointer check is no longer required, as follows:

int value_or_fallback(int *p)
{
 printf("The value of *p is %dn", *p);
 return *p;
}

If the pointer were equal null, then printf would already cause undefined behavior, so the compiler is allowed to do whatever it wants in case the pointer has a value null (even for example, imagine that a pointer is not equal to null).

Okay, this is not surprising. It is possible that this is the kind of optimization you expect from the compiler. (For example, if the ternary operator was hidden in a macro, you might expect the compiler to not check, which is guaranteed to be false.)

But the postclassical compiler can now use an error-prone function to begin time travel.

void unwitting(bool door_is_open)
{
 if (door_is_open) {
  walk_on_in();
 } else {
  ring_bell();
  // ждем, пока дверь откроется, со значением по умолчанию
  fallback = value_or_fallback(nullptr);
  wait_for_door_to_open(fallback);
 }
}

The postclassical compiler can optimize the entire function like this:

void unwitting(bool door_is_open)
{
 walk_on_in();
}

What happened?

The compiler noticed that the call value_or_fallback(nullptr) results in undefined behavior along all code paths. Extending this analysis backwards, the compiler notices that if door_is_open has the meaning false, then in the branch else undefined behavior is invoked along all code paths. Therefore, the entire else branch can be considered impossible.

Now we finally get to time travel:

void keep_checking_door()
{
 for (;;) {
  printf("Is the door open? ");
  fflush(stdout);
  char response;
  if (scanf("%c", &response) != 1) return;
  bool door_is_open = response == 'Y';
  unwitting(door_is_open);
 }
}

A postclassical compiler can remember the conclusion that “if door_is_open is false, then undefined behavior occurs” and rewrite this function as follows:

void keep_checking_door()
{
 for (;;) {
  printf("Is the door open? ");
  fflush(stdout);
  char response;
  if (scanf("%c", &response) != 1) return;
  bool door_is_open = response == 'Y';
  if (!door_is_open) abort();
  walk_on_in();
 }
}

Note that the original code rang the bell before crashing, and the optimized function skips the call and crashes right away. We can say that the compiler went back in time and canceled the ringing bell.

“Journey into the past” is possible even for objects with external visibility, such as files, because the standard allows everything to happen when undefined behavior is detected. And it lets you jump into a time machine and pretend you never called fwrite

Even if you start to argue that the compiler is not allowed to travel in time, you can still see that previous operations have been undone. For example, it might happen that an undefined operation would corrupt the file buffers, so the data was never actually written. Even if the buffers were flushed, this operation could result in a call to the ftruncate function, which was created to logically delete the data just written. Or it could lead to the fact that DeleteFile deleted the file you thought you created.

All of these behaviors have one common, noticeable effect, namely undoing the previous action. Whether the action was performed, changed, or never occurred is a controversial issue from a compiler theory perspective.

The compiler may also have multiplied the effect of an undefined operation in the past.

¹ Author’s note: According to the standard, time travel is acceptable before undefined behavior occurs:

However, if any such execution contains an undefined operation, this International Standard imposes no requirements on an implementation that executes that program with that input (this also applies to operations preceding the first undefined operation).

² Another way to look at this transformation is that the compiler saw that the else branch invokes undefined behavior for all code paths, and then rewrote the code using the rule that undefined behavior allows anything:

void unwitting(bool door_is_open)
{
 if (door_is_open) {
  walk_on_in();
 } else {
  walk_on_in();
 }
}

In this case, “walk_on_in”Was mistaken for“ anything ”.

Additional Information: Note that there are some categories of undefined behavior that may not be obvious. For example, dereferencing a null pointer is undefined behavior, even if you try to counteract the dereference before it does anything dangerous.

int p = nullptr;
int& i = p;
foo(&i); // undefined

You might think that & and are mutually destructed and the result is the same as if you had written foo (p), but the fact that you created a reference to a non-existent object, even if you didn’t essentially do so, causes undefined behavior ( §8.5.3 (1)).

Related articles: What Every C Programmer Should Know About Undefined Behavior. part 1, part 2, part 3

Update: split & into two lines because a single one is the problem.


Material prepared as part of the course “Programmer C”.

We invite everyone to a free demo lesson “WinAPI: writing a GUI application in C and cross-compiling from Linux.” In an open lesson, we will analyze a simple GUI application using WinAPI, as well as compile an executable .exe file for Windows, while in Linux, using the CMake build tool.
>> REGISTRATION

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *