What does it mean to initialize an int in C++?

I recently received this question by email from Sam Johnson. Here is Sam's message, lightly edited:

“Let's take this code in the local scope of a function as an example:

int a;
a = 5;

Many people think that initialization happens on line 1 because websites like cppreference define it this way: “Initializing a variable gives it its initial value at the time of creation.”

However, I am convinced that the initialization occurs on line 2, because [various good C++ books] define initialization as the first meaningful value that goes into a variable.

Can you tell me which line counts as the initialization?”

Excellent question. Cppreference is correct, and for all class types the answer is simple: the object is initialized on line 1 by calling its default constructor.
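To make the class-type case concrete, here is a minimal sketch. Tracer and declare_then_assign are hypothetical names invented for this illustration (they are not from Sam's letter); the type simply records which operation ran on each of the two lines.

```cpp
#include <string>
#include <vector>

// Hypothetical helper type (an assumption for this sketch): it records
// which operation ran so the two lines can be told apart.
struct Tracer {
    static inline std::vector<std::string> log;  // C++17 inline static member
    int value;

    Tracer() : value(0) { log.push_back("default constructor"); }

    Tracer& operator=(int v) {
        value = v;
        log.push_back("assignment");
        return *this;
    }
};

// Mirrors the two-line example: declare, then assign.
std::vector<std::string> declare_then_assign() {
    Tracer::log.clear();
    Tracer a;  // line 1: initialization, the default constructor runs
    a = 5;     // line 2: assignment to an already-initialized object
    return Tracer::log;
}
```

Calling declare_then_assign() returns the two entries in that order, so for a class type line 1 really is the initialization and line 2 is an ordinary assignment.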

But (and you knew there would be a “but”) for a local object of a fundamental built-in type like int, the answer is… a little more complicated. And that's why Sam asked the question: he knows that the language is pretty lax about initializing such local objects, for historical reasons that made sense at the time.

The short answer is that it is perfectly acceptable to say that the variable gets its initial value on line 2. But notice that I deliberately didn't say “The object is initialized on line 2,” and both the code and this answer gloss over the more important question: “Okay, but what if code between lines 1 and 2 tries to read the object's value?”

This post consists of three parts:

  • Before C++26, the situation was quite awkward. But the funniest part is how the Standard describes it today, which I couldn't resist quoting below.

  • In C++26 we made this code safe by default, and we have Thomas Koeppe to thank for this! It was a very important step.

  • In my Cpp2 experiment, this problem has completely disappeared, and all types are treated equally, with guaranteed initialization safety. I want to propose such a solution for ISO C++ itself after C++26, so that ISO C++ can evolve and completely get rid of this problem in the future, if there is a consensus to make such a change.

Let's start with the present, with the status quo that existed before the release of C++26…

Pre-C++26 answer: A variable is never “initialized”

For a few built-in types, such as int, the answer is that no initialization occurs at all in this example, because (strictly speaking) neither line performs initialization. If this surprises you, here is the explanation:

  • Line 1 declares an uninitialized object. It has no initial value, either explicit or implicit.

  • Line 2 then assigns an “initial value.” This overwrites the object's bits and leaves the object with the same value it would have had if those bits had been initialized that way on line 1… but this is an assignment, not an initialization (construction).

However, I think it is reasonable to informally call line 2 “setting the initial value,” in the sense that it writes the first program-meaningful value into the object. This is not formally initialization, but the bits end up the same, and good books might reasonably call line 2 “initializing a.”

“But wait a minute,” someone might say. “I was reading the Standard last night, and [dcl.init] says that line 1 is ‘default-initialization’! So line 1 is initialization!” I can answer both “yes” and “no” to that. Let's look at the formal, precise, and rather funny answer from the Standard, which is just great: the Standard does say that on line 1 the object is default-initialized… but, for types like int, the term “default-initialized” means “no initialization is performed.”

I'm not making this up; see [dcl.init] paragraph 7.
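The distinction can be made visible without ever reading an uninitialized object. The following is a small sketch under my own naming (value_initialized and copy_initialized are hypothetical helpers): the type traits confirm that default-initializing an int runs no code at all, while the two functions show forms that do give the object a value at its declaration.

```cpp
#include <string>
#include <type_traits>

// For int, default-initialization is trivial: no code runs and no value is set.
static_assert(std::is_trivially_default_constructible_v<int>);

// For std::string, default-initialization runs a real constructor.
static_assert(!std::is_trivially_default_constructible_v<std::string>);

// Value-initialization: empty braces zero-initialize a fundamental type.
int value_initialized() {
    int a{};
    return a;
}

// Copy-initialization: the object gets its value at the declaration.
int copy_initialized() {
    int a = 5;
    return a;
}
```

Here value_initialized() returns 0 and copy_initialized() returns 5; only the plain `int a;` form leaves a local object without a value.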

(This is a good time to say: “The Standard is not a tutorial”… In other words, don't read the Standard to learn the language. The Standard is quite precise about what C++ does, and there's nothing wrong with it defining things this way; that's perfectly fine. But it isn't written for the layperson, and no one would blame you for thinking that “default-initialization means no initialization” is an example of cognitive dissonance, Orwellian doublethink (which is not the same thing), or passive-aggressive provocation.)

A related question might be: has the object's lifetime begun after line 1? The good news is that yes, line 1 does begin the lifetime of the uninitialized object, according to [basic.life] paragraph 1. But let's not look too closely at that paragraph's phrase about “vacuous initialization,” because it is another piece of Standardese for the same “this is initialization, but no, just kidding” concept. (Did I mention that the Standard is not a tutorial?) And of course this is a serious problem, because the object's lifetime has begun but it has not yet been initialized to a predictable value. This is the worst thing about an uninitialized variable: reading from it can be a security risk, because it is true “undefined behavior” that can do anything, and attackers can exploit that.

Luckily, in C++26 the security situation is getting much better…

C++26: Things Are Getting Better (Really) and Safer by Default

Just a few months ago (in March 2024, at the Tokyo meeting), we improved this situation in C++26 by adopting Thomas Koeppe's paper P2795R5, “Erroneous behavior for uninitialized reads.” Its name may be familiar to readers of my blog, because I mentioned it in my Tokyo trip report.

C++26 creates a new concept, erroneous behavior, which is better than “undefined” or “unspecified” because it lets us reason about code “which is clearly identified as erroneous” (seriously, that's almost a direct quote from the paper), and because the code is now well-defined, we get rid of the security threat of “undefined behavior.” You can think of it as a tool in the Standard for turning some behavior from “scary undefined” into “well, part of this is our fault because we let you write code that doesn't mean what it should, but you really did write a bug here, and we'll put a fence around this pit of spikes so that you don't fall into it by default.” And the first place this concept was applied is… drum roll… uninitialized local variables.

This matters a great deal, because it means that line 1 of the original example is still uninitialized, but as of C++26 reading it becomes “erroneous behavior”: when the code is built with a C++26 compiler, reading the uninitialized value cannot cause undefined behavior. Yes, it follows that a C++26 compiler will generate different code than before… If there is any chance that the value could be read, the compiler is guaranteed to write an erroneous value that the compiler knows about (the programmer is not guaranteed to be able to rely on it, so it is still zero-trust).

This may seem like a small thing, but it's actually a major improvement, and shows that the committee is serious about actively changing our language to be safe by default. This trend of more safe-by-default code is something we'll be seeing in the near future of C++, and it's a welcome one.

While you wait for your favorite C++26 compiler to add support for this, you can get an approximation of the feature with the GCC or Clang switch -ftrivial-auto-var-init=pattern or with the MSVC switch /RTC1 (start using them right away if you can). They give you almost everything C++26 will give you, except perhaps that they won't generate diagnostics (for example, the Clang switch only produces a diagnostic if you run Memory Sanitizer).

For example, let's look at how this new default behavior prevents secrets from leaking, using a program compiled with and without today's flag (link to Godbolt):

#include <iostream>
#include <string_view>

template<int N>
auto print(char (&a)[N]) { std::cout << std::string_view{a,N} << "\n"; }
 
auto f1() {
    char a[] = {'s', 'e', 'c', 'r', 'e', 't' };
    print(a);
}
 
auto f2() {
    char a[6];
    print(a);  // today this code will probably print "secret"
}
 
auto f3() {
    char a[] = {'0', '1', '2', '3', '4', '5' };
    print(a);  // overwrites "secret"
}
 
int main() {
    f1();
    f2();
    f3();
}

By default, all three local arrays share the same stack storage, and after f1 returns, the string secret is probably still sitting on the stack, waiting for f2's array to be placed on top of it.

In today's C++ by default, without -ftrivial-auto-var-init=pattern or /RTC1, the function f2 will probably print secret. That can cause, shall we say, safety and security problems. This undefined behavior of the no-initialization rule is what gives C++ a bad reputation.

But with -ftrivial-auto-var-init=pattern on GCC and Clang, or /RTC1 on MSVC, and by default from C++26 onward, the function f2 will not leak any secrets. As Bjarne sometimes says in other contexts, “That's progress!” And to those grumblers who might want to say, “Author, I'm used to unsafe code, and getting rid of unsafe code by default is against the spirit of C++,” I say that (a) this is the way things are now, and (b) get used to it, because there will be a lot more of this in the future.
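Separately from any compiler switch, the leak in f2 only exists because the array has no initializer. As a minimal sketch (f2_initialized is a hypothetical variant of f2 that returns the bytes instead of printing them, so the result can be inspected), giving the array an explicit initializer chosen by the programmer means the old stack contents are never read:

```cpp
#include <string>

// Hypothetical variant of f2 (an assumption for this sketch): same buffer,
// but with an explicit initializer, and the contents are returned.
std::string f2_initialized() {
    char a[6] = {};                   // every element is value-initialized to '\0'
    return std::string(a, sizeof a);  // copies all 6 bytes, including the nulls
}
```

The returned string always holds six null characters, whatever a previous call left on the stack. This is a choice made in the code for one particular buffer; it is a different thing from a language-wide default value.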

Addendum: A common question is why not just initialize the variable to 0? This is suggested all the time, but it is not the best answer, for many reasons. The two main ones are: (1) zero is not always a meaningful value for the program, so injecting it often just replaces one bug with another; (2) it often actively hides initialization failures from sanitizers, so we can no longer see the bug and report it. Using an implementation-defined, well-known “erroneous” bit pattern does not cause these problems.
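Point (1) can be sketched with ordinary, fully defined code. In the sketch below, the seed parameter stands in for whatever a forgotten initializer would have left in an accumulator; smallest_from and kDemoPattern are hypothetical names, and the pattern value is an assumption made for this demo (real implementations choose their own).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an implementation-chosen "erroneous" value
// (the 32-bit pattern 0xAAAAAAAA read as a signed integer). Assumption only.
constexpr std::int32_t kDemoPattern = -1431655766;

// Smallest element of `values`, starting from `seed`. The seed plays the
// role of a forgotten initializer for the accumulator.
std::int32_t smallest_from(std::int32_t seed,
                           const std::vector<std::int32_t>& values) {
    std::int32_t smallest = seed;
    for (std::int32_t v : values) {
        smallest = std::min(smallest, v);
    }
    return smallest;
}
```

With a zero seed, smallest_from(0, {3, 7, 5}) returns 0, a wrong answer that looks entirely plausible. With kDemoPattern as the seed it returns the pattern itself, which is just as wrong but far easier to recognize as a bug.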

But this is C++, so you can always take full control and get maximum performance if you want. So yes, C++26 lets you opt out of this if you really want to by writing [[indeterminate]], but every use of this attribute should be scrutinized in every code review and clearly justified by hard performance measurements that demonstrate the need to override the safe default:

int a [[indeterminate]];
    // This is how C++26 lets you say "yes, please hurt me,
    // I want the old dangerous semantics"

After C++26: What else can we do?

Here is the situation before C++26 (the most problematic lines are 4 and 5):

// In today's C++ before C++26, for local variables
 
// Using a fundamental type like 'int'
int a;            // declaration without initialization
std::cout << a;   // undefined behavior: read of an uninitialized variable
a = 5;            // assignment (not initialization)
std::cout << a;   // prints 5
 
// Using a class type like 'std::string'
std::string b;    // declaration with default construction
std::cout << b;   // prints "": read of the default-constructed value
b = "5";          // assignment (not initialization)
std::cout << b;   // prints "5"

It's worth noting that line 5 might not print anything at all… it's undefined behavior, so you're lucky if the only question is whether something gets printed, since a standards-conforming compiler could in theory generate code that wipes your hard drive, summons nasal demons, or pulls some other traditional undefined-behavior prank.

And here's what we start with in C++26 (the differences are in lines 4 and 5):

// In C++26, for local variables
 
// Using a fundamental type like 'int'
int a;            // declaration with some erroneous value
std::cout << a;   // prints ? or terminates: read of an erroneous value
a = 5;            // assignment (not initialization)
std::cout << a;   // prints 5
 
// Using a class type like 'std::string'
std::string b;    // declaration with default construction
std::cout << b;   // prints "": read of the default-constructed value
b = "5";          // assignment (not initialization)
std::cout << b;   // prints "5"

The good news is that our hard drives are now safe: the implementation may print a value or abort, but there will be no undefined behavior.

The fine print: C++26 compilers are required to make line 4 write a known value over the bits, and are encouraged (but not required) to report the problem on line 5.

In my experimental Cpp2 syntax, local variables of every type are declared like this: a: some_type = initial_value;. You can omit the = initial_value part to say that stack space is allocated for the variable but its initialization is deferred, and Cpp2 then guarantees initialization before use: you must initialize it later using = (for example, a = initial_value;) before using the variable in any other way. That gives us flexibility, such as using different constructors for the same variable on different branch paths. So the equivalent example would be (the differences from C++26 are on lines 4-6 and 10-12):

// Local variables in my Cpp2 syntax
 
// Using a fundamental type like 'int'
a: int;              // allocates space, no initialization
// std::cout << a;   // illegal: can't use before initialization!
a = 5;               // construction => real initialization!
std::cout << a;      // prints 5
 
// Using a class type like 'std::string'
b: std::string;      // allocates space, no initialization
// std::cout << b;   // illegal: can't use before initialization!
b = "5";             // construction => real initialization!
std::cout << b;      // prints "5"
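For readers who want something in this spirit in today's C++, here is a rough approximation. It is my own sketch using std::optional and an immediately invoked lambda, not how Cpp2 implements the guarantee, and describe and pick are hypothetical names. Unlike Cpp2, nothing here is enforced at compile time: dereferencing an empty optional would still be a bug.

```cpp
#include <optional>
#include <string>

// Deferred construction: storage is reserved up front, and a different
// constructor is chosen on each branch before the value is used.
std::string describe(bool verbose) {
    std::optional<std::string> label;  // no std::string constructed yet
    if (verbose) {
        label.emplace("verbose mode");  // construct from a string literal
    } else {
        label.emplace(3, '-');          // construct with (count, char): "---"
    }
    return *label;                      // both branches have constructed it
}

// Alternative: an immediately invoked lambda, so the variable is
// initialized at its declaration and can even be const.
int pick(bool big) {
    const int n = [&] {
        if (big) { return 1000; }
        return 1;
    }();
    return n;
}
```

For example, describe(false) yields "---" and pick(true) yields 1000. The optional version costs an extra flag at run time, which is part of why a language-level guarantee is attractive.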

Cpp2 intentionally leaves no easy way to break this pattern and use a variable before it is initialized. To do that, you have to create an array of raw std::byte or something similar on the stack and then perform an unsafe_cast to pretend it's a different type… This is long and complicated to write, because I think unsafe code should be long and complicated to write… but it can be done when necessary, because that's the nature of C++: I may condemn the unsafe code you might write for performance's sake, but I will defend to the death your right to write it when necessary; C++ is always there to let you open the hood and take control. I'm trying to move from a “performance by default, safety always available” model, where you have to put in extra effort to get safety, to a “safety by default, performance always available” model. The metaphor I use for this is that I don't want to take sharp knives away from C++ programmers, because chefs sometimes need sharp knives; but when the knives aren't in use, we just want to keep them in a drawer that we have to open deliberately, not scatter them all over the floor and constantly remind people to watch their step.

So far this model works very well and has a triple advantage: performance (no initialization until you need it), flexibility (I can call the real constructor I need), and safety (real “initialization” with real construction, and never a use-before-initialization situation). I think this might come to ISO C++ someday, and I intend to bring a proposal along these lines to the ISO C++ committee in a year or two, making it as convincing as I can. Maybe they'll like it, or maybe they'll find flaws I haven't noticed… We'll see! In any case, I'll keep you posted on my blog.

Thanks again to Sam Johnson for this question!
