How to Kill a Unicorn or an Attempt to Clean Up Variable Initialization in C++

You know, I never really thought about how good or bad variable initialization is in C++. I just used it. And had no problems. But recently I watched a couple of videos, skimmed through a few articles and yes, I have to admit… it really is terrible. One very serious person even said that we, as a programming community, are to blame for C++ not being as good as it could be.

Okay, let's use our imagination and see what we could change to improve this situation. For those who already understand what I'm talking about, I want to reassure you right away. We won't go too deep into this rabbit hole. We'll just look at what everyone faces, as well as hypothetical ways to bring some minimal order to all of this.

I haven't written anything professionally for a long time and I don't know the newest, most fashionable tricks, so if any of these proposals conflicts with what already exists, please don't kick me too hard. If this article gets a positive response, we can send the proposals presented in it to the C++ standardization committee and who knows, maybe we really can change something for the better.

This article requires a good knowledge of C++. It is not intended for beginners or those who want to learn the language.

Terminology used

I think that for most people our “new” terms will not be so new, but they will immediately tell us what we are going to do.

Aggregate assignments – Assigning an anonymous object to a named one. For example: my_obj = { 2, 6 }. This differs from aggregate initialization in that we do not zero out what is omitted and can use such operations for purposes other than initialization.

Special constructors – Conversion constructors and copy constructors. The peculiarity of such constructors is that they can be called implicitly.

Special assignment operator – An assignment (or copy) operator that takes an object of another type as a parameter.

INITIALIZATION SCENARIOS

Often, when you find yourself in a situation where it's unclear where to go next, it's helpful to take a step back and look at the situation from a point before anything happened. Below, I propose to consider variable initialization in C++ as it is presented today, from the perspective of old-school programming. When trees were big and manuals were small.

So, when talking about variable initialization, it is important to understand that there are two possible scenarios:

1. We want to speed up development and we don't care if we lose (very little) performance due to zeroing out variables that may remain uninitialized.

2. We need full control over what is initialized and how.

Let's consider these situations separately.

“We Don't Care” Scenario

This approach is used in most modern programming languages. In terms of initialization, this means that the compiler zeroes out everything that is not specified. This scenario can be useful when:

  • We are newbies who don't yet know why initialization is needed.

  • We might just forget (or we are spoiled by another language).

  • This is necessary if we want to “reinvent” some C++ mechanics without breaking the existing code.

Without going into too much detail, there are three different initialization syntaxes in C++: initialization without parentheses, initialization using parentheses, and initialization using curly braces.

Initialization without parentheses

Recommendation: Let's use initialization without parentheses to initialize an object or variable by default. In the “We don't care” scenario, this is almost the same as value-initialization.

For example:

class MyType
{
     int a;
     int b = 5;
};

int val;                     // val=0
MyType obj;                  // a=0, b=5

int* pi = new int;           // *pi=0
MyType* pobj = new MyType;   // pobj->a=0, pobj->b=5

char s1[5];                  // s1[0]=s1[1]=s1[2]=s1[3]=s1[4]=0
MyType a1[2];                // a1[0].a=0, a1[0].b=5, a1[1].a=0, a1[1].b=5

MyType* a2 = new MyType[2];  // a2[0].a=0, a2[0].b=5, a2[1].a=0, a2[1].b=5
char*   s2 = new char[5];    // s2[0]=s2[1]=s2[2]=s2[3]=s2[4]=0

Here, everything that is not initialized to the specified values is initialized to zero.

Important point: if a class has a default constructor, it will be called implicitly in all the above cases, even for array elements.
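For comparison, current C++ already gives this "zero whatever is not specified" behavior when the initializer is an empty pair of braces or parentheses (value-initialization). Below is a minimal sketch of that. The struct name Pair and the helper function are hypothetical; Pair is simply a public-member stand-in for MyType so that the fields can be inspected.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for MyType with public fields.
struct Pair
{
    int a;       // no default member initializer
    int b = 5;   // default member initializer
};

// Returns true when value-initialization zeroes what is not specified
// and keeps the default member initializer of b.
inline bool value_init_zeroes_unspecified()
{
    int val{};                               // val == 0
    Pair obj{};                              // obj.a == 0, obj.b == 5
    std::unique_ptr<int> pi(new int());      // *pi == 0
    std::unique_ptr<Pair> pobj(new Pair());  // pobj->a == 0, pobj->b == 5
    char s1[5] = {};                         // all five chars are 0

    bool chars_zero = true;
    for (char c : s1)
        chars_zero = chars_zero && (c == 0);

    return val == 0 && obj.a == 0 && obj.b == 5 &&
           *pi == 0 && pobj->a == 0 && pobj->b == 5 && chars_zero;
}
```

The recommendation above amounts to making the plain forms (int val; and new int;) behave the way the brace and parenthesis forms in this sketch behave today.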

Initialization using parentheses

Recommendation: Let's use parenthesized initialization to call a specific constructor.

AND ONLY FOR THIS. We don't need parentheses for initialization with values, because we do that anyway. For initialization using parentheses, we have two types of syntax:

MyType obj(...);             // syntax 1
MyType obj = MyType(...);    // syntax 2 involves copy-elision

I think both are valid because syntax 1 is concise and syntax 2 is more versatile. Using parentheses to initialize built-in types looks weird, but it doesn't break the overall concept because it mimics the copy constructor.

Let's look at some examples:

class MyType
{
     int a, b, c = 5;
  
 public:
     MyType(int b) { this->b = b; }
};

int val(7);                    // val=7
int val = int(7);              // val=7

MyType obj(3);                 // a=0, b=3, c=5
MyType obj = MyType(3);        // a=0, b=3, c=5

int* pi = new int(7);          // *pi=7
MyType* pobj = new MyType(3);  // a=0, b=3, c=5

Important point: Syntax 1 can create problems in a function body when the compiler parses it as a function declaration, which is what happens with empty parentheses. Some serious people even recommend using curly braces instead of parentheses because of this. I wouldn't do that. If we initialize everything by default, empty parentheses are unnecessary. And if a default constructor is defined, it will be called without them.

In other words, when your compiler has trouble interpreting empty parentheses, just don't use them at all, it will produce the same result.
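The empty-parentheses problem (often called the "most vexing parse") can be observed directly in current C++. This is a small sketch; Widget and the helper function are hypothetical names used only for illustration. Compilers usually warn about the first declaration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical type with a default member initializer.
struct Widget
{
    int value = 7;
};

// Returns true if `Widget w();` was parsed as a function declaration
// and the parenthesis-free form produced a properly initialized object.
inline bool empty_parens_declare_function()
{
    Widget w();   // NOT an object: declares a function w returning Widget
    Widget v;     // an object; the default member initializer sets value to 7

    return std::is_function<decltype(w)>::value && v.value == 7;
}
```

Dropping the parentheses, as suggested above, sidesteps the ambiguity entirely.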

Initialization using curly braces

Recommendation: Let's use the curly brace initialization syntax for “aggregate assignments”.

I understand this term as assigning the values of some anonymous object to the fields of ours. Therefore, I would advise always using curly braces with an equal sign. The most common case of initialization using curly braces is aggregate initialization:

MyType  o1 = { 2, 4, 6 };                    // o1.a=2, o1.b=4, o1.c=6
MyType* o2 = new MyType { 2, 4, 6 };         // o2.a=2, o2.b=4, o2.c=6

With arrays it looks like this:

char   s1[] = "abcdef";                      // no braces but still 'curly-braces stuff'
MyType a1[] = { { 2, 4, 6 }, { 2, 4, 6 } };  // a1[0].a=2, a1[0].b=4, a1[0].c=6, a1[1].a=2, a1[1].b=4, a1[1].c=6

char*    s2 = new char[] { "abcdef" };       // s2="abcdef"
MyType*  a2 = new MyType[] { { 2, 4, 6 }, { 2, 4, 6 } };

There is a special syntax that can be useful for returning an object to its original state:

obj = { };

And finally, the strangest initialization, which I would not recommend, but it already exists, and I must admit, it is quite convenient:

MyType obj = { .a=2, .b=4, .c=6 };  // designated initialization

In addition, I would like to suggest the following:

– Leave omitted values unchanged instead of zeroing them, as aggregate initialization currently does. If everything is already initialized with default values, the object is correctly initialized and we can reuse the same syntax later without unnecessary zeroing.

– Another syntax that allows you to skip values:

MyType obj = { 2,  , 6 };  // a=2, c=6, b is left intact

Now let's look at the innovations (although they were introduced quite a long time ago) that can cause the most unexpected problems in completely unexpected places.

Where things start to get weird with curly braces is when you add a constructor to a class. In this case, the compiler turns off aggregate initialization and calls the constructor, using the values in the braces as its arguments. This is handy when initializing arrays, but it's far from obvious:

MyType* arr = new MyType[5] { { 2, 4, 6 }, { 2, 4, 6 } };

In the example above, we allocate an array of five objects and call a constructor that takes three parameters on the first two. Guess what happens to the other three objects.
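As far as I understand the current rules, the remaining elements are initialized from an empty list, which for a class like this means a call to the default constructor; without one, I would expect the expression to be rejected. Here is a sketch under those assumptions. Triple and first_field_of are hypothetical names, and the default constructor writes -1 only so that its effect is visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class with both a three-argument and a default constructor.
struct Triple
{
    int a, b, c;

    Triple() : a(-1), b(-1), c(-1) {}                         // marks default-constructed elements
    Triple(int a_, int b_, int c_) : a(a_), b(b_), c(c_) {}
};

// Builds the five-element array from the text and reports field `a` of element i.
inline int first_field_of(int i)
{
    std::unique_ptr<Triple[]> arr(new Triple[5]{ { 2, 4, 6 }, { 2, 4, 6 } });
    return arr[i].a;
}
```

Elements 0 and 1 come from the three-argument constructor, and elements 2 through 4 come from the default constructor.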

Sometimes this can be quite unexpected. What do you think the following line will do for an object with a constructor that takes three parameters?

obj = { 2, 4, 6 };

That's right, it will call that constructor. For an already initialized object… Come on, guys! Initializer lists add to the confusion. The following two lines look similar, but they do completely different things:

std::vector<int> v1(3, 0);  // v1[0]=0, v1[1]=0, v1[2]=0
std::vector<int> v2{3, 0};  // v2[0]=3, v2[1]=0

The first line calls the constructor that takes two parameters (a count and a value), while the second line calls the constructor that takes a std::initializer_list.
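The difference is easy to check by comparing the resulting sizes. This is a minimal sketch; the two helper function names are mine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parentheses select the (count, value) constructor.
inline std::size_t size_with_parentheses()
{
    std::vector<int> v1(3, 0);   // three elements, each 0
    return v1.size();
}

// Braces select the std::initializer_list constructor.
inline std::size_t size_with_braces()
{
    std::vector<int> v2{3, 0};   // two elements: 3 and 0
    return v2.size();
}
```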

There are other situations where initialization errors are easy to make and hard to detect. I think the source of the confusion is trying to do everything in the constructor. This made sense when things were relatively simple. But now it's a real mess.

I have a somewhat controversial concept that I think could solve this problem and, at the same time, tie together everything proposed above. I call it “Three-Stage Initialization”.

Three-stage initialization

Such an initialization would look quite normal, but would include a “special” assignment operator that takes an object of another type as a parameter:

void TypeA::operator= (TypeB&) { }  // special assignment operator

Such an operator, even when present, is never used for initialization. Instead, the compiler looks for a suitable constructor and issues an error if one is missing.
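The current behavior can be confirmed with type traits. In this sketch, TypeA and TypeB are bare-bones placeholders and the assigned flag is my addition, there only to make the operator call observable.

```cpp
#include <cassert>
#include <type_traits>

struct TypeB {};

struct TypeA
{
    bool assigned = false;

    TypeA() {}
    void operator=(TypeB&) { assigned = true; }   // special assignment operator only
};

// Assignment from a TypeB lvalue compiles...
static_assert(std::is_assignable<TypeA&, TypeB&>::value, "a = b is accepted");
// ...but neither form of initialization does, because no constructor accepts TypeB.
static_assert(!std::is_constructible<TypeA, TypeB&>::value, "TypeA a(b) is rejected");
static_assert(!std::is_convertible<TypeB&, TypeA>::value, "TypeA a = b is rejected");

// Two-step version that works today: default construction, then assignment.
inline bool construct_then_assign()
{
    TypeB b;
    TypeA a;   // stage: default constructor
    a = b;     // stage: special assignment operator
    return a.assigned;
}
```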

By changing this behavior, our special assignment operator could be used like this:

class MyType
{
     int a, b, c = 5;
  
     struct init_obj     // initializer-object
     {
           int a, b;
     };
  
public:
     void operator= (init_obj& o) { a = o.a; b = o.b; }
};

MyType obj1;             // default constructor call (if defined)
MyType obj2 = { 2, 4 };  // default constructor call (if defined), then assignment operator call

So the stages of our three-stage initialization are:

– default initialization, which in our case is value initialization;
– call the constructor (if any);
– call a special assignment operator (if specified).

In this case, the first two stages are necessary and sufficient to create a full-fledged object, and the last one is optional. The logic here is extremely simple. Imagine that you declare a variable and immediately assign a value to it. Or you could not assign:

float a = 0.5;
float b;

In both cases you get a variable that is suitable for further use. Three-stage initialization is an attempt to implement the same thing, but for user types and without the frivolities of unified initialization.

The most obvious problem that the proposed approach solves is unexpected constructor calls in the middle of code in lines like:

obj1 = { 2, 4 };
obj1 = obj2;
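For reference, this is what the first of these lines does in current C++ when the class has a matching constructor: a temporary is constructed and then assigned. The sketch below counts the calls; Counted and its static counters are hypothetical and exist purely for instrumentation.

```cpp
#include <cassert>

// Hypothetical class that counts constructor and assignment calls.
struct Counted
{
    static int ctor_calls;
    static int assign_calls;

    int a, b;

    Counted(int a_, int b_) : a(a_), b(b_) { ++ctor_calls; }
    Counted(const Counted&) = default;

    Counted& operator=(const Counted& other)
    {
        a = other.a;
        b = other.b;
        ++assign_calls;
        return *this;
    }
};

int Counted::ctor_calls = 0;
int Counted::assign_calls = 0;

// Resets the counters, creates one object, then assigns a braced list to it.
inline void assign_from_braces()
{
    Counted::ctor_calls = 0;
    Counted::assign_calls = 0;

    Counted obj(0, 0);   // first constructor call
    obj = { 2, 4 };      // second constructor call (temporary), then one assignment
}
```

After assign_from_braces() runs, the counters show two constructor calls and one assignment for a single named object.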

What now looks like copying (or assignment) would, in essence, always be copying. But the main thing, as it seems to me, is that this eliminates a whole series of rules and caveats, making everything clear from the code itself, from the syntax. For those who want to look at the current picture as a whole, I can offer the following list for review: Default initialization, Value initialization, Direct initialization, Copy-initialization, List initialization, Aggregate initialization, Reference initialization, Copy elision, Static initialization, Zero initialization, Constant initialization, Dynamic non-local initialization, Ordered dynamic initialization, Unordered dynamic initialization, Class member initialization, Member initializer list, Default member initializer.

We deliberately bypass reference types, since this is a separate topic with its own demons and horrors, deserving a separate discussion. As for the rest, even if the proposed approach does not replace everything listed above, it simplifies things to such an extent that almost nothing needs to be explained.

There is one basic rule. If there are parentheses, then there is a constructor call (a constructor is, strictly speaking, a function). And if there are no parentheses, then the data is copied either using our assignment operator, or byte by byte, or nothing happens at all, since the data is already where it is needed. THAT'S IT. The nuances that may arise are of no interest to beginners, but intuitively understandable to experts.

The only thing to keep in mind is that if a default constructor is defined, it can be called implicitly (and sometimes quite unexpectedly). But we can't do anything about it. Special constructors can also be called implicitly: the conversion constructor and the copy constructor.

Conversion constructors (types)

Such constructors initialize an object using data from an object of another type.

The least we could do is treat a conversion constructor as explicit (as if it were declared “explicit”) whenever an assignment operator with an identical signature exists. But I think we could go further and make all conversion constructors explicit by default. There was a lot of controversy about their “implicitness” at one time. Correct me if I'm wrong, but that's how the aforementioned keyword came into being.

To me, that's a so-so solution. And now we can rethink it. Judge for yourself, the following can be implemented by explicitly calling a special assignment operator or by implicitly using a conversion constructor:

MyType1 obj1;
MyType2 obj2;

obj1 = obj2;             // what is better to use here?

Which option seems more logical to you?

When we try to pass to a function something other than what it accepts, but we have a suitable conversion constructor, it can be called implicitly. Although, no one forbids calling it explicitly and I personally like this option more:

MyType1 obj1;
MyType2 obj2;

void MyFunction(MyType1 a) { … }

MyFunction(obj2);           // implicit type conversion
MyFunction(MyType1(obj2));  // explicit type conversion
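Today the choice between the two call styles is controlled per constructor with the explicit keyword. Here is a small sketch of both variants side by side; all type and function names in it are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

struct Source {};

struct ImplicitTarget
{
    ImplicitTarget(const Source&) {}            // conversion constructor, callable implicitly
};

struct ExplicitTarget
{
    explicit ExplicitTarget(const Source&) {}   // must be spelled out at the call site
};

inline int take_implicit(ImplicitTarget) { return 1; }
inline int take_explicit(ExplicitTarget) { return 2; }

inline int call_both()
{
    Source s;
    int r1 = take_implicit(s);                  // implicit type conversion
    int r2 = take_explicit(ExplicitTarget(s));  // explicit type conversion; take_explicit(s) is rejected
    return r1 + r2;
}

static_assert(std::is_convertible<Source, ImplicitTarget>::value, "implicit conversion allowed");
static_assert(!std::is_convertible<Source, ExplicitTarget>::value, "implicit conversion blocked");
static_assert(std::is_constructible<ExplicitTarget, Source>::value, "explicit construction still allowed");
```

Making conversion constructors explicit by default would effectively turn every class into ExplicitTarget unless the author opts out.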

The last use of the conversion constructor is potentially dangerous and is, in fact, the initialization of the object:

MyType1 obj1;

MyType2 obj2(obj1);      // constructor syntax
MyType2 obj2 = obj1;     // assignment syntax

From a unified initialization perspective, the last two lines are identical and call the conversion constructor. But in the case of three-stage initialization, the last line is a call to the default constructor, then the assignment operator, and depending on what it accepts, possibly the conversion constructor. Ideally, we should get the same result, but in practice we will most likely get a compilation error.

I don't think that the problems described above should discourage us from making conversion constructors explicit by default in the future. It could be a significant improvement. If we don't want to get rid of the implicit invocation mechanics completely, perhaps we could introduce some keyword indicating that a particular constructor can be called implicitly.

Copy constructors

When we pass an object by value to a function, the compiler rushes to create a copy of it for us. If our object is not an aggregate, it will need a copy constructor for this.

Here, too, it all depends on how far we are willing to go. I think we could do without such a constructor altogether. In this case, the process of copying an object would be a call to the default constructor followed by a call to the assignment operator. The assignment operator in this case is NOT special, but in essence it is the same three-stage initialization.

At first glance it looks cumbersome, but this approach is safer. The default constructor will be called for us automatically, and later we will consider how to remind the programmer not to forget to write the corresponding assignment operator. As for the special copy constructor, who ever remembers about it? Of course, if such a constructor is already defined, we should use it in order not to break the existing code.

A prohibition on copying an object might look like this:

MyType& operator= (MyType&) = delete;
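It is worth noting that in current C++ this line blocks assignment only; copy construction remains available unless the copy constructor is deleted as well. The sketch below checks that with type traits. NoAssign and the helper function are hypothetical names, and some compilers may emit a deprecation warning for the implicit copy constructor.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class with only the assignment operator deleted.
struct NoAssign
{
    int value = 0;

    NoAssign() = default;
    NoAssign& operator=(NoAssign&) = delete;
};

static_assert(!std::is_copy_assignable<NoAssign>::value, "assignment is blocked");
static_assert(std::is_copy_constructible<NoAssign>::value, "copy construction is still allowed");

inline int copy_by_construction()
{
    NoAssign a;
    a.value = 42;
    NoAssign b(a);   // allowed: uses the implicitly generated copy constructor
    // b = a;        // rejected: the assignment operator is deleted
    return b.value;
}
```

Under the proposal, where copying is default construction followed by assignment, deleting the assignment operator alone would be enough.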

Having said all that, it should be noted that there is no objection to constructors that are identical in functionality to conversion constructors and copy constructors, but without the “superpower” of being called “where you least expect it”.

Let's look at array initialization using assignment operators:

MyType arr[10] =    // allocate an array of 10 elements
{
     { 2, 4, 6 },   // invoke 'special' assignment operator taking 3 ints
     "asdf",        // invoke 'special' assignment operator taking a string
     ,              // invoke no 'special' assignment operator
     { { 2, 4 }, { 3, 7 } }, // invoke 'special' assignment operator taking initializer-list
     { 2, 4 },      // invoke another 'special' assignment operator
     SomeObj        // invoke another 'special' assignment operator taking SomeObj
};

In the example above, we allocate an array, call the default constructor for all of its elements, and then various assignment operators for some of them. Note that this is the classic initialization syntax, where each element of the array has its own initializer object. No special mechanics for passing parameters to the constructor. Everything is within the general rules.

The assignment operator has an interesting feature. It always takes one parameter. We could use this to get rid of the initializer object by writing out its contents in the operator signature. In this case, our special assignment operator would look like this:

void operator= (int a, int b) { }  // assignment operator taking two values

Unfortunately, for initializer lists we still need an initializer object. This is how I see the ideal assignment operator, taking an initializer list as a parameter:

void operator= (init_obj (&o)[])  // won’t compile
{
     for (int i=0; i < ARRAYSIZE(o); i++)
     {
          ...
     }
}

The above won't work because the size of the array being passed isn't specified. But you can do it like this:

template<size_t N> void operator= (init_obj (&o)[N])
{
     for (size_t i=0; i < N; i++)
     {
          ...
     }
}
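The templated form can be exercised in current C++ with a named array. Below is a minimal sketch; InitObj, Summed and the helper function are hypothetical, and I take the array by const reference, which is my own choice rather than part of the proposal.

```cpp
#include <cassert>
#include <cstddef>

struct InitObj { int a, b; };   // hypothetical initializer-object

class Summed
{
    int total = 0;

public:
    // Assignment operator taking an array of initializer-objects; N is deduced.
    template <std::size_t N>
    void operator=(const InitObj (&items)[N])
    {
        total = 0;
        for (std::size_t i = 0; i < N; ++i)
            total += items[i].a + items[i].b;
    }

    int sum() const { return total; }
};

inline int sum_via_array_assignment()
{
    const InitObj items[] = { { 2, 4 }, { 3, 7 } };
    Summed s;
    s = items;   // N is deduced as 2
    return s.sum();
}
```

The array size travels in the type, so no separate length argument or container interface is needed.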

I think this is too basic functionality to use interfaces here. Well, this is not C#. And range-based for could also be implemented without them.

Initializers in a constructor definition

The final piece of the puzzle is the initializers in the constructor definition. Consider the following example:

struct A
{
     int e, f;
};

class T
{
     int c;

public:
     T(int c) { this->c = c; }
};

class MyType
{
     int i;     // value
     A a, b;    // aggregates
     T t;       // non-aggregate

public:
     MyType();
};

Constructor MyType can use initializers in its definition as follows:

MyType::MyType() :
i(5),           // assignment
a{ 2, 4 },      // aggregate initialization
t(9)            // constructor call
{
}

There's probably some logic to the fact that we can't use the equals sign here. But if we were to break that taboo, we could rewrite the above in a more traditional way, simplifying things a bit:

MyType::MyType() :
i = 5,          // assignment
a = { 2, 4 },   // aggregate initialization
t(9)            // constructor call
{  
}
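The equals-sign form is already accepted in one place today: default member initializers written in the class body. Here is a sketch of the same members initialized that way. MyTypeWithDefaults, the get() and sum() accessors and the helper function are hypothetical additions for illustration.

```cpp
#include <cassert>

struct A
{
    int e, f;
};

class T
{
    int c;

public:
    T(int c_) : c(c_) {}
    int get() const { return c; }
};

class MyTypeWithDefaults
{
    int i = 5;        // assignment-style initializer
    A a = { 2, 4 };   // aggregate initialization with an equals sign
    A b = {};         // e == 0, f == 0
    T t = T(9);       // constructor call

public:
    int sum() const { return i + a.e + a.f + b.e + b.f + t.get(); }
};

inline int defaults_sum()
{
    MyTypeWithDefaults obj;   // the implicit default constructor applies the initializers above
    return obj.sum();
}
```

A constructor's own initializer list still overrides these defaults member by member, and there it accepts only parentheses or braces.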

Correct me if I'm wrong, but that's probably all we need for initialization. The proposed approaches allow for development, but the point of what we're doing is not to invent something new. Quite the contrary, we've taken the existing logic of language functioning, removed a little, added a little, and tried to fit it all into the simplest theory possible. We're trying to bring order. From this point of view, the variety of formats that unified initialization offers doesn't help us at all. And we don't need narrowing or any of the other “side benefits” of all these syntactic quirks.

Anyway, unless I'm forgetting something important, I suggest putting the unicorn in the toy box and marking all other initialization options as “deprecated”.

Disadvantages of three-stage initialization

1. I'm afraid we may still break the existing code. This problem could be mitigated by introducing some keyword for assignment operators (whether special or not) intended for initialization, for example:

__initr__ void TypeA::operator= (TypeB&) { }  // for initialization

After some time, this keyword could be abolished.

2. Three-stage initialization can lead to code duplication. For example, if we want to use constructor call syntax and assignment-format initialization syntax at the same time. But if the point of implementing all possible scenarios in the constructor was to make the code simpler and protect the programmer from errors, then this task has completely failed.

3. If we define any constructor, then the default assignment operators implementing byte-by-byte copying must be blocked. This is necessary to avoid overwriting an initialized object, and also to remind the programmer to write an assignment operator that allows the object to be copied. If byte-by-byte copying is still acceptable, it can be requested explicitly:

MyType& operator= (MyType&) = default;

In a sense, we have “unified” the assignment operators and now they work the same in different parts of the program, but only with an already initialized object.

Scenario “We need to optimize”

This is the case where we need to control what is initialized to zero and what is not. Since this is a rare case, I would suggest introducing a new keyword that specifies variables that should be left uninitialized. For example, “noinit”:

noinit double val;     // val remains uninitialized (UB)

This is not required for global and static variables because they are initialized to zero anyway, but it is required for class and structure members:

class MyType1
{
     noinit int a;    // UB
     noinit double b; // UB
     int c;           // c=0
};

class MyType2 noinit
{
     int a;           // UB
     double b;        // UB
     int c;           // UB
};

With arrays this could be done like this:

int* pi = new noinit int;             // *pi contains garbage

noinit char str1[5];                  // str1 contains garbage
char* str2 = new noinit char[5];      // str2 contains garbage
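In current C++ the roles are reversed: the plain form leaves the memory uninitialized and zeroing has to be requested with empty parentheses. This sketch shows both forms. The helper function name is mine, and the uninitialized buffer is deliberately written before it is read.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Returns true if the parenthesized array is all zeros and the
// plain array holds exactly what was written into it.
inline bool parenthesized_new_zeroes(std::size_t n)
{
    std::unique_ptr<char[]> zeroed(new char[n]());   // value-initialized: every element is 0
    std::unique_ptr<char[]> raw(new char[n]);        // default-initialized: contents are indeterminate

    for (std::size_t i = 0; i < n; ++i)
        raw[i] = 'x';                                // write first; reading before this would be a bug

    bool ok = true;
    for (std::size_t i = 0; i < n; ++i)
        ok = ok && zeroed[i] == 0 && raw[i] == 'x';
    return ok;
}
```

The proposed noinit keyword would make the zeroing form the default and turn the uninitialized form into the explicit opt-in.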

Initialization of objects of a custom type is a specific case:

noinit MyType obj;                    //
noinit MyType obj[5];                 // error-prone
MyType* obj = new noinit MyType[5];   //

In theory, we should leave uninitialized those data members that the author of the custom type has not explicitly initialized. There is logic in this, but it would bring back the very uncertainty that we wanted to avoid. The same can be said about arrays of such objects.

One potential problem with this new keyword would be the temptation to use it liberally for local variables. I don't think you should. Let the compiler decide what can be left uninitialized and what can't. And if it's not obvious, it's probably better to zero out the variable. Code that runs a little slower is better than code that terminates unexpectedly.

In conclusion

These were my thoughts on how to make initialization of variables in C++ simpler and more understandable. Of course, this is only my personal opinion, a theory that you may not agree with, but which, in my opinion, is worth considering. Unless the verdict is that all of this is nonsense that nobody needs, we can try to rethink reference types, which with their multiple ampersands have turned into some kind of wild “fetish”. It seems to me that everything can be greatly simplified there.

Thanks for reading. Comments are welcome.
