When you come face to face with the problem nullptr! = NULL

Many of us often read articles about bugs and best programming practices to avoid possible mistakes. Sometimes, you just know what you shouldn't do, but in practice you don't encounter a real problem. For me, this was the topic of nullptr != NULL. Initially, I used NULL in system programming in C. When switching to C++, I encountered the NULL macro only in WinAPI, in the code for null pointers I always used nullptr (everyone has already learned that nullptr != NULL). I have never encountered real consequences, where NULL is used instead of nullptr. This was the case until I replaced boost::function with std::function in one component.

The working day started as usual with the analysis of new emergency dumps. Here is a batch of dumps from one service. It looks usual: a callback was called when the object was already destroyed. Most likely, they forgot to unsubscribe from the event during the shutdown, this should be done explicitly. They created a bug for this.

Error(s):
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_function_call> >'
  what():  call to empty boost::function

Over the course of three days, a similar situation appeared in several more services. So, the problem is not just the lack of an unsubscribe. We need to understand the cause of the mass problem and assess the scale. A release candidate should be built in the next few days, mass crash dumps are not allowed. A regular bug fix has been going on for several weeks, there should be no global changes. I check the mailing lists: the other day they updated the version of one component, on which all the crashed services depend. The release dates of the product and the component are different, so the integration happened quite late. There are many changes in the component, but they all relate to replacing boost::function with std::function.

I check the unsubscribe code for events in the downed services. I see some incomprehensible code. At first, you don’t immediately realize that this is the unsubscribe code:

connection.SetCallback(boost::function<void(bool)>());

which is equivalent to:

connection.SetCallback(NULL);

In the connection implementation, when an event occurs, a callback function is called.

Using boost::function in a component:

void SetCallback(boost::function<void(bool)> callback)
{
   ...
}

void ConnectionChanged(bool state)
{
    if (!_callback.empty())
	{
        _сallback(state);
	    // do something
	}
}

After converting the component to std:

void SetCallback(std::function<void(bool)> callback)
{
   ...
}

void ConnectionChanged(bool state)
{
    if (_callback != nullptr)
	{
        _сallback(state);
	    // do something
	}
}

Moreover, if you explicitly pass NULL as a callback to SetCallback, for the case of std::function the error will be detected already at the compilation stage. But when unsubscribing using the default constructor boost::function() the error is detected during code execution. connection.SetCallback(boost::function()) now does not unsubscribe, but sets NULL as a callback. In the ConnectionChanged method, the callback is checked for nullptr, calling an actually empty callback results in a boost::bad_function_call exception.

A search of the entire code base showed that a similar unsubscribe form is implemented in dozens of components. If the entire code is not corrected, many dumps will appear: when turning off devices and changing operating modes (requires a reboot of services). The prospects are so-so. The only suggestion that could be recommended was to postpone the release for a week and fix everything in the code. The suggestion was approved. All teams allocated people to fix the code.

The problem was solved, but required a shift in the release dates. After this incident, the component was checked for backward compatibility (code review was done) before updating to a new version. Now the integration was always postponed until the beginning of the next release.

I think this is a rare case, but it can also be found in the vastness of the codebase.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *