A Tale of static Functions and Unnamed Namespaces in C++

The development team received an urgent task from management to write an application to process user data. The manager quickly decomposed the problem into two and instructed one team to create a data preparation module, and the other to implement the calculations themselves.

So, the first team heroically strained and gave the world the following code:

// parser.h
#pragma once

int process_data(int);
int parse_data(int);

// parser.cpp
#include "parser.h"

int process_data(int raw_data) {
	return 15;
}

int parse_data(int raw_data) {
	return process_data(raw_data);
}

// main.cpp
#include "parser.h"

int main() {
	const int raw_data = 1;
	const int data = parse_data(raw_data);
	return data;
}

They pushed it to master. The build on the server succeeded, and the guys, with a sense of accomplishment, went to the bar with their lead.

The second team also took a responsible approach to the work and wrote a calculation module, not forgetting to accompany it with a succinct comment:

// solver.h
#pragma once

int process_data(int);
int solve(int);

// solver.cpp
#include "solver.h"

int process_data(int temp_data) {
	// - Petka, instruments! - 30! - 30 what? - Well, what "instruments"?
	return temp_data + 30;
}

int solve(int data) {
	return process_data(data);
}

The guys knew that the first team had already written some code in main.cpp, so they simply ran git pull and adjusted the code to wire in the new functionality:

// main.cpp
#include "parser.h"
++#include "solver.h"

int main() {
	const int raw_data = 1;
	const int data = parse_data(raw_data);
	--return data;
	++return solve(data);
}

As it happens, everyone in the development department is very fond of the word “process”. This time the second team gave its helper function exactly the same name as the one the first team had already implemented, without yet knowing it.

When we tried to compile the new code, our favorite IDE suddenly produced a terrible error message:

multiple definition of `process_data(int)';

The matter smelled of unplanned overtime, because no one had expected problems during planning, so no story points had been set aside just in case.

We started to investigate. The youngest team member aptly noted that solver.cpp does indeed mention “process_data” twice, which is what the error seems to point at:

int -->process_data<--(int temp_data) {
	return temp_data + 30;
}

int solve(int data) {
	return -->process_data<--(data);
}

But a more experienced colleague answered him reasonably: “No, we have always done it like that, and there were no errors.”

The most experienced member of the team (hereinafter referred to as “Hero”) decided to take a closer look at the error message, the full text of which is presented below:

/usr/bin/ld: /tmp/ccOIubul.o: in function `process_data(int)':
solver.cpp:(.text+0x1b): multiple definition of `process_data(int)'; /tmp/ccelFhGq.o:parser.cpp:(.text+0x1b): first defined here
collect2: error: ld returned 1 exit status

and saw in it a mention of a file created by the other team, parser.cpp. The phrase “first defined here” seemed to hint that that team had been the first to define something there.

It seemed that a solution was about to be found. The simplest thing to do was to tell the manager that everything was fine on our side: it all built locally until it was merged with the other team's code, so they had probably done something wrong. But no one answered the messages sent to the manager. Apparently they were having a great time at the bar, thought Stirlitz... er, our hero.

Something still had to be done, so we started looking for help on the Internet.
The first link for the query “multiple definition of” led to this advice, translated into Russian, on SO:

The source file in which XYZArray is defined is included twice. .cpp files should not be pulled in via #include, and .h files should have a guard against repeated inclusion.

The include guard was there, and only header files were pulled in via #include. Had #pragma once really failed? He had heard somewhere that it is better to write the guard with #define. He rewrote the headers with #ifndef/#define, but it did not help.

Another tip talked about some kind of linkage and static. Our hero's instinct told him to dig in this direction. It was not for nothing that he had been zapped by static electricity that very morning when he reached out to plug in a USB mouse. The fact is that his friend's chair is made of synthetic fabric, just like the hero's pants. But he has no idea about this yet and chalks it all up to long-awaited superpowers that have finally begun to manifest themselves.

There was a huge number of search results for the word static in relation to C++. It turned out that it can be applied to free functions, member functions, variables, and data members. He decided not to spread himself too thin for now and narrowed the search to functions.
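For orientation, here is a minimal sketch of the uses of static that the search turned up. All names in it (helper, next_id, Registry, bump) are hypothetical and exist only for illustration:

```cpp
// A sketch of the common uses of `static`; all names are hypothetical.

// 1. Free function: internal linkage, visible only in this translation unit.
static int helper(int x) {
    return x + 1;
}

// 2. Local variable: initialized once, keeps its value between calls.
int next_id() {
    static int counter = 0;
    return ++counter;
}

struct Registry {
    // 3. Data member: a single object shared by all instances of Registry.
    static int count;

    // 4. Member function: callable without an object, has no `this`.
    static int add() {
        return ++count;
    }
};

// A static data member needs one out-of-class definition (before C++17).
int Registry::count = 0;

// Uses `helper` so that the internal function is actually referenced.
int bump(int x) {
    return helper(x);
}
```

Only the first use, static on a free function, matters for the rest of this story: it is the one that affects what the linker gets to see.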

As it turned out, no one on the team had ever thought about how numerous .h and .cpp files end up as a single binary. Their favorite IDE did everything for them: you write code, press a button, and if you are lucky, it does not report a single error. The guys had also heard through the grapevine that some other team was writing tests for the code, and that testers seemed to be paid less. But that is beside the point.

So, Google showed that when you run, for example, the command:

g++ main.cpp parser.cpp solver.cpp

a lot happens behind the scenes, including a step called linking. The program that does the linking is called the linker, and it was the linker that reported the error:

/usr/bin/ld: ...multiple definition of ...

It seems that we are on the right track. It remains to understand what static has to do with it and what exactly is being linked.

It turned out that the .cpp files from the command above are actually compiled separately into so-called object files: intermediate files, produced by the assembler, that store pieces of machine code [1]. This is confirmed by the entries in the error message:

... /tmp/ccOIubul.o ... /tmp/ccelFhGq.o

Machine code is not interesting to read, but assembly code for our sources looks like this (or rather, part of it):

g++ -S -O2 parser.cpp

	.file	"parser.cpp"
	.text
	.p2align 4
	.globl	_Z12process_datai
	.type	_Z12process_datai, @function
_Z12process_datai:
.LFB3:
	.cfi_startproc
	endbr64
	movl	$15, %eax
	ret
	.cfi_endproc
...

g++ -S -O2 solver.cpp

	.file	"solver.cpp"
	.text
	.p2align 4
	.globl	_Z12process_datai
	.type	_Z12process_datai, @function
_Z12process_datai:
.LFB0:
	.cfi_startproc
	endbr64
	leal	30(%rdi), %eax
	ret
	.cfi_endproc
...

What do we see? At first glance, nothing is clear, but if you look closely, you can find the same names in both listings:

.globl	_Z12process_datai
...
_Z12process_datai:

_Z12process_datai is painfully reminiscent of the unfortunate function that appeared in the error message, only with something stuck on at the front and at the back. The part at the back, by the way, looks a lot like the type of the function argument, int. And the .globl directive sounds like something global, as in a global variable, but apparently it applies to functions as well.
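The decoration is name mangling. GCC on Linux follows the Itanium C++ ABI, where _Z marks a mangled name, 12 is the length of process_data, and the trailing i encodes the int parameter. Below is a minimal sketch of decoding such a name programmatically. It assumes GCC or Clang, because abi::__cxa_demangle comes from the compiler-specific header <cxxabi.h> and is not standard C++; the helper name demangle is hypothetical:

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang extension, not standard C++)
#include <cstdlib>   // std::free
#include <string>

// Returns the human-readable form of an Itanium-ABI mangled symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;
    }
    std::string result(readable);
    std::free(readable);  // the returned buffer is allocated with malloc
    return result;
}
```

Calling demangle("_Z12process_datai") yields process_data(int), the very name the linker printed in its error message.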

It turns out that the two object files contain code blocks with the same name, and in both files that name is marked .globl. The linker then tries to combine these files into one, stumbles upon these blocks, and reports an error.

That seems logical. Suppose instead that the linker kept the first definition it came across and ignored the second. What guarantee would there be that next time it would not start from the other file, whose implementation differs from the first?

The problem has been localized, now we need to find a solution.

Since we have already seen the assembly code, the first thing that comes to mind is to change that strange function name in our object file or to play tricks with the .globl directive.

The forums say that the static keyword, applied to a free function, somehow confines the function's visibility to its own object file. Let's try changing solver.h:

--int process_data(int);
++static int process_data(int);

and look at the assembly code (for clarity, the optimization level had to be lowered to -O0 so that the function would not be inlined):

g++ -S -O0 solver.cpp
	
	.file	"solver.cpp"
	.text
	.type	_ZL12process_datai, @function
_ZL12process_datai:
.LFB0:
	.cfi_startproc
	endbr64
....
_Z5solvei:
.LFB1:
	.cfi_startproc
	endbr64
	pushq	%rbp
....

And lo and behold! The letter L has been added to the function name, giving _ZL12process_datai, and the .globl directive has disappeared! Pure happiness!

But the joy turned out to be premature. When trying to compile the entire project, the compiler reported a new error:

g++ main.cpp parser.cpp solver.cpp 

In file included from main.cpp:2:
solver.h:3:12: error: ‘int process_data(int)’ was declared ‘extern’ and later ‘static’ [-fpermissive]
    3 | static int process_data(int);
      |            ^~~~~~~~~~~~
In file included from main.cpp:1:
parser.h:4:5: note: previous declaration of ‘int process_data(int)’
    4 | int process_data(int);

The message

was declared ‘extern’ and later ‘static’

turned out to be quite common on the Internet, but all the links dated back to the days of plain C, which our hero did not want to deal with. So he made an impulsive decision: remove the declaration of int process_data(int) from his header file and add static to its definition in the .cpp file:

--int process_data(int temp_data) {
++static int process_data(int temp_data) {

And it worked! The project compiled. One could rejoice, but something kept nagging at our hero.
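Why did the header variant fail while this one works? main.cpp includes both headers, so within that one translation unit parser.h declared process_data with external linkage and solver.h then redeclared the same name as static, which is exactly what the compiler complained about. Keeping the helper out of the header avoids the conflict. A minimal sketch of how solver.cpp might look after the change, with the declaration of solve written inline in place of #include "solver.h" so that the snippet stands on its own:

```cpp
// solver.cpp (sketch): the helper is private to this translation unit.

int solve(int);  // stands in for solver.h, which now declares only the public API

// `static` gives the helper internal linkage: its symbol is not exported
// from the object file, so it cannot clash with parser.cpp's process_data.
static int process_data(int temp_data) {
    return temp_data + 30;
}

int solve(int data) {
    return process_data(data);
}
```

With the parser returning 15, solve(15) evaluates to 45, and the linker sees only one global process_data, the one in parser.cpp.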

Is it possible not to remove the function declaration? Is there any other way to limit the scope of a function? What kind of extern? Why are there so many references to the C language?

And yes, it turned out that there is a second way to hide a function: put it in an unnamed namespace (also known as an anonymous namespace). By the way, this is what the C++ Core Guidelines recommend [2].

To accomplish this, it was necessary to wrap the function definition in a construct of the form namespace { ... }:

namespace {

int process_data(int temp_data) {
	return temp_data + 30;
}

}

True, we still had to remove its declaration from the header file, because otherwise the compiler complained that the call was ambiguous.
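The ambiguity has a simple cause: a declaration of process_data at global scope and the function inside the unnamed namespace are two different functions, and an unqualified call can see both. The sketch below shows the working layout. The Offset struct is a hypothetical addition, included only to illustrate one thing static cannot do: an unnamed namespace can hide types as well as functions.

```cpp
// solver.cpp (sketch): same helper, hidden with an unnamed namespace.

int solve(int);  // stands in for solver.h (public API only)

// Leaving a declaration like the next line in scope would make the call in
// solve() ambiguous: ::process_data and the unnamed-namespace one both match.
// int process_data(int);

namespace {

// Hypothetical helper type; it is local to this translation unit as well.
struct Offset {
    int value;
};

int process_data(int temp_data) {
    const Offset offset{30};
    return temp_data + offset.value;
}

}  // namespace

int solve(int data) {
    return process_data(data);
}
```

The behavior is the same as in the static variant: solve(15) evaluates to 45.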

Checking the assembly code of the separately compiled solver.cpp showed that the function name had indeed changed:

	.file	"solver.cpp"
	.text
	.type	_ZN12_GLOBAL__N_112process_dataEi, @function
_ZN12_GLOBAL__N_112process_dataEi:
.LFB0:
	.cfi_startproc
	endbr64
.....

The entire project also compiled successfully. Our hero decided that he could stop there and merge everything into master, when the long-awaited answer came from the boss:

Guys, stop wasting time on that nonsense, just rename your function and come join us at the bar.

They did so, and then at gatherings they often recalled this incident.

Links:
[1] https://habr.com/ru/articles/478124/
[2] https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#sf22-use-an-unnamed-anonymous-namespace-for-all-internalnon-exported-entities
