C++ vtables. Part 1 (Basics + Multiple Inheritance)

Hello! This translation was prepared specifically for students of the "C++ Developer" course. Interested in developing in this direction? Join us online on December 13 at 20:00 Moscow time for the master class "Practice using the Google Test Framework"!

In this article, we will look at how clang implements vtables (virtual method tables) and RTTI (run-time type information). In this first part we start with some basic classes, and then move on to multiple and virtual inheritance.

Please note that in this article we have to dig into the binary representation generated for various parts of our code, using gdb. This is pretty low-level, but I will do all the hard work for you. I do not expect most future posts to go into this level of detail.

Disclaimer: everything written here is implementation-dependent and may change in any future version, so you should not rely on it. We look at it for educational purposes only.

Excellent, let's get started.

Part 1 – vtables – Basics

Let's look at the following code:

#include <iostream>
using namespace std;

class NonVirtualClass {
 public:
  void foo() {}
};

class VirtualClass {
 public:
  virtual void foo() {}
};

int main() {
  cout << "Size of NonVirtualClass: " << sizeof(NonVirtualClass) << endl;
  cout << "Size of VirtualClass: " << sizeof(VirtualClass) << endl;
}

$ # compile and run main.cpp
$ clang++ main.cpp && ./a.out
Size of NonVirtualClass: 1
Size of VirtualClass: 8

NonVirtualClass has a size of 1 byte, because in C++ an object of a class type cannot have zero size. However, this is not important right now.

The size of VirtualClass is 8 bytes on a 64-bit machine. Why? Because inside it there is a hidden pointer to a vtable. vtables are static lookup tables created for each class that has virtual functions. This article talks about their contents and how they are used.

To get a deeper understanding of what vtables look like, let's step through the following code with gdb to find out how it is laid out in memory:

#include <iostream>

class Parent {
 public:
  virtual void Foo() {}
  virtual void FooNotOverridden() {}
};

class Derived : public Parent {
 public:
  void Foo() override {}
};

int main() {
  Parent p1, p2;
  Derived d1, d2;

  std::cout << "done" << std::endl;
}

$ # compile our code with debugging symbols and start debugging with gdb
$ clang++ -std=c++14 -stdlib=libc++ -g main.cpp && gdb ./a.out
...
(gdb) # have gdb automatically demangle C++ symbols
(gdb) set print asm-demangle on
(gdb) set print demangle on
(gdb) # set a breakpoint on main
(gdb) b main
Breakpoint 1 at 0x4009ac: file main.cpp, line 15.
(gdb) run
Starting program: /home/shmike/cpp/a.out

Breakpoint 1, main () at main.cpp:15
15        Parent p1, p2;
(gdb) # go to the next line
(gdb) n
16        Derived d1, d2;
(gdb) # go to the next line
(gdb) n
18        std::cout << "done" << std::endl;
(gdb) # print p1, p2, d1, d2 - we will discuss what the output means shortly
(gdb) p p1
$1 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p p2
$2 = {_vptr$Parent = 0x400bb8 <vtable for Parent+16>}
(gdb) p d1
$3 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}
(gdb) p d2
$4 = {<Parent> = {_vptr$Parent = 0x400b50 <vtable for Derived+16>}, <No data fields>}

Here is what we learned from the above:
– Even though the classes have no data members, each object contains a hidden pointer to a vtable;
– The vtable for p1 and p2 is the same. vtables are static data, one per type;
– d1 and d2 inherit the vtable pointer from Parent, but in their case it points to the vtable for Derived;
– All vtable pointers point 16 (0x10) bytes into the vtable. We will discuss this later as well.

Let's continue our gdb session to see the contents of the vtables. I will use the x command, which dumps memory to the screen. We are going to print 300 bytes in hexadecimal format, starting at 0x400b40. Why this address? Because we saw above that the vtable pointer points to 0x400b50, and the symbol for that address is vtable for Derived+16 (16 == 0x10).

(gdb) x/300xb 0x400b40
0x400b40: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400b48: 0x90 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400b50: 0x80 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
0x400b58: 0x90 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
0x400b60: 0x37 0x44 0x65 0x72 0x69 0x76 0x65 0x64
0x400b68: 0x00 0x36 0x50 0x61 0x72 0x65 0x6e 0x74
0x400b70: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400b78: 0x90 0x20 0x60 0x00 0x00 0x00 0x00 0x00
0x400b80: 0x69 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400b88: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400b90: 0x10 0x22 0x60 0x00 0x00 0x00 0x00 0x00
0x400b98: 0x60 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400ba0: 0x78 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400ba8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x400bb0: 0x78 0x0b 0x40 0x00 0x00 0x00 0x00 0x00
0x400bb8: 0xa0 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
0x400bc0: 0x90 0x0a 0x40 0x00 0x00 0x00 0x00 0x00
...

Note: gdb is showing us demangled symbol names. If you're really interested, in mangled form _ZTV is the prefix for a vtable, _ZTS is the prefix for a type string (name), and _ZTI is the prefix for typeinfo.


Here is the layout of the vtable for Parent:

Address    Value      Content
0x400ba8   0x0        top_offset (more on this later)
0x400bb0   0x400b78   Pointer to typeinfo for Parent (also part of the memory dump above)
0x400bb8   0x400aa0   Pointer to Parent::Foo() (1). The _vptr$Parent of a Parent object points here.
0x400bc0   0x400a90   Pointer to Parent::FooNotOverridden() (2)

Here is the layout of the vtable for Derived:

Address    Value      Content
0x400b40   0x0        top_offset (more on this later)
0x400b48   0x400b90   Pointer to typeinfo for Derived (also part of the memory dump above)
0x400b50   0x400a80   Pointer to Derived::Foo() (3). The _vptr$Parent of a Derived object points here.
0x400b58   0x400a90   Pointer to Parent::FooNotOverridden() (same as in Parent)

(1):

(gdb) # find out which debugging symbol we have for address 0x400aa0
(gdb) info symbol 0x400aa0
Parent::Foo() in section .text of a.out

(2):

(gdb) info symbol 0x400a90
Parent::FooNotOverridden() in section .text of a.out

(3):

(gdb) info symbol 0x400a80
Derived::Foo() in section .text of a.out

Remember that the vtable pointer in Derived pointed 16 bytes into the vtable? That third slot holds the address of the first method. Want the third method? No problem: add 2 * sizeof(void*) to the vtable pointer. Want the typeinfo record? Follow the pointer in the slot just before it.

Moving on – what does a typeinfo record look like?

Here is the typeinfo record for Parent:

Address    Value      Content
0x400b78   0x602090   Helper class for type_info methods (1)
0x400b80   0x400b69   A string representing the type name (2)
0x400b88   0x0        0 means there is no parent typeinfo entry

And here is the typeinfo record for Derived:

Address    Value      Content
0x400b90   0x602210   Helper class for type_info methods (3)
0x400b98   0x400b60   A string representing the type name (4)
0x400ba0   0x400b78   Pointer to the typeinfo entry for Parent

(1):

(gdb) info symbol 0x602090
vtable for __cxxabiv1::__class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out

(2):

(gdb) x/s 0x400b69
0x400b69: "6Parent"

(3):

(gdb) info symbol 0x602210
vtable for __cxxabiv1::__si_class_type_info@@CXXABI_1.3 + 16 in section .bss of a.out

(4):

(gdb) x/s 0x400b60
0x400b60: "7Derived"

If you want to know more about __si_class_type_info, you can find some information here as well as here.

This exhausts my gdb skills and also completes this part. I suspect some readers will find this too low-level, or perhaps simply of no practical value. If so, I recommend skipping parts 2 and 3 and going straight to part 4.

Part 2 – Multiple Inheritance

The world of single inheritance hierarchies is easier for the compiler. As we saw in the first part, each child class extends the parent vtable by adding entries for each new virtual method.

Now let's look at multiple inheritance, which complicates things even when a class inherits only from pure interfaces.

Let's look at the following code snippet:

class Mother {
 public:
  virtual void MotherMethod() {}
  int mother_data;
};

class Father {
 public:
  virtual void FatherMethod() {}
  int father_data;
};

class Child : public Mother, public Father {
 public:
  virtual void ChildMethod() {}
  int child_data;
};

Child layout:

_vptr$Mother
mother_data (+ padding)
_vptr$Father
father_data
child_data (1)

Note that there are 2 vtable pointers. Intuitively, I would expect either 1 or 3 pointers (Mother, Father and Child). In fact, having a single pointer is impossible (more on this below), and the compiler is smart enough to lay out Child's own vtable entries as a continuation of Mother's vtable, thus saving 1 pointer.

Why can't Child have one vtable pointer for all three types? Remember that a Child pointer can be passed to a function that accepts a Mother or a Father pointer, and both will expect the this pointer to hold the correct data at the correct offsets. These functions do not need to know about Child, and they certainly should not assume that a Child is what sits behind the Mother / Father pointer they are working with.

(1) Not relevant to this topic, but interesting nonetheless: child_data is actually placed inside Father's padding. This reuse of tail padding may be the subject of a future post.

Here is the layout of the vtable for Child:

Address    Value      Content
0x4008b8   0          top_offset (more on this later)
0x4008c0   0x400930   Pointer to typeinfo for Child
0x4008c8   0x400800   Mother::MotherMethod(). _vptr$Mother points here.
0x4008d0   0x400810   Child::ChildMethod()
0x4008d8   -16        top_offset (more on this later)
0x4008e0   0x400930   Pointer to typeinfo for Child
0x4008e8   0x400820   Father::FatherMethod(). _vptr$Father points here.

In this example, a Child instance keeps the same address when cast to a Mother pointer. But when it is cast to a Father pointer, the compiler adjusts the this pointer so that it points to the _vptr$Father part of Child (the 3rd field in the Child layout, see above).

In other words, for a given Child c;: (void*)&c != (void*)static_cast<Father*>(&c). Some people do not expect this, and perhaps one day this information will save you some debugging time.

I have found this useful more than once. But wait, that’s not all.

What if Child decided to override one of the Father methods? Consider this code:

class Mother {
 public:
  virtual void MotherFoo() {}
};

class Father {
 public:
  virtual void FatherFoo() {}
};

class Child : public Mother, public Father {
 public:
  void FatherFoo() override {}
};

Things get trickier. A function can take a Father* argument and call FatherFoo() on it. But if you pass it a Child instance, the overridden Child method is expected to be called with the correct this pointer. However, the caller does not know that it is really holding a Child. It has a pointer to the offset within Child where the Father part lives. Someone has to adjust the this pointer, but how? What kind of magic does the compiler perform to make this work?

Before we answer this, note that overriding one of the Mother methods is not very tricky, since the this pointer is the same. Child knows to read past Mother's vtable entries, and expects the Child methods to come right after them.

Here is the solution: the compiler creates a thunk method that corrects the this pointer and then calls the “real” method. The address of the thunk goes into the Father vtable, while the address of the “real” method goes into the Child vtable.

Here is the vtable for Child:

0x4008e8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
0x4008f0: 0x60 0x09 0x40 0x00 0x00 0x00 0x00 0x00
0x4008f8: 0x00 0x08 0x40 0x00 0x00 0x00 0x00 0x00
0x400900: 0x10 0x08 0x40 0x00 0x00 0x00 0x00 0x00
0x400908: 0xf8 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0x400910: 0x60 0x09 0x40 0x00 0x00 0x00 0x00 0x00
0x400918: 0x20 0x08 0x40 0x00 0x00 0x00 0x00 0x00

Which means:

Address    Value      Content
0x4008e8   0          top_offset (coming soon!)
0x4008f0   0x400960   typeinfo for Child
0x4008f8   0x400800   Mother::MotherFoo()
0x400900   0x400810   Child::FatherFoo()
0x400908   -8         top_offset
0x400910   0x400960   typeinfo for Child
0x400918   0x400820   non-virtual thunk to Child::FatherFoo()

Explanation: as we saw earlier, Child has 2 vtables, one used for Mother and Child, and the other for Father. In the Father vtable, the FatherFoo() entry points to a thunk, while in the Child vtable it points directly to Child::FatherFoo().

And what is in this thunk, you ask?

(gdb) disas /m 0x400820, 0x400850
Dump of assembler code from 0x400820 to 0x400850:
15        void FatherFoo() override {}
   0x0000000000400820:  push   %rbp
   0x0000000000400821:  mov    %rsp,%rbp
   0x0000000000400824:  sub    $0x10,%rsp
   0x0000000000400828:  mov    %rdi,-0x8(%rbp)
   0x000000000040082c:  mov    -0x8(%rbp),%rdi
   0x0000000000400830:  add    $0xfffffffffffffff8,%rdi
   0x0000000000400837:  callq  0x400810
   0x000000000040083c:  add    $0x10,%rsp
   0x0000000000400840:  pop    %rbp
   0x0000000000400841:  retq
   0x0000000000400842:  nopw   %cs:0x0(%rax,%rax,1)
   0x000000000040084c:  nopl   0x0(%rax)

As discussed, the thunk adjusts this (the add of 0xfffffffffffffff8, i.e. -8, to %rdi) and then calls Child::FatherFoo(). And by how much should this be shifted to get to Child? By top_offset!

Please note that I personally find the name “non-virtual thunk” extremely confusing, since it is a virtual table entry for a virtual function. I am not sure what is non-virtual about it, but that is just my opinion.


That's all for now. We will translate parts 3 and 4 in the near future. Stay tuned!
