VxLAN fabric. Part 2.5

Hello. I recently went through an interview, and it gave me the idea to make the next part of this series of articles, written for the launch of the “Network engineer” course from OTUS, more theoretical, so that it answers some of the questions I ran into during that interview.

Much of this is fairly basic VxLAN material and should not cause any difficulty.

Part 1 of the series: L2 connectivity between servers
Part 2 of the series: Routing between VNIs

I. How does the VxLAN fabric learn MAC addresses?

Yes, we have already established that MAC and IP addresses are carried in EVPN route-type 2. But how does EVPN learn them in the first place?

Everything is quite simple and works similarly to the logic of a regular VLAN:

A frame from the source arrives on a port of the switch (VTEP)
If the switch does not know the source MAC, it writes it to its MAC address table (CAM)
Since the switch acts as a VTEP, it advertises the source MAC and IP addresses in an EVPN route-type 2 (how exactly depends on the fabric configuration. In our case a Route-reflector (RR) is used, so the information is sent to the RR and from it to the rest of the VTEPs)
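
The three steps above can be sketched in a few lines of Python. This is only a toy model, not how any real switch implements it: the class names, the port name and all addresses are hypothetical, and the route object keeps just the fields mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified stand-in for an EVPN route-type 2 (MAC/IP advertisement)."""
    vni: int
    mac: str
    ip: str
    next_hop: str  # address of the VTEP that learned the host


class RouteReflector:
    """Toy RR: passes a route to every client except the sender."""

    def __init__(self):
        self.clients = []

    def reflect(self, route, sender):
        for vtep in self.clients:
            if vtep is not sender:
                vtep.remote_hosts[(route.vni, route.mac)] = route


class Vtep:
    def __init__(self, address, rr):
        self.address = address
        self.rr = rr
        self.local_macs = {}    # (vni, mac) -> local port
        self.remote_hosts = {}  # (vni, mac) -> route received from the RR
        rr.clients.append(self)

    def receive_frame(self, port, vni, src_mac, src_ip):
        """Learn the source MAC; return True if a new route was advertised."""
        key = (vni, src_mac)
        if key in self.local_macs:
            return False  # already known, nothing new to advertise
        self.local_macs[key] = port
        route = MacIpRoute(vni, src_mac, src_ip, self.address)
        self.rr.reflect(route, self)
        return True


# Hypothetical lab: two leaf switches behind one RR.
rr = RouteReflector()
leaf1 = Vtep("10.255.1.11", rr)
leaf2 = Vtep("10.255.1.12", rr)
leaf1.receive_frame("Eth1/1", 10000, "00:00:00:00:00:01", "192.168.10.10")
```

After the last call, leaf2 holds an entry for the host with leaf1 as the next hop, while leaf1 keeps the MAC only in its local table.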

The source is clear, but what about the destination? After all, the source host most likely does not know the destination MAC address and will send an ARP request.
There are two options:

do not use Suppress-ARP function
use Suppress-ARP function

In the first case everything is simple, but not optimal. When a broadcast request is received, the VTEP forwards it within the VNI from which the request came. That is, the request is spread across the whole fabric as unicast messages.

In the second case, when an ARP request is received, the VTEP itself answers with an ARP reply, and the ARP request is not forwarded any further.

However, this logic only works if the VTEP already knows the destination MAC. If the address is unknown, we fall back to the first path. I covered how Suppress-ARP works and how to configure it in more detail in the first part of the series.
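
The decision the VTEP makes can be written down as a small function. This is a minimal sketch under my own assumptions: the function name, the `known_hosts` dictionary and the addresses are hypothetical, and the table is assumed to be filled from local learning and from EVPN route-type 2.

```python
def handle_arp_request(target_ip, vni, known_hosts, suppress_arp):
    """Decide what a VTEP does with an ARP request from a local host.

    known_hosts maps (vni, ip) -> mac for hosts the VTEP already knows.
    Returns ("reply", mac) when the VTEP answers on its own,
    or ("flood", None) when the request is forwarded within the VNI.
    """
    mac = known_hosts.get((vni, target_ip))
    if suppress_arp and mac is not None:
        return ("reply", mac)
    # Suppress-ARP is off, or the destination is not known yet.
    return ("flood", None)


known = {(10000, "192.168.10.20"): "00:00:00:00:00:02"}

print(handle_arp_request("192.168.10.20", 10000, known, suppress_arp=True))
print(handle_arp_request("192.168.10.30", 10000, known, suppress_arp=True))
print(handle_arp_request("192.168.10.20", 10000, known, suppress_arp=False))
```

Only the first call ends with a local reply; the other two fall back to flooding, exactly as in the first option.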

II. Why is UDP used?

The question is no less interesting, and the answer is quite simple. To get there, let’s recall how the VxLAN fabric works.

A VxLAN header carrying the VNI number is added to the frame arriving on a VTEP port. The result is then packed into UDP, encapsulated in a new IP packet, and transmitted over the Underlay network.
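
To make the layering more tangible, here is a rough Python sketch of the first two steps: the 8-byte VxLAN header (flags plus a 24-bit VNI, as laid out in RFC 7348) and the UDP header in front of it. The outer IP and Ethernet headers are left out, the UDP checksum is simply set to zero, and the function names and the source port are my own choices for the illustration.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni):
    """Build the 8-byte VXLAN header: flags, reserved bits, 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def udp_header(src_port, payload_len):
    """Build an 8-byte UDP header; checksum 0 means 'not computed'."""
    return struct.pack("!HHHH", src_port, VXLAN_UDP_PORT, 8 + payload_len, 0)


def encapsulate(frame, vni, src_port=49152):
    """Wrap an Ethernet frame in VXLAN, then in UDP (outer IP omitted)."""
    payload = vxlan_header(vni) + frame
    return udp_header(src_port, len(payload)) + payload


def parse_vni(vxlan_hdr):
    """Read the VNI back from the first 8 bytes of a VXLAN header."""
    _flags_word, vni_word = struct.unpack("!II", vxlan_hdr[:8])
    return vni_word >> 8


dummy_frame = bytes(60)  # placeholder for the original Ethernet frame
segment = encapsulate(dummy_frame, vni=10000)
print(len(segment), parse_vni(segment[8:16]))
```

The overhead visible here is 16 bytes (8 for UDP, 8 for VxLAN); the outer IP and Ethernet headers add more on top of that in a real packet.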

So why can’t the original frame with the VxLAN header be packed straight into IP? Why must UDP be used?

It all comes down to one field in the IP header, Protocol, which indicates which protocol is carried in the payload. Examples of protocols and their numbers (wiki):

ICMP - 1
TCP - 6
UDP - 17
GRE - 47

And that is the whole secret: VxLAN has no such number, so the IP header has no way to announce it, and a self-respecting network will not let such a packet through. Engineers worked around the problem by using UDP.
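
The same idea as a small lookup, assuming the receiver first looks at the Protocol field and only then at the UDP destination port (4789 is the port assigned to VxLAN). The tables and the function name are mine and cover only the protocols listed above.

```python
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP", 47: "GRE"}
UDP_PORTS = {4789: "VXLAN"}  # VXLAN is identified by UDP port, not by IP protocol


def classify(ip_protocol, dst_port=None):
    """Name the payload of an IP packet from its Protocol field and UDP port."""
    proto = IP_PROTOCOLS.get(ip_protocol)
    if proto is None:
        return "unknown"  # no such protocol number in our table
    if proto == "UDP" and dst_port in UDP_PORTS:
        return UDP_PORTS[dst_port]
    return proto


print(classify(17, 4789))  # UDP carrying VXLAN
print(classify(6))         # plain TCP
print(classify(200))       # a number this table does not know
```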

A second question may arise here: why not TCP, since it is so reliable and good? Precisely because it is so reliable and good. TCP guarantees delivery, and for that guarantee it relies on retransmission timers, acknowledgements, congestion control that throttles bandwidth, and so on. As a result, TCP adds a lot of latency, which is especially noticeable when the client of the VxLAN fabric also uses TCP.

III. Difference between ingress-replication and multicast

The topic is quite extensive and cannot be covered in a short explanation, so how Multicast works will be discussed in one of the next articles in the series. Still, I will try to briefly describe the differences between the two technologies.

First, let’s look at how packets are transmitted in both cases.

With ingress-replication, when broadcast traffic (for example, an ARP request) is received, the request is encapsulated in VxLAN and sent to each VTEP in the VNI as unicast messages (for this example, we leave the RR out). Since there is more than one VTEP, the broadcast traffic is duplicated.

With Multicast, each VTEP subscribes to a specific Multicast group for each VNI. Now, when it receives broadcast traffic, the VTEP encapsulates the ARP request in an IP packet whose destination address is the multicast group of that VNI and whose source address is the IP address of the NVE interface. For example, VNI 10000 is associated with multicast group 225.1.10.10.

Thus the duplication of broadcast traffic disappears. Plus, with proper traffic optimization, Multicast scales better. The only difficulty is that the underlay network must support Multicast traffic.
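
The difference is easiest to see by listing the outer destination addresses the source VTEP has to send to in each mode. This is a simplified sketch: the function names and VTEP addresses are hypothetical, and only the group 225.1.10.10 for VNI 10000 is taken from the example above.

```python
def ingress_replication_destinations(local_vtep, vteps_in_vni):
    """One unicast copy per remote VTEP that participates in the VNI."""
    return [vtep for vtep in vteps_in_vni if vtep != local_vtep]


def multicast_destinations(vni, vni_to_group):
    """A single copy to the group of the VNI; the underlay replicates it."""
    return [vni_to_group[vni]]


# Hypothetical fabric with four VTEPs in VNI 10000.
vteps = ["10.255.1.11", "10.255.1.12", "10.255.1.13", "10.255.1.14"]
groups = {10000: "225.1.10.10"}

unicast_copies = ingress_replication_destinations("10.255.1.11", vteps)
mcast_copies = multicast_destinations(10000, groups)

print(len(unicast_copies), unicast_copies)
print(len(mcast_copies), mcast_copies)
```

With ingress-replication the number of copies grows with the number of VTEPs in the VNI, while with Multicast the source VTEP always sends one packet.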

You may ask: why is EVPN needed at all if everything works perfectly well over Multicast, given that it is the more scalable solution?

It is hard to give a definite answer here, and you will have to decide for yourself which technology to use. At the moment Multicast is indeed the more scalable solution. But EVPN is constantly being improved, and new route-types keep appearing in it to carry more and more information about the network and allow more flexible configuration. In addition, EVPN is built on BGP, so all the optimization methods that BGP offers are available (for example, an RR is already used in my lab to reduce BUM traffic and optimize the advertised information).

This part turned out rather short, but I hope it helps clarify some points in understanding the technology.
