The Smart Guy on the Ladder Spouts Off About OSPFv2 MTU

I had to call the recruiter to remind him I existed. On the other side of the screen, an empty chair stared at me against the backdrop of a dull partition. When the interviewer finally showed up, I saw a tired man with no desire to talk. It was a large company, so I hadn't expected any other attitude. They were buying up local tier 3 providers and migrating their services to their own core. There was a lot of work, and it meant dealing with all sorts of networks whose structure nobody knew in advance.

Of course, I didn't know any of this at the time. I was like a teenager going through puberty, poking his nose into everything. And now I had a disgruntled engineer in front of me who had decided to deal with the CCNA guy quickly and get back to his business. He asked why an OSPF adjacency would hang in the ExStart/Exchange state. I couldn't answer, and the interview was over. Before showing me the door, he condescendingly hinted that the problem was the MTU, but back then that word sounded to me like something from the field of launching spaceships: a parameter whose handling requires deep calculations and remarkable mental abilities. In other words, for a minute I found myself in the same module as an astronaut, and was immediately thrown out of it. By the will of fate, I managed to get a job at the same company in another city and, in fact, began doing the same thing: migrating services. Then I understood where this question about MTU in OSPF came from …

We rarely ask questions. Questions require answers. Questions make us responsible or, worse, give rise to other questions. And once a question gets into our heads, we sit hunched over a screen or a book in the middle of the night instead of hugging a tender wife in a warm bed. Engineering and technical solutions* are created so that questions are not asked: just follow the instructions and leave work on time. Unfortunately, even when such a document is drawn up scrupulously, the person carrying it out is in no hurry to read it and prefers a quick briefing from a senior colleague. And what do all these solutions and instructions boil down to? As a rule, they contain only standard configurations and the order of actions to perform, with no explanation of what is being done and why. Even the official manuals of large network equipment manufacturers are guilty of this, to say nothing of providers with their internal documents. Truly detailed manuals are written only by rare enthusiasts, geniuses who are passionate about their craft. Try to find one like that.

The engineers I worked with never mentioned MTU in OSPF. MTU negotiation was enabled by default on the hardware, so typical configurations did not contain a single line about MTU. Most of the areas we worked with were stubs: the default route was advertised from the ABR side, and only a few client networks were advertised by the stub router**. The smallest configured MTU I ever encountered was 1200 bytes. For an LSA in a stub area, this is a huge size, like Gibraltar for a small boat. Configuring MTU checking on stub routers seemed like writing a clause into a contract that would take a long time to argue about but would never come up in real life. Troubleshooting OSPF for an MTU mismatch is a waste of time.

On page 60 of John T. Moy's book “OSPF: Anatomy of an Internet Routing Protocol” [2] you can read that when OSPF was being developed, the Internet was also in its formative stages and ran over many evolving data link layer technologies. Each had its own characteristics: different transmission speeds, error rates, and MTUs. Although OSPF was originally developed for the ARPANET [2] (MTU size – 1006 bytes [3]), over time it aspired to become a standard. To do that, it needed to be compatible with all of these technologies: it had to work equally well on top of them all, or at least minimize the impact of their differences on the protocol's performance. Indeed, RFC 1583 (1994) mentions MTU only in passing [4]. The essence is that OSPF cannot fragment packets on its own and relies on IP for this (while recommending that fragmentation be avoided whenever possible). That's all. Studying the history of the Internet's development through RFCs is a real pleasure. Fortunately, in 1998 the already mentioned book by John T. Moy, the author of that same RFC 1583, was published. It makes clear [2] that at that time only a few data link technologies carried IP: Ethernet and point-to-point serial lines (MTU 1500 bytes); PPP and FDDI (4352), which were just being developed; 802.5 Token Ring (4464 [8]); and Frame Relay, which had a poorly defined IP MTU. On such links, neighbors could disagree about the maximum packet size they could send, which caused forwarding problems: as soon as one router sent a packet larger than the other could receive, the receiving end dropped it. IP fragmentation can come to the rescue, but for fragmentation to happen, the routers must first agree on an MTU. So OSPF was modified to allow MTU negotiation. The change was made to the Database Description packet [5]: a two-byte Interface MTU field was added immediately after the header.
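To make the position of that field concrete, here is a minimal Python sketch that reads the Interface MTU out of a raw OSPFv2 Database Description packet. It assumes the packet starts at the OSPF header (no IP header in front) and follows the RFC 2328 layout; `build_dd` is a hypothetical helper of mine that only fabricates test input and leaves the checksum at zero.

```python
import struct

OSPF_HEADER_LEN = 24  # fixed OSPFv2 packet header
DD_PACKET_TYPE = 2    # OSPF packet type 2 = Database Description


def parse_dd_interface_mtu(ospf_packet: bytes) -> int:
    """Return the Interface MTU field of an OSPFv2 Database Description packet."""
    if len(ospf_packet) < OSPF_HEADER_LEN + 8:
        raise ValueError("too short for a Database Description packet")
    version, pkt_type = struct.unpack_from("!BB", ospf_packet, 0)
    if version != 2 or pkt_type != DD_PACKET_TYPE:
        raise ValueError("not an OSPFv2 Database Description packet")
    # The two bytes right after the 24-byte header carry the Interface MTU.
    (interface_mtu,) = struct.unpack_from("!H", ospf_packet, OSPF_HEADER_LEN)
    return interface_mtu


def build_dd(interface_mtu: int) -> bytes:
    """Hypothetical helper: a minimal DD packet for testing (checksum not computed)."""
    header = struct.pack(
        "!BBH4s4sHH8s",
        2, DD_PACKET_TYPE, OSPF_HEADER_LEN + 8,  # version, type, packet length
        bytes(4), bytes(4),                      # router ID, area ID
        0, 0, bytes(8),                          # checksum, AuType, authentication
    )
    # Interface MTU, options, I/M/MS flags, DD sequence number
    body = struct.pack("!HBBI", interface_mtu, 0x02, 0x07, 1)
    return header + body


print(parse_dd_interface_mtu(build_dd(1500)))
```

A value of 0 in this field is what a router sends when it does not advertise its MTU at all.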

To check this, I had to assemble a lab from a pair of L3 switches***. I set the MTU to 160 bytes***** on the stub router's interface and left the default 1500 bytes on the ABR side. I enabled debugging. The state was Full: the adjacency came up immediately, without a single complaint. I checked that same Interface MTU field in the Database Description packets, and it was 0. It turned out that the vendor disables MTU negotiation by default on all its products, explaining that it does not want to create problems [6]. MTU negotiation is an option, not a requirement.

Next, I disabled the advertisement of summary LSAs on the ABR, leaving only the default route. I chose p2p as the network type, with the link unnumbered from the loopback interface. As a result, the ABR originated one Router LSA with its loopback address and a single summary LSA with the default route, and the stub router originated one Router LSA with its loopback. Now I needed to build an LSA on the ABR that was larger than 160 bytes and send it over the direct link to the stub router. The only candidate was the Router LSA. First, I had a p2p link and a stub area with summary advertisements disabled, so there could be no other LSAs. Second, even if I had brought the summaries back, they would not have helped: each newly advertised network produces its own summary LSA containing only that one network, without any of the previously advertised ones, so the size of a summary LSA is always the same – 28 bytes. The Router LSA is another matter: with each new network it grew by 12 bytes, since it carries both the old networks and the new one. In other words, one LS Update can contain several LSAs of different types, but a summary LSA advertises exactly one network and has a fixed size, while a Router LSA can hold a whole bunch of networks (as many as its 16-bit number of links field allows). Yes, of course, you can also stuff a great many LSAs into one LS Update (as many as the 32-bit Number of LSAs field allows), but for that you would have to create and “commit” many OSPF-enabled interfaces at once. In short, I decided to grow the LS Update through the type 1 Router LSA.

Next, I had to work out how many networks to add on the ABR to get a packet larger than 160 bytes. 20 bytes – IP header, 24 bytes – OSPF header, 4 bytes – the Number of LSAs field, 36 bytes – the Router LSA, of which 12 bytes describe the network. In total, an LS Update packet carrying a Router LSA with one network is 84 bytes. The frame is 98 bytes (84 + 14 bytes of Ethernet header). So I needed to add 7 networks to get an LS Update packet of 168 bytes. Added. “Committed.” And something unexpected happened. The stub router received a Router LSA of 120 bytes (24 bytes + 8*12), that is, a 168-byte packet: 20 bytes – IP header, 24 bytes – OSPF header, 4 bytes – Number of LSAs, 24 bytes – Router LSA header, 96 bytes – 8 networks (12 bytes per network description). The 168-byte OSPF packet “flew” through the 160-byte MTU of the stub router's incoming interface and was installed in the lsdb. No packet was dropped. I rechecked on another product with different firmware. Same thing. It could not be a mistake. I checked with ICMP: I sent a 133-byte packet from the ABR without fragmentation. With the headers this vendor adds behind the scenes (+20 bytes of IP and +8 bytes of ICMP), the final size was 161 bytes, 1 byte over the limit. The packet flew through perfectly and the reply came back. I was clearly missing something. And what I was missing was 30 years of technology development…
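The arithmetic above is easy to get wrong by a header or two, so here is a small Python sketch that reproduces it. It assumes a single Router LSA per LS Update, 12 bytes per advertised network with no extra TOS entries, and a plain 14-byte Ethernet header without VLAN tag or FCS; the function names are my own.

```python
IP_HEADER = 20          # IPv4 header without options
OSPF_HEADER = 24        # fixed OSPFv2 packet header
NUM_LSAS_FIELD = 4      # "Number of LSAs" field of the LS Update
ROUTER_LSA_FIXED = 24   # 20-byte LSA header + 4 bytes of flags and link count
ROUTER_LINK = 12        # one network (link) description
ETHERNET_HEADER = 14    # assumption: untagged frame, FCS not counted
ICMP_HEADER = 8


def lsu_packet_size(links: int) -> int:
    """IP packet size of an LS Update carrying one Router LSA with `links` networks."""
    return (IP_HEADER + OSPF_HEADER + NUM_LSAS_FIELD
            + ROUTER_LSA_FIXED + ROUTER_LINK * links)


def lsu_frame_size(links: int) -> int:
    """Ethernet frame size for the same LS Update."""
    return lsu_packet_size(links) + ETHERNET_HEADER


def min_links_to_exceed(limit: int) -> int:
    """Smallest number of networks that makes the IP packet larger than `limit`."""
    links = 1
    while lsu_packet_size(links) <= limit:
        links += 1
    return links


def ping_packet_size(payload: int) -> int:
    """IP packet size of an ICMP echo with `payload` data bytes."""
    return IP_HEADER + ICMP_HEADER + payload


print(lsu_packet_size(1), lsu_frame_size(1))  # 84 98
print(min_links_to_exceed(160))               # 8
print(lsu_packet_size(8))                     # 168
print(ping_packet_size(133))                  # 161
```

Eight networks in total means the original one plus the seven added ones, which matches the 168-byte packet seen in the lab.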

Ethernet has been improved many times since then, and jumbo frames have long been a reality. Studying the capabilities of both interfaces, I found another parameter right after the MTU – Maximum Frame Length. It was set to 9216 bytes: jumbo frame support was enabled by default. I could disable it in only one way, by setting the frame length to its minimum permissible value of 1518 bytes. It cannot be set any lower; I suspect this is a limitation of this specific implementation. A frame of 1518 bytes or less is not a jumbo frame [7]. “Well, if the mountain won't come to Mohammed, Mohammed will come to the mountain.” So I had to increase the LSA size on the ABR side. A frame of 1526 bytes required 120 networks. Before sending a frame of that size, I stopped at 119 networks: I wanted to approach the boundary and make sure that such a frame (1514 bytes) would pass. It did. The stub router accepted it and updated its lsdb. But it did not accept the next frame, the one advertising 120 networks. The 1526-byte frame was discarded, and the error counter on the interface began to grow: Giants +1.

There is one detail that I have deliberately left out for the sake of simplicity. At first the ABR could not send a single 1526-byte frame with 120 networks, because the packet inside it was 1512 bytes, 12 bytes more than the ABR's configured MTU. The ABR compared the packet size with the MTU of its outgoing interface, found the MTU smaller, and sent the packet as two IP fragments. The stub router received both, because both packets, one of 1500 bytes (frame 1514) and the second of 84 bytes (frame 98)****, fit comfortably within the 1518-byte Maximum Frame Length of its receiving interface. So I had to raise the MTU on the ABR's outgoing interface to avoid fragmentation and send a genuinely large 1526-byte frame. That was when the link count in the lsdb on the stub router froze at 119 and the error counter began to grow: Giants +1.
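Since the capture was lost, the fragment sizes can at least be cross-checked on paper. The sketch below applies textbook IPv4 fragmentation, assuming a 20-byte IP header without options and fragment payloads aligned to 8 bytes; `fragment_sizes` is an illustrative name, not anything taken from the device. Under these assumptions the second fragment comes out at 32 bytes rather than the 84 guessed above, so the 84-byte figure should be treated as unverified. Either way, both fragments fit within a 1518-byte Maximum Frame Length.

```python
def fragment_sizes(packet_len: int, mtu: int, ip_header: int = 20) -> list[int]:
    """Sizes of the IP packets produced when `packet_len` is sent through `mtu`."""
    if packet_len <= mtu:
        return [packet_len]  # fits as is, no fragmentation
    payload = packet_len - ip_header
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_chunk = (mtu - ip_header) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(ip_header + chunk)
        payload -= chunk
    return sizes


print(fragment_sizes(1512, 1500))  # [1500, 32]
```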

In my test, the packet was dropped and a Giants +1 error was generated. Not receiving an ACK, the ABR kept retransmitting this giant LSA, and the error counter on the stub router kept growing. Had the MTU check been enabled, this scenario would not have happened, though not because the ABR would fragment packets to suit its neighbor: the sender still fragments against the MTU of its own outgoing interface. According to RFC 2328 (section 10.6), a router rejects a Database Description packet whose Interface MTU is larger than what it can accept on the receiving interface without fragmentation. The stub router would have rejected the ABR's Database Description packets, the adjacency would have hung in ExStart/Exchange, and no giant LSA would ever have been sent over the link. That's the point. A mismatch turns into a visible adjacency failure instead of a frame that is silently dropped.

The relationship between Maximum Frame Length and MTU sounds a bit crazy. The docs state clearly [7] that on the incoming interface it is not the packet size that is checked but the frame size: whether the frame “fits” into the Maximum Frame Length. The router looks at the MTU when sending a packet and at the Maximum Frame Length when receiving one. It is important to realize that the Maximum Frame Length cannot be less than the MTU.
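The two checks can be modelled in a few lines of Python. This is only a sketch of the behaviour described here, not the vendor's implementation: the `Interface` class and its method names are invented for illustration, and the frame is assumed to be the packet plus a 14-byte Ethernet header (whether the device also counts the 4-byte FCS does not change the outcomes below).

```python
from dataclasses import dataclass

ETHERNET_HEADER = 14  # assumption: untagged frame, FCS not counted


@dataclass
class Interface:
    mtu: int               # consulted when sending
    max_frame_length: int  # consulted when receiving

    def must_fragment(self, packet_len: int) -> bool:
        """Egress check: the packet size is compared with the MTU."""
        return packet_len > self.mtu

    def accepts(self, packet_len: int) -> bool:
        """Ingress check: the frame size is compared with the Maximum Frame Length."""
        return packet_len + ETHERNET_HEADER <= self.max_frame_length


# Stub router with MTU 160 and jumbo frames left enabled: a 168-byte packet gets in.
print(Interface(mtu=160, max_frame_length=9216).accepts(168))   # True

# Same router with the frame length lowered to 1518 bytes.
stub = Interface(mtu=160, max_frame_length=1518)
print(stub.accepts(1500))  # True  (1514-byte frame)
print(stub.accepts(1512))  # False (1526-byte frame, counted as a giant)

# ABR with the default MTU splits the 1512-byte packet before sending it.
print(Interface(mtu=1500, max_frame_length=9216).must_fragment(1512))  # True
```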

The respected reader probably has a handbook in a beautiful hardcover that covers all these relatively new improvements in Ethernet technology better than I can. My task was to share personal experience, along with materials from authoritative sources. Clearly, an opus like this cannot be considered exhaustive. You will undoubtedly add plenty of recommendations to your personal list, including useful instructions from other engineers. I will draw a line under this protracted opus with a short list of theses:

1. MTU negotiation is an option, not a requirement, and is left to the discretion of the vendor.

2. OSPF troubleshooting due to MTU mismatch is a waste of time in dead-end environments.

3. Maximum Frame Length – a new player who stands in goal.

4. MTU – an old player who hits the ball.

1. https://ru.wikipedia.org/wiki/%D0%9B%D0%B5%D1%81%D1%82%D0%BD%D0%B8%D1%87%D0%BD%D1%8B%D0%B9_%D1%83%D0%BC.

2. OSPF: Anatomy of an Internet Routing Protocol. John T. Moy, pp. 72, 60, 72.

3. Fragmentation Considered Harmful. Christopher A. Kent, Jeffrey C. Mogul. http://ccr.sigcomm.org/archive/1995/jan95/ccr-9501-mogulf1.pdf, page 3.

4. https://datatracker.ietf.org/doc/html/rfc1583, A.1 Encapsulation of OSPF packets.

5. https://datatracker.ietf.org/doc/html/rfc2328#ref-Ref22, 10.8. Sending Database Description Packets.

6. https://support.huawei.com/hedex/hdx.do?docid=EDOC1100345000&id=EN-US_CLIREF_0000001907603308.

7. https://support.huawei.com/hedex/hdx.do?docid=EDOC1100345000&id=EN-US_CLIREF_0000001907600404.

8. https://datatracker.ietf.org/doc/html/rfc1191, Table 7-1: Common MTUs in the Internet.

* – ITR: that's what they called internal technical documents in the office where I once worked.

** – for simplicity, I use “stub router” for the ABR's peer, the router with which it forms an adjacency in a stub area, and not for the well-known stub-router option, which sets the cost to 65535. https://support.huawei.com/hedex/hdx.do?docid=EDOC1100345000&id=EN-US_CLIREF_0000001907763288

*** – I won't name the vendor because I want to stay at the level of the concept rather than individual implementations.

**** – this is a guess. At this point my packet capture tool broke, so I can't say exactly how big the fragments were.

***** – minimum recommended value for this hardware.
