Azure Active Directory Gateway is a reverse proxy that sits in front of hundreds of services that make up Azure Active Directory (Azure AD). If you have used services such as office.com, outlook.com, azure.com, or xbox.live.com, you have used the Azure AD gateway. The gateway is present in over 53 Azure datacenters worldwide and serves ~115 billion requests daily. Until recently, Azure AD Gateway ran on .NET Framework 4.6.2. Since September 2020, it has been running on .NET Core 3.1.
Motivation to migrate to .NET Core
At the gateway's scale, it consumes significant compute resources, which in turn cost money. Finding ways to reduce the cost of running the service was a key goal for the team behind it. The buzz around .NET Core performance caught our attention, especially since TechEmpower ranked ASP.NET Core among the fastest web frameworks on the planet. We benchmarked our own gateway prototypes on .NET Core, and the results made the decision very simple: we had to migrate our service to .NET Core.
Is .NET Core Providing Real Cost Savings?
It certainly does. With Azure AD Gateway, we were able to reduce our CPU costs by 50%.
Previously, the gateway ran on IIS with .NET Framework 4.6.2. Today it runs on IIS with .NET Core 3.1. The image below shows that CPU usage was cut in half on .NET Core 3.1 compared to .NET Framework 4.6.2, effectively doubling throughput.
Thanks to the increased throughput, we were able to shrink our fleet from ~40K to ~20K cores (a 50% reduction).
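Halving CPU usage and halving the core count are the same arithmetic seen from two sides: for a fixed request volume, doubling the requests each core can handle halves the cores needed. A back-of-the-envelope sketch in Python using the figures from the text; it assumes traffic is spread evenly over the day and across cores (real traffic has peaks), so the per-core numbers are averages, not capacity limits.

```python
SECONDS_PER_DAY = 86_400


def average_rps_per_core(requests_per_day: float, cores: int) -> float:
    """Average requests per second per core, assuming perfectly even load (an illustrative simplification)."""
    return requests_per_day / SECONDS_PER_DAY / cores


daily_requests = 115e9  # ~115 billion requests per day (figure from the text)

before = average_rps_per_core(daily_requests, 40_000)  # ~40K cores on .NET Framework 4.6.2
after = average_rps_per_core(daily_requests, 20_000)   # ~20K cores on .NET Core 3.1

print(f"{before:.1f} -> {after:.1f} requests/s per core ({after / before:.1f}x)")
```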
How did the transition to .NET Core come about?
It took place in 3 stages.
Stage 1: Select an Edge Server
When we started working on the migration, the first question we had to ask ourselves was: which of the three servers available in ASP.NET Core (Kestrel, HTTP.sys, or IIS) should we choose?
We ran our scenarios on all three servers and realized it all comes down to TLS support. Because the gateway is a reverse proxy, supporting a wide variety of TLS scenarios is critical.
As of .NET 5.0, Kestrel (because it uses SslStream) does not support per-hostname CTL (certificate trust list) stores. Support is expected in .NET 6.0.
With HTTP.sys, we hit a mismatch between the TLS configuration in HTTP.sys and the .NET implementation: even if the binding is configured not to negotiate client certificates, accessing the ClientCertificate property in .NET Core triggers an unwanted TLS renegotiation.
For example, a simple null check in C# causes the TLS handshake to be renegotiated:
if (HttpContext.Connection.ClientCertificate != null)
This was addressed in ASP.NET Core 3.1 (https://github.com/dotnet/aspnetcore/issues/14806). At the time we did the migration, in November 2019, we were on ASP.NET Core 2.2 and therefore did not choose this server.
IIS met all our TLS requirements, so we chose this server.
Stage 2: Migrate the Application and Dependencies
Like many large services and applications, the Azure AD gateway has many dependencies. Some were written specifically for this service, and some were written by others inside and outside Microsoft. In some cases, those libraries already targeted .NET Standard 2.0. In other cases, we updated them to support .NET Standard 2.0 or found alternative implementations; for example, we removed our legacy dependency injection library and used the built-in dependency injection support in .NET Core instead. The .NET Portability Analyzer was a great help at this stage.
For the application itself:
The Azure AD gateway depended on IHttpModule and IHttpHandler from classic ASP.NET, which do not exist in ASP.NET Core. We therefore redesigned the application using ASP.NET Core's middleware constructs.
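Classic ASP.NET modules hook into fixed pipeline events, whereas ASP.NET Core composes middleware: each component receives the request context plus a "next" delegate, can act before and after calling it, or can short-circuit by not calling it at all (similar in shape to `app.Use((context, next) => ...)`). The toy Python sketch below models only that composition pattern; `Context`, `build_pipeline`, and the middleware functions are hypothetical names, not ASP.NET Core APIs and not the gateway's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Context:
    """Hypothetical stand-in for an HTTP request/response context."""
    path: str
    status: int = 200
    response_headers: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


Handler = Callable[[Context], None]
Middleware = Callable[[Context, Handler], None]


def _wrap(middleware: Middleware, next_handler: Handler) -> Handler:
    """Bind a middleware to the handler that follows it."""
    return lambda ctx: middleware(ctx, next_handler)


def build_pipeline(middlewares: List[Middleware], terminal: Handler) -> Handler:
    """Compose middleware so the first one in the list runs outermost."""
    handler = terminal
    for middleware in reversed(middlewares):
        handler = _wrap(middleware, handler)
    return handler


def logging_middleware(ctx: Context, next_handler: Handler) -> None:
    ctx.log.append(f"before {ctx.path}")
    next_handler(ctx)  # run the rest of the pipeline
    ctx.log.append(f"after {ctx.path} -> {ctx.status}")


def block_admin_middleware(ctx: Context, next_handler: Handler) -> None:
    if ctx.path.startswith("/admin"):
        ctx.status = 403  # short-circuit: the proxy step never runs
        return
    next_handler(ctx)


def proxy_terminal(ctx: Context) -> None:
    # Placeholder for forwarding the request to a downstream service.
    ctx.response_headers["X-Proxied"] = "true"


app = build_pipeline([logging_middleware, block_admin_middleware], proxy_terminal)
```

Because each component decides whether and when to call the next one, concerns that lived in separate module event handlers can be expressed as ordered wrappers around the proxying step.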
One of the things that really helped with the migration was Azure Profiler (a service that collects performance traces on Azure VMs). We deployed our nightly builds to test sites, used wrk2 as the load agent to exercise scenarios under load, and collected Azure Profiler traces. These traces then told us what to tune next to get peak performance out of our application.
Stage 3: Gradual Deployment
The philosophy we followed during the deployment was to detect as many issues as possible with minimal or no impact on operations.
We first deployed our initial builds to test, integration, and DogFood environments. This led to early detection of bugs and let us fix them before they reached production.
Once the code was complete, we deployed the .NET Core build to a single production machine in a scale unit. A scale unit is a load-balanced pool of VMs.
In a scale unit of ~100 machines, 99 machines were still running our existing .NET Framework build and only 1 machine had the new .NET Core build installed.
All ~100 machines in the scale unit receive exactly the same type and amount of traffic. We then compared the status codes, error counts, functional scenarios, and performance of the single machine with those of the other 99 machines to detect anomalies.
We wanted this single machine to behave functionally like the other 99 machines, but with a much higher performance / throughput, which is what we observed.
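The single-machine comparison amounts to treating the 99 .NET Framework machines as a baseline and flagging the canary when it drifts outside the baseline's normal spread. A minimal Python sketch, assuming per-machine metrics such as error rates are already collected; the function name and the z-score threshold of 3 are illustrative choices, not the team's actual tooling.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(canary: float, baseline: List[float], z_threshold: float = 3.0) -> bool:
    """Flag the canary if it is more than z_threshold standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two baseline machines
    if sigma == 0:
        return canary != mu  # identical baseline: any difference stands out
    return abs(canary - mu) / sigma > z_threshold


# Hypothetical error rates from the baseline machines and from the canary.
baseline_error_rates = [0.010, 0.011, 0.009, 0.010, 0.012, 0.008]
print(is_anomalous(0.0105, baseline_error_rates))
```

The check is two-sided, which suits error counts and status-code ratios. For throughput, a higher value on the canary was the desired outcome, so that metric would be compared one-sided rather than flagged.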
We also “mirrored” traffic from live production machines (running the .NET Framework build) to machines running the .NET Core build, to compare and contrast as above.
Once we achieved functional parity, we began increasing the number of scale units running .NET Core and gradually expanded to an entire datacenter.
After migrating the entire datacenter, the final step was to gradually roll out globally to all Azure datacenters where the Azure AD Gateway service is present. The migration is complete!
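The rollout order in this stage can be viewed as a gated loop: move to the next, larger stage only after the current one passes its health comparison. A minimal Python sketch; `staged_rollout`, the stage labels, and the `deploy`/`healthy` callbacks are hypothetical, and rollback handling is deliberately left out.

```python
from typing import Callable, List, Sequence

# Stage labels paraphrase the rollout described in the text.
STAGES = [
    "test/integration/DogFood",
    "1 machine in a scale unit",
    "more scale units",
    "whole datacenter",
    "all datacenters",
]


def staged_rollout(stages: Sequence[str],
                   deploy: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> List[str]:
    """Deploy stage by stage and stop at the first stage that fails its health check."""
    completed: List[str] = []
    for stage in stages:
        deploy(stage)
        if not healthy(stage):
            break  # halt the rollout; rolling back is left to the caller
        completed.append(stage)
    return completed
```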
Lessons Learned
ASP.NET Core is strict about RFC compliance. This is a very good trait, as it promotes good practices. However, classic ASP.NET and .NET Framework were more forgiving, which caused some backward-compatibility issues:
By default, the web server only allowed ASCII values in HTTP headers. At our request, Latin-1 support was added to IISHttpServer: https://github.com/dotnet/aspnetcore/pull/22798
HttpClient on .NET Core previously supported only ASCII values in HTTP headers.
The .NET Core team added Latin-1 support in .NET Core 3.1: https://github.com/dotnet/corefx/pull/42978
The ability to choose the encoding scheme was added in .NET 5.0: https://github.com/dotnet/runtime/issues/38711.
Forms and cookies that do not comply with RFCs result in validation exceptions. Therefore, we created “fallback” parsers using the classic ASP.NET source code to maintain backward compatibility for clients.
CopyToAsync() performance degraded because an n-byte stream was copied through many 1-byte copies. This was addressed in .NET 5.0 by choosing a default buffer size of 4K: https://github.com/dotnet/aspnetcore/issues/24032.
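For the header-encoding issues above, the difference between the ASCII-only and Latin-1 behaviors comes down to which characters a header value may contain: ASCII covers U+0000 to U+007F, while Latin-1 (ISO-8859-1) maps every code point up to U+00FF to a single byte. A minimal Python sketch of that distinction; `header_value_bytes` is a hypothetical helper, and real servers additionally reject control characters, which this sketch does not check.

```python
def header_value_bytes(value: str, allow_latin1: bool) -> bytes:
    """Encode a header value as an ASCII-only or a Latin-1-tolerant stack would."""
    encoding = "latin-1" if allow_latin1 else "ascii"
    try:
        return value.encode(encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(f"header value not representable in {encoding}: {value!r}") from exc


print(header_value_bytes("café", allow_latin1=True))  # accepted: é is U+00E9
```

Characters beyond U+00FF (for example, the euro sign) are rejected even with Latin-1 enabled, which is where a configurable encoding such as the one added in .NET 5.0 becomes relevant.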
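For the non-RFC forms and cookies, the fallback pattern is: try a strict, RFC-conformant parse first, and use a lenient parse only when the strict one rejects the input. The gateway's fallback parsers were derived from classic ASP.NET source code; the Python sketch below is not that code, only an illustration of strict-then-lenient parsing for a `Cookie` header. Its strict grammar is a simplified subset of RFC 6265 (quoted values, which the RFC allows, are not handled), and all function names are hypothetical.

```python
import re
from typing import Dict

# Simplified RFC 6265 pair: token name "=" value made of cookie-octets
# (no whitespace, double quote, comma, semicolon, or backslash).
_STRICT_PAIR = re.compile(
    r"^([!#$%&'*+\-.^_`|~0-9A-Za-z]+)=([\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]*)$"
)


def parse_cookie_strict(header: str) -> Dict[str, str]:
    """Parse 'name=value; name2=value2', rejecting anything outside the simplified grammar."""
    cookies: Dict[str, str] = {}
    for pair in header.split("; "):
        match = _STRICT_PAIR.match(pair)
        if match is None:
            raise ValueError(f"non-RFC cookie pair: {pair!r}")
        cookies[match.group(1)] = match.group(2)
    return cookies


def parse_cookie_lenient(header: str) -> Dict[str, str]:
    """Forgiving parse: split on ';', then on the first '=', trimming whitespace."""
    cookies: Dict[str, str] = {}
    for pair in header.split(";"):
        name, sep, value = pair.partition("=")
        name = name.strip()
        if name:
            cookies[name] = value.strip() if sep else ""
    return cookies


def parse_cookie(header: str) -> Dict[str, str]:
    """Strict parse first; fall back to the lenient parser for backward compatibility."""
    try:
        return parse_cookie_strict(header)
    except ValueError:
        return parse_cookie_lenient(header)
```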
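The cost of the 1-byte copy pattern behind the CopyToAsync() regression shows up clearly when you count read calls: copying n bytes one byte at a time takes n reads, while a 4K buffer needs roughly n/4096. A Python sketch with `io.BytesIO` standing in for the stream; `CountingReader` and `copy_stream` are hypothetical helpers, not the .NET `Stream.CopyToAsync` implementation.

```python
import io


class CountingReader:
    """Wraps an in-memory byte stream and counts read() calls."""

    def __init__(self, data: bytes) -> None:
        self._inner = io.BytesIO(data)
        self.reads = 0

    def read(self, size: int) -> bytes:
        self.reads += 1
        return self._inner.read(size)


def copy_stream(src: CountingReader, dst: io.BytesIO, buffer_size: int) -> None:
    """Copy src to dst until EOF using a fixed-size buffer."""
    while True:
        chunk = src.read(buffer_size)
        if not chunk:  # an empty read signals end of stream
            break
        dst.write(chunk)


payload = bytes(100_000)
for size in (1, 4096):
    src, dst = CountingReader(payload), io.BytesIO()
    copy_stream(src, dst, size)
    print(f"buffer={size}: {src.reads} read calls")
```

Each read call carries fixed overhead (in .NET, potentially an async state-machine step per call), so cutting the call count by a factor of about 4096 is where the gain comes from.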
Mind the classic ASP.NET quirks:
Space characters are auto-trimmed:
foo.com/oauth? client = abc is trimmed to foo.com/oauth?client=abc in classic ASP.NET.
Over time, clients and downstream services came to depend on this trimming, but ASP.NET Core does not trim automatically.
So we had to trim the whitespace ourselves to mimic the classic ASP.NET behavior.
Content-Type is auto-generated when missing:
When the response body is larger than zero bytes but the Content-Type header is missing, classic ASP.NET generates a default Content-Type: text/html header. ASP.NET Core does not force a default Content-Type header, so clients that assume the Content-Type header is always present in the response start to experience problems. We mimicked the classic ASP.NET behavior by adding a default Content-Type when downstream services do not return one.
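For the whitespace quirk, one way to restore the trimming is to normalize the query string before forwarding the request. A hypothetical Python sketch (`trim_query` is illustrative, and the text does not spell out classic ASP.NET's exact trimming rules) that strips leading and trailing whitespace from every query key and value. Re-encoding with `urlencode` can also change how other characters are escaped (spaces inside values become `+`), which a production shim would need to account for.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def trim_query(url: str) -> str:
    """Strip leading/trailing whitespace from every query-string key and value."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)  # decodes %20 and '+' to spaces
    trimmed = [(key.strip(), value.strip()) for key, value in pairs]
    return urlunsplit(parts._replace(query=urlencode(trimmed)))


print(trim_query("https://foo.com/oauth?%20client=abc"))
```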
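For the Content-Type quirk, the compatibility rule is small: if the body is non-empty and no Content-Type header is present, add `text/html`. A Python sketch of that rule; `ensure_content_type` is a hypothetical helper, header names are compared case-insensitively as HTTP requires, and in ASP.NET Core such logic would plausibly live in middleware (for example, a callback registered with `HttpResponse.OnStarting`), although the text does not say where the gateway applies it.

```python
from typing import Dict

DEFAULT_CONTENT_TYPE = "text/html"  # classic ASP.NET default described in the text


def ensure_content_type(headers: Dict[str, str], body: bytes) -> Dict[str, str]:
    """Return headers with a default Content-Type added for non-empty bodies that lack one."""
    has_content_type = any(name.lower() == "content-type" for name in headers)
    if body and not has_content_type:
        return {**headers, "Content-Type": DEFAULT_CONTENT_TYPE}
    return headers
```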
The move to .NET Core doubled the throughput of our service, and it was a great decision. Our .NET Core journey won't end with the migration. In the future, we are considering the following options:
Upgrading to .NET 5.0 for better performance.
Moving to Kestrel so we can intercept connections at the TLS layer for better resiliency.
Using YARP components and best practices (https://microsoft.github.io/reverse-proxy/) in our own reverse proxy, and contributing back as well.