How to protect your public site with ESNI

Hi Habr, my name is Ilya, and I work in the Exness platform team. We develop and implement the basic infrastructure components that our product development teams use.

In this article, I would like to share the experience of implementing encrypted SNI (ESNI) technology in the infrastructure of public websites.


Using this technology increases the level of security when working with a public website and helps us comply with the internal security standards adopted by the Company.

First of all, I want to point out that the technology is not yet standardized and is still a draft. However, Cloudflare and Mozilla already support it (in draft 01), which motivated us to try this experiment.

A bit of theory

ESNI is an extension to the TLS 1.3 protocol that encrypts the SNI in the Client Hello message of the TLS handshake. In a Client Hello with ESNI support, the ESNI extension appears instead of the usual SNI.

To use ESNI, you need three things:

  • DNS
  • Client support
  • Server-side support

DNS

You need to add two DNS records: A and TXT (the TXT record contains the public key with which the client encrypts the SNI); see the examples below. In addition, DoH (DNS over HTTPS) must be supported, because the available clients (see below) do not activate ESNI without DoH. This is logical: ESNI encrypts the name of the resource we are accessing, so it makes no sense to then resolve that name in cleartext DNS over UDP. Moreover, DNSSEC protects against cache poisoning attacks in this scenario.

Several DoH providers are currently available.

Cloudflare states (Check My Browser → Encrypted SNI → Learn More) that its servers already support ESNI, which means that Cloudflare servers have at least two records in DNS: A and TXT. In the examples below, we query Google DNS (over HTTPS):

A record:

curl 'https://dns.google.com/resolve?name=www.cloudflare.com&type=A' \
  -s -H 'accept: application/dns+json'
{
  "Status": 0,
  "TC": false,
  "RD": true,
  "RA": true,
  "AD": true,
  "CD": false,
  "Question": [
    {
      "name": "www.cloudflare.com.",
      "type": 1
    }
  ],
  "Answer": [
    {
      "name": "www.cloudflare.com.",
      "type": 1,
      "TTL": 257,
      "data": "104.17.210.9"
    },
    {
      "name": "www.cloudflare.com.",
      "type": 1,
      "TTL": 257,
      "data": "104.17.209.9"
    }
  ]
}
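The same lookup can be done from code. Below is a minimal Go sketch, not taken from our proxy, that builds the query URL used in the curl example and extracts the records from the JSON answer. The names `dohResponse`, `resolveURL` and `answers` are made up for this illustration, and the struct only covers the fields visible in the output above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strings"
)

// dohResponse mirrors only the JSON fields shown in the article's output.
type dohResponse struct {
	Status int  `json:"Status"`
	AD     bool `json:"AD"`
	Answer []struct {
		Name string `json:"name"`
		Type int    `json:"type"`
		TTL  int    `json:"TTL"`
		Data string `json:"data"`
	} `json:"Answer"`
}

// resolveURL builds the query URL for the JSON API used in the curl example.
func resolveURL(name, rrtype string) string {
	q := url.Values{}
	q.Set("name", name)
	q.Set("type", rrtype)
	return "https://dns.google.com/resolve?" + q.Encode()
}

// answers returns the data of all records with the given numeric type
// (1 = A, 16 = TXT). Surrounding literal quotes, if any, are stripped.
func answers(body []byte, rrtype int) ([]string, error) {
	var r dohResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != 0 {
		return nil, fmt.Errorf("DNS status %d", r.Status)
	}
	var out []string
	for _, a := range r.Answer {
		if a.Type == rrtype {
			out = append(out, strings.Trim(a.Data, `"`))
		}
	}
	return out, nil
}

func main() {
	// Shortened copy of the answer shown above; no network access needed.
	sample := []byte(`{"Status":0,"AD":true,"Answer":[
	  {"name":"www.cloudflare.com.","type":1,"TTL":257,"data":"104.17.210.9"},
	  {"name":"www.cloudflare.com.","type":1,"TTL":257,"data":"104.17.209.9"}]}`)
	ips, err := answers(sample, 1)
	fmt.Println(ips, err)
	fmt.Println(resolveURL("www.cloudflare.com", "A"))
}
```

To query the resolver for real, the URL from `resolveURL` would be fetched with an HTTP client and the response body passed to `answers`.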

TXT record (the query name follows the template _esni.FQDN):

curl 'https://dns.google.com/resolve?name=_esni.www.cloudflare.com&type=TXT' \
  -s -H 'accept: application/dns+json'
{
  "Status": 0,
  "TC": false,
  "RD": true,
  "RA": true,
  "AD": true,
  "CD": false,
  "Question": [
    {
      "name": "_esni.www.cloudflare.com.",
      "type": 16
    }
  ],
  "Answer": [
    {
      "name": "_esni.www.cloudflare.com.",
      "type": 16,
      "TTL": 1799,
      "data": "\"/wEUgUKlACQAHQAg9SiAYQ9aUseUZr47HYHvF5jkt3aZ5802eAMJPhRz1QgAAhMBAQQAAAAAXtUmAAAAAABe3Q8AAAA=\""
    }
  ],
  "Comment": "Response from 2400:cb00:2049:1::a29f:209."
}
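To show what the TXT value actually contains, here is a hedged Go sketch that decodes it. The field layout (version, checksum, key shares, cipher suites, padded length, validity period, extensions) is my reading of the early ESNI drafts with version 0xff01, so treat it as an assumption rather than a reference parser. The names `esniKeys`, `reader` and `parseESNIKeys` are invented for this example; only the first key share is read and the checksum is not verified.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// esniKeys holds the fields of the assumed ESNIKeys layout.
type esniKeys struct {
	Version      uint16
	Checksum     [4]byte
	Group        uint16 // named group of the first key share
	PublicKey    []byte
	CipherSuites []uint16
	PaddedLength uint16
	NotBefore    time.Time
	NotAfter     time.Time
}

var errShort = errors.New("esni: record truncated")

// reader is a small cursor over the decoded bytes; it remembers
// the first out-of-bounds read instead of panicking.
type reader struct {
	b   []byte
	err error
}

func (r *reader) take(n int) []byte {
	if r.err != nil || len(r.b) < n {
		r.err = errShort
		return make([]byte, n)
	}
	out := r.b[:n]
	r.b = r.b[n:]
	return out
}

func (r *reader) u16() uint16 { return binary.BigEndian.Uint16(r.take(2)) }
func (r *reader) u64() uint64 { return binary.BigEndian.Uint64(r.take(8)) }

func parseESNIKeys(txt string) (*esniKeys, error) {
	raw, err := base64.StdEncoding.DecodeString(txt)
	if err != nil {
		return nil, err
	}
	r := &reader{b: raw}
	k := &esniKeys{Version: r.u16()}
	copy(k.Checksum[:], r.take(4))

	keys := &reader{b: r.take(int(r.u16()))} // length-prefixed key share list
	k.Group = keys.u16()
	k.PublicKey = keys.take(int(keys.u16()))

	suites := r.take(int(r.u16())) // length-prefixed cipher suite list
	for i := 0; i+1 < len(suites); i += 2 {
		k.CipherSuites = append(k.CipherSuites, binary.BigEndian.Uint16(suites[i:]))
	}
	k.PaddedLength = r.u16()
	k.NotBefore = time.Unix(int64(r.u64()), 0).UTC()
	k.NotAfter = time.Unix(int64(r.u64()), 0).UTC()
	r.take(int(r.u16())) // extensions, ignored in this sketch

	if r.err != nil || keys.err != nil {
		return nil, errShort
	}
	if k.Version != 0xff01 {
		return nil, fmt.Errorf("esni: unexpected version %#04x", k.Version)
	}
	return k, nil
}

func main() {
	// TXT value from the _esni.www.cloudflare.com answer above.
	const txt = "/wEUgUKlACQAHQAg9SiAYQ9aUseUZr47HYHvF5jkt3aZ5802eAMJPhRz1QgAAhMBAQQAAAAAXtUmAAAAAABe3Q8AAAA="
	k, err := parseESNIKeys(txt)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("version=%#x group=%#x keylen=%d suites=%#x padded=%d\n",
		k.Version, k.Group, len(k.PublicKey), k.CipherSuites, k.PaddedLength)
	fmt.Println("valid:", k.NotBefore, "to", k.NotAfter)
}
```

For the record above, this reading yields a single 32-byte key for group 0x001d (X25519), one cipher suite 0x1301 (TLS_AES_128_GCM_SHA256) and a validity window of six days, which fits the regular key rotation discussed at the end of the article.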

So, in terms of DNS, we should use DoH (preferably with DNSSEC) and add two records.

Client support

If we are talking about browsers, support is currently implemented only in Firefox. Instructions are available for activating ESNI and DoH support in Firefox. Once the browser is configured, Cloudflare's browser check (Check My Browser → Encrypted SNI) should report that ESNI is enabled.

Of course, TLS 1.3 must be used, because ESNI is an extension to TLS 1.3.

To test the backend with ESNI support, we implemented a client in Go, but more on that later.

Server-side support

ESNI is currently not supported by web servers such as nginx or Apache, because they handle TLS through OpenSSL or BoringSSL, neither of which officially supports ESNI.

Therefore, we decided to build our own front-end component, an ESNI reverse proxy, which terminates TLS 1.3 with ESNI and proxies HTTP(S) traffic to an upstream that does not support ESNI. This lets us apply the technology in an existing infrastructure without changing its main components, that is, we keep the current web servers that do not support ESNI.

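Schematically, the flow looks like this: client → TLS 1.3 (with or without ESNI) → ESNI reverse proxy → HTTP or HTTPS → upstream web server.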

Note that the proxy was designed to also terminate TLS connections without ESNI, in order to support clients that lack ESNI. In addition, the protocol used to talk to the upstream can be either HTTP or HTTPS with a TLS version lower than 1.3 (if the upstream does not support 1.3). This design gives maximum flexibility.

We borrowed the Go implementation of ESNI from Cloudflare. Note that the implementation is quite nontrivial: it changes the standard library package crypto/tls and therefore requires patching GOROOT before the build.

To generate ESNI keys, we used esnitool (also developed by Cloudflare). These keys are used to encrypt and decrypt the SNI.
We tested the build with Go 1.13 on Linux (Debian, Alpine) and macOS.

A few words about operational features

The ESNI reverse proxy exposes metrics in the Prometheus format, such as RPS, upstream latency and response codes, failed and successful TLS handshakes, and TLS handshake duration. At first glance, this seemed sufficient to evaluate how the proxy handles traffic.
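As an illustration of what such a metric looks like on the wire, here is a small standard-library-only Go sketch of a labelled counter rendered in the Prometheus text format. The type `counterVec` and the metric name `esni_proxy_tls_handshakes_total` are hypothetical and are not the names used in our proxy; a real service would normally rely on a Prometheus client library instead.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// counterVec is a minimal counter with one label, standing in for
// what a metrics client library would provide.
type counterVec struct {
	mu     sync.Mutex
	name   string
	label  string
	values map[string]uint64
}

func newCounterVec(name, label string) *counterVec {
	return &counterVec{name: name, label: label, values: map[string]uint64{}}
}

// inc adds one to the counter for the given label value.
func (c *counterVec) inc(labelValue string) {
	c.mu.Lock()
	c.values[labelValue]++
	c.mu.Unlock()
}

// render returns the counter in the Prometheus text exposition format,
// with label values sorted so the output is stable.
func (c *counterVec) render() string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)
	for _, k := range keys {
		fmt.Fprintf(&b, "%s{%s=%q} %d\n", c.name, c.label, k, c.values[k])
	}
	return b.String()
}

func main() {
	handshakes := newCounterVec("esni_proxy_tls_handshakes_total", "result")
	handshakes.inc("success")
	handshakes.inc("success")
	handshakes.inc("failed")
	fmt.Print(handshakes.render())
}
```

Splitting successful and failed handshakes by a label rather than by two separate metrics makes it easy to compute a failure ratio on the Prometheus side.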

We also ran load tests before putting the proxy into use. The results are below:

wrk -t50 -c1000 -d360s 'https://esni-rev-proxy.npw:443' --timeout 15s
Running 6m test @ https://esni-rev-proxy.npw:443
  50 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.77s     1.21s    7.20s    65.43%
    Req/Sec    13.78      8.84   140.00     83.70%
  206357 requests in 6.00m, 6.08GB read
Requests/sec:    573.07
Transfer/sec:     17.28MB 

We ran purely synthetic load tests to compare the setups with and without the ESNI reverse proxy. We generated the traffic locally in order to eliminate interference from intermediate components.

So, with ESNI support and proxying to the upstream over HTTP, we got about 550 RPS from one instance, with the following average CPU and RAM consumption by the ESNI reverse proxy:

  • 80% CPU Usage (4 vCPU, 4 GB RAM hosts, Linux)
  • 130 MB Mem RSS

For comparison, the same upstream nginx without TLS termination (plain HTTP) delivers about 1100 RPS:

wrk -t50 -c1000 -d360s 'http://lb.npw:80' --timeout 15s
Running 6m test @ http://lb.npw:80
  50 threads and 1000 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.11s     2.30s   15.00s    90.94%
    Req/Sec    23.25     13.55   282.00     79.25%
  393093 requests in 6.00m, 11.35GB read
  Socket errors: connect 0, read 0, write 0, timeout 9555
  Non-2xx or 3xx responses: 8111
Requests/sec:   1091.62
Transfer/sec:     32.27MB 

The timeouts indicate a lack of resources (we used hosts with 4 vCPU and 4 GB RAM running Linux), so the potential RPS is in fact higher: on more powerful hosts we saw up to 2700 RPS.
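To put the two runs side by side, the figures reported by wrk can be compared with a few lines of Go. This is only arithmetic on the numbers printed above; the `run` type and the helper names are invented for the example.

```go
package main

import "fmt"

// run holds the figures reported by one wrk run.
type run struct {
	requests  int     // completed requests
	rps       float64 // Requests/sec as printed by wrk
	timeouts  int     // socket timeouts
	non2xx3xx int     // responses with a non-2xx/3xx status
}

// relativeThroughput returns a's RPS as a fraction of b's RPS.
func relativeThroughput(a, b run) float64 {
	return a.rps / b.rps
}

// errorShare returns the fraction of completed requests that
// ended with a non-2xx/3xx status.
func errorShare(r run) float64 {
	return float64(r.non2xx3xx) / float64(r.requests)
}

func main() {
	withProxy := run{requests: 206357, rps: 573.07}
	direct := run{requests: 393093, rps: 1091.62, timeouts: 9555, non2xx3xx: 8111}

	fmt.Printf("proxy throughput relative to direct: %.1f%%\n",
		100*relativeThroughput(withProxy, direct))
	fmt.Printf("direct run, non-2xx/3xx share: %.1f%%\n", 100*errorShare(direct))
	fmt.Printf("direct run, timeouts: %d\n", direct.timeouts)
}
```

On these hosts, the proxy with TLS termination and ESNI reaches a little over half of the plain HTTP throughput, while the plain HTTP run is the only one that reported timeouts and error responses.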

In conclusion, ESNI looks quite promising. Many questions are still open, for example how to store the public ESNI key in DNS and how to rotate ESNI keys. These issues are being actively discussed, and at the time of writing the latest version of the ESNI draft is already 7.
