How to scan the entire Internet

The IPv4 address space contains about 4.3 billion addresses. That sounds like a huge number, but the entire IPv4 Internet can be scanned for a single TCP port in about 40 to 50 minutes, for example to find every web server in the world or every open SSH port. One server and a gigabit link are enough. This is useful for research: collecting statistics on the technologies in use worldwide, or estimating the share of vulnerable services exposed to the outside.

Zmap (not to be confused with Nmap) can scan huge network ranges much faster than conventional scanners thanks to its architecture. In this article we will look at how to compile a list of all the web servers in the world using zmap. Once you have a list of hosts with an open HTTP port, you can hand that exact list of targets to a more intelligent scanner.

What is wrong with Nmap

Small subnets are traditionally scanned with Nmap, a popular open source multitool packed with goodies for pentesters. It is well suited to small ranges and individual hosts. Nmap has its own engine for custom scripts written in Lua (the Nmap Scripting Engine) and ships with many useful ready-made scripts. But Nmap is poorly suited to scanning large networks of millions or even billions of addresses: such a task would take several days or even weeks. The reason is that nmap relies on the operating system's network subsystem and opens a full socket for each request. That is unavoidable for a full TCP connection, when you need to talk to the service, for example to send a GET request to a web server. But it greatly reduces performance when all you need is a quick SYN scan that only determines whether the port is open.

Zmap and Masscan

Asynchronous scanners such as Zmap and Masscan suit us better: both work much faster, can use the PF_RING driver (a socket type that significantly speeds up packet capture), and randomize the order of target addresses so that no single network is hit with a DoS-like burst. Unlike Nmap, they do not use the system TCP/IP stack: Zmap generates raw Ethernet frames, while Masscan uses its own custom TCP stack.

Masscan is considered the fastest Internet scanner: it can go through the entire IPv4 range in six minutes, but only with several physical adapters working in parallel, where outgoing SYN packets are sent from one and the responses arrive on another. With a single connection, a modified Zmap was almost one and a half times faster. These records were, of course, set on high-end hardware with 10 Gbit/s links, and repeating them yourself would be difficult and expensive. On a much more affordable gigabit connection, Zmap also runs faster than Masscan thanks to more efficient use of bandwidth and CPU, and can finish a scan in about 45-50 minutes.


Simplified workflow of Zmap and Masscan. The outgoing traffic generator and the incoming response processor work separately

Without going into technical details, Zmap and Masscan use two components: a generator of outgoing traffic that sends SYN requests, and a separate processor for incoming responses. Performance is limited only by the available bandwidth and by the network interface, specifically its PPS (packets per second) limit. The scan can therefore be split across several interfaces or even several physical hosts, provided the source IP of the outgoing packets can be set to the address of the host that handles incoming responses.
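To make the PPS limit concrete, here is a rough back-of-the-envelope estimate in Python. It is only a sketch under stated assumptions: every probe is a minimum-size Ethernet frame (64 bytes plus 8 bytes of preamble and 12 bytes of inter-frame gap, so 84 bytes on the wire), the link is fully saturated, and all 2^32 addresses are probed exactly once. The function names are made up for this illustration.

```python
# Theoretical lower bound on the time of a full IPv4 SYN scan at line rate.
IPV4_ADDRESSES = 2 ** 32
# Assumption: 64-byte minimum Ethernet frame + 8-byte preamble + 12-byte inter-frame gap.
WIRE_BYTES_PER_SYN = 84


def max_pps(bandwidth_bps: float) -> float:
    """Maximum number of minimum-size frames per second a link can carry."""
    return bandwidth_bps / (WIRE_BYTES_PER_SYN * 8)


def scan_minutes(bandwidth_bps: float, targets: int = IPV4_ADDRESSES) -> float:
    """Minutes needed to send one probe to every target at full line rate."""
    return targets / max_pps(bandwidth_bps) / 60


if __name__ == "__main__":
    for label, bps in [("1 Gbit/s", 1e9), ("200 Mbit/s", 2e8), ("100 Mbit/s", 1e8)]:
        print(f"{label}: {max_pps(bps):,.0f} pps, about {scan_minutes(bps):.0f} minutes")
```

For a gigabit link this gives roughly 1.49 million packets per second and about 48 minutes. Treat it as a best case: a real scan on real hardware will usually be slower.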

Scan Prep

Keep in mind that the scanner puts a heavy load on the system and especially on the network interfaces, using all the available bandwidth. If you start scanning without warning your hosting provider, it will look like a DDoS attack and you will most likely be disconnected from the network very quickly. Also be prepared for a reaction: scanning the entire Internet will generate complaints. The provider will receive a pile of automated abuse reports saying "you are scanning us."

So before starting your project it is better to prepare:

  • Tell your hosting provider about your plans – stay in touch with the network administrator so you can respond quickly to problems (or to someone's displeasure with a saturated link). Make sure the scan will not overload the local network or the provider's routers. Start only after obtaining explicit consent from the provider.
  • Set a PTR record – your activity will probably trigger automated scan detection systems, and system administrators will look at those logs. Try to make what they see as clear as possible. For example, set an informative PTR record for the IP address the scan will come from, something like:
    scanner-for-educational-project.ivan-ivanov.com
  • Add an explanation to the User-Agent – if you scan the web and send HTTP requests, put an explanation in the User-Agent that clearly states the purpose and scope of the scan. You can also include a link to a page describing the goals of the project.
  • Randomize addresses – if possible, do not scan networks sequentially, because that looks like an obviously malicious pattern. Probe destination addresses in random order.
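The random ordering from the last item can be achieved without storing a shuffled list in memory. As far as I know, ZMap's authors describe walking a multiplicative group modulo a prime slightly larger than 2^32. The Python sketch below applies the same idea at toy scale to a single /24, using the prime 257 and the generator 3. The function name is hypothetical and this is an illustration, not ZMap's actual code.

```python
import ipaddress


def shuffled_hosts(network: str, generator: int = 3, prime: int = 257):
    """Yield every address of a /24 exactly once, in a scattered order.

    Toy version: 3 is a primitive root modulo 257, so repeatedly multiplying
    by 3 visits each value in 1..256 once before returning to the start.
    """
    net = ipaddress.ip_network(network)
    if net.num_addresses != prime - 1:
        raise ValueError("this toy version only supports /24 networks")
    x = 1
    for _ in range(prime - 1):
        yield net[x - 1]  # map group element 1..256 to offset 0..255
        x = (x * generator) % prime


if __name__ == "__main__":
    for address in list(shuffled_hosts("178.248.237.0/24"))[:5]:
        print(address)
```

Only the current group element is kept as state, which is why the same trick scales to the full address space with a larger prime.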

NAT, filters and firewalls

It is important to understand that zmap generates millions of TCP requests in a very short time. If a NAT router, a firewall with connection tracking, a DDoS protection appliance, or any other stateful device that tries to track connections sits between the scanning server and the Internet, it will fail because it cannot handle that many connections. For this reason you cannot run zmap from behind NAT, for example behind a home WiFi router.
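A small Python calculation shows why stateful devices give up so quickly. All numbers here are illustrative assumptions, not measurements: about 148,810 probes per second (roughly what 100 Mbit/s of minimum-size Ethernet frames amounts to), a tracking table with room for 65,536 entries, and half-open entries that are kept for 60 seconds. The function names are made up for this sketch.

```python
def seconds_until_full(probes_per_second: float, table_capacity: int) -> float:
    """Time until a connection-tracking table fills up if no entry expires first."""
    return table_capacity / probes_per_second


def entries_at_steady_state(probes_per_second: float, entry_timeout_s: float) -> float:
    """Entries the table would have to hold if each probe is tracked until it times out."""
    return probes_per_second * entry_timeout_s


if __name__ == "__main__":
    pps = 148_810        # assumed send rate
    capacity = 65_536    # assumed table size
    timeout_s = 60       # assumed lifetime of a half-open entry
    print(f"table full after {seconds_until_full(pps, capacity):.2f} s")
    print(f"entries needed: {entries_at_steady_state(pps, timeout_s):,.0f}")
```

Under these assumptions the table fills in well under a second, while holding every tracked probe would require millions of entries.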

Trying Zmap

Install Zmap by following the installation instructions.

Test it on something simple, for example by looking for web servers in the same /24 network as habr.com:

$ zmap -p 80 178.248.237.0/24 -B 100M -o habr.txt

Options:

  • -p 80 – TCP port 80, that is, we are only looking for HTTP servers
  • 178.248.237.0/24 – target range of addresses. If you do not specify it, the entire Internet will be scanned.
  • -B 100M – the maximum bandwidth zmap is allowed to use. If this option is not specified, all the available bandwidth will be used.
  • -o habr.txt – write the results to a text file. It will contain the addresses that answered the request, that is, the hosts with an HTTP server


Scanning 254 addresses with zmap took a few seconds

Zmap output formats

By default, zmap simply writes the addresses it finds to a text file, one per line. But it can also write results in json and xml formats. The --output-fields option lets you specify additional fields to include in the output.

As an example, let's try a more detailed json output that shows the source and destination ports of the response as well as the TTL of the packet:

$ zmap -p80  --output-module=json --output-fields=ttl,sport,dport,saddr 178.248.237.68 -o habr.com.json

# Look at the result
$ cat habr.com.json
{ "ttl": 58, "sport": 80, "dport": 51309, "saddr": "178.248.237.68" }
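Output in this form is easy to post-process. The Python sketch below assumes that zmap writes one JSON object per line, as in the sample above, and collects the addresses that answered from a given port. The function name is made up for this illustration.

```python
import json


def hosts_answering_on(lines, port=80):
    """Return the saddr of every record whose response came from the given port."""
    hosts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("sport") == port:
            hosts.append(record["saddr"])
    return hosts


if __name__ == "__main__":
    # Sample record copied from the zmap output shown above.
    sample = ['{ "ttl": 58, "sport": 80, "dport": 51309, "saddr": "178.248.237.68" }']
    print(hosts_answering_on(sample))
```

To process a real result file, pass an open file object instead of the sample list, for example `hosts_answering_on(open("habr.com.json"))`.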

Scan the entire Internet!

I tried to run a scan from a VPS, and after a few minutes the server was blocked with a demand to check it for viruses. This is quite logical, because from the provider's side such a scan looks like a DDoS attack. But after I discussed my task with support, they offered me a dedicated server to run the scan on.

On a 200 Mbit/s interface, the predicted scan time is about six hours.

Even with a 100 Mbit/s link, the entire Internet can be scanned overnight.

What to do next

Now we can say how many addresses in the world are listening on port 80 and collect a list of them. That list can be passed to an L7 scanner to analyze the application layer for vulnerabilities.

For example, we can get the HTML title of every web server in the world using nmap. The file produced by zmap in its default format is passed to nmap as input:


$ nmap -sV -p 80 -v -n  --script http-title  -iL habr.txt
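If you prefer to do this step in your own code, a minimal Python sketch could look like the one below. It is not how nmap's http-title script works, just an illustration using the standard library. The User-Agent string and the example.com URL are placeholders to replace with your own project page, and the function names are hypothetical.

```python
import re
import urllib.request

# Placeholder identification string; point the URL at a page describing your project.
USER_AGENT = "research-scanner/0.1 (+https://example.com/scan-info)"

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def build_request(host: str) -> urllib.request.Request:
    """Prepare a GET request for http://<host>/ that identifies the scanner."""
    return urllib.request.Request(f"http://{host}/", headers={"User-Agent": USER_AGENT})


def extract_title(html: str):
    """Return the text of the first <title> element, or None if there is none."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else None


def fetch_title(host: str, timeout: float = 5.0):
    """Download at most 64 KiB of the page and return its title.

    Network errors are not handled here and propagate to the caller.
    """
    with urllib.request.urlopen(build_request(host), timeout=timeout) as response:
        body = response.read(65536).decode("utf-8", errors="replace")
    return extract_title(body)
```

Calling `fetch_title` for each address in habr.txt would give a rough equivalent of the nmap command above, with the explanatory User-Agent attached to every request.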

