PHP developers know how to write code, but they don't always know how a web server works

One of PHP's key features is how easily a developer can write their first program. Many tutorials cut the information about the web server to the bare minimum needed to start developing: for example, run Open Server or pull a prebuilt Docker image in which everything is already configured, then just open http://localhost. As a result, developers end up with a narrow view of how a web application works as a whole, which hurts the reputation of PHP developers in general. In the previous article I promised to talk about web servers for PHP, to broaden the horizons of those who skipped this topic, and I will try to cover it in the simplest and clearest language I can.

PHP is an interpreted language, and it does not ship with a production-ready HTTP server. In my practice I have most often encountered:

Apache
Nginx + PHP-FPM
RoadRunner

They all work on a simple principle:

The server listens on an IP address and port.
A worker with the interpreter is launched.
Incoming requests are forwarded to the worker, and the worker's response is returned to the requesting client.

In theory, everything is simple, but as always, there are nuances.

In most cases, system administrators are responsible for running the web server, but this article is aimed at developers, so we will only look at the issues that matter for development rather than at complete web server setup.

In every type of web server, the configuration defines the rules for locating the application's entry point and for deciding which files are interpreted and which are served as static. In popular PHP frameworks, this comes down to a single entry point that receives all requests, with routing handled inside the application. When the web server starts, its workers start along with it. A worker can process only one request at a time. This is the most important thing a developer should understand when writing PHP code.
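To make the single entry point concrete, here is a minimal sketch of a front controller. The route table and the handler bodies are hypothetical and exist only for illustration; real frameworks do the same job with far more machinery.

```php
<?php
/**
 * Minimal front controller sketch: every request enters here and is
 * routed inside the application. Routes and bodies are hypothetical.
 *
 * @return array{0:int,1:string} HTTP status code and response body
 */
function route(string $method, string $uri): array
{
    // Strip the query string so "/health?verbose=1" matches "/health".
    $path = parse_url($uri, PHP_URL_PATH) ?: '/';

    $routes = [
        'GET /'       => fn(): array => [200, 'home page'],
        'GET /health' => fn(): array => [200, 'ok'],
    ];

    $key = "$method $path";
    if (!isset($routes[$key])) {
        return [404, 'not found'];
    }

    return $routes[$key]();
}

// Under a web server SAPI, dispatch the current request; the CLI is skipped.
if (PHP_SAPI !== 'cli') {
    [$status, $body] = route($_SERVER['REQUEST_METHOD'], $_SERVER['REQUEST_URI']);
    http_response_code($status);
    echo $body;
}
```

The web server only needs to know where this one file lives; which URL maps to which code is decided inside route().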

Apache and Nginx + PHP-FPM run the script from the entry point on every request: they include all the required files again and execute the code from scratch. With RoadRunner, a CLI process is launched as the worker, and the application in it may already be initialized when a request arrives. To speed things up, opcache is commonly used; it is not part of the web server but part of the interpreter. Its main job is to cache the opcodes compiled for the Zend VM so that they can be reused on later requests.
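Whether opcache is actually active depends on the SAPI and its configuration, so it is worth checking rather than assuming. The sketch below uses the standard opcache_get_status() function; the helper name opcacheSummary() and the keys it returns are made up for this example.

```php
<?php
/**
 * Reports whether opcache is loaded and active for the current SAPI.
 */
function opcacheSummary(): array
{
    $loaded = extension_loaded('Zend OPcache');
    // opcache_get_status() returns false when opcache is not active,
    // for example in the CLI unless opcache.enable_cli=1 is set.
    $status = $loaded ? opcache_get_status(false) : false;

    return [
        'loaded'         => $loaded,
        'enabled'        => is_array($status) && !empty($status['opcache_enabled']),
        'cached_scripts' => is_array($status)
            ? ($status['opcache_statistics']['num_cached_scripts'] ?? 0)
            : 0,
    ];
}

print_r(opcacheSummary());
```

Run it through the web server rather than the CLI to see the values that apply to real requests.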

To reduce the web server's own per-request overhead, a configured number of workers is started in advance. It follows that at any given moment the web server can process as many requests as it has workers, so the developer's main task is to free the worker as quickly as possible so that it can take a new request.

Importantly, the worker is released not when the response is sent to the client but when the last instruction has been executed, that is, after all __destruct methods have been called and all shutdown functions have run.
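The gap between "the client has its answer" and "the worker is free" can be measured. The sketch below is an illustration, not a recommended pattern: handleRequest() is a hypothetical helper, and the callback stands in for whatever runs in destructors or shutdown functions. fastcgi_finish_request() exists only under PHP-FPM, so the code checks for it first.

```php
<?php
/**
 * Sends the response, then runs follow-up work in the same worker.
 * Returns how long the client waited and how long the worker stayed busy.
 */
function handleRequest(string $body, callable $afterResponse): array
{
    $start = hrtime(true);

    echo $body;
    // Under PHP-FPM this flushes the response and closes the client
    // connection; in other SAPIs the function does not exist.
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();
    }
    $respondedMs = (hrtime(true) - $start) / 1e6;

    // Stand-in for destructors and shutdown functions: the client already
    // has its answer, but the worker cannot take another request yet.
    $afterResponse();
    $busyMs = (hrtime(true) - $start) / 1e6;

    return ['responded_ms' => $respondedMs, 'busy_ms' => $busyMs];
}

$timing = handleRequest("done\n", fn() => usleep(100_000));
printf(
    "client answered after %.1f ms, worker free after %.1f ms\n",
    $timing['responded_ms'],
    $timing['busy_ms']
);
```

For capacity planning, the second number is the one that counts.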

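For better performance, it is important to set both a connection timeout and a response timeout for every external connection; otherwise a single hung dependency can hold a worker indefinitely. As a safety net, PHP has its own settings: max_execution_time (default: 30 seconds) and default_socket_timeout (default: 60 seconds). Note that on Unix-like systems max_execution_time counts only the time the script itself is executing, not the time spent waiting on streams or database queries, so it does not replace explicit timeouts.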
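As a minimal sketch of the two timeouts, the helper below opens a plain TCP stream with a separate limit for connecting and for reading. The name openWithTimeouts() is an assumption for this example; database drivers and HTTP clients expose the same two knobs under their own option names.

```php
<?php
/**
 * Opens a TCP connection with separate connect and read timeouts.
 *
 * @return resource
 */
function openWithTimeouts(string $address, float $connectTimeout, int $readTimeout)
{
    $errno = 0;
    $errstr = '';
    // The fourth argument bounds only the connection attempt.
    $stream = @stream_socket_client($address, $errno, $errstr, $connectTimeout);
    if ($stream === false) {
        throw new RuntimeException("Cannot connect to $address: $errstr ($errno)");
    }
    // This bounds each subsequent read on the open connection.
    stream_set_timeout($stream, $readTimeout);

    return $stream;
}

// The global fallbacks, as seen by the current process.
printf(
    "max_execution_time=%s default_socket_timeout=%s\n",
    ini_get('max_execution_time'),
    ini_get('default_socket_timeout')
);
```

A call such as openWithTimeouts('tcp://db.internal:5432', 1.0, 3) (a hypothetical address) gives up on connecting after one second and on any single read after three, instead of holding the worker for the 60-second default.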

What happens if we have 20 workers and 50 requests come in:

20 requests are taken into work; the remaining 30 either go into a queue or are rejected immediately by the web server.
As each request finishes, the next one is taken from the queue, shortening it.
If a request's waiting time plus its execution time fits within the web server's overall request timeout, it is processed; if not, it usually ends with a 504 Gateway Timeout error.
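The scenario above can be put into numbers with a deliberately simplified model. It assumes that all requests arrive at once, that each takes the same time, and that the queue is unbounded and first-in, first-out. The function name and the 1-second and 2.5-second figures are made up for the example.

```php
<?php
/**
 * Simplified model: all requests arrive at once, each takes $serviceTime
 * seconds, and the queue is unbounded and first-in, first-out.
 */
function simulateBurst(int $workers, int $requests, float $serviceTime, float $gatewayTimeout): array
{
    $served = 0;
    $timedOut = 0;
    $worstLatency = 0.0;

    for ($i = 0; $i < $requests; $i++) {
        // Requests are processed in waves of $workers at a time.
        $wave = intdiv($i, $workers);
        $latency = ($wave + 1) * $serviceTime; // queue wait + execution

        if ($latency <= $gatewayTimeout) {
            $served++;
            $worstLatency = max($worstLatency, $latency);
        } else {
            $timedOut++; // the client would see 504 Gateway Timeout
        }
    }

    return ['served' => $served, 'timed_out' => $timedOut, 'worst_latency' => $worstLatency];
}

// 20 workers, 50 requests, 1 s per request, 2.5 s gateway timeout.
print_r(simulateBurst(20, 50, 1.0, 2.5));
```

With these numbers the first two waves of 20 finish after 1 and 2 seconds, while the last 10 requests would need 3 seconds and therefore time out. Halving the time each request holds a worker lets all 50 through.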

What is important for a developer to know about Apache

.htaccess files are part of the web server configuration and are read on every request, so any change to them changes the web server's behavior immediately.
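For illustration, a typical .htaccess that sends every request to a single entry point might look like the fragment below. It assumes that mod_rewrite is enabled, that AllowOverride permits these directives, and that the entry point is called index.php.

```apache
# Assumes mod_rewrite is enabled and AllowOverride allows these directives.
RewriteEngine On

# Existing files and directories (static assets) are served as-is.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Everything else is handed to the front controller.
RewriteRule ^ index.php [L]
```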

What is important for a developer to know about Nginx + PHP-FPM server

PHP-FPM is not a web server; it is a FastCGI Process Manager, so Nginx is required alongside it to proxy HTTP requests to PHP-FPM.
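A minimal sketch of that pairing is shown below. The document root, the PHP-FPM address, and the timeout value are assumptions chosen for the example, not recommended settings.

```nginx
server {
    listen 80;
    root /var/www/app/public;            # hypothetical project path

    # Existing static files are served directly by Nginx;
    # everything else is redirected to the entry point.
    location / {
        try_files $uri /index.php$is_args$args;
    }

    # Only the entry point is passed to PHP-FPM over FastCGI.
    location = /index.php {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass 127.0.0.1:9000;     # assumed PHP-FPM listen address
        fastcgi_read_timeout 30s;        # after this Nginx answers 504
    }
}
```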

What is important for a developer to know about RoadRunner

RoadRunner is a web server written in Go. It runs the worker CLI command and communicates with the running script over pipes or a socket, depending on the configuration. Depending on the settings, it can restart workers based on various parameters, but most often a running worker processes more than one request. This means you need to keep an eye on:

Open connections (to a database, a message broker, a socket for shipping logs, or anything else that stays open);
Static variables: they live only within a single worker but are shared across the requests it handles, so ideally avoid them, or use them only with a full understanding of how static state and late static binding work;
Memory cleanup (there are nuances here: the Zend VM, in which the code runs, is not ideal, so poorly organized code can cause memory leaks, and there is no way to get rid of them).
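The static-variable pitfall is easy to reproduce without RoadRunner itself. The sketch below imitates a long-running worker with a plain loop; CurrentUser, handle(), and runWorker() are hypothetical names, and no RoadRunner API is used.

```php
<?php
/** Static state survives for as long as the worker process lives. */
final class CurrentUser
{
    private static ?string $name = null;

    public static function set(string $name): void
    {
        self::$name = $name;
    }

    public static function get(): ?string
    {
        return self::$name;
    }

    public static function reset(): void
    {
        self::$name = null;
    }
}

function handle(array $request): string
{
    if (isset($request['user'])) {
        CurrentUser::set($request['user']);
    }

    return 'Hello, ' . (CurrentUser::get() ?? 'guest');
}

/** Imitates one long-running worker that serves several requests in a row. */
function runWorker(array $requests, bool $resetBetweenRequests): array
{
    $responses = [];
    foreach ($requests as $request) {
        $responses[] = handle($request);
        if ($resetBetweenRequests) {
            CurrentUser::reset();
        }
    }

    return $responses;
}

// The second request is anonymous, yet without a reset it sees the first user.
print_r(runWorker([['user' => 'alice'], []], false));
CurrentUser::reset();
print_r(runWorker([['user' => 'alice'], []], true));
```

Under Apache or PHP-FPM the leak never shows up because every request starts from a clean state, which is exactly why code that was safe there can misbehave in a long-running worker.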

In real projects, a PHP script spends most of its time waiting for a response from external systems, such as a database. While waiting, PHP consumes almost no CPU, but it keeps holding the RAM it has allocated. I posted an example of the code on GitHub:

As a result of the test I got:

User cpu: 6.048ms, system cpu: 3.629ms, memory: 0.38MB, memory real usage: 2.00MB, total time: 3.009s

This means that we occupied the worker for 3 seconds but used only about 10 ms of CPU time (user plus system); the rest of the time we were simply waiting.
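A measurement of this kind can be sketched with getrusage(), memory_get_usage(), and hrtime(). This is not the code from the repository mentioned above, only an assumption about how such numbers can be collected. measure() is a hypothetical helper, and usleep() stands in for the wait on an external system.

```php
<?php
/**
 * Runs $work and reports CPU time, memory, and wall-clock time.
 */
function measure(callable $work): array
{
    $wallStart = hrtime(true);
    $cpuStart = getrusage();

    $work();

    $cpuEnd = getrusage();
    // getrusage() splits each CPU counter into seconds and microseconds.
    $ms = fn(array $ru, string $key): float
        => $ru["$key.tv_sec"] * 1000 + $ru["$key.tv_usec"] / 1000;

    return [
        'user_cpu_ms'    => $ms($cpuEnd, 'ru_utime') - $ms($cpuStart, 'ru_utime'),
        'system_cpu_ms'  => $ms($cpuEnd, 'ru_stime') - $ms($cpuStart, 'ru_stime'),
        'memory_mb'      => memory_get_usage() / 1048576,
        'memory_real_mb' => memory_get_usage(true) / 1048576,
        'total_s'        => (hrtime(true) - $wallStart) / 1e9,
    ];
}

// usleep() stands in for waiting on a database; 0.3 s keeps the demo short.
$result = measure(fn() => usleep(300_000));
printf(
    "User cpu: %.3fms, system cpu: %.3fms, memory: %.2fMB, memory real usage: %.2fMB, total time: %.3fs\n",
    $result['user_cpu_ms'],
    $result['system_cpu_ms'],
    $result['memory_mb'],
    $result['memory_real_mb'],
    $result['total_s']
);
```

Whatever the exact figures on a given machine, the CPU time stays a small fraction of the wall-clock time for which the worker is held.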

Conclusion

All this suggests that PHP in its pure form is not well suited to high-load projects. Opcache with preloading, JIT, and RoadRunner all help shorten initialization, but they do not solve the problem of synchronous calls to external systems, and 90% of the code consists of fetching data from a stateful system, transforming it, and outputting the result and/or passing it on to another stateful system. But do not rush to get upset and abandon the language: in most cases, even in relatively large projects, there is no need to withstand heavy load, handling a request within one second is acceptable, and writing business logic in PHP is quite simple.

P.S. Personally, I have grown close to PHP, although I work with other languages too: Go, Java, C… They are excellent languages for their purposes, and each has its pros and cons. Understanding how different languages work and where PHP falls short, I want to keep writing about the language and its capabilities. For example, I managed to build a web server on Swoole 5 (previous versions were really bad) that queries the database and returns a response while handling a large number of simultaneous connections without trouble.