Understanding JIT in PHP 8


This is a translation of an article, prepared ahead of the start of the “Backend PHP Developer” course.


TL;DR

The Just In Time compiler in PHP 8 is implemented as part of the Opcache extension and is intended to compile some opcodes into CPU instructions at runtime.

This means that with JIT, some opcodes no longer need to be interpreted by the Zend VM; they are executed directly as CPU-level instructions.

JIT in PHP 8

One of the most talked-about features of PHP 8 is the Just In Time (JIT) compiler. It comes up in many blogs and communities and there is a lot of buzz around it, but so far I have not found much detail about how the JIT actually works.

After several frustrating attempts to find useful information, I decided to study the PHP source code. Combining my limited knowledge of C with all the scattered information I have been able to collect so far, I managed to put this article together, and I hope it helps you understand PHP's JIT better.

To oversimplify: when the JIT works as intended, your code is not executed through the Zend VM; instead, it is executed directly as a set of CPU-level instructions.

That is the whole idea.

But to understand this better, we need to think about how PHP works internally. It is not very complicated, but it requires some introduction.

I have already written an article with a brief overview of how PHP works. If this article seems overly complicated, just read its predecessor and come back. That should make things a bit easier.

How is PHP code executed?

We all know that PHP is an interpreted language. But what does this really mean?

Whenever you want to execute PHP code, be it a snippet or an entire web application, you have to go through a PHP interpreter. The most commonly used ones are PHP FPM and the CLI interpreter. Their job is very simple: receive PHP code, interpret it, and return the result.

This is the usual picture for every interpreted language. Some steps may vary, but the general idea is the same. In PHP, it goes like this:

  1. PHP code is read and converted into a set of keywords known as tokens. This process allows the interpreter to understand which piece of code is written in which part of the program. This first step is called lexing or tokenizing.
  2. With the tokens in hand, the PHP interpreter analyzes this collection of tokens and tries to make sense of them. As a result, an Abstract Syntax Tree (AST) is generated through a process called parsing. The AST is a set of nodes indicating which operations should be performed. For example, “echo 1 + 1” should actually mean “print the result of 1 + 1” or, more realistically, “print an operation, where the operation is 1 + 1”.
  3. Having the AST makes it much easier to understand the operations and their precedence. Converting this tree into something that can be executed requires an Intermediate Representation (IR), which in PHP we call opcodes. The process of converting the AST into opcodes is called compilation.
  4. Now that we have the opcodes, the fun part begins: executing the code! PHP has an engine called the Zend VM, which takes a list of opcodes and executes them. After all the opcodes have been executed, the program ends.

To make this a little clearer, I made a diagram:


A simplified diagram of the PHP interpretation process.
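
The last two steps can be sketched in a few lines of C. This is a toy model and not the Zend VM: the opcode names (`OP_PUSH`, `OP_ADD`, `OP_ECHO`), the `toy_op` struct, and `toy_execute` are all hypothetical, and the stack-machine design is chosen purely for brevity rather than as a description of how the real engine is organized. It only shows the general shape: `echo 1 + 1`, once compiled to a flat list of opcodes, is run by a loop that dispatches on each opcode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine (not real Zend opcodes). */
typedef enum { OP_PUSH, OP_ADD, OP_ECHO } toy_opcode;

typedef struct {
    toy_opcode code;
    long       operand; /* only used by OP_PUSH */
} toy_op;

/* Interpret `count` opcodes. Values that would be echoed are written
 * to `out` (capacity `out_cap`). Returns how many values were echoed. */
size_t toy_execute(const toy_op *ops, size_t count, long *out, size_t out_cap)
{
    long   stack[16];
    size_t sp = 0, echoed = 0;

    for (size_t i = 0; i < count; i++) {
        switch (ops[i].code) {
        case OP_PUSH:
            assert(sp < 16);
            stack[sp++] = ops[i].operand;
            break;
        case OP_ADD:
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
            break;
        case OP_ECHO:
            assert(sp >= 1 && echoed < out_cap);
            out[echoed++] = stack[--sp];
            break;
        }
    }
    return echoed;
}

/* What a compiler might emit for `echo 1 + 1;` in this toy model. */
const toy_op ECHO_ONE_PLUS_ONE[] = {
    { OP_PUSH, 1 }, { OP_PUSH, 1 }, { OP_ADD, 0 }, { OP_ECHO, 0 },
};
```

Calling `toy_execute(ECHO_ONE_PLUS_ONE, 4, out, 4)` walks the four opcodes in order and leaves the single echoed value in `out[0]`. Every opcode costs at least one trip through the `switch`, which is the kind of per-instruction overhead an interpreter pays.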

Straightforward enough, as you can see. But there is a bottleneck here: what is the point of lexing and parsing the code every time you execute it, if your PHP code may not even change that often?

In the end, we are only interested in the opcodes, right? Right! That is why the Opcache extension exists.

Opcache Extension

The Opcache extension comes with PHP, and there is usually no particular reason to deactivate it. If you are using PHP, you should probably enable Opcache.

It adds a shared in-memory cache layer for opcodes. Its job is to take the opcodes freshly generated from our AST and cache them, so that subsequent executions can skip the lexing and parsing phases.

Here is a diagram of the same process, given the Opcache extension:


PHP interpretation flow with Opcache. If the file has already been parsed, PHP retrieves the cached opcodes for it rather than parsing it again.
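
The caching idea can be illustrated with a deliberately naive C sketch. Everything here is hypothetical (`cache_entry`, `toy_compile`, `toy_get_opcodes`, the fixed slot count), and it ignores what makes a real opcode cache hard, such as sharing memory between processes and noticing that a file has changed. It only shows the control flow: look up the file first, and lex, parse, and compile only on a miss.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical stand-in for a cached script: a path plus a fake id
 * representing its compiled opcode array. */
typedef struct {
    char path[64];
    int  opcodes_id;
    int  used;
} cache_entry;

static cache_entry cache[CACHE_SLOTS];
static int compile_calls = 0; /* how often the expensive path ran */

/* Pretend to lex, parse, and compile a file (the expensive path). */
static int toy_compile(const char *path)
{
    (void)path;
    return ++compile_calls;
}

/* Return the opcodes for `path`, compiling only on a cache miss.
 * If every slot is taken, the result is simply not cached. */
int toy_get_opcodes(const char *path)
{
    int free_slot = -1;

    assert(path != NULL && strlen(path) < sizeof cache[0].path);

    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].path, path) == 0)
            return cache[i].opcodes_id;   /* hit: skip compilation */
        if (!cache[i].used && free_slot < 0)
            free_slot = i;
    }

    int id = toy_compile(path);           /* miss: do the full work */
    if (free_slot >= 0) {
        strcpy(cache[free_slot].path, path);
        cache[free_slot].opcodes_id = id;
        cache[free_slot].used = 1;
    }
    return id;
}
```

Requesting the same path twice runs `toy_compile` only once; the second call returns straight from the lookup loop.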

It is simply mesmerizing how beautifully the steps of lexing, parsing and compilation are skipped.

Note: this is where the preloading feature of PHP 7.4 shines! It lets you tell PHP FPM to parse your codebase, convert it to opcodes, and cache them before anything is even executed.

You can probably start to see where the JIT fits in here, right?! At least I hope so, which is why I am writing this article…

What does the Just In Time compiler do?

After listening to Zeev’s explanation in the “PHP and JIT” episode of the PHP Internals News podcast, I managed to get some idea of what the JIT is really supposed to do…

If Opcache lets you obtain the opcodes faster so that they can go straight to the Zend VM, the JIT is intended to make them run without the Zend VM at all.

The Zend VM is a C program that acts as a layer between the opcodes and the CPU itself. The JIT generates compiled code at runtime, so PHP can skip the Zend VM and go directly to the CPU. In theory, we should gain performance from this.

It sounded strange at first, because to compile to machine code you need to write a very specific implementation for each type of architecture. But it turns out to be quite feasible.

PHP’s JIT implementation uses the DynASM (Dynamic Assembler) library, which maps a set of CPU instructions in one specific format to assembly code for many different CPU types. Thus, the Just In Time compiler uses DynASM to convert opcodes into machine code for a specific architecture.

Although one thought still haunted me …

If preloading is able to parse PHP code into opcodes before execution, and DynASM can compile opcodes into machine code (Just In Time compilation), why the hell don’t we compile PHP right away using Ahead of Time compilation?!

One of the things I took away from the podcast episode is that PHP is weakly typed, which means PHP often does not know the type of a variable until the Zend VM tries to execute a specific opcode.

This can be seen by looking at the zend_value union type, which has many pointers to different type representations for a variable. Whenever the Zend VM tries to extract a value from a zend_value, it uses macros like ZSTR_VAL, which try to access the string pointer from the union of values.
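
To make the idea concrete, here is a heavily simplified sketch of a tagged union in C. It is not the real `zend_value` definition: the names `toy_type`, `toy_string`, `toy_value`, `toy_zval`, and `toy_str_val` are invented for illustration, and the real engine supports many more types. The point is only that the same slot of memory is read differently depending on a type tag that is known at runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags; the real engine has many more. */
typedef enum { TOY_LONG, TOY_DOUBLE, TOY_STRING } toy_type;

typedef struct {
    size_t      len;
    const char *val;
} toy_string;

/* One slot of memory, interpreted according to the tag below. */
typedef union {
    long        lval;
    double      dval;
    toy_string *str;
} toy_value;

typedef struct {
    toy_value value;
    toy_type  type; /* only known at runtime */
} toy_zval;

/* Accessor in the spirit of the string macros: it is only valid when
 * the tag says the value holds a string. */
const char *toy_str_val(const toy_zval *zv)
{
    assert(zv != NULL && zv->type == TOY_STRING);
    return zv->value.str->val;
}
```

Reading `value.str` when the tag is `TOY_LONG` would reinterpret an integer as a pointer, which is why every access has to be guarded by a type check, either explicitly or by construction.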

For example, there is a Zend VM handler that processes the “less than or equal to” (<=) expression. It forks into many different code paths to guess the types of its operands. Duplicating such type inference logic in machine code is not feasible and could potentially make things even slower.
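
A handler of that kind might look roughly like the following toy version. It is an illustration under simplifying assumptions and not the engine's actual handler: only two types exist here, and `toy_zval` and `toy_is_smaller_or_equal` are hypothetical names. Even so, two operand types already produce four code paths.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_LONG, TOY_DOUBLE } toy_type;

typedef struct {
    union { long lval; double dval; } value;
    toy_type type;
} toy_zval;

/* Toy "less than or equal" handler: both operand types must be
 * inspected at runtime before the right comparison can be chosen. */
int toy_is_smaller_or_equal(const toy_zval *a, const toy_zval *b)
{
    assert(a != NULL && b != NULL);

    if (a->type == TOY_LONG && b->type == TOY_LONG)
        return a->value.lval <= b->value.lval;
    if (a->type == TOY_LONG && b->type == TOY_DOUBLE)
        return (double)a->value.lval <= b->value.dval;
    if (a->type == TOY_DOUBLE && b->type == TOY_LONG)
        return a->value.dval <= (double)b->value.lval;

    /* Remaining case: both operands are doubles. */
    return a->value.dval <= b->value.dval;
}
```

With strings, arrays, objects, and null added to the mix, the number of combinations grows quickly, which gives a sense of why emitting machine code for every possible pairing ahead of time is unattractive.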

Compiling only after the types have been evaluated is not a good option either, because compiling to machine code is a CPU-intensive task. So compiling EVERYTHING at runtime is a bad idea.

How does the Just In Time compiler behave?

Now we know that we cannot infer types well enough to generate a good ahead-of-time compilation. We also know that compiling at runtime is expensive. So how can the JIT be useful for PHP?

To balance this equation, PHP’s JIT tries to compile only the few opcodes that it considers worth the effort. To do this, it profiles the opcodes executed by the Zend VM and checks which ones make sense to compile (depending on your configuration).

When a particular opcode is compiled, its execution is delegated to the compiled code instead of to the Zend VM. It looks like the diagram below:


PHP interpretation flow with JIT. Opcodes that have already been compiled are not executed through the Zend VM.

So the Opcache extension contains a couple of checks that determine whether a particular opcode should be compiled or not. If so, the compiler converts it to machine code using DynASM and executes the newly generated machine code.

Interestingly, since the current implementation has a limit, in megabytes, on the amount of compiled code (also configurable), code execution must be able to switch seamlessly between JIT-compiled and interpreted code.
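
The profile-then-switch behaviour can be sketched as follows. This is a hypothetical model: the call counter, the `HOT_THRESHOLD` value, and the names `toy_function` and `toy_call` are assumptions made for illustration, and the article does not describe the real trigger logic. A plain C function stands in for the machine code a JIT would emit. What the sketch does show is a single call site that transparently moves from the interpreted path to the compiled one.

```c
#include <assert.h>
#include <stddef.h>

#define HOT_THRESHOLD 3 /* hypothetical value, chosen for the example */

typedef long (*native_fn)(long);

typedef struct {
    int       calls;            /* profiling counter */
    native_fn compiled;         /* NULL until "JIT-compiled" */
    int       interpreted_runs;
    int       native_runs;
} toy_function;

/* Slow path: stands in for the VM interpreting opcodes one by one. */
static long interpret_double(long x) { return x + x; }

/* Fast path: stands in for machine code emitted by a JIT. */
static long native_double(long x) { return x * 2; }

/* Call the function, switching to the compiled version once it has
 * been interpreted HOT_THRESHOLD times. */
long toy_call(toy_function *fn, long arg)
{
    assert(fn != NULL);

    if (fn->compiled != NULL) {
        fn->native_runs++;
        return fn->compiled(arg);
    }

    fn->interpreted_runs++;
    if (++fn->calls >= HOT_THRESHOLD)
        fn->compiled = native_double; /* pretend we compiled it here */
    return interpret_double(arg);
}
```

Starting from a zeroed `toy_function`, the first three calls take the interpreted path and every later call takes the compiled one, while the caller sees the same result either way.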

By the way, this talk by Benoit Jacquemont about PHP’s JIT helped me A LOT in figuring all this out.

I am still not sure in which specific cases compilation takes place, but I think I don’t really want to know yet.

So your performance gains probably will not be huge

I hope it is much clearer now WHY everyone says that most PHP applications will not get big performance benefits from the Just In Time compiler, and why Zeev’s recommendation to profile and experiment with different JIT configurations for your application is the best way to go.

Compiled opcodes will usually be shared across multiple requests if you use PHP FPM, but that is still not a game changer.

This is because the JIT optimizes CPU-bound operations, and currently most PHP applications are I/O bound more than anything else. It doesn’t matter whether the processing operations are compiled if you still have to access the disk or the network. The timings will be very similar.
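
A quick back-of-the-envelope calculation shows why. The helper below applies Amdahl's law; the function name `overall_speedup` and the numbers used afterwards are made up for illustration and are not measurements of any real application.

```c
#include <assert.h>

/* Overall speedup of a request when only its CPU-bound fraction gets
 * faster (Amdahl's law). `cpu_fraction` is the share of total time
 * spent on CPU work, in [0, 1]; `cpu_speedup` is how many times
 * faster that share becomes and must be positive. */
double overall_speedup(double cpu_fraction, double cpu_speedup)
{
    assert(cpu_fraction >= 0.0 && cpu_fraction <= 1.0);
    assert(cpu_speedup > 0.0);
    return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / cpu_speedup);
}
```

Suppose a request spends 10% of its time on CPU work and 90% waiting on the disk or the network, and the JIT hypothetically makes the CPU part twice as fast. Then `overall_speedup(0.10, 2.0)` is 1 / (0.90 + 0.05), or roughly 1.05: about a 5% improvement overall. For a fully CPU-bound workload, `overall_speedup(1.0, 2.0)` gives the full factor of 2.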

Unless…

You are doing something that is not I/O bound, for example image processing or machine learning. Anything that is not I/O bound will benefit from the Just In Time compiler. This is also why people now say they are more inclined to write native functions in PHP rather than in C. The overhead will not be dramatically different if such functions are compiled anyway.

Interesting time to be a PHP programmer …

I hope this article was useful and helped you better understand what the JIT in PHP 8 is. Feel free to contact me on Twitter if you want to add something I may have forgotten here, and don’t forget to share this with your fellow developers; it will surely add some value to your conversations! -- @nawarian

