Cooking bytecode in the JVM kitchen

My name is Alexander Kotsyuruba, and I lead the development of internal services at DomKlik. Sooner or later, many experienced Java developers come to explore the internal structure of the JVM. To make this journey of the Java samurai easier, I decided to explain the basics of the Java Virtual Machine (JVM) and working with bytecode in plain language.

What is this mysterious bytecode, and where does it live?

I will try to answer this question using pickling as an example.


Why do we need the JVM and bytecode?

The JVM was created at Sun Microsystems under the slogan Write Once, Run Anywhere (WORA). Unlike the Write Once, Compile Anywhere (WOCA) approach, WORA assumes a virtual machine for each OS that runs code compiled only once (bytecode).


Write Once Run Anywhere (WORA)


Write Once Compile Anywhere (WOCA)

The JVM and bytecode are the foundation of WORA. They spare us from platform-specific nuances and from having to compile separately for each OS.

Bytecode

To understand what bytecode is, let's look at an example. Of course, this code does nothing useful. It only serves as material for the analysis that follows.

Source:

class Solenya(val jarForPickles: Any? = Any(), var ingredientsCount: Int = 0) {


    /**
     *  Adds an ingredient
     *  @param ingredient - what we are adding
     */
    fun add(ingredient: Any) {
        ingredientsCount = ingredientsCount.inc()
        //some code
    }

    /**
     *  Heats the jar
     *  @param duration - how long to set the timer for
     */
    fun warmUp(duration: Int) {
        for (x in 1..duration)
            println("Warming")
    }

    init {
        //take a jar for the pickles
        val jarForPickles = takeJarForPickles()
        //take the cucumbers
        val pickles = Any()
        //take the water
        val water = Any()

        //mix
        add(pickles)
        add(water)

        //heat
        warmUp(10)
    }

    /**
     *  Take a jar
     */
    private fun takeJarForPickles(): Any = openLocker()

    /**
     *  Open the cupboard
     */
    private fun openLocker(): Any = takeKeyForLocker()

    /**
     *  Take the keys from under the cupboard
     */
    private fun takeKeyForLocker(): Any = {}
}

Using IntelliJ IDEA's built-in tools (Tools -> Kotlin -> Show Kotlin Bytecode), we get the disassembled bytecode. Only part of it is shown here:

...
   INVOKEVIRTUAL java/io/PrintStream.println (Ljava/lang/Object;)V
   L5
   L6
    LINENUMBER 12 L6
    RETURN
   L7
    LOCALVARIABLE this Lcom/company/Solenya; L0 L7 0
    LOCALVARIABLE ingredient Ljava/lang/Object; L0 L7 1
    LOCALVARIABLE $i$f$add I L1 L7 2
    MAXSTACK = 2
    MAXLOCALS = 5

  // access flags 0x11
  public final warmUp(I)V
    // annotable parameter count: 1 (visible)
    // annotable parameter count: 1 (invisible)
   L0
    LINENUMBER 19 L0
    ICONST_1
    ISTORE 2
...

At first glance, this is an incomprehensible set of instructions. To understand how they work and what they operate on, we need to dive into the JVM's inner kitchen.

JVM Kitchen

Let’s look at the JVM runtime memory:

Think of the JVM as our kitchen. Now let's meet the rest of the participants:

Method area – Cookbook

The method area stores the compiled code of every function. When a thread starts executing a function, it takes the instructions from this area. In effect, it is a cookbook that details how to cook everything from scrambled eggs to Catalan zarzuela.

Threads 1..N – Team of cooks

Threads strictly follow the instructions prescribed for them in the method area. To do this, each thread has its own PC register and JVM stack. You can compare each thread to a cook who carries out the assigned task by following the recipes in the cookbook exactly.

PC Register – Field Notes

The program counter (PC) register is our thread's instruction counter. It stores the address of the instruction currently being executed. In the kitchen, it would be a note of which page of the cookbook we are on right now.
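To make the PC register idea concrete, here is a minimal sketch of a fetch-execute loop in Java. It is a toy and not how HotSpot works. The instruction set (PUSH, ADD, GOTO, HALT) and the class name ToyPc are invented for illustration. The only point is that `pc` always holds the index of the instruction being executed, and a jump simply overwrites it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a PC register: 'pc' holds the index of the instruction being executed.
// PUSH, ADD, GOTO and HALT are made-up instructions for illustration, NOT real JVM opcodes.
class ToyPc {
    // Runs the program, records every pc value in 'trace', returns the value left on the stack.
    static int run(String[] program, List<Integer> trace) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;                                   // start at the first instruction
        while (true) {
            trace.add(pc);                            // note which "page" we are on
            String[] parts = program[pc].split(" ");
            switch (parts[0]) {
                case "PUSH":
                    stack.push(Integer.parseInt(parts[1]));
                    pc++;                             // move on to the next instruction
                    break;
                case "ADD":
                    stack.push(stack.pop() + stack.pop());
                    pc++;
                    break;
                case "GOTO":
                    pc = Integer.parseInt(parts[1]);  // a jump simply overwrites the pc
                    break;
                case "HALT":
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown instruction: " + parts[0]);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> trace = new ArrayList<>();
        int result = run(new String[] {"PUSH 2", "PUSH 3", "ADD", "GOTO 5", "PUSH 100", "HALT"}, trace);
        System.out.println(result + " " + trace); // 5 [0, 1, 2, 3, 5]
    }
}
```

The GOTO skips index 4, just as a cook skips a page of the cookbook.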

JVM stack

A stack of frames. A frame is allocated for each function call, and inside it the current thread works with variables and operands. In our pickling analogy, this could be a set of nested operations:

Make pickles -> take a jar -> open the cupboard -> take the keys...
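These nested operations map directly onto nested calls. The sketch below is in Java and reuses the method names from the Solenya example purely for illustration. Each call pushes a frame onto the thread's JVM stack. The innermost method then uses the standard StackWalker API (Java 9+) to list the three frames on top.

```java
import java.util.List;
import java.util.stream.Collectors;

// Nested calls modeled on the Solenya example (method names reused for illustration).
// Each call pushes a new frame onto the current thread's JVM stack.
class PickleCalls {
    static List<String> takeJarForPickles() { return openLocker(); }

    static List<String> openLocker() { return takeKeyForLocker(); }

    static List<String> takeKeyForLocker() {
        // The first frame in the stream is the caller of walk(), i.e. this method.
        return StackWalker.getInstance().walk(frames -> frames
                .limit(3)
                .map(StackWalker.StackFrame::getMethodName)
                .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        System.out.println(takeJarForPickles()); // [takeKeyForLocker, openLocker, takeJarForPickles]
    }
}
```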

Frame – Work desk

The frame acts as the cook's work desk, with a cutting board and labeled containers on it.

Local variables – Labeled containers

This is the array of local variables. As the name implies, it stores the values of local variables, including the function's parameters. The compiler describes their names, types and scopes separately in the local variable table (the LOCALVARIABLE lines in the listing). It is like a set of labeled containers where you can put the intermediate results of your work.
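Here is a small Java sketch of how those containers are numbered. The class and method are hypothetical. The slot numbers in the comments follow javac's usual allocation for straight-line code like this. You can check them with `javap -c -l` on a class compiled with `javac -g`.

```java
// Slot numbers in the comments are the ones javac assigns here: slot 0 holds 'this'
// (instance method), parameters come next, then locals in declaration order.
// A long occupies two slots.
class LocalSlots {
    long weigh(int jars, int picklesPerJar) {     // slot 0: this, 1: jars, 2: picklesPerJar
        long total = (long) jars * picklesPerJar; // slots 3-4: total (a long takes two slots)
        int extra = 1;                            // slot 5: extra, so MAXLOCALS = 6
        return total + extra;
    }

    public static void main(String[] args) {
        System.out.println(new LocalSlots().weigh(3, 5)); // 16
    }
}
```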

Operand stack – Cutting board

The operand stack holds the arguments of JVM instructions. Examples include integer values for an addition and references to heap objects.

The closest analogy I can offer is a cutting board on which a tomato and a cucumber turn into a salad at some point. Unlike with the local variables, we put on the board only what the next instruction will work with.
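The following toy Java sketch shows the cutting board at work for the instruction sequence ICONST_2, ICONST_3, IADD. javac itself would constant-fold a literal `2 + 3`, so this sequence is written out by hand. An ArrayDeque stands in for the operand stack, and the class name CuttingBoard is made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Toy cutting board: runs ICONST_2, ICONST_3, IADD by hand and records
// the operand stack (bottom to top) after each instruction.
class CuttingBoard {
    static List<String> addTwoAndThree() {
        Deque<Integer> stack = new ArrayDeque<>();
        List<String> states = new ArrayList<>();
        stack.push(2);                               // ICONST_2
        states.add(bottomToTop(stack));
        stack.push(3);                               // ICONST_3
        states.add(bottomToTop(stack));
        int right = stack.pop();                     // IADD pops both operands...
        int left = stack.pop();
        stack.push(left + right);                    // ...and pushes the sum
        states.add(bottomToTop(stack));
        return states;
    }

    static String bottomToTop(Deque<Integer> stack) {
        List<Integer> copy = new ArrayList<>(stack); // ArrayDeque lists the top first
        Collections.reverse(copy);
        return copy.toString();
    }

    public static void main(String[] args) {
        System.out.println(addTwoAndThree()); // [[2], [2, 3], [5]]
    }
}
```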

Heap – Distribution Table

Inside a frame we operate on references to objects, and the objects themselves are stored in the heap. The important difference is that a frame belongs to exactly one thread. Its local variables "live" only while the frame is alive, that is, while the function is executing. The heap, in contrast, is shared by all threads. An object in it lives until the garbage collector reclaims it after it is no longer reachable. In the kitchen, this is the distribution table. There is only one, everybody shares it, and a separate team of cleaners keeps it tidy.
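A minimal Java sketch of the difference, with hypothetical names, follows. The array object lives in the heap and is visible to both threads. The variable `steps` exists only in the cook thread's frame.

```java
// The int[] object lives in the heap, and the main thread and the cook thread both reach
// it through references. 'steps' is a local variable in the cook thread's own frame and
// vanishes when the lambda returns. Names are illustrative.
class SharedTable {
    static int cookOnSharedTable() {
        int[] dish = new int[1];          // object in the heap, reference in main's frame
        Thread cook = new Thread(() -> {
            int steps = 3;                // local to the cook thread's frame
            dish[0] += steps;             // write to the shared heap object
        });
        cook.start();
        try {
            cook.join();                  // join() also makes the cook's writes visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return dish[0];
    }

    public static void main(String[] args) {
        System.out.println(cookOnSharedTable()); // 3
    }
}
```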

JVM kitchen. A look from the inside. Working with the frame

Let's start with the warmUp function:

    /**
     *  Heats the jar
     *  @param duration - how long to set the timer for
     */
    fun warmUp(duration: Int) {
        for (x in 1..duration)
            println("Warming...")
    }

The function's disassembled bytecode:

  public final warmUp(I)V
    // annotable parameter count: 1 (visible)
    // annotable parameter count: 1 (invisible)
   L0
    LINENUMBER 19 L0
    ICONST_1
    ISTORE 2
    ILOAD 1
    ISTORE 3
    ILOAD 2
    ILOAD 3
    IF_ICMPGT L1
   L2
    LINENUMBER 20 L2
    LDC "Warming..."
    ASTORE 4
   L3
    ICONST_0
    ISTORE 5
   L4
    GETSTATIC java/lang/System.out : Ljava/io/PrintStream;
    ALOAD 4
    INVOKEVIRTUAL java/io/PrintStream.println (Ljava/lang/Object;)V
   L5
   L6
    LINENUMBER 19 L6
    ILOAD 2
    ILOAD 3
    IF_ICMPEQ L1
    IINC 2 1
   L7
    GOTO L2
   L1
    LINENUMBER 21 L1
    RETURN
   L8
    LOCALVARIABLE x I L2 L7 2
    LOCALVARIABLE this Lcom/company/Solenya; L0 L8 0
    LOCALVARIABLE duration I L0 L8 1
    MAXSTACK = 2
    MAXLOCALS = 6

Frame Initialization – Workplace Preparation

To execute this function, the JVM creates a frame on the thread's JVM stack. Recall that a frame consists of an array of local variables and an operand stack.

  1. To know how much memory to allocate for this frame, the JVM relies on meta-information that the compiler provided about this function (explained in the code comments):
        MAXSTACK = 2 // allocate an operand stack of 2*32bit
        MAXLOCALS = 6 // allocate a local variable array of 6*32bit
    
  2. We also have information about some elements of the local variable array:
        LOCALVARIABLE x I L2 L7 2 // variable x of type Int (I), in scope between labels L2 and L7, at index 2
        LOCALVARIABLE this Lcom/company/Solenya; L0 L8 0
        LOCALVARIABLE duration I L0 L8 1
    
  3. When the frame is initialized, the function's arguments are placed into local variables. In this example, `this` goes to index 0 and the duration value is written to index 1.
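The three points above can be modeled with a toy frame in Java. This is only an illustration under simplifying assumptions. Every argument takes one slot, `this` is represented by an arbitrary object, and the class ToyFrame is invented. It does not reflect how HotSpot actually lays out frames.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy model of the frame for warmUp(I)V, sized from MAXSTACK = 2 and MAXLOCALS = 6.
// Simplifying assumptions: every argument takes one slot (no long/double) and 'this'
// is represented by an arbitrary object. Real HotSpot frames look different.
class ToyFrame {
    final Object[] locals;                       // local variable array, one entry per slot
    final Deque<Object> operands = new ArrayDeque<>();
    final int maxStack;

    ToyFrame(int maxStack, int maxLocals, Object receiver, Object... args) {
        this.maxStack = maxStack;
        this.locals = new Object[maxLocals];
        locals[0] = receiver;                    // slot 0: 'this' of an instance method
        for (int i = 0; i < args.length; i++) {
            locals[i + 1] = args[i];             // arguments follow: duration lands in slot 1
        }
    }

    void push(Object value) {
        // The real JVM verifier rules out overflow before the code runs;
        // this runtime check only makes the MAXSTACK limit visible.
        if (operands.size() == maxStack) {
            throw new IllegalStateException("operand stack overflow");
        }
        operands.push(value);
    }

    public static void main(String[] args) {
        ToyFrame frame = new ToyFrame(2, 6, "solenya", 10);
        System.out.println(Arrays.toString(frame.locals)); // [solenya, 10, null, null, null, null]
    }
}
```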

Thus, initially the frame will look like this:

Start executing instructions

To see how the frame works, arm yourself with the list of JVM instructions (Java bytecode instruction listings) and step through the code starting at label L0:

   L0
    LINENUMBER 19 L0 //metadata mapping this point to line 19 of the source code
    ICONST_1
    ISTORE 2
    ILOAD 1
    ISTORE 3
    ILOAD 2
    ILOAD 3
    IF_ICMPGT L1

ICONST_1 – push 1 (Int) onto the operand stack:

ISTORE 2 – pop a value of type Int from the operand stack and write it to the local variable at index 2:

Together, these two operations correspond to the Java code int x = 1.

ILOAD 1 – push the value of the local variable at index 1 onto the operand stack:

ISTORE 3 – pop a value of type Int from the operand stack and write it to the local variable at index 3:

Together, these two operations correspond to the Java code int var3 = duration.

ILOAD 2 – push the value of the local variable at index 2 onto the operand stack.

ILOAD 3 – push the value of the local variable at index 3 onto the operand stack:

IF_ICMPGT L1 – compares the two integer values on top of the stack. If the "lower" value (pushed first) is greater than the "upper" one, execution jumps to label L1. The instruction pops both values, so afterwards the stack is empty.
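To double-check the walkthrough above, here is a hand-simulation of the L0 block in Java. Each statement mirrors one instruction, with the corresponding opcode in the comment. The class WarmUpEntry is hypothetical. `this` in slot 0 is ignored because the model only holds ints.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-simulation of warmUp(I)V from label L0 up to IF_ICMPGT L1. Locals are modeled as
// an int[] of MAXLOCALS = 6 entries; slot 0 ('this') is left unused in this int-only model.
// Returns true when the jump to L1 is taken, i.e. the loop body is skipped.
class WarmUpEntry {
    static boolean jumpsToL1(int duration) {
        int[] locals = new int[6];
        Deque<Integer> stack = new ArrayDeque<>();  // never deeper than MAXSTACK = 2
        locals[1] = duration;                       // the argument arrives in slot 1

        stack.push(1);                              // ICONST_1
        locals[2] = stack.pop();                    // ISTORE 2  -> int x = 1
        stack.push(locals[1]);                      // ILOAD 1
        locals[3] = stack.pop();                    // ISTORE 3  -> int var3 = duration
        stack.push(locals[2]);                      // ILOAD 2
        stack.push(locals[3]);                      // ILOAD 3
        int upper = stack.pop();                    // IF_ICMPGT pops the top value (var3)...
        int lower = stack.pop();                    // ...and the one below it (x)
        if (!stack.isEmpty()) {
            throw new IllegalStateException("stack should be empty here");
        }
        return lower > upper;                       // x > var3 -> go to L1
    }

    public static void main(String[] args) {
        System.out.println(jumpsToL1(10)); // false: enter the loop
        System.out.println(jumpsToL1(0));  // true: skip straight to L1
    }
}
```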

Here is what these bytecode lines would look like in Java:

      int x = 1;
      int var3 = duration;
      if (x > var3) {
         // ...jump to L1...
      }

Decompiling the code to Java with IntelliJ IDEA (Kotlin -> Java) gives:

   public final void warmUp(int duration) {
      int x = 1;
      int var3 = duration;
      if (x <= duration) {
         while(true) {
            String var4 = "Warming";
            boolean var5 = false;
            System.out.println(var4);
            if (x == var3) {
               break;
            }
            ++x;
         }
      }
   }

Here you can see an unused variable (var5). You can also see that Kotlin's println() is never called: System.out.println is called directly instead. Don't worry, this is due to how inline functions (Kotlin's println() is one) and lambda expressions are compiled. These instructions add practically no overhead, and the JIT compiler can eliminate the dead code. This is an interesting topic that deserves a separate article.

In kitchen terms, this function is a task for the cook to “boil water for 10 minutes”. Our professional then:

  1. opens the cookbook (method area);
  2. finds the instructions for boiling water (warmUp());
  3. prepares the workplace, setting out a cutting board (operand stack) and containers (local variables) for temporarily storing ingredients.

JVM kitchen. A look from the inside. Working with the heap

Consider the code:

val pickles = Any()

Disassembled bytecode:

    NEW java/lang/Object
    DUP
    INVOKESPECIAL java/lang/Object.<init> ()V
    ASTORE 3

NEW java/lang/Object – allocates memory in the heap for an object of class Object. The object itself is not placed on the stack. Only a reference to it is pushed:


DUP – duplicates the top element of the stack. One reference is consumed by the object's initialization, and the other is saved in a local variable:


INVOKESPECIAL java/lang/Object.<init> ()V – calls the constructor of the corresponding class (Object) on the reference popped from the stack:


ASTORE 3 – the last step. It pops the reference and saves it in the local variable at index 3.
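The four steps can be simulated in a few lines of Java. This toy model rests on an explicit assumption: the heap is a List and a reference is an index into it, whereas real references are opaque to bytecode. The names NewObjectSim and HeapObject are invented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy simulation of NEW, DUP, INVOKESPECIAL <init>, ASTORE 3. The "heap" is a list and a
// "reference" is an index into it: an assumption for illustration only.
class NewObjectSim {
    static final class HeapObject {
        boolean initialized;                         // set by the "constructor"
    }

    final List<HeapObject> heap = new ArrayList<>();
    final Object[] locals = new Object[4];
    final Deque<Integer> stack = new ArrayDeque<>();

    void run() {
        heap.add(new HeapObject());                  // NEW: allocate space in the heap...
        stack.push(heap.size() - 1);                 // ...and push only the reference
        stack.push(stack.peek());                    // DUP: [ref, ref]
        heap.get(stack.pop()).initialized = true;    // INVOKESPECIAL <init>: consumes one copy
        locals[3] = stack.pop();                     // ASTORE 3: keep the other copy
    }

    public static void main(String[] args) {
        NewObjectSim sim = new NewObjectSim();
        sim.run();
        int ref = (Integer) sim.locals[3];
        System.out.println(ref + " " + sim.heap.get(ref).initialized + " " + sim.stack.isEmpty()); // 0 true true
    }
}
```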

In kitchen terms, I would compare creating an object to cooking on the shared table (heap). First you reserve enough space on the distribution table and note down its address (reference). Next you prepare the dish at that spot (initialize the object). Only then do you return to your workplace and drop the note with the address into the appropriate container (local variables).

JVM kitchen. A look from the inside. Multithreading

Now consider this example:

    fun add(ingredient: Any) {
        ingredientsCount = ingredientsCount.inc()
        //some code
    }

This is a classic example of a multithreading problem. We have an ingredient counter, ingredientsCount. In addition to adding an ingredient, the add function increments ingredientsCount.

The disassembled bytecode looks like this:

    ALOAD 0
    ALOAD 0
    GETFIELD com/company/Solenya.ingredientsCount : I
    ICONST_1
    IADD
    PUTFIELD com/company/Solenya.ingredientsCount : I

The state of our operand stack as the instructions execute:
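The following Java hand-simulation records the operand stack, bottom to top, after each instruction of add(). The class AddSim is hypothetical, and its nested Solenya is a stand-in that holds only the counter.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hand-simulation of add()'s bytecode that records the operand stack (bottom to top)
// after each instruction. This Solenya is a stand-in holding only the counter field.
class AddSim {
    static final class Solenya {
        int ingredientsCount;
    }

    static List<String> trace(Solenya self) {
        Deque<Object> stack = new ArrayDeque<>();
        List<String> states = new ArrayList<>();
        stack.push(self);                                  // ALOAD 0
        states.add(show(stack));
        stack.push(self);                                  // ALOAD 0
        states.add(show(stack));
        Solenya obj = (Solenya) stack.pop();               // GETFIELD pops the reference...
        stack.push(obj.ingredientsCount);                  // ...and pushes the field value
        states.add(show(stack));
        stack.push(1);                                     // ICONST_1
        states.add(show(stack));
        int b = (Integer) stack.pop();                     // IADD pops two ints...
        int a = (Integer) stack.pop();
        stack.push(a + b);                                 // ...and pushes their sum
        states.add(show(stack));
        int value = (Integer) stack.pop();                 // PUTFIELD pops the value...
        ((Solenya) stack.pop()).ingredientsCount = value;  // ...and the reference, then writes
        states.add(show(stack));
        return states;
    }

    static String show(Deque<Object> stack) {
        List<String> items = new ArrayList<>();
        Iterator<Object> it = stack.descendingIterator();  // bottom of the stack first
        while (it.hasNext()) {
            Object o = it.next();
            items.add(o instanceof Solenya ? "this" : String.valueOf(o));
        }
        return items.toString();
    }

    public static void main(String[] args) {
        trace(new Solenya()).forEach(System.out::println);
    }
}
```

For a fresh object, the recorded states are [this], [this, this], [this, 0], [this, 0, 1], [this, 1] and finally [] after PUTFIELD, which leaves the field equal to 1.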

In a single thread, everything executes correctly. With several threads, the following problem can occur. Imagine that both threads read the value of the ingredientsCount field at the same time and pushed it onto their own operand stacks. Then the state of the operand stacks and of the ingredientsCount field might look like this:

The function was executed twice (once by each thread), so ingredientsCount should be 2. In fact, one of the threads worked with a stale value of ingredientsCount, so the actual result is 1 (the lost update problem).
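The interleaving can be replayed deterministically in Java. In this sketch (the class LostUpdate is hypothetical), each simulated thread has its own operand stack and a one-element array plays the shared field. ALOAD 0 is left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Deterministic replay of the interleaving: both threads run GETFIELD before either
// runs PUTFIELD. Each simulated thread has its own operand stack; 'field' stands for the
// shared ingredientsCount in the heap. ALOAD 0 is omitted for brevity.
class LostUpdate {
    static int replay() {
        int[] field = {0};                        // shared heap state: ingredientsCount = 0
        Deque<Integer> stack1 = new ArrayDeque<>();
        Deque<Integer> stack2 = new ArrayDeque<>();

        stack1.push(field[0]);                    // thread 1: GETFIELD -> 0
        stack2.push(field[0]);                    // thread 2: GETFIELD -> 0 (about to go stale)

        stack1.push(1);                           // thread 1: ICONST_1
        stack1.push(stack1.pop() + stack1.pop()); // thread 1: IADD -> 1
        stack2.push(1);                           // thread 2: ICONST_1
        stack2.push(stack2.pop() + stack2.pop()); // thread 2: IADD -> 1

        field[0] = stack1.pop();                  // thread 1: PUTFIELD -> 1
        field[0] = stack2.pop();                  // thread 2: PUTFIELD -> 1, overwriting thread 1's update
        return field[0];
    }

    public static void main(String[] args) {
        System.out.println(replay()); // 1, although add() ran twice
    }
}
```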

The situation is similar to a team of cooks working in parallel and adding spices to a dish. Imagine:

  1. There is a distribution table on which the dish (Heap) lies.
  2. There are two cooks in the kitchen (Thread * 2).
  3. Each cook has their own cutting board, where they prepare a mixture of spices (JVM Stack * 2).
  4. Task: add two servings of spices to the dish.
  5. On the distribution table lies a sheet of paper (ingredientsCount). The cooks read it and write on it how many servings have been added. To save spices:
    • before preparing spices, a cook must check the sheet to make sure that not enough servings have been added yet;
    • after adding spices, the cook writes down how many servings, in their opinion, the dish now contains.

Under such conditions, a situation may arise:

  1. Cook #1 reads that 3 servings of spices have been added.
  2. Cook #2 reads that 3 servings of spices have been added.
  3. Both go to their desks and prepare a mixture of spices.
  4. Both cooks add their spices to the dish, which now holds 3 + 2 = 5 servings.
  5. Cook #1 writes that 4 servings of spices have been added.
  6. Cook #2 writes that 4 servings of spices have been added.

Bottom line: ingredients were wasted, the dish turned out too spicy, and so on.

To avoid such situations, there are various tools, such as locks and thread-safe (atomic) operations.
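The sketch below shows two such tools in Java. In Kotlin, the equivalents are the @Synchronized annotation and the same java.util.concurrent.atomic classes. A synchronized method performs the read-modify-write under a lock, and AtomicInteger performs it as a single atomic operation. SafeCounters and runTwoCooks are illustrative names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Two thread-safe variants of the counter from add(). Class and method names are
// illustrative; synchronized and AtomicInteger are standard Java tools.
class SafeCounters {
    private int locked = 0;
    private final AtomicInteger atomic = new AtomicInteger();

    synchronized void addLocked() { locked++; }    // read-modify-write under the object's monitor

    void addAtomic() { atomic.incrementAndGet(); } // one atomic read-modify-write

    static int[] runTwoCooks(int servingsEach) {
        SafeCounters c = new SafeCounters();
        Runnable cook = () -> {
            for (int i = 0; i < servingsEach; i++) {
                c.addLocked();
                c.addAtomic();
            }
        };
        Thread t1 = new Thread(cook);
        Thread t2 = new Thread(cook);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        synchronized (c) {
            return new int[] {c.locked, c.atomic.get()};
        }
    }

    public static void main(String[] args) {
        int[] counts = runTwoCooks(100_000);
        System.out.println(counts[0] + " " + counts[1]); // 200000 200000
    }
}
```

With either tool, two threads that each add 100,000 servings always end at 200,000. An unsynchronized `count++` may lose updates.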

To summarize

A developer rarely needs to dig into bytecode unless their work specifically requires it. Still, understanding how bytecode works helps you better understand multithreading and the strengths of a particular language. It also helps you grow professionally.

Note that these are far from all the parts of the JVM. There are many more interesting “things”, for example the constant pool, the bytecode verifier, the JIT compiler and the code cache. To avoid overloading the article, I focused only on the elements needed for a general understanding.
