Experience in optimizing computations through dynamic JVM bytecode generation

In my small random variable modeling project, I ran into the problem of poor performance when evaluating mathematical expressions entered by the user, and I spent a long time trying different solutions: I wrote an interpreter in C++ in the hope that it would be fast, and I designed my own bytecode. The most successful idea turned out to be generating JVM classes and loading them at runtime.

After learning about KMath, I decided to generalize this optimization to arbitrary mathematical structures and the operators defined on them.

KMath is a library for mathematics and computer algebra that makes extensive use of context-oriented programming in Kotlin. KMath separates mathematical entities (numbers, vectors, matrices) from the operations on them: the operations are supplied as a separate object, an Algebra implementation corresponding to the operand type.

import scientifik.kmath.operations.*

ComplexField {
   (i pow 2) + Complex(1.0, 3.0)
}

Thus, after writing a bytecode generator that takes JVM optimizations into account, we can get fast computations for any mathematical objects: it is enough to define the operations on them in Kotlin.

API

To begin with, it was necessary to design the expressions API, and only then proceed to the grammar of the language and the syntax tree. A neat idea also came up along the way: defining an algebra over the expressions themselves, to provide a more intuitive interface.

The base of the whole expressions API is the Expression interface, and the simplest way to implement it is to define the invoke method directly in terms of the parameters or, for example, nested expressions. A reference implementation of this kind, albeit the slowest one, was integrated into the root module.

interface Expression<T> {
   operator fun invoke(arguments: Map<String, T>): T
}
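To make the idea concrete, here is a minimal sketch in Java (hypothetical names; the actual KMath interface is the Kotlin one above) of such a direct implementation, computing x + 2 straight from the arguments map:

```java
import java.util.HashMap;
import java.util.Map;

public class DirectExpression {
    // Hypothetical Java analogue of the Kotlin Expression interface above.
    interface Expression<T> {
        T invoke(Map<String, T> arguments);
    }

    public static void main(String[] args) {
        // The "simplest" implementation style: invoke computes the value
        // directly from the parameters, here x + 2.
        Expression<Double> plusTwo = arguments -> arguments.get("x") + 2.0;

        Map<String, Double> env = new HashMap<>();
        env.put("x", 1.0);
        System.out.println(plusTwo.invoke(env)); // prints 3.0
    }
}
```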

More advanced implementations are based on MST (the mathematical syntax tree). These include:

  • MST interpreter,
  • MST class generator.

An API for parsing string expressions into MST is already available in the KMath dev branch, as is the more or less final JVM code generator.

Let’s move on to MST. There are currently four kinds of nodes in MST:

  • terminal:
    • symbols (i.e. variables)
    • numbers;
  • unary operations;
  • binary operations.

The first thing you can do with an MST is traverse it and compute the result from the available data. By passing an operation ID, for example "+", and its arguments, for example 1.0 and 1.0, to the target algebra, we can hope for a result if this operation is defined there. Otherwise, evaluating the expression will fail with an exception.
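The four node kinds and such a traversal can be sketched as follows (hypothetical Java types for illustration, not the actual KMath MST classes):

```java
import java.util.HashMap;
import java.util.Map;

public class MstSketch {
    // Hypothetical node classes mirroring the four MST node kinds listed above.
    interface Mst {}
    static final class Symbol implements Mst {
        final String name;
        Symbol(String name) { this.name = name; }
    }
    static final class Num implements Mst {
        final double value;
        Num(double value) { this.value = value; }
    }
    static final class Unary implements Mst {
        final String op; final Mst arg;
        Unary(String op, Mst arg) { this.op = op; this.arg = arg; }
    }
    static final class Binary implements Mst {
        final String op; final Mst left; final Mst right;
        Binary(String op, Mst left, Mst right) { this.op = op; this.left = left; this.right = right; }
    }

    // Recursive traversal: dispatches on the operation ID and fails with an
    // exception if the operation is not defined.
    static double evaluate(Mst node, Map<String, Double> args) {
        if (node instanceof Symbol) return args.get(((Symbol) node).name);
        if (node instanceof Num) return ((Num) node).value;
        if (node instanceof Unary) {
            Unary u = (Unary) node;
            if ("-".equals(u.op)) return -evaluate(u.arg, args);
            throw new IllegalStateException("Undefined operation: " + u.op);
        }
        Binary b = (Binary) node;
        switch (b.op) {
            case "+": return evaluate(b.left, args) + evaluate(b.right, args);
            case "*": return evaluate(b.left, args) * evaluate(b.right, args);
            default: throw new IllegalStateException("Undefined operation: " + b.op);
        }
    }

    public static void main(String[] args) {
        // The tree for "2 * x", evaluated with x = 3.0.
        Mst tree = new Binary("*", new Num(2.0), new Symbol("x"));
        Map<String, Double> env = new HashMap<>();
        env.put("x", 3.0);
        System.out.println(evaluate(tree, env)); // prints 6.0
    }
}
```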

To work with MST there is, in addition to the expression language, also an algebra – for example, MstField:

import scientifik.kmath.ast.*
import scientifik.kmath.operations.*
import scientifik.kmath.expressions.*

RealField.mstInField { number(1.0) + number(1.0) }() // 2.0

The method above returns an implementation of Expression that, when invoked, traverses the tree obtained by evaluating the function passed to mstInField.

Generating code

But that is not all: during traversal we can do whatever we like with the tree nodes, without worrying about the order of operations or their arity. This is exactly what bytecode generation relies on.

Code generation in kmath-ast is a parameterized assembly of a JVM class: the input is an MST and a target algebra, the output is an Expression instance.

The corresponding class, AsmBuilder, and a few extension functions provide methods for imperatively assembling bytecode on top of ObjectWeb ASM. With them, the MST traversal and class assembly look clean and take fewer than 40 lines of code.

Consider the generated class for the expression "2 * x"; the Java source decompiled from the bytecode is shown below:

package scientifik.kmath.asm.generated;

import java.util.Map;
import scientifik.kmath.asm.internal.MapIntrinsics;
import scientifik.kmath.expressions.Expression;
import scientifik.kmath.operations.RealField;

public final class AsmCompiledExpression_1073786867_0 implements Expression {
   private final RealField algebra;

   public final Double invoke(Map arguments) {
       return (Double)this.algebra.add(((Double)MapIntrinsics.getOrFail(arguments, "x")).doubleValue(), 2.0D);
   }

   public AsmCompiledExpression_1073786867_0(RealField algebra) {
       this.algebra = algebra;
   }
}

First, the invoke method was generated: the operands were pushed sequentially (since they are deeper in the tree), then the call to add. After invoke, the corresponding bridge method was emitted. Then the algebra field and the constructor were written. In some cases, when constants cannot simply be placed into the class constant pool, one more field is written: constants, an array of java.lang.Object.

However, due to the many edge cases and optimizations, the generator implementation is rather complicated.

Optimizing calls to Algebra

To call an operation from algebra, you need to pass its ID and arguments:

RealField { binaryOperation("+", 1.0, 1.0) } // 2.0

However, such a call is expensive in terms of performance: to choose which method to call, RealField executes a relatively expensive tableswitch instruction, and boxing also has to be taken into account. So although all MST operations can be represented in this form, it is better to emit a direct call:
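The difference between the two call shapes can be sketched like this (a hypothetical RealField-like algebra in Java, not KMath's actual class):

```java
public class DispatchSketch {
    // Hypothetical algebra: binaryOperation dispatches on the operation ID
    // string (javac compiles this into a switch over the string), while add
    // below is a direct, statically resolved call.
    static double binaryOperation(String id, double a, double b) {
        switch (id) {
            case "+": return add(a, b);
            case "*": return a * b;
            default: throw new IllegalStateException("Undefined operation: " + id);
        }
    }

    static double add(double a, double b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(binaryOperation("+", 1.0, 1.0)); // generic dispatch path
        System.out.println(add(1.0, 1.0));                  // direct call path
    }
}
```

Both paths compute the same value; the generated bytecode simply prefers the second shape whenever the target method can be resolved.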

RealField { add(1.0, 1.0) } // 2.0

There is no convention for mapping operation IDs to methods in Algebra implementations, so I had to insert a crutch: a hand-written table stating, for example, that "+" corresponds to the add method. There is also support for the fortunate case when a method with the same name as the operation ID, the required number of arguments, and matching types can be found directly.

private val methodNameAdapters: Map<Pair<String, Int>, String> by lazy {
    hashMapOf(
        "+" to 2 to "add",
        "*" to 2 to "multiply",
        "/" to 2 to "divide",
...

private fun <T> AsmBuilder<T>.findSpecific(context: Algebra<T>, name: String, parameterTypes: Array<*>): Method? =
    context.javaClass.methods.find { method ->
...
        nameValid && arityValid && notBridgeInPrimitive && paramsValid
    }

Another major problem is boxing. If we look at the Java method signatures produced by compiling the same RealField, we will see two methods:

public Double add(double a, double b)

// $FF: synthetic method
// $FF: bridge method
public Object add(Object var1, Object var2)

Of course, it is easier not to bother with boxing and unboxing and just call the bridge method. It appears here because of type erasure: it is needed to correctly implement the method add(T, T): T, whose type T is erased to java.lang.Object in the descriptor.
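This erasure behavior is easy to observe via reflection. A self-contained sketch (hypothetical Algebra and DoubleAlgebra names, standing in for RealField):

```java
import java.lang.reflect.Method;
import java.util.Arrays;

public class BridgeSketch {
    // Hypothetical generic algebra; T is erased to Object in the bytecode.
    interface Algebra<T> {
        T add(T a, T b);
    }

    // Specializing T to Double makes javac emit two methods: the declared
    // add(Double, Double) and a synthetic bridge add(Object, Object).
    static class DoubleAlgebra implements Algebra<Double> {
        public Double add(Double a, Double b) { return a + b; }
    }

    public static void main(String[] args) {
        Method[] methods = DoubleAlgebra.class.getDeclaredMethods();
        // Sort so the non-bridge method prints first, for deterministic output.
        Arrays.sort(methods, (x, y) -> Boolean.compare(x.isBridge(), y.isBridge()));

        for (Method m : methods)
            System.out.println(m.getName() + "(" + m.getParameterTypes()[0].getSimpleName()
                    + ", ...) bridge=" + m.isBridge());
    }
}
```

Running it shows both the Double-typed method and the synthetic Object-typed bridge that the generator must choose between.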

A direct call to add with two double arguments is not perfect either, because the return value is boxed (there is a discussion of this in the Kotlin YouTrack, KT-29460), but it is still better to make it: at best it saves two casts of the input objects to java.lang.Number and their unboxing to double.

Solving this problem took the longest. The difficulty lies not in emitting the calls to the primitive method, but in the fact that you have to juggle both primitive types (like double) and their wrappers (java.lang.Double, for example) on the operand stack, and insert boxing (for example, java.lang.Double.valueOf) and unboxing (doubleValue) in the right places – and plain bytecode instructions give you nothing to track value types with.

I had ideas about building my own typed abstraction on top of the bytecode. To do that, I had to dig deeper into the ObjectWeb ASM API. In the end I turned to the Kotlin/JVM backend, studied its StackValue class in detail (a typed piece of bytecode that ultimately produces some value on the operand stack), figured out the Type utility, which makes it convenient to work with the types available in bytecode (primitives, objects, arrays), and rewrote the generator using it. The question of whether to box or unbox a value on the stack then solved itself once I added an ArrayDeque holding the types expected by the next call.
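The expectation-stack idea can be modeled in a few lines (a deliberately simplified Java sketch, not the actual generator code: here the "emitted" conversions are just printed):

```java
import java.util.ArrayDeque;

public class ExpectationSketch {
    public static void main(String[] args) {
        // Hypothetical model of the expectation stack: before a value is
        // loaded, the generator peeks at the type the next call expects and
        // inserts boxing (Double.valueOf) or keeps the primitive accordingly.
        ArrayDeque<Class<?>> expectationStack = new ArrayDeque<>();
        expectationStack.push(double.class); // expected after the boxed value
        expectationStack.push(Double.class); // expected by the very next call

        double raw = 2.0;

        if (expectationStack.peek() == Double.class) {
            expectationStack.pop();
            Object boxed = Double.valueOf(raw); // "emit" boxing
            System.out.println("boxed: " + boxed);
        }

        if (expectationStack.peek() == double.class) {
            expectationStack.pop();
            System.out.println("primitive: " + raw); // no conversion needed
        }
    }
}
```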

internal fun loadNumeric(value: Number) {
    if (expectationStack.peek() == NUMBER_TYPE) {
        loadNumberConstant(value, true)
        expectationStack.pop()
        typeStack.push(NUMBER_TYPE)
    } else ...?.number(value)?.let { loadTConstant(it) }
        ?: error(...)
}

Conclusions

In the end, I managed to build a code generator on ObjectWeb ASM for evaluating MST expressions in KMath. The performance gain over plain MST traversal depends on the number of nodes, since the generated bytecode is linear and wastes no time on node dispatch and recursion. For example, for an expression with 10 nodes, the difference between evaluation with the generated class and with the interpreter is 19 to 30%.

After examining the problems I encountered, I made the following conclusions:

  • you should study the capabilities and utilities of ASM right away – they greatly simplify development and make the code readable (Type, InstructionAdapter, GeneratorAdapter);
  • there is no point in wasting time computing MaxStack yourself unless it is performance-critical: ClassWriter has the COMPUTE_MAXS and COMPUTE_FRAMES options;
  • it is very useful to study the compiler backends of real languages;
  • you should understand the syntax of descriptors and, in particular, generic signatures, so as not to make mistakes when working with generics;
  • and finally, you do not always need to go this deep – there are more convenient ways to work with classes at runtime, for example ByteBuddy and cglib.

Thanks for reading.

Authors of the article:

Yaroslav Sergeevich Postovalov, MBOU Lyceum No. 130 named after Academician Lavrentyev, participant of the mathematical modeling laboratory led by Anton Vatslavovich Voytishek

Tatiana Abramova, researcher at the laboratory of nuclear physics experiment methods at JetBrains Research
