Performance of Kotlin Android Applications vs. C++

Hi all! While studying, I became interested in Android development, and one of my assignments requires a research project. I chose a topic that has long piqued my curiosity: the performance of Kotlin code compared with C++.

A search of the Russian-language Internet turned up almost nothing: everything, one way or another, comes down to shuffling primitive types in a loop, such as bubble sort and other classical algorithms. In general, such experiments lead to the conclusion that using JNI is too expensive and Java is faster.

A search among scientific and student papers on the English-language Internet yielded several interesting articles [1], [2], [3], [4] on image processing, database access, the fast Fourier transform, and so on. However, most of these works deal with devices running Android 2.0 to 7.0.

In Google's guides, too, I could not find specific conditions for when to use the NDK. The dry

Squeeze extra performance out of a device to achieve low latency or run
computationally intensive applications, such as games or physics simulations.

Reuse your own or other developers’ C or C++ libraries.

“use the NDK if you want to squeeze out maximum performance or reuse old libraries” somehow does not give a clear understanding of the issue.

So I decided to run my own experiment, with blackjack and the NDK.

Features of program execution

There is a wonderful article on Habr about how Android works internally. I do not claim to be the ultimate authority: I am a junior among juniors, programming has always been a hobby for me, and I only decided to take it more seriously as an adult. There is no rush anyway, so I started by getting an education. Still, I will briefly outline how I understand the subject.

A virtual machine translates Java bytecode into instructions that the processor executes.

In Android versions before 5.0, the Dalvik virtual machine was used; bytecode was compiled dynamically during program execution (Just-In-Time compilation, JIT). Dalvik was designed specifically with limited hardware in mind.

With the release of Android 5.0 in 2014, Dalvik was replaced by the Android Runtime (ART), a virtual machine with better performance [3]. ART uses Ahead-of-Time (AOT) compilation: the application is compiled for a specific device during installation or while the device is idle (for example, when the phone is charging). In Android 7.0, a JIT compiler with code profiling was added to ART, which lets the runtime find frequently executed code and optimize its execution.

The Dalvik and ART virtual machines are implemented as low-level C/C++ applications compiled for the target platform.

Executing C++ Programs

The Java Native Interface (JNI) is a programming interface that allows Java code running inside the Java Virtual Machine (VM) to interact with applications and libraries written in other programming languages, such as C and C++. With JNI, you can call C/C++ functions from Java like ordinary Java methods, passing primitives or Java objects as parameters and receiving them as return values [3]. To develop native code, Android provides the Native Development Kit (NDK).

Algorithm selection

Armed with knowledge about Dalvik and ART, JIT and Ahead-of-Time compilation, I only had to check everything on real devices.
When choosing an algorithm for the experiment, I was guided by simple criteria:

  1. The algorithm should run for a long time, the longer the better. Trying to catch a difference of microseconds, when the measurement error exceeds the algorithm's running time, is not a sound approach.

  2. Calculations must be performed by the processor, without the explicit use of pixel shaders and a graphics accelerator.

  3. The algorithm should not write to disk or access the database, since a large number of unpredictable factors can affect the running time of such an algorithm: the operation of other applications, the battery level, the availability of free disk space, fragmentation of database files, etc.

  4. During the execution of the algorithm, user-defined data types must be used. Using classical algorithms with primitive types (sorting an array of numbers, etc.) can give unexpected results, since such algorithms can be changed beyond recognition by modern compilers during compilation.

Regarding point 4, I should add that my idea was to check how the Java machine actually behaves with objects. The Java machine creates objects on the heap and frees memory using the garbage collector. What happens if, instead of sorting an Int array, I need to process class instances and perform floating-point calculations as well?

In search of a suitable algorithm, I remembered the Mandelbrot set and its pretty fractal. I once wrote lab assignments in OpenGL, and I remember that computing fractals on the CPU loaded a PC quite heavily, so it should definitely take some time on a mobile phone. And complex numbers certainly cannot be called a primitive data type.

Application Implementation

While developing the application, I tried, to the best of my ability, to follow Google's recommendations:

  • a simple interface built with Jetpack Compose

  • ViewModel and a state holder, so that configuration changes (screen rotation) do not get in the way

  • calculations run in a separate thread using withContext(Dispatchers.Default), since the user interface runs on the main thread and “heavy” work must not run there

  • data exchange between JNI and the JVM is minimized: no JNI method calls in a loop. All data is prepared in C++ and copied as an array with a single JNI call, and the JVM side then assembles what it needs from that array

To compute the set, the program needs arithmetic on complex numbers: addition, squaring, and the squared modulus. For this purpose I implemented a Complex class, which contains an overloaded addition operator, a square method that returns the square of a complex number, and a sqr_abs method that returns the square of its modulus.
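
To make the description concrete, here is a minimal sketch of what such a class could look like. It is an assumption based only on the description above, not the project's actual source: the field names re and im are hypothetical, and the operations are written as member functions, whereas the listings in this article call them in free-function style.

```cpp
// Hypothetical sketch of a minimal complex-number type (not the project's real class).
struct Complex {
    double re = 0.0;  // real part (assumed field name)
    double im = 0.0;  // imaginary part (assumed field name)

    Complex() = default;  // default-constructs the complex zero
    Complex(double re_, double im_) : re(re_), im(im_) {}

    // Overloaded addition: component-wise sum.
    Complex operator+(const Complex& other) const {
        return Complex(re + other.re, im + other.im);
    }

    // (a + bi)^2 = (a^2 - b^2) + 2ab*i
    Complex square() const {
        return Complex(re * re - im * im, 2.0 * re * im);
    }

    // |z|^2 = a^2 + b^2; no square root is needed for the escape test.
    double sqr_abs() const {
        return re * re + im * im;
    }
};
```

For example, Complex(1.0, 2.0).square() gives -3 + 4i, and its sqr_abs() is 5. Working with the squared modulus lets the escape condition be written as a comparison with 4 instead of computing a square root on every iteration.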

The calculations themselves take place in the calc method of the Mandelbrot class.

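// Use a vector, because the number of points to paint is not known in advance
// (the declaration of `points` itself is not shown in this listing)

std::pair<size_t, double*> Mandelbrot::calc(const double start_x, const double end_x,
                                            const double start_y, const double end_y,
                                            unsigned int max_iter) {
    // compute the step
    const int HD_WIDTH = 1920;
    const int HD_HEIGHT = 1280;
    double step_x = (end_x - start_x) / HD_WIDTH;
    double step_y = (end_y - start_y) / HD_HEIGHT;

    int iterNumb = 0;
    // visit every point with the required step
    for (double x = start_x; x <= end_x; x += step_x) {
        for (double y = start_y; y <= end_y; y += step_y) {
            Complex cn; // new complex 0
            unsigned int i = 0;
            while (i < max_iter && sqr_abs(cn) <= 4) {
                // loop body reconstructed from the Kotlin listing (assumption)
                cn = square(cn) + Complex(x, y);
                ++i;
                ++iterNumb;
            }
            if (sqr_abs(cn) <= 4) {
                // the point belongs to the set: store its coordinates (reconstructed)
                points.push_back(x);
                points.push_back(y);
            }
        }
    }
    __android_log_print(ANDROID_LOG_VERBOSE, "Native VS Kotlin", "Iteration number %d", iterNumb);

    // the second element of the pair (the double* to the point data) is cut off in the source
    return std::make_pair(points.size(), /* ... */);
}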

For each point (out of 1920×1280), a loop limited to max_iter=200 iterations is run. If, at the end of the iterations, the point turns out to belong to the set, its coordinates are added to the vector.
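
The per-point test described here can be tried in isolation. The following is a rough sketch, not the project's code: it swaps in std::complex from the standard library for the custom Complex class, the names in_mandelbrot and scan_grid are hypothetical, and the grid is walked by integer index rather than by repeatedly adding the step.

```cpp
#include <complex>
#include <vector>

// Returns true if c has not escaped (|z|^2 <= 4) after at most max_iter
// iterations of z = z^2 + c, starting from z = 0.
bool in_mandelbrot(std::complex<double> c, unsigned int max_iter) {
    std::complex<double> z(0.0, 0.0);
    for (unsigned int i = 0; i < max_iter; ++i) {
        if (std::norm(z) > 4.0) return false;  // std::norm is the squared modulus
        z = z * z + c;
    }
    return std::norm(z) <= 4.0;
}

// Scans a grid of (width + 1) x (height + 1) points and appends the x and y
// coordinates of every point that belongs to the set to a flat vector.
std::vector<double> scan_grid(double start_x, double end_x,
                              double start_y, double end_y,
                              int width, int height, unsigned int max_iter) {
    std::vector<double> points;
    const double step_x = (end_x - start_x) / width;
    const double step_y = (end_y - start_y) / height;
    for (int ix = 0; ix <= width; ++ix) {
        const double x = start_x + ix * step_x;
        for (int iy = 0; iy <= height; ++iy) {
            const double y = start_y + iy * step_y;
            if (in_mandelbrot({x, y}, max_iter)) {
                points.push_back(x);
                points.push_back(y);
            }
        }
    }
    return points;
}
```

Computing each coordinate from an integer index keeps the number of visited points fixed, whereas accumulating x += step_x can drift by one point because of floating-point rounding. That is a design choice of this sketch only.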

I did not try to optimize the code as much as possible; I paid more attention to keeping the Kotlin implementation minimally different from the C++ one.

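class Mandelbrot {

    // Use a list, because the number of points to paint is not known in advance
    private var points: ArrayList<Double> = ArrayList(300000)

    fun calc(
        startX: Double,
        endX: Double,
        startY: Double,
        endY: Double,
        maxIter: Int
    ): ArrayList<Double> {
        var iterNumb = 0 // total iterations
        val stepX: Double
        val stepY: Double

        // HD_WIDTH and HD_HEIGHT are constants defined outside this listing
        stepX = (endX - startX) / HD_WIDTH
        stepY = (endY - startY) / HD_HEIGHT // reconstructed by analogy with the C++ listing
        // visit every point with the required step
        var x = startX
        while (x <= endX) {
            var y = startY
            while (y <= endY) {
                var cn = Complex() // new complex 0
                var i = 0
                while (i < maxIter && sqrAbs(cn) < 4) {
                    cn = square(cn) + Complex(x, y)
                    i++
                    iterNumb++
                }
                if (sqrAbs(cn) <= 4) {
                    // the point belongs to the set: store its coordinates (reconstructed)
                    points.add(x)
                    points.add(y)
                }
                y += stepY
            }
            x += stepX
        }

        Log.d("Native VS Kotlin", "Kotlin iterations number =$iterNumb")
        //return Pair(points.size,points)
        return points
    }
}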

After calculating all the points, the C++ algorithm stores them in a C++ vector, which the Java machine cannot work with directly. There are ways to create a Java object directly from native code, so it would be possible to add data to a Java array during iteration, but calling any JNI method has a high cost. Calling JNI methods for each point in a loop would defeat the whole point of using native code. Therefore, once the algorithm has finished, the following JNI functions are called:

  • NewDoubleArray – Creates a primitive Java array

  • SetDoubleArrayRegion – copies a region of memory of a given length to an array. The C++ standard ensures that the data in a vector is stored sequentially, which makes it possible to copy it directly from memory.
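
The idea of a single bulk copy can be illustrated without the JNI headers. In the sketch below, which is an illustration under stated assumptions rather than the project's code, a second std::vector stands in for the Java array, and the hypothetical helper copy_region plays the role that SetDoubleArrayRegion plays in real JNI code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for SetDoubleArrayRegion: copies len doubles from src
// into dest starting at index start, in one call.
void copy_region(double* dest, std::size_t start, std::size_t len, const double* src) {
    std::memcpy(dest + start, src, len * sizeof(double));
}

// Copies all collected coordinates to a freshly allocated destination array.
std::vector<double> export_points(const std::vector<double>& points) {
    // Plays the role of NewDoubleArray(size): allocate the destination once.
    std::vector<double> java_array(points.size());
    if (!points.empty()) {
        // points.data() points to contiguous storage, so one copy moves everything.
        copy_region(java_array.data(), 0, points.size(), points.data());
    }
    return java_array;
}
```

In actual JNI code the equivalent step would be roughly env->NewDoubleArray(n) followed by env->SetDoubleArrayRegion(array, 0, n, points.data()): two JNI calls in total, regardless of how many points were found.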

Thus, after the native code has run, the Java machine receives a Double array. Converting it to a list with asList() takes minimal (constant) time, because asList() wraps the array instead of copying it.

PNG files as a result of calculations

Once the data is ready, the “Create PNG” button becomes available to the user; pressing it calls the buildPNG function (file png.kt).
This function generates images line by line using the pngj library: all points of the set are colored blue, the rest white. The finished images are saved to disk and displayed on the screen.
The function plays an auxiliary role: the images are produced for a visual comparison of the resulting sets, so its execution time is not counted.

Test results

Four devices were used to measure the execution time of the algorithm.

  1. HUAWEI Y6 2019 – Android 9.0 OS, MT6761 processor, 2000 MHz

  2. Realme C15 – Android 11 OS, MediaTek MT6765G Helio G35 processor, 2300 MHz

  3. Meizu Note 8 M822H – Android 8.1 OS, Qualcomm Snapdragon 632 processor, 1800 MHz

  4. Realme 10 – Android 13 OS, MediaTek Helio G99 2200MHz (556) 15c 1820ms processor

  • All devices were fully charged.

  • Warm-up runs were performed for each device

  • For each device, ten separate launches of each algorithm were performed in flight mode and another ten in normal mode.

At this stage, it became obvious that the execution speed of Kotlin and C++ is different.



Device            Kotlin, ms    C++, ms
Huawei Y6 2019
Realme C15
Meizu Note 8
Realme 10

Time was measured using measureTimeMillis.
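
measureTimeMillis is a Kotlin standard library function. For readers who want to time the native side in isolation, a comparable measurement could be sketched in C++ with std::chrono. This is an assumption for illustration and not how the results in this article were obtained; the names time_runs and mean are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `work` `warmup` times without measuring, then `runs` more times,
// recording the wall-clock duration of each measured run in milliseconds.
std::vector<double> time_runs(const std::function<void()>& work,
                              std::size_t warmup, std::size_t runs) {
    for (std::size_t i = 0; i < warmup; ++i) work();

    std::vector<double> samples_ms;
    samples_ms.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        // steady_clock is monotonic, so system clock adjustments do not affect it.
        const auto begin = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();
        samples_ms.push_back(
            std::chrono::duration<double, std::milli>(end - begin).count());
    }
    return samples_ms;
}

// Arithmetic mean of the samples; returns 0 for an empty input.
double mean(const std::vector<double>& samples) {
    if (samples.empty()) return 0.0;
    double sum = 0.0;
    for (double s : samples) sum += s;
    return sum / static_cast<double>(samples.size());
}
```

A call such as time_runs(work, 2, 10) mirrors the procedure used here of warm-up runs followed by ten measured launches, and mean() summarizes the samples.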

APK generated in Build mode

While the Kotlin version of the algorithm is running, Logcat shows an interesting picture:

If I understand correctly, this is the garbage collector at work. Apart from that, Logcat only complains that the application does not save battery.

In general, we can say with confidence that there are cases where native code vastly outperforms the JVM. Despite some added project complexity, it makes sense to use the NDK for those program modules that require heavy computation, especially when the calculations involve objects rather than primitive types.

The source code is available on GitHub.
