Video with transparency on Jetpack Compose – made easy

This article is intended for an advanced reader who is already familiar with Jetpack Compose and Android development in general.

Hello! My name is Vladimir, and I am a mobile developer at Finam. In our practice, we actively use Android Jetpack Compose, and it has proven itself very well.

In this article I want to show a simple way to solve a well-known problem in Android development: playing a video file with full transparency. Compose does not yet have ready-made components for this, so developers have to resort to various tricks.

What is the benefit of such a solution? The answer is obvious: arbitrarily complex animation in an application at a minimal size. For example, an attention-grabbing animated character takes up only 370 KB with a frame size of 480×270.

Where did this problem come from? The thing is that not every codec available on Android supports an alpha channel in the frame (the potential candidates are H.265, VP8 and VP9). There are many device manufacturers, and no one guarantees that such a file will be played by the standard tools as expected. Most often there is simply no transparency support at all! And in mobile development, especially on Android, it is very important to get a stable and predictable result on the widest possible range of client devices.

There are already several articles on this topic on the Internet, and even ready-made working code. I found two main sources worth attention: one and two. Both describe almost the same method, but the first shows how to do it in XML layouts, while the second adapts that method to Compose.

All methods (including the one proposed in this article) are based on the general principle of restoring the transparency of a video frame using a mask. This means that the video file already included in the application resources must be prepared in a special way. To do this, the main video stream is first divided into two parallel ones – a color (RGB) and an alpha mask. Then both streams in the prepared file are “glued” into one, where each takes up half of the frame.

This is approximately what the prepared video file looks like when played with a regular player.

You can prepare any video file for packaging into application resources using the well-known utility ffmpeg:

ffmpeg -i input_file.mov -vf "split [a], pad=iw*2:ih [b], [a] alphaextract, [b] overlay=w" -c:v libx264 -s 960x270 output_file.mp4
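In rough terms, this filter chain does the following: split makes two copies of the decoded stream, pad=iw*2:ih doubles the frame width of one copy, alphaextract turns the alpha channel of the other copy into a grayscale mask, and overlay=w draws that mask into the right half of the padded frame. The -s 960x270 option simply matches the doubled width of a 480×270 source. Note that the input file must itself contain an alpha channel for alphaextract to work.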

As described in the sources mentioned above, to render the animated image, a drawing surface with an OpenGL context (GLSurfaceView or TextureView) is added to the screen layout, along with a video player instance that is given a reference to the prepared video resource for playback. A special fragment shader is plugged into the rendering of the video stream, gluing the two halves of the frame into one RGBA image (color plus transparency). The picture thus acquires its transparency during image processing in the OpenGL context, which most Android devices handle well.
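For reference, the core of such a shader could look roughly like the sketch below (held as a Kotlin string constant). This is a hypothetical illustration rather than the exact shader from the articles above; it assumes the color occupies the left half of the decoded frame and the grayscale alpha mask the right half.

// Hypothetical fragment shader for the approach described above: sample the color
// from the left half of the double-width frame and the alpha mask from the right half.
private const val ALPHA_PACKED_FRAGMENT_SHADER = """
    #extension GL_OES_EGL_image_external : require
    precision mediump float;
    varying vec2 vTextureCoord;           // 0..1 coordinates of the visible output frame
    uniform samplerExternalOES sTexture;  // full double-width frame from the video decoder

    void main() {
        vec2 colorCoord = vec2(vTextureCoord.x * 0.5, vTextureCoord.y);        // left half: RGB
        vec2 alphaCoord = vec2(0.5 + vTextureCoord.x * 0.5, vTextureCoord.y);  // right half: mask
        vec3 rgb = texture2D(sTexture, colorCoord).rgb;
        float alpha = texture2D(sTexture, alphaCoord).r;                       // mask is grayscale
        gl_FragColor = vec4(rgb, alpha);
    }
"""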

The second method, the one for Compose, does essentially the same thing as the first. But instead of the standard MediaPlayer it suggests using ExoPlayer, wrapped together with a TextureView in the Compose-compatible AndroidView component (Compose here acts only as an interop wrapper around the View).

Fewer intermediaries, more control

I suggest doing roughly the same thing with a pre-prepared video file, but simplifying the process to the two minimal necessary links: a video codec and Compose itself in its pure form, without wrappers.

First, let's write our own convenient component for extracting raw data from a video file for subsequent decoding. The external interface of our component will be as follows:

interface VideoDataSource {
    fun getMediaFormat(): MediaFormat
    fun getNextSampleData(): ByteBuffer
}

To implement the component, we will use the standard Android class for extracting data from media containers: MediaExtractor. One instance of the implementation class will be responsible for reading one file. To make this convenient, we will add a simple factory:

object VideoDataSourceFactory {

    fun getVideoDataSource(context: Context, uri: Uri): VideoDataSource {
        return VideoDataSourceImpl(context = context, uri = uri)
    }
}

Our component methods:

  • getMediaFormat(): get the MediaFormat structure with a description of the characteristics of the open file – we will need it to configure the codec;

  • getNextSampleData(): read the next portion of raw video stream data (for subsequent transmission to the codec).

Our component class code:

internal class VideoDataSourceImpl(context: Context, uri: Uri) : VideoDataSource {

    private val mediaExtractor = MediaExtractor().apply {
        setDataSource(context, uri, null)
        setVideoTrack()
    }

    private var mediaFormat: MediaFormat? = null

    private var initialSampleTime: Long = 0L

    private val dataBuffer = ByteBuffer
        .allocate(SAMPLE_DATA_BUFFER_SIZE)
        .apply { limit(0) }

    override fun getMediaFormat(): MediaFormat {
        return mediaFormat!!
    }

    override fun getNextSampleData(): ByteBuffer {
        if (!dataBuffer.hasRemaining()) {
            mediaExtractor.readSampleData(dataBuffer, 0)
            if (!mediaExtractor.advance()) {
                mediaExtractor.seekTo(initialSampleTime, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
            }
        }
        return dataBuffer
    }

    private fun MediaExtractor.setVideoTrack() {
        val availableMimeTypes =
            (0 until trackCount).mapNotNull { getTrackFormat(it).getString(MediaFormat.KEY_MIME) }

        val videoTrackIndex = availableMimeTypes
            .indexOfFirst { it.startsWith("video/") }
            .takeIf { it >= 0 }

        // Fail fast if the container has no video track at all
        val trackIndex = requireNotNull(videoTrackIndex) { "No video track found" }
        this.selectTrack(trackIndex)

        mediaFormat = this.getTrackFormat(trackIndex)
        initialSampleTime = this.sampleTime
    }
}

private const val SAMPLE_DATA_BUFFER_SIZE = 100_000

For demonstration purposes, the component infinitely “loops” reading data with a simple condition:

if (!mediaExtractor.advance()) {
    mediaExtractor.seekTo(initialSampleTime, MediaExtractor.SEEK_TO_CLOSEST_SYNC)
}

Next, we need a component for decoding the raw data of the video stream, the interface of which will have only one method:

interface VideoFramesDecoder {
    fun getOutputFramesFlow(inputSampleDataCallback: () -> ByteBuffer): Flow<Bitmap>
}

The component's only method returns a Flow of decoded frames as Bitmap objects ready for rendering. To implement it, we will use the standard Android class for decoding a video stream: MediaCodec.

We will also create an instance of the component class through a factory:

object VideoFramesDecoderFactory {

    fun getVideoFramesDecoder(mediaFormat: MediaFormat): VideoFramesDecoder {
        return VideoFramesDecoderImpl(mediaFormat = mediaFormat)
    }
}

Our component class code:

internal class VideoFramesDecoderImpl(private val mediaFormat: MediaFormat) : VideoFramesDecoder {

    private val mimeType = mediaFormat.getString(MediaFormat.KEY_MIME)!!
    private val frameRate = mediaFormat.getInteger(MediaFormat.KEY_FRAME_RATE)

    private val nowMs: Long
        get() = System.currentTimeMillis()

    private val random = Random(nowMs)

    override fun getOutputFramesFlow(inputSampleDataCallback: () -> ByteBuffer): Flow<Bitmap> {
        return channelFlow {
            val threadName = "${this.javaClass.name}_HandlerThread_${random.nextLong()}"
            val handlerThread = HandlerThread(threadName).apply { start() }
            val handler = Handler(handlerThread.looper)

            val decoder = MediaCodec.createDecoderByType(mimeType)

            val frameIntervalMs = (1_000f / frameRate).toLong()
            var nextFrameTimestamp = nowMs

            val callback = object : MediaCodec.Callback() {

                override fun onInputBufferAvailable(codec: MediaCodec, index: Int) {
                    runCatching {
                        val sampleDataBuffer = inputSampleDataCallback()
                        val bytesCopied = sampleDataBuffer.remaining()
                        codec.getInputBuffer(index)?.put(sampleDataBuffer)
                        codec.queueInputBuffer(index, 0, bytesCopied, 0, 0)
                    }
                }

                override fun onOutputBufferAvailable(codec: MediaCodec, index: Int, info: MediaCodec.BufferInfo) {
                    runCatching {
                        codec.getOutputImage(index)?.let { frame ->
                            val bitmap = frame.toBitmap()
                            val diff = (nextFrameTimestamp - nowMs).coerceAtLeast(0L)
                            runBlocking { delay(diff) }
                            trySend(bitmap)
                            nextFrameTimestamp = nowMs + frameIntervalMs
                        }
                        codec.releaseOutputBuffer(index, false)
                    }
                }

                override fun onError(codec: MediaCodec, e: MediaCodec.CodecException) = Unit

                override fun onOutputFormatChanged(codec: MediaCodec, format: MediaFormat) = Unit
            }

            decoder.apply {
                setCallback(callback, handler)
                configure(mediaFormat, null, null, 0)
                start()
            }

            awaitClose {
                decoder.apply {
                    stop()
                    release()
                }
                handlerThread.quitSafely() // stop the callback thread together with the decoder
            }
        }.conflate()
    }
}

In the getOutputFramesFlow() method, the class creates and returns a channelFlow, which is convenient for working with callback-based APIs, in our case MediaCodec.Callback().

The codec signals the readiness of the input and output buffers via the onInputBufferAvailable() and onOutputBufferAvailable() callbacks, respectively.

When the next input buffer is ready, we fill it with a portion of the raw data returned by the inputSampleDataCallback function. And when an output buffer is ready, we read the decoded image and emit it to all subscribers of our Flow.

Before sending the image to subscribers, we wait for the inter-frame interval (in milliseconds, this is 1000 / frameRate). The delay is done in the simplest way, by blocking the thread (runBlocking). For a test environment this is quite sufficient: one dedicated thread consumes no CPU while it waits and does not affect the measurement results.
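If blocking the codec's handler thread feels too crude for production code, the pacing could instead be moved into a suspending operator applied on the collector side. Below is a minimal sketch under that assumption (the paced name is hypothetical and not part of the decoder above); note that with conflate() still in place the decoder itself would keep running at full speed, so backpressure would also need rethinking.

import android.graphics.Bitmap
import kotlinx.coroutines.delay
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Hypothetical alternative to runBlocking: pace the frames with a suspending delay
// on the collector side, keeping the MediaCodec callback thread free of blocking calls.
fun Flow<Bitmap>.paced(frameRate: Int): Flow<Bitmap> = flow {
    val frameIntervalMs = 1_000L / frameRate
    var nextFrameAt = System.currentTimeMillis()
    collect { frame ->
        delay((nextFrameAt - System.currentTimeMillis()).coerceAtLeast(0L))
        emit(frame)
        nextFrameAt = System.currentTimeMillis() + frameIntervalMs
    }
}

Such an operator would then be applied where the frames are collected, for example .paced(frameRate).collectLatest { lastFrame = it }.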

Then we bring all the components together into one simple Compose widget:

@Composable
fun VideoAnimationWidget(
    @RawRes resourceId: Int,
    modifier: Modifier = Modifier
) {
    val context = LocalContext.current
    var lastFrame by remember { mutableStateOf<Bitmap?>(null) }

    LaunchedEffect(resourceId) {
        withContext(Dispatchers.IO) {
            val videoDataSource = VideoDataSourceFactory.getVideoDataSource(
                context = context,
                uri = context.getUri(resourceId = resourceId)
            )
            val videoFramesDecoder = VideoFramesDecoderFactory.getVideoFramesDecoder(
                mediaFormat = videoDataSource.getMediaFormat()
            )

            videoFramesDecoder
                .getOutputFramesFlow(inputSampleDataCallback = { videoDataSource.getNextSampleData() })
                .collectLatest { lastFrame = it }
        }
    }

    Canvas(modifier = modifier) {
        lastFrame?.let { frame ->
            drawImage(
                image = frame.asImageBitmap(),
                topLeft = Offset(
                    x = (size.width - frame.width) / 2,
                    y = (size.height - frame.height) / 2
                ),
                blendMode = BlendMode.SrcOver
            )
        }
    }
}

In LaunchedEffect, we create a data source and subscribe to Flow, which gives the current frame for rendering. Resource release and file closing occur automatically inside the decoder component (upon unsubscribing from Flow), so we don’t do anything special for this inside the widget. In Canvas, we simply draw the last current frame.
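Using the widget then comes down to a single call, roughly like this (a hypothetical snippet; R.raw.stub_animation stands in for whatever prepared file is packed into res/raw):

@Composable
fun PromoBanner() {
    // Hypothetical usage: the prepared double-width video sits in res/raw.
    VideoAnimationWidget(
        resourceId = R.raw.stub_animation,
        modifier = Modifier.size(width = 480.dp, height = 270.dp)
    )
}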

That's it! The minimum set in Compose for video with transparency is ready.

But, admittedly, there are still some details worth paying attention to. It would be unfair to highlight only the strong points of this solution without touching on the weak ones.

The standard Android codec returns the image to the callback in the YUV_420_888 format (as an Image object). To draw it on a Canvas, it still has to be converted into RGBA pixels that everything understands, and the transparency of each pixel has to be restored at the same time (remember, we prepared the file in advance by splitting the color and alpha components into two halves of the frame).

For this article, I took and adapted one of the ready-made conversion examples. In addition to the conversion itself, the algorithm computes each pixel's transparency in the same iteration in which it computes the pixel's color, so the whole job is done in a single pass.

And these calculations, by the way, will be performed on the CPU, not the device's GPU. Yes, this is the price to pay for flexibility… But more on that later.

Here is the code that produces the final RGBA image, multiplying by the alpha mask in the same pass:

    private fun Image.getBitmapWithAlpha(buffers: Buffers): ByteArray {
        val yBuffer = this.planes[0].buffer
        yBuffer.get(buffers.yBytes, 0, yBuffer.remaining())

        val uBuffer = this.planes[1].buffer
        uBuffer.get(buffers.uBytes, 0, uBuffer.remaining())

        val vBuffer = this.planes[2].buffer
        vBuffer.get(buffers.vBytes, 0, vBuffer.remaining())

        val yRowStride = this.planes[0].rowStride
        val yPixelStride = this.planes[0].pixelStride

        val uvRowStride = this.planes[1].rowStride
        val uvPixelStride = this.planes[1].pixelStride

        val halfWidth = this.width / 2

        for (y in 0 until this.height) {
            for (x in 0 until halfWidth) {

                val yIndex = y * yRowStride + x * yPixelStride
                val yValue = (buffers.yBytes[yIndex].toInt() and 0xff) - 16

                val uvIndex = (y / 2) * uvRowStride + (x / 2) * uvPixelStride
                val uValue = (buffers.uBytes[uvIndex].toInt() and 0xff) - 128
                val vValue = (buffers.vBytes[uvIndex].toInt() and 0xff) - 128

                val r = 1.164f * yValue + 1.596f * vValue
                val g = 1.164f * yValue - 0.392f * uValue - 0.813f * vValue
                val b = 1.164f * yValue + 2.017f * uValue

                val yAlphaIndex = yIndex + halfWidth * yPixelStride
                val yAlphaValue = (buffers.yBytes[yAlphaIndex].toInt() and 0xff) - 16

                val uvAlphaIndex = uvIndex + this.width * uvPixelStride
                val vAlphaValue = (buffers.vBytes[uvAlphaIndex].toInt() and 0xff) - 128

                val alpha = 1.164f * yAlphaValue + 1.596f * vAlphaValue

                val pixelIndex = x * 4 + y * 4 * halfWidth

                buffers.bitmapBytes[pixelIndex + 0] = (r * alpha / 255f).toInt().coerceIn(0, 255).toByte()
                buffers.bitmapBytes[pixelIndex + 1] = (g * alpha / 255f).toInt().coerceIn(0, 255).toByte()
                buffers.bitmapBytes[pixelIndex + 2] = (b * alpha / 255f).toInt().coerceIn(0, 255).toByte()
                buffers.bitmapBytes[pixelIndex + 3] = alpha.toInt().coerceIn(0, 255).toByte()
            }
        }

        return buffers.bitmapBytes
    }
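The resulting byte array still has to be wrapped into a Bitmap before it reaches the Canvas. In the simplest case that could look roughly like the helper below (a sketch; the actual wiring lives in the decoder class of the published source). Here width is the half-width of the decoded frame, i.e. the width of the visible picture.

import android.graphics.Bitmap
import java.nio.ByteBuffer

// Hypothetical helper: wrap the RGBA byte array into a Bitmap. ARGB_8888 bitmaps expect
// bytes in R, G, B, A order, which is exactly how the conversion above fills the array.
fun rgbaBytesToBitmap(bytes: ByteArray, width: Int, height: Int): Bitmap {
    return Bitmap.createBitmap(width, height, Bitmap.Config.ARGB_8888).apply {
        copyPixelsFromBuffer(ByteBuffer.wrap(bytes))
    }
}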

Performance

Now let's evaluate the applicability of this method by comparing its performance with OpenGL rendering.

To measure speed and resource consumption, I did not bolt fashionable benchmarks onto the code, with garbage collector warm-ups and caches of every kind. Instead, I chose the simplest approach: the same video file was rendered on the same emulator in the two different ways, and the load on the central processor (CPU) and the graphics subsystem (GPU) was recorded as graphs with the standard profiling tools.

Emulator: Android API 34.

Parameters of the PC (laptop) on which the experiments were conducted:

Intel Core i5-12500H, RAM 40 GB, GeForce RTX 3050 4 GB

First measurement (CPU):

CPU load: drawing in Compose

CPU load: OpenGL with shader

Second measurement (GPU):

GPU rendering: drawing in Compose

GPU rendering: OpenGL with shader

As we can see, there are no miracles: the key features of each method have a noticeable impact on performance.

CPU load is higher for pure Compose rendering, since the main calculations happen in the per-frame conversion function (from YUV_420_888 to RGBA). With OpenGL rendering, this work is done by the player (codec), tightly coupled with the OpenGL context, and by GPU shaders, which takes the computational load off the CPU.

The GPU chart shows the same picture: frame preparation in OpenGL takes noticeably longer (the red area), while Compose spends almost no GPU resources (only on its internal drawing machinery). I attribute the difference in the orange areas (a solid band versus sparse bars) to the specifics of the two subsystems; this area looks exactly the same for Compose even when running the simplest vector animation.

Instead of conclusions

The purpose of this article is to show another strong side of Jetpack Compose, not to encourage using it absolutely everywhere. Every tool has its use case.

Rendering with OpenGL (GLSurfaceView, TextureView), in my opinion, is suited to video animation on a single display surface (ideal for a video player or a game). As the number of rendering surfaces grows, the load on the GPU (and on the CPU too) multiplies. Under high load I even managed to crash the emulator (at around 20 simultaneously running OpenGL animations), and the abnormal process termination happened not in the application but in the virtual device itself.

The method proposed in this article may be appropriate where the same templated animation is needed in many places at once, for example for live markers on a map. In that case a single codec component that sends frames to all subscribers via Flow is enough, and the load on the device does not grow with the number of widgets on the screen.
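A sketch of that idea (with hypothetical names, not taken from the published source): the decoding pipeline is started once and its frames are shared between all markers via shareIn, so each extra widget only adds one more Canvas draw.

// Hypothetical sketch: a single decoder flow shared between many map markers.
class SharedVideoAnimation(
    scope: CoroutineScope,
    dataSource: VideoDataSource,
    decoder: VideoFramesDecoder
) {
    // Every subscriber collects the same frames; decoding happens only once.
    val frames: SharedFlow<Bitmap> = decoder
        .getOutputFramesFlow(inputSampleDataCallback = { dataSource.getNextSampleData() })
        .shareIn(scope, started = SharingStarted.WhileSubscribed(), replay = 1)
}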

The source code for trying it yourself is here; there is also a ready-made release build there for a quick launch on your Android device.
