Android and a 3D camera: face recognition with fraud protection

Hello! My name is Vladimir Shalkov, I am an Android developer at Surf.
Not so long ago, we needed to implement a face recognition system on Android with fraud protection. In this article, I will share the most interesting aspects of the implementation, with code examples and links. I am sure you will find something new and interesting, so sit back and let's get started.

Face recognition systems are becoming more and more popular: the number of devices with a face unlock feature is growing, as is the number of tools available to developers.

Apple uses Face ID in its products and, in addition, takes care of developers by providing an API to access this functionality. Face ID is considered quite secure and can be used to unlock banking applications. The Android SDK, until recently, had no ready-made solution: although device manufacturers added face unlock to their firmware, developers could not use that functionality in applications, and the security of this unlock method left much to be desired.

Recently, the FingerprintManager class, which was used to unlock applications by fingerprint, was deprecated in API 28 and higher, and developers are encouraged to use BiometricPrompt instead. This class encapsulates the logic related to biometrics, including face identification. However, it will not work on every smartphone: according to information from Google, the device must have a high security rating.

Some devices no longer have a built-in fingerprint scanner: manufacturers dropped it because face recognition offers a high level of fraud protection, thanks to the front-facing ToF (Time-of-Flight) sensor. Using it, you can build a depth map, making the system more resistant to attacks.


The application we implemented is, in terms of functionality, an access control system where the face is the means of identifying a person. Special algorithms check whether the face belongs to a real, live person. A new user can be added to the database directly from the device by taking a picture and specifying a name. To determine whether a person is present in the database, the search is performed using a photograph taken in real time on the device. The algorithms measure the similarity to faces in the database and, if a match is found, return information about that person.

Our main goal was to ensure the maximum level of security: we had to minimize the possibility of bypassing the face recognition system, for example with a photograph held up to the camera. To do this, we decided to use the Intel RealSense 3D camera (model D435i), which has a built-in ToF sensor that provides all the data needed to build a depth map.


As the working device, we had to use a tablet with a large screen diagonal, one without a built-in battery that requires a constant connection to the mains.

Another equally important constraint was offline operation: because of it, we could not use cloud face recognition services. In addition, writing face recognition algorithms from scratch was unreasonable given the time constraints and labor costs. Why reinvent the wheel when ready-made solutions already exist? Based on the above, we decided to use the 3DiVi Face SDK library.

Getting images from the Intel RealSense camera

At the first stage of implementation, we needed to obtain two images from the 3D camera: one color and one with a depth map. The Face SDK library then uses them for further calculations.

To start working with the Intel RealSense camera in an Android project, you need to add a dependency on the RealSense SDK for Android OS, a wrapper over the official C++ library. The official samples show how to initialize the camera and display its picture; we will not dwell on that, since it is quite straightforward. Let's go straight to the image acquisition code:

private val pipeline = Pipeline()
private val streamingHandler = Handler()

private var streamRunnable: Runnable = object : Runnable {
    override fun run() {
        try {
            FrameReleaser().use { fr ->
                val frames = pipeline.waitForFrames(1000).releaseWith(fr)

                val orgFrameSet = frames.releaseWith(fr)
                val processedFrameSet = frames.applyFilter(align).releaseWith(fr)

                // Get the color frame
                val orgFrame: Frame = orgFrameSet.first(StreamType.COLOR, StreamFormat.RGB8).releaseWith(fr)
                val videoFrame: VideoFrame = orgFrame.`as`(Extension.VIDEO_FRAME)

                // Get the depth frame
                val processedDepth: Frame = processedFrameSet.first(StreamType.DEPTH, StreamFormat.Z16).releaseWith(fr)
                val depthFrame: DepthFrame = processedDepth.`as`(Extension.DEPTH_FRAME)

                upload(orgFrame) // Display the color image on screen
            }
            streamingHandler.post(this) // Schedule processing of the next frame
        } catch (e: Exception) {
            Logger.d("Streaming, error: " + e.message)
        }
    }
}

Using FrameReleaser() we obtain individual frames of type Frame from the video stream. Various filters can be applied to frames via applyFilter().
To obtain a frame in the desired format, the frame must be cast to the appropriate type: in our case, the first is of type VideoFrame and the second is DepthFrame.
To display a picture on the device's screen, there is the upload() method; its parameter is the frame to be shown on screen, in our case a frame from the color camera.

Converting frames to images

The next step is to obtain images in the format we need from VideoFrame and DepthFrame. We will use these images to determine whether the face in the frame belongs to a real person and to add information to the database.
Image formats:

  • Color, with the .bmp extension, obtained from VideoFrame
  • With a depth map, with the .tiff extension, obtained from DepthFrame

To implement the conversion, we need the OpenCV open-source computer vision library. All the work consists of forming a Mat object and converting it to the desired format:

fun videoFrameToMat(videoFrame: VideoFrame): Mat {
    val colorMat = Mat(videoFrame.height, videoFrame.width, CvType.CV_8UC3)
    val returnBuff = ByteArray(videoFrame.dataSize)
    videoFrame.getData(returnBuff) // copy the raw RGB bytes out of the frame
    colorMat.put(0, 0, returnBuff)

    val colorMatNew = Mat()
    Imgproc.cvtColor(colorMat, colorMatNew, Imgproc.COLOR_RGB2BGR)
    return colorMatNew
}

To save the color image, you need to form a matrix of type CvType.CV_8UC3 and then convert it to BGR so that the colors have the correct hue. Then save it to the device using the Imgcodecs.imwrite method:

fun VideoFrame.saveToFile(path: String): Boolean {
    val colorMat = videoFrameToMat(this)
    return Imgcodecs.imwrite(path + COLOR_IMAGE_FORMAT, colorMat)
}
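The COLOR_RGB2BGR conversion used when saving the color image is just a per-pixel swap of the R and B channels. A minimal, SDK-free sketch of what OpenCV does under the hood (the function name is ours, for illustration only):

```kotlin
// Swap the R and B channels of an interleaved 3-bytes-per-pixel buffer in place.
// This is the essence of Imgproc.COLOR_RGB2BGR for an 8UC3 image.
fun rgbToBgrInPlace(pixels: ByteArray) {
    var i = 0
    while (i + 2 < pixels.size) {
        val r = pixels[i]
        pixels[i] = pixels[i + 2] // B moves to the first byte
        pixels[i + 2] = r         // R moves to the third byte
        i += 3
    }
}

fun main() {
    val px = byteArrayOf(10, 20, 30) // one RGB pixel: R=10, G=20, B=30
    rgbToBgrInPlace(px)
    println(px.joinToString()) // 30, 20, 10
}
```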

The same must be done for DepthFrame, with the only difference that the matrix must be of type CvType.CV_16UC1, since the image is built from a frame containing data from the depth sensor:

fun depthFrameToMat(depthFrame: DepthFrame): Mat {
    val depthMat = Mat(depthFrame.height, depthFrame.width, CvType.CV_16UC1)
    val size = (depthMat.total() * depthMat.elemSize()).toInt()
    val returnBuff = ByteArray(size)
    depthFrame.getData(returnBuff) // copy the raw 16-bit depth data out of the frame
    val shorts = ShortArray(size / 2)
    ByteBuffer.wrap(returnBuff).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
    depthMat.put(0, 0, shorts)
    return depthMat
}

Saving image with depth map:

fun DepthFrame.saveToFile(path: String): Boolean {
    val depthMat = depthFrameToMat(this)
    return Imgcodecs.imwrite(path + DEPTH_IMAGE_FORMAT, depthMat)
}
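The Z16 depth format stores one 16-bit little-endian value per pixel, so converting the raw byte buffer into shorts is a fixed-layout unpacking step. It can be sketched without the SDK (the helper name is ours):

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Convert a raw little-endian Z16 buffer (2 bytes per pixel) into
// depth values in millimeters, one Short per pixel.
fun depthBytesToShorts(raw: ByteArray): ShortArray {
    val shorts = ShortArray(raw.size / 2)
    ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(shorts)
    return shorts
}

fun main() {
    // Two pixels: 0x01F4 = 500 mm and 0x07D0 = 2000 mm, little-endian byte order
    val raw = byteArrayOf(0xF4.toByte(), 0x01, 0xD0.toByte(), 0x07)
    println(depthBytesToShorts(raw).joinToString()) // 500, 2000
}
```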

Working with the Face SDK

The Face SDK contains a large number of software components, but we do not need most of them. Like the RealSense SDK, the library is written in C++ and has a wrapper for convenient use on Android. The Face SDK is not free, but developers can get a trial license.

Most library components are configured using XML configuration files; depending on the configuration, one algorithm or another is applied.
To start working, you need to create an instance of the FacerecService class; it is used to initialize the other components, and the paths to the native libraries, configuration files, and license are passed as parameters.

Next, using this service, you need to create objects of the FacerecService.Config and Capturer classes:

// dllPath, confDirPath and licenseDirPath are the paths described above
private val service: FacerecService = FacerecService.createService(
        dllPath, confDirPath, licenseDirPath)
private val confManual: FacerecService.Config = service.Config("manual_capturer.xml")
private val capturerManual: Capturer = service.createCapturer(confManual)

The Capturer class is used for face detection. The manual_capturer.xml configuration means that we will use algorithms from the OpenCV library: the Viola-Jones frontal face detector, with Haar features used for recognition. The library provides a ready-made set of XML configuration files that differ in recognition quality and running time: slower methods give better recognition quality. To detect faces in profile, another XML configuration file should be used: common_lprofile_capturer.xml. There are many configs; you can read more about them in the documentation. In our case, we needed the common_capturer4_singleface.xml config: a configuration with a lowered quality threshold, which always returns at most one face.
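Since the choice of XML config drives the detector's behavior, it can be useful to isolate it behind a small helper. A sketch using the config names mentioned above (the enum and function are ours, not part of the Face SDK):

```kotlin
// Which detection scenario the app is in.
enum class CaptureMode { FRONTAL, PROFILE, SINGLE_FACE }

// Map a scenario to the Face SDK capturer config file used in this project.
fun capturerConfigFor(mode: CaptureMode): String = when (mode) {
    CaptureMode.FRONTAL -> "manual_capturer.xml"                  // Viola-Jones frontal detector
    CaptureMode.PROFILE -> "common_lprofile_capturer.xml"         // profile faces
    CaptureMode.SINGLE_FACE -> "common_capturer4_singleface.xml"  // at most one face returned
}

fun main() {
    println(capturerConfigFor(CaptureMode.SINGLE_FACE)) // common_capturer4_singleface.xml
}
```

The returned string would then be passed to service.Config(...) when creating the Capturer.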

To find a face in an image, the capturerSingleFace.capture() method is used; it takes the byte array of the image containing the face:

fun createRawSample(imagePath: String): RawSample? {
    val imageColorFile = File(imagePath)
    val originalColorByteArray = ImageUtil.readImage(imageColorFile)
    return capturerSingleFace.capture(originalColorByteArray).getOrNull(0)
}

The RawSample object stores information about the found face and provides a set of methods; for example, calling getLandmarks() returns the anthropometric points of the face.

Checking that a face belongs to a real person

To determine whether there is a real person in the frame, rather than a photograph held up to the camera, the Face SDK provides the DepthLivenessEstimator module. It returns an enum with one of four values:

  • NOT_ENOUGH_DATA – too many missing values in the depth map.
  • REAL – the observed face belongs to a living person.
  • FAKE – the observed face is a photograph.
  • NOT_COMPUTED – the calculation failed.
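In an access control system, only REAL should grant access; the two "no answer" states are best treated as a retry rather than a rejection. A sketch of that mapping (a local enum stands in for DepthLivenessEstimator.Liveness so the snippet runs without the SDK):

```kotlin
// Stand-in for DepthLivenessEstimator.Liveness from the Face SDK.
enum class Liveness { NOT_ENOUGH_DATA, REAL, FAKE, NOT_COMPUTED }

// Only a confirmed live face unlocks; FAKE is an attack,
// the remaining states mean "try capturing again".
fun shouldGrantAccess(state: Liveness): Boolean = when (state) {
    Liveness.REAL -> true
    Liveness.FAKE -> false
    Liveness.NOT_ENOUGH_DATA, Liveness.NOT_COMPUTED -> false
}

fun main() {
    println(shouldGrantAccess(Liveness.REAL)) // true
    println(shouldGrantAccess(Liveness.FAKE)) // false
}
```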

Module initialization:

// The config name below is taken from the Face SDK distribution
val depthLivenessEstimator: DepthLivenessEstimator = service.createDepthLivenessEstimator(
        "depth_liveness_estimator_cnn.fda")

Determining whether a face belongs to a real person:

fun getLivenessState(
        rgbPath: String,
        depthPath: String
): DepthLivenessEstimator.Liveness {
    val imageColorFile = File(rgbPath + COLOR_IMAGE_FORMAT)
    val originalColorByteArray = readImage(imageColorFile)

    val originalRawSimple = capturerSingleFace.capture(originalColorByteArray).getOrNull(0)
    // imageWidth and imageHeight are the dimensions of the decoded color image
    val originalRawImage = RawImage(
            imageWidth, imageHeight, RawImage.Format.FORMAT_BGR, originalColorByteArray)

    val originalDepthPtr = Natives().readDepthMap(depthPath + DEPTH_IMAGE_FORMAT)

    // Camera parameters (field of view of the Intel RealSense D435i)
    val hFov = 69.4f
    val vFov = 42.5f

    val depthMapRaw = DepthMapRaw()
    with(depthMapRaw) {
        depth_map_rows = originalRawImage.height
        depth_map_cols = originalRawImage.width
        depth_map_2_image_offset_x = 0f
        depth_map_2_image_offset_y = 0f
        depth_map_2_image_scale_x = 1f
        depth_map_2_image_scale_y = 1f
        horizontal_fov = hFov
        vertical_fov = vFov
        depth_unit_in_millimeters = 1f
        depth_data_ptr = originalDepthPtr
        depth_data_stride_in_bytes = (2 * originalRawImage.width)
    }

    return depthLivenessEstimator.estimateLiveness(originalRawSimple, depthMapRaw)
}

The getLivenessState() method takes paths to two images as parameters: the color one and the one with the depth map. From the color image we form a RawImage object; this class provides raw image data and optional cropping information. From the depth map we form a DepthMapRaw: a depth map registered to the original color image. This is required in order to call the estimateLiveness(originalRawSimple, depthMapRaw) method, which returns an enum telling us whether a real person was in the frame.
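Note how depth_data_stride_in_bytes is derived: Z16 stores 2 bytes per pixel, so the stride is twice the image width, and the whole depth buffer occupies rows × stride bytes. A tiny sanity-check helper (ours, for illustration):

```kotlin
// Expected byte size of a Z16 depth buffer: rows * stride,
// where stride = 2 bytes per pixel * number of columns.
fun z16BufferSize(rows: Int, cols: Int): Int {
    val strideInBytes = 2 * cols
    return rows * strideInBytes
}

fun main() {
    println(z16BufferSize(480, 640)) // 614400
}
```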

It is worth paying attention to how the DepthMapRaw object is formed. One of its fields is called depth_data_ptr: it is a pointer to the depth data, but as you know, Java has no pointers. To obtain the pointer, you need to use a JNI function that takes the path to the depth-map image as an argument:

extern "C" JNIEXPORT jlong JNICALL Java_ru_face_detect_Natives_readDepthMap(
        JNIEnv *env, jobject obj, jstring jfilename) {
    const char *buf = env->GetStringUTFChars(jfilename, NULL);
    std::string filename = buf;
    env->ReleaseStringUTFChars(jfilename, buf);

    // Read the depth map as-is (16-bit, single channel)
    cv::Mat depth_map = cv::imread(filename, -1);
    unsigned char *data = new unsigned char[depth_map.rows * depth_map.cols * depth_map.elemSize()];
    memcpy(data, depth_map.data, depth_map.rows * depth_map.cols * depth_map.elemSize());

    return (jlong) data;
}

To call this C++ code from Kotlin, you need to create a class like this:

class Natives {

    init {
        System.loadLibrary("native-lib")
    }

    external fun readDepthMap(fileName: String): Long
}

The name of the native library, built from the .cpp file containing the readDepthMap() method (in our case native-lib.cpp), is passed to System.loadLibrary(). You also need to mark the function with the external modifier, which means the method is not implemented in Kotlin.

Face identification

An equally important function is identifying the face found in the frame. The Face SDK allows you to implement this using the Recognizer module. Initialization:

val recognizer: Recognizer = service.createRecognizer(
        "method8v7_recognizer.xml", true, true, false)

We use the method8v7_recognizer.xml configuration file, which has the fastest recognition speed, although its recognition quality is lower than that of methods 6v7 and 7v7.

Before identifying a face, you need to build a list of faces against which the sample photograph will be matched. To implement this, you need to create a Vector of Template objects:

var templates = Vector<Template>()