Live face tracking in the browser using TensorFlow.js. Part 6

Activating screen magic with your face in the browser

This is the finale of the series (links to the previous parts are at the end of this article), in which we created Snapchat-style filters in the browser, trained an AI model to understand facial expressions, and went even further with face tracking and the TensorFlow.js library.

Wouldn’t it be great to finish by implementing facial motion detection? Let me show you how to use facial key points to detect when we open our mouth or blink our eyes, and use those events to trigger effects on the screen.


You can download a demo version of this project; you may need to enable WebGL support in your web browser to get the performance you want. You can also download the code and files for this series. The series assumes that you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks.
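
If you want to confirm that TensorFlow.js is really running on its WebGL (GPU) backend, here is an optional, minimal check (not part of the original demo; the checkBackend name is ours) built on the standard tf.setBackend / tf.getBackend API:

async function checkBackend() {
    // Request the WebGL backend and wait until TensorFlow.js is ready
    await tf.setBackend( "webgl" );
    await tf.ready();
    console.log( "Active TensorFlow.js backend:", tf.getBackend() );
}

If the console reports "cpu" instead, face tracking will still work, just noticeably more slowly.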

Blinking and mouth opening detection

We’re going to use the facial key points provided by the face tracking code we developed in the first article of this series, together with real-time tracking, to detect eye blinking and mouth opening.

The annotated points on the face provide enough information to determine when the eyes are closed and when the mouth is open. The trick is to scale the measured distances relative to the size of the face as it appears to the camera.

To do this, we can use the distance between the eyes as a convenient reference for the relative scale inside the trackFace function:

async function trackFace() {
    ...

    faces.forEach( face => {
        const eyeDist = Math.sqrt(
            ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
            ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
            ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
        );
        const faceScale = eyeDist / 80;
    });

    requestAnimationFrame( trackFace );
}

We can then calculate the distance between the top and bottom of each eye, divide it by faceScale, and check whether it drops below a threshold. A similar calculation, using the inner lip points, detects when the mouth is open.

Take a look:

async function trackFace() {
    ...

    let areEyesClosed = false, isMouthOpen = false;
    faces.forEach( face => {
        ...

        // Check for eyes closed
        const leftEyesDist = Math.sqrt(
            ( face.annotations.leftEyeLower1[ 4 ][ 0 ] - face.annotations.leftEyeUpper1[ 4 ][ 0 ] ) ** 2 +
            ( face.annotations.leftEyeLower1[ 4 ][ 1 ] - face.annotations.leftEyeUpper1[ 4 ][ 1 ] ) ** 2 +
            ( face.annotations.leftEyeLower1[ 4 ][ 2 ] - face.annotations.leftEyeUpper1[ 4 ][ 2 ] ) ** 2
        );
        const rightEyesDist = Math.sqrt(
            ( face.annotations.rightEyeLower1[ 4 ][ 0 ] - face.annotations.rightEyeUpper1[ 4 ][ 0 ] ) ** 2 +
            ( face.annotations.rightEyeLower1[ 4 ][ 1 ] - face.annotations.rightEyeUpper1[ 4 ][ 1 ] ) ** 2 +
            ( face.annotations.rightEyeLower1[ 4 ][ 2 ] - face.annotations.rightEyeUpper1[ 4 ][ 2 ] ) ** 2
        );
        if( leftEyesDist / faceScale < 23.5 ) {
            areEyesClosed = true;
        }
        if( rightEyesDist / faceScale < 23.5 ) {
            areEyesClosed = true;
        }

        // Check for mouth open
        const lipsDist = Math.sqrt(
            ( face.annotations.lipsLowerInner[ 5 ][ 0 ] - face.annotations.lipsUpperInner[ 5 ][ 0 ] ) ** 2 +
            ( face.annotations.lipsLowerInner[ 5 ][ 1 ] - face.annotations.lipsUpperInner[ 5 ][ 1 ] ) ** 2 +
            ( face.annotations.lipsLowerInner[ 5 ][ 2 ] - face.annotations.lipsUpperInner[ 5 ][ 2 ] ) ** 2
        );
        // Scale to the relative face size
        if( lipsDist / faceScale > 20 ) {
            isMouthOpen = true;
        }
    });

    setText( `Eyes: ${areEyesClosed} Mouth: ${isMouthOpen}` );

    requestAnimationFrame( trackFace );
}
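
The three square-root expressions above all compute the same thing: the straight-line distance between two annotated 3D key points. As an optional tidy-up (the pointDist helper below is our own, not something provided by the tracking library), you could factor that into a small function:

function pointDist( a, b ) {
    // Euclidean distance between two [x, y, z] key points
    return Math.sqrt(
        ( a[ 0 ] - b[ 0 ] ) ** 2 +
        ( a[ 1 ] - b[ 1 ] ) ** 2 +
        ( a[ 2 ] - b[ 2 ] ) ** 2
    );
}

// The left-eye check, for example, would then read:
// const leftEyesDist = pointDist( face.annotations.leftEyeLower1[ 4 ], face.annotations.leftEyeUpper1[ 4 ] );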

We are now ready to detect some facial movements.

Confetti party time

Every celebration needs confetti, right? We’re going to pair virtual confetti with eye blinks and mouth opening to throw a real party.

For this, we will use an open-source JavaScript library called party-js. Include it at the top of your page as follows:

<script src="https://cdn.jsdelivr.net/npm/party-js@1.0.0/party.min.js"></script>

Let’s define a global variable that tracks whether we have already launched the confetti.

let didParty = false;

Last but not least, we can turn on the party animation when we blink or open our mouth.

async function trackFace() {
    ...

    if( !didParty && ( areEyesClosed || isMouthOpen ) ) {
        party.screen();
    }
    didParty = areEyesClosed || isMouthOpen;

    requestAnimationFrame( trackFace );
}

Now it’s party time! Note that didParty acts as a simple edge trigger: the confetti fires only on the frame where the eyes first close or the mouth first opens, not continuously while they stay that way. With real-time face tracking and the confetti library, you can start a party on the screen just by blinking or opening your mouth.

No project is complete without the full code for you to look at, so here it is:

Full code listing
<html>
    <head>
        <title>Tracking Faces in the Browser with TensorFlow.js</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/party-js@1.0.0/party.min.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <video id="webcam" playsinline style="
            visibility: hidden;
            width: auto;
            height: auto;
            ">
        </video>
        <h1 id="status">Loading...</h1>
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

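        // Ask the browser for webcam access and resolve once video data starts arriving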
        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
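                // Fall back to vendor-prefixed getUserMedia implementations in older browsers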
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( "loadeddata", resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        let output = null;
        let model = null;
        let didParty = false;

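        // Main loop: estimate face landmarks, redraw the video frame, and run the blink / mouth-open checks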
        async function trackFace() {
            const video = document.getElementById( "webcam" );
            const faces = await model.estimateFaces( {
                input: video,
                returnTensors: false,
                flipHorizontal: false,
            });
            output.drawImage(
                video,
                0, 0, video.width, video.height,
                0, 0, video.width, video.height
            );

            let areEyesClosed = false, isMouthOpen = false;
            faces.forEach( face => {
                const eyeDist = Math.sqrt(
                    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
                );
                const faceScale = eyeDist / 80;

                // Check for eyes closed
                const leftEyesDist = Math.sqrt(
                    ( face.annotations.leftEyeLower1[ 4 ][ 0 ] - face.annotations.leftEyeUpper1[ 4 ][ 0 ] ) ** 2 +
                    ( face.annotations.leftEyeLower1[ 4 ][ 1 ] - face.annotations.leftEyeUpper1[ 4 ][ 1 ] ) ** 2 +
                    ( face.annotations.leftEyeLower1[ 4 ][ 2 ] - face.annotations.leftEyeUpper1[ 4 ][ 2 ] ) ** 2
                );
                const rightEyesDist = Math.sqrt(
                    ( face.annotations.rightEyeLower1[ 4 ][ 0 ] - face.annotations.rightEyeUpper1[ 4 ][ 0 ] ) ** 2 +
                    ( face.annotations.rightEyeLower1[ 4 ][ 1 ] - face.annotations.rightEyeUpper1[ 4 ][ 1 ] ) ** 2 +
                    ( face.annotations.rightEyeLower1[ 4 ][ 2 ] - face.annotations.rightEyeUpper1[ 4 ][ 2 ] ) ** 2
                );
                if( leftEyesDist / faceScale < 23.5 ) {
                    areEyesClosed = true;
                }
                if( rightEyesDist / faceScale < 23.5 ) {
                    areEyesClosed = true;
                }

                // Check for mouth open
                const lipsDist = Math.sqrt(
                    ( face.annotations.lipsLowerInner[ 5 ][ 0 ] - face.annotations.lipsUpperInner[ 5 ][ 0 ] ) ** 2 +
                    ( face.annotations.lipsLowerInner[ 5 ][ 1 ] - face.annotations.lipsUpperInner[ 5 ][ 1 ] ) ** 2 +
                    ( face.annotations.lipsLowerInner[ 5 ][ 2 ] - face.annotations.lipsUpperInner[ 5 ][ 2 ] ) ** 2
                );
                // Scale to the relative face size
                if( lipsDist / faceScale > 20 ) {
                    isMouthOpen = true;
                }
            });

            if( !didParty && ( areEyesClosed || isMouthOpen ) ) {
                party.screen();
            }
            didParty = areEyesClosed || isMouthOpen;

            setText( `Eyes: ${areEyesClosed} Mouth: ${isMouthOpen}` );

            requestAnimationFrame( trackFace );
        }

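        // Set up the webcam and mirrored canvas, load the face landmarks model, then start tracking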
        (async () => {
            await setupWebcam();
            const video = document.getElementById( "webcam" );
            video.play();
            let videoWidth = video.videoWidth;
            let videoHeight = video.videoHeight;
            video.width = videoWidth;
            video.height = videoHeight;

            let canvas = document.getElementById( "output" );
            canvas.width = video.width;
            canvas.height = video.height;

            output = canvas.getContext( "2d" );
            output.translate( canvas.width, 0 );
            output.scale( -1, 1 ); // Mirror cam
            output.fillStyle = "#fdffb6";
            output.strokeStyle = "#fdffb6";
            output.lineWidth = 2;

            // Load Face Landmarks Detection
            model = await faceLandmarksDetection.load(
                faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
            );

            setText( "Loaded!" );

            trackFace();
        })();
        </script>
    </body>
</html>

What’s next?

That’s all for now. In this series, we’ve learned how to use AI to track faces in real time, as well as to detect facial emotions and mouth and eye movements. We even created our own augmented-reality game and virtual glasses from scratch, and it all runs in a web browser.

While we have chosen some fun examples, there are many business applications for this technology as well. Imagine a glasses retailer who wants to let website visitors pick out frames by trying them on virtually. It is not hard to see how you could use the knowledge gained in this series to build that kind of functionality. Hopefully you now have the tools to create more useful solutions with AI and TensorFlow.js.

Try adding the confetti to the virtual glasses project, or see whether you can apply emotion detection to your photo album.

And if this series of articles inspires you to create even cooler projects, share them in the comments! We would be glad to hear about them.

Good luck and happy coding!

  • Tracking faces in real time in the browser. Part 1

  • Tracking faces in real time in the browser. Part 2

  • Tracking faces in real time in the browser. Part 3

  • Tracking faces in real time in the browser. Part 4

  • Tracking faces in real time in the browser. Part 5
