Live face tracking in the browser using TensorFlow.js. Part 1

Apps like Snapchat offer an amazing variety of face and lens filters that let you add cool effects to your photos and videos. If you’ve ever given yourself virtual dog ears or a party hat, you know how much fun it can be!

Have you ever wondered how you might create such filters from scratch? Well, now you can learn to do it using just your web browser! In this series, we’ll learn how to create Snapchat-style filters in the browser, train an AI model to understand facial expressions, and do even more with the TensorFlow.js library and face tracking.


You can download a demo version of this project. You might need to enable WebGL support in your web browser to get acceptable performance. You can also download the code and files for this series. The series assumes that you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks.

The first step in creating a face filter from scratch is detecting and localizing faces in images, so that’s where we’ll start.

Faces can be tracked using the TensorFlow.js library and the face-landmarks-detection model, which in a matter of milliseconds can give us 468 3D key points for each face in an image or video frame. What’s especially great is that this model runs inside a web page, so you can track faces on mobile devices using the same code.
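For reference, each face returned by the model is a plain JavaScript object whose shape, per the library’s documentation for this package version, looks roughly like this (the exampleFace name and the numbers are purely illustrative):

// Shape of one entry in the array returned by estimateFaces()
// (field names per the face-landmarks-detection docs; the numbers are illustrative).
const exampleFace = {
    faceInViewConfidence: 0.998,                   // confidence that a face is actually in view
    boundingBox: {
        topLeft: [ 232.3, 145.3 ],                 // [x, y] of the top-left corner
        bottomRight: [ 449.8, 308.4 ]              // [x, y] of the bottom-right corner
    },
    mesh: [ [ 92.1, 119.5, -17.5 ] /* , ... */ ],        // [x, y, z] key points in model space
    scaledMesh: [ [ 322.3, 297.6, -17.5 ] /* , ... */ ], // [x, y, z] key points scaled to the input
    annotations: { /* named groups of key points, e.g. lipsUpperOuter */ }
};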

Let’s set up a project to load the model and start face tracking on the webcam video feed.

The starting point

Here’s the initial web page template we’ll be using for face tracking.

This template contains:

  • the TensorFlow.js libraries required for this project;

  • the triangle indices of the reference face mesh, defined in the triangles.js file included in the project code (sketched after the template below);

  • a canvas element for the rendered output;

  • a hidden video element for the webcam feed;

  • a status text element and the utility function setText;

  • the helper functions drawLine and drawTriangle for drawing on the canvas.

<html>
    <head>
        <title>Real-Time Face Tracking in the Browser with TensorFlow.js</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
        <script src="web/triangles.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <video id="webcam" playsinline style="
            visibility: hidden;
            width: auto;
            height: auto;
            ">
        </video>
        <h1 id="status">Loading...</h1>
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        function drawLine( ctx, x1, y1, x2, y2 ) {
            ctx.beginPath();
            ctx.moveTo( x1, y1 );
            ctx.lineTo( x2, y2 );
            ctx.stroke();
        }

        function drawTriangle( ctx, x1, y1, x2, y2, x3, y3 ) {
            ctx.beginPath();
            ctx.moveTo( x1, y1 );
            ctx.lineTo( x2, y2 );
            ctx.lineTo( x3, y3 );
            ctx.lineTo( x1, y1 );
            ctx.stroke();
        }

        (async () => {
            // TODO: Add code here
        })();
        </script>
    </body>
</html>
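The triangles.js file referenced in the <head> is expected to define a global FaceTriangles array: a flat list of key point indices, three per triangle, describing how the mesh points connect into a wireframe. A rough sketch of what that file contains (the index values here are only illustrative; use the file that ships with the project code):

// web/triangles.js (sketch only; the real file ships with the project code)
// A flat array of key point indices. Every three entries form one triangle
// of the face wireframe, so FaceTriangles.length / 3 is the triangle count.
const FaceTriangles = [
    127,  34, 139,   // triangle 1
     11,   0,  37,   // triangle 2
    // ...the rest of the mesh triangulation
];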

Using the Webcam API in HTML5 with TensorFlow.js

In JavaScript, it’s pretty easy to start a webcam once you have a snippet of code for it. Here is a utility function that launches the webcam and requests camera access from the user:

async function setupWebcam() {
    return new Promise( ( resolve, reject ) => {
        const webcamElement = document.getElementById( "webcam" );
        const navigatorAny = navigator;
        navigator.getUserMedia = navigator.getUserMedia ||
        navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
        navigatorAny.msGetUserMedia;
        if( navigator.getUserMedia ) {
            navigator.getUserMedia( { video: true },
                stream => {
                    webcamElement.srcObject = stream;
                    webcamElement.addEventListener( "loadeddata", resolve, false );
                },
            error => reject());
        }
        else {
            reject();
        }
    });
}
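The helper above falls back to the older, vendor-prefixed getUserMedia API for broader compatibility. If you only target current browsers, a minimal sketch of the same helper using the promise-based navigator.mediaDevices.getUserMedia would look like this (functionally equivalent; the rest of the article keeps the version above):

async function setupWebcam() {
    const webcamElement = document.getElementById( "webcam" );
    // Ask the user for camera access; this throws if permission is denied
    // or no camera is available.
    const stream = await navigator.mediaDevices.getUserMedia( { video: true } );
    webcamElement.srcObject = stream;
    // Resolve once the first frame of video data has loaded.
    return new Promise( resolve => {
        webcamElement.addEventListener( "loadeddata", resolve, false );
    });
}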

We can call setupWebcam in the async block at the bottom of our code and start playing the webcam video once it has loaded.

(async () => {
    await setupWebcam();
    const video = document.getElementById( "webcam" );
    video.play();
})();

Next, let’s set up the output canvas and prepare to draw lines and triangles for the bounding box and the wireframe of the face.

The canvas context will be used to render the face tracking results, so we can store it globally outside the async block. Note that we flip the webcam image horizontally for more natural behavior, like a real mirror.

let output = null;

(async () => {
    await setupWebcam();
    const video = document.getElementById( "webcam" );
    video.play();
    let videoWidth = video.videoWidth;
    let videoHeight = video.videoHeight;
    video.width = videoWidth;
    video.height = videoHeight;

    let canvas = document.getElementById( "output" );
    canvas.width = video.width;
    canvas.height = video.height;

    output = canvas.getContext( "2d" );
    output.translate( canvas.width, 0 );
    output.scale( -1, 1 ); // Mirror cam
    output.fillStyle = "#fdffb6";
    output.strokeStyle = "#fdffb6";
    output.lineWidth = 2;
})();

Let’s track some faces

We are now ready! All we need to do is load the TensorFlow.js face landmarks detection model and apply it to our webcam frames to show the results.

First, we need a global model variable to store the loaded model:

let model = null;

We can then load the model at the end of the async block and set the status text to indicate that our application is ready for face tracking:

// Load Face Landmarks Detection
model = await faceLandmarksDetection.load(
    faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
);

setText( "Loaded!" );

Now let’s create a trackFace function that accepts webcam video frames, applies the face tracking model, copies the webcam image to the output canvas, and then draws a bounding box around the face and wireframe mesh triangles over it.

async function trackFace() {
    const video = document.getElementById( "webcam" );
    const faces = await model.estimateFaces( {
        input: video,
        returnTensors: false,
        flipHorizontal: false,
    });
    output.drawImage(
        video,
        0, 0, video.width, video.height,
        0, 0, video.width, video.height
    );

    faces.forEach( face => {
        setText( `Face Tracking Confidence: ${face.faceInViewConfidence.toFixed( 3 )}` );

        // Draw the bounding box
        const x1 = face.boundingBox.topLeft[ 0 ];
        const y1 = face.boundingBox.topLeft[ 1 ];
        const x2 = face.boundingBox.bottomRight[ 0 ];
        const y2 = face.boundingBox.bottomRight[ 1 ];
        const bWidth = x2 - x1;
        const bHeight = y2 - y1;
        drawLine( output, x1, y1, x2, y1 );
        drawLine( output, x2, y1, x2, y2 );
        drawLine( output, x1, y2, x2, y2 );
        drawLine( output, x1, y1, x1, y2 );

        // Draw the face mesh
        const keypoints = face.scaledMesh;
        for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
            let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
            let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
            let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
            drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ] );
        }
    });

    requestAnimationFrame( trackFace );
}

Finally, we can kick off the first tracking frame by calling this function at the end of our async block:

(async () => {
    ...

    trackFace();
})();

Home stretch

The complete code should look like this:

<html>
    <head>
        <title>Real-Time Face Tracking in the Browser with TensorFlow.js</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
        <script src="web/triangles.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <video id="webcam" playsinline style="
            visibility: hidden;
            width: auto;
            height: auto;
            ">
        </video>
        <h1 id="status">Loading...</h1>
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        function drawLine( ctx, x1, y1, x2, y2 ) {
            ctx.beginPath();
            ctx.moveTo( x1, y1 );
            ctx.lineTo( x2, y2 );
            ctx.stroke();
        }

        function drawTriangle( ctx, x1, y1, x2, y2, x3, y3 ) {
            ctx.beginPath();
            ctx.moveTo( x1, y1 );
            ctx.lineTo( x2, y2 );
            ctx.lineTo( x3, y3 );
            ctx.lineTo( x1, y1 );
            ctx.stroke();
        }

        let output = null;
        let model = null;

        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( "loadeddata", resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        async function trackFace() {
            const video = document.getElementById( "webcam" );
            const faces = await model.estimateFaces( {
                input: video,
                returnTensors: false,
                flipHorizontal: false,
            });
            output.drawImage(
                video,
                0, 0, video.width, video.height,
                0, 0, video.width, video.height
            );

            faces.forEach( face => {
                setText( `Face Tracking Confidence: ${face.faceInViewConfidence.toFixed( 3 )}` );

                // Draw the bounding box
                const x1 = face.boundingBox.topLeft[ 0 ];
                const y1 = face.boundingBox.topLeft[ 1 ];
                const x2 = face.boundingBox.bottomRight[ 0 ];
                const y2 = face.boundingBox.bottomRight[ 1 ];
                const bWidth = x2 - x1;
                const bHeight = y2 - y1;
                drawLine( output, x1, y1, x2, y1 );
                drawLine( output, x2, y1, x2, y2 );
                drawLine( output, x1, y2, x2, y2 );
                drawLine( output, x1, y1, x1, y2 );

                // Draw the face mesh
                const keypoints = face.scaledMesh;
                for( let i = 0; i < FaceTriangles.length / 3; i++ ) {
                    let pointA = keypoints[ FaceTriangles[ i * 3 ] ];
                    let pointB = keypoints[ FaceTriangles[ i * 3 + 1 ] ];
                    let pointC = keypoints[ FaceTriangles[ i * 3 + 2 ] ];
                    drawTriangle( output, pointA[ 0 ], pointA[ 1 ], pointB[ 0 ], pointB[ 1 ], pointC[ 0 ], pointC[ 1 ] );
                }
            });

            requestAnimationFrame( trackFace );
        }

        (async () => {
            await setupWebcam();
            const video = document.getElementById( "webcam" );
            video.play();
            let videoWidth = video.videoWidth;
            let videoHeight = video.videoHeight;
            video.width = videoWidth;
            video.height = videoHeight;

            let canvas = document.getElementById( "output" );
            canvas.width = video.width;
            canvas.height = video.height;

            output = canvas.getContext( "2d" );
            output.translate( canvas.width, 0 );
            output.scale( -1, 1 ); // Mirror cam
            output.fillStyle = "#fdffb6";
            output.strokeStyle = "#fdffb6";
            output.lineWidth = 2;

            // Load Face Landmarks Detection
            model = await faceLandmarksDetection.load(
                faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
            );

            setText( "Loaded!" );

            trackFace();
        })();
        </script>
    </body>
</html>

Can you do more with face tracking?

By combining TensorFlow.js’s face landmarks detection model with webcam video, we were able to track faces in real time right in the browser. Our face tracking code also works on still images, and the key points can tell us more than we might expect. Maybe we should try applying it to a dataset of faces, such as the FER+ facial expression recognition dataset?
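For instance, here is a minimal sketch of running the same model on a static image instead of the webcam. The "photo" element id is hypothetical, and the call has to run inside an async function with the model already loaded:

// Run the face tracker on a still <img> element (the "photo" id is hypothetical).
const image = document.getElementById( "photo" );
const faces = await model.estimateFaces( { input: image } );
faces.forEach( face => {
    console.log( "Confidence:", face.faceInViewConfidence.toFixed( 3 ) );
    console.log( "Key points:", face.scaledMesh.length );
});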

In the next article in this series, we’ll use deep learning on tracked faces from the FER+ dataset and try to detect a person’s emotions from their facial key points in the browser using TensorFlow.js. See you tomorrow at the same time.
