Live face tracking in the browser using TensorFlow.js. Part 5

Wearing virtual accessories is fun, but it’s only one step to wearing them in real life. We could easily create an application that allows you to virtually try on hats – the kind of application you might want to create for an e-commerce website. But, if we’re going to do it, why not have a little more fun doing it? The great thing about software is that we can bring our imaginations to life.

In this article, we’re going to put all the previous pieces together to create a magic emotion detection hat that recognizes and reacts to facial expressions when we wear it virtually.


You can download a demo version of this project. You might need to enable WebGL support in your web browser to get the performance you want. You can also download code and files for this series. It assumes that you are familiar with JavaScript and HTML and have at least a basic understanding of neural networks.

Making a magic hat

Remember how we created real-time facial emotion detection earlier in this series? Now let’s add some graphics to this project – give it a face, so to speak.

To create our virtual hat, we are going to add graphic assets to the web page as hidden elements img:

<img id="hat-angry" src="https://habr.com/ru/company/skillfactory/blog/545336/web/hats/angry.png" style="visibility: hidden;" />
<img id="hat-disgust" src="web/hats/disgust.png" style="visibility: hidden;" />
<img id="hat-fear" src="web/hats/fear.png" style="visibility: hidden;" />
<img id="hat-happy" src="web/hats/happy.png" style="visibility: hidden;" />
<img id="hat-neutral" src="web/hats/neutral.png" style="visibility: hidden;" />
<img id="hat-sad" src="web/hats/sad.png" style="visibility: hidden;" />
<img id="hat-surprise" src="web/hats/surprise.png" style="visibility: hidden;" />

The key property of this project is that the hat must be displayed all the time, in the correct position and in the correct size, so we will store the hat’s “states” in a global variable:

let currentEmotion = "neutral";
let hat = { scale: { x: 0, y: 0 }, position: { x: 0, y: 0 } };

Draw a hat of this size and in this position we will use a 2D transformation of the canvas in each frame.

async function trackFace() {
    ...

    output.drawImage(
        video,
        0, 0, video.width, video.height,
        0, 0, video.width, video.height
    );
    let hatImage = document.getElementById( `hat-${currentEmotion}` );
    output.save();
    output.translate( -hatImage.width / 2, -hatImage.height / 2 );
    output.translate( hat.position.x, hat.position.y );
    output.drawImage(
        hatImage,
        0, 0, hatImage.width, hatImage.height,
        0, 0, hatImage.width * hat.scale, hatImage.height * hat.scale
    );
    output.restore();

    ...
}

From the key points of the face provided by TensorFlow, we can calculate the size and position of the hat in relation to the face to set the above values.

The size of the head can be estimated by the distance between the eyes. The vector “up” is approximated by a point midwayBetweenEyes and point noseBottomwhich can be used to move the hat up and towards the top of the face (as opposed to the virtual glasses from the previous article).

const eyeDist = Math.sqrt(
    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
);

const faceScale = eyeDist / 80;
let upX = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
let upY = face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ];
const length = Math.sqrt( upX ** 2 + upY ** 2 );
upX /= length;
upY /= length;

hat = {
    scale: faceScale,
    position: {
        x: face.annotations.midwayBetweenEyes[ 0 ][ 0 ] + upX * 100 * faceScale,
        y: face.annotations.midwayBetweenEyes[ 0 ][ 1 ] + upY * 100 * faceScale,
    }
};

After saving the name of the predicted emotion in currentEmotion the corresponding hat image is displayed and we are ready to try it on!

if( points ) {
    let emotion = await predictEmotion( points );
    setText( `Detected: ${emotion}` );
    currentEmotion = emotion;
}
else {
    setText( "No Face" );
}
Here is the complete code for this project
<html>
    <head>
        <title>Building a Magical Emotion Detection Hat</title>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@2.4.0/dist/tf.min.js"></script>
        <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/face-landmarks-detection@0.0.1/dist/face-landmarks-detection.js"></script>
    </head>
    <body>
        <canvas id="output"></canvas>
        <video id="webcam" playsinline style="
            visibility: hidden;
            width: auto;
            height: auto;
            ">
        </video>
        <h1 id="status">Loading...</h1>
        <img id="hat-angry" src="https://habr.com/ru/company/skillfactory/blog/545336/web/hats/angry.png" style="visibility: hidden;" />
        <img id="hat-disgust" src="web/hats/disgust.png" style="visibility: hidden;" />
        <img id="hat-fear" src="web/hats/fear.png" style="visibility: hidden;" />
        <img id="hat-happy" src="web/hats/happy.png" style="visibility: hidden;" />
        <img id="hat-neutral" src="web/hats/neutral.png" style="visibility: hidden;" />
        <img id="hat-sad" src="web/hats/sad.png" style="visibility: hidden;" />
        <img id="hat-surprise" src="web/hats/surprise.png" style="visibility: hidden;" />
        <script>
        function setText( text ) {
            document.getElementById( "status" ).innerText = text;
        }

        function drawLine( ctx, x1, y1, x2, y2 ) {
            ctx.beginPath();
            ctx.moveTo( x1, y1 );
            ctx.lineTo( x2, y2 );
            ctx.stroke();
        }

        async function setupWebcam() {
            return new Promise( ( resolve, reject ) => {
                const webcamElement = document.getElementById( "webcam" );
                const navigatorAny = navigator;
                navigator.getUserMedia = navigator.getUserMedia ||
                navigatorAny.webkitGetUserMedia || navigatorAny.mozGetUserMedia ||
                navigatorAny.msGetUserMedia;
                if( navigator.getUserMedia ) {
                    navigator.getUserMedia( { video: true },
                        stream => {
                            webcamElement.srcObject = stream;
                            webcamElement.addEventListener( "loadeddata", resolve, false );
                        },
                    error => reject());
                }
                else {
                    reject();
                }
            });
        }

        const emotions = [ "angry", "disgust", "fear", "happy", "neutral", "sad", "surprise" ];
        let emotionModel = null;

        let output = null;
        let model = null;

        let currentEmotion = "neutral";
        let hat = { scale: { x: 0, y: 0 }, position: { x: 0, y: 0 } };

        async function predictEmotion( points ) {
            let result = tf.tidy( () => {
                const xs = tf.stack( [ tf.tensor1d( points ) ] );
                return emotionModel.predict( xs );
            });
            let prediction = await result.data();
            result.dispose();
            // Get the index of the maximum value
            let id = prediction.indexOf( Math.max( ...prediction ) );
            return emotions[ id ];
        }

        async function trackFace() {
            const video = document.querySelector( "video" );
            const faces = await model.estimateFaces( {
                input: video,
                returnTensors: false,
                flipHorizontal: false,
            });
            output.drawImage(
                video,
                0, 0, video.width, video.height,
                0, 0, video.width, video.height
            );
            let hatImage = document.getElementById( `hat-${currentEmotion}` );
            output.save();
            output.translate( -hatImage.width / 2, -hatImage.height / 2 );
            output.translate( hat.position.x, hat.position.y );
            output.drawImage(
                hatImage,
                0, 0, hatImage.width, hatImage.height,
                0, 0, hatImage.width * hat.scale, hatImage.height * hat.scale
            );
            output.restore();

            let points = null;
            faces.forEach( face => {
                const x1 = face.boundingBox.topLeft[ 0 ];
                const y1 = face.boundingBox.topLeft[ 1 ];
                const x2 = face.boundingBox.bottomRight[ 0 ];
                const y2 = face.boundingBox.bottomRight[ 1 ];
                const bWidth = x2 - x1;
                const bHeight = y2 - y1;

                // Add just the nose, cheeks, eyes, eyebrows & mouth
                const features = [
                    "noseTip",
                    "leftCheek",
                    "rightCheek",
                    "leftEyeLower1", "leftEyeUpper1",
                    "rightEyeLower1", "rightEyeUpper1",
                    "leftEyebrowLower", //"leftEyebrowUpper",
                    "rightEyebrowLower", //"rightEyebrowUpper",
                    "lipsLowerInner", //"lipsLowerOuter",
                    "lipsUpperInner", //"lipsUpperOuter",
                ];
                points = [];
                features.forEach( feature => {
                    face.annotations[ feature ].forEach( x => {
                        points.push( ( x[ 0 ] - x1 ) / bWidth );
                        points.push( ( x[ 1 ] - y1 ) / bHeight );
                    });
                });

                const eyeDist = Math.sqrt(
                    ( face.annotations.leftEyeUpper1[ 3 ][ 0 ] - face.annotations.rightEyeUpper1[ 3 ][ 0 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 1 ] - face.annotations.rightEyeUpper1[ 3 ][ 1 ] ) ** 2 +
                    ( face.annotations.leftEyeUpper1[ 3 ][ 2 ] - face.annotations.rightEyeUpper1[ 3 ][ 2 ] ) ** 2
                );
                const faceScale = eyeDist / 80;
                let upX = face.annotations.midwayBetweenEyes[ 0 ][ 0 ] - face.annotations.noseBottom[ 0 ][ 0 ];
                let upY = face.annotations.midwayBetweenEyes[ 0 ][ 1 ] - face.annotations.noseBottom[ 0 ][ 1 ];
                const length = Math.sqrt( upX ** 2 + upY ** 2 );
                upX /= length;
                upY /= length;

                hat = {
                    scale: faceScale,
                    position: {
                        x: face.annotations.midwayBetweenEyes[ 0 ][ 0 ] + upX * 100 * faceScale,
                        y: face.annotations.midwayBetweenEyes[ 0 ][ 1 ] + upY * 100 * faceScale,
                    }
                };
            });

            if( points ) {
                let emotion = await predictEmotion( points );
                setText( `Detected: ${emotion}` );
                currentEmotion = emotion;
            }
            else {
                setText( "No Face" );
            }
            
            requestAnimationFrame( trackFace );
        }

        (async () => {
            await setupWebcam();
            const video = document.getElementById( "webcam" );
            video.play();
            let videoWidth = video.videoWidth;
            let videoHeight = video.videoHeight;
            video.width = videoWidth;
            video.height = videoHeight;

            let canvas = document.getElementById( "output" );
            canvas.width = video.width;
            canvas.height = video.height;

            output = canvas.getContext( "2d" );
            output.translate( canvas.width, 0 );
            output.scale( -1, 1 ); // Mirror cam
            output.fillStyle = "#fdffb6";
            output.strokeStyle = "#fdffb6";
            output.lineWidth = 2;

            // Load Face Landmarks Detection
            model = await faceLandmarksDetection.load(
                faceLandmarksDetection.SupportedPackages.mediapipeFacemesh
            );
            // Load Emotion Detection
            emotionModel = await tf.loadLayersModel( 'web/model/facemo.json' );

            setText( "Loaded!" );

            trackFace();
        })();
        </script>
    </body>
</html>

What’s next? Is it possible to control the condition of the eyes and mouth?

This project brings together all the pieces created earlier in this series for fun with visuals. But what if it was possible to implement interaction with the face in it?

In the next and final article in this series, we will implement eye blink and mouth opening detection to create an interactive scene. Stay with us and see you tomorrow at the same time.

  • Tracking faces in real time in the browser. Part 1

  • Tracking faces in real time in the browser. Part 2

  • Tracking faces in real time in the browser. Part 3

  • Tracking faces in real time in the browser. Part 4

Find out the detailshow to get a Level Up in skills and salary or an in-demand profession from scratch by taking SkillFactory online courses with a 40% discount and a promotional code HABR, which will give another + 10% discount on training.

Other professions and courses

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *