Face detection in Node.js using Rust and WebAssembly

In the previous article, we discussed how to call Rust functions from Node.js. Today we will show you how to write an AIaaS (Artificial Intelligence as a Service) application based on Node.js.

Most artificial intelligence applications today are developed in Python, while JavaScript is the main programming language for web development. To bring the power of AI to the web, you need to wrap AI algorithms in JavaScript, specifically in Node.js.

However, neither Python nor JavaScript is by itself suitable for computationally intensive AI applications. Both are high-level, relatively slow languages with heavyweight runtimes, where ease of use comes at the cost of performance. To solve this problem, the compute-intensive parts of Python AI code are wrapped in native C/C++ modules. You can do the same in Node.js, but we found a better solution: WebAssembly.

WebAssembly VMs integrate tightly with Node.js and other JavaScript runtimes. They offer high performance, memory safety, security by default, and portability across operating systems. Our approach combines the best features of WebAssembly and native code.

How it works

A Node.js AIaaS application has three components.

The Node.js application provides the web service and calls the WebAssembly function to perform computationally intensive tasks such as AI inference.
The WebAssembly function handles data preparation, post-processing, and integration with other systems. The application developer writes this function; we will use Rust.
Native code executes the AI model to maximize performance. This part of the code is very short and should be reviewed for security and safety. The WebAssembly function simply calls this native program, much as native functions are called from Python and Node.js.

Now let’s look at an example.

How to implement face detection

The face detection web service lets the user upload a photo and displays it with every detected face outlined in green.

The Rust source code that runs the MTCNN face detection model is based on Cetra's excellent tutorial, Face detection using the Tensorflow library in Rust. We changed a few things to make TensorFlow work with WebAssembly.

The Node.js application handles the file upload and displays the result.

app.post('/infer', function (req, res) {
  const image_file = req.files.image_file;
  const result_filename = uuidv4() + ".png";

  // Call the infer() function from WebAssembly (SSVM).
  // The result gets its own name so that it does not shadow the Express `res` object.
  const result = infer(req.body.detection_threshold, image_file.data);

  fs.writeFileSync("public/" + result_filename, result);
  res.send('<img src="' + result_filename + '"/>');
});

As you can see, the JavaScript application simply passes the image data and the detection_threshold parameter, which specifies the minimum face size the application can detect, to the infer() function, and then saves the return value to an image file on the server. The infer() function is written in Rust and compiled to WebAssembly, so it can be called from JavaScript.

The infer() function first flattens the source image data into an array. It then sets up the TensorFlow model and feeds it the flattened image data. When the model has run, it returns a set of numbers: the corner coordinates (x1, y1, x2, y2) of the box around each face. Finally, infer() draws a border around each face and returns the modified image as PNG data, which the Node.js application saves on the web server.

#[wasm_bindgen]
pub fn infer(detection_threshold: &str, image_data: &[u8]) -> Vec<u8> {
    let mut dt = detection_threshold;
    ... ...
    let mut img = image::load_from_memory(image_data).unwrap();

    // Run the tensorflow model using the face_detection_mtcnn native wrapper
    let mut cmd = Command::new("face_detection_mtcnn");
    // Pass in some arguments
    cmd.arg(img.width().to_string())
        .arg(img.height().to_string())
        .arg(dt);
    // The image bytes data is passed in via STDIN
    for (_x, _y, rgb) in img.pixels() {
        cmd.stdin_u8(rgb[2] as u8)
            .stdin_u8(rgb[1] as u8)
            .stdin_u8(rgb[0] as u8);
    }
    let out = cmd.output();

    // Draw boxes from the result JSON array
    let line = Pixel::from_slice(&[0, 255, 0, 0]);
    let stdout_json: Value = from_str(str::from_utf8(&out.stdout).expect("[]")).unwrap();
    let stdout_vec = stdout_json.as_array().unwrap();
    for i in 0..stdout_vec.len() {
        let xy = stdout_vec[i].as_array().unwrap();
        let x1: i32 = xy[0].as_f64().unwrap() as i32;
        let y1: i32 = xy[1].as_f64().unwrap() as i32;
        let x2: i32 = xy[2].as_f64().unwrap() as i32;
        let y2: i32 = xy[3].as_f64().unwrap() as i32;
        let rect = Rect::at(x1, y1).of_size((x2 - x1) as u32, (y2 - y1) as u32);
        draw_hollow_rect_mut(&mut img, rect, *line);
    }
    let mut buf = Vec::new();
    // Encode the annotated image as PNG; the bytes are returned to the JavaScript caller
    img.write_to(&mut buf, image::ImageOutputFormat::Png).expect("Unable to write");
    return buf;
}
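
The two data conversions inside infer() can be looked at in isolation. The following is a minimal sketch using only the Rust standard library, not the project's actual code: the helper names flatten_bgr and corners_to_rect are hypothetical, and pixels are assumed to be RGBA byte quadruples. It also shows one possible guard against inverted boxes, since casting a negative difference to u32, as the listing above does, would wrap around to a very large size.

```rust
/// Flatten RGBA pixels into a BGR byte stream (the alpha channel is dropped).
/// Hypothetical helper mirroring the STDIN loop in infer().
fn flatten_bgr(pixels: &[[u8; 4]]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pixels.len() * 3);
    for px in pixels {
        // Channel order is reversed: blue, green, red.
        out.extend_from_slice(&[px[2], px[1], px[0]]);
    }
    out
}

/// Convert two corners into (x, y, width, height).
/// Returns None for empty or inverted boxes instead of letting the
/// subtraction wrap around when cast to u32.
fn corners_to_rect(x1: i32, y1: i32, x2: i32, y2: i32) -> Option<(i32, i32, u32, u32)> {
    if x2 <= x1 || y2 <= y1 {
        return None;
    }
    // Widen to i64 so that extreme coordinates cannot overflow.
    let width = (i64::from(x2) - i64::from(x1)) as u32;
    let height = (i64::from(y2) - i64::from(y1)) as u32;
    Some((x1, y1, width, height))
}

fn main() {
    let pixels = [[10u8, 20, 30, 255], [40, 50, 60, 255]];
    println!("{:?}", flatten_bgr(&pixels));
    println!("{:?}", corners_to_rect(5, 5, 15, 25));
    println!("{:?}", corners_to_rect(15, 5, 5, 25));
}
```

Whether a degenerate box should be skipped or reported is a design choice; the sketch simply skips it by returning None.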

The face_detection_mtcnn command runs the TensorFlow model in native code. It takes three arguments: the image width, the image height, and the minimum face size to detect. The image data itself is passed from the WebAssembly infer() function to the program via STDIN as flat pixel values in BGR order. The result of running the model is encoded as JSON and returned via STDOUT.
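
The calling convention described above (three arguments plus raw pixel bytes on STDIN) can be checked before any model code runs. The sketch below is an illustration under stated assumptions rather than the project's code: the DetectArgs struct and the parse_args and check_input_len functions are hypothetical names, and it assumes exactly three bytes per pixel. It reports errors instead of panicking on malformed input.

```rust
/// Parsed command-line arguments of the native program (hypothetical type).
#[derive(Debug, PartialEq)]
struct DetectArgs {
    width: u64,
    height: u64,
    min_size: f32,
}

/// Parse `<width> <height> <min_size>`; args[0] is the program name.
fn parse_args(args: &[String]) -> Result<DetectArgs, String> {
    if args.len() != 4 {
        return Err(format!(
            "expected 3 arguments, got {}",
            args.len().saturating_sub(1)
        ));
    }
    let width = args[1].parse::<u64>().map_err(|e| format!("bad width: {}", e))?;
    let height = args[2].parse::<u64>().map_err(|e| format!("bad height: {}", e))?;
    let min_size = args[3].parse::<f32>().map_err(|e| format!("bad min_size: {}", e))?;
    Ok(DetectArgs { width, height, min_size })
}

/// Check that STDIN carried exactly width * height * 3 bytes
/// (assumption: one blue, green, red triple per pixel).
fn check_input_len(args: &DetectArgs, received: usize) -> Result<(), String> {
    let expected = args
        .width
        .checked_mul(args.height)
        .and_then(|n| n.checked_mul(3))
        .ok_or_else(|| "image dimensions overflow".to_string())?;
    if received as u64 != expected {
        return Err(format!("expected {} bytes on STDIN, got {}", expected, received));
    }
    Ok(())
}

fn main() {
    let argv: Vec<String> = ["face_detection_mtcnn", "2", "2", "20"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let parsed = parse_args(&argv).expect("arguments should parse");
    // A 2x2 image needs 2 * 2 * 3 = 12 bytes of pixel data.
    println!("{:?} -> {:?}", parsed, check_input_len(&parsed, 12));
}
```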

Note that we pass the detection threshold parameter to the model's min_size tensor and use the input tensor to pass the image data to the model. The box tensor is used to retrieve the results from the model.

fn main() -> Result<(), Box<dyn Error>> {
    // Get the arguments passed in from WebAssembly
    let args: Vec<String> = env::args().collect();
    let img_width: u64 = args[1].parse::<u64>().unwrap();
    let img_height: u64 = args[2].parse::<u64>().unwrap();
    let detection_threshold: f32 = args[3].parse::<f32>().unwrap();
    let mut buffer: Vec<u8> = Vec::new();
    let mut flattened: Vec<f32> = Vec::new();

    // The image bytes are read from STDIN
    io::stdin().read_to_end(&mut buffer)?;
    for num in buffer {
        flattened.push(num.into());
    }

    // Load up the graph as a byte array and create a tensorflow graph.
    let model = include_bytes!("mtcnn.pb");
    let mut graph = Graph::new();
    graph.import_graph_def(&*model, &ImportGraphDefOptions::new())?;

    let mut args = SessionRunArgs::new();
    // The `input` tensor expects BGR pixel data from the input image
    let input = Tensor::new(&[img_height, img_width, 3]).with_values(&flattened)?;
    args.add_feed(&graph.operation_by_name_required("input")?, 0, &input);

    // The `min_size` tensor takes the detection_threshold argument
    let min_size = Tensor::new(&[]).with_values(&[detection_threshold])?;
    args.add_feed(&graph.operation_by_name_required("min_size")?, 0, &min_size);

    // Default input params for the model
    let thresholds = Tensor::new(&[3]).with_values(&[0.6f32, 0.7f32, 0.7f32])?;
    args.add_feed(&graph.operation_by_name_required("thresholds")?, 0, &thresholds);
    let factor = Tensor::new(&[]).with_values(&[0.709f32])?;
    args.add_feed(&graph.operation_by_name_required("factor")?, 0, &factor);

    // Request the following outputs after the session runs.
    let bbox = args.request_fetch(&graph.operation_by_name_required("box")?, 0);

    let session = Session::new(&SessionOptions::new(), &graph)?;
    session.run(&mut args)?;

    // Get the bounding boxes
    let bbox_res: Tensor<f32> = args.fetch(bbox)?;
    let mut iter = 0;
    let mut json_vec: Vec<[f32; 4]> = Vec::new();
    while (iter * 4) < bbox_res.len() {
        json_vec.push([
            bbox_res[4 * iter + 1], // x1
            bbox_res[4 * iter],     // y1
            bbox_res[4 * iter + 3], // x2
            bbox_res[4 * iter + 2], // y2
        ]);
        iter += 1;
    }
    let json_obj = json!(json_vec);
    // Return result JSON in STDOUT
    println!("{}", json_obj.to_string()); 
    Ok(())
}
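
The last step of the listing, reordering each box and emitting JSON, can be sketched without TensorFlow or serde. This is an illustration only: reorder_boxes and boxes_to_json are hypothetical helpers, and the [y1, x1, y2, x2] input order is taken from the comments in the listing above. Unlike the indexed while loop, chunks_exact ignores a trailing incomplete group instead of panicking.

```rust
/// Reorder flat [y1, x1, y2, x2, ...] values into [x1, y1, x2, y2] boxes.
/// A trailing group with fewer than four values is ignored.
fn reorder_boxes(flat: &[f32]) -> Vec<[f32; 4]> {
    flat.chunks_exact(4)
        .map(|b| [b[1], b[0], b[3], b[2]])
        .collect()
}

/// Serialize the boxes as a JSON array of arrays.
/// Assumption: all values are finite; NaN or infinity would not be valid JSON.
fn boxes_to_json(boxes: &[[f32; 4]]) -> String {
    let items: Vec<String> = boxes
        .iter()
        .map(|b| format!("[{},{},{},{}]", b[0], b[1], b[2], b[3]))
        .collect();
    format!("[{}]", items.join(","))
}

fn main() {
    // Two boxes in the model's order: y1, x1, y2, x2.
    let flat = [10.0f32, 20.0, 110.0, 90.5, 5.0, 6.0, 7.0, 8.0];
    let boxes = reorder_boxes(&flat);
    println!("{}", boxes_to_json(&boxes));
}
```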

Our goal is to create native wrappers for executing generic AI models that developers can use as libraries.

How to deploy face detection software

First, install Rust, Node.js, the Second State WebAssembly VM (SSVM), and the ssvmup tool. Follow the setup instructions or use the Docker image. You will also need the TensorFlow library.

$ wget https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz
$ sudo tar -C /usr/ -xzf libtensorflow-gpu-linux-x86_64-1.15.0.tar.gz

To deploy the face detection software, you need the native driver for the TensorFlow model. You can compile it from the Rust source code in this project.

# in the native_model_zoo/face_detection_mtcnn directory
$ cargo install --path .

Then go to the web application project. Run ssvmup to build the WebAssembly function from Rust. This WebAssembly function prepares the data for the web application.

# in the nodejs/face_detection_service directory
$ ssvmup build

Once you’ve built the WebAssembly function, you can run the Node.js web application.

$ npm i express express-fileupload uuid

$ cd node
$ node server.js

The web service is now available on port 8080 on your computer. Try uploading your selfies, family or group photos!

TensorFlow Model Zoo

The Rust package face_detection_mtcnn is a wrapper around the TensorFlow library. It loads a trained TensorFlow model (known as a frozen saved model), feeds input data to the model, runs it, and extracts the outputs.

In this case, our wrapper only extracts the coordinates of the boxes around the detected faces. The model also produces a confidence level for each detected face and the positions of the eyes, mouth, and nose. If you change the names of the tensors that are fetched, the wrapper can retrieve this information as well and return it to the WebAssembly function.

If you want to use a different model, it should be easy to create a wrapper for it by following this example. You only need to know the names of the input and output tensors and their data types.
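
One way to keep track of that information is a small table of tensor descriptions. The sketch below is purely illustrative: the TensorSpec, DType, and Role types are hypothetical and not part of the TensorFlow Rust API. The tensor names and the f32 element type come from the MTCNN listing above, while the shape descriptions are assumptions read off the same listing.

```rust
/// Element type of a tensor; only the type used in the MTCNN listing is included.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F32,
}

/// Whether a tensor is fed to the model or fetched from it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Input,
    Output,
}

/// Hypothetical description of one named tensor of a frozen model.
#[derive(Debug)]
struct TensorSpec {
    name: &'static str,
    role: Role,
    dtype: DType,
    shape: &'static str, // human-readable note, not checked by any code
}

/// The tensors used by the MTCNN wrapper in the listing above.
fn mtcnn_spec() -> Vec<TensorSpec> {
    vec![
        TensorSpec { name: "input", role: Role::Input, dtype: DType::F32, shape: "[height, width, 3], BGR" },
        TensorSpec { name: "min_size", role: Role::Input, dtype: DType::F32, shape: "scalar" },
        TensorSpec { name: "thresholds", role: Role::Input, dtype: DType::F32, shape: "[3]" },
        TensorSpec { name: "factor", role: Role::Input, dtype: DType::F32, shape: "scalar" },
        TensorSpec { name: "box", role: Role::Output, dtype: DType::F32, shape: "4 values per face" },
    ]
}

/// Names of all tensors with the given role, in declaration order.
fn names_with_role(specs: &[TensorSpec], role: Role) -> Vec<&'static str> {
    specs.iter().filter(|s| s.role == role).map(|s| s.name).collect()
}

fn main() {
    for spec in mtcnn_spec() {
        println!("{:?} {} ({:?}): {}", spec.role, spec.name, spec.dtype, spec.shape);
    }
}
```

A wrapper for another model would start from a different table of this kind, with the feed and fetch calls following the same pattern as in the MTCNN example.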

We have created the native model zoo project to develop ready-to-use Rust wrappers for a variety of TensorFlow models.

What’s next

In this article, we showed how to build a practical Node.js AIaaS application using Rust and WebAssembly. Our approach allows everyone to contribute to the model zoo, which application developers can use as an AI library.

In the next article, we’ll look at another TensorFlow model for image classification and show how to add support for an entire class of similar models to the wrapper.