Creating a High-Five Gesture Counter with Deep Learning

For about ten years I've wanted to implement this silly idea: measure the acceleration of a person's hand to count how many high fives they give during the day. I didn't know how to solve the problem with classical, rule-based algorithm design, so the project sat on the shelf. But while working on the MATLAB Tech Talk series on deep learning, I realized that deep learning is perfect for this problem!

The topic of the fourth video in the series was "Transfer Learning," and it turned out to be exactly the concept I needed to quickly implement an algorithm that counts how many times I high-five throughout the day. In this article, I'll walk through the code and the tools I used to count high-five gestures. Hopefully you can use this example as a starting point for that complex classification problem you've been sitting on for the past ten years.

So let’s get started!

Equipment overview

Setting up your hardware is pretty straightforward. I have an accelerometer connected to an Arduino Uno via the I2C bus. The Arduino then connects to my computer via USB.

To measure acceleration, I'm using the MPU-9250, a 9-DOF inertial measurement unit from TDK InvenSense. Instead of integrating the sensor into my own circuit, I use a breakout board that exposes power and I2C communication pins.

As you can see, my device is rather crudely assembled from a breadboard and a few jumper wires, but that's not a bad thing: no complicated setup is required for the device to work properly.

Reading data from an accelerometer in MATLAB

To read acceleration data from the MPU-9250 via the Arduino, use the MATLAB Support Package for Arduino Hardware. This package lets you communicate with the Arduino without having to compile code for it. It also includes a built-in mpu9250 function that reads the sensor with a one-line command.

It only takes three lines of code to connect to the Arduino, instantiate the MPU9250 object, and read the accelerometer readings.
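Those three lines, pulled from the scripts later in this article, look like this (change `'COM3'` to whatever port your board is on):

```matlab
% Connect to the board, create the IMU object, and read acceleration
a = arduino('COM3', 'Uno', 'Libraries', 'I2C');
imu = mpu9250(a);
accel = readAcceleration(imu);  % Returns [ax, ay, az] in m/s^2
```
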

Data and scalogram preprocessing

If you've watched the fourth Tech Talk on deep learning, you know that I decided to convert the triaxial acceleration data into an image to take advantage of GoogLeNet, a network trained to recognize images. Specifically, I used the continuous wavelet transform to create scalograms.

A scalogram is a time-frequency representation well suited to signals that exist at multiple scales: signals that are low frequency and slowly changing, but interrupted from time to time by high-frequency transients. Scalograms turned out to be useful for visualizing hand-acceleration data during a high-five gesture.

Here is the version of the MATLAB code I used to generate the plot above:

close all
clear

 % If your computer is not able to run this real-time, reduce the sample 
% rate or comment out the scalogram part
fs = 50; % Run at 50 Hz

a = arduino('COM3', 'Uno', 'Libraries', 'I2C');  % Change to your arduino
imu = mpu9250(a);

buffer_length_sec = 2; % Seconds of data to store in buffer
accel = zeros(floor(buffer_length_sec * fs) + 1, 3); % Init buffer

t = 0:1/fs:(buffer_length_sec(end)); % Time vector

subplot(2, 1, 1)
plot_accel = plot(t, accel); % Set up accel plot
axis([0, buffer_length_sec, -50, 50]);

subplot(2, 1, 2)
plot_scale = image(zeros(224, 224, 3)); % Set up scalogram

tic % Start timer
last_read_time = 0;

i = 0;
% Run for 20 seconds
while(toc <= 20)
    current_read_time = toc;
    if (current_read_time - last_read_time) >= 1/fs
        i = i + 1;

        accel(1:end-1, :) = accel(2:end, :); % Shift values in FIFO buffer
        accel(end, :) = readAcceleration(imu);

        plot_accel(1).YData = accel(:, 1);
        plot_accel(2).YData = accel(:, 2);
        plot_accel(3).YData = accel(:, 3);

        % Only run scalogram every 3rd sample to save on compute time
        if mod(i, 3) == 0

            % One filter bank works for all three channels since the
            % signal length is the same
            fb = cwtfilterbank('SignalLength', size(accel, 1), ...
                'VoicesPerOctave', 12);

            accel_i = zeros(224, 224, 3);
            for ch = 1:3
                [cfs, ~] = wt(fb, accel(:, ch));
                accel_i(:, :, ch) = imresize(abs(cfs)/8, [224 224]);
            end

            % Saturate pixels at 1
            accel_i(accel_i > 1) = 1;

            plot_scale.CData = accel_i;
        end

        last_read_time = current_read_time;
    end
end

Note that this code uses the function cwtfilterbank to create the scalogram, which is part of Wavelet Toolbox. If you don't have access to that toolbox and don't want to write the code yourself, try a different kind of time-frequency image: a spectrogram might work, or some other algorithm you come up with. Whichever you choose, the idea is to create an image with unique and distinguishable high-five patterns. The scalogram clearly works well, but other methods are worth trying.
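As one hedged sketch of the spectrogram alternative, the scalogram step could be replaced with the Signal Processing Toolbox function spectrogram. The window, overlap, and FFT sizes below are illustrative guesses, not tuned values:

```matlab
% Sketch: build a 224x224 RGB "spectrogram image" from the accel buffer,
% one channel per axis, mirroring what the scalogram code does.
% Assumes accel is the N-by-3 buffer and fs the sample rate from above.
accel_img = zeros(224, 224, 3);
for ch = 1:3
    [s, ~, ~] = spectrogram(accel(:, ch), 16, 14, 128, fs);
    accel_img(:, :, ch) = imresize(abs(s)/8, [224 224]);
end
accel_img(accel_img > 1) = 1;  % Saturate pixels at 1, as with the scalogram
```

Whatever transform you use, the output just needs to be a 224x224x3 image so it can be fed to GoogLeNet unchanged.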

Creating training data

To teach the network to recognize the high-five gesture, we need examples of what the gesture does and doesn't look like. Since we're starting from a pretrained network, we won't need nearly as many training examples as we would when training a network from scratch. I don't know exactly how much training data is needed to fully cover the solution space of all possible high fives, but I collected one hundred high-five examples and one hundred no-high-five examples, and it seems to work well. If I were really building a product, I would use many more examples. You can play with the amount of labeled training data and see how it affects the outcome.

Collecting 200 images sounds like a lot of work, but I wrote a small script that loops through them one at a time and saves each image into the appropriate folder. I ran the script twice: once with the "high_five" label, saving the images to the data/high_five folder, and once with the "no_high_five" label, saving them to data/no_high_five.

% This script collects training data and places it in the specified
% label subfolder. Three seconds of data are collected from the
% sensor, but only the last 2 seconds are kept and saved.
% This gives the user some buffer time to start the high five.

% The program pauses between images and prompts the user to continue.

% Note, you'll want to move the figure away from the MATLAB window so that
% you can see the acceleration after you respond to the wait prompt.

close all
clear

% If your computer is not able to run this real-time, reduce the sample rate
fs = 50; % Run at 50 Hz

parentDir = pwd;
dataDir="data";

%% Set the label for the data that you are generating
% labels="no_high_five";
labels="high_five";

a = arduino('COM3', 'Uno', 'Libraries', 'I2C');  % Change to your arduino
imu = mpu9250(a);

buffer_length_sec = 2; % Seconds of data to store in buffer
accel = zeros(floor(buffer_length_sec * fs) + 1, 3);  % Init buffer

t = 0:1/fs:(buffer_length_sec(end)); % Time vector

subplot(2, 1, 1)
plot_accel = plot(t, accel); % Set up accel plot
axis([0 buffer_length_sec -50 50]);

subplot(2, 1, 2)
plot_scale = image(zeros(224, 224, 3)); % Set up scalogram

for j = 1:100 % Collect 100 images

    % Prompt user to be ready to record next high five
    H = input('Hit enter when ready: ');

    tic % Start timer
    last_read_time = 0;

    i = 0;
    % Run for 3 seconds
    while(toc <= 3)
        current_read_time = toc;
        if (current_read_time - last_read_time) >= 1/fs
            i = i + 1;

            accel(1:end-1, :) = accel(2:end, :);  % Shift values in buffer
            accel(end, :) = readAcceleration(imu);

            plot_accel(1).YData = accel(:, 1);
            plot_accel(2).YData = accel(:, 2);
            plot_accel(3).YData = accel(:, 3);

            % Run scalogram every 3rd sample
            if mod(i, 3) == 0

                % One filter bank works for all three channels
                fb = cwtfilterbank('SignalLength', size(accel, 1), ...
                    'VoicesPerOctave', 12);

                accel_i = zeros(224, 224, 3);
                for ch = 1:3
                    [cfs, ~] = wt(fb, accel(:, ch));
                    accel_i(:, :, ch) = imresize(abs(cfs)/8, [224 224]);
                end

                % Saturate pixels at 1
                accel_i(accel_i > 1) = 1;

                plot_scale.CData = accel_i;
            end

            last_read_time = current_read_time;
        end
    end

    % Save image to data folder
    imageRoot = fullfile(parentDir,dataDir);
    imgLoc = fullfile(imageRoot,char(labels));
    imFileName = strcat(char(labels),'_',num2str(j),'.jpg');

    imwrite(plot_scale.CData, fullfile(imgLoc,imFileName), 'JPEG');
end

After running the script, I manually reviewed the training data and removed images that I thought might degrade the training results: images where the high five wasn't in the middle of the frame, or where I knew I had made a bad hand movement. In the GIF below, I deleted image 49 because the gesture wasn't centered in the frame.

Transfer Learning and GoogLeNet

Once all my training data was in its respective folders, the next step was to set up the network. For this part, I followed the example Classify Time Series Using Wavelet Analysis and Deep Learning, but instead of running everything through a MATLAB script, I found it easier to set up and train the network using the Deep Network Designer app.

I started with a pretrained GoogLeNet to take advantage of everything this network already knows about recognizing objects in images. GoogLeNet was trained to recognize things like fish and hot dogs in images (clearly not what I'm looking for), but that's where transfer learning comes in handy. With transfer learning, I can keep most of the existing network intact and replace only the two layers at the end that combine general features into the specific patterns I'm looking for. Then, when I train the network, mainly just those two layers need to be updated, so training is much faster with transfer learning.

If you want to know exactly how I replaced the layers and what training parameters I used, I recommend following the MATLAB example or watching the Tech Talk, but do experiment with the network yourself. You could start with a different pretrained network such as SqueezeNet, replace more layers in GoogLeNet, or change the training parameters. There are many options here, and I think deviating from what I did will help you develop intuition for how all of these parameters affect the result.
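If you'd rather do the layer swap in script form than in the app, a sketch of the transfer-learning setup looks roughly like this. The layer names 'loss3-classifier' and 'output' are GoogLeNet's defaults, and the learn-rate factors are illustrative choices, not the only reasonable ones:

```matlab
% Sketch: swap GoogLeNet's final layers for a two-class problem
net = googlenet;               % Requires the GoogLeNet support package
lgraph = layerGraph(net);

% Replace the final fully connected layer with a two-class one,
% learning faster than the frozen-ish earlier layers
numClasses = 2;                % high_five / no_high_five
newFC = fullyConnectedLayer(numClasses, 'Name', 'new_fc', ...
    'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
lgraph = replaceLayer(lgraph, 'loss3-classifier', newFC);

% Replace the classification output layer so it picks up the new labels
newOut = classificationLayer('Name', 'new_classoutput');
lgraph = replaceLayer(lgraph, 'output', newOut);
```
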

Network training

Once the network is ready, it is very easy to train with the Deep Network Designer. On the Data tab, I imported the training data by selecting the folder where I saved the high_five and no_high_five image sets. I also held out 20 percent of the images for validation during training.

Then, on the Training tab, I set my training parameters. I used the same parameters as the MATLAB example, but once again I recommend playing with some of these values to see how they affect the results.
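For reference, the equivalent scripted training setup might look like the sketch below. The option values mirror typical settings from the wavelet classification example and should be treated as a starting point; `lgraph` is assumed to be the modified GoogLeNet layer graph from the previous step:

```matlab
% Sketch: load the labeled scalogram images and train the network.
% Folder names under 'data' become the class labels.
imds = imageDatastore('data', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.8, 'randomized');

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 10, ...
    'MaxEpochs', 10, ...
    'InitialLearnRate', 1e-4, ...
    'ValidationData', imdsVal, ...
    'Verbose', false, ...
    'Plots', 'training-progress');

trainedGN = trainNetwork(imdsTrain, lgraph, options);
```
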

Training on my single CPU took just over four minutes and reached an accuracy of about 97%. Not bad for a couple hours of work!

Counter testing

Now that I have a trained network, I use the classify function from Deep Learning Toolbox to pass in the scalogram at each sample time and get back the label "high_five" or "no_high_five". If the returned label is "high_five", I increment the counter. To avoid counting the same gesture several times as the acceleration data moves through the buffer, I added a lockout: a new high five is not counted unless at least two seconds have passed since the previous one.

Below is the code I used to count high fives:

close all
clear

%% Update to the name of your trained network
load trainedGN
trainedNetwork = trainedGN;

% If your computer is not able to run this real-time, reduce the sample
% rate or comment out the scalogram part
fs = 50; % Run at 50 Hz

a = arduino('COM3', 'Uno', 'Libraries', 'I2C');  % Change to your arduino
imu = mpu9250(a);

buffer_length_sec = 2; % Seconds of data to store in buffer
accel = zeros(floor(buffer_length_sec * fs) + 1, 3); % Init buffer

t = 0:1/fs:(buffer_length_sec(end)); % Time vector

% Set up plots
h = figure;
h.Position = [100 100 900 700];
p1 = subplot(2, 1, 1);
plot_accel = plot(t, accel);
plot_accel(1).LineWidth = 3;
plot_accel(2).LineWidth = 3;
plot_accel(3).LineWidth = 3;
p1.FontSize = 20;
p1.Title.String = 'Acceleration';
axis([0 t(end) -50 60]);
xlabel('Seconds');
ylabel('Acceleration, mpss');
grid on;
label_string = text(1.3, 45, 'No High Five');
label_string.Interpreter="none";
label_string.FontSize = 25;
count_string = text(0.1, 45, 'High five counter:');
count_string.Interpreter="none";
count_string.FontSize = 15;
val_string = text(0.65, 45, '0');
val_string.Interpreter="none";
val_string.FontSize = 15;
p2 = subplot(2, 1, 2);
scale_accel = image(zeros(224, 224, 3));
p2.Title.String = 'Scalogram';
p2.FontSize = 20;

telapse = 0;
hfcount = 0;

tic  % Start timer
last_read_time = 0;

i = 0;
% Run high five counter for 20 seconds
while(toc <= 20)
    current_read_time = toc;
    if (current_read_time - last_read_time) >= 1/fs
        i = i + 1;
        telapse = telapse + 1;
        % Read accel
        accel(1:end-1, :) = accel(2:end, :); % Shift values in FIFO buffer
        accel(end, :) = readAcceleration(imu);

        plot_accel(1).YData = accel(:, 1);
        plot_accel(2).YData = accel(:, 2);
        plot_accel(3).YData = accel(:, 3);

        % Only run scalogram every 3rd sample to save on compute time
        if mod(i, 3) == 0

            % Scalogram: one filter bank works for all three channels
            fb = cwtfilterbank('SignalLength', size(accel, 1), ...
                'VoicesPerOctave', 12);

            accel_i = zeros(224, 224, 3);
            for ch = 1:3
                [cfs, ~] = wt(fb, accel(:, ch));
                accel_i(:, :, ch) = imresize(abs(cfs)/8, [224 224]);
            end

            % Saturate pixels at 1
            accel_i(accel_i > 1) = 1;

            scale_accel.CData = im2uint8(accel_i);

            % Classify Scalogram
            [YPred,probs] = classify(trainedNetwork,scale_accel.CData);
            if strcmp(string(YPred), 'high_five')
                label_string.BackgroundColor = [1 0 0];
                label_string.String = "High Five!";
                % Only count if 100 samples have passed since last high five
                if telapse > 100
                    hfcount = hfcount + 1;
                    val_string.String = string(hfcount);
                    telapse = 0;
                end
            else
                label_string.BackgroundColor = [1 1 1];
                label_string.String = "No High Five";
            end
        end
    end
end

And here it is in action!
