Automating the recognition and counting of transactions from images

Most processes in the modern world are gradually being automated. In this article, I would like to share a script I developed that automates the maintenance of a company's daily reports. I hope this material finds its reader.

What problems does the script solve?

Relieving an employee of a routine task.
Minimizing errors that a person might accidentally make.
Reducing company costs, in particular spending on staff such as managers and accountants.

In this article I want to share how you can automate processes within a company using my script.

Script architecture

The architecture shown in Figure 1 is very simple. The script is still under development, and it currently handles images of Sberbank transactions well.

To work with the image I used the following libraries:

Tesseract.js – a JavaScript port of the Tesseract OCR engine that extracts text from images in a wide range of languages.
Sharp – a Node.js library for image processing; we will see what it does specifically using an example.

Additionally, I used the Pino library to output logs.

Code Description

The script works in three stages:

Image processing.
Text recognition in the image using OCR.
Extraction and calculation of financial data.

Let's look at the files in the utils directory.

logger.js

import pino from 'pino';
import pretty from 'pino-pretty';

export const logger = pino(pretty({
    colorize: true,
    levelFirst: true,
}));

Pino offers more functionality and flexibility than console.log, and in some projects linters prohibit the use of console.log.

preload.js

import { fileURLToPath } from 'url';
import path, { dirname as pathDirname, resolve } from 'path';
import sharp from 'sharp';

const filename = fileURLToPath(import.meta.url);
const dirname = pathDirname(filename);

export const getPath = (relativePath) => {
    const ROOT_PATH = path.join(dirname, '../');
    return resolve(ROOT_PATH, relativePath);
};

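// Prepare the image for OCR and return the result as a Buffer.
export const preload = (input) => {
    return sharp(input)
        .resize(1200)   // scale to a width of 1200 px, keeping the aspect ratio
        .grayscale()    // drop color information
        .normalize()    // stretch the contrast
        .sharpen()      // make the edges of the characters crisper
        .toBuffer();
};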

This code uses the sharp library, which is designed for image processing in Node.js.

What are we doing here?
We resize the image, convert it to grayscale, improve the contrast, and increase the sharpness, then return the result as a buffer.

Why are we doing this?
To improve the visual quality of the image. Testing has shown that without this preprocessing Tesseract does not recognize all transactions.

Let's look at the main file app.js.

app.js

import Tesseract from 'tesseract.js';
import { preload, getPath } from './utils/preload.js';
import { logger } from './utils/logger.js';
import { keywords } from './keywords.js';

const input = getPath('./assets/input.jpg');

const run = async (input) => {
    const output = await preload(input);

    try {
        const { data: { text } } = await Tesseract.recognize(output, 'rus', {
            logger: info => logger.info(info.status)
        });

        const amounts = [];
        const priceWithRRegex = /[+-]?\d+(?:[\s,]\d{3})*(?:[.,]\d+)?\s*Р/g;

        text.split('\n').forEach(line => {
            // Skip lines that contain a keyword (for example, the daily total).
            if (keywords.some(keyword => line.includes(keyword))) {
                return;
            }

            const priceMatches = line.match(priceWithRRegex);
            if (!priceMatches) {
                return;
            }

            priceMatches.forEach(priceMatch => {
                let price = priceMatch
                    .replace(/\s/g, '')
                    .replace(',', '.')
                    .replace('Р', '')
                    .trim();

                // Amounts without an explicit sign are treated as negative.
                if (!price.startsWith('+') && !price.startsWith('-')) {
                    price = '-' + price;
                }

                const numericPrice = parseFloat(price);
                if (!isNaN(numericPrice)) {
                    amounts.push(numericPrice);
                }
            });
        });

        const total = amounts.reduce((acc, curr) => acc + curr, 0);

        return {
            amounts,
            total
        };

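    } catch (error) {
        // Add context and let the caller report the failure once.
        throw new Error('Failed to recognize transactions', { cause: error });
    }
};

run(input)
    .then(result => {
        logger.info(result, 'Result: ');
    })
    .catch(error => {
        // Log any failure (preprocessing or OCR) at the error level.
        logger.error(error);
        process.exitCode = 1;
    });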

First, the preload function is called to process the image. As I wrote earlier, we need this in order to improve the quality of text recognition.

Next, we process the image using Tesseract and split the recognized text into lines. In each line, the regular expression priceWithRRegex looks for amounts marked with the symbol “Р”. I also defined a list of keywords: if a line contains one of them, its amount is not added to the array. An example is the total amount for the day.
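To see the parsing step in isolation, here is a minimal sketch that applies the same regular expression to a made-up OCR result. The sample text, the keyword list, and the function name extractAmounts are assumptions for illustration (the real keywords live in keywords.js, which is not shown here). The sketch also assumes that the currency letter in the pattern is the Cyrillic "Р" (U+0420), written below as \u0420.

```javascript
// Same pattern as in app.js, with the currency letter written as \u0420
// (assumed to be the Cyrillic letter "Р").
const priceWithRRegex = /[+-]?\d+(?:[\s,]\d{3})*(?:[.,]\d+)?\s*\u0420/g;

// Hypothetical keyword list; the real one is defined in keywords.js.
const sampleKeywords = ['Итого'];

// Hypothetical OCR output, invented for this example.
const sampleText = [
    'Продукты 1 250,50 \u0420',
    'Перевод +3 000 \u0420',
    'Итого за день 1 749,50 \u0420',
].join('\n');

const extractAmounts = (text, keywords) => {
    const amounts = [];
    for (const line of text.split('\n')) {
        // Lines with a keyword (such as the daily total) are skipped entirely.
        if (keywords.some(keyword => line.includes(keyword))) continue;

        for (const match of line.match(priceWithRRegex) ?? []) {
            let price = match
                .replace(/\s/g, '')      // "1 250,50 Р" -> "1250,50Р"
                .replace(',', '.')       // decimal comma -> decimal point
                .replace('\u0420', '');  // drop the currency letter

            // No explicit sign means the amount is treated as negative.
            if (!price.startsWith('+') && !price.startsWith('-')) {
                price = '-' + price;
            }

            const value = parseFloat(price);
            if (!Number.isNaN(value)) amounts.push(value);
        }
    }
    return amounts;
};

console.log(extractAmounts(sampleText, sampleKeywords)); // [ -1250.5, 3000 ]
```

The third line contains the keyword, so its amount does not get into the array, even though it matches the pattern.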

As a result, after processing the data, we receive an array of transactions and their total amount.
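One caveat about the total: adding fractional amounts as floating-point numbers can leave tiny rounding artifacts. The sketch below is not part of the original script; it shows one possible way to sum in whole kopecks and convert back to rubles at the end. The function name sumInKopecks and the sample amounts are hypothetical.

```javascript
// Hypothetical helper: sum amounts as integer kopecks to avoid
// floating-point drift, then convert the result back to rubles.
const sumInKopecks = (amounts) => {
    const kopecks = amounts.reduce(
        (acc, amount) => acc + Math.round(amount * 100),
        0
    );
    return kopecks / 100;
};

// Made-up amounts chosen to expose the rounding artifact.
const sampleAmounts = [-0.1, -0.2];

// Plain floating-point sum, as in app.js.
console.log(sampleAmounts.reduce((acc, curr) => acc + curr, 0)); // -0.30000000000000004

// Sum via integer kopecks.
console.log(sumInKopecks(sampleAmounts)); // -0.3
```

For amounts with at most two decimal places, both approaches agree after rounding; the integer version simply keeps the logged total clean.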

The code for the entire project is available on GitHub.

Testing

Let's look at how the script works using this image as an example.

The image we will be processing:

Processed image:

After the image is processed, the text is extracted and all transactions are parsed from it. As a result, we get an array of transactions and their total amount.

Conclusion

This example shows how a script like this can automate the maintenance of a daily report. It solves the problems described in the article and simplifies working with financial data. The script is still under development, and it would be good to improve it together with other developers. The nearest plans are to add support for recognizing transactions from other banks, which use a slightly different data format. In the future, the script could be integrated with other systems, such as MyWarehouse, which would automate even more work, including synchronizing data with accounting systems, managing inventory balances, and improving overall business management.

My GitHub