How Computer Vision Helps Treat Diseases

Computer vision, or CV, is a general term for a variety of image recognition technologies: identifying objects, comparing faces, or assessing what is going on in a photo. These technologies are used not only in graphics editors and smart cameras: CV is increasingly being applied in more complex and demanding areas, such as medical research.

Together with Kirill Simonov, an ML developer with expertise in computer vision, we break down the principles of CV and explain which problems the technology solves in medicine.

What approaches are used in computer vision

Computer vision has been around for much longer than modern neural networks. CV engineers used algorithms that extracted key features from images using mathematical tools such as linear algebra and analytic geometry.

As technology developed and computing power and data volumes grew, many problems came to be solved with neural networks. Now the two approaches, classical CV and neural networks, exist side by side. Below we explain in more detail how they differ and which medical problems each approach can be applied to.

Classical CV. This is a set of algorithms that process images from a mathematical point of view. An image can be treated as a signal, a two- or three-dimensional matrix of numbers, so mathematical transformations can be applied to it. This makes it possible to modify the image or extract useful information from it. For example:

  • using the gradient operation, you can detect sharp color changes and thus highlight the boundaries of an object (see the sketch after this list);

  • processing the amplitude components of the signal helps to find and filter out noise;

  • converting images into vector representations helps to compare them: similar vectors mean that the objects in the pictures are also similar.
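As an illustration of the first point, here is a minimal sketch of gradient-based edge detection with OpenCV; the file names are just placeholders.

```python
import cv2
import numpy as np

# Load a grayscale image (the file name is a placeholder).
image = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# Approximate the image gradient along x and y with the Sobel operator.
grad_x = cv2.Sobel(image, cv2.CV_64F, 1, 0, ksize=3)
grad_y = cv2.Sobel(image, cv2.CV_64F, 0, 1, ksize=3)

# The gradient magnitude is large where the color changes sharply,
# i.e. at object boundaries.
magnitude = np.sqrt(grad_x**2 + grad_y**2)
edges = (magnitude > magnitude.mean() + 2 * magnitude.std()).astype(np.uint8) * 255

cv2.imwrite("edges.png", edges)
```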

An example of a classic CV method is HOG (histogram of oriented gradients). It builds a vector representation of an image that captures information about color changes and object boundaries. If you combine HOG with the SVM (support vector machine) classification algorithm, you get a simple object detector. At one time this was a breakthrough, although it has since given way to neural networks.
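A rough sketch of the HOG + SVM idea, using scikit-image and scikit-learn; the training data here is random, standing in for real labeled image patches.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

# Hypothetical training data: 64x64 grayscale patches, label 1 = object, 0 = background.
patches = np.random.rand(200, 64, 64)   # stand-in for real image patches
labels = np.random.randint(0, 2, 200)   # stand-in for real annotations

# Turn every patch into a HOG feature vector (histograms of gradient orientations).
features = np.array([
    hog(p, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    for p in patches
])

# A linear SVM learns to separate "object" from "background" in HOG space.
clf = LinearSVC()
clf.fit(features, labels)

# To detect objects in a large image, the same HOG + SVM pair is usually
# applied to a sliding window over the image.
window = np.random.rand(64, 64)
print(clf.predict([hog(window, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2))]))
```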

Classical CV methods require fewer computing resources and solve simple problems with high accuracy. In medicine, they are used more often as an auxiliary tool: to detect objects, pre-select areas of interest, etc.

Sometimes it is simply more practical to use classical methods, for example, to find key points in an image or to filter images. Although neural networks can cope with these tasks, traditional algorithms work faster and require less power.

For classical computer vision, the OpenCV library is often used: it contains many algorithms and supports different programming languages. Engineers working with classical methods also rely on Python libraries such as NumPy, SciPy, and scikit-image, which provide the necessary mathematical functions for data processing.
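For instance, the noise filtering mentioned above can be done in a few lines with SciPy; the test image here is synthetic.

```python
import numpy as np
from scipy import ndimage

# A noisy test image: a bright square plus random speckle noise.
image = np.zeros((128, 128))
image[32:96, 32:96] = 1.0
noisy = image + 0.3 * np.random.rand(128, 128)

# A median filter removes isolated noise pixels while keeping edges sharp;
# a Gaussian filter smooths high-frequency noise more aggressively.
median_filtered = ndimage.median_filter(noisy, size=3)
gaussian_filtered = ndimage.gaussian_filter(noisy, sigma=1.0)
```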

Neural network CV. The first neural networks appeared several decades ago, but they were quite primitive: they could not work with n-dimensional data and could not handle precise image processing.

This changed with the advent of convolutional neural network architectures, which are well suited for computer vision tasks. Convolutional networks analyze pixels that are close to each other and contain continuous visual information. This allows them to understand the context of an image.

Most modern CV tools use convolutional neural networks, which analyze images in detail, segment them, and find complex objects, including for medical purposes.

To work with neural network CV, Python and one of the libraries are usually used: PyTorch or TensorFlow. Both are designed to create and train neural networks, and the choice between them depends on the engineer's preferences.
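A minimal PyTorch sketch of a convolutional classifier for grayscale images; the layer sizes are arbitrary and chosen only to show the typical convolution, pooling, and classification structure.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """A toy convolutional classifier for 1-channel (grayscale) images."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # looks at local pixel neighbourhoods
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
batch = torch.randn(4, 1, 64, 64)   # 4 fake 64x64 grayscale images
logits = model(batch)               # shape: (4, 2)
```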

What is computer vision used for?

Traditionally, CV is applied to three main types of problems: classification, detection and segmentation.

This is what the solutions to the three problems look like, using one photo of a cat as an example. Source

Classification

This is the simplest of the tasks: to understand whether a picture belongs to a certain class. The classification result is always binary: “yes” (1) if the image belongs to the class, or “no” (0) if it does not. There is also multi-label classification, when one picture can belong to several classes at once.

For example, a classification task might be to determine whether there are suspicious areas on an X-ray image. In medicine, classification in its pure form is not very common: a yes-or-no answer does not provide much information and requires clarification. That is what the next two tasks, detection and segmentation, are for.
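In code, the “yes/no” answer usually comes from thresholding the model's output. The logit values below are made up; multi-label classification applies the same threshold to each class independently.

```python
import torch

# Raw model outputs ("logits") for one image; the values are made up.
binary_logit = torch.tensor([1.7])                    # one class: suspicious area yes/no
multi_label_logits = torch.tensor([2.1, -0.4, 0.9])   # several independent classes

# Binary classification: sigmoid -> probability -> threshold at 0.5.
is_suspicious = (torch.sigmoid(binary_logit) > 0.5).int()       # tensor([1]) -> "yes"

# Multi-label classification: the same threshold per class,
# so one image may belong to several classes at once.
class_flags = (torch.sigmoid(multi_label_logits) > 0.5).int()   # tensor([1, 0, 1])
```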

In this example, the model receives a photo of a skin lesion as input and classifies it as benign or malignant. Source

Detection

When solving this problem, the model not only answers whether an object is present in the image, but also detects it – determines its approximate boundaries.

If the image is two-dimensional, the model finds four corner points that form a rectangle around the object. For three-dimensional images, a parallelepiped is constructed instead of a rectangle, so eight points are needed. When working with video, the detection task can be transformed into a tracking task: the model must not only detect the object, but also track its movement in the frame.

Continuing with the X-ray example, the task of detection is to determine where exactly the suspicious area is. Later, the detected area can be sent to a classifier, for example, to answer the question: is it a tumor or not.
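As an API illustration (not a medical model), a pretrained detector from torchvision returns exactly this kind of rectangle for each object it finds; for X-rays the network would have to be retrained on annotated medical data.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# A detector pretrained on everyday photos (COCO), used here only to show the API
# (torchvision >= 0.13; older versions use pretrained=True instead of weights).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 512, 512)          # stand-in for a real image tensor
with torch.no_grad():
    predictions = model([image])[0]

# Each detected object comes with a rectangle (x1, y1, x2, y2),
# a class label, and a confidence score.
print(predictions["boxes"][:3])
print(predictions["labels"][:3])
print(predictions["scores"][:3])
```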

During the Covid restrictions, the detection task was used together with the classification task to check whether a person was wearing a mask. The detector would highlight a person's face in the image, and the classifier would check whether it belonged to the “face with a mask” class. Source

Segmentation

This is a more complex task – the model classifies each pixel and determines the exact boundaries of the object. There are two types of segmentation:

  • semantic — identifies different classes of objects. For example, in an image with several cysts, the model will recognize them all and assign them to the general class “cyst” (a sketch of this case follows the list);

  • instance — highlights individual objects. In the same picture, the model will assign each cyst its own number or label.
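A minimal sketch of the semantic case with a pretrained torchvision network (trained on everyday photos and used here only to show the per-pixel output); for medical scans the model would be retrained on annotated images.

```python
import torch
from torchvision.models.segmentation import fcn_resnet50

# A semantic segmentation network pretrained on ordinary photos.
model = fcn_resnet50(weights="DEFAULT")
model.eval()

image = torch.rand(1, 3, 256, 256)       # one stand-in RGB image
with torch.no_grad():
    logits = model(image)["out"]         # shape: (1, num_classes, 256, 256)

# Semantic segmentation: every pixel gets the class with the highest score.
mask = logits.argmax(dim=1)              # shape: (1, 256, 256), one class id per pixel
```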

In this image, the model performed semantic segmentation: it highlighted in red the dark areas in the patient's lungs. The doctor will then evaluate them and draw a conclusion. Source

An example of instance segmentation. In the image, each organ and even each vertebra is identified as a separate object and highlighted in its own color. Source

In addition to the main three, CV has other tasks, such as recognition and generation. Recognition is the comparison of an image with already known samples, for example, to identify a person in a photograph. And generation is the creation of new images based on existing ones.

The generation task is also used in medicine, mainly to train new models. There is not much high-quality medical data on which to train algorithms, and its preparation takes a lot of time. Therefore, data is created artificially based on existing data using generative algorithms.

All of the above tasks are often solved in combination rather than separately. To do this, several models are combined into a cascade: a chain that sequentially performs a given set of actions. For example, it first detects an object and then classifies or recognizes it (a schematic sketch follows the list):

  • detection shows where the tumor is located in the image and highlights that area;

  • classification determines whether the tumor is likely to be malignant;

  • segmentation finds its exact boundaries;

  • recognition matches the tumor type with those known to the model, with a certain accuracy.
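Schematically, such a cascade can be thought of as a plain pipeline. Every function name below is a hypothetical placeholder, not a real library API.

```python
# A schematic cascade: detector, classifier, segmenter, and recognizer
# are hypothetical models passed in as callables.

def run_cascade(scan, detector, classifier, segmenter, recognizer):
    results = []
    for box, region in detector(scan):          # 1. where is the suspicious area?
        is_malignant = classifier(region)        # 2. likely malignant or not?
        mask = segmenter(region)                 # 3. exact boundaries of the object
        tumour_type = recognizer(region, mask)   # 4. match against known tumour types
        results.append(
            {"box": box, "mask": mask, "malignant": is_malignant, "type": tumour_type}
        )
    return results
```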

What tasks can CV solve in medicine?

Let's look at several examples of medical areas in which classification, detection and segmentation problems need to be solved.

Diagnostics

This is the main area where computer vision is used. Among the tasks, for example, are the detection and localization of a tumor or other neoplasm, assessment of brain activity, study of tissue densities, etc. With the help of CV, it is possible to find patterns and anomalies in images such as X-rays, CT, MRI – this helps to make a diagnosis faster and more accurately.

For example, CV algorithms were used during the coronavirus pandemic: they helped evaluate the results of CT scans of patients and thereby reduced the workload of doctors. In 30 seconds, such an algorithm could process up to 400 photos – this is much more than a person could manage.

Observation

Here we are talking about patients with a confirmed disease. Computer vision helps track changes in their condition over time: the growth or shrinkage of a tumor, the speed of tissue recovery after an injury, etc. For example, CV can be used to prevent pressure ulcers (bedsores) in bedridden patients. Such algorithms track the patient's posture, assess the risk of pressure ulcers, and analyze damage that has already occurred.

A device for analyzing pressure ulcers. Cameras and sensors collect information about the patient's posture and condition. A computer model receives this data and draws conclusions about the risk of developing pressure ulcers. Source

Drug Analysis

In this area, other machine learning algorithms are usually used, which work not with images but, for example, with chemical formulas. Still, computer vision is also used for specific tasks: for example, assessing how fast an antibiotic destroys bacterial cell cultures. The CV algorithm detects the areas in a Petri dish where growth inhibition is most active and helps select an antibiotic for treatment.

The circles that the model highlighted in the image of the Petri dish are zones where cell cultures died. The wider the zone, the more bacteria were killed by the antibiotic sample placed in the center. Source

Production

Another area where CV is applied in medicine is drug manufacturing. Algorithms control quality in pharmaceutical factories: for example, they check whether all the tablets in a blister pack are in place, whether the packaging is intact, and whether the product is labeled correctly.

Why is medicine a special field for CV?

Computer vision in medicine has characteristics that distinguish it from other machine learning tasks: medical images are large and often volumetric, contain huge amounts of data, and require more complex processing methods.

Let's take a closer look at these features.

Medical data is most often three-dimensional. For a full diagnosis, a two-dimensional image is not enough. X-rays are taken in several projections, and more complex studies, such as CT or MRI, create detailed 3D models. These can be very large, include thousands of elements and contain metadata such as information about the patient and the procedure.

That's why n-dimensional convolutional neural networks are used to process such data. They analyze images in three dimensions, and sometimes in four dimensions — when it's important to take into account changes in data over time, such as when scanning brain activity. These networks require a lot of computing power, so they don't work on regular computers — special servers or cloud services are needed.
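The core building block of such networks is the 3D convolution, which slides a small cube over the volume so that neighbouring voxels are analysed together. A minimal PyTorch example with arbitrary sizes:

```python
import torch
import torch.nn as nn

# A single 3D convolution layer: the kernel is a small cube that moves
# through the volume in all three spatial dimensions.
conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3, padding=1)

# Stand-in for one CT/MRI volume: (batch, channels, depth, height, width).
volume = torch.randn(1, 1, 64, 128, 128)
features = conv3d(volume)                # shape: (1, 8, 64, 128, 128)
```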

The scan results are not ordinary images. 3D images contain data in the form of voxels (three-dimensional pixels) or vectors. X-rays, CT scans, and MRIs store them in specific formats that differ from conventional images. Here are some examples (a short reading sketch follows the list):

  • DICOM — a file that, in addition to the image, contains a lot of metadata, such as information about the patient and the study itself;

  • NIfTI — a standard for neuroimaging, such as brain scans. The file contains information about the orientation of objects and about changes in the images over time; in that sense the format is closer to a video or a signal than to a photo;

  • NRRD — a format for n-dimensional raster data that stores it almost unprocessed and close to the original, which simplifies computer processing.
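A sketch of how such files are typically opened in Python with the pydicom and nibabel libraries; the file paths are placeholders, and the metadata fields shown are only present if the file contains them.

```python
import pydicom
import nibabel as nib

# DICOM: the file carries both the pixel data and rich metadata.
ds = pydicom.dcmread("study/slice_001.dcm")     # placeholder path
pixels = ds.pixel_array                          # 2D array with the image itself
print(ds.Modality, ds.PatientID)                 # examples of metadata fields

# NIfTI: a 3D (or 4D, with time) volume plus orientation information.
img = nib.load("brain_scan.nii.gz")              # placeholder path
volume = img.get_fdata()                         # e.g. shape (256, 256, 180)
print(img.affine)                                # voxel-to-world orientation matrix
```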

X-ray image in DICOM format: a text layer with patient and equipment information can be placed on top of the image. Source

There are dozens of such formats, and before loading into the model, each one needs to be converted into a form understandable to the algorithm. This creates an additional task for specialists – how to correctly convert data without losing important information.

The data is more difficult to prepare. Before a model can be used in real-world tasks, it needs to be trained on a large amount of labeled data, for example, 3D images with tumors marked voxel by voxel. Labeling often has to be done manually, and it must be done by a doctor who can accurately read such images. This takes a lot of time: imagine having to highlight every point of a tumor in a 1000 × 1000 × 1000 image. On top of their main work, the specialist also has to learn how to use the annotation software.

Because of this, medical CV often lacks high-quality datasets for training. In such cases, generative artificial intelligence is used to create “synthetic” sets based on existing data. This is faster than manual labeling and takes some of the workload off doctors.

Accuracy of interpretation is of great importance. In medicine, even a small error in calculations can be critical, because life and health are at stake. At the same time, scans contain huge amounts of data, which can complicate the model's work.

The problem of overfitting often arises. This is a situation where the model fits the training data too closely and loses the ability to generalize correctly to new data. In this case, the algorithm gives confident but sometimes incorrect answers. A separate task arises: calibrating the model to increase accuracy and reduce excessive confidence in its predictions.
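One common calibration technique (not necessarily the one used in the study pictured below) is temperature scaling: a single parameter is fitted on held-out validation data so that the model's confidence better matches its actual accuracy. A minimal PyTorch sketch:

```python
import torch

def calibrate_temperature(logits, labels, steps=200, lr=0.01):
    """Fit a single temperature T so that softmax(logits / T) is better calibrated.

    `logits` and `labels` would come from a held-out validation set.
    """
    temperature = torch.ones(1, requires_grad=True)
    optimizer = torch.optim.LBFGS([temperature], lr=lr, max_iter=steps)

    def closure():
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / temperature, labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return temperature.detach()

# Made-up validation outputs: 100 samples, 3 classes, deliberately overconfident.
val_logits = torch.randn(100, 3) * 5
val_labels = torch.randint(0, 3, (100,))
T = calibrate_temperature(val_logits, val_labels)
# At prediction time, the model's logits are divided by T before the softmax.
```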

The image shows the results of brain tumor segmentation on MRI images. The top row shows the true labels (marked by experts or reference data), and the bottom row shows the labels predicted by the segmentation model. You can see that the differences are minimal. Source

How CV is evolving in medicine

One important goal is to cope with the difficulties described in the previous section: to obtain more high-quality data and maintain a high degree of accuracy. Another is to expand the use of CV and neural networks in medicine. For example:

  • Develop integrated solutions. Algorithms will be able to analyze more than just one disease. For example, when diagnosing a heart defect, the model will simultaneously detect signs of osteoporosis.

  • Increase the level of automation. Minimize the involvement of the doctor when algorithms are used in diagnostics. This is a difficult task, because in addition to precise technology, a legal basis is needed. However, such solutions already exist: for example, the CyberKnife radiosurgery system determines where to deliver radiation and examines the patient's tissue using continuous X-ray scanning in real time.

  • Implement CV solutions in hospitals. So far, computer vision has been used mainly in high-tech centers, and it is important to make it available to ordinary medical institutions. For example, computer vision is being implemented centrally in Moscow medical institutions, where it has already been used to carry out more than 11 million studies.

In the CyberKnife system, an algorithm controls an industrial robot that delivers a stream of charged particles precisely to the areas where the tumor is located. Source

Computer vision is already changing medicine, and its role will grow as technology advances. In the future, diagnostics and treatment may become faster and easier because some of the tasks will be taken over by the computer.


Skillfactory and TSU have opened a Master's program in computer vision and neural networks, which trains specialists to solve problems in medicine, industry, and other high-tech areas. Students master the fundamentals of mathematics and Python programming, study the basics of machine learning and working with deep neural networks, and consolidate their knowledge on real cases from the program's partner companies.
