How computer vision works

If I asked you to look at the picture above and list some of the objects in the frame, which ones would you name? You would probably write a long list without hesitation: cars, colorful billboards, brightly colored shops, bollards on the road, tall buildings, potted plants along the road, and more.

If I asked you to describe the picture in just one sentence, you would probably say, without a second thought, “This is Times Square, New York!” These tasks are very easy for you; even a six-year-old child could do the same. But do you know how you did it?

A very complex process is going on behind the scenes. Human vision is an intricate biological system that encompasses our eyes and visual cortex. It involves mental models of objects and our abstract understanding of concepts, and it draws on personal experience gained through countless interactions with the outside world. Today, digital devices can already take pictures at a resolution that rivals or surpasses human vision, and computers can detect objects and their colors with great accuracy.

What is computer vision?

In technical terms, computer vision is the field of computer science that studies how computers see and understand digital images. It covers both vision itself, that is, the perception of visual stimuli, and the understanding and extraction of complex information from them. The extracted data can then be used in other processes.

In this interdisciplinary field, elements of the human visual system are modeled and automated using sensors and machine learning algorithms. Computer vision is a core component of many artificial intelligence systems, giving them the ability to see and understand their environment.

The field draws its inspiration from the human visual system, with the aim of letting computers identify and process objects in images and videos much as humans do.

How does a digital device interpret an image?

A computer sees an image differently than we do. Unlike us, it interprets the entire image as numbers. To a computer, an image is just a two-dimensional matrix of numbers, where each entry, called a pixel, represents the intensity of light or color at a given position. For a grayscale image, pixel values typically range from 0 to 255 according to brightness.
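To make the idea of an image as a matrix concrete, here is a minimal sketch in plain Python. The 4 × 4 image and the helper `pixel_at` are made up for illustration; real code would usually load an actual image with a library such as NumPy or Pillow rather than typing the numbers by hand.

```python
# A tiny hypothetical 4x4 grayscale "image": each number is one pixel's
# intensity, from 0 (black) to 255 (white). It depicts a white square
# on a black background.
image = [
    [0,   0,   0, 0],
    [0, 255, 255, 0],
    [0, 255, 255, 0],
    [0,   0,   0, 0],
]

height = len(image)    # number of rows
width = len(image[0])  # number of columns


def pixel_at(img, row, col):
    """Return the intensity stored at the given row and column."""
    return img[row][col]


print(height, width)          # dimensions of the matrix
print(pixel_at(image, 1, 2))  # a bright pixel inside the square
print(pixel_at(image, 0, 0))  # a dark pixel in the corner
```

To the computer, the "square" is nothing more than the block of 255s in the middle of the matrix.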

What we see in comparison with what the computer reads

For example, take a grayscale image of a handwritten number 8, represented as a 28 × 28 matrix with 784 pixels. The first image shows how we see the number 8, and the second and third images show how the computer reads it in terms of pixels. Here 0 denotes the completely black parts, and the intermediate numbers are shades between black and white (note the border of the digit). As the values approach the maximum of 255, we move toward the whiter parts of the image. The whole matrix is then flattened (converted to a one-dimensional array): all rows are concatenated one after another, as shown in the image below:

Formation of a one-dimensional array of pixels

The second row is appended to the first, the third to the second, and so on, creating a single one-dimensional array of pixels that is fed as input to a neural network. In other words, a picture is just an array of pixels.
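The flattening step described above can be sketched in a few lines of plain Python. The function name `flatten` and the placeholder pixel values are assumptions made for this example; libraries such as NumPy provide the same operation built in.

```python
def flatten(matrix):
    """Concatenate the rows of a 2D matrix into one flat list, row by row."""
    flat = []
    for row in matrix:
        flat.extend(row)
    return flat


# Hypothetical 28x28 image filled with placeholder intensities (0..255).
rows, cols = 28, 28
image = [[(r * cols + c) % 256 for c in range(cols)] for r in range(rows)]

pixels = flatten(image)
print(len(pixels))  # 784, i.e. 28 * 28

# The pixel at (row, col) ends up at index row * cols + col.
r, c = 3, 5
print(pixels[r * cols + c] == image[r][c])
```

Nothing is lost by flattening: as long as the row length is known, every pixel can still be located in the flat array.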

Any color can be represented as a combination of three colors: red, green and blue. If the image is in color, it is an RGB image rather than a grayscale one. We can imagine it as three two-dimensional matrices stacked on top of each other, each corresponding to one color channel: red, green or blue. Thus we have one matrix for red, another for green and a third for blue, and each can be flattened in the same way as a grayscale matrix. Taken together, the three channels form a single three-dimensional array, as shown below:

RGB image
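As a rough sketch of the same idea in plain Python, the three channel matrices below are combined into one three-dimensional structure. The 2 × 2 image, the function name `stack_channels` and the height × width × 3 layout are assumptions chosen for illustration; some tools store the channels in a different order.

```python
# Hypothetical 2x2 color image, given as one 2D matrix per channel.
red   = [[255,   0], [  0, 255]]
green = [[  0, 255], [  0, 255]]
blue  = [[  0,   0], [255, 255]]


def stack_channels(r, g, b):
    """Combine three height x width matrices into a height x width x 3 array."""
    return [
        [[r[y][x], g[y][x], b[y][x]] for x in range(len(r[0]))]
        for y in range(len(r))
    ]


rgb = stack_channels(red, green, blue)
shape = (len(rgb), len(rgb[0]), len(rgb[0][0]))

print(shape)      # (2, 2, 3): height, width, channels
print(rgb[0][0])  # [255, 0, 0] -> pure red
print(rgb[1][1])  # [255, 255, 255] -> white
```

Each pixel is now a triple of intensities instead of a single number, which is why a color image needs three dimensions.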

What are the applications of computer vision?

1. Face recognition

It is a powerful security tool used today to “see” who is trying to access valuable resources, for example, when unlocking the screen of a mobile device. Computer vision algorithms detect facial features in images and compare them to a database of profiles that store digitized faces. Law enforcement agencies also use facial recognition technology to identify criminals in video streams.
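The comparison step can be illustrated with a simplified sketch. It assumes each face has already been reduced to a short list of numbers (a feature vector); the three-number vectors, the names in `database`, the function `identify` and the threshold of 0.6 are all hypothetical. Real systems use much longer vectors produced by trained models.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(face, database, threshold=0.6):
    """Return the name of the closest stored profile, or None if no
    profile is within the (hypothetical) distance threshold."""
    best_name, best_dist = None, float("inf")
    for name, stored in database.items():
        dist = euclidean_distance(face, stored)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


# Made-up database of digitized faces.
database = {
    "alice": [0.1, 0.9, 0.3],
    "bob":   [0.8, 0.2, 0.5],
}

print(identify([0.12, 0.88, 0.31], database))  # very close to alice's profile
print(identify([0.0, 0.0, 1.0], database))     # far from every stored profile
```

The threshold is what lets the system reject an unknown face instead of always returning the nearest profile.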

2. Autonomous transport

Self-driving cars need information about their surroundings in order to decide how to behave. Cameras capture video of the vehicle’s surroundings from different angles, and this video is fed into computer vision software. Images are processed in real time to find road boundaries, road signs, other vehicles, objects and pedestrians. The autonomous vehicle can then follow its own path and, ideally, avoid accidents.

3. Image search and object recognition

Computer vision theory is used to identify objects in images, search through image catalogs, and extract information from images.

4. Robotics

Most robotic machines (often in manufacturing) need to see their surroundings in order to perform their tasks.

5. Medical technology

Computer vision algorithms can detect malignant moles on the skin. They are also useful for analyzing X-rays and MRI scans, helping to diagnose diseases accurately, detect them in a timely manner and speed up treatment.


This was just basic information about computer vision. The field is vast and has advanced rapidly thanks to the revolution in artificial intelligence and progress in deep learning and neural networks. On some object detection and labeling tasks, computers now match or even exceed human performance.
