Convolutional neural networks: creating a neural network for digit recognition in Python
In the modern world, artificial intelligence and machine learning are developing rapidly, changing our daily lives and opening new horizons in many fields. One of the key technologies behind these advances is the convolutional neural network (CNN). These powerful models enable efficient image processing and analysis, with applications ranging from medical diagnostics to security systems.
CNNs are well suited to image classification, which makes them an excellent choice for the handwritten digit recognition task.
A CNN typically consists of:
Convolutional Layers (Conv2D): These layers perform convolution operations that help the model extract key features from images, such as edges, textures, and shapes.
Subsampling layers (MaxPooling2D): These layers reduce the dimensionality of the data while preserving important features, which helps speed up training and reduce the risk of overfitting.
Fully connected layers (Dense): These layers are responsible for classification by taking the feature vector from the previous layers and deciding on the class (digit) of the image.
In this article, I will show how to create your own convolutional neural network that can recognize digits in images, whether handwritten or printed. We will cover the basic principles of how CNNs work, examine the model architecture, and walk step by step through the process of developing and training a neural network.
Content:
Data preparation.
Creating a project.
Importing libraries.
Setting parameters.
Function for loading and preprocessing images.
Loading and preprocessing of data.
Creating a neural network model.
Model training.
Model evaluation.
Saving the model in Keras format.
Functions for loading and predicting images.
Example of use.
Complete neural network code.
Conclusion.
Data preparation
Data set architecture:
data/
├── train/
│   ├── 0/
│   │   ├── 0_1.png
│   │   ├── 0_2.png
│   │   └── 0_3.png
│   ├── 1/
│   ├── ...
│   └── 9/
└── test/
    ├── 0/
    ├── ...
    └── 9/
train/ contains data used to train the neural network model. It is this data that the model will analyze and use in order to “learn” to recognize numbers. Each subdirectory (0, 1, 2, …, 9) represents a digit class and contains images depicting the corresponding digit. This data is fed to the model's input during the training phase (model.fit), and the model adjusts its parameters based on this data.
test/ contains data used to check the performance of the trained model. The model does not see this data during training; it is used to evaluate the model's ability to generalize to new data. As in the train folder, each subdirectory (0, 1, 2, …, 9) represents a digit class and contains images of the corresponding digit. This data is fed to the model during the evaluation phase (model.evaluate) to measure its accuracy on new, previously unseen data. Separating the data into train and test helps avoid overfitting the model and lets you honestly evaluate its performance on new data, which is an important step in the machine learning process. This makes the model more robust and capable of generalizing rather than simply “remembering” the training data.
Each file must contain an image of one number (0-9).
data/train/0/: photographs of handwritten zeros.
data/train/1/: photographs of handwritten ones.
…
data/test/0/: photographs of handwritten zeros for testing.
data/test/1/: photographs of handwritten ones for testing.
…
Minimum quantity:
For each class (0-9): It is recommended to have at least 100 images. Ideally, the more the better. For example, 500-1000 images per class will give more consistent results.
General guideline:
Training set (train): 1000-5000 images for each class.
Test set (test): 200-500 images for each class.
Machine learning models, including neural networks, are typically trained and tested using a standard data ratio. The most common ratio is 80/20 or 70/30. This means that 80% (or 70%) of the data is used to train the model (train), and the remaining 20% (or 30%) is for testing (test).
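If you want to see how your own dataset measures up against these guidelines, a short helper like the one below can count the images per class and the resulting train/test ratio. This is only a sketch and assumes the data/ directory layout described above.
# A sketch (assumes the data/ layout shown above): count images per digit class
# and report the train/test split for each class.
import os

def count_images(split):
    counts = {}
    for label in range(10):
        path = os.path.join('data', split, str(label))
        counts[label] = len(os.listdir(path)) if os.path.isdir(path) else 0
    return counts

train_counts = count_images('train')
test_counts = count_images('test')
for label in range(10):
    total = train_counts[label] + test_counts[label]
    share = train_counts[label] / total if total else 0
    print(f'{label}: train={train_counts[label]}, test={test_counts[label]}, train share={share:.0%}')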
Creating a Project
Create a new project and two .py files: create_model.py and test_model.py.
The create_model.py script creates and trains a convolutional neural network (CNN) model that can recognize digits in images, both handwritten and printed.
Main functions and steps in the script:
Importing the necessary libraries and modules (TensorFlow, NumPy, PIL).
Defining the image parameters (height and width).
A function for loading and preprocessing images from the train and test folders.
Loading and preprocessing the data.
Creating the CNN architecture using Conv2D, MaxPooling2D, Flatten, and Dense layers.
Compiling the model, specifying the optimizer, loss function, and metrics.
Training the model on the training data.
Evaluating the model on the test data.
Saving the trained model in Keras format.
The test_model.py script loads the saved model and makes predictions on new images, and also checks the accuracy of those predictions.
Main functions and steps in the script:
Importing the necessary libraries and modules (TensorFlow, NumPy, PIL).
Loading the trained model from the file saved by create_model.py.
A function to load and preprocess a new image for prediction.
A function to perform a prediction on the loaded image using the model.
An example of using the prediction function on a new image.
Printing the predicted digit and comparing it with the expected digit.
Importing libraries (create_model.py).
# Import modules
import tensorflow as tf # pip install tensorflow
import numpy as np # pip install numpy
from PIL import Image # pip install pillow
import os
TensorFlow. A powerful library for machine learning and deep learning created by Google. This project uses it to create and train a neural network. We use TensorFlow to create a neural network model, define model layers, compile and train the model. TensorFlow provides flexibility, scalability, and a variety of tools for building and training neural networks. It also supports the high-level Keras API, which simplifies the development process.
NumPy. A library for working with multidimensional arrays and high-level mathematical functions. It is widely used in scientific computing and data analysis. This project uses NumPy to convert images into arrays, normalize data, and work with arrays of data. NumPy works efficiently with large data sets and provides many useful functions for processing them, making it ideal for working with data in machine learning.
Pillow (Python Imaging Library, PIL). Library for working with images. It provides tools for opening, manipulating and saving various image formats. This project uses Pillow to load images, convert them to grayscale, and resize them to the desired size. Pillow is the standard library for working with images in Python, providing a rich set of tools and support for many image formats.
os. A built-in Python library that provides functions to interact with the operating system. In this project, os is used to navigate the file system, get a list of files in a directory, and create file paths. os provides all the necessary functions for working with files and directories, making the code platform independent.
These imports provide all the necessary tools to perform data loading, data preprocessing, and neural network creation tasks.
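Before running the scripts, it can be worth confirming that all three libraries are installed; a minimal check is to import them and print their versions (os ships with Python and needs no installation).
# Quick check that the required libraries are available.
import tensorflow as tf
import numpy as np
import PIL

print('TensorFlow:', tf.__version__)
print('NumPy:', np.__version__)
print('Pillow:', PIL.__version__)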
Setting parameters (create_model.py).
# Parameters
img_height = 28  # Image height in pixels
img_width = 28  # Image width in pixels
This block of code defines the parameters that will be used to preprocess the images.
These parameters set the size to which the images will be reduced before being passed to the neural network: all images will be resized to 28×28 pixels. A fixed input size keeps the model's inputs consistent, which helps the model learn and process the data.
Why exactly 28×28 pixels? Small enough to be computationally efficient, yet large enough to contain important details for recognition.
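To see what this preprocessing does to a single file, you can open one image, resize it, and inspect the resulting array; the path below is just an example taken from the dataset layout above, so substitute any image you actually have.
# Illustration only: resize one image to 28x28 and check the array shape and value range.
from PIL import Image
import numpy as np

sample_path = 'data/train/0/0_1.png'  # example path; use any image from your dataset
img = Image.open(sample_path).convert('L')  # grayscale
img = img.resize((img_width, img_height))   # 28x28 pixels
arr = np.asarray(img) / 255.0               # values scaled to [0, 1]
print(arr.shape)            # (28, 28)
print(arr.min(), arr.max())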
Function for loading and preprocessing images (create_model.py).
# Function for loading and preprocessing images
def load_images_from_folder(folder):
    images = []
    labels = []
    for label in range(10):
        path = os.path.join(folder, str(label))
        for filename in os.listdir(path):
            img_path = os.path.join(path, filename)
            img = Image.open(img_path).convert('L')
            img = img.resize((img_width, img_height))
            img = np.asarray(img)
            img = img / 255.0
            images.append(img)
            labels.append(label)
    return np.array(images), np.array(labels)
This function loads images from the specified folder, preprocesses them and returns arrays of images and labels.
images = [] | images: a list to store preprocessed images. |
for label in range(10): | We iterate over numbers from 0 to 9, which corresponds to 10 image classes. We generate the path to the folder with images of each class. |
for filename in os.listdir(path): | We go through each file in the folder. |
img = Image.open(img_path).convert('L') | Open the image and convert it to grayscale using convert('L'). |
images.append(img) | Add the preprocessed image to the images list. |
return np.array(images), np.array(labels) | We convert the lists of images and labels into NumPy arrays and return them. |
Loading and preprocessing of data (create_model.py).
# Loading and preprocessing the data
x_train, y_train = load_images_from_folder('data/train')
x_test, y_test = load_images_from_folder('data/test')
This block of code loads and preprocesses data to train and test the neural network model.
Calling the load_images_from_folder function to load the training data:
x_train: An array of images used to train the model.
y_train: An array of labels (classes) corresponding to the training images.
'data/train': Path to the directory containing the training images, divided into folders for each class (0-9).
Calling the load_images_from_folder function to load the test data:
x_test: An array of images used to test the model.
y_test: An array of labels (classes) corresponding to test images.
'data/test': Path to the directory containing test images, divided into folders for each class (0-9).
The load_images_from_folder function loads and preprocesses the images, resizing them, converting them to grayscale, and normalizing pixel values. The resulting data sets (x_train, y_train, x_test, y_test) are then used to train and evaluate the model.
This data provides the basis for training a neural network to recognize handwritten digits, and its proper preprocessing plays a key role in achieving high model accuracy results.
# Make sure the data has the right shape before reshape
print(f'Number of train images: {x_train.shape[0]}, height/width: {x_train.shape[1]}x{x_train.shape[2]}px')
print(f'Number of test images: {x_test.shape[0]}, height/width: {x_test.shape[1]}x{x_test.shape[2]}px')
x_train = x_train.reshape(-1, img_height, img_width, 1)
x_test = x_test.reshape(-1, img_height, img_width, 1)
This block of code performs shape checking on the data after loading and preprocessing, and then reshapes the data for use in the neural network model.
Checking the data shape:
print(f'Number of train images: {x_train.shape[0]}, height/width: {x_train.shape[1]}x{x_train.shape[2]}px')
print(f'Number of test images: {x_test.shape[0]}, height/width: {x_test.shape[1]}x{x_test.shape[2]}px')
These lines display information about the number of images and their sizes (height and width) in the training (x_train) and test (x_test) datasets. This is useful for checking that the data is loaded correctly and has the expected dimensions (28×28 pixels).
Reshaping the data:
x_train = x_train.reshape(-1, img_height, img_width, 1)
x_test = x_test.reshape(-1, img_height, img_width, 1)
reshape reshapes data arrays by adding a fourth dimension that represents a channel (in this case, a single grayscale channel).
-1 indicates that the size of the first dimension (number of images) is automatically calculated based on the total number of elements and the specified dimensions (height, width, channels).
The data is now in the form (number of images, 28, 28, 1), which matches the required input data format for convolutional neural networks in TensorFlow/Keras.
Example for training data set (x_train):
Before reshape: (number of images, 28, 28)
After reshape: (number of images, 28, 28, 1)
These changes are important for the neural network to process data correctly, since the convolutional layers expect data in the format (batch_size, height, width, channels).
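As a sanity check (purely optional, and only a sketch), you can assert the expected shapes after the reshape and look at how many examples each class has; roughly balanced classes usually make training more stable. The commented lines show an equivalent way to add the channel dimension with NumPy indexing.
# Optional sanity checks after the reshape.
assert x_train.shape[1:] == (img_height, img_width, 1)
assert x_test.shape[1:] == (img_height, img_width, 1)

# Class distribution in the training set: ideally each digit has a similar count.
labels, counts = np.unique(y_train, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))

# Equivalent way to add the channel dimension, if you prefer it to reshape:
# x_train = x_train[..., np.newaxis]
# x_test = x_test[..., np.newaxis]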
Creating a neural network model (create_model.py).
# Creating the neural network model using Input
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(img_height, img_width, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
This block of code creates a neural network model architecture for handwritten digit recognition using the TensorFlow/Keras library.
Creating a model using Sequential: model = tf.keras.models.Sequential() | Purpose: Sequential is the simplest way to build a neural network model in Keras; layers are added one after another. |
Input layer: tf.keras.layers.Input(shape=(img_height, img_width, 1)) | Purpose: Specifies the shape of the input data the model expects. Here, the inputs are 28×28 pixel images with one channel (grayscale). |
First convolutional layer (Conv2D): tf.keras.layers.Conv2D(32, (3, 3), activation='relu') | Purpose: Applies 32 filters of size 3×3 to the input, extracting basic features such as edges and textures. |
First subsampling layer (MaxPooling2D): tf.keras.layers.MaxPooling2D((2, 2)) | Purpose: Reduces the dimensionality of the data by taking the maximum value in each 2×2 pixel region. |
Second convolutional layer (Conv2D): tf.keras.layers.Conv2D(64, (3, 3), activation='relu') | Purpose: Applies 64 filters of size 3×3, extracting more complex features. |
Second subsampling layer (MaxPooling2D): tf.keras.layers.MaxPooling2D((2, 2)) | Purpose: Reduces the data size again by taking the maximum value in each 2×2 pixel region. |
Flatten layer: tf.keras.layers.Flatten() | Purpose: Converts the multi-dimensional feature maps into a one-dimensional vector, which is required to connect to the fully connected (Dense) layers. |
Fully connected layer (Dense): tf.keras.layers.Dense(64, activation='relu') | Purpose: Processes the data with 64 neurons. Each neuron computes a linear combination of its inputs and applies the relu activation function. |
Output fully connected layer (Dense): tf.keras.layers.Dense(10, activation='softmax') | Purpose: The output layer with 10 neurons, one per class (digits 0-9). The softmax activation turns the outputs into class probabilities. |
Sequential: Linear model, adding layers sequentially.
Input: Specifies the form of the input data.
Conv2D (32 filters): Extracts basic features.
MaxPooling2D (2×2): Reduces data size.
Conv2D (64 filters): Extracts more complex features.
MaxPooling2D (2×2): Again reduces the data size while preserving important features.
Flatten: Converts data to a one-dimensional vector for fully connected layers.
Dense (64 neurons): Processes data and learns complex patterns with relu activation function.
Dense (10 neurons, softmax): Output layer predicting probabilities for 10 classes (digits 0-9).
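To see this architecture laid out layer by layer, Keras can print the output shape and parameter count of every layer with model.summary(); for 28×28×1 inputs the shapes work out as listed in the comments below.
# Print the layer-by-layer structure of the model defined above.
model.summary()
# For 28x28x1 inputs the output shapes are:
# Conv2D(32)   -> (26, 26, 32)
# MaxPooling2D -> (13, 13, 32)
# Conv2D(64)   -> (11, 11, 64)
# MaxPooling2D -> (5, 5, 64)
# Flatten      -> (1600,)
# Dense(64)    -> (64,)
# Dense(10)    -> (10,)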
Model training (create_model.py).
# Compiling and training the model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
This block of code is responsible for compiling the neural network model, setting the training parameters, and executing the training process itself.
model.compile(optimizer="adam", – Adam (Adaptive Moment Estimation) is an optimizer that combines the advantages of the AdaGrad and RMSProp methods. It adapts the learning rate during training based on estimates of the first and second moments of the gradients. It performs well on a wide variety of tasks and is the standard choice for many deep learning models.
loss="sparse_categorical_crossentropy", – The sparse_categorical_crossentropy loss function is used for multi-class classification problems where the labels are represented as integers. It measures the difference between the predicted probabilities and the true labels. It suits our data, where the labels are integers (0-9), which simplifies the calculations compared to one-hot encoded labels.
metrics=['accuracy']) – The metric used to assess the model's performance. Here we use accuracy, which shows what percentage of the model's predictions are correct. This metric is easy to interpret and widely used for classification problems.
model.fit(x_train, y_train, epochs=5) – Trains the model. x_train, y_train are the training data (images and labels). epochs=5 is the number of training epochs; one epoch is one complete pass through the entire training dataset. Why exactly 5 epochs? It is a starting value that can be increased to train the model longer and more thoroughly. More epochs can improve accuracy, but can also lead to overfitting.
Training process:
Compilation: Sets the training parameters and prepares the model for training.
Fit: The process by which a model goes through the training data multiple times (epochs) and updates its weights to minimize the loss function.
The model is now ready to be trained and optimized for the handwritten digit recognition task.
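If you want to watch how accuracy develops epoch by epoch, or reserve part of the training data for validation, a slightly extended training call can look like the sketch below; model.fit returns a History object whose history dictionary holds one value per epoch for each metric. This is an optional variant, not a required change to the script.
# A variant of the training call (optional sketch): hold out 10% of the training
# data for validation and keep the per-epoch metrics returned by fit().
history = model.fit(x_train, y_train, epochs=5, validation_split=0.1)

print(history.history['accuracy'])      # training accuracy for each epoch
print(history.history['val_accuracy'])  # validation accuracy for each epoch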
Model evaluation (create_model.py).
# Evaluating the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Accuracy on test data set: {test_acc}')
if test_acc < 0.6:
    print("Low accuracy. It is recommended to improve the model and the data.")
elif 0.6 <= test_acc < 0.8:
    print("Average accuracy. Not bad, but there is room for improvement.")
elif 0.8 <= test_acc < 0.9:
    print("Good accuracy. The model performs well.")
else:
    print("Excellent accuracy. The model handles the task very well.")
This block of code evaluates the performance of the trained model on a test dataset and outputs the accuracy of the predictions.
The model.evaluate() function evaluates the model on the test data. It takes the test images (x_test) and labels (y_test) and returns the value of the loss function (test_loss) and the accuracy metric (test_acc). Evaluation on a test data set shows how well the model generalizes to new, previously unseen data, which is an important indicator of its quality.
print(f'Accuracy on test data set: {test_acc}') – displays the accuracy of the model on the test data set in a readable format. This allows you to quickly and clearly evaluate the performance of the model, understand its current level and identify areas for further improvements.
To evaluate the performance of a convolutional neural network (CNN) model on a digit recognition task, you can use the following accuracy ranges:
Low accuracy (less than 60%)
Average accuracy (60-80%)
Good accuracy (80-90%)
Excellent accuracy (more than 90%)
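A single overall accuracy number can hide the fact that some digits are recognized much better than others, so a per-class breakdown is often worth printing; the sketch below computes the accuracy separately for each digit using the model's predictions on the test set.
# A sketch of a per-class breakdown of test accuracy.
predictions = np.argmax(model.predict(x_test), axis=1)
for digit in range(10):
    mask = (y_test == digit)
    if mask.sum() > 0:
        acc = (predictions[mask] == digit).mean()
        print(f'Digit {digit}: accuracy {acc:.2%} on {int(mask.sum())} test images')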
Saving the model in Keras format (create_model.py).
# Saving the model in the new Keras format
model.save('my_model.keras')
print(f'Model created!')
This block of code is responsible for saving the trained neural network model in Keras format and displaying a message indicating the completion of the saving process.
The model.save() function saves the current version of the trained model to disk with the specified name and format. The file is saved in the Keras format (.keras), which includes the model architecture, its weights, the training configuration (if applicable), and the optimizer state. Using model.save makes it easy to load and use the model later without retraining it.
print(f'Model created!') prints a message to the console confirming that the model was saved successfully, so you know the process completed correctly and the model is ready for use or further operations.
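As an optional check (a sketch only), you can reload the file you just saved and confirm that the restored model produces the same accuracy on the test set; this verifies that the architecture, weights, and compilation settings round-trip correctly through the .keras file.
# Optional check: reload the saved model and confirm it evaluates the same.
restored = tf.keras.models.load_model('my_model.keras')
restored_loss, restored_acc = restored.evaluate(x_test, y_test)
print(f'Accuracy of the restored model: {restored_acc}')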
Functions for loading and predicting images (test_model.py).
import tensorflow as tf
import numpy as np
from PIL import Image

# Parameters
img_height = 28
img_width = 28

# Loading the model
model = tf.keras.models.load_model('my_model.keras')

# Functions for loading an image and making a prediction
def load_image(filepath):
    img = Image.open(filepath).convert('L')
    img = img.resize((img_width, img_height))
    img = np.array(img)
    img = img / 255.0
    img = img.reshape(-1, img_height, img_width, 1)
    return img

def predict_digit(test_img):
    img = load_image(test_img)
    prediction = model.predict(img)
    return np.argmax(prediction)
model = tf.keras.models.load_model('my_model.keras') – This code loads a previously trained and saved neural network model from a file my_model.keras. This allows the model to make predictions on new data without the need for retraining.
def load_image(filepath): – the function loads an image from the specified file, converts it into a format suitable for the neural network model, and returns the preprocessed image.
img = Image.open(filepath).convert('L') – Open the image and convert it to grayscale. The model was trained on grayscale images, so new data must be converted to the same format.
img = img.resize((img_width, img_height)) – Resize the image to 28×28 pixels. The model was trained on 28×28 pixel images, so the input data must be the same size.
img = np.array(img) – Convert the image into an array for further processing. The model accepts data in array format as input.
img = img / 255.0 – Normalize pixel values between 0 and 1. Normalization improves the performance of the model since it was trained on normalized data.
img = img.reshape(-1, img_height, img_width, 1) – Change the shape of the array by adding an extra dimension for the channel (one channel for a black and white image). The model expects data in the format (batch_size, height, width, channels).
def predict_digit(test_img): – the function performs a prediction based on the loaded model and returns the predicted number.
img = load_image(test_img) – Load and preprocess an image for the model. Data preprocessing is necessary to make predictions correctly.
prediction = model.predict(img) – Get model prediction for a given image. The model returns probabilities for each class (numbers 0 to 9).
return np.argmax(prediction) – Determine the class (number) with the highest probability. Return the most likely number predicted by the model.
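Since model.predict returns a probability for each of the ten classes, it is easy to expose how confident the model is in its answer. The helper below is a hypothetical extension of predict_digit (not part of the original script) that returns the probability of the chosen digit along with the digit itself.
# A hypothetical helper: return the predicted digit together with its probability.
def predict_digit_with_confidence(test_img):
    img = load_image(test_img)
    probabilities = model.predict(img)[0]  # shape (10,): one probability per digit
    digit = int(np.argmax(probabilities))
    confidence = float(probabilities[digit])
    return digit, confidence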
Usage example (test_model.py).
# Usage example
test_image = "data/sample_a.png"
predicted_digit = predict_digit(test_image)
print(f'Digit in image: {predicted_digit}')
This block of code demonstrates how to use a trained and stored model to make predictions on new images.
test_image = "data/sample_a.png" – Specifies the path to the image on which to perform the prediction. In this case it is the image sample_a.png located in the data folder.
predicted_digit = predict_digit(test_image) – The function takes an image path, loads and preprocesses the image, makes a prediction using the model and returns the predicted digit.
print(f'Digit in image: {predicted_digit}') – Prints the digit predicted by the model for the specified image in a readable format.
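To test the model on more than one file at a time, you can loop over a folder of images and call predict_digit for each of them; the folder name below is only an example, so point it at any directory of digit images you have.
# A sketch: run the prediction over every image in a folder (example folder name).
import os

folder = 'data/test/7'
for filename in sorted(os.listdir(folder)):
    print(filename, '->', predict_digit(os.path.join(folder, filename)))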
Complete neural network code
File create_model.py:
# Import modules
import tensorflow as tf  # pip install tensorflow
import numpy as np  # pip install numpy
from PIL import Image  # pip install pillow
import os

# Parameters
img_height = 28
img_width = 28

# Function for loading and preprocessing images
def load_images_from_folder(folder):
    images = []
    labels = []
    for label in range(10):
        path = os.path.join(folder, str(label))
        for filename in os.listdir(path):
            img_path = os.path.join(path, filename)
            img = Image.open(img_path).convert('L')
            img = img.resize((img_width, img_height))
            img = np.asarray(img)
            img = img / 255.0
            images.append(img)
            labels.append(label)
    return np.array(images), np.array(labels)

# Loading and preprocessing the data
x_train, y_train = load_images_from_folder('data/train')
x_test, y_test = load_images_from_folder('data/test')

# Make sure the data has the right shape before reshape
print(f'Number of train images: {x_train.shape[0]}, height/width: {x_train.shape[1]}x{x_train.shape[2]}px')
print(f'Number of test images: {x_test.shape[0]}, height/width: {x_test.shape[1]}x{x_test.shape[2]}px')
x_train = x_train.reshape(-1, img_height, img_width, 1)
x_test = x_test.reshape(-1, img_height, img_width, 1)

# Creating the neural network model using Input
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(img_height, img_width, 1)),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compiling and training the model
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

# Evaluating the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f'Accuracy on test data set: {test_acc}')

# Saving the model in the new Keras format
model.save('my_model.keras')
print(f'Model created!')
File test_model.py:
import tensorflow as tf
import numpy as np
from PIL import Image

# Parameters
img_height = 28
img_width = 28

# Loading the model
model = tf.keras.models.load_model('my_model.keras')

# Functions for loading an image and making a prediction
def load_image(filepath):
    img = Image.open(filepath).convert('L')
    img = img.resize((img_width, img_height))
    img = np.array(img)
    img = img / 255.0
    img = img.reshape(-1, img_height, img_width, 1)
    return img

def predict_digit(test_img):
    img = load_image(test_img)
    prediction = model.predict(img)
    return np.argmax(prediction)

# Usage example
test_image = "data/sample_a.png"
predicted_digit = predict_digit(test_image)
print(f'Digit in image: {predicted_digit}')
Conclusion
Convolutional neural networks (CNNs) are a powerful tool for solving problems related to image processing and analysis. In this article, we looked at how to create your own CNN to recognize digits, whether handwritten or printed. We went through all the stages, from data preparation and project setup to training and evaluating the model, and also learned how to save the model and use it for predictions.
Creating and training a model on real data allows you not only to understand the basic principles of CNN operation, but also to gain practical experience that can be applied in various fields. This project demonstrates how modern technology can be used to solve problems that previously seemed complex and time-consuming.
Keep experimenting and improving your model with more data, using different neural network architectures and optimization techniques. The world of machine learning and artificial intelligence is full of opportunities, and each new project opens up new horizons for you.
I wish you the best of luck in your future machine learning research and projects and hope this article was helpful and inspiring. May your neural networks always be accurate and your projects successful!