Fitting Images to the MNIST Dataset
The Internet is full of articles on training a handwritten-digit recognizer on the MNIST dataset. In practice, however, once you start recognizing your own pictures, the model does poorly or fails altogether. Of course, we can convert an image to grayscale and force its size to the MNIST format of 28×28 pixels, but the network then receives images that look quite unlike the ones it was trained on.

Naturally, the main problem is that an arbitrary image is very different from the images in the MNIST database. The original MNIST digits are size-normalized to fit in a 20×20 pixel box. Then the center of mass of the image is computed, and the digit is placed on a 28×28 pixel field so that its center of mass coincides with the center of the field. This is the form to which we must bring our data.
We can use any implementation of a model that recognizes MNIST digits. For example:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import SGD

# load train and test dataset
def load_dataset():
    # load dataset
    (trainX, trainY), (testX, testY) = mnist.load_data()
    # reshape dataset to have a single channel
    trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
    testX = testX.reshape((testX.shape[0], 28, 28, 1))
    # one hot encode target values
    trainY = to_categorical(trainY)
    testY = to_categorical(testY)
    return trainX, trainY, testX, testY

# scale pixels
def prep_pixels(train, test):
    # convert from integers to floats
    train_norm = train.astype('float32')
    test_norm = test.astype('float32')
    # normalize to range 0-1
    train_norm = train_norm / 255.0
    test_norm = test_norm / 255.0
    # return normalized images
    return train_norm, test_norm

# define cnn model
def define_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer="he_uniform", input_shape=(28, 28, 1)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer="he_uniform"))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer="he_uniform"))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(100, activation='relu', kernel_initializer="he_uniform"))
    model.add(Dense(10, activation='softmax'))
    # compile model
    opt = SGD(learning_rate=0.01, momentum=0.9)
    model.compile(optimizer=opt, loss="categorical_crossentropy", metrics=['accuracy'])
    return model

# run the test harness for evaluating a model
def run_test_harness():
    # load dataset
    trainX, trainY, testX, testY = load_dataset()
    # prepare pixel data
    trainX, testX = prep_pixels(trainX, testX)
    # define model
    model = define_model()
    # fit model
    model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=1)
    # save model
    model.save('digit_model.h5')
    # evaluate model on the test set and print accuracy in percent
    _, acc = model.evaluate(testX, testY, verbose=0)
    print('> %.3f' % (acc * 100.0))

# entry point, run the test harness
run_test_harness()

> 99.040
We got pretty good accuracy. Now let's take our own pictures and see what the network makes of them. The most standard preprocessing is to invert the colors and scale the image to 28×28 pixels:
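import cv2
import numpy as np
from tensorflow.keras.models import load_model

# load the model saved by the training script above
model = load_model('digit_model.h5')

def rec_digit(img_path):
    # read the picture as a grayscale image
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    # invert colors: MNIST digits are white on a black background
    gray = 255 - img
    # naive resize straight to 28x28
    gray = cv2.resize(gray, (28, 28))
    # save the preprocessed image for inspection
    cv2.imwrite('gray' + img_path, gray)
    # scale to [0, 1] and add batch and channel dimensions
    img = gray / 255.0
    img = np.array(img).reshape(-1, 28, 28, 1)
    out = str(np.argmax(model.predict(img)))
    return out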
The zero was recognized correctly because it sits in the center of the picture and is generally well placed. The other digits fared badly: accuracy on the 5 test pictures is only 20 percent.
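To see how each preprocessing change affects the result, it helps to measure accuracy on a small labelled set in a single call. The sketch below is only one way such a check might look: the names `evaluate` and `recognize` and the file names are hypothetical, and `recognize` stands in for any function that maps an image path to a digit string, such as `rec_digit`.

```python
def evaluate(recognize, labelled_paths):
    """Return the fraction of images whose predicted digit matches the label.

    recognize: callable taking an image path and returning a digit string
    labelled_paths: dict mapping image path -> expected digit string
    """
    if not labelled_paths:
        raise ValueError("labelled_paths must not be empty")
    correct = sum(1 for path, label in labelled_paths.items()
                  if recognize(path) == label)
    return correct / len(labelled_paths)


# Hypothetical file names; replace them with your own pictures and labels.
samples = {"0.png": "0", "3.png": "3", "5.png": "5", "7.png": "7", "9.png": "9"}

# Stand-in recognizer that always answers "0", used only to show the call.
def always_zero(path):
    return "0"

print(evaluate(always_zero, samples))  # 1 correct out of 5 -> 0.2
```

With a trained model loaded, `evaluate(rec_digit, samples)` would report the accuracy of the current version of the preprocessing.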
Let us restate the main point by quoting how the dataset was built: "The original black and white (bilevel) images from NIST were size normalized to fit in a 20×20 pixel box while preserving their aspect ratio. The resulting images contain gray levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28×28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28×28 field."
We will convert all our images to this format. Also note that if the background is off-white, then after inversion we get something very different from the MNIST convention of a white digit on a black background, as in the example with the nine. Therefore, we add thresholding after the image is read:
def rec_digit(img_path):
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    gray = 255 - img
    # apply thresholding (with THRESH_OTSU the threshold is chosen automatically)
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    gray = cv2.resize(gray, (28, 28))
    cv2.imwrite('gray' + img_path, gray)
    img = gray / 255.0
    img = np.array(img).reshape(-1, 28, 28, 1)
    out = str(np.argmax(model.predict(img)))
    return out
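The `cv2.THRESH_OTSU` flag selects the threshold from the image histogram by maximizing the between-class variance of the "dark" and "bright" pixel groups. The following NumPy-only sketch illustrates that idea on a synthetic inverted image with an off-white background; it is not OpenCV's implementation, the names `otsu_threshold` and `binarize` are hypothetical, and tie-breaking may differ from what `cv2.threshold` returns.

```python
import numpy as np

def otsu_threshold(gray):
    """Return a threshold t that maximizes the between-class variance (uint8 input)."""
    hist = np.bincount(gray.ravel(), minlength=256)
    levels = np.arange(256)
    w0 = np.cumsum(hist)              # number of pixels with value <= t
    w1 = hist.sum() - w0              # number of pixels with value > t
    s0 = np.cumsum(hist * levels)     # sum of intensities with value <= t
    s1 = s0[-1] - s0                  # sum of intensities with value > t
    valid = (w0 > 0) & (w1 > 0)       # both classes must be non-empty
    sigma_b = np.zeros(256)
    m0 = s0[valid] / w0[valid]
    m1 = s1[valid] / w1[valid]
    sigma_b[valid] = w0[valid] * w1[valid] * (m0 - m1) ** 2
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Pixels above the threshold become 255, all others 0."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)

# Inverted picture with an off-white paper: background 25 instead of 0, digit 240.
img = np.full((10, 10), 25, dtype=np.uint8)
img[3:7, 4:6] = 240

print(otsu_threshold(img))      # 25
print(np.unique(binarize(img))) # background forced to 0, digit to 255
```

After binarization the background is exactly black, which is what the network saw during training.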
Now we want to fit the image into a 20×20 pixel box. This can be done in several ways. One option is to find the contour that bounds the digit, crop the image to it, and resize the result to the required size. This approach can also be useful if you need to recognize numbers with more than one digit.
We will take a slightly simpler and, at the same time, more reliable route: first we delete all rows and columns that contain only black pixels. This leaves a picture that is exactly the bounding rectangle of our digit.
def rec_digit(img_path):
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    gray = 255 - img
    # apply thresholding
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # remove all-zero rows and columns
    while np.sum(gray[0]) == 0:
        gray = gray[1:]
    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)
    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]
    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)
    rows, cols = gray.shape
    cv2.imwrite('gray' + img_path, gray)
    gray = cv2.resize(gray, (28, 28))
    img = gray / 255.0
    img = np.array(img).reshape(-1, 28, 28, 1)
    out = str(np.argmax(model.predict(img)))
    return out
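The same crop can also be computed without loops by locating the first and last non-empty row and column. This is a possible alternative rather than the method used in the listing above; the helper name `crop_to_content` is hypothetical. It also reports an entirely black image explicitly, whereas the loops above would raise an IndexError once every row has been removed.

```python
import numpy as np

def crop_to_content(gray):
    """Crop a 2-D array to the bounding box of its non-zero pixels."""
    row_has_ink = np.any(gray > 0, axis=1)   # True for rows containing digit pixels
    col_has_ink = np.any(gray > 0, axis=0)   # True for columns containing digit pixels
    if not row_has_ink.any():
        raise ValueError("image contains no foreground pixels")
    top, bottom = np.where(row_has_ink)[0][[0, -1]]
    left, right = np.where(col_has_ink)[0][[0, -1]]
    return gray[top:bottom + 1, left:right + 1]

# Synthetic example: a 5x5 white block on a 10x12 black field.
img = np.zeros((10, 12), dtype=np.uint8)
img[2:7, 4:9] = 255

print(crop_to_content(img).shape)  # (5, 5)
```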
Next, we resize the picture so that it fits into a 20×20 square. We compute a scale factor so that the longer side becomes 20 pixels long:
def rec_digit(img_path):
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    gray = 255 - img
    # apply thresholding
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # remove all-zero rows and columns
    while np.sum(gray[0]) == 0:
        gray = gray[1:]
    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)
    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]
    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)
    rows, cols = gray.shape
    # resize so that the digit fits into a 20x20 pixel box
    if rows > cols:
        factor = 20.0 / rows
        rows = 20
        cols = int(round(cols * factor))
        # cv2.resize expects the size as (width, height)
        gray = cv2.resize(gray, (cols, rows))
    else:
        factor = 20.0 / cols
        cols = 20
        rows = int(round(rows * factor))
        gray = cv2.resize(gray, (cols, rows))
    cv2.imwrite('gray' + img_path, gray)
    gray = cv2.resize(gray, (28, 28))
    img = gray / 255.0
    img = np.array(img).reshape(-1, 28, 28, 1)
    out = str(np.argmax(model.predict(img)))
    return out
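The size arithmetic from the resizing step can be isolated into a small helper and checked by hand. This is a sketch with a hypothetical name, `fit_size`; the clamp to at least one pixel is an addition not present in the listing above, and it guards against very thin digits (for example a "1" drawn as a narrow stroke) whose shorter side would otherwise round to 0.

```python
def fit_size(rows, cols, box=20):
    """Scale (rows, cols) so that the longer side equals `box`, keeping the aspect ratio."""
    factor = box / max(rows, cols)
    new_rows = max(1, int(round(rows * factor)))  # clamp: never collapse to 0 pixels
    new_cols = max(1, int(round(cols * factor)))
    return new_rows, new_cols

print(fit_size(100, 50))  # (20, 10)
print(fit_size(30, 60))   # (10, 20)
print(fit_size(200, 3))   # (20, 1); without the clamp the width would round to 0
```

Remember that `cv2.resize` takes the target size as (width, height), so the result would be passed as `(new_cols, new_rows)`.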
Now expand the image to 28×28 pixels by adding black rows and columns around the edges with the function np.pad (the public name of np.lib.pad), which in 'constant' mode pads with zeros by default. At the same time, delete the line gray = cv2.resize(gray, (28, 28)).
After the resizing step, add:
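# requires "import math" at the top of the file
# split the missing columns and rows between the two sides (the extra pixel goes first)
colsPadding = (int(math.ceil((28 - cols) / 2.0)), int(math.floor((28 - cols) / 2.0)))
rowsPadding = (int(math.ceil((28 - rows) / 2.0)), int(math.floor((28 - rows) / 2.0)))
gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')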
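A quick check of the padding arithmetic on a synthetic 20×11 digit shows how an odd remainder is split between the two sides. The helper name `pad_to_square` is hypothetical, and the size check is an added safeguard rather than part of the original code.

```python
import math
import numpy as np

def pad_to_square(gray, size=28):
    """Pad a 2-D image with zeros so that it becomes size x size."""
    rows, cols = gray.shape
    if rows > size or cols > size:
        raise ValueError("image is larger than the target field")
    rows_padding = (math.ceil((size - rows) / 2), math.floor((size - rows) / 2))
    cols_padding = (math.ceil((size - cols) / 2), math.floor((size - cols) / 2))
    return np.pad(gray, (rows_padding, cols_padding), mode='constant', constant_values=0)

# A 20x11 "digit": 17 columns are missing, so 9 go to the left and 8 to the right.
digit = np.full((20, 11), 255, dtype=np.uint8)
padded = pad_to_square(digit)

print(padded.shape)  # (28, 28)
```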
In general, the pictures are already positioned quite well. However, the next step is to move the inner box so that its center of mass coincides with the center of the whole picture. Let's introduce two helper functions. The first computes the center of mass and the shift needed along each axis:
from scipy.ndimage import center_of_mass

def getBestShift(img):
    # center of mass in (row, column) coordinates
    cy, cx = center_of_mass(img)
    rows, cols = img.shape
    # offsets that move the center of mass to the center of the image
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)
    return shiftx, shifty
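The center of mass is the intensity-weighted mean of the pixel coordinates: cy = Σ i·I(i, j) / Σ I(i, j) and cx = Σ j·I(i, j) / Σ I(i, j). The NumPy-only sketch below reproduces that computation and the resulting shift on a synthetic blob. The names `center_of_mass_np` and `best_shift` are hypothetical, and the code is an illustration of the formula rather than SciPy's implementation.

```python
import numpy as np

def center_of_mass_np(img):
    """Intensity-weighted mean of the (row, column) coordinates of a 2-D image."""
    img = np.asarray(img, dtype=np.float64)
    total = img.sum()
    if total == 0:
        raise ValueError("image has zero total intensity")
    row_idx, col_idx = np.indices(img.shape)
    cy = (row_idx * img).sum() / total
    cx = (col_idx * img).sum() / total
    return float(cy), float(cx)

def best_shift(img):
    """Integer (shiftx, shifty) that moves the center of mass to the image center."""
    cy, cx = center_of_mass_np(img)
    rows, cols = np.asarray(img).shape
    return int(np.round(cols / 2.0 - cx)), int(np.round(rows / 2.0 - cy))

# A blob in the upper-left part of a 28x28 field: rows 4..10, columns 6..12.
img = np.zeros((28, 28), dtype=np.uint8)
img[4:11, 6:13] = 255

print(center_of_mass_np(img))  # (7.0, 9.0)
print(best_shift(img))         # (5, 7): move 5 pixels right and 7 pixels down
```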
The second function actually shifts the picture by the given offsets using cv2.warpAffine. In our case, the transformation matrix is M = [[1, 0, sx], [0, 1, sy]], a pure translation by sx pixels horizontally and sy pixels vertically:
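def shift(img, sx, sy):
    rows, cols = img.shape
    # 2x3 affine matrix of a pure translation by (sx, sy)
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    # the output size is given as (width, height)
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted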
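For integer offsets, a pure translation amounts to copying a window of the image to a new position and filling the vacated area with zeros. The NumPy sketch below shows this under that assumption; `shift_np` is a hypothetical name, and the function is meant as an illustration of what the translation does, not as a drop-in replacement for `cv2.warpAffine` in general.

```python
import numpy as np

def shift_np(img, sx, sy):
    """Translate a 2-D image by integer offsets (sx right, sy down), filling with zeros."""
    rows, cols = img.shape
    shifted = np.zeros_like(img)
    # A shift larger than the image moves everything out of the frame.
    if abs(sx) >= cols or abs(sy) >= rows:
        return shifted
    # Source and destination windows that stay inside the frame.
    src_y = slice(max(0, -sy), min(rows, rows - sy))
    dst_y = slice(max(0, sy), min(rows, rows + sy))
    src_x = slice(max(0, -sx), min(cols, cols - sx))
    dst_x = slice(max(0, sx), min(cols, cols + sx))
    shifted[dst_y, dst_x] = img[src_y, src_x]
    return shifted

img = np.zeros((5, 5), dtype=np.uint8)
img[1, 1] = 255

moved = shift_np(img, 2, 1)        # 2 pixels right, 1 pixel down
print(np.argwhere(moved == 255))   # the pixel is now at row 2, column 3
```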
Add a couple more lines with a shift relative to the center of mass:
shiftx,shifty = getBestShift(gray)
shifted = shift(gray,shiftx,shifty)
gray = shifted
As a result, we get a complete procedure for fitting an image to the MNIST format:
from scipy.ndimage import center_of_mass
from tensorflow.keras.models import load_model
import math
import cv2
import numpy as np

# load the model saved by the training script above
model = load_model('digit_model.h5')

def getBestShift(img):
    # center of mass in (row, column) coordinates
    cy, cx = center_of_mass(img)
    rows, cols = img.shape
    shiftx = np.round(cols / 2.0 - cx).astype(int)
    shifty = np.round(rows / 2.0 - cy).astype(int)
    return shiftx, shifty

def shift(img, sx, sy):
    rows, cols = img.shape
    # 2x3 affine matrix of a pure translation by (sx, sy)
    M = np.float32([[1, 0, sx], [0, 1, sy]])
    shifted = cv2.warpAffine(img, M, (cols, rows))
    return shifted

def rec_digit(img_path):
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    gray = 255 - img
    # apply thresholding
    (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # remove all-zero rows and columns
    while np.sum(gray[0]) == 0:
        gray = gray[1:]
    while np.sum(gray[:, 0]) == 0:
        gray = np.delete(gray, 0, 1)
    while np.sum(gray[-1]) == 0:
        gray = gray[:-1]
    while np.sum(gray[:, -1]) == 0:
        gray = np.delete(gray, -1, 1)
    rows, cols = gray.shape
    # resize so that the digit fits into a 20x20 pixel box
    if rows > cols:
        factor = 20.0 / rows
        rows = 20
        cols = int(round(cols * factor))
        gray = cv2.resize(gray, (cols, rows))
    else:
        factor = 20.0 / cols
        cols = 20
        rows = int(round(rows * factor))
        gray = cv2.resize(gray, (cols, rows))
    # pad with zeros up to 28x28
    colsPadding = (int(math.ceil((28 - cols) / 2.0)), int(math.floor((28 - cols) / 2.0)))
    rowsPadding = (int(math.ceil((28 - rows) / 2.0)), int(math.floor((28 - rows) / 2.0)))
    gray = np.pad(gray, (rowsPadding, colsPadding), 'constant')
    # shift the center of mass to the center of the image
    shiftx, shifty = getBestShift(gray)
    shifted = shift(gray, shiftx, shifty)
    gray = shifted
    cv2.imwrite('gray' + img_path, gray)
    img = gray / 255.0
    img = np.array(img).reshape(-1, 28, 28, 1)
    out = str(np.argmax(model.predict(img)))
    return out
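To confirm that a preprocessed picture really follows the conventions described earlier, the saved image can be inspected with a few simple checks. The sketch below is an assumption about what such a check could look like: the name `check_mnist_format` and the one-pixel tolerance `tol` are hypothetical, and the center is taken as rows/2 and cols/2, the same target that `getBestShift` uses.

```python
import numpy as np

def check_mnist_format(img, field=28, box=20, tol=1.0):
    """Report whether img is field x field, fits a box x box square, and is centered."""
    img = np.asarray(img)
    checks = {"shape": img.shape == (field, field),
              "bounding_box": False,
              "centered": False}
    if img.ndim != 2:
        return checks
    ys, xs = np.nonzero(img)
    if ys.size == 0:
        return checks  # an empty image cannot satisfy the remaining checks
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    checks["bounding_box"] = bool(max(height, width) <= box)
    # intensity-weighted center of mass of the non-zero pixels
    weights = img[ys, xs].astype(np.float64)
    cy = (ys * weights).sum() / weights.sum()
    cx = (xs * weights).sum() / weights.sum()
    rows, cols = img.shape
    checks["centered"] = bool(abs(cy - rows / 2.0) <= tol and abs(cx - cols / 2.0) <= tol)
    return checks

# A 20x10 block placed in the middle of the field passes all checks.
good = np.zeros((28, 28), dtype=np.uint8)
good[4:24, 9:19] = 255

# A small block in the corner has the right size but is not centered.
bad = np.zeros((28, 28), dtype=np.uint8)
bad[0:8, 0:8] = 255

print(check_mnist_format(good))
print(check_mnist_format(bad))
```

Running the check on the images written by `cv2.imwrite` in `rec_digit` would show whether a given picture went through the pipeline as intended.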
One might wonder whether the shift relative to the center of mass makes any sense at all, especially since we are already working with a 20×20 pixel box. There will be a difference, albeit a small one. In any case, we have now adjusted an arbitrary image to fit the MNIST format.
As a result, the model above, combined with this image preprocessing, gives the following result:
Post written for https://github.com/spbu-math-cs/ml-course