Fitting to MNIST dataset

On the Internet you can find a thousand and one articles on training models on the MNIST handwritten-digit dataset. However, as soon as it comes to practice and you try to recognize your own pictures, the model performs poorly or not at all. Of course, we can convert the image to grayscale and forcibly resize it to MNIST's 28×28 pixels, but such naively preprocessed images look very little like MNIST samples.

Naturally, the main problem is that an arbitrary image is very different from the images in the MNIST database. The original MNIST digits are size-normalized to fit in a 20×20 pixel box. Then the center of mass of the pixels is computed, and the digit is placed on a 28×28 pixel field so that its center of mass coincides with the center of the field. It is to this format that we must adjust our data.
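As a quick sanity check (a small sketch of our own, not part of the pipeline below), we can measure the bounding boxes of a few MNIST samples and see that the longer side is indeed about 20 pixels:

import numpy as np
from tensorflow.keras.datasets import mnist

(trainX, _), _ = mnist.load_data()
for img in trainX[:5]:
  ys, xs = np.nonzero(img)      # coordinates of non-black pixels
  h = ys.max() - ys.min() + 1   # bounding-box height
  w = xs.max() - xs.min() + 1   # bounding-box width
  print('bounding box: %dx%d' % (h, w))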

We can use any model implementation that recognizes MNIST digits. For example:

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
from tensorflow.keras.optimizers import SGD
 
# load train and test dataset
def load_dataset():
  # load dataset
  (trainX, trainY), (testX, testY) = mnist.load_data()
  # reshape dataset to have a single channel
  trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
  testX = testX.reshape((testX.shape[0], 28, 28, 1))
  # one hot encode target values
  trainY = to_categorical(trainY)
  testY = to_categorical(testY)
  return trainX, trainY, testX, testY
 
# scale pixels
def prep_pixels(train, test):
  # convert from integers to floats
  train_norm = train.astype('float32')
  test_norm = test.astype('float32')
  # normalize to range 0-1
  train_norm = train_norm / 255.0
  test_norm = test_norm / 255.0
  # return normalized images
  return train_norm, test_norm
 
# define cnn model
def define_model():
  model = Sequential()
  model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
  model.add(MaxPooling2D((2, 2)))
  model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
  model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform'))
  model.add(MaxPooling2D((2, 2)))
  model.add(Flatten())
  model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
  model.add(Dense(10, activation='softmax'))
  # compile model
  opt = SGD(learning_rate=0.01, momentum=0.9)
  model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])
  return model
 
# run the test harness for evaluating a model
def run_test_harness():
  # load dataset
  trainX, trainY, testX, testY = load_dataset()
  # prepare pixel data
  trainX, testX = prep_pixels(trainX, testX)
  # define model
  model = define_model()
  # fit model
  model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=1)
  # save model
  model.save('digit_model.h5')
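  # evaluate the model on the test set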
  _, acc = model.evaluate(testX, testY, verbose=0)
  print('> %.3f' % (acc * 100.0))

# entry point, run the test harness
run_test_harness()

> 99.040

We got pretty good accuracy. Now let's take our own pictures and see what the network makes of them. The most common preprocessing is to resize the image to 28×28 pixels and invert the colors:

import cv2
import numpy as np
from tensorflow.keras.models import load_model

# load the model trained above (it was saved as digit_model.h5)
model = load_model('digit_model.h5')

def rec_digit(img_path):
  img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
  # invert: MNIST is a white digit on a black background
  gray = 255 - img

  gray = cv2.resize(gray, (28, 28))
  cv2.imwrite('gray' + img_path, gray)
  # normalize to [0, 1] and reshape to the network's input format
  img = gray / 255.0
  img = np.array(img).reshape(-1, 28, 28, 1)
  out = str(np.argmax(model.predict(img)))
  return out

The zero was recognized correctly, since it is centered and positioned reasonably well to begin with; the rest of the digits were not. The accuracy on 5 test pictures turns out to be only 20 percent.

Once again, let us state the main point: here is how the dataset itself was constructed (quoting the MNIST site): "The original black and white (bilevel) images from NIST were size normalized to fit in a 20×20 pixel box while preserving their aspect ratio. The resulting images contain gray levels as a result of the anti-aliasing technique used by the normalization algorithm. The images were centered in a 28×28 image by computing the center of mass of the pixels, and translating the image so as to position this point at the center of the 28×28 field."

We will convert all our images to this format. Note also that if the background is off-white rather than pure white, inversion gives us something quite unlike the MNIST style of a white digit on a black background, as in the example with the nine. Therefore, we add thresholding right after reading the image:

def rec_digit(img_path):
  img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
  gray = 255-img
  # apply thresholding (with THRESH_OTSU the threshold is chosen automatically)
  (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
  
  gray = cv2.resize(gray, (28, 28))
  cv2.imwrite('gray'+ img_path, gray)
  img = gray / 255.0
  img = np.array(img).reshape(-1, 28, 28, 1)
  out = str(np.argmax(model.predict(img)))
  return out

After thresholding has been applied

Now we want to fit the image into a 20×20 pixel box. This can be done in several ways. One option is to find the contour that bounds the digit, take its bounding rectangle as the image, and resize it to the required size; a sketch of this idea is shown below. This approach can also be useful if you need to recognize numbers consisting of more than one digit.
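A possible sketch of that contour-based crop (our own illustration, not the method we use below; it assumes OpenCV 4, where findContours returns two values, and an already inverted and thresholded image):

import cv2

def crop_largest_contour(gray):
  # external contours of the white digit on a black background
  contours, _ = cv2.findContours(gray, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
  # bounding rectangle of the largest contour
  x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
  return gray[y:y + h, x:x + w]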

We will do something a little simpler and, arguably, more reliable. Namely, we first delete all rows and columns in which every pixel is black. This leaves us with a picture that is exactly the rectangular hull of our digit.

def rec_digit(img_path):
  img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
  gray = 255-img
  # apply thresholding (with THRESH_OTSU the threshold is chosen automatically)
  (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
  
  # remove all-black rows and columns
  while np.sum(gray[0]) == 0:
    gray = gray[1:]
  while np.sum(gray[:,0]) == 0:
    gray = np.delete(gray,0,1)
  while np.sum(gray[-1]) == 0:
    gray = gray[:-1]
  while np.sum(gray[:,-1]) == 0:
    gray = np.delete(gray,-1,1)
  rows, cols = gray.shape
  
  cv2.imwrite('gray'+ img_path, gray)
  gray = cv2.resize(gray, (28, 28))
  img = gray / 255.0
  img = np.array(img).reshape(-1, 28, 28, 1)
  out = str(np.argmax(model.predict(img)))
  return out

As expected, we get exact bounding boxes.
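Incidentally, the same crop can be written without loops; here is an equivalent vectorized sketch (our own variation on the while loops above):

import numpy as np

def crop_to_content(gray):
  # masks of rows/columns that contain at least one white pixel
  row_mask = np.any(gray, axis=1)
  col_mask = np.any(gray, axis=0)
  rmin, rmax = np.where(row_mask)[0][[0, -1]]
  cmin, cmax = np.where(col_mask)[0][[0, -1]]
  return gray[rmin:rmax + 1, cmin:cmax + 1]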

Next, we want to scale the picture so that it fits into a 20×20 square. To do this, we compute a scale factor such that the longer side becomes 20 pixels:

def rec_digit(img_path):
  img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
  gray = 255-img
  # apply thresholding (with THRESH_OTSU the threshold is chosen automatically)
  (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
  
  # remove all-black rows and columns
  while np.sum(gray[0]) == 0:
    gray = gray[1:]
  while np.sum(gray[:,0]) == 0:
    gray = np.delete(gray,0,1)
  while np.sum(gray[-1]) == 0:
    gray = gray[:-1]
  while np.sum(gray[:,-1]) == 0:
    gray = np.delete(gray,-1,1)
  rows, cols = gray.shape

  # resize so that the digit fits in a 20x20 pixel box
  if rows > cols:
    factor = 20.0/rows
    rows = 20
    cols = int(round(cols*factor))
    gray = cv2.resize(gray, (cols,rows))
  else:
    factor = 20.0/cols
    cols = 20
    rows = int(round(rows*factor))
    gray = cv2.resize(gray, (cols, rows))
  
  cv2.imwrite('gray'+ img_path, gray)
  gray = cv2.resize(gray, (28, 28))
  img = gray / 255.0
  img = np.array(img).reshape(-1, 28, 28, 1)
  out = str(np.argmax(model.predict(img)))
  return out

Now we expand the image to 28×28 pixels by adding black rows and columns around the edges using np.lib.pad, which pads with zeros. At the same time, we delete the line gray = cv2.resize(gray, (28, 28)). After the scaling step, add the following (the ceil/floor split handles odd remainders: for example, a 13-pixel-wide image gets 8 columns on one side and 7 on the other, 8 + 13 + 7 = 28):

colsPadding = (int(math.ceil((28-cols)/2.0)),int(math.floor((28-cols)/2.0)))
rowsPadding = (int(math.ceil((28-rows)/2.0)),int(math.floor((28-rows)/2.0)))
gray = np.lib.pad(gray,(rowsPadding,colsPadding),'constant')

Added borders to 28×28

In general, the digits are already positioned quite well. The next step, however, is to shift the inner box so that its center of mass coincides with the center of the whole picture. Let's introduce two auxiliary functions. The first computes the center of mass and the required shift:

from scipy.ndimage import center_of_mass
def getBestShift(img):
    cy,cx = center_of_mass(img)

    rows,cols = img.shape
    shiftx = np.round(cols/2.0-cx).astype(int)
    shifty = np.round(rows/2.0-cy).astype(int)

    return shiftx,shifty

And the function that actually shifts the picture in the given direction (see the OpenCV documentation for cv2.warpAffine). In our case, the transformation is a pure translation with the matrix M = [[1, 0, sx], [0, 1, sy]]:

def shift(img,sx,sy):
    rows,cols = img.shape
    M = np.float32([[1,0,sx],[0,1,sy]])
    shifted = cv2.warpAffine(img,M,(cols,rows))
    return shifted

Add a couple more lines that shift the image relative to its center of mass:

shiftx,shifty = getBestShift(gray)
shifted = shift(gray,shiftx,shifty)
gray = shifted

Putting it all together, we get the complete fitting routine for the MNIST format:

from scipy.ndimage import center_of_mass
import math
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# load the model trained above (it was saved as digit_model.h5)
model = load_model('digit_model.h5')

def getBestShift(img):
    cy,cx = center_of_mass(img)
    
    rows,cols = img.shape
    shiftx = np.round(cols/2.0-cx).astype(int)
    shifty = np.round(rows/2.0-cy).astype(int)

    return shiftx,shifty
  
def shift(img,sx,sy):
    rows,cols = img.shape
    M = np.float32([[1,0,sx],[0,1,sy]])
    shifted = cv2.warpAffine(img,M,(cols,rows))
    return shifted
  
def rec_digit(img_path):
  img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
  gray = 255-img
  # apply thresholding (with THRESH_OTSU the threshold is chosen automatically)
  (thresh, gray) = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
  
  # remove all-black rows and columns
  while np.sum(gray[0]) == 0:
    gray = gray[1:]
  while np.sum(gray[:,0]) == 0:
    gray = np.delete(gray,0,1)
  while np.sum(gray[-1]) == 0:
    gray = gray[:-1]
  while np.sum(gray[:,-1]) == 0:
    gray = np.delete(gray,-1,1)
  rows,cols = gray.shape
  
  # resize so that the digit fits in a 20x20 pixel box
  if rows > cols:
    factor = 20.0/rows
    rows = 20
    cols = int(round(cols*factor))
    gray = cv2.resize(gray, (cols,rows))
  else:
    factor = 20.0/cols
    cols = 20
    rows = int(round(rows*factor))
    gray = cv2.resize(gray, (cols, rows))

  # pad to 28x28
  colsPadding = (int(math.ceil((28-cols)/2.0)),int(math.floor((28-cols)/2.0)))
  rowsPadding = (int(math.ceil((28-rows)/2.0)),int(math.floor((28-rows)/2.0)))
  gray = np.lib.pad(gray,(rowsPadding,colsPadding),'constant')

  # shift the center of mass to the center of the image
  shiftx,shifty = getBestShift(gray)
  shifted = shift(gray,shiftx,shifty)
  gray = shifted
  
  cv2.imwrite('gray'+ img_path, gray)
  img = gray / 255.0
  img = np.array(img).reshape(-1, 28, 28, 1)
  out = str(np.argmax(model.predict(img)))
  return out
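
A short usage sketch (the file names here are hypothetical; substitute your own images):

for path in ['0.png', '3.png', '9.png']:
  print(path, '->', rec_digit(path))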

One might reasonably wonder whether the shift to the center of mass makes any real difference, given that we are working with a roughly 20×20 pixel digit. There is a difference, albeit a small one. In any case, we have now adjusted an arbitrary image to the MNIST format.

Images with the center-of-mass shift applied
Images before the center-of-mass shift, after padding to 28×28

As a result, the model above, combined with the image preprocessing we have built, gives the following result:

Post written for https://github.com/spbu-math-cs/ml-course
