Image classification with MNIST data

Jayanti prasad Ph.D
6 min read · Nov 16, 2022

Figure 1 : Some sample images from MNIST data set

MNIST (Modified National Institute of Standards and Technology) is an important data set which most machine learning practitioners use for learning and training purposes. It consists of 60,000 training and 10,000 test images of handwritten digits from 0 to 9, each of size 28 x 28 pixels, so every image belongs to one of ten classes. The main objective here is to predict the class of a test image using a machine learning model (here a neural network) trained on the training data. There are many tutorials on the internet which demonstrate how this can be done with a Convolutional Neural Network built with Keras/TensorFlow, but I found most of them hard to follow, and they do not explain the flow in the form of a pipeline. So here I am presenting the full code for the same exercise, with explanation.

One of the difficulties beginners face when starting a machine learning project is that they do not follow any standard template for writing their code. Machine learning code tends to be large and complex, so it helps to write it in terms of logical functional units. I generally follow a template which has the following components or units.

1. Data Loader

There should be one functional unit dedicated to data loading and processing, which should read/load the data, check/validate it, clean it and finally normalize it. It is absolutely necessary to know the important properties/features of the data before doing any machine learning. I generally write a function called ‘get_data’ which does exactly this.
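As an illustration of the check/validate step, here is a minimal sketch of what such a validation could look like. The helper name `check_data` and the synthetic stand-in arrays are assumptions made for this example and are not part of the code in this article; with real data you would pass in the output of `get_data` instead.

```python
import numpy as np


def check_data(x, y, image_shape=(28, 28, 1), num_classes=10):
    """Run basic sanity checks on images x and integer labels y (hypothetical helper)."""
    # Every image must have a label
    assert x.shape[0] == y.shape[0], "number of images and labels differ"
    # Each image must have the expected (height, width, channels) shape
    assert x.shape[1:] == image_shape, f"unexpected image shape {x.shape[1:]}"
    # After normalization all pixel values must lie in [0, 1]
    assert x.min() >= 0.0 and x.max() <= 1.0, "pixels are not normalized"
    # Labels must be integers in the range 0 .. num_classes - 1
    assert y.min() >= 0 and y.max() < num_classes, "label out of range"

    # Number of samples per class, useful to spot class imbalance
    counts = np.bincount(y, minlength=num_classes)
    return {"num_samples": int(x.shape[0]), "class_counts": counts.tolist()}


# Synthetic stand-in for MNIST (assumption: random pixels, cyclic labels)
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 256, size=(100, 28, 28, 1)) / 255
y_demo = np.arange(100) % 10

summary = check_data(x_demo, y_demo)
print(summary)
```

Running such checks before training is cheap, and it catches shape or scaling mistakes that would otherwise only show up as poor accuracy.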

2. Model Building

Once we know the data we can build a machine learning model, here a neural network. This needs two types of parameters: parameters which depend on the data, for example the shape of the images or the number of classes, and hyperparameters such as the number of layers in the neural network. Once we have built a model we can print its summary or visualize it and make sure we are getting exactly what we are expecting. Model building is not very compute intensive and we can run it many times, till we are fully satisfied.
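To make "getting exactly what we are expecting" concrete, the sketch below computes by hand the output shapes and parameter counts that a model summary should report. It assumes the layer sizes used in the code later in this article (two 3 x 3 convolutions with 32 and 64 filters and no padding, one 2 x 2 max pooling, and dense layers with 128 and 10 units); the helper names are hypothetical.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count of a 'valid' (no padding), stride-1 convolution."""
    h, w, c = shape
    out_shape = (h - kernel + 1, w - kernel + 1, filters)
    params = kernel * kernel * c * filters + filters  # weights + biases
    return out_shape, params


def max_pool(shape, pool=2):
    """Output shape of a non-overlapping max pooling layer (no parameters)."""
    h, w, c = shape
    return (h // pool, w // pool, c)


def dense(n_inputs, units):
    """Parameter count of a fully connected layer."""
    return n_inputs * units + units  # weights + biases


shape = (28, 28, 1)
shape, p1 = conv2d(shape, 32)          # (26, 26, 32)
shape, p2 = conv2d(shape, 64)          # (24, 24, 64)
shape = max_pool(shape)                # (12, 12, 64)
flat = shape[0] * shape[1] * shape[2]  # number of values after flattening
p3 = dense(flat, 128)
p4 = dense(128, 10)

# Dropout and Flatten layers have no trainable parameters
total_params = p1 + p2 + p3 + p4
print(shape, flat, total_params)
```

If the total reported by the model summary differs from the number computed this way, the model is probably not the one we intended to build.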

3. Model Training

Once we have built and validated our model, it is time to feed it the data, that is, to train the model. This is the most compute intensive part, so we must validate our model and data before going into this step. Model training has its own parameters and options apart from the data. The most important choices we need to make (for a neural network) are as follows:

  • Loss function
  • Optimizer
  • Number of epochs for training
  • Batch size
  • Validation split
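Some of these choices can be illustrated with a few lines of plain Python. The sketch below first computes the sparse categorical cross-entropy for a single sample, starting from raw scores (logits) that are turned into probabilities with a softmax, and then shows how a validation split and a batch size translate into the number of weight updates per epoch. The function names and the toy scores are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cce(probs, label):
    """Cross-entropy for one sample with an integer class label."""
    return -math.log(probs[label])


def steps_per_epoch(num_samples, validation_split, batch_size):
    """Training/validation sizes and number of batches per epoch."""
    num_val = int(num_samples * validation_split)
    num_train = num_samples - num_val
    return num_train, num_val, math.ceil(num_train / batch_size)


logits = [2.0, 0.5, -1.0]          # raw scores for a 3-class toy problem
probs = softmax(logits)
loss = sparse_cce(probs, label=0)  # loss when the true class is 0

# 60,000 MNIST training images, 20% held out for validation, batches of 64
num_train, num_val, steps = steps_per_epoch(60000, 0.2, 64)
print(round(loss, 4), num_train, num_val, steps)
```

In Keras, `SparseCategoricalCrossentropy` has a `from_logits` argument which should be `True` only when the last layer outputs raw scores rather than softmax probabilities. Keras takes the validation samples from the end of the training arrays, before any shuffling.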

4. Model Testing

Once a model has been trained it is time to test and validate it using the test data set (hold-out data set). We do this by computing certain performance metrics, such as precision, recall, accuracy and the confusion matrix. In the present case we compute precision, recall and the confusion matrix.
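For readers who want to see what these metrics measure, here is a small from-scratch sketch on a made-up three-class example (the labels below are invented for illustration, and the function names are hypothetical). It also shows a property worth knowing: for single-label multi-class problems, micro-averaged precision and recall are both equal to the accuracy, so per-class values are often more informative.

```python
def confusion_matrix_py(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_precision_recall(cm):
    """Precision and recall of every class, computed from the confusion matrix."""
    n = len(cm)
    precision, recall = [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precision.append(tp / predicted_k if predicted_k else 0.0)
        recall.append(tp / actual_k if actual_k else 0.0)
    return precision, recall


# Invented labels for a 3-class toy problem
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 1]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 2, 2]

cm = confusion_matrix_py(y_true, y_pred, 3)
precision, recall = per_class_precision_recall(cm)

# Micro averaging pools all classes: total true positives over all predictions,
# which is the same quantity as the accuracy
accuracy = sum(cm[k][k] for k in range(3)) / len(y_true)
print(cm, precision, recall, accuracy)
```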

5. Model Prediction

Once we are satisfied with our model training and its performance, it is time to try our trained model on some unseen data (data not used for training). Note that training is expensive but needs to be done just once, whereas we can make predictions (inference) whenever we need to, since inference is lightweight.

In most cases we store the trained model in a file and use it whenever we need it. There are many trained models freely available on the internet (for example on the Hugging Face website) which anyone can use.

6. Model deployment

Once we have a trained model we can deploy it as a web app or an API server, which anyone who has access can connect to and use for inference. Many tech companies like Microsoft, Google and Amazon expose their trained models as APIs, which users can use by paying licensing fees. It is quite straightforward to create a Flask-based API server for this purpose (which I will cover in another article).

Note that the six steps discussed above need not be followed linearly all the time; we may need to go back and forth to fix issues which affect the performance. Now let us see how we can implement all this in Python code. A link to the full code will be provided at the end.

import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint

from PIL import Image
import sys
from sklearn.metrics import precision_score, recall_score, confusion_matrix

def get_data():
    """
    This is the data loader. It reads and normalizes the data and returns the
    training and test data sets. It is recommended to write this module separately
    so that if the data source changes, we can update the program easily.
    """
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

    # Add a channel axis, (28, 28) -> (28, 28, 1), and scale pixels to [0, 1]
    x_train = np.expand_dims(x_train, -1) / 255
    x_test = np.expand_dims(x_test, -1) / 255

    return x_train, y_train, x_test, y_test


class CNN_Clsf:
    """
    It is useful to write a machine learning model as an object with
    properties and methods. Note that here we have a predict method for inference,
    but it need not be a part of the object. We will also have a separate
    prediction script.
    """

    def __init__(self, input_shape, num_classes):
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.build_model()

    def build_model(self):
        """
        This is the place where we build our model.
        """
        input_data = layers.Input(shape=self.input_shape, name="Input-Layer")
        x = layers.Conv2D(32, kernel_size=(3, 3), activation='relu', name='Conv2D-I')(input_data)
        x = layers.Conv2D(64, (3, 3), activation='relu', name='Conv2D-II')(x)
        x = layers.MaxPooling2D(pool_size=(2, 2), name='MaxPool')(x)
        x = layers.Dropout(0.25, name='Dropout-I')(x)
        x = layers.Flatten(name='Flatten')(x)
        x = layers.Dense(128, activation='relu', name='Output-Dense')(x)
        x = layers.Dropout(0.5, name='Dropout-II')(x)
        output_data = layers.Dense(self.num_classes, activation='softmax')(x)
        self.model = Model(inputs=input_data, outputs=output_data, name='Conv2D-Model')
        # summary() prints the table itself and returns None
        self.model.summary()

    def fit_model(self, x_train, y_train, epochs):
        """
        This is the training part and it takes maximum time/computation. Note that here
        we need to specify the number of epochs for training as well as other parameters.
        """
        # The last layer already applies softmax, so the loss receives
        # probabilities and from_logits must be False.
        loss = keras.losses.SparseCategoricalCrossentropy(from_logits=False)
        optimizer = keras.optimizers.RMSprop()

        # Save the model whenever the validation loss improves
        model_checkpoint = ModelCheckpoint("model.hdf5",
                                           monitor='val_loss',
                                           mode='min',
                                           save_best_only=True,
                                           verbose=1)

        self.model.compile(loss=loss, optimizer=optimizer, metrics=["accuracy"])

        history = self.model.fit(x_train, y_train, batch_size=64, epochs=epochs,
                                 validation_split=0.2, callbacks=[model_checkpoint])
        return history

    def predict(self, x_test):
        """
        We can use the inbuilt prediction method, but we should also have it separately.
        """
        return self.model.predict(x_test)



def plot_accuracy(history):
    # Loss in the top panel, accuracy in the bottom panel
    fig, axs = plt.subplots(2, 1, figsize=(12, 12))

    axs[0].set_ylabel("Loss")
    axs[1].set_ylabel("Accuracy")

    axs[0].plot(history.history['loss'], '-o', label='Training')
    axs[0].plot(history.history['val_loss'], '-o', label='Validation')
    axs[1].plot(history.history['accuracy'], '-o', label='Training')
    axs[1].plot(history.history['val_accuracy'], '-o', label='Validation')

    axs[0].legend()
    axs[1].legend()
    plt.show()


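def plot_data(x_train):
    # Pick 16 random images and show them on a 4 x 4 grid
    rand_ids = np.random.randint(0, x_train.shape[0], 16)
    fig, axs = plt.subplots(4, 4, figsize=(12, 12))

    for i, ax in enumerate(axs.flat):
        # Drop the channel axis so that imshow receives a 2D array
        ax.imshow(x_train[rand_ids[i], :, :, 0])
    plt.show()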




if __name__ == "__main__":
    x_train, y_train, x_test, y_test = get_data()

    plot_data(x_train)

    num_classes = 10
    input_shape = (28, 28, 1)

    M = CNN_Clsf(input_shape, num_classes)
    history = M.fit_model(x_train, y_train, 10)

    plot_accuracy(history)

    test_scores = M.model.evaluate(x_test, y_test, verbose=2)
    print("Test loss:", test_scores[0])
    print("Test accuracy:", test_scores[1])

    # Predicted class = index of the largest softmax probability
    y_predict = M.predict(x_test)
    y_hat = np.argmax(y_predict, axis=1)

    cm = confusion_matrix(y_test, y_hat)
    print("confusion matrix\n", cm)

    p = precision_score(y_test, y_hat, average='micro')
    r = recall_score(y_test, y_hat, average='micro')

    print("precision =", p)
    print("recall =", r)

Apart from the loss and accuracy plots above, we also get the following performance metrics.

As mentioned at the start, the inference part is separate from the training. Here is the code which can be used for inference.

import tensorflow.keras.models as models
from tensorflow import keras
from sklearn.metrics import precision_score, recall_score, confusion_matrix
import numpy as np


if __name__ == "__main__":

    # read the trained model
    model = models.load_model("model.hdf5")
    model.summary()

    # get the test data
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train = np.expand_dims(x_train, -1) / 255
    x_test = np.expand_dims(x_test, -1) / 255

    # make predictions for the test data
    y_predict = model.predict(x_test)
    y_hat = np.argmax(y_predict, axis=1)

    # check performance metrics
    cm = confusion_matrix(y_test, y_hat)
    print("confusion matrix\n", cm)

    p = precision_score(y_test, y_hat, average='micro')
    r = recall_score(y_test, y_hat, average='micro')

    print("precision =", p)
    print("recall =", r)
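The script above predicts on the whole test set. For a single image the input must have the same shape and scaling as the training data. A minimal sketch of that preprocessing follows, assuming the image is already available as a 28 x 28 array of 8-bit gray values; `prepare_image` is a hypothetical helper, and the random array stands in for a real digit.

```python
import numpy as np


def prepare_image(image):
    """Turn a (28, 28) uint8 image into a (1, 28, 28, 1) float batch in [0, 1]."""
    image = np.asarray(image)
    if image.shape != (28, 28):
        raise ValueError(f"expected a 28 x 28 image, got {image.shape}")
    batch = image.astype("float32") / 255.0  # same scaling as in training
    batch = np.expand_dims(batch, axis=-1)   # add the channel axis
    batch = np.expand_dims(batch, axis=0)    # add the batch axis
    return batch


# Random stand-in for a real handwritten digit (assumption for illustration)
rng = np.random.default_rng(1)
fake_digit = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

batch = prepare_image(fake_digit)
print(batch.shape, batch.dtype)
```

The resulting array can be passed to `model.predict`, and `np.argmax` over the last axis of the result gives the predicted digit.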

Please post your feedback, comments & suggestions. If you feel like clapping you can do that also :)
