TensorFlow With MNIST: From raw to high Levels - Part 1

Last Update: March 2019

1. Introduction

1.1. Objectives

This series of articles aims at leading the Data Scientist newbie through the mastering of the TensorFlow (TF) ecosystem. These articles starts from the very basic level where the practitioner constructs at hand every part of his/her model ; and goes to higher and higher abstraction levels where he/she can use more advanced tools of this ecosystem like Tensorpack.

The MNIST classification problem is now the hello world! of Deep Learning (DL) and any TensorFlow aspirant MUST implement it once in his/her life.

In this first article, we build from scratch a multi layer perceptron (MLP) using TensorFlow. This MLP is architectured to solve the MNIST classification problem. It is then used to obtain a baseline of performances regarding our problem.

This article aims at providing the reader with the basic elements that can help understand TF internals. Later posts will address more advanced programming and Neural Network (NN) architecturing techniques to improve this baseline.

1.2. Outcomes

At the end of this article, the reader will know:

  • * How to construct from scratch an effective basic NN that produces a descent performance for the MNIST classification problem,
  • * How to select the right performance metrics regarding the problem at hand,
  • * How to use Tensorboard to monitor graphically both the computatation graph and the performance metrics while training the NN

1.3. Organization/Code

This article is created from a Jupyter notebook, thus you can copy-and-paste the cells content and make them run step-by-step while your are reading the paper. In addition, all the code is available here.

2. Create/Prepare Your Environment

2.1. Create A Virtual Environment

Here we assume you are running this notebook on your local PC and not on a compute engine like Google Cloud (GCP) Compute Engine. In that case we highly recommand you to construct a Virtual Environment by applying the steps described here. When done you can simply update your environment using the file requirements.txt available in the code repo.

If you entend to execute this notebook on the GCP, please confer to this article to correctly setup your virtual machine (VM).

Finally, to make a long story short, you can run this program within a classical Data Science ready environment in which you install TensorFlow. The last version of TF will be sufficient. In addition one can also install small utilities like tqdm.

Imports:

In [1]:
from __future__ import print_function

import os
import time
import shutil
import numpy as np ; seed=27 ; np.random.seed(seed)
import pandas as pd
import tensorflow as tf
import tensorflow.contrib.learn as learn

from tqdm import tqdm
from tqdm import tnrange, tqdm_notebook

3. The Data

3.1. Obtain The MNIST Data

TF comes with ready-to-use datasets like the MNIST one. These embedded datasets and their associated loaders details are out-of-scope of this article; however we use here the learn.datasets.mnist.read_data_sets() function to quickly cache the MNIST data locally on our machine. Caching means the dataset is download once and persisted locally on the disk. Then it is a good idea to have access to the Internet when you invoke such loader.

Here it is important to notice that loading of the dataset must be done with the one_hot attribute set to True. Indeed it allows the labels to be automatgically one-hot encoded which is generally a good idea when the model is supposed to classify among several classes (e.g. more than 3 classes):

In [2]:
# Please adjust this path according your environment
mnist_data_dir = os.path.join(os.getcwd(), "../data/MNIST")

if not os.path.exists(mnist_data_dir):
    os.makedirs(mnist_data_dir)

datasets = learn.datasets.mnist.read_data_sets(mnist_data_dir, one_hot=True)
WARNING:tensorflow:From <ipython-input-2-6e27d9fd01d1>:7: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From ÛÛÛÛÛ\opt\anaconda3\envs\c3po-asr-study-venv\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From ÛÛÛÛÛ\opt\anaconda3\envs\c3po-asr-study-venv\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ÛÛÛÛÛ\workspace\article-tf-mlp-mnist-1\../data/MNIST\train-images-idx3-ubyte.gz
WARNING:tensorflow:From ÛÛÛÛÛ\opt\anaconda3\envs\c3po-asr-study-venv\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting ÛÛÛÛÛ\workspace\article-tf-mlp-mnist-1\../data/MNIST\train-labels-idx1-ubyte.gz
WARNING:tensorflow:From ÛÛÛÛÛ\opt\anaconda3\envs\c3po-asr-study-venv\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
Extracting ÛÛÛÛÛ\workspace\article-tf-mlp-mnist-1\../data/MNIST\t10k-images-idx3-ubyte.gz
Extracting ÛÛÛÛÛ\workspace\article-tf-mlp-mnist-1\../data/MNIST\t10k-labels-idx1-ubyte.gz
WARNING:tensorflow:From ÛÛÛÛÛ\opt\anaconda3\envs\c3po-asr-study-venv\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.

3.2. Quick Analysis of The Data

At this step of our session, it is generally a good idea to feel the data at hands by looking at some simple statistics:

How many samples?

The dataset is split in 3 categories: train, test, and validation with different sizes:

In [3]:
nb_train_samples = datasets.train.num_examples
nb_test_samples = datasets.test.num_examples
nb_validation_samples = datasets.validation.num_examples
nb_samples = nb_train_samples + nb_test_samples + nb_validation_samples 

print("nb_train_samples = ", nb_train_samples)
print("nb_test_samples = ", nb_test_samples)
print("nb_validation_samples = ", nb_validation_samples)
print("nb_samples = ", nb_samples)
nb_train_samples =  55000
nb_test_samples =  10000
nb_validation_samples =  5000
nb_samples =  70000

The shape of the data?

In [4]:
print("datasets.train.images.shape = ", datasets.train.images.shape)
print("one image shape: datasets.train.images[0].shape = ", datasets.train.images[0].shape)
print("one image content (extract): datasets.train.images[0][600:620] = ", datasets.train.images[0][600:620])
print("min pixel value = ", min(datasets.train.images[0]))
print("max pixel value = ", max(datasets.train.images[0]))
print("mean of all pixels values = ", np.mean(datasets.train.images[0]))
print("std of all pixels values = ", np.std(datasets.train.images[0]))
print("dtype = ", datasets.train.images[0].dtype)
datasets.train.images.shape =  (55000, 784)
one image shape: datasets.train.images[0].shape =  (784,)
one image content (extract): datasets.train.images[0][600:620] =  [0.         0.         0.         0.         0.         0.54509807
 0.9960785  0.9333334  0.22352943 0.         0.         0.
 0.         0.         0.         0.         0.         0.
 0.         0.        ]
min pixel value =  0.0
max pixel value =  0.9960785
mean of all pixels values =  0.15441678
std of all pixels values =  0.33027557
dtype =  float32

Here we can see that the data are ready to processed by a model. Indeed, they seems to be standardized (e.g. the standard deviation is lower than 1). But, wait a minute. To be really sure of that, one should have some real statistics on all the data... We'll check it in the next section.

Now let us take a look to the labels:

In [5]:
print("datasets.train.labels.shape = ", datasets.train.labels.shape, 
      ", type(datasets.train.labels) = ", type(datasets.train.labels))
print("one label shape = ", datasets.train.labels[0].shape)
print("one label content = ", datasets.train.labels[0])
print("data type = ", type(datasets.train.labels[0]))
print("element data type = ", datasets.train.labels[0].dtype)
datasets.train.labels.shape =  (55000, 10) , type(datasets.train.labels) =  <class 'numpy.ndarray'>
one label shape =  (10,)
one label content =  [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
data type =  <class 'numpy.ndarray'>
element data type =  float64

These labels are instances (e.g. one-hot encoded format) of the following classes:

In [6]:
# manually set the classes (because the labels are already one-hot encoded)
classes = [i for i in range(0,10)]
print("classes = ", classes)
classes =  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The classes are integers ranging from 0 to 9. But this information is already present in the one-hot encoding format of the labels.

3.3. Global Statistics

To have a global feeling of the health of the Data we can compute quick statistics on the datasets:

In [7]:
print("datasets.train.images std = ", np.std(datasets.train.images))
print("datasets.train.images mean = ", np.mean(datasets.train.images))
print("datasets.train.images median = ", np.median(datasets.train.images))
print()
print("datasets.test.images std = ", np.std(datasets.test.images))
print("datasets.test.images mean = ", np.mean(datasets.test.images))
print("datasets.test.images median = ", np.median(datasets.test.images))
print()
print("datasets.validation.images std = ", np.std(datasets.validation.images))
print("datasets.validation.images mean = ", np.mean(datasets.validation.images))
print("datasets.validation.images median = ", np.median(datasets.validation.images))
datasets.train.images std =  0.3081596
datasets.train.images mean =  0.13070042
datasets.train.images median =  0.0

datasets.test.images std =  0.3104803
datasets.test.images mean =  0.13251467
datasets.test.images median =  0.0

datasets.validation.images std =  0.3075404
datasets.validation.images mean =  0.13022237
datasets.validation.images median =  0.0

3.4. Construct The Features Matrix and Labels Vector

In this section we try to build the features matrix $X$, and the associated $y$ labels vector. However, since we are using TF learn.datasets.mnist data, we know that the objects datasets.train.images, datasets.test.images, and datasets.validation.images are already tensors of floats ; thus they are already the features matrices. Same for the labels data.

WARNING

But for the rest of the article, we want to make sure all floating data share the same datatype: float64 for the features, and int32 for the labels. Thus, we just cast the features and labels matrices to the right datatypes:

In [8]:
# without casting
print("type(datasets.train.images) = ", datasets.train.images.dtype)
print("type(datasets.train.labels) = ", datasets.train.labels.dtype)
print()
print("type(datasets.train.images.astype(float)) = ", datasets.train.images.astype(float).dtype)
print("type(datasets.train.labels.astype(int)) = ", datasets.train.labels.astype(int).dtype)
type(datasets.train.images) =  float32
type(datasets.train.labels) =  float64

type(datasets.train.images.astype(float)) =  float64
type(datasets.train.labels.astype(int)) =  int32

3.5. Summary Data For TensorBoard

During the training we want to visualize some information in Tensorboard. For doing this we declare some summary operations through a function that will derive summary data for a given variable:

In [9]:
# to create summary data for a given variable
def variable_summaries(var):
    """
    Attach a lot of summaries to a Tensor (for TensorBoard visualization).
        var: the variable to which summaries are attached 
    """    
    with tf.name_scope('summaries'):
        mean = tf.reduce_mean(var)
        stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        max = tf.reduce_max(var)
        min = tf.reduce_min(var)
        
    tf.summary.scalar('mean', mean)
    tf.summary.scalar('stddev', stddev)
    tf.summary.scalar('max', max)
    tf.summary.scalar('min', min)
    tf.summary.histogram('histogram', var)
    

4. The Multi Layer Perceptron (MLP)

As aformentionned we define in this article a simple MLP (e.g. no convolution layer) using only basic layers of neurons. To achive this we try to design our MLP as a reusable class with key methods:

  • __init__: a constructor that just initialize the internal state of the MLP by creating some members data

This primitive design will be improved during the articles of this series.

In [10]:
class NN_MNIST_TF_1h:   # 1h for 1 hidden layer
     
    def __init__(self, img_holder, classes, nb_nodes_h1 = 784):
        
        """
        Construct the MLP using the parameters:
            :img_holder: the placeholder object that contains the features (e.g. image pixels values)
            :classes: the classes of the classification problem. It is an array of integer
            :nb_nodes_h1: the number of units or neurones of the single hidden layer. By default we set it to the 
            number of features
        
        In addition, define other parameters specific to the model:
            W,b: the model weights and biases
            in_layer, hid_layer, out_layer: respectively input, hidden, and output layer
            
        """
                
        self.img_holder = img_holder
        self.classes = classes
        self.nb_classes = len(classes)
        self.nb_features = self.img_holder.shape.as_list()[1] # the shape second argument
        self.nb_nodes_h1 = nb_nodes_h1
                
        # build the Layers
        # W, b are the weights and biases that are fitted agains the training data
        
        with tf.name_scope("input_layer"):
            self.w0 = tf.Variable(tf.random_normal([self.nb_features, self.nb_nodes_h1]), dtype=tf.float32)
            self.b0 = tf.Variable(tf.random_normal([self.nb_nodes_h1]), dtype=tf.float32)
            variable_summaries(self.w0)
            variable_summaries(self.b0)
            self.in_layer = tf.add(tf.matmul(img_holder, tf.cast(self.w0, tf.float64)), tf.cast(self.b0, tf.float64))
            self.in_layer = tf.nn.relu(self.in_layer)

        with tf.name_scope("hidden_layer_1"):
            self.w1 = tf.Variable(tf.random_normal([self.nb_nodes_h1, self.nb_nodes_h1]), dtype=tf.float32)
            self.b1 = tf.Variable(tf.random_normal([self.nb_nodes_h1]), dtype=tf.float32)
            variable_summaries(self.w1)
            variable_summaries(self.b1)
            self.hid_layer = tf.add(tf.matmul(self.in_layer, tf.cast(self.w1, tf.float64)), tf.cast(self.b1, tf.float64))
            self.hid_layer = tf.nn.relu(self.hid_layer)
                        
        with tf.name_scope("output_layer"):
            # Notice that this latest W,b series is used because the output layer is built from scratch. 
            # If we had used the built-in softmax() function, then it should have been no more useful
            self.w2 = tf.Variable(tf.random_normal([self.nb_nodes_h1, self.nb_classes]), dtype=tf.float32)
            self.b2 = tf.Variable(tf.random_normal([self.nb_classes]), dtype=tf.float32)
            variable_summaries(self.w2)
            variable_summaries(self.b2)
            self.out_layer = tf.matmul(self.hid_layer, tf.cast(self.w2, tf.float64)) + tf.cast(self.b2, tf.float64)
            

4.1. What Happened Here?

WARNING

In the previous classe definition, the weights and bias model parameters are defined as having tf.float32 datatype. This type is used because it seems to be the only one accepted on my plateform. Indeed, setting the type to tf.float64 failed. A simple workaround consists to define it has tf.float32 and then cast the parameters to tf.float64 when defining the tf.add operation of the layer.

5. Training the MLP

Now we want to fit the MLP weights and bias against our training dataset of MNIST images. During this training we ask to TF to collect some data that we want to visualize in Tensorboard. These data will provide insights regarding the quality of the training process.

5.1. Prepare The Training: Reset The Default Graph

Believe me this harmless operation can save you a lot of debuging time when using TF. Specially when you work in a notebook and you are never sure about what really hapen behind the scene. Thus, do it systematically please:

In [11]:
# Reset the Graph
tf.reset_default_graph()

5.2. Prepare The Training: Parameters and Hyperparameters

The training process is prepared by defining parameters and hyperparameters that frame the process:

In [12]:
# the classes (see above)
classes = [i for i in range(0,10)]

# image characteristics: size, 
img_height = 28
img_width = 28
img_color_depth = 1
nb_features = img_height * img_width * img_color_depth 

# nb neurons on hidden layer
nb_nodes_h1 = nb_features   # should start with the same size of the input layer

# the training hyperparameters (will be grid or random searched after)
learning_rate = 0.01
num_epochs = 2
batch_size = 100    
num_batches = int(datasets.train.images.shape[0]/batch_size)

# Enable logging, checkpoints
tf.logging.set_verbosity(tf.logging.INFO)
checkpoints_saving_step = 400 # save checkpoints every...
test_report_step_interval = 100 # the step interval where to report testing accuracy

checkpoints_dir = "./checkpoints_dir"
model_version = "1.0.0"

# The step to track the progression
step_var = tf.Variable(0, name="step_var", trainable=False)

5.2. Prepare The Training: The MLP and Its Optimizer

In [13]:
# Placeholders to convey the MNIST images to the model optimizer
img_holder = tf.placeholder(tf.float64, [None, nb_features], name='x-input')
lbl_holder = tf.placeholder(tf.int32, [None, len(classes)], name='y-input')

print("img_holder.name = ", img_holder.name)
print("lbl_holder.name = ", lbl_holder.name)

# The MLP
nn_nmist = NN_MNIST_TF_1h(img_holder = img_holder, classes = classes, nb_nodes_h1 = nb_nodes_h1)

# The loss function and the optimizer
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=nn_nmist.out_layer, labels=lbl_holder))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=step_var)
img_holder.name =  x-input:0
lbl_holder.name =  y-input:0
WARNING:tensorflow:From <ipython-input-13-bd60e9d87cfb>:12: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

5.3. Prepare The Training: The Training and Testing Metrics

We create a subgraph for evaluating the training and testing accuracies. The accuracy will be also displayed in Tensorboard as accuracy:

In [14]:
# accuracy subgraph
with tf.name_scope('accuracy'):
    correct_prediction = tf.equal(tf.argmax(nn_nmist.out_layer, 1), tf.argmax(lbl_holder, 1), name='correct_prediction')
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32), name = 'accuracy')

5.4. Prepare The Training: The Summaries Writers

Now let us define and configure the summaries writers: the objects responsible to write summaries to the tracking files used by Tensorboard to display data:

In [15]:
# the summaries: associated to subgrahs of the big graph

tf.summary.scalar('loss', loss) # loss summary for tensorboard
tf.summary.scalar('accuracy', accuracy)  # remember the accuracy subgraph

# Merge all the summaries 
merged_op = tf.summary.merge_all()
train_writer = tf.summary.FileWriter(os.getcwd() + '/log_train', graph=tf.get_default_graph())
test_writer = tf.summary.FileWriter(os.getcwd() + '/log_test', graph=tf.get_default_graph())

6. Train and Test

Train and test the model. Also write summaries every test_report_step_interval steps by measuring the accuracy:

In [16]:
with tf.Session() as sess:

    start_time = time.time()
    sess.run(tf.global_variables_initializer())

    # for the checkpoints
    if not os.path.exists(checkpoints_dir):
        os.makedirs(checkpoints_dir)
    
    # the checkpoints saver
    chk_saver = tf.train.Saver()
    
    step = 1
    for epoch in tnrange(num_epochs , desc = "Epochs: "):
        for batch in tnrange(num_batches, desc = "Batchs: (for epoch = {}) ".format(epoch)):
            
            # Report the test accuracy
            # run optimizer on feed_dict = (all) test data
            if step % test_report_step_interval == 0:
                _, summary, step, acc = sess.run([optimizer, merged_op, step_var, accuracy], 
                                                feed_dict={img_holder: datasets.test.images.astype(float), 
                                                            lbl_holder: datasets.test.labels.astype(int)})
                test_writer.add_summary(summary, global_step=step)
                test_writer.flush()
                            
            # Report the training accuracy
            # run optimizer on next batch of train data
            else:
                
                img_batch, lbl_batch = datasets.train.next_batch(batch_size)
                _, summary, step = sess.run([optimizer, merged_op, step_var], 
                                            feed_dict={img_holder: img_batch.astype(float), 
                                                       lbl_holder: lbl_batch.astype(int)})
                train_writer.add_summary(summary, global_step=step)
                train_writer.flush()

            # checkpointing and saving every X steps
            if step % checkpoints_saving_step == 0:
                chk_saver.save(sess, os.path.join(checkpoints_dir, "mnist-mlp-session-last.ckpt"))
                tf.logging.info("[INFO] Saved checkpoints for step = %d", step)
                
    # always checkpointing the last version
    chk_saver.save(sess, os.path.join(checkpoints_dir, "mnist-mlp-session-last.ckpt"))
    tf.logging.info("[INFO] Saved checkpoints for final session state")
        
    end_time = time.time()
    tf.logging.info("[INFO] Trained in: %d", (end_time - start_time)) 
    
    # accuracy: execute the subgraph
    acc_full_value = sess.run(accuracy, feed_dict={img_holder: datasets.test.images.astype(float), lbl_holder: datasets.test.labels.astype(int)})
    acc_str_trunc_value = "%0.2f" % (acc_full_value*100)
    tf.logging.info("[INFO] Success rate (e.g. validation accuracy): %f (or %s%%)", acc_full_value, acc_str_trunc_value)
    
INFO:tensorflow:[INFO] Saved checkpoints for step = 400
INFO:tensorflow:[INFO] Saved checkpoints for step = 800

INFO:tensorflow:[INFO] Saved checkpoints for final session state
INFO:tensorflow:[INFO] Trained in: 135
INFO:tensorflow:[INFO] Success rate (e.g. validation accuracy): 0.956300 (or 95.63%)

6.1. What happened?

At each epoch all the training data are "run" by the optimizer as small batches of size batch_size. Then for each batch the "run" on training or testing data is performed as follows:

  • 1st case: if the current step is a multiple of test_report_step_interval then
    • The model accuracy is evaluated here (e.g. sess.run([..., accuracy]) using all the testing (e.g. feed_dist = datasets.test)
    • When done the summaries are reported using the test_writer summary writer object
  • 2nd case: if the current step is not a multiple of test_report_step_interval then
    • The optimizer graph is evaluated using a small part of the training data: the next batch (e.g. feed_dict = datasets.train.next_batch(batch_size))

6.1.1. Monitoring The Training Progression

In order to visualize the training progression, the tqdm.tnrange function of the tqdm lib is used. This function displays a pretty and Jupyter notebook compliant version of the progression bar. The following Figure illustrates what is displayed in the notebook during the training:

Figure 1. The Training Progression Bar using tqdm.

6.1.2. Monitoring The Training and Testing Progression

Here a design decision is taken ; and it consists to compute both train and validation errors "in parallel". The idea is to evaluate the model accuracy on testing data every X steps. This is performed like this because we want to display both the training and validation error uning Tensorboard (e.g. through train_writer and test_writer) during the training phase that can be long.

At the end of the firt training session, one can find in the same forlder of the notebook two subfolders - namely log_train, and log_test that contains the summary informations necessary for tensorboard. The reader can run Tensorboard as follows:

> tensorboard --logdir=log_train:./log_train,log_test:./log_test

The console will either launch a browser instance on the URL http://127.0.0.1:6006 or invite the user to browse it. Figure 2 illustrates what is displayed by Tensorboard when browsing the previous url:

Figure 2. The Training and Validation Errors in Tensorboard.

But, by clicking around in TensorBoard the reader will realizes that in fact there are many different diagrams that have been generated:

  • 26 scalars diagrams: 1 validation accuracy diagram, 1 training loss diagram, 8 diagrams for the hidden layers weights and biases, and so forth
  • 1 computation graph for the whole model,
  • 12 histograms diagrams: 4 histograms per layer (e.g. input_layer, hidden_layer, and output_layer)
  • ...

The following Figures grid illustrates some of them:

Figure 3. Validation accuracy diagrams. Figure 4. Hidden Layer Weights and Biases diagrams. Figure 5. Model Computation Graph.
Figure 6. Histogram for Hidden Layer.

Interpreting The Diagrams

Displaying many metrics is one thing ; but interpreting them to improve the model training and validation performances is another exercise. Unfortunately it is beyond the scope of this Part 1. The next Parts will deal with this topic.

7. Conclusions

In this article the reader has learned how to: (1) Create from scratch a simple NN (aka MLP) using raw TF by handling himself/herself the weights and biases, (2) Train this MLP using some hyperparameters and defining some metrics like the mean accuracy on the testing data, (3) Display many metrics and parameters using Tensorboard.

It is obvious that we have produced many many code compared to higher level frameworks like Keras that let one implements the same NN using ten times less code. But, here we can see and understand exactly what happens under the cover. Which is not the case with higher level frameworks.

We have also implemented a first (and light) attempt of Object Orientation or encapsulation of the TF code behind a well-defined class that represents the MLP. One should go further in this way to hide all the machinery behind well designed functions or methods.

To finish our simple MLP produced an mean accuracy around 95% (using only 2 epochs and without any fine tuning session) which is a quite acceptable score. Indeed, this MLP has no convulution layer, nor any other CNN optimization mechanism for performance improvement.

The next articles of this series will try to go further and fix many of the aforementioned limitations.