## Part 3 – A simple neural network with TensorFlow

# First steps with TensorFlow – Part 3

In the third part of the series *"First steps with TensorFlow"*, I will show how to build a very simple neural network.

The goal is the same as in First steps with TensorFlow – Part 2: we want to classify the flowers of the iris dataset.

## A *quick* review of neural networks

A full discussion of what neural networks (NN) are and how they work is well beyond the scope of this blog post. Nonetheless, I will review a few topics needed to follow this post.

Neural networks were originally inspired by the connections between neurons in the brain. A neural network consists of a number of layers, and each layer consists of a number of units (or neurons). Each neuron processes the information it receives and then transmits the result to the neurons of the next layer.

The question people most often ask themselves when implementing a NN is how to choose the number of hidden layers and hidden neurons. There is no precise rule: unlike the input and output layers, whose sizes are fixed by the data, the hidden layers depend strongly on the problem. In particular:

- *Input layer*: the input layer is where the network starts. The number of units in this layer is fixed and corresponds exactly to the number of input features.
- *Hidden layers*: the number of hidden layers and the number of units per layer are the *free* parameters that one has to choose. There is no rule for fixing them; they depend strongly on the problem.
- *Output layer*: the output layer is where the network ends and the predictions are given. In a classification problem, like the one we are facing, the number of units in the output layer corresponds exactly to the number of classes.
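For the iris data these rules fix the two ends of the network: 4 input units (one per feature) and 3 output units (one per class), while the hidden sizes are a free choice. The short Python sketch below plugs in the hidden sizes 10, 20 and 10 adopted later in this post and counts the resulting trainable parameters. `count_parameters` is a hypothetical helper written for illustration, not a TensorFlow function.

```python
def count_parameters(layer_sizes):
    """Hypothetical helper: trainable parameters of a fully connected network.

    Each pair of adjacent layers (n_in -> n_out) contributes an n_in x n_out
    weight matrix plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


n_features = 4          # fixed by the data: sepal/petal length and width
hidden = [10, 20, 10]   # free choice (the sizes used later in this post)
n_classes = 3           # fixed by the data: three iris species

sizes = [n_features] + hidden + [n_classes]
print(sizes)                    # [4, 10, 20, 10, 3]
print(len(sizes) - 1)           # 4 weight matrices between 5 layers
print(count_parameters(sizes))  # 513
```

Only the middle entries of `sizes` are up to us; changing them changes the parameter count, but not the first and last entries.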

## Training of a neural network

Once the architecture of a NN has been set, i.e. the number and type of hidden layers have been defined, we can proceed to train the neural network.

### Feedforward

The input features are fed to the input layer. From there, the values propagate through all the hidden layers up to the output layer, where the predictions are produced. Given a layer \(i\) with values \(x_i\), the values \(h_j\) of the next layer \(j\) are:

\( h_j = f(W_{j,i}\,x_i + b_{j,i}) \)

where \(f\) is the activation function, \(W_{j,i}\) is the weight matrix and \(b_{j,i}\) the bias. The activation function is what *activates* a neuron; it can take several forms, but in most cases it is either a Rectified Linear Unit (ReLU), \(f(z) = \max(0, z)\), or a logistic function, \(f(z) = 1/(1+e^{-z})\).
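A small worked example of this formula, with two units per layer and arbitrarily chosen numbers, using either the ReLU \(\max(0,z)\) or the logistic function \(\sigma(z) = 1/(1+e^{-z})\) as \(f\):

\[
\begin{aligned}
W_{j,i} &= \begin{pmatrix} 1 & -1 \\ 0.5 & 2 \end{pmatrix}, \quad
x_i = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad
b_{j,i} = \begin{pmatrix} 0 \\ -1 \end{pmatrix} \\
W_{j,i}\,x_i + b_{j,i} &= \begin{pmatrix} 1\cdot 1 - 1\cdot 2 + 0 \\ 0.5\cdot 1 + 2\cdot 2 - 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 3.5 \end{pmatrix} \\
h_j^{\text{ReLU}} &= \begin{pmatrix} \max(0,-1) \\ \max(0,3.5) \end{pmatrix} = \begin{pmatrix} 0 \\ 3.5 \end{pmatrix}, \qquad
h_j^{\text{logistic}} = \begin{pmatrix} \sigma(-1) \\ \sigma(3.5) \end{pmatrix} \approx \begin{pmatrix} 0.269 \\ 0.971 \end{pmatrix}
\end{aligned}
\]

The ReLU switches the first unit off entirely, while the logistic function squashes both values into the interval (0, 1).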

Hence, between two adjacent layers there is always a weight matrix, which is responsible for *transmitting* the information. Therefore, an N-layer NN has N-1 weight matrices, and the values of the *j-th* layer depend on all the *j-1* weight matrices that precede it.
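The feedforward pass can be sketched in NumPy as a loop over the N-1 weight matrices. This is a minimal sketch under simplifying assumptions: random weights, zero biases, ReLU on every layer including the last (the network built later in this post uses a softmax output instead), and column vectors as in the formula above, hence `W @ h`. The function name `forward` is illustrative.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def forward(x, weights, biases, f=relu):
    """Apply h_j = f(W_{j,i} x_i + b_{j,i}) layer after layer.

    `weights` and `biases` hold the N-1 parameter sets of an N-layer network,
    so the values of the last layer depend on every matrix in the list.
    """
    h = x
    for W, b in zip(weights, biases):
        h = f(W @ h + b)
    return h


rng = np.random.default_rng(0)
sizes = [4, 10, 20, 10, 3]  # N = 5 layers: input, three hidden, output
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

x = rng.normal(size=sizes[0])
out = forward(x, weights, biases)
print(len(weights), out.shape)  # 4 (3,)
```

Each matrix has shape (units of the next layer, units of the current layer), which is what makes the chain of products well defined.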

### Cost function

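The output layer delivers a prediction, which must be compared to the true values. This comparison is quantified by a cost (or loss) function, which estimates the error. Naturally, one wants to minimize the error, i.e. optimize the parameters of the problem: the weight matrices (and the biases). One of the best-known optimization algorithms is gradient descent, which repeatedly moves every parameter a small step, scaled by the learning rate, in the direction opposite to the gradient of the cost.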
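A minimal NumPy sketch of gradient descent on a cross-entropy cost. To keep it short, it assumes the simplest possible model, a single softmax layer with no hidden layers, trained on made-up toy data. For this model the gradient of the mean cross-entropy with respect to the layer outputs (the logits) has the closed form \((P - Y)/n\), so no backpropagation is needed yet.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(probs, y_onehot):
    """Mean over the examples of -sum_k y_k * log(p_k)."""
    return -np.mean(np.sum(y_onehot * np.log(probs), axis=1))


# Made-up toy data: 6 examples, 2 features, 3 classes
X = np.array([[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1, 2, 2])
Y = np.eye(3)[labels]            # one-hot labels

W = np.zeros((2, 3))             # weights of the single softmax layer
b = np.zeros(3)
learning_rate = 0.5
costs = []
for step in range(200):
    P = softmax(X @ W + b)                        # forward pass: class probabilities
    costs.append(cross_entropy(P, Y))             # cost before this update
    grad_logits = (P - Y) / len(X)                # dC/dlogits for softmax + cross-entropy
    W -= learning_rate * X.T @ grad_logits        # gradient descent step on W ...
    b -= learning_rate * grad_logits.sum(axis=0)  # ... and on b

print(costs[0], costs[-1])
```

The recorded cost starts at \(\ln 3\) (uniform predictions over three classes) and decreases at every step, since the learning rate is small enough for this data.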

### Backpropagation

The cost function is an estimate of the error of the output layer only, but the parameters to be optimized are all the N-1 weight matrices. The optimizer therefore needs the gradient of the cost with respect to every one of them, and these gradients are computed by the so-called **backpropagation** algorithm.

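Backpropagation propagates the error backwards, from the output layer towards the input layer, in order to obtain an estimate of the error for every layer. These per-layer error estimates give the gradient of the cost with respect to each weight matrix, so that every matrix can be updated by the optimizer: this is how the NN *learns*. Since the error of a layer is computed from the error of the layer after it, the whole computation is also efficient.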

Mathematically, this works because the cost depends on the weight matrix of a given layer only through all the layers that follow it; the error estimate of a layer is therefore obtained by applying the chain rule for derivatives, layer by layer, starting from the output.
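A minimal NumPy sketch of backpropagation for a network with one hidden layer. For brevity it assumes a squared-error cost \(C = \tfrac12\lVert z - t\rVert^2\) instead of the cross-entropy, random weights and an illustrative target `t`. The backward pass computes the error of the output layer first and then propagates it through `W2` and the ReLU to obtain the gradient for `W1`; a central finite-difference check confirms the chain-rule result.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative network: x (3 values) -> hidden ReLU layer (4 units) -> linear output z (2 values)
x = rng.normal(size=3)
t = np.array([1.0, -1.0])              # target values
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)


def cost(W1, b1, W2, b2):
    """Squared-error cost C = 0.5 * ||z - t||^2 of the forward pass."""
    h = np.maximum(0.0, W1 @ x + b1)
    z = W2 @ h + b2
    return 0.5 * np.sum((z - t) ** 2)


# Forward pass, keeping the intermediate values needed by the backward pass
a1 = W1 @ x + b1
h = np.maximum(0.0, a1)
z = W2 @ h + b2

# Backward pass: chain rule, from the output layer back towards the input
delta2 = z - t                         # dC/dz: error of the output layer
grad_W2 = np.outer(delta2, h)          # dC/dW2
grad_b2 = delta2                       # dC/db2
delta1 = (W2.T @ delta2) * (a1 > 0)    # error of the hidden layer (through W2 and the ReLU)
grad_W1 = np.outer(delta1, x)          # dC/dW1
grad_b1 = delta1                       # dC/db1

# Compare dC/dW1 with central finite differences
eps = 1e-6
num_W1 = np.zeros_like(W1)
for idx in np.ndindex(*W1.shape):
    W_plus, W_minus = W1.copy(), W1.copy()
    W_plus[idx] += eps
    W_minus[idx] -= eps
    num_W1[idx] = (cost(W_plus, b1, W2, b2) - cost(W_minus, b1, W2, b2)) / (2 * eps)

max_error = np.max(np.abs(num_W1 - grad_W1))
print(max_error)
```

Note how `delta1` reuses `delta2`: the error of the hidden layer is obtained from the error of the layer after it, which is exactly the backward propagation described above.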

### Importing the data

First of all, we import the packages we need and the data. I will not describe the data import in detail, because it has already been described in First steps with TensorFlow – Part 2, but I include the code for completeness.

The iris dataset is well known: it has 150 examples with 4 features each, 3 iris classes and 50 examples per class.

```python
# Import packages
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

# Import data
data = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data",
                   sep=",",
                   names=["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_class"])

# Shuffle data
data = data.sample(frac=1).reset_index(drop=True)

# Then split `x`, whose columns are scaled to the range [0, 1], and `y`, one-hot encoded
all_x = data[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
min_max_scaler = preprocessing.MinMaxScaler()
all_x = min_max_scaler.fit_transform(all_x)
all_y = pd.get_dummies(data.iris_class)

# ... and split training and test set
train_x, test_x, train_y, test_y = train_test_split(all_x, all_y, test_size=1 / 3)

# Check the dimensions
print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

# and define number of features, n_x, and number of classes, n_y
n_x = np.shape(train_x)[1]
n_y = np.shape(train_y)[1]
```

```
(100, 4)
(100, 3)
(50, 4)
(50, 3)
```
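The two preprocessing steps in the import code can be spelled out in plain NumPy. This sketch uses three illustrative rows, one per class, and reproduces only the default behaviour of `MinMaxScaler` (each column mapped linearly onto [0, 1]) and of `pd.get_dummies` (one indicator column per class, classes in sorted order), not their full APIs.

```python
import numpy as np

# Three illustrative rows (sepal length, sepal width, petal length, petal width, in cm)
X = np.array([[5.1, 3.5, 1.4, 0.2],
              [7.0, 3.2, 4.7, 1.4],
              [6.3, 3.3, 6.0, 2.5]])
labels = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica"])

# Min-max scaling: each column is mapped linearly onto [0, 1]
col_min, col_max = X.min(axis=0), X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)

# One-hot encoding: one indicator column per class, classes in sorted order
classes, idx = np.unique(labels, return_inverse=True)
Y = np.eye(len(classes))[idx]

print(X_scaled)
print(classes)
print(Y)
```

With the full dataset, the same operations give 4 scaled feature columns and 3 one-hot columns, matching `n_x = 4` and `n_y = 3`.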

### Definition of a NN

First of all, we define the placeholders for `x` and `y` and the learning rate, and we start defining a graph, as we did for the logistic regression.

For an introduction to graphs and sessions in TensorFlow, see First steps with TensorFlow – Part 1.

```python
# Reset graph
tf.reset_default_graph()

# Define learning rate
learning_rate = 0.01

# Start graph definition...
g = tf.Graph()
# ... and placeholders
with g.as_default():
    x = tf.placeholder(tf.float32, [None, n_x], name="x")
    y = tf.placeholder(tf.float32, [None, n_y], name="y")
```

Within the graph we define the NN. For this purpose we use the package `tf.contrib.layers`, which provides several layer types (e.g. fully connected, convolutional, flatten, …).

The NN we are defining will be composed only of fully connected layers.

*Input layer*

We start by defining the input layer. The input layer has `n_x = 4` units, and its number of output units is set to 10, i.e. the number of units of the first hidden layer.

```python
# Define the number of neurons for each hidden layer
h1 = 10
h2 = 20
h3 = 10

# From input to 1st hidden layer
with g.as_default():
    fully_connected1 = tf.contrib.layers.fully_connected(inputs=x, num_outputs=h1,
                                                         activation_fn=tf.nn.relu, scope="Fully_Conn1")
```

*Hidden layers*

In this case the NN has three hidden layers with 10, 20, and 10 units respectively. Each hidden layer takes as input the output of the previous one.

```python
# From 1st to 3rd hidden layer, through the 2nd
with g.as_default():
    fully_connected2 = tf.contrib.layers.fully_connected(inputs=fully_connected1, num_outputs=h2,
                                                         activation_fn=tf.nn.relu, scope="Fully_Conn2")
    fully_connected3 = tf.contrib.layers.fully_connected(inputs=fully_connected2, num_outputs=h3,
                                                         activation_fn=tf.nn.relu, scope="Fully_Conn3")
```

*Output layer*

Finally, the output layer takes the output of the third hidden layer as input and makes the prediction. Since there are three classes, this layer has `n_y = 3` units.

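```python
# From 3rd hidden layer to output: no activation, so the layer returns unscaled logits
with g.as_default():
    logits = tf.contrib.layers.fully_connected(inputs=fully_connected3, num_outputs=n_y,
                                               activation_fn=None, scope="Out")
    # Class probabilities, used for the predictions
    prediction = tf.nn.softmax(logits, name="Prediction")
```

Then we define the cost function. `tf.losses.softmax_cross_entropy` applies the softmax internally, so it receives the raw `logits` rather than `prediction`; passing `prediction` would apply the softmax twice.

```python
# Cost function
with g.as_default():
    cost = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=logits, scope="Cost_Function")
```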
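`tf.losses.softmax_cross_entropy` applies a softmax to its `logits` argument internally, so the tensor passed to it should be the unscaled output of the last layer, not probabilities. The NumPy sketch below re-implements that loss for a single example (`softmax_cross_entropy` here is a hypothetical helper for illustration, not TensorFlow's code) to show the effect of applying the softmax twice.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def softmax_cross_entropy(onehot, logits):
    """Cross-entropy with the softmax applied to `logits` inside, as the TF loss does."""
    return -np.sum(onehot * np.log(softmax(logits)))


y = np.array([1.0, 0.0, 0.0])                   # true class: 0

raw_logits = np.array([10.0, 0.0, 0.0])         # confident and correct unscaled output
print(softmax_cross_entropy(y, raw_logits))     # close to 0

perfect_probs = np.array([1.0, 0.0, 0.0])       # probabilities passed as if they were logits
print(softmax_cross_entropy(y, perfect_probs))  # about 0.551, i.e. ln(e + 2) - 1
```

With three classes, \(\ln(e+2)-1\) is in fact the smallest value the loss can take when probabilities are passed as logits, so the cost can never approach zero, however well the network classifies.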

Moreover, we define the accuracy estimator and the Adagrad optimizer.

```python
# Accuracy estimator and optimizer
with g.as_default():
    correct_prediction = tf.equal(tf.argmax(prediction, 1, name="Argmax_Pred"),
                                  tf.argmax(y, 1, name="Y_Pred"), name="Correct_Pred")
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32, name="Cast_Corr_Pred"),
                              name="Accuracy")
    optimizer = tf.train.AdagradOptimizer(learning_rate, name="Optimizer").minimize(cost)
```
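Both quantities can be mirrored in NumPy. The first part reproduces the `argmax` comparison and the mean of the cast booleans on made-up predictions. The second part is a sketch of the basic Adagrad update rule (accumulate the squared gradients, divide each step by their square root), applied to a hypothetical two-parameter quadratic cost; the accumulator starts at 0.1, which I believe matches the default `initial_accumulator_value` of `tf.train.AdagradOptimizer`. `adagrad_minimize` and `grad` are illustrative names.

```python
import numpy as np

# --- Accuracy: fraction of rows whose arg-max prediction matches the arg-max label ---
prediction = np.array([[0.1, 0.7, 0.2],   # made-up class probabilities
                       [0.8, 0.1, 0.1],
                       [0.2, 0.3, 0.5]])
y = np.array([[0, 1, 0],                  # one-hot labels
              [0, 0, 1],
              [0, 0, 1]])
correct_prediction = np.argmax(prediction, 1) == np.argmax(y, 1)
accuracy = np.mean(correct_prediction.astype(np.float32))
print(correct_prediction, accuracy)       # 2 of 3 correct -> accuracy 2/3


# --- Adagrad: per-parameter step sizes that shrink with the accumulated squared gradients ---
def adagrad_minimize(grad_fn, w, learning_rate, steps, initial_accumulator=0.1):
    accum = np.full_like(w, initial_accumulator)
    for _ in range(steps):
        g = grad_fn(w)
        accum += g ** 2                              # running sum of squared gradients
        w = w - learning_rate * g / np.sqrt(accum)   # scaled gradient step
    return w


def grad(w):
    # Gradient of the hypothetical cost C(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, minimum at (3, -1)
    return np.array([2 * (w[0] - 3), 20 * (w[1] + 1)])


w_opt = adagrad_minimize(grad, np.zeros(2), learning_rate=0.5, steps=2000)
print(w_opt)
```

Because the second parameter sees gradients ten times larger, its accumulator grows faster and its effective step shrinks accordingly; this per-parameter scaling is what distinguishes Adagrad from plain gradient descent.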

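The graph we have defined is quite complex, since TensorFlow adds several auxiliary operations for every layer of the NN (e.g. the weight and bias variables with their initializers, the matrix multiplication and the bias addition). With `g.get_operations()` one can list all the registered operations.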

### Training of a NN

Once the graph definition is complete, we can start a session and optimize. Since we are using a notebook, it is more convenient to use an interactive session; see First steps with TensorFlow – Part 1 for details.

```python
# Start the session
sess = tf.InteractiveSession(graph=g)
```

```python
# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)
```

Let’s optimize

```python
# Train for a number of epochs
training_epochs = 3000
for epoch in range(training_epochs):
    # One Adagrad step per epoch, computed on the whole training set at once
    _, c = sess.run([optimizer, cost], feed_dict={x: train_x, y: train_y})
```

Now we can check which test examples are predicted correctly:

```python
correct_prediction.eval({x: test_x, y: test_y})
```

array([ True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, False, True, True, True], dtype=bool) #

And finally evaluate the accuracy

```python
# Evaluate accuracy
print("Accuracy:", accuracy.eval({x: test_x, y: test_y}))
```

```
Accuracy: 0.98
```

Hint: try changing the number of epochs to see how the results change.

Once the optimization is finished, it can be useful and interesting to first display the trainable variables of each layer and then to extract the values of the weights:

```python
# Display layers
layers = {v.op.name: v for v in tf.trainable_variables()}
print(layers)
```

```
{'Fully_Conn3/weights': <tensorflow.python.ops.variables.Variable object at 0x00000200ABDA9080>,
 'Out/weights': <tensorflow.python.ops.variables.Variable object at 0x00000200ABDA90B8>,
 'Fully_Conn1/weights': <tensorflow.python.ops.variables.Variable object at 0x00000200ABD42BA8>,
 'Fully_Conn2/weights': <tensorflow.python.ops.variables.Variable object at 0x00000200ABD75A58>,
 'Out/biases': <tensorflow.python.ops.variables.Variable object at 0x00000200ABDA93C8>,
 'Fully_Conn3/biases': <tensorflow.python.ops.variables.Variable object at 0x00000200ABDA9048>,
 'Fully_Conn1/biases': <tensorflow.python.ops.variables.Variable object at 0x00000200ABD4E0B8>,
 'Fully_Conn2/biases': <tensorflow.python.ops.variables.Variable object at 0x00000200ABD75A20>}
```

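```python
# Display weights: pick the weight matrix of the 2nd hidden layer...
weights = [layer for layer in tf.trainable_variables() if layer.op.name == 'Fully_Conn2/weights'][0]
# ... and evaluate it into a NumPy array of shape (h1, h2) = (10, 20)
weights_list = weights.eval()
```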

… or to display the output of the intermediate layers:

```python
# Display output of the 1st hidden layer: a single evaluation is enough, and only `x` is needed
out_fc1 = sess.run(fully_connected1, feed_dict={x: train_x})
print(out_fc1.shape)  # (100, 10): one row per training example, one column per unit
```

### A NN in TensorBoard

We now have a general understanding of how a NN is implemented in TensorFlow. The next step is to display the graph, the accuracy and the loss in TensorBoard.

The following code repeats what we have described so far, adding the summaries that record the results for TensorBoard.

# Define a path where the TensorBoard file will be saved logs_path = "./demo/nn/" # # Same code as above g_tb = tf.Graph() # with g_tb.as_default(): x = tf.placeholder(tf.float32, [None, n_x], name="x") y = tf.placeholder(tf.float32, [None, n_y], name="y") with tf.name_scope('Neural_Nt'): fully_connected1 = tf.contrib.layers.fully_connected(inputs=x, num_outputs=10, activation_fn=tf.n