First steps with TensorFlow – Part 3

Part 3 – A simple neural network with TensorFlow

In the third part of the series "First steps with TensorFlow" I will show how to build a very simple neural network.

The goal is the same as in First steps with TensorFlow – Part 2, i.e. we want to classify the flowers in the iris dataset.

A quick review of neural networks

The full discussion of what neural networks (NN) are and how they work is well beyond the purpose of this blog post. Nonetheless, I will review a number of topics necessary for the comprehension of this post.

Neural networks were designed to mimic the connections between neurons in a brain. A neural network consists of a number of layers, and each layer consists of a number of units (or neurons). The task of every neuron is to process the information it receives and then transmit it to the neurons in the next layer.

The most frequent question when implementing a NN is how to choose the number of hidden layers and hidden neurons. There is no precise rule: in general the number of hidden layers depends strongly on the problem, unlike the sizes of the input and output layers, which are fixed by the data. In particular:

  • Input layer: The input layer is where the network starts. The number of units in this layer is fixed and corresponds exactly to the number of input features.
  • Hidden layers: The number of hidden layers and the number of units per layer are free parameters that one has to choose. There is no rule for deciding them; the choice depends strongly on the problem.
  • Output layer: The output layer is where the network ends and the predictions are given. In the case of a classification problem, like the one we are facing, the number of units in the output layer corresponds exactly to the number of classes.

Training of a neural network

Once the architecture of a NN has been set, i.e. the number and type of hidden layers have been defined, we can proceed to the training of the neural network.


The input features are fed to the input layer. From there the information flows through all the hidden layers until it reaches the output layer, where the predictions are produced. Given a layer \(i\) and its values \(x_i\), we can write the values \(h_j\) of the next layer \(j\) as follows:

\( h_j = f(W_{j,i}x_i+ b_{j,i})\)

where \(f\) is the activation function, \(W_{j,i}\) is the weight matrix and \(b_{j,i}\) the bias. The activation function determines how strongly a neuron fires for a given input; it can take several forms, but in most cases it is either a Rectified Linear Unit (ReLU) or a logistic function.

Hence, between two adjacent layers there is always a weight matrix which is responsible for transmitting the information. Therefore an N-layer NN has N-1 weight matrices, and the values of the j-th layer are a function of all the weight matrices of the previous layers.
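
To make the formula concrete, here is a minimal NumPy sketch of the forward pass. The layer sizes, the random weights and the helper names `relu` and `forward` are my own choices for illustration; they are not part of the TensorFlow code used later in this post.

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: negative values are set to zero
    return np.maximum(z, 0.0)

def forward(x, weights, biases, activation=relu):
    # Apply h_j = f(W x_i + b) layer by layer and keep every layer's values
    values = [x]
    for W, b in zip(weights, biases):
        values.append(activation(W @ values[-1] + b))
    return values

# Hypothetical 3-layer network: 4 inputs, 10 hidden units, 3 outputs
layer_sizes = [4, 10, 3]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

x = np.array([0.2, 0.6, 0.1, 0.9])   # one made-up example with 4 features
values = forward(x, weights, biases)

print(len(weights))                  # N-1 weight matrices for N layers
print([v.shape for v in values])     # one vector of values per layer
```

Note that the values of the last layer are computed from the values of the layer before, which in turn depend on the first weight matrix: this nesting is what the chain rule exploits during training.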

Cost function

The output layer delivers a prediction, which must be compared to the real values. The cost function quantifies this comparison and gives an estimate of the error. Of course one wants to minimize the error, and hence optimize the parameters of the problem, i.e. the weight matrices. One of the best-known optimization algorithms is gradient descent.
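
As a rough illustration of gradient descent, the sketch below minimizes a toy one-dimensional cost \(J(w) = (w-3)^2\). The cost, the learning rate and the function name `gradient_descent` are assumptions made only for this example; a real NN has many parameters, but the update rule is the same.

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    # Repeatedly move the parameter against the gradient of the cost
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

# Toy cost J(w) = (w - 3)^2, whose gradient is 2 (w - 3)
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)   # ends up very close to the minimum at w = 3
```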


The cost function described is an estimate of the error only of the output layer, but the parameters to be optimized are all the N-1 weight matrices. Therefore, the optimization of a NN is pursued not only through the optimization algorithm, but also by so-called backpropagation.

Backpropagation consists in propagating the error backwards from the output layer towards the input layer, in order to obtain an estimate of the error for every layer. These error estimates allow us to optimize each weight matrix individually, and therefore the NN learns (or converges) more quickly.

From the mathematical point of view, this is a consequence of the fact that the output of a layer depends on the weight matrices of all the previous layers; hence the estimate of the error of a layer is obtained by simply applying the chain rule for derivatives.
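
The chain rule can be checked by hand on a tiny network. The following sketch is only an illustration under my own assumptions: one hidden layer with a logistic activation, a squared-error cost, random weights and made-up data. It computes the gradients by backpropagation and compares one of them with a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(params, x, target):
    W1, b1, W2, b2 = params
    # Forward pass
    h = sigmoid(W1 @ x + b1)
    out = W2 @ h + b2
    loss = 0.5 * np.sum((out - target) ** 2)
    # Backward pass: the error of each layer follows from the chain rule
    d_out = out - target               # error at the output layer
    dW2 = np.outer(d_out, h)
    d_h = W2.T @ d_out                 # error propagated back to the hidden layer
    d_z1 = d_h * h * (1.0 - h)         # through the derivative of the sigmoid
    dW1 = np.outer(d_z1, x)
    return loss, (dW1, d_z1, dW2, d_out)

rng = np.random.default_rng(0)
params = [rng.normal(size=(5, 4)), np.zeros(5),
          rng.normal(size=(3, 5)), np.zeros(3)]
x = np.array([0.2, 0.6, 0.1, 0.9])
target = np.array([1.0, 0.0, 0.0])

loss, grads = loss_and_grads(params, x, target)

# Central finite difference on a single entry of the first weight matrix
eps = 1e-6
plus = [p.copy() for p in params]
minus = [p.copy() for p in params]
plus[0][0, 0] += eps
minus[0][0, 0] -= eps
numeric = (loss_and_grads(plus, x, target)[0]
           - loss_and_grads(minus, x, target)[0]) / (2 * eps)

print(abs(numeric - grads[0][0, 0]) < 1e-5)
```

In practice TensorFlow derives these gradients automatically from the graph, so this is never written by hand.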

Importing the data

First of all, we import the packages we need and the data. I will not describe the data import in detail here, because it has already been described in First steps with TensorFlow – Part 2, but I include the code for completeness.

The iris dataset contains 150 examples in 3 iris classes, i.e. 50 examples per class.

# Import packages
import tensorflow as tf
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
# Import data
data = pd.read_csv("", sep=",",
                   names=["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_class"])
# Shuffle data
data = data.sample(frac=1).reset_index(drop=True)
# Then split `x`, whose columns are normalized to 1, and `y`, one-hot encoded
all_x = data[["sepal_length", "sepal_width", "petal_length", "petal_width"]]
min_max_scaler = preprocessing.MinMaxScaler()
all_x = min_max_scaler.fit_transform(all_x)
all_y = pd.get_dummies(data.iris_class)
# ... and split training and test set
train_x, test_x, train_y, test_y = train_test_split(all_x, all_y, test_size=1 / 3)
# Define number of features, n_x, and number of classes, n_y
n_x = np.shape(train_x)[1]
n_y = np.shape(train_y)[1]
# Check the dimensions
print(np.shape(train_x))
print(np.shape(train_y))
print(np.shape(test_x))
print(np.shape(test_y))
(100, 4)
(100, 3)
(50, 4)
(50, 3)
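
For readers who want to see what the two preprocessing steps do, here is a small NumPy sketch of min-max scaling and one-hot encoding. The four rows and the variable names are made up for illustration; the code above uses `MinMaxScaler` and `pd.get_dummies` for the same purpose.

```python
import numpy as np

# Toy data: 4 examples, 2 features (hypothetical values)
features = np.array([[5.1, 3.5],
                     [4.9, 3.0],
                     [6.3, 3.3],
                     [5.8, 2.7]])
labels = np.array(["setosa", "setosa", "virginica", "versicolor"])

# Min-max scaling: every column is mapped onto the interval [0, 1]
col_min = features.min(axis=0)
col_max = features.max(axis=0)
scaled = (features - col_min) / (col_max - col_min)

# One-hot encoding: one column per class, a single 1 per row
classes, class_index = np.unique(labels, return_inverse=True)
one_hot = np.eye(len(classes))[class_index]

print(scaled)
print(classes)
print(one_hot)
```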

Definition of a NN

First of all we define the placeholders for x and y, the learning rate, and we start defining a graph, as we did for the logistic regression.

For an introduction to graphs and sessions in TensorFlow read the First steps with TensorFlow – Part 1.

# Reset graph
tf.reset_default_graph()
# Define learning rate
learning_rate = 0.01
# Start graph definition...
g = tf.Graph()
# ... and placeholders
with g.as_default():
    x = tf.placeholder(tf.float32, [None, n_x], name="x")
    y = tf.placeholder(tf.float32, [None, n_y], name="y")

Within the graph we define the NN. For this purpose we use the package tf.contrib.layers. In this package one can find several layer types (e.g. fully connected, convolutional, flatten, …).

The NN we are defining will be composed only of fully connected layers.

Input layer

We can start defining the input layer. The input layer has n_x = 4 units and the number of output units is set to 10, i.e. this is the number of units of the first hidden layer.

# Define the number of neurons for each hidden layer:
h1 = 10
h2 = 20
h3 = 10
# From input to 1st hidden layer
with g.as_default():
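    # NOTE: the end of this call was cut off in the original; the scope name is taken
    # from the variable list shown later, and the default ReLU activation is assumed
    fully_connected1 = tf.contrib.layers.fully_connected(inputs=x, num_outputs=h1,
                                                         scope="Fully_Conn1")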

Hidden layers

In this case the NN has three hidden layers with 10, 20, and 10 units respectively. Every hidden layer takes as input the output of the previous layer.

# From 1st to 3rd hidden layer, through the 2nd
with g.as_default():
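    # NOTE: the ends of these calls were cut off in the original; scope names are taken
    # from the variable list shown later, and the default ReLU activation is assumed
    fully_connected2 = tf.contrib.layers.fully_connected(inputs=fully_connected1, num_outputs=h2,
                                                         scope="Fully_Conn2")
    fully_connected3 = tf.contrib.layers.fully_connected(inputs=fully_connected2, num_outputs=h3,
                                                         scope="Fully_Conn3")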

Output layer

Finally, the output layer takes as input the output of the third hidden layer and makes the prediction. Therefore this layer has n_y = 3 units.

# From 3rd hidden layer to output
with g.as_default():
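    # NOTE: the end of this call was cut off in the original; the scope name is taken
    # from the variable list shown later. activation_fn=None is an assumption: the
    # softmax is applied by the cost function, so the output is left as raw logits
    prediction = tf.contrib.layers.fully_connected(inputs=fully_connected3, num_outputs=n_y,
                                                   activation_fn=None, scope="Out")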
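
As a quick sanity check on the size of this network, the sketch below counts the trainable parameters of a fully connected NN with layer sizes 4, 10, 20, 10 and 3. The helper name `count_parameters` is my own; each layer contributes one weight matrix and one bias vector.

```python
def count_parameters(sizes):
    # Each fully connected layer has n_in * n_out weights plus n_out biases
    per_layer = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        per_layer.append(n_in * n_out + n_out)
    return per_layer

layer_sizes = [4, 10, 20, 10, 3]   # input, three hidden layers, output
per_layer = count_parameters(layer_sizes)
total = sum(per_layer)

print(per_layer)   # parameters per layer
print(total)       # all trainable parameters of the network
```

With only 100 training examples, a few hundred parameters is already a lot, which is one reason to keep the hidden layers small for this dataset.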

Lastly, we define the cost function

# Cost function
with g.as_default():
    cost = tf.losses.softmax_cross_entropy(onehot_labels=y, logits=prediction, scope="Cost_Function")
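
To see what a softmax cross-entropy computes, here is a NumPy sketch. It is not TensorFlow's implementation: the function below is my own, and it assumes that the loss is averaged over the examples in the batch. The labels and logits are made up.

```python
import numpy as np

def softmax_cross_entropy(onehot_labels, logits):
    # Subtract the row maximum for numerical stability, then take the log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross-entropy per example, averaged over the batch
    return -np.mean(np.sum(onehot_labels * log_probs, axis=1))

onehot_labels = np.array([[1.0, 0.0, 0.0],
                          [0.0, 0.0, 1.0]])
uniform_logits = np.zeros((2, 3))              # the network has no preference
confident_logits = np.array([[5.0, 0.0, 0.0],
                             [0.0, 0.0, 5.0]])  # the network favours the right class

loss_uniform = softmax_cross_entropy(onehot_labels, uniform_logits)
loss_confident = softmax_cross_entropy(onehot_labels, confident_logits)
print(loss_uniform, loss_confident)
```

With uniform logits every class gets probability 1/3, so the loss equals log(3); the more confident and correct the logits, the smaller the loss.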

Moreover, we define the accuracy estimator and the Adagrad optimizer.

# Accuracy estimator and optimizer
with g.as_default():
    correct_prediction = tf.equal(tf.argmax(prediction, 1, name="Argmax_Pred"), tf.argmax(y, 1, name="Y_Pred"))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32, name="Cast_Corr_Pred"), name="Accuracy")

    optimizer = tf.train.AdagradOptimizer(learning_rate, name="Optimizer").minimize(cost)
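
The idea behind Adagrad is that every parameter gets its own effective learning rate, which shrinks as the squared gradients accumulate. The sketch below is a textbook form of the update written in NumPy, applied to a toy quadratic cost; the function name, the `eps` term and the numbers are assumptions for illustration, and TensorFlow's implementation differs in details such as the initial value of the accumulator.

```python
import numpy as np

def adagrad_step(w, grad, accumulator, learning_rate=0.01, eps=1e-8):
    # Accumulate squared gradients and scale the step for each parameter separately
    accumulator = accumulator + grad ** 2
    w = w - learning_rate * grad / (np.sqrt(accumulator) + eps)
    return w, accumulator

# Toy cost J(w) = sum((w - target)^2) with gradient 2 (w - target)
target = np.array([3.0, -2.0])
w = np.zeros(2)
accumulator = np.zeros(2)
initial_cost = np.sum((w - target) ** 2)

for _ in range(200):
    grad = 2.0 * (w - target)
    w, accumulator = adagrad_step(w, grad, accumulator, learning_rate=0.5)

final_cost = np.sum((w - target) ** 2)
print(initial_cost, final_cost)
```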

The graph that we have defined is quite complicated, since TensorFlow adds hidden ops to every layer of the NN. Using the command g.get_operations() one can check all the registered operations.

Training of a NN

Once the definition of the graph is completed, we can start a session and optimize. Since we are using a notebook it is better to use an interactive session, see First steps with TensorFlow – Part 1 for details.

# Start the session
sess = tf.InteractiveSession(graph = g)
# Initialize variables
init = tf.global_variables_initializer()
sess.run(init)

Let’s optimize

# Train for a number of epochs
training_epochs = 3000
for epoch in range(training_epochs):
    _, c = sess.run([optimizer, cost], feed_dict={x: train_x,
                                                  y: train_y})

Now we can see how many correct predictions we have

correct_prediction.eval({x: test_x, y: test_y})
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False,  True,  True,  True], dtype=bool)

And finally evaluate the accuracy

# Evaluate accuracy
print("Accuracy:", accuracy.eval({x: test_x, y: test_y}))
Accuracy: 0.98
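
The accuracy is simply the mean of the boolean array shown above, with True counted as 1 and False as 0. A minimal NumPy sketch, rebuilding that array by hand (50 test examples, one wrong prediction):

```python
import numpy as np

# Same pattern as the output above: 46 correct, 1 wrong, 3 correct
correct = np.array([True] * 46 + [False] + [True] * 3)

# Cast to float and average, as done by tf.cast and tf.reduce_mean in the graph
accuracy = correct.astype(np.float32).mean()
print(len(correct), accuracy)
```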

Hint: try changing the number of epochs to see how the results change.

Once the optimization is finished, it is useful and interesting to first display the trainable layers and then to save the weights

# Display layers
layers = {v.op.name: v for v in tf.trainable_variables()}
{'Fully_Conn3/weights': tensorflow.python.ops.variables.Variable object at 0x00000200ABDA9080, 
'Out/weights': tensorflow.python.ops.variables.Variable object at 0x00000200ABDA90B8, 
'Fully_Conn1/weights': tensorflow.python.ops.variables.Variable object at 0x00000200ABD42BA8, 
'Fully_Conn2/weights': tensorflow.python.ops.variables.Variable object at 0x00000200ABD75A58, 
'Out/biases': tensorflow.python.ops.variables.Variable object at 0x00000200ABDA93C8, 
'Fully_Conn3/biases': tensorflow.python.ops.variables.Variable object at 0x00000200ABDA9048, 
'Fully_Conn1/biases': tensorflow.python.ops.variables.Variable object at 0x00000200ABD4E0B8, 
'Fully_Conn2/biases': tensorflow.python.ops.variables.Variable object at 0x00000200ABD75A20}
# Display weights
weights = [layer for layer in tf.trainable_variables() if layer.op.name == 'Fully_Conn2/weights'][0]
weights_list =  weights.eval()

… or to display the output of the middle layers

# Display output of the 1st hidden layer
# (a single forward pass is enough, the weights do not change here)
out_fc1 = sess.run(fully_connected1, feed_dict={x: train_x})

A NN in TensorBoard

We now have a general understanding of how a NN is implemented in TensorFlow. The next step is to display the graph, accuracy and loss in TensorBoard.

The following code is what we have described so far, with the summaries added to record the results for TensorBoard.

# Define a path where the TensorBoard file will be saved
logs_path = "./demo/nn/"
# Same code as above
g_tb = tf.Graph()
with g_tb.as_default():
    x = tf.placeholder(tf.float32, [None, n_x], name="x")
    y = tf.placeholder(tf.float32, [None, n_y], name="y")
    with tf.name_scope('Neural_Nt'):
        fully_connected1 = tf.contrib.layers.fully_connected(inputs=x, num_outputs=10,