First steps with TensorFlow

TensorFlow is everywhere these days. It is becoming the library of choice for deep learning applications and, thanks to recent advances in hardware technology (TPU performance), may gain even more momentum in the near future.

The main driver for using TensorFlow is to build deep learning systems, and for an experienced developer it is tempting to dive right into the advanced stuff like CNNs and RNNs. TensorFlow offers support for the most common deep learning architectures out of the box, and a lot of additional resources are available online. Playing with these things can be extremely fun; see, for example, the famous article by Andrej Karpathy on character RNNs (a TensorFlow implementation of character RNNs is here).

Things can get frustrating, however, if you want to move beyond prefabricated examples and try your own modifications and new ideas: example code is cluttered with parsing of command-line options, instrumenting TensorBoard, and so on. The odds are that module and function names have changed since the example code was written. The TensorFlow documentation of, say, RNNs, on the other hand, assumes that you already have a deep understanding of the library and common usage patterns.

At that point, for me at least, it was time to get back to the bare bones and first understand all the moving parts of TensorFlow.

Hello World

Let’s get our hands dirty and run a simple computation in TensorFlow: calculating the sum of two floats. We start by importing the module:

import tensorflow as tf
#

Performing a computation in TensorFlow is slightly different from doing the same computation in plain Python. One first defines the structure ("graph") of the computation, then starts a TensorFlow environment ("session") for the graph, and finally executes the graph in the context of the session. We define the input parameters and their associated data types:

param_x = tf.placeholder(dtype=tf.float32)
param_y = tf.placeholder(dtype=tf.float32)
#

We define the addition operation on param_x and param_y using the TensorFlow built-in function tf.add().

op_x_plus_y = tf.add(param_x, param_y)
#

Now the computation is defined and we create a TensorFlow session

sess = tf.Session()
#

and use it to evaluate op_x_plus_y, passing the value 20 for param_x and 1.1 for param_y:

result = sess.run(op_x_plus_y, feed_dict={param_x: 20, param_y: 1.1})
#

As you can see, the evaluation is triggered by the method sess.run(), and the input parameters are passed as a Python dictionary via the feed_dict argument. We print the result

result
#
21.1
#

and see that TensorFlow got the calculation right: $$x = 20$$, $$y = 1.1$$, $$x + y = 21.1$$
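The define-then-run pattern can be mimicked in a few lines of plain Python, which may make it easier to see what the graph/session split buys us. This is a conceptual sketch, not TensorFlow code: `placeholder` and `add` are made-up stand-ins that return functions of a feed dictionary, so nothing is computed until we "run" the final node.

```python
# Plain-Python sketch of the define-then-run pattern (placeholder/add are
# hypothetical stand-ins, not the real TensorFlow API).

def placeholder(name):
    # Defining an input returns a function of the feed dict, not a value.
    return lambda feed: feed[name]

def add(a, b):
    # Defining the sum also just returns a function; no addition happens yet.
    return lambda feed: a(feed) + b(feed)

param_x = placeholder("x")
param_y = placeholder("y")
op_x_plus_y = add(param_x, param_y)          # "graph" construction

result = op_x_plus_y({"x": 20, "y": 1.1})    # "session run" with a feed dict
print(result)
```

The point of deferring execution like this is that the same description can later be run many times with different inputs, or handed off to a different runtime, which is exactly what a TensorFlow session does with a graph.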

Finally, in order to make this a proper "Hello, World!" example, we create a slightly more sophisticated variant.

magic_numbers = tf.placeholder(dtype=tf.int32, shape=[None])
offset = tf.placeholder(dtype=tf.int32)
magic_numbers_plus_offset = magic_numbers + offset
#
magic_numbers_plus_offset = sess.run(magic_numbers_plus_offset, feed_dict={offset: 10, magic_numbers: [62, 91, 98, 98, 101, 34, 22, 77, 101, 104, 98, 90, 23]})
[chr(i) for i in magic_numbers_plus_offset]
#
['H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!']
#

This example contains two novelties:

• The parameter shape=[None] in the first call to tf.placeholder(), which indicates that the input is a one-dimensional array of unknown size.
• The add operation is created using the overloaded operator + rather than tf.add().

We skipped the creation of a session because the session from the previous example was still active.
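How can + build a graph node instead of performing an addition right away? Python lets a class overload the operator via the __add__ method, which is essentially what TensorFlow's tensor classes do. A minimal sketch of the mechanism (the Node class here is a made-up illustration, not TensorFlow code):

```python
# Minimal illustration of operator overloading (Node is hypothetical, not
# TensorFlow): "a + b" builds a new node instead of computing a sum.

class Node:
    def __init__(self, fn):
        self.fn = fn                      # function: feed dict -> value

    def __add__(self, other):             # invoked for node + node
        return Node(lambda feed: self.fn(feed) + other.fn(feed))

    def run(self, feed):
        return self.fn(feed)              # only here is anything computed

a = Node(lambda feed: feed["a"])
b = Node(lambda feed: feed["b"])
s = a + b                                 # builds the "add" node
print(s.run({"a": 2, "b": 3}))            # 5
```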

Basic TensorFlow mechanics

The hello world example glossed over the building blocks of the TensorFlow runtime environment: graphs, sessions, and devices.

Graphs

Any computation in TensorFlow needs to be defined in the context of a graph. In the examples above we did not notice the presence of a graph because the tf.placeholder() and tf.add() statements were implicitly using the default graph. Accordingly, the session created by tf.Session() was associated with the default graph. A graph in the TensorFlow sense defines the set of computations which can be performed by a TensorFlow program. The term is misleading insofar as a TensorFlow graph may actually be a collection of disjoint graphs which can be executed independently.

tf.get_default_graph()
#
<tensorflow.python.framework.ops.Graph at 0x23286632748>
#

We can get all the operations (the nodes) of the graph

tf.get_default_graph().get_operations()
#
[<tf.Operation 'Placeholder' type=Placeholder>,
 <tf.Operation 'Placeholder_1' type=Placeholder>,
 <tf.Operation 'Placeholder_2' type=Placeholder>,
 <tf.Operation 'Placeholder_3' type=Placeholder>,
 ...]
#

As we can see, we have inadvertently cluttered the default graph with the operations from the two previous examples. We therefore clean up the default graph

tf.reset_default_graph()
tf.get_default_graph().get_operations()
#
[]
#

and redo the first example above in a separate graph

g1 = tf.Graph()
#
with g1.as_default():
    param_x = tf.placeholder(dtype=tf.float32, name = 'x')
    param_y = tf.placeholder(dtype=tf.float32, name = 'y')
    op_x_plus_y = tf.add(param_x, param_y, name = 'x_plus_y')
#

The statement

with g1.as_default():
#

makes g1 the default graph during the execution of the subsequent block and thus ensures that the tf.placeholder() and tf.add() operations are added to g1 as intended

g1.get_operations()
#
[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'y' type=Placeholder>,
 ...]
#

whereas the default graph remains empty

tf.get_default_graph().get_operations()
#
[]
#

Note that this time we have given the operations names in order to identify them more easily in the output, e.g. in

tf.placeholder(dtype=tf.float32, name = 'x')
#

It is not necessary to name operations, but it can be useful when visualizing the graph with TensorBoard or when saving and restoring a session.

NB: For me, as a veteran Java developer who likes things to be made explicit, this whole approach of passing the graph around in the background feels rather weird. The complete story about which graph is associated with an operation is even more complicated. See the definition of _get_graph_from_inputs in ops.py if you are interested.

Sessions

Before executing the add operation defined in the new graph g1 we will take a closer look at TensorFlow sessions.

"A Session places the graph ops onto Devices, such as CPUs or GPUs, and provides methods to execute them."

Thus, a session operates on a graph. When we created a session above using

sess = tf.Session()
#

we did not specify a graph, and it was automatically associated with the default graph. We now explicitly create a session for graph g1 and execute our operation inside that session:

with tf.Session(graph = g1) as sess:
    result = sess.run(op_x_plus_y, feed_dict={param_x: 20, param_y: 1.1})
#
result
#
21.1
#

If you just want to evaluate a single operation, then instead of sess.run(op_x_plus_y, ...) you can also write op_x_plus_y.eval(..., session=sess)

with tf.Session(graph = g1) as sess:
    op_x_plus_y.eval(feed_dict={param_x: 20, param_y: 1.1}, session=sess)
#

The session argument can be omitted if you have previously executed the following line of code

tf.InteractiveSession(graph = g1)
#
<tensorflow.python.client.session.InteractiveSession at 0x23286667908>
#

which creates a default session and magically places it in the context.

op_x_plus_y.eval(feed_dict={param_x: 20, param_y: 1.1})
#
21.1
#

The InteractiveSession allows for notational convenience which almost makes us forget that we are using TensorFlow

tf.InteractiveSession()
x = tf.constant(20.0)
y = tf.constant(1.1)
(x+y).eval()
#
21.1
#

or even

(x + 1.1).eval()
#
21.1
#

Did you notice that we were implicitly using the default graph again? Anyway, notational convenience comes at a cost: It becomes harder to understand what is happening behind the scenes. Whereas this might be acceptable in an ad-hoc scripting environment like a Jupyter notebook, it is a dangerous path in general.

Devices

In order to execute an operation, the session places it on a device. We usually leave it up to the session to choose the devices on which a graph is executed (if there is more than one to choose from anyway). If we want to know which device (e.g. CPU or GPU) was used, we can activate device placement logging (this does not work in Jupyter):

with tf.Session(graph = graph, config=tf.ConfigProto(log_device_placement=True)) as sess:
    ...
#

In the output we will see one of the following device identifiers:

"/cpu:0": The CPU of your machine.
"/gpu:0": The GPU of your machine, if you have one.
"/gpu:1": The second GPU of your machine, etc.
#

Placeholders, constants, variables

So far we have taken a purely functional view of TensorFlow computations, with immutable inputs declared by tf.placeholder() and an operation declared by tf.add(). As a procedural system, TensorFlow also has mutable state, which can be declared by tf.Variable(). In addition to placeholders and variables there are constants, defined by tf.constant(). Like everything else in TensorFlow, variables and constants are vectorized, i.e. they can be (zero-dimensional) primitive values, (one-dimensional) vectors, or multi-dimensional arrays.

A variable in TensorFlow starts out with an initial value, which is typically adjusted by some optimization procedure during graph execution.

graph = tf.Graph()
with graph.as_default():
    x = tf.Variable(tf.random_normal(shape = []))
#

The call to tf.random_normal() does not actually initialize anything; it just defines the initialization procedure. Its execution is typically triggered at the start of the session using tf.global_variables_initializer():

with tf.Session(graph = graph) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(x))
#
-1.24331
#

Note that the type of the variable x is derived from the return type of tf.random_normal(), which is float32 by default.

x.dtype
#
tf.float32_ref
#

TensorFlow is specifically built to perform optimization tasks, e.g. finding optimal weights in a neural network, using gradient descent. As an initial example we will use it to find the minimum of a quadratic function

$$f_c(x) = (x - c)^2$$

for a given value of c. We first define the function f

graph = tf.Graph()
with graph.as_default():
    x = tf.Variable(tf.zeros(dtype=tf.float32, shape=[]), trainable=True)
    c = tf.placeholder(dtype=tf.float32)
    f = (x - c) ** 2
#

and evaluate it with $$x = 0$$ and $$c = 3.5$$

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    print("x = {}".format(sess.run(x)))
    print("f(x) = {}".format(sess.run(f, feed_dict={c: 3.5})))
#
x = 0.0
f(x) = 12.25
#

We have seen all of this in the previous examples, except for the parameter trainable=True, which will be explained below.

optim = tf.train.GradientDescentOptimizer(learning_rate=0.01)
#

The GradientDescentOptimizer is a powerful utility class which, among other things, can determine analytic derivatives. We will use it to calculate the derivative

$$\frac{d}{d x} f_c(x) = 2 \cdot (x - c)$$

with $$x = 0$$ and $$c = 3.5$$

with graph.as_default():
    grads_and_vars = optim.compute_gradients(f)
grads_and_vars
#
[(<tf.Tensor 'gradients/sub_grad/tuple/control_dependency:0' shape=() dtype=float32>,
  <tensorflow.python.ops.variables.Variable at 0x232866820b8>)]
#

grads_and_vars is a list of pairs, each containing a variable on which the tensor f depends (directly or indirectly) and the corresponding derivative of f. Or, to be more precise:

• derivatives of f refers to the tensors representing the symbolic derivatives of f,
• all variables means all variables which were marked as trainable=True during creation. In fact, trainable=True is the default and was only added for clarity.

We get the symbolic derivative of f

with graph.as_default():
    dfdx = grads_and_vars[0][0]
#

and evaluate it at $$x = 0$$ and $$c = 3.5$$

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    dfdx_0 = sess.run(dfdx, feed_dict={c: 3.5})
    print("df/dx = {}".format(dfdx_0))
#
df/dx = -7.0
#
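This value can be sanity-checked without TensorFlow: a central finite difference in plain Python should approximate the analytic derivative $$2 \cdot (x - c)$$ at the same point. A small sketch (no TensorFlow involved):

```python
# Check d/dx (x - c)^2 at x = 0, c = 3.5 numerically (plain Python, no TF).

def f(x, c=3.5):
    return (x - c) ** 2

h = 1e-6
numeric = (f(0.0 + h) - f(0.0 - h)) / (2 * h)  # central finite difference
analytic = 2 * (0.0 - 3.5)                     # 2 * (x - c) = -7.0
print(numeric, analytic)
```

Both values agree with the df/dx = -7.0 that TensorFlow computed above.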

Gradient descent works by shifting x by $$-\frac{d}{d x} f_c(x) \cdot \eta$$, where $$\eta = 0.01$$ is the learning rate. The GradientDescentOptimizer has a utility method to create the step function (tensor)

with graph.as_default():
    train_step = optim.apply_gradients(grads_and_vars)
#

If we apply it once

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step, feed_dict={c: 3.5})
    print("x = {}".format(sess.run(x)))
#
x = 0.07000000029802322
#

x is shifted from $$0$$ to $$0 - \frac{d}{d x} f_c(0) \cdot \eta = -(-7) \cdot 0.01 = 0.07$$, as expected.

Instead of hand-holding the optimizer like this, we can create a single operation that both computes and applies the gradient

with graph.as_default():
    train_step = optim.minimize(f)
#

and apply it 100 times

with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        sess.run(train_step, feed_dict={c: 3.5})
    print(sess.run(x))
#
3.47928
#

to get close to the optimum, which is at $$3.5$$.
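The entire optimization fits into a few lines of plain Python, which makes the mechanics explicit: repeatedly replace x by $$x - \eta \cdot 2 \cdot (x - c)$$. This is a conceptual sketch only; it does not reproduce TensorFlow's exact floating-point results.

```python
# Gradient descent on f_c(x) = (x - c)^2 in plain Python (conceptual sketch).

c, eta = 3.5, 0.01
x = 0.0
for _ in range(100):
    x -= eta * 2 * (x - c)   # step: x <- x - eta * df/dx, with df/dx = 2*(x - c)
print(x)                     # approaches the minimum at x = c = 3.5
```

Each step shrinks the distance to the optimum by the constant factor $$1 - 2\eta$$, which is why the iterate converges geometrically toward c.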

Awesome. We have successfully completed our first gradient descent optimization! Almost everything you will do with TensorFlow is based on gradient descent. Part 2 of this tutorial will show a first practical example.