## Part 1 – Basics

# First steps with TensorFlow

TensorFlow is everywhere these days. It is becoming the library of choice for deep learning applications and, due to recent advances in hardware technology (TPU performance), might gain even more momentum in the near future.

The main driver for using TensorFlow is to build deep learning systems, and for an experienced developer it is tempting to dive right into the advanced stuff like CNNs and RNNs. TensorFlow offers support for the most common deep learning architectures out of the box, and a lot of additional resources are available online. Most popular language models that drive the recent advances in Natural Language Processing, for example transformer architectures, are available for TensorFlow. Playing with these things can be extremely fun; see, for example, the famous article by Andrej Karpathy on character RNNs (a TensorFlow implementation of character RNNs is here).

Things can get frustrating, however, if you want to move beyond prefabricated examples and try your own modifications and new ideas: example code is cluttered with parsing command-line options, instrumenting TensorBoard, and so on. The odds are that module and function names have changed since the example code was written. The TensorFlow documentation of, say, RNNs, on the other hand, assumes that you already have a deep understanding of the library and common usage patterns.

At that point, for me at least, it was time to get back to the bare bones and understand all the moving parts of TensorFlow first.

## Hello World

Let’s get our hands dirty and run a simple computation in TensorFlow, i.e. calculating the sum of two floats. We initialize TensorFlow by importing the module:

```python
import tensorflow as tf
```

Performing a computation in TensorFlow is slightly different from doing the same computation in plain python. One first needs to define the structure ("graph") of the computation, then start a TensorFlow environment ("session") for the graph, and finally execute the graph in the context of the session. We define the input parameters and their associated data types.

```python
param_x = tf.placeholder(dtype=tf.float32)
param_y = tf.placeholder(dtype=tf.float32)
```

We define the addition operation on `param_x` and `param_y` using the TensorFlow built-in function `tf.add()`.

```python
op_x_plus_y = tf.add(param_x, param_y)
```

Now the computation is defined and we create a TensorFlow session

```python
sess = tf.Session()
```

and use it to evaluate `op_x_plus_y`, passing to it the values `20` for `param_x` and `1.1` for `param_y`:

```python
result = sess.run(op_x_plus_y, feed_dict={param_x: 20, param_y: 1.1})
```

As you can see, the evaluation is triggered by the method `sess.run()` and the input parameters are passed as a python dictionary `feed_dict`. We print the result

```python
result
```

```
21.1
```

and see that TensorFlow got the calculation right: \( x = 20 \), \( y = 1.1 \), \( x + y = 21.1 \)
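The define-then-run pattern can be mimicked in plain Python (this is an analogy only, not TensorFlow code): we first build an expression tree, then evaluate it later with a dictionary of input values, much like `feed_dict`.

```python
# Plain-Python analogy of TensorFlow's define-then-run model (not TF code):
# "placeholders" are just names, "operations" are nested tuples, and nothing
# is computed until run() walks the tree -- much like Session.run().

def placeholder(name):
    return ("placeholder", name)

def add(a, b):
    return ("add", a, b)

def run(node, feed_dict):
    """Walk the expression tree and compute its value, like Session.run()."""
    if node[0] == "placeholder":
        return feed_dict[node[1]]
    if node[0] == "add":
        return run(node[1], feed_dict) + run(node[2], feed_dict)
    raise ValueError("unknown node type: {}".format(node[0]))

# Define the graph first ...
x = placeholder("x")
y = placeholder("y")
x_plus_y = add(x, y)

# ... then execute it with concrete inputs, analogous to feed_dict.
result = run(x_plus_y, {"x": 20, "y": 1.1})
print(result)
```

The separation between definition and execution is exactly what lets TensorFlow place and optimize the computation before any data flows through it.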

Finally, in order to make this a proper "Hello, World!" example, we create a slightly more sophisticated variant.

```python
magic_numbers = tf.placeholder(dtype=tf.int32, shape=[None])
offset = tf.placeholder(dtype=tf.int32)
magic_numbers_plus_offset = magic_numbers + offset

magic_numbers_plus_offset = sess.run(
    magic_numbers_plus_offset,
    feed_dict={offset: 10,
               magic_numbers: [62, 91, 98, 98, 101, 34, 22, 77, 101, 104, 98, 90, 23]})
[chr(i) for i in magic_numbers_plus_offset]
```

```
['H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!']
```

This example contains two novelties:

- The parameter `shape=[None]` in the first call to `tf.placeholder()`, which indicates that the input is a one-dimensional array of unknown size.
- The add operation is created using the overloaded operator `+` rather than `tf.add()`.

We skipped the creation of a session because the session from the previous example was still active.
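The arithmetic behind the magic numbers is easy to verify without TensorFlow: adding the offset to each code and mapping it through `chr()` yields the greeting.

```python
# Verify the "magic numbers" by hand: add the offset and decode with chr().
magic_numbers = [62, 91, 98, 98, 101, 34, 22, 77, 101, 104, 98, 90, 23]
offset = 10

decoded = "".join(chr(n + offset) for n in magic_numbers)
print(decoded)  # -> Hello, World!
```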

## Basic TensorFlow mechanics

The hello world example brushed over the building blocks of the TensorFlow runtime environment: *graphs*, *sessions*, and *devices*.

### Graphs

Any computation in TensorFlow needs to be defined in the context of a *graph*. In the examples above we did not notice the presence of a graph because the `tf.placeholder()` and `tf.add()` statements were implicitly using the default graph. Accordingly, the session created by `tf.Session()` was associated with the default graph. A *graph* in the TensorFlow sense defines the set of computations which can be performed by a TensorFlow program. The term is misleading insofar as a TensorFlow graph may actually be a collection of disjoint graphs which can be executed independently.

```python
tf.get_default_graph()
```

```
<tensorflow.python.framework.ops.Graph at 0x23286632748>
```

We can get all the operations (nodes and edges) of the graph

```python
tf.get_default_graph().get_operations()
```

```
[<tf.Operation 'Placeholder' type=Placeholder>,
 <tf.Operation 'Placeholder_1' type=Placeholder>,
 <tf.Operation 'Add' type=Add>,
 <tf.Operation 'Placeholder_2' type=Placeholder>,
 <tf.Operation 'Placeholder_3' type=Placeholder>,
 <tf.Operation 'add' type=Add>]
```

As we can see, we have inadvertently cluttered the default graph with the operations from the two previous examples. We therefore clean up the default graph

```python
tf.reset_default_graph()
tf.get_default_graph().get_operations()
```

```
[]
```

and redo the first example above in a separate graph

```python
g1 = tf.Graph()

with g1.as_default():
    param_x = tf.placeholder(dtype=tf.float32, name='x')
    param_y = tf.placeholder(dtype=tf.float32, name='y')
    op_x_plus_y = tf.add(param_x, param_y, name='x_plus_y')
```

The statement

```python
with g1.as_default():
```

makes `g1` the default graph during the execution of the subsequent block and thus ensures that the `tf.placeholder()` and `tf.add()` operations are added to `g1` as intended

```python
g1.get_operations()
```

```
[<tf.Operation 'x' type=Placeholder>,
 <tf.Operation 'y' type=Placeholder>,
 <tf.Operation 'x_plus_y' type=Add>]
```

whereas the default graph remains empty

```python
tf.get_default_graph().get_operations()
```

```
[]
```

Note that this time we have given the operations names in order to identify them more easily in the output, e.g. in

```python
tf.placeholder(dtype=tf.float32, name='x')
```

It is not necessary to name operations, but it can be useful when visualizing the graph with TensorBoard or when saving and restoring a session.

NB: For me as a veteran Java developer who likes things to be made explicit, this whole approach of passing the graph around in the background feels rather weird. The complete story about which graph is associated with an operation is even more complicated. See the definition of `_get_graph_from_inputs` in ops.py if you are interested.

### Sessions

Before executing the add operation defined in the new graph `g1`, we will take a closer look at TensorFlow *sessions*.

"*A Session places the graph ops onto Devices, such as CPUs or GPUs, and provides methods to execute them.*"

Thus, a session is operating on a graph. When we created a session above using

```python
sess = tf.Session()
```

we did not specify a graph, so it was automatically associated with the default graph. We now explicitly create a session for graph `g1` and execute our operation inside that session

```python
with tf.Session(graph=g1) as sess:
    result = sess.run(op_x_plus_y, feed_dict={param_x: 20, param_y: 1.1})

result
```

```
21.1
```

If you just want to evaluate a single operation, then instead of `sess.run(op_x_plus_y, ...)` you can also write `op_x_plus_y.eval(..., session=sess)`:

```python
with tf.Session(graph=g1) as sess:
    op_x_plus_y.eval(feed_dict={param_x: 20, param_y: 1.1}, session=sess)
```

The last argument can be skipped if you precede it with the following line of code

```python
tf.InteractiveSession(graph=g1)
```

```
<tensorflow.python.client.session.InteractiveSession at 0x23286667908>
```

which creates a default session and magically places it in the context.

```python
op_x_plus_y.eval(feed_dict={param_x: 20, param_y: 1.1})
```

```
21.1
```

The `InteractiveSession` allows for notational convenience which almost makes us forget that we are using TensorFlow

```python
tf.InteractiveSession()
x = tf.constant(20.0)
y = tf.constant(1.1)
(x + y).eval()
```

```
21.1
```

or even

```python
(x + 1.1).eval()
```

```
21.1
```

Did you notice that we were implicitly using the default graph again? Anyway, notational convenience comes at a cost: It becomes harder to understand what is happening behind the scenes. Whereas this might be acceptable in an ad-hoc scripting environment like a Jupyter notebook, it is a dangerous path in general.

### Devices

In order to execute an operation, the session places it on a *device*. We usually leave it up to the session to choose the devices on which a graph is executed (if there is more than one anyway). If we want to know which device (e.g. cpu or gpu) was used we can activate device placement logging (does not work in Jupyter):

```python
with tf.Session(graph=graph, config=tf.ConfigProto(log_device_placement=True)) as sess:
    ...
```

In the output we will see one of the following identifiers

```
"/cpu:0": The CPU of your machine.
"/gpu:0": The GPU of your machine, if you have one.
"/gpu:1": The second GPU of your machine, etc.
```

### Placeholders, constants, variables

So far we have taken a purely functional view of TensorFlow computations, with immutable inputs declared by `tf.placeholder()` and an operation declared by `tf.add()`. As a procedural system, TensorFlow also has mutable state, which can be declared by `tf.Variable()`. In addition to placeholders and variables there are constants, defined by `tf.constant()`. Like everything else in TensorFlow, variables and constants are vectorized, i.e. they can be (zero-dimensional) primitive values, (one-dimensional) vectors, or multi-dimensional arrays.

A variable in TensorFlow is initialized with a default value which is typically adjusted by some optimization procedure during graph execution.

```python
graph = tf.Graph()
with graph.as_default():
    x = tf.Variable(tf.random_normal(shape=[]))
```

The call to `tf.random_normal()` does not actually initialize anything; it just defines the initialization procedure. Its execution is typically triggered at the start of the session using `tf.global_variables_initializer()`:

```python
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(x))
```

```
-1.24331
```

Note that the type of the variable `x` is derived from the return type of `tf.random_normal()`, which is `float32` by default.

```python
x.dtype
```

```
tf.float32_ref
```
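The deferred-initialization pattern can be sketched in plain Python (an analogy, not TensorFlow code; the class name `DeferredVariable` is made up for illustration): the initializer is stored when the variable is defined and only executed by an explicit initialization step.

```python
import random

# Plain-Python sketch of TF's deferred variable initialization (an analogy,
# not TF code): the initializer is stored, not executed, until an explicit
# initialization step -- like tf.global_variables_initializer().

class DeferredVariable:
    def __init__(self, initializer):
        self.initializer = initializer  # stored, like the tf.random_normal() node
        self.value = None               # nothing has been computed yet

    def initialize(self):
        self.value = self.initializer()  # now the initializer actually runs

v = DeferredVariable(lambda: random.gauss(0.0, 1.0))
assert v.value is None  # defining the variable did not draw a random number
v.initialize()          # analogous to running the global variables initializer
print(v.value)          # a sample from the standard normal distribution
```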

### Gradient descent

TensorFlow is specifically built to perform optimization tasks, e.g. finding optimal weights in a neural network, using gradient descent. As an initial example we will use it to find the minimum of a quadratic function

\( f_c(x) = (x - c)^2 \) for a given value of `c`. We first define the function `f`

```python
graph = tf.Graph()
with graph.as_default():
    x = tf.Variable(tf.zeros(dtype=tf.float32, shape=[]), trainable=True)
    c = tf.placeholder(dtype=tf.float32)
    f = (x - c) ** 2
```

and evaluate it with \( x = 0 \) and \( c = 3.5 \)

```python
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    print("x = {}".format(sess.run(x)))
    print("f(x) = {}".format(sess.run(f, feed_dict={c: 3.5})))
```

```
x = 0.0
f(x) = 12.25
```

We have seen all of this in the previous examples, except for the parameter `trainable=True`, which will be explained below. Next, we create a gradient descent optimizer:

```python
optim = tf.train.GradientDescentOptimizer(learning_rate=0.01)
```

The `GradientDescentOptimizer` is a powerful utility class which, among other things, can determine analytic derivatives. We will use it to calculate the derivative of `f`, which we will later evaluate at \( x = 0 \) and \( c = 3.5 \)

```python
with graph.as_default():
    grads_and_vars = optim.compute_gradients(f)

grads_and_vars
```

```
[(<tf.Tensor 'gradients/sub_grad/tuple/control_dependency:0' shape=() dtype=float32>,
  <tensorflow.python.ops.variables.Variable at 0x232866820b8>)]
```

`grads_and_vars` is a list of pairs containing the derivatives of `f` together with all variables on which the tensor `f` depends (directly or indirectly). Or, to be more precise:

- *derivatives of `f`* refers to the tensors representing the symbolic derivatives of `f`
- *all variables* means all variables which were marked as `trainable=True` during creation. In fact, `trainable=True` is the default and was only added for clarity.

We get the symbolic derivative of `f`

```python
with graph.as_default():
    dfdx = grads_and_vars[0][0]
```

and evaluate it at \( x = 0 \) and \( c = 3.5 \)

```python
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    dfdx_0 = sess.run(dfdx, feed_dict={c: 3.5})
    print("df/dx = {}".format(dfdx_0))
```

```
df/dx = -7.0
```

Gradient descent works by shifting `x` by \(-\frac{d}{d x} f_c(x) \cdot \eta\), where \(\eta = 0.01\) is the learning rate. The `GradientDescentOptimizer` has a utility method to create the step function (tensor)

```python
with graph.as_default():
    train_step = optim.apply_gradients(grads_and_vars)
```

If we apply it once

```python
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step, feed_dict={c: 3.5})
    print("x = {}".format(sess.run(x)))
```

```
x = 0.07000000029802322
```

`x` is shifted from \(0\) to \(0 - \frac{d}{d x} f_c(x) \cdot \eta = -(-7.0) \cdot 0.01 = 0.07\), as expected.
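The numbers are easy to check by hand with the analytic derivative (plain Python, no TensorFlow needed):

```python
# Check one gradient-descent step by hand: f(x) = (x - c)**2, so df/dx = 2 * (x - c).
x, c, eta = 0.0, 3.5, 0.01

grad = 2 * (x - c)        # 2 * (0 - 3.5) = -7.0, matching TensorFlow's df/dx above
x = x - eta * grad        # 0 - 0.01 * (-7.0) = 0.07
print(grad, round(x, 2))  # -> -7.0 0.07
```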


Instead of hand-holding the optimizer like this we can create one function that calculates and applies the gradient

```python
with graph.as_default():
    train_step = tf.train.GradientDescentOptimizer(0.025).minimize(f)
```

and apply it 100 times

```python
with tf.Session(graph=graph) as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(100):
        sess.run(train_step, feed_dict={c: 3.5})
    print(sess.run(x))
```

```
3.47928
```

to get close to the optimum, which is at \(3.5\).
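The same 100 steps can be replayed in plain Python with the update rule \(x \leftarrow x - \eta \cdot 2(x - c)\), which reproduces the value TensorFlow computed:

```python
# 100 gradient-descent steps in plain Python: x <- x - eta * 2 * (x - c),
# with the same learning rate (0.025) and target (c = 3.5) as above.
x, c, eta = 0.0, 3.5, 0.025

for _ in range(100):
    x -= eta * 2 * (x - c)

print(round(x, 5))  # close to the optimum c = 3.5
```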

Awesome. We have successfully completed our first gradient descent optimization! Almost everything you will do with TensorFlow is based on gradient descent. Part 2 of this tutorial will show a first practical example.

See also First steps with TensorFlow – Part 2 | Logistic Regression