Theano is a numerical computation library for Python. It is a common choice for implementing neural network models, as it allows you to efficiently define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays (numpy.ndarray).
Theano Python
Theano makes it possible to attain speeds that rival hand-crafted C implementations for problems involving large amounts of data. It can also take advantage of recent GPUs, which lets it outperform C on a CPU by orders of magnitude under certain circumstances.
Theano has a powerful compiler that can perform optimizations of varying complexity. A few of these optimizations are:
- Arithmetic simplification (e.g. --x -> x; x + y - x -> y), a small sketch of which appears below
- Using memory aliasing to avoid calculation
- Constant folding
- Merging similar subgraphs, to avoid redundant calculation
- Loop fusion for elementwise sub-expressions
- GPU computations
You can find the full list of optimizations in the Theano documentation.
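To get a feel for these graph optimizations, here is a minimal sketch (an illustration added for this tutorial, not taken verbatim from the Theano docs). It builds the expression x + y - x, which the optimizer should simplify to just y, and prints the compiled graph:

import theano
from theano import tensor

x = tensor.dscalar('x')
y = tensor.dscalar('y')

# x + y - x should be simplified to just y by Theano's optimizer
z = x + y - x
f = theano.function([x, y], z)

# inspect the optimized computational graph
theano.printing.debugprint(f)

print(f(3.0, 5.0))  # prints 5.0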
Why Theano Python Library?
Typically we manipulate matrices using the numpy package, so what makes Theano better than any such package?
Theano is a sort of hybrid between numpy and sympy: it attempts to combine the two into one powerful library. Let’s have a look at some of its advantages over others:
- Stability Optimization: Theano can detect some numerically unstable expressions and evaluate them using more stable means (see the sketch after this list)
- Execution Speed Optimization: As mentioned earlier, Theano can make use of recent GPUs and execute parts of expressions on your CPU or GPU, making it much faster than pure Python
- Symbolic Differentiation: Theano is smart enough to automatically create symbolic graphs for computing gradients
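As a concrete illustration of the stability optimization, here is a small sketch based on the log(1 + x) example mentioned in the Theano documentation. Theano is able to rewrite log(1 + x) into the numerically accurate log1p(x), so the result stays correct even when x is really tiny:

import numpy
import theano
from theano import tensor

x = tensor.dscalar('x')

# naive formula; Theano's stability optimization rewrites it into log1p(x)
expr = tensor.log(1 + x)
f = theano.function([x], expr)

print(f(1e-20))              # approximately 1e-20, the accurate answer
print(numpy.log(1 + 1e-20))  # plain floating point loses the tiny value and prints 0.0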
Well, enough of theory; let’s start working on the examples.
Theano Tutorial
To start working with Theano, install it using pip.
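The package is published on PyPI under the name Theano, so a plain pip install should be enough (you may want to do this inside a virtualenv):

pip install Theano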
Converting Theano Expressions into Callable Objects
With Theano, we can convert expressions into callable objects. Let’s see a code snippet:
import theano
from theano import tensor

# declare two symbolic floating-point scalars
x = tensor.dscalar()
y = tensor.dscalar()

# create a simple expression
z = x + y

# compile the expression into a callable object
f = theano.function([x, y], z)

print(f(1.5, 2.5))
When we run this, we get the following output:

4.0
Now, let us try to understand what happened in the above program:
- We start by declaring two symbolic floating-point scalars (variables)
- Then, we create a simple expression that sums the two numbers
- After that, we compile the expression into a callable object that takes (x, y) as input and returns a value for z after computation
- Finally, we call the function with some arguments and print the result
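As a small aside (an extra illustration, not part of the original walkthrough), Theano can also pretty-print the symbolic expression it has built before compilation. The variable names below are chosen just for this example:

from theano import tensor, pp

# giving the variables explicit names makes the printed expression readable
a = tensor.dscalar('a')
b = tensor.dscalar('b')
c = a + b

# print the symbolic expression, e.g. '(a + b)'
print(pp(c))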
Logistic Function
Let’s have a look at a rather more elaborate example than just adding two numbers. Let’s try to compute the logistic curve, which is given by:

s(x) = 1 / (1 + e^(-x))

If we plot this equation, we get the familiar S-shaped (sigmoid) curve.

The logistic function is applied elementwise to a matrix. Let’s write a code snippet to demonstrate this:
import theano
from theano import tensor

# declare a variable
x = tensor.dmatrix('x')

# create the expression
s = 1 / (1 + tensor.exp(-x))

# convert the expression into a callable object which takes
# a matrix as parameter and returns s(x)
logistic = theano.function([x], s)

# call the function with a test matrix and print the result
print(logistic([[0, 1], [-1, -2]]))
When we run the script, we can see the output as:

[[ 0.5         0.73105858]
 [ 0.26894142  0.11920292]]

Everything works fine and the output looks as expected. Now let’s have a closer look at Theano functions.
A Closer Look at Theano Functions
Theano functions help in interacting with the symbolic graph. They allow Theano to build the computational graph and optimize it.
A typical Theano function looks like this:
f = theano.function([x], y)
Here, x is the list of input variables and y is the list of output variables. Let’s check out how this feature can be of great use.
Calculating multiple results at once
Let’s say we have to compute the elementwise difference, absolute difference, and squared difference between two matrices ‘x’ and ‘y’. Doing all of this in a single function call saves significant time, as we don’t have to iterate over the elements again for each operation.
import theano
from theano import tensor

# declare variables
x, y = tensor.dmatrices('x', 'y')

# create a simple expression for each operation
diff = x - y
abs_diff = abs(diff)
diff_squared = diff**2

# convert the expressions into a callable object
f = theano.function([x, y], [diff, abs_diff, diff_squared])

# call the function and store the result in a variable
result = f([[1, 1], [1, 1]], [[0, 1], [2, 3]])

# format print for readability
print('Difference: ')
print(result[0])
print('Absolute Difference: ')
print(result[1])
print('Squared Difference: ')
print(result[2])
When we run this program, we can see all three results printed:

Difference: 
[[ 1.  0.]
 [-1. -2.]]
Absolute Difference: 
[[ 1.  0.]
 [ 1.  2.]]
Squared Difference: 
[[ 1.  0.]
 [ 1.  4.]]
Using Theano Gradient function
Let’s try some more useful and sophisticated functions as we move towards a minimal training example. Here we’ll try to find the derivative of an expression with respect to a parameter.
We’ll compute the gradient of the logistic function defined above. Its derivative, s(x) * (1 - s(x)), plots as a bell-shaped curve peaking at 0.25 when x = 0.
Let’s demonstrate how the gradient works with an example:
import numpy
import theano
from theano import tensor
from theano import pp

# declare a variable
x = tensor.dmatrix('x')

# create a simple expression for the logistic function
s = tensor.sum(1 / (1 + tensor.exp(-x)))

# create an expression to compute the gradient of s with respect to x
gs = tensor.grad(s, x)

# create a callable object
dlogistic = theano.function([x], gs)

# call the function and print the results
print(dlogistic([[0, 1], [-1, -2]]))
When we run this program, we can see the output as:

[[ 0.25        0.19661193]
 [ 0.19661193  0.10499359]]
In this way, Theano can be used for efficient symbolic differentiation (the expression returned by tensor.grad is optimized during compilation), even for functions with many inputs.
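As a quick sanity check (an addition to the original example), the analytical derivative of the logistic function is s(x) * (1 - s(x)), so we can verify Theano’s result against a direct NumPy computation:

import numpy

# analytical derivative of the logistic function: s(x) * (1 - s(x))
xs = numpy.array([[0, 1], [-1, -2]], dtype=float)
s = 1 / (1 + numpy.exp(-xs))
print(s * (1 - s))  # should match the values returned by dlogistic above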
Let’s put things together into a simple training example to understand Theano better!
Minimal Training Theano Example
Let’s try to train something using Theano. We will use gradient descent to train the weights in W so that the model’s output gets closer to the target than its initial output of 0.9:
import theano
import numpy

# declare variables
x = theano.tensor.fvector('x')
target = theano.tensor.fscalar('target')
W = theano.shared(numpy.asarray([0.2, 0.7]), 'W')

# create expressions
y = (x * W).sum()
cost = theano.tensor.sqr(target - y)
gradients = theano.tensor.grad(cost, [W])
W_updated = W - (0.1 * gradients[0])
updates = [(W, W_updated)]

# create a callable object from the expression
f = theano.function([x, target], y, updates=updates)

# call the function and print the results
for i in range(10):
    result = f([1.0, 1.0], 20.0)
    print(result)
When we run this program, we can see the printed value of y start at 0.9 and move closer to 20.0 on each iteration.
The second input variable ‘target’ will act as the target value we use for training:
target = theano.tensor.fscalar('target')
We need a cost function to train the model; here we use the squared distance from the target value:
cost = theano.tensor.sqr(target - y)
Next, we need to calculate the partial gradients of the cost function with respect to the parameters being updated. As we saw in the earlier example, Theano will do that for us. We simply call the grad function with the required arguments:
gradients = theano.tensor.grad(cost, [W])
Now let’s define a variable for the updated version of the parameter. As we know, in gradient descent the updated value equals the existing value minus the learning rate times the gradient.
Assuming learning rate (alpha) = 0.1:
W_updated = W - (0.1 * gradients[0])
Next we have to define a Theano function again, with a couple of changes:
f = theano.function([x, target], y, updates=updates)
When the function is called, it takes in values for x and target, returns the value of y as the output, and Theano performs all the updates in the updates list.
Now we repeatedly call the function in order to train, 10 times in this example to be specific. Typically, training data would contain different values, but for the sake of this example we use the same values x = [1.0, 1.0] and target = 20 each time to check that things work correctly.
In the output, notice how the value of y moves closer to the target value of 20 at each step.
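If you also want to inspect the learned weights themselves (an extra check on top of the original example), the shared variable W can be read back with get_value() once the loop has finished:

# read the current value of the shared variable after training
print(W.get_value())  # both weights have grown from [0.2, 0.7]; their sum is now close to 20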
Theano Neural Network Summary
In this post, we discovered the Theano Python library for efficient numerical computation.
We learned that it is a foundation library used for deep learning research and development, and that it can be used directly to create deep learning models, or via convenient libraries built on top of it, such as Lasagne and Keras.