Decoding Tensors 1: Introduction

Jayanti Prasad, Ph.D.
6 min read · Aug 26, 2019

Tensors are fundamental objects which have been in use in physics and engineering for a long time. Now machine learning practitioners are also using these objects to manipulate various types of data such as text, images, audio, and video.

In this article we will discuss what exactly tensors are and why we need them. There are many definitions of tensors, and that makes it difficult for beginners to get a handle on them. Here I will give a brief tutorial about tensors, and I believe that after reading it no one should be confused about tensors.

We are all familiar with variables, which are like boxes that can be used to hold stuff (let us say numbers). A vector can be considered a set of boxes which can be labeled as V1, V2, V3, etc. So a vector in 3 dimensions can be written as:

V = (V1, V2, V3)

We can generalize it to n dimensions:

V = (V1, V2, V3, …, V_n)

It is more useful to represent a vector with an ‘indexed’ object, such as ‘V_i’, where ‘i’ can vary between 1 and n. So we can define a vector as an object with one index that can vary between 1 and the number of dimensions. We will limit ourselves here to machine learning only and will not discuss some very interesting properties of vectors such as the ‘norm’, the ‘scalar product’, ‘linear transformations’, etc.
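For concreteness, here is a minimal sketch of a vector as a one-index object, written in Python with NumPy (the choice of library is my assumption; the article does not commit to one):

```python
import numpy as np

# A vector in 3 dimensions: one index, running over 3 values.
V = np.array([1.0, 2.0, 3.0])

print(V[0])     # the component V1 (NumPy indices start at 0, not 1)
print(V.ndim)   # 1 -> a single index
print(V.shape)  # (3,) -> that index runs over 3 values
```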

Now let us consider a set of boxes V1, V2, V3, where every box has smaller boxes (b1, b2) inside, so we can label all the smaller boxes as:

b11, b12, b21, b22, b31, b32.

So b22 is the second smaller box inside the bigger box V2. We can easily identify these objects as the elements of a matrix. A matrix can be defined as an object with 2 indices. We can also understand a matrix as a collection of vectors, which can be called its ‘rows’ or its ‘columns’. Matrices can be used to represent many things, including images, since an image can be considered a set of numbers: for every position (x, y) in the picture we have a number (for black and white).
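Here is a small sketch of a matrix as a two-index object, again in NumPy (an assumption):

```python
import numpy as np

# A 3 x 2 matrix: the first index labels the bigger boxes (rows),
# the second index labels the smaller boxes inside each (columns).
b = np.array([[11, 12],
              [21, 22],
              [31, 32]])

print(b[1, 1])  # the element b22 (NumPy indexing is 0-based)

# A tiny 'black and white image': one number for every (x, y) position.
image = np.random.randint(0, 256, size=(400, 600))
print(image.shape)  # (400, 600)
```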

Matrices play an extremely important role in linear systems. For example, the set of linear equations:

2x + 3y = 13

3x - 2y = 0

can be written as A X = Y, where X = (x, y), Y = (13, 0), and

A = |2  3|
    |3 -2|
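As a sketch of how this matrix form is used in practice (NumPy assumed), the system above can be handed to a standard linear solver:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [3.0, -2.0]])
Y = np.array([13.0, 0.0])

# Solve A X = Y for X = (x, y).
X = np.linalg.solve(A, Y)
print(X)  # [2. 3.] -> x = 2, y = 3
```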

Basically a matrix, or rather the elements of a matrix, can be represented by an object with 2 indices, a(i, j), where the two indices need not vary over the same range. In the box example given above, the number of smaller boxes inside each bigger box need not equal the number of bigger boxes.

The ranges over which the first and second indices of a matrix can vary define its dimensionality. For example, if for a(i, j), i can vary from 1 to 3 and j from 1 to 2, then the dimensionality will be 3 x 2. In general, the range of the first index gives the number of rows in the matrix and that of the second the number of columns.

We can easily note that ordering is important in matrices: 3 x 2 is not the same as 2 x 3. The first is like a collection of 3 big boxes with 2 small boxes inside each, and the second is like 2 big boxes with 3 small boxes inside each. Basically, a matrix represents an arrangement which can be used to arrange objects in 2 dimensions. Let us summarise what matrices are.

Matrices are objects with two indices, which can vary over different ranges. Matrices can also be considered as sets of vectors. Now let us come to tensors.

We can collect any number (M) of vectors of the same length (dimensionality N) and form a matrix of dimensions M x N. Tensors originate in many different ways, and one of those is from matrices.

We can arrange a set of matrices of the same dimensionality in the form of a tensor:

T = (T(1), T(2), T(3)) = (a(i, j), b(i, j), c(i, j))

where all the i’s vary over the same range and all the j’s vary over the same range. Suppose in the above example that i can vary over the range 1 to 2 and j over the range 1 to 3; then the elements of T(1), T(2), and T(3) will be:

T(1) = T(111), T(112), T(113), T(121), T(122), T(123)

T(2) = T(211), T(212), T(213), T(221), T(222), T(223)

T(3) = T(311), T(312), T(313), T(321), T(322), T(323)

This is how a tensor is defined. Note that it can be seen as a matrix whose every element is a vector, or as a vector whose every element is a matrix. Again, a tensor is nothing but a way to arrange objects (numbers).
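A minimal NumPy sketch (the library choice is again my assumption) of forming a rank-3 tensor by stacking matrices of the same dimensionality:

```python
import numpy as np

# Three matrices of the same dimensionality (2 x 3)...
a = np.arange(0, 6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)
c = np.arange(12, 18).reshape(2, 3)

# ...stacked into a single rank-3 tensor T.
T = np.stack([a, b, c])
print(T.shape)     # (3, 2, 3) -> i: 1..3, j: 1..2, k: 1..3
print(T[1, 1, 2])  # the element T(223) in the article's 1-based notation
```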

Tensors have their origin in physics and have some remarkable properties, but discussing those would take us too far. So we can summarize that a tensor is an object with multiple indices, where every index can vary over a different range. The number of indices on a tensor is called its rank. In the above example, T is a rank-3 tensor, since every element of it carries three indices.

The number of elements a tensor can hold can be computed by multiplying the ranges over which its indices vary (its dimensionality). In the above example our tensor T has three indices, T(ijk), where i varies from 1 to 3, j from 1 to 2, and k from 1 to 3, so the number of elements is 3 x 2 x 3 = 18.
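These two quantities map directly onto attributes of a NumPy array (a sketch, NumPy assumed):

```python
import numpy as np

T = np.zeros((3, 2, 3))
print(T.ndim)  # 3  -> the rank: the number of indices
print(T.size)  # 18 -> 3 x 2 x 3 elements
```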

Tensors support mathematical operations just like vectors and matrices do, and so there must be compatibility between the tensors being operated on. For example, just as we cannot add two matrices with different dimensionality, we cannot add two tensors with different dimensionality. We can also notice that, just as we can rearrange matrices, we can rearrange tensors.
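A short sketch of both points, compatibility and rearrangement, in NumPy (assumed):

```python
import numpy as np

a = np.ones((3, 2, 3))
b = np.ones((3, 2, 3))
c = np.ones((2, 3, 3))

print((a + b).shape)  # (3, 2, 3): same dimensionality, so addition works
# a + c               # would raise ValueError: the shapes are incompatible

# Rearranging: the same 18 elements with the indices reordered or regrouped.
print(a.transpose(1, 0, 2).shape)  # (2, 3, 3)
print(a.reshape(9, 2).shape)       # (9, 2)
```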

Now we know that a tensor is an object with multiple indices which can vary over different ranges. Note that this is quite different from what we generally see in physics, where in most cases the indices vary over the same range. For example, in the metric tensor g(i, j), ‘i’ and ‘j’ both vary from 0 to 3.

Another difference between the tensors of machine learning and the tensors of physics is that in machine learning we have just one type of index. In physics we can have a third-rank tensor such as T^i_{jk}, where one index is up and two are down, or arranged in some other way. This up and down (called contravariant and covariant, respectively) is extremely important when tensors are transformed from one coordinate system to another. In machine learning we do not have to bother about these subtle issues.

Let us decode the tensors of machine learning. Suppose we have a bunch of 1000 video clips, each 2 minutes long at 15 frames per second, every frame has a resolution of 400 x 600, and the clips are coloured (with the RGB scheme). What rank of tensor will we need to represent this data? Let us calculate:

Number of clips = 1000

Number of frames in a clip = 2 X 60 X 15 = 1800

Number of rows (pixels along the Y-axis ) in a frame = 400

Number of columns (pixels along the X-axis) in a frame = 600

Number of colors = 3

From the above we can see that this data must be represented by a rank-5 tensor, T(i, j, k, l, m), where i varies from 1 to 1000, j from 1 to 1800, k from 1 to 400, l from 1 to 600, and m from 1 to 3.

Total number of elements in this tensor = 1000 x 1800 x 400 x 600 x 3 = 1.296 x 10^12
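As a sketch (NumPy assumed), the rank and element count can be checked from the shape alone; actually allocating this tensor would take more than a terabyte of memory even at one byte per element:

```python
import numpy as np

# (clips, frames per clip, rows, columns, colour channels)
shape = (1000, 1800, 400, 600, 3)

print(len(shape))                      # 5 -> a rank-5 tensor
print(np.prod(shape, dtype=np.int64))  # 1296000000000 elements
```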

From this example it is clear why we need tensors in machine learning. Basically, they can be used to represent any form of higher-dimensional data. Machine learning libraries have classes for tensors, with properties and methods, and we can use those to handle any kind of data (text, images, audio, video, etc.). Tensors can be passed to functions, and we can also apply mathematical operations on tensors.

I will end by asking a question: how many elements will a rank-‘n’ tensor have in ‘n’ dimensions? If you know the answer, fine; if not, then you have to read from the start!

Do not forget to like and share this article if you find it useful, and post comments in case you have any.
