Decoding Tensors 3: PyTorch functions
In the first and second parts of this series on tensors we discussed the general properties of tensors and how they are implemented in PyTorch, one of the most popular machine learning frameworks, respectively.
In this part we will discuss how these tensors are consumed in PyTorch, and in machine learning more generally. In brief, there are two main consumers of tensors: activation functions and loss functions. A function is nothing but an object which maps a set of values to another set of values. We are all familiar with standard mathematical functions such as sin(x), cos(x), exp(x), log(x), etc., and we can write an infinite number of other functions using these standard ones.
In what follows we will assume that 'x' is a tensor; this is not an approximation but a generalisation, since a number, a vector, a matrix or anything else can be considered a tensor. An activation function takes a tensor of arbitrary rank and maps it to another tensor of the same rank; basically, every element of the tensor is transformed in the same way.
Here we will discuss the four most common activation functions:
a) Rectified Linear Unit (ReLU)
b) Sigmoid
c) tanh
d) Softmax
The purpose of an activation function is to introduce non-linearity between the input and output variables. The underlying idea is that with a sequence of linear transformations, each followed by a non-linear activation, we can represent almost any arbitrary (mathematical) relationship, and that is exactly the goal of (supervised) machine learning: to identify the relationship between a set of inputs and outputs.
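For example, here is a minimal sketch of this idea in PyTorch (the layer sizes are chosen arbitrarily for illustration): two linear layers with a non-linear activation between them. Without the activation, the two linear layers would collapse into a single linear transformation.
>>> import torch
>>> import torch.nn as nn
>>> model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
>>> x = torch.rand(32, 4)   # a batch of 32 input vectors of dimension 4
>>> model(x).shape
torch.Size([32, 1])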
Note that there are a large number of activation functions and we can see the full list with the commands:
>>> import torch.nn.functional as F
>>> print(dir(F))
By definition, loss functions quantify the mismatch between the predicted output and the true output. This mismatch is in general a scalar quantity. We are all familiar with chi-square, which is an example of a loss function. Loss functions are generally applied to the output of the last layer of a neural network.
Here we will discuss the following common loss functions:
a) Mean Square Error
b) L1 loss
c) Cross Entropy
d) Binary Cross Entropy
e) Cosine Embedding
So let us begin with the activation functions:
Activation Functions
a) Rectified Linear Unit (ReLU)
This is represented by a function which returns zero when its input is negative and returns the input unchanged when the input is positive.
Now let us create a rank-1 tensor (vector) and apply 'relu' to it; note that the input and the output of relu have the same shape.
>>> import torch
>>> import numpy as np
>>> import torch.nn.functional as F
>>> X = np.arange(-10, 10, 0.5)
>>> X.shape
(40,)
>>> T = torch.tensor(X)
>>> T.shape
torch.Size([40])
>>> y = F.relu(T)
>>> y.shape
torch.Size([40])
See below the ReLU function.
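To make the element-wise behaviour explicit, here is a quick check on a small hand-picked tensor (the values are chosen arbitrarily for illustration):
>>> F.relu(torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0]))   # negatives become 0, positives pass through
tensor([0.0000, 0.0000, 0.0000, 1.5000, 3.0000])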
b) Sigmoid
This function is written in the following way:
sigma(x) = 1/(1+exp(-x))
Note that as x tends to -infinity we get sigma(x) = 0, and as x tends to +infinity we get sigma(x) = 1, so this function maps any value between -infinity and +infinity into the range (0, 1). We can easily see that for x = 0 its value is 0.5. Let us try this function in PyTorch.
>>> import torch
>>> import numpy as np
>>> import torch.nn.functional as F
>>> X = np.arange(-10, 10, 0.5)
>>> T = torch.tensor(X)
>>> y = F.sigmoid(T)
>>> y.shape
torch.Size([40])
See below the sigmoid function:
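As a quick sanity check (the limiting values 0 and 1 are only approached asymptotically, so here we just verify the midpoint and the range of the values computed above):
>>> torch.sigmoid(torch.tensor(0.0))   # sigma(0) = 1 / (1 + exp(0)) = 0.5
tensor(0.5000)
>>> y.min(), y.max()                   # all values stay strictly between 0 and 1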
c) tanh
As expected, this is represented by the function:
tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
The main property of this function to note is that it maps any value between -infinity and +infinity into the range (-1, 1).
>>> import torch
>>> import numpy as np
>>> import torch.nn.functional as F
>>> X = np.arange(-10, 10, 0.5)
>>> T = torch.tensor(X)
>>> y = F.tanh(T)
>>> y.shape
torch.Size([40])
See the tanh function below:
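A quick check of the saturation behaviour on arbitrarily chosen inputs:
>>> F.tanh(torch.tensor([-100.0, 0.0, 100.0]))   # saturates at -1 and +1, passes through 0
tensor([-1.,  0.,  1.])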
d) Softmax
This is a very useful function used in classification. Basically, it can transform any set of values into probabilities. It is defined as:
softmax(x)_i = exp(x_i) / sum_j exp(x_j)
This function is slightly different from the other activation functions we have used: as we can see from the denominator, it takes the other elements of the tensor into consideration when mapping each element.
The value of the softmax function is always in the range (0, 1), and the elements returned by softmax sum to 1 along the dimension over which it is applied. Let us see this:
>>> import torch
>>> import torch.nn.functional as F
>>> T = torch.rand(100, 10)
>>> T.sum()
tensor(507.4105)
>>> y = F.softmax(T)
>>> y.sum()
tensor(100.0000)
Note that in this case we have considered a rank-2 tensor: it is like a set of 100 vectors, each of dimension 10, so softmax is applied to each of the 100 vectors separately, each row sums to 1, and the total is 100. It is shown below.
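We can make the reduction dimension explicit by passing the dim argument, reusing the same tensor T as above, and check a single row:
>>> y = F.softmax(T, dim=1)  # dim=1: normalise each row (each 10-dimensional vector)
>>> y[0].sum()               # each individual row sums to 1 (up to floating point error)
>>> y.sum()                  # 100 rows, so the total is 100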
Loss Functions
As we discussed above, a loss function needs at least two input tensors and computes the mismatch between them. So let us look at some of the common loss functions:
a) L1 loss
This function computes the mean (by default) of the absolute differences between the elements of the two input tensors.
>>> import torch
>>> from torch.nn.modules import loss
>>> T1 = torch.rand(10, 8)
>>> T2 = torch.rand(10, 8)
>>> l1_loss = loss.L1Loss()
>>> y1 = l1_loss(T1, T2)
>>> y1
tensor(0.3243)
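We can verify what L1Loss computes (with its default 'mean' reduction) by doing the reduction by hand, reusing T1 and T2 from above:
>>> (T1 - T2).abs().mean()   # same value as l1_loss(T1, T2) above
tensor(0.3243)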
b) Mean Square Error (MSE) loss
This loss function averages (by default) the squares of the differences between the elements of the input tensors. It is the most common loss function and is close to what is done in chi-square minimisation as well.
>>> import torch
>>> from torch.nn.modules import loss
>>> T1 = torch.rand(10, 8)
>>> T2 = torch.rand(10, 8)
>>> mse_loss = loss.MSELoss()
>>> y = mse_loss(T1, T2)
>>> y
tensor(0.1845)
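Again, a quick check by hand with the same T1 and T2:
>>> ((T1 - T2) ** 2).mean()   # same value as mse_loss(T1, T2) above
tensor(0.1845)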
c) Cross Entropy:
The last two loss functions, L1 and MSE, were easy to understand and did not violate common sense: they take two similar tensors as input and return a number (a scalar). Cross entropy in PyTorch does not work this way.
Let us try:
>>> import torch
>>> from torch.nn.modules import loss
>>> T1 = torch.rand(10, 8)
>>> T2 = torch.rand(10, 8)
>>> ce_loss = loss.CrossEntropyLoss()
>>> y = ce_loss(T1, T2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1790, in nll_loss
    ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Float for argument #2 'target'
This is odd!
Theoretically, cross entropy does take two similar inputs (which are probability distributions) and returns a number using the formula:
S(p, q) = -sum_i p_i log_2(q_i)
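For instance, we can compute this formula by hand for two small made-up distributions:
>>> import torch
>>> p = torch.tensor([0.5, 0.25, 0.25])   # "true" distribution
>>> q = torch.tensor([0.4, 0.4, 0.2])     # "predicted" distribution
>>> -(p * torch.log2(q)).sum()
tensor(1.5719)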
So what went wrong?
In PyTorch, cross entropy is constructed to quantify the mismatch between the true class and the predicted class of an object. PyTorch's CrossEntropyLoss expects the prediction in the form of a vector with as many components as there are classes:
>>> import torch
>>> from torch.nn.modules import loss
>>> T1 = torch.tensor([1.0, 0.9, 11.0], dtype=torch.float)
>>> T2 = torch.tensor([2], dtype=torch.long)
>>> ce_loss = loss.CrossEntropyLoss()
>>> T1.shape
torch.Size([3])
>>> T2.shape
torch.Size([1])
>>> T1 = T1[None, :]
>>> y = ce_loss(T1, T2)
>>> y
tensor(8.6784e-05)
Note that the first argument (T1) to the cross entropy loss is always expected to be a set of vectors of type 'float' with dimensions [m, n], where 'm' is the batch size and 'n' is the dimensionality of the vector, which must be equal to the number of classes. What is interesting is that CrossEntropyLoss does not expect T1 to be a probability distribution (its elements neither need to be between 0 and 1 nor normalised), because it applies a log-softmax to T1 internally; whatever T1 represents therefore needs to be taken care of separately.
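In fact, as the traceback above already hints, CrossEntropyLoss is equivalent to a log-softmax followed by the negative log-likelihood loss; a quick check with the same T1 and T2 as in the previous snippet:
>>> import torch.nn.functional as F
>>> F.nll_loss(F.log_softmax(T1, dim=1), T2)   # same value as ce_loss(T1, T2) above
tensor(8.6784e-05)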
See below where the values of T1 are between -10 and 10.
>>> import torch
>>> import numpy as np
>>> from torch.nn.modules import loss
>>> ce_loss = loss.CrossEntropyLoss()
>>> X = np.arange(-10, 10, 0.5)
>>> T1 = torch.tensor(X, dtype=torch.float)
>>> T1.shape
torch.Size([40])
>>> T1 = T1[None, :]
>>> T2 = torch.tensor([2], dtype=torch.long)
>>> T2.shape
torch.Size([1])
>>> y = ce_loss(T1, T2)
>>> y
tensor(19.4328)
d) Binary Cross Entropy :
The use of Binary Cross Entropy is quite different from Cross Entropy. Here the second argument (the target) can contain only 0s and 1s, and the first argument must be a probability; if it is not, we will have a problem.
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> T1 = torch.randn(3, requires_grad=True)
>>> T2 = torch.empty(3).random_(2)
>>> T1 = T1[None, :]
>>> bce_loss = nn.BCELoss()
>>> y = bce_loss(T1, T2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 504, in forward
    return F.binary_cross_entropy(input, target, weight=self.weight, reduction=self.reduction)
  File "/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 2027, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: Assertion `x >= 0. && x <= 1.' failed. input value should be between 0~1, but got -2.914456 at /Users/administrator/nightlies/pytorch-1.0.0/wheel_build_dirs/wheel_3.6/pytorch/aten/src/THNN/generic/BCECriterion.c:60
Now if we convert T1 into probabilities (values between 0 and 1), for example with a softmax, then everything works fine.
>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> T1 = torch.randn(3, requires_grad=True)
>>> T2 = torch.empty(3).random_(2)
>>> bce_loss = nn.BCELoss()
>>> T1 = T1[None, :]
>>> T1 = F.softmax(T1)
>>> y = bce_loss(T1, T2)
>>> y
tensor(0.5056, grad_fn=<BinaryCrossEntropyBackward>)
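As a side note, PyTorch also provides nn.BCEWithLogitsLoss, which applies the sigmoid internally so that raw scores can be passed directly; a minimal sketch along the same lines as the snippet above:
>>> bce_logits_loss = nn.BCEWithLogitsLoss()
>>> T1 = torch.randn(3, requires_grad=True)   # raw scores (logits), not probabilities
>>> T2 = torch.empty(3).random_(2)
>>> y = bce_logits_loss(T1, T2)               # no error: the sigmoid is applied inside the loss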
Here we have discussed a set of activation functions and loss functions used in PyTorch. If you have found this useful, please like and share, and if you have comments, post them below. I will be posting more articles on machine learning, so keep checking.