
Loss Functions | Cost Functions in Machine Learning



Every machine learning algorithm (model) learns by optimizing a loss function (also called an error or cost function). A loss function evaluates how accurate a given prediction is: the farther the prediction deviates from the actual (true) value, the higher the numeric value the loss function returns. For a model to produce good predictions, its predictions must deviate little from the actual values, i.e. have low loss. Optimization techniques such as the gradient descent algorithm are used to reduce the loss of our predictions.

There are several loss functions in machine learning. Now, the question arises: can we use any loss function in our machine learning algorithm? The answer is no. If we pick a loss function at random, we may face problems in computing the loss, and we might introduce errors if the loss function is sensitive to outliers. So it is important to understand a loss function before using it to measure the error of our predictions. The choice should be governed by several factors, such as the algorithm being used, the ease of evaluating the function and its derivative, the presence of outliers in the data, and so on.

Depending on the type of task, i.e. classification or regression, loss functions can be divided into two groups: classification loss functions and regression loss functions (strictly speaking, there is no such formal division; we make it here only for ease of understanding).

In classification we predict the class or label of a supplied tuple (set of features) on the basis of the dataset used for modeling. That is, a categorical value (e.g. male or female, dead or alive) is predicted.

In regression we predict a continuous value for a given set of features on the basis of the dataset used for modeling.

·       Classification Loss Functions


Some of the classification loss functions are:

1.     Hinge loss/SVM loss


Hinge loss is a loss function used for training classifier models in machine learning. More precisely, it is used by maximum-margin classification algorithms such as the SVM.

Let T be the target output such that T ∈ {−1, +1}, and let Y be the classifier score. Then the hinge loss for the prediction is given as,
L(Y) = max(0, 1 − T · Y)

It should be noted that Y is not a class label but the raw numeric output of the classifier's decision function.

For example, for a linear SVM, Y = W · X + b, where (W, b) are the weights and bias that parameterize the hyperplane and X is the feature vector to classify.

Interpretation of hinge loss:

We can see that if T and Y have the same sign (i.e. the example is classified correctly) and |Y| ≥ 1, then the loss L(Y) = 0; the classification is confidently correct. On the other hand, the loss L(Y) increases linearly when T and Y have opposite signs (wrong classification), and it is also positive when they have the same sign but |Y| < 1 (a correct but low-margin prediction, called a margin error).
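To make this interpretation concrete, here is a minimal NumPy sketch (the function name hinge_loss and the sample scores are illustrative, not from any particular library):

```python
import numpy as np

def hinge_loss(t, y):
    """Hinge loss for targets t in {-1, +1} and raw classifier scores y."""
    return np.maximum(0, 1 - t * y)

# Correct sign with margin |Y| >= 1: zero loss
print(hinge_loss(np.array([1, -1]), np.array([2.0, -1.5])))  # [0. 0.]
# Correct sign but inside the margin (|Y| < 1): small positive loss
print(hinge_loss(np.array([1]), np.array([0.4])))            # [0.6]
# Wrong sign: loss grows linearly with the score
print(hinge_loss(np.array([1]), np.array([-2.0])))           # [3.]
```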


2.     Cross-entropy Loss/ Negative Log Likelihood


Cross-entropy loss (negative log likelihood) is a loss function that measures the performance of a classification model whose output is a probability value between 0 and 1. Cross-entropy loss increases as the predicted probability diverges from the actual label. A perfect model would have a log loss of 0, while a high value indicates a large error in our predictive model.

The general mathematical expression for cross-entropy loss is,

L = − Σ_{c=1}^{M} Y_{o,c} · log(P_{o,c})

where M = the total number of classes to be classified, e.g. if the class labels are cat, dog, and rat, then M = 3;

Y_{o,c} = a binary indicator (1 or 0) of whether class label c is the correct classification for observation o;

P_{o,c} = the predicted probability that observation o belongs to class c.
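As an illustration, here is a small NumPy sketch of this formula for a single observation (the names cross_entropy, y_true, and p_pred are our own; clipping by eps is a common trick to avoid log(0)):

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy for one observation.

    y_true : one-hot vector over the M classes (1 for the true class).
    p_pred : predicted probabilities over the M classes (sum to 1).
    """
    p_pred = np.clip(p_pred, eps, 1.0)   # avoid log(0)
    return -np.sum(y_true * np.log(p_pred))

# Three classes: cat, dog, rat (M = 3); the true class is "dog".
y_true = np.array([0, 1, 0])
print(cross_entropy(y_true, np.array([0.1, 0.8, 0.1])))  # ~0.223 (confident and correct)
print(cross_entropy(y_true, np.array([0.7, 0.2, 0.1])))  # ~1.609 (diverges from the label)
```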


·       Regression Loss Functions


Some of the regression loss functions are:

1.     Mean Square Error/Quadratic Loss/L2 Loss (MSE):


It is given as the average of the squared differences between the actual values and the values predicted by the regression model.

Mathematically it is given as,
MSE = (1/n) · Σ_{i=1}^{n} (T_i − Y_i)²

where T_i is the true (actual) value and Y_i is the predicted value.

MSE is typically optimized using the gradient descent algorithm. Because it squares the differences, it is more sensitive to outliers than MAE.
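A minimal NumPy sketch of the formula (the helper name mse and the sample arrays are illustrative):

```python
import numpy as np

def mse(t, y):
    """Mean squared error between true values t and predictions y."""
    return np.mean((t - y) ** 2)

t = np.array([3.0, 5.0, 2.5])
y = np.array([2.5, 5.0, 4.0])
print(mse(t, y))  # (0.25 + 0 + 2.25) / 3 = 0.8333...
```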


2.     Mean Absolute Error/L1 Loss (MAE):


It is given as the average of the absolute differences between the actual values and the values predicted by the regression model.

Mathematically it is given as,
MAE = (1/n) · Σ_{i=1}^{n} |T_i − Y_i|

where T_i is the true (actual) value and Y_i is the predicted value.

MAE is also optimized using gradient descent, though the absolute value is not differentiable at zero, so in practice a subgradient is used at that point.
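The analogous sketch for MAE (again, the helper name and sample values are just for illustration); note that the same outlier-free data gives a smaller penalty than MSE for the large residual:

```python
import numpy as np

def mae(t, y):
    """Mean absolute error between true values t and predictions y."""
    return np.mean(np.abs(t - y))

t = np.array([3.0, 5.0, 2.5])
y = np.array([2.5, 5.0, 4.0])
print(mae(t, y))  # (0.5 + 0 + 1.5) / 3 = 0.6667...
```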


3.     Huber loss:

This loss function is commonly used for regression problems, particularly when the data contains outliers, because it behaves like MSE for small errors and like MAE for large ones.

Mathematically Huber loss is given as,
L_δ(T, Y) = ½ · (T − Y)²              for |T − Y| ≤ δ
L_δ(T, Y) = δ · (|T − Y| − ½ · δ)     otherwise
where δ is a tunable threshold that controls the point at which the loss switches from quadratic to linear behavior.
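A small NumPy sketch of this piecewise definition (the helper name huber and the default δ = 1.0 are our own choices):

```python
import numpy as np

def huber(t, y, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    r = np.abs(t - y)
    return np.mean(np.where(r <= delta,
                            0.5 * r ** 2,              # MSE-like region
                            delta * (r - 0.5 * delta)))  # MAE-like region

t = np.array([3.0, 5.0, 2.5])
y = np.array([2.5, 5.0, 7.5])   # the last prediction is an outlier
print(huber(t, y, delta=1.0))   # ~1.542: the outlier is penalized only linearly
```

A smaller δ makes the loss behave more like MAE (robust to outliers); a larger δ makes it behave more like MSE.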

