
Naive Bayes Classifier Tutorial


What is a Naive Bayes classifier?


The Naive Bayes classifier is a classification technique based on Bayes’ theorem. It rests on the assumption that the predictors are independent of each other. In simple words, the Naive Bayes classifier assumes that the presence of a particular feature in a class is independent (unrelated) of the presence of any other feature in the same class. Let's understand this concept with an example: a fruit may be considered an orange if it is orange in color, approximately round, and about 2.5 inches in diameter. The classifier treats all of these properties as contributing independently to the probability that the fruit is an orange, even if the features actually depend on each other. This simplifying assumption is the reason why it is known as ‘naive’.

The Naive Bayes algorithm is simple to understand and easy to build. It does not involve any complicated iterative parameter estimation. We can use the Naive Bayes classifier on small data sets as well as on large data sets, including fairly sophisticated classification problems.

The Naive Bayes classifier is based on Bayes’ theorem of probability. Bayes’ theorem can be used to calculate the posterior probability P(y|X) from P(y), P(X) and P(X|y). The mathematical equation for Bayes’ theorem is:

    P(y|X) = P(X|y) · P(y) / P(X)
From the equation, we have,

  • X is the feature vector, represented as X = (x1, x2, ..., xn).
  • P(y|X) is the posterior probability (in Bayesian statistics, the revised or updated probability of an event occurring after taking new information into consideration ~ Investopedia) of the class (y, target) given the predictor (X, attributes).
  • P(y) is the prior probability of the class (the probability assessed before taking the relevant observations into account).
  • P(X|y) is the likelihood: the probability of the predictor given the class.
  • P(X) is the prior probability of the predictor (the evidence).
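
To make the theorem concrete, here is a minimal Python sketch with made-up numbers (the spam scenario and all probabilities below are hypothetical, not taken from this tutorial's data):

# A minimal sketch of Bayes' theorem with made-up numbers: given the rates
# below, what is the probability a message is spam given it contains "offer"?
p_spam = 0.30               # prior P(y): assumed fraction of mail that is spam
p_word_given_spam = 0.80    # likelihood P(X|y): "offer" appears in 80% of spam
p_word = 0.35               # evidence P(X): "offer" appears in 35% of all mail

# Bayes' theorem: P(y|X) = P(X|y) * P(y) / P(X)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)    # ~0.686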


Since the Naive Bayes classifier assumes the independence of the predictors (features), for independent features we calculate the output probability using Bayes’ theorem as:
    P(y|x1, ..., xn) = P(x1|y) · P(x2|y) · ... · P(xn|y) · P(y) / (P(x1) · P(x2) · ... · P(xn))
which can be represented as:

    P(y|x1, ..., xn) = P(y) · ∏(i=1..n) P(xi|y) / P(X)

Since the denominator is constant for a given input, we can write:

    P(y|x1, ..., xn) ∝ P(y) · ∏(i=1..n) P(xi|y)

Now, to create a Naive Bayes classifier model, we find the probability of the given set of inputs for all possible values of the class variable y and pick the output with the maximum probability. This can be expressed mathematically as:
    y = argmax over y of P(y) · ∏(i=1..n) P(xi|y)
So, finally, we are left with the task of calculating P(y) and P(xi | y).

NOTE: P(y) is also called the class probability and P(xi | y) is called the conditional probability.
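
As a sketch of this decision rule (the function and data layout below are illustrative, not a library API; the priors and conditional probabilities are assumed to be already estimated), the argmax can be computed like this:

import math

def predict(x, priors, cond):
    """Pick the class y that maximizes P(y) * prod_i P(xi | y).

    priors: {class: P(y)}
    cond:   {class: [ {feature_value: P(xi = value | y)} for each feature i ]}
    Log-probabilities are summed instead of multiplying raw probabilities,
    to avoid floating-point underflow when there are many features.
    """
    best_class, best_score = None, -math.inf
    for y, prior in priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            score += math.log(cond[y][i][value])
        if score > best_score:
            best_class, best_score = y, score
    return best_class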

How does the Naive Bayes classifier work?


Let’s understand the working of the Naive Bayes classifier using an example. Below is a training data set for playing golf under different circumstances. The features are Outlook, Temperature, Humidity and Windy, and the label is whether golf was played under each combination of those features. We need to predict, for new test data that we provide, whether to play or not using the Naive Bayes classification algorithm. Let’s do it step by step and learn this algorithm.






     #   OUTLOOK    TEMPERATURE   HUMIDITY   WINDY   PLAY GOLF
     0   Rainy      Hot           High       False   No
     1   Rainy      Hot           High       True    No
     2   Overcast   Hot           High       False   Yes
     3   Sunny      Mild          High       False   Yes
     4   Sunny      Cool          Normal     False   Yes
     5   Sunny      Cool          Normal     True    No
     6   Overcast   Cool          Normal     True    Yes
     7   Rainy      Mild          High       False   No
     8   Rainy      Cool          Normal     False   Yes
     9   Sunny      Mild          Normal     False   Yes
    10   Rainy      Mild          Normal     True    Yes
    11   Overcast   Mild          High       True    Yes
    12   Overcast   Hot           Normal     False   Yes
    13   Sunny      Mild          High       True    No


Here, the attributes are Outlook, Temperature, Humidity and Windy, and the class (or target) is Play Golf.


Step 1: Convert the given training data set into frequency tables, counting how often each feature value occurs with each class.

Step 2: Create a likelihood table (a probability table) by dividing each frequency by the corresponding class total:

    Outlook       Yes   No     P(x|Yes)   P(x|No)
    Rainy          2     3       2/9        3/5
    Overcast       4     0       4/9        0/5
    Sunny          3     2       3/9        2/5

    Temperature   Yes   No     P(x|Yes)   P(x|No)
    Hot            2     2       2/9        2/5
    Mild           4     2       4/9        2/5
    Cool           3     1       3/9        1/5

    Humidity      Yes   No     P(x|Yes)   P(x|No)
    High           3     4       3/9        4/5
    Normal         6     1       6/9        1/5

    Windy         Yes   No     P(x|Yes)   P(x|No)
    True           3     3       3/9        3/5
    False          6     2       6/9        2/5

    Play Golf    Count   P(y)
    Yes            9     9/14
    No             5     5/14

In these tables we have calculated both the class probabilities P(y) (i.e. P(Yes) and P(No)) and the conditional probabilities P(xi | y) (e.g. P(Humidity = High | Yes)).
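
A minimal sketch of Steps 1 and 2 in plain Python, building the same frequency and likelihood tables from the training data above (the variable names are illustrative):

from collections import Counter, defaultdict

# The 14 training examples: (Outlook, Temperature, Humidity, Windy) -> Play Golf
data = [
    (("Rainy", "Hot", "High", "False"), "No"),
    (("Rainy", "Hot", "High", "True"), "No"),
    (("Overcast", "Hot", "High", "False"), "Yes"),
    (("Sunny", "Mild", "High", "False"), "Yes"),
    (("Sunny", "Cool", "Normal", "False"), "Yes"),
    (("Sunny", "Cool", "Normal", "True"), "No"),
    (("Overcast", "Cool", "Normal", "True"), "Yes"),
    (("Rainy", "Mild", "High", "False"), "No"),
    (("Rainy", "Cool", "Normal", "False"), "Yes"),
    (("Sunny", "Mild", "Normal", "False"), "Yes"),
    (("Rainy", "Mild", "Normal", "True"), "Yes"),
    (("Overcast", "Mild", "High", "True"), "Yes"),
    (("Overcast", "Hot", "Normal", "False"), "Yes"),
    (("Sunny", "Mild", "High", "True"), "No"),
]

class_counts = Counter(label for _, label in data)   # Step 1: class frequencies
feature_counts = defaultdict(Counter)                # Step 1: (feature, class) frequencies
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

# Step 2: turn frequencies into probabilities
priors = {y: n / len(data) for y, n in class_counts.items()}          # P(y)
likelihood = {k: {v: n / class_counts[k[1]] for v, n in c.items()}    # P(xi | y)
              for k, c in feature_counts.items()}

print(priors)                     # {'No': 5/14 ~ 0.357, 'Yes': 9/14 ~ 0.643}
print(likelihood[(2, "Yes")])     # Humidity given Yes: {'High': 3/9, 'Normal': 6/9}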

Step 3: Now, apply the Naive Bayes equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.



Let's suppose our test data is test = (Outlook = Sunny, Temperature = Hot, Humidity = Normal, Windy = False). For this we need to predict whether it is okay to play golf or not.

Let's calculate:

Probability of playing golf:

    P(Yes | test) = P(Sunny | Yes) · P(Hot | Yes) · P(Normal | Yes) · P(False | Yes) · P(Yes) / P(test)

Probability of not playing golf:

    P(No | test) = P(Sunny | No) · P(Hot | No) · P(Normal | No) · P(False | No) · P(No) / P(test)

Here, we can see that both probabilities share the common factor P(test), so we can ignore it. Thus we get the calculation as follows:

    P(Yes | test) ∝ 3/9 · 2/9 · 6/9 · 6/9 · 9/14 ≈ 0.0212

and,

    P(No | test) ∝ 2/5 · 2/5 · 1/5 · 2/5 · 5/14 ≈ 0.0046

To convert these numbers into actual probabilities, we normalize them as follows:

    P(Yes | test) = 0.0212 / (0.0212 + 0.0046) ≈ 0.82

and,

    P(No | test) = 0.0046 / (0.0212 + 0.0046) ≈ 0.18

From the above calculations, we see that P(Yes | test) > P(No | test).

Thus, the prediction for playing golf is 'Yes'.
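
A short sketch that reproduces this calculation, reusing the priors and likelihood dictionaries built in the earlier snippet:

# Reuses `priors` and `likelihood` from the previous snippet.
test = ("Sunny", "Hot", "Normal", "False")

scores = {}
for y in priors:
    score = priors[y]
    for i, value in enumerate(test):
        score *= likelihood[(i, y)].get(value, 0.0)   # P(xi | y); 0 if unseen
    scores[y] = score

total = sum(scores.values())
posterior = {y: s / total for y, s in scores.items()}  # normalize, dropping the common P(test)
print(posterior)   # ~{'Yes': 0.82, 'No': 0.18}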


What are the Pros and Cons of the Naive Bayes Classifier?

Pros:
  1. The Naive Bayes classifier is simple to understand, and it is easy and fast to predict the class of a test data set.
  2. It performs quite well in multi-class prediction.
  3. It performs well with categorical input variables compared to numerical variable(s).

Cons:

  1. If a categorical variable has a category in the test data set that was not present in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as the “zero frequency” problem, and it is commonly handled with a smoothing technique such as Laplace (add-one) smoothing, shown in the sketch after this list.
  2. Another limitation of Naive Bayes is the assumption of independence. In real life, it is almost impossible to get a set of predictors that are completely independent of each other.
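
A minimal sketch of Laplace (add-one) smoothing, the common fix for the zero-frequency problem mentioned above (the function name and parameters are illustrative):

def smoothed_likelihood(count, class_total, n_values, alpha=1.0):
    """P(xi = v | y) with add-alpha smoothing.

    count:       times value v was seen with class y in training
    class_total: number of training rows with class y
    n_values:    number of distinct values feature i can take
    """
    return (count + alpha) / (class_total + alpha * n_values)

# Outlook = Overcast never occurs with class No in the golf data
# (0 of 5 rows, and Outlook has 3 possible values):
print(smoothed_likelihood(0, 5, 3))   # 0.125 instead of 0.0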

Applications of Naive Bayes Classifier

  • Real-time prediction: The Naive Bayes classifier is an eager learner (not a lazy learner) and it is fast, so it can be used for making real-time predictions.
  • Multi-class prediction: This algorithm is also well known for its multi-class prediction capability; we can predict the probability of multiple classes of the target variable.
  • Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers are widely used in text classification because they give good results in multi-class problems and the independence assumption works reasonably well for text, often yielding a higher success rate than other algorithms. As a result, they are widely used in spam filtering (distinguishing ham from spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative sentiments in comments and reviews). A minimal example follows below.
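
As an illustration of the text-classification use case, here is a minimal sketch using scikit-learn's CountVectorizer and MultinomialNB on a tiny made-up corpus (the texts and labels are invented for the example):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny invented corpus: 1 = spam, 0 = ham
texts = [
    "win cash prize now",
    "lowest price guaranteed offer",
    "meeting at noon tomorrow",
    "project report attached",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()            # bag-of-words word counts
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)    # multinomial NB suits count features

new_mail = vectorizer.transform(["claim your cash offer now"])
print(model.predict(new_mail))            # expected: [1] (spam) for this toy corpus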
