naive bayes from scratch

In the previous posts of our Machine Learning From Scratch series, we covered Linear Regression and Logistic Regression. Today, we will walk through the Naive Bayes algorithm from scratch.

Naive Bayes is a classification algorithm based on the “Bayes Theorem”. So let’s get introduced to the Bayes Theorem first.

P(A/B) = P(B/A) * P(A) / P(B)

Bayes Theorem gives the probability of an event A given that another event B has already occurred. Here B is the evidence and A is the hypothesis. P(A) is known as the prior, P(A/B) is the posterior, and P(B/A) is the likelihood.
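As a quick sanity check, the theorem can be written as a one-line function. This is only a minimal sketch: the function name `bayes_posterior` and the three input numbers are illustrative assumptions, not something taken from a library.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood, evidence):
    """Return P(A/B) = P(A) * P(B/A) / P(B)."""
    return prior * likelihood / evidence


# Illustrative numbers (assumed): P(A) = 9/14, P(B/A) = 6/9, P(B) = 8/14
posterior = bayes_posterior(Fraction(9, 14), Fraction(6, 9), Fraction(8, 14))
print(posterior)         # 3/4
print(float(posterior))  # 0.75
```

Using `Fraction` keeps the arithmetic exact, which makes it easy to compare against hand calculations.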


The algorithm is called Naive because it assumes that all the features are independent of each other: the value of one feature doesn’t affect (influence or change) the value of any other feature. This is the most important assumption that Naive Bayes makes. The assumption keeps the model simple, which also makes it less prone to overfitting. Naive Bayes applies Bayes Theorem under this assumption to predict the class of unseen data.

Age | Income | Student | Credit Rating | Buys Computer
Youth | High | No | Fair | No
Youth | High | No | Excellent | No
Middle Age | High | No | Fair | Yes
Senior | Medium | No | Fair | Yes
Senior | Low | Yes | Fair | Yes
Senior | Low | Yes | Excellent | No
Middle Age | Low | Yes | Excellent | Yes
Youth | Medium | No | Fair | No
Youth | Low | Yes | Fair | Yes
Senior | Medium | Yes | Fair | Yes
Youth | Medium | Yes | Excellent | Yes
Middle Age | Medium | No | Excellent | Yes
Middle Age | High | Yes | Fair | Yes
Senior | Medium | No | Excellent | No

The table above contains 14 records with four features (age, income, student, and credit rating) and a class label (buys a computer). From this dataset, we need to find whether a youth who is a student, has a medium income, and has a fair credit rating buys a computer or not.

That is, B = (Youth, Medium, Student = Yes, Fair).

We can apply Bayes Theorem to this dataset, where:

  • A = ( Yes / No ), the value of “buys computer”
  • B = ( Youth, Medium, Student = Yes, Fair)

So, P(A/B) means the probability of buying a computer given that the conditions are “Youth age”, “Medium income”, “Student”, and “Fair credit rating”.


Before starting, we assume that all the given features are independent of each other.

Step 1: Calculate the probabilities of buying a computer from the above dataset

Buy Computer | Count | Probability
Yes | 9 | 9/14
No | 5 | 5/14
Total | 14 |
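
The counts in Step 1 can be reproduced with a few lines of Python. This is a minimal sketch; the list `buys` is simply the last column of the 14-row dataset typed in by hand, and the variable names are our own.

```python
from collections import Counter
from fractions import Fraction

# "Buys computer" column of the 14-row dataset, top to bottom
buys = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

counts = Counter(buys)
priors = {label: Fraction(n, len(buys)) for label, n in counts.items()}

for label in ("Yes", "No"):
    print(label, counts[label], priors[label])
```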
Step 2: Calculate the probabilities of each credit rating given the buying decision from the above dataset

P( Fair / Yes ) = 6/9       P( Excellent / Yes ) = 3/9

P( Fair / No ) = 2/5        P( Excellent / No ) = 3/5

Let’s understand how we calculated the above probabilities. From the table we can see that there are 8 fair credit ratings, among which 6 buy computers and 2 don’t. Similarly, 6 have excellent credit ratings, among which 3 buy computers and 3 don’t. As a whole, 9 (6+3) buy computers and 5 (2+3) don’t.

P(fair / Yes) means the probability of the credit rating being fair given that someone buys a computer. Hence, P(fair / Yes) = (number of buyers with a fair rating) / (total number of buyers), i.e. 6/9.

Step 3: Calculate the probabilities of student status given the buying decision from the above dataset

P( Student Yes / Yes ) = 6/9       P( Student No / Yes ) = 3/9

P( Student Yes / No ) = 1/5        P( Student No / No ) = 4/5
Step 4: Calculate the probabilities of each income level given the buying decision from the above dataset

P( High / Yes ) = 2/9       P( Medium / Yes ) = 4/9       P( Low / Yes ) = 3/9

P( High / No ) = 2/5        P( Medium / No ) = 2/5        P( Low / No ) = 1/5

Step 5: Calculate the probabilities of each age group given the buying decision from the above dataset

P( Youth / Yes ) = 2/9      P( Middle Age / Yes ) = 4/9   P( Senior / Yes ) = 3/9

P( Youth / No ) = 3/5       P( Middle Age / No ) = 0      P( Senior / No ) = 2/5
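
The conditional probabilities in the steps above are all obtained the same way: restrict the table to one class and count. The sketch below shows that counting in plain Python. The column names and the helper `likelihood` are our own hypothetical choices; the rows are the 14 records of the dataset.

```python
from fractions import Fraction

COLUMNS = ("age", "income", "student", "credit", "buys")
ROWS = """Youth,High,No,Fair,No
Youth,High,No,Excellent,No
Middle Age,High,No,Fair,Yes
Senior,Medium,No,Fair,Yes
Senior,Low,Yes,Fair,Yes
Senior,Low,Yes,Excellent,No
Middle Age,Low,Yes,Excellent,Yes
Youth,Medium,No,Fair,No
Youth,Low,Yes,Fair,Yes
Senior,Medium,Yes,Fair,Yes
Youth,Medium,Yes,Excellent,Yes
Middle Age,Medium,No,Excellent,Yes
Middle Age,High,Yes,Fair,Yes
Senior,Medium,No,Excellent,No"""
data = [dict(zip(COLUMNS, line.split(","))) for line in ROWS.splitlines()]


def likelihood(feature, value, label):
    """P(feature = value / buys = label), estimated by counting rows."""
    in_class = [row for row in data if row["buys"] == label]
    matches = sum(1 for row in in_class if row[feature] == value)
    return Fraction(matches, len(in_class))


print(likelihood("credit", "Fair", "Yes"))    # 2/3, i.e. 6/9
print(likelihood("age", "Middle Age", "No"))  # 0
```

Note that `Fraction` prints values in lowest terms, so 6/9 appears as 2/3.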


We have,

B = ( Youth, Medium, Student = Yes, Fair)

P(Yes) * P(B / Yes) = P(Yes) * P( Youth / Yes ) * P( Medium / Yes ) * P( Student Yes / Yes ) * P( Fair / Yes )

                  = 9/14 * 2/9 * 4/9 * 6/9 * 6/9

                  = 0.02822

P(No) * P(B / No) = P(No) * P( Youth / No ) * P( Medium / No ) * P( Student Yes / No ) * P( Fair / No )

                  = 5/14 * 3/5 * 2/5 * 1/5 * 2/5

                  = 0.00686

P(B) = P( Youth ) * P( Medium ) * P( Student Yes ) * P( Fair )

                  = 5/14 * 6/14 * 7/14 * 8/14

                  = 0.04373

P(Yes / B) = P(Yes) * P(B / Yes) / P(B)

                  = 0.02822 / 0.04373

                  = 0.645

P(No / B) = P(No) * P(B / No) / P(B)

                  = 0.00686 / 0.04373

                  = 0.157

Here, P(Yes / B) is greater than P(No / B), i.e. the posterior for Yes is greater than the posterior for No. So a person described by B ( Youth, Medium, Student = Yes, Fair ) is predicted to buy a computer. The two posteriors don’t add up to exactly 1 because P(B) is itself approximated under the independence assumption; since P(B) is the same for both classes, comparing the numerators alone gives the same decision.
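
The whole calculation can be checked in a few lines. This is a sketch that hard-codes the fractions read off Steps 1 to 5 (including P( Student Yes / No ) = 1/5, counted from the dataset); the dictionary names are our own.

```python
from fractions import Fraction as F
from math import prod

# Values from Steps 1-5 for B = (Youth, Medium, Student = Yes, Fair)
prior = {"Yes": F(9, 14), "No": F(5, 14)}
likelihoods = {
    "Yes": [F(2, 9), F(4, 9), F(6, 9), F(6, 9)],
    "No": [F(3, 5), F(2, 5), F(1, 5), F(2, 5)],
}

# Numerator of Bayes Theorem for each class: P(class) * P(B / class)
score = {c: prior[c] * prod(likelihoods[c]) for c in prior}

# P(B), computed under the same independence assumption
evidence = F(5, 14) * F(6, 14) * F(7, 14) * F(8, 14)

for c in score:
    print(c, round(float(score[c]), 5), round(float(score[c] / evidence), 3))

prediction = max(score, key=score.get)
print(prediction)  # Yes
```

Because the evidence is shared by both classes, picking the class with the largest numerator is enough to make the prediction.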


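Now let’s work through a second example with continuous features: classify whether a given person is male or female based on measured features. The features are height, weight, and foot size.
[Table: training data with gender, height (feet), weight (lbs), and foot size (inches)]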

Now, we define a dataframe that contains the data provided above.

Next, we create another dataframe containing the sample to classify: a height of 6 feet, a weight of 130 lbs, and a foot size of 8 inches. Using Naive Bayes from scratch, we will find whether this person is male or female.
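
A sketch of both dataframes with pandas follows. The training table is not reproduced in the text of this post, so the eight rows below are an assumption: they are the small height/weight/foot-size set commonly used for this example, and they are consistent with the mean (5.855) and variance (3.5033e-02) quoted later on. The column names are also our own choice.

```python
import pandas as pd

# Assumed training data (the original table is not reproduced in this post)
data = pd.DataFrame({
    "Gender": ["male"] * 4 + ["female"] * 4,
    "Height": [6.00, 5.92, 5.58, 5.92, 5.00, 5.50, 5.42, 5.75],  # feet
    "Weight": [180, 190, 170, 165, 100, 150, 130, 150],          # lbs
    "Foot_Size": [12, 11, 12, 10, 6, 8, 7, 9],                   # inches
})

# The person we want to classify
person = pd.DataFrame({"Height": [6.0], "Weight": [130], "Foot_Size": [8]})

print(data)
print(person)
```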

Calculating the total number of males and females and their probabilities, i.e. the priors:

Calculating the mean and variance of height, weight, and foot size for males and females:
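
Both steps can be sketched with pandas as below. The data is the same assumed eight-row table (not given in the text of this post), and the column names are our own. Note that pandas computes the sample variance (ddof=1) by default, which matches the variance quoted in the calculation that follows.

```python
import pandas as pd

# Assumed training data (the original table is not reproduced in this post)
data = pd.DataFrame({
    "Gender": ["male"] * 4 + ["female"] * 4,
    "Height": [6.00, 5.92, 5.58, 5.92, 5.00, 5.50, 5.42, 5.75],
    "Weight": [180, 190, 170, 165, 100, 150, 130, 150],
    "Foot_Size": [12, 11, 12, 10, 6, 8, 7, 9],
})

# Priors: share of each gender in the training data
priors = data["Gender"].value_counts(normalize=True)

# Per-class mean and sample variance of every feature
features = ["Height", "Weight", "Foot_Size"]
means = data.groupby("Gender")[features].mean()
variances = data.groupby("Gender")[features].var()

print(priors)
print(means)
print(variances)
```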

  • posterior (male) = P(male)*P(height|male)*P(weight|male)*P(foot size|male) / evidence
  • posterior (female) = P(female)*P(height|female)*P(weight|female)*P(foot size|female) / evidence
  • Evidence = P(male)*P(height|male)*P(weight|male)*P(foot size|male) + P(female) * P(height|female) * P(weight|female)*P(foot size|female)

The evidence may be ignored since it is a positive constant that is the same for both classes, so it does not change which posterior is larger. (Normal distributions are always positive.)

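P(x | class) = 1 / sqrt(2 * π * σ²) * exp( -(x - μ)² / (2 * σ²) )
Fig: Normal distribution formula, where μ is the mean and σ² is the variance of the feature within the class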
Calculation of P(height | male)
  • the mean height of males is 5.855
  • the variance (square of the S.D.) of the height of males is 3.5033e-02
  • x, the given height, is 6 feet
  • Substituting these values into the above equation, we get P(height | male) = 1.5789 (a value above 1 is fine here, since this is a probability density rather than a probability)
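
The substitution can be verified with a small function. This is a minimal sketch of the normal density using only the three numbers listed above; the function name `gaussian_pdf` is our own.

```python
import math


def gaussian_pdf(x, mean, variance):
    """Normal density with the given mean and variance, evaluated at x."""
    coefficient = 1.0 / math.sqrt(2.0 * math.pi * variance)
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return coefficient * math.exp(exponent)


p_height_male = gaussian_pdf(6.0, mean=5.855, variance=3.5033e-02)
print(round(p_height_male, 4))  # 1.5789
```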


  • P(weight|male) = 5.9881e-06
  • P(foot size|male) = 1.3112e-3
  • P(height|female) = 2.2346e-1
  • P(weight|female) = 1.6789e-2
  • P(foot size|female) = 2.8669e-1

Posterior (male)*evidence = P(male)*P(height|male)*P(weight|male)*P(foot size|male) = 6.1984e-09

Posterior (female)*evidence = P(female)*P(height|female)*P(weight|female)*P(foot size|female)= 5.3778e-04


Since Posterior (female)*evidence > Posterior (male)*evidence, the sample is female.
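
The comparison above can be reproduced from the listed likelihoods. In this sketch the priors are assumed to be 0.5 each (equal numbers of males and females), which is the value implied by the products quoted above; the dictionary names are our own.

```python
from math import prod

# Priors assumed to be 0.5 each (equal numbers of males and females)
prior = {"male": 0.5, "female": 0.5}

# Likelihoods quoted above, in the order height, weight, foot size
likelihood = {
    "male": [1.5789, 5.9881e-06, 1.3112e-3],
    "female": [2.2346e-1, 1.6789e-2, 2.8669e-1],
}

# Posterior * evidence for each class
numerator = {g: prior[g] * prod(likelihood[g]) for g in prior}

# Dividing by the evidence turns the numerators into proper posteriors
evidence = sum(numerator.values())
for g in numerator:
    print(g, numerator[g], numerator[g] / evidence)

predicted = max(numerator, key=numerator.get)
print(predicted)  # female
```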


Though we have a very small dataset, we divide it into train and test sets so that the same approach can be used for other model predictions. We import GaussianNB from sklearn (instantiated here as gnb) and train the model with our dataset.

Now, our model is ready. Let’s use this model to predict on new data.
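
A sketch of the scikit-learn version is shown below. The eight training rows and the column names are assumptions (the table is not reproduced in the text of this post), and the split settings (`test_size=0.25`, `random_state=0`, stratified by gender) are our own choices for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumed training data (the original table is not reproduced in this post)
data = pd.DataFrame({
    "Gender": ["male"] * 4 + ["female"] * 4,
    "Height": [6.00, 5.92, 5.58, 5.92, 5.00, 5.50, 5.42, 5.75],
    "Weight": [180, 190, 170, 165, 100, 150, 130, 150],
    "Foot_Size": [12, 11, 12, 10, 6, 8, 7, 9],
})
features = ["Height", "Weight", "Foot_Size"]

# Hold out two rows for testing; stratify keeps both genders in each split
X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["Gender"],
    test_size=0.25, random_state=0, stratify=data["Gender"],
)

gnb = GaussianNB()
gnb.fit(X_train, y_train)

# Predict the gender of the new person (6 feet, 130 lbs, 8 inches)
person = pd.DataFrame({"Height": [6.0], "Weight": [130], "Foot_Size": [8]})
prediction = gnb.predict(person)
print(prediction[0])
```

scikit-learn estimates the per-class variances a little differently from the sample variance used in the hand calculation, and here it only sees the training rows, so its internal probabilities need not match the numbers above exactly even when the predicted class agrees.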


We have come to the end of Naive Bayes from Scratch. If you have any queries, feedback, or suggestions, you can leave a comment or mail us on [email protected]. We will see you in the next tutorial. Stay safe! Happy Coding!


About Diwas Pandey

Highly motivated, strong drive with excellent interpersonal, communication, and team-building skills. Motivated to learn, grow and excel in Data Science, Artificial Intelligence, SEO & Digital Marketing
