Regression is a method that measures the average relationship between two or more continuous variables, expressed in terms of a response variable and one or more feature variables. In other words, regression analysis characterizes the relationship between variables so that it can be used to predict the most likely value of the dependent variable for given values of the independent variables. Linear regression is one of the most widely used regression algorithms.
For a more concrete example, suppose there is a high correlation between the daytime temperature and the sales of tea and coffee. A salesman who knows the expected temperature for the next day can then decide how much tea and coffee to stock. This can be done with the help of regression.
The variable whose value is estimated, predicted, or influenced is called the dependent variable. The variable that is known and used for prediction is called the independent variable; it is also called the explanatory, regressor, or predictor variable.
LINEAR REGRESSION
Linear Regression is a supervised method that tries to find a linear relationship between continuous variables in a given dataset. The problem the algorithm solves is to find the best-fitting line, plane, or hyperplane (as the number of dimensions increases) for a given set of data.
This algorithm uses statistics computed on the training data to find the best-fitting straight-line relationship between the input variable (X) and the output variable (Y). The equation of a simple Linear Regression model can be written as:
Y = mX + c, where m and c are estimated from the training data
In the above equation, m is the slope (the coefficient that scales X), c is the intercept (also called the bias), Y is the dependent variable, and X is the independent variable. Once the coefficients m and c are known, this equation can be used to predict the output value Y for a given input X.
Mathematically, coefficients m and c can be calculated as:
m = sum((X(i) - mean(X)) * (Y(i) - mean(Y))) / sum( (X(i) - mean(X))^2 )
c = mean(Y) - m * mean(X)
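The two formulas above translate almost directly into NumPy. The following is a minimal sketch rather than the code behind the results reported later in this post; the function name `fit_line` and the toy data are assumptions, chosen so that the answer is easy to check by hand.

```python
import numpy as np

def fit_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # m = sum((x_i - mean(x)) * (y_i - mean(y))) / sum((x_i - mean(x))^2)
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # c = mean(y) - m * mean(x)
    c = y_mean - m * x_mean
    return m, c

# Hypothetical toy data lying exactly on y = 2x + 1 (not the salary dataset)
x_demo = [1, 2, 3, 4, 5]
y_demo = [3, 5, 7, 9, 11]

m, c = fit_line(x_demo, y_demo)
print(m, c)
```

Because the toy points lie exactly on Y = 2X + 1, the sketch should recover m = 2 and c = 1.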

In the figure, the red point is very near the regression line, so its error of prediction is small. By contrast, the yellow point is much higher than the regression line, so its error of prediction is large. The best-fitting line is the line that minimizes the sum of the squared errors of prediction.
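To make "minimizes the sum of the squared errors" concrete, the sketch below computes that sum for the least-squares line and for two slightly altered lines on a small made-up dataset. The helper name `sse` and the data points are assumptions for illustration; `np.polyfit` with degree 1 is used here only as a convenient way to obtain the least-squares line.

```python
import numpy as np

def sse(x, y, m, c):
    """Sum of squared prediction errors for the line y = m*x + c."""
    residuals = y - (m * x + c)
    return float(np.sum(residuals ** 2))

# Made-up points that roughly follow a straight line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 polynomial fit returns (slope, intercept) of the least-squares line
m_best, c_best = np.polyfit(x, y, 1)

sse_best = sse(x, y, m_best, c_best)
sse_tilted = sse(x, y, m_best + 0.1, c_best)   # same intercept, steeper slope
sse_shifted = sse(x, y, m_best, c_best + 0.5)  # same slope, higher intercept

print(sse_best, sse_tilted, sse_shifted)
```

Any change to the slope or the intercept away from the least-squares values should give a larger sum of squared errors than `sse_best`.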
LINEAR REGRESSION FROM SCRATCH
We will build a linear regression model from scratch to predict a person’s salary from their years of experience. You can download the dataset from the link given below. Let’s start by importing the required libraries:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
We are using a dataset of 30 records, each with two columns: years of experience and salary. Let’s visualize the dataset first.
dataset = pd.read_csv('salaries.csv')
#Scatter Plot
X = dataset['Years of Experience']
Y = dataset['Salary']
plt.scatter(X,Y,color='blue')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Salary Prediction Curves')
plt.show()

Applying the formulas above to this dataset gives the unknown coefficients: m, stored as b1, and c, stored as b0. Here we have b1 = 9449.962321455077 and b0 = 25792.20019866869.
Let’s visualize the best fit line from scratch. Code is available below.
Now let’s predict the salary Y by providing years of experience as X:
def predict(x):
    # Apply the fitted line: intercept b0 plus slope b1 times the input
    return b0 + b1 * x
y_pred = predict(6.5)
print(y_pred)
Output: 87216.95528812669
LINEAR REGRESSION USING SKLEARN
from sklearn.linear_model import LinearRegression
X = dataset.drop(['Salary'],axis=1)
Y = dataset['Salary']
reg = LinearRegression() #creating object reg
reg.fit(X,Y) # Fitting the Data set
Let’s visualize the best fit line using Linear Regression from sklearn. Code is available below.

Now let’s predict the salary Y by providing years of experience as X:
y_pred = reg.predict([[6.5]])
y_pred
Output: 87216.95528812669
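The from-scratch formulas and scikit-learn's `LinearRegression` both compute an ordinary least-squares fit, so they should agree on the same data. The sketch below checks this on a small made-up dataset (not the salary data); the variable names mirror the b0 and b1 used earlier, and the data points are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data, not the salary dataset
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# From-scratch coefficients using the formulas for m (b1) and c (b0)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# scikit-learn expects a 2D feature array of shape (n_samples, n_features)
reg = LinearRegression().fit(x.reshape(-1, 1), y)

print(b1, reg.coef_[0])      # slope from both approaches
print(b0, reg.intercept_)    # intercept from both approaches
```

In the fitted model, `coef_` holds the slope for each feature and `intercept_` holds the bias term, which is why the two predictions in this post match.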
CONCLUSION
We need to be able to measure how good our model is. There are many ways to do this; two common choices for regression are the root mean squared error (RMSE) and the coefficient of determination (R² score).
- Try the model with different error metrics for Linear Regression, such as mean absolute error and root mean squared error.
- Try the algorithm with larger datasets, both balanced and imbalanced, so that you can see how regression behaves in different situations.
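As a minimal sketch of these two metrics, the functions below follow the standard definitions: RMSE is the square root of the mean squared error, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The function names `rmse` and `r_squared` and the sample values are assumptions for illustration.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return float(1.0 - ss_res / ss_tot)

# Made-up observed and predicted values; every prediction is off by exactly 1
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 6.0, 6.0, 10.0]

print(rmse(y_true, y_pred))
print(r_squared(y_true, y_pred))
```

For these values every squared error is 1, so the RMSE is 1; the residual sum of squares is 4 and the total sum of squares is 20, so R² comes out at about 0.8. A lower RMSE and an R² closer to 1 indicate a better fit.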
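When comparing error metrics, it helps to see how they react to a single large error. The sketch below, using made-up numbers and the assumed helper names `mae` and `rmse`, builds two sets of predictions with the same mean absolute error but different root mean squared error.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

y_true = [10.0, 10.0, 10.0, 10.0]
even_errors = [12.0, 8.0, 12.0, 8.0]     # every prediction is off by 2
one_outlier = [10.0, 10.0, 10.0, 18.0]   # three exact, one off by 8

print(mae(y_true, even_errors), rmse(y_true, even_errors))
print(mae(y_true, one_outlier), rmse(y_true, one_outlier))
```

Both prediction sets have a mean absolute error of 2, but squaring the errors makes RMSE penalize the single large miss more heavily, so its value is higher for the second set.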
CONTRIBUTORS:
- Diwas Pandey
- Sunil Ghimire
- Abhishek chougule