Hey, folks! In this article, we will be focusing on Correlation Regression analysis to find the correlation between variables in Python.
So, let us begin!
What is Correlation Regression Analysis?
Correlation Regression Analysis is an important step in the process of data pre-processing for modeling of datasets. For any dataset, it is very important to depict the relationship between the variables and understand the effect of variables on the overall prediction of the data as well as the target/response variable.
This is where Correlation Regression Analysis comes into the picture.
Correlation Analysis helps us analyze the below aspects of data:
- Relationship between the independent variables, i.e. the information they carry and how strongly they are correlated.
- Effect of the independent variables on the dependent variable.
It is crucial for any developer to understand the correlation between the independent variables.
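The correlation coefficient (Pearson's, which is what pandas computes by default) ranges from -1 to 1. A value close to 1 or -1 indicates a strong positive or negative linear relationship, while a value close to 0 indicates little or no linear relationship. A high absolute correlation between two independent variables means that both variables represent largely the same information.
This gives rise to multicollinearity, and we can drop either one of those variables.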
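To make the idea of dropping a redundant variable concrete, here is a minimal sketch. The toy data, the column names and the 0.9 threshold are all assumptions made for illustration; they are not taken from the dataset used later in this article.

```python
import pandas as pd

# Hypothetical toy data: 'debt_scaled' is simply 'debt' * 2,
# so the two columns carry the same information.
df = pd.DataFrame({
    "income": [20, 35, 50, 65, 80, 95],
    "debt": [5, 9, 2, 8, 3, 7],
    "debt_scaled": [10, 18, 4, 16, 6, 14],
})

corr = df.corr()  # Pearson correlation by default

threshold = 0.9  # assumed cut-off; choose one that suits your data
to_drop = []
cols = list(corr.columns)
for i, first in enumerate(cols):
    for second in cols[i + 1:]:
        if first in to_drop or second in to_drop:
            continue
        # abs() is used because a strong negative correlation
        # is just as redundant as a strong positive one.
        if abs(corr.loc[first, second]) > threshold:
            to_drop.append(second)

reduced = df.drop(columns=to_drop)
print(to_drop)
print(reduced.columns.tolist())
```

Here the second variable of each highly correlated pair is dropped. Which of the two to keep is a judgment call and usually depends on which one is easier to interpret.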
Having understood the concept of Correlation, let us now try to implement it practically in the upcoming section.
Finding Correlation between variables
Let us first start with importing the dataset. You can find the dataset here. We have loaded the dataset into the environment using the read_csv() function.
Further, we have segregated all the numeric variables of the dataset and stored them, because correlation works only on numeric data. We have applied the corr() function to depict the correlation between the variables through the correlation matrix.
import pandas

data = pandas.read_csv("Bank_loan.csv")
# Using correlation analysis to depict the relationship between the numeric/continuous data variables
numeric_col = ['age', 'employ', 'address', 'income', 'debtinc', 'creddebt', 'othdebt']
corr = data.loc[:, numeric_col].corr()
print(corr)
Output:
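Instead of listing the numeric columns by hand, pandas can also pick them out by data type with select_dtypes(). The sketch below assumes a small made-up frame (the values and the 'region' column are hypothetical) rather than the loan dataset, just to show the idea.

```python
import pandas as pd

# Hypothetical stand-in data; 'region' is a non-numeric column
# added only to show that it gets filtered out.
data = pd.DataFrame({
    "age": [41, 27, 40, 24, 36],
    "income": [176, 31, 55, 28, 25],
    "region": ["north", "south", "south", "east", "north"],
})

# Keep only the numeric columns, then build the correlation matrix.
numeric_data = data.select_dtypes(include="number")
corr = numeric_data.corr()
print(corr)
```

This avoids typos in a hand-written column list, although it will also include numeric ID or category-code columns, so the selection is still worth a quick check.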
We can use the seaborn.heatmap() function to visualize the correlation matrix as shown below:
import seaborn as sn
sn.heatmap(corr, annot=True)
Output:
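Since correlation values can be negative, it may help to pin the colour scale to the full -1 to 1 range so that positive and negative relationships are easy to tell apart. The following is a sketch under assumptions: the toy data, the "coolwarm" colour map and the title are illustrative choices, not part of the original example.

```python
import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

# Hypothetical toy data, used only to have a correlation matrix to plot.
df = pd.DataFrame({
    "income": [20, 35, 50, 65, 80, 95],
    "debt": [5, 9, 2, 8, 3, 7],
    "age": [25, 31, 38, 44, 52, 60],
})
corr = df.corr()

# vmin/vmax fix the colour scale to [-1, 1]; a diverging colour map
# then places values near 0 in the neutral middle of the scale.
ax = sn.heatmap(corr, annot=True, vmin=-1, vmax=1, cmap="coolwarm")
ax.set_title("Correlation matrix")
plt.tight_layout()
# Call plt.show() to display the figure when running this as a script.
```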
Conclusion
With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.
For more such posts related to Python, stay tuned @ Python with JournalDev and till then, Happy Learning!!