Hey, folks! Today we will be unveiling a very interesting Python module, Seaborn, and understanding its contribution to data visualization.
Need for the Seaborn module
Data visualization is the representation of data values in a pictorial format. Visualizing data helps us understand it better and draw sound conclusions from it.
The Python Matplotlib library provides the foundation for many of the data visualization modules in Python. The Seaborn module is built on top of Matplotlib and provides higher-level functions with statistical plotting features built in, so common plots need less code.
With Seaborn, data can be presented with different visualizations and different features can be added to it to enhance the pictorial representation.
Visualizing Data with Python Seaborn
In order to get started with data visualization with Seaborn, the following modules need to be installed and imported in the Python environment:
- Pandas
- NumPy
- Matplotlib
Further, we need to install the Seaborn module (from a terminal) and then load it into the Python environment (in a script or notebook):

pip install seaborn

import seaborn as sn
Now that we have installed and imported the Seaborn module in our working environment, let us get started with data visualization in Seaborn.
Statistical Data Visualization with Seaborn
The Python Seaborn module helps us visualize data in statistical terms, i.e. understand the relationship between data values, with the help of the following plots:
- Line Plot
- Scatter Plot
Let us understand each of them in detail in the upcoming sections.
Seaborn Line Plot
A Seaborn line plot depicts the relationship between two variables across a set of data points. It shows how one variable (on the y-axis) changes as the other (on the x-axis) changes.
The seaborn.lineplot() function draws a line through the data points to visualize how one data variable depends on another.
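When several rows share the same x value, lineplot() does not zig-zag through all of them. By default it plots the mean y value for each x and adds a shaded confidence band. The band is left out of this example. The minimal plain-Python sketch below mirrors only the averaging step. The helper name `aggregate_like_lineplot` and the sample values are hypothetical and not part of Seaborn.

```python
from collections import defaultdict
from statistics import mean


def aggregate_like_lineplot(xs, ys):
    """Return (x, mean of y) pairs sorted by x.

    Hypothetical helper: it mimics lineplot's default estimator ('mean')
    and its sorting by x, but not the confidence band seaborn also draws.
    """
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)  # collect every y observed at this x
    return [(x, mean(groups[x])) for x in sorted(groups)]


# Made-up sample values with a repeated x (175 appears twice)
hp = [110, 110, 93, 175, 175]
cyl = [6, 6, 4, 8, 6]
print(aggregate_like_lineplot(hp, cyl))  # [(93, 4), (110, 6), (175, 7)]
```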
Syntax:
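seaborn.lineplot(data=None, *, x=None, y=None, hue=None, style=None, ...)
Since Seaborn 0.12, the plot variables (x, y, hue, style, ...) must be passed as keyword arguments. Each can be a column name of the DataFrame given in data, or an array-like object.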
Example 1:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

# Load the mtcars dataset from a local CSV file
data = pandas.read_csv("C:/mtcars.csv")

# Plot the number of cylinders against horsepower (columns passed by name)
res = sn.lineplot(data=data, x='hp', y='cyl')
plt.show()
Output:
Example 2:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# One line per transmission type (am): hue sets the color, style the dash pattern
res = sn.lineplot(data=data, x='hp', y='cyl', hue='am', style='am')
plt.show()
In the above example, the hue and style parameters split the data by the am column (transmission type). Each group is drawn as a separate line with its own color and dash pattern.
Output:
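Conceptually, hue and style split the rows into one subset per level of the grouping column, and each subset is drawn as its own line. The sketch below reproduces only that grouping step in plain Python. The helper name `split_by_hue` and the three sample rows are hypothetical; they do not come from Seaborn or from the mtcars file.

```python
from collections import defaultdict


def split_by_hue(rows, hue):
    """Group row dicts by the value of the `hue` key: one list per level."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[hue]].append(row)
    return dict(sorted(groups.items()))  # levels in sorted order


# Made-up rows; am = 0 or 1 marks the group each row belongs to
rows = [
    {"hp": 110, "cyl": 6, "am": 1},
    {"hp": 93, "cyl": 4, "am": 1},
    {"hp": 175, "cyl": 8, "am": 0},
]
print({level: len(group) for level, group in split_by_hue(rows, "am").items()})  # {0: 1, 1: 2}
```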
Seaborn Scatter Plot
The Seaborn scatter plot also depicts the relationship between data values, plotted against a continuous or categorical data variable (parameter).
Scatter plots are extensively used to detect outliers in data visualization and data cleansing. Outliers are data values that lie far away from the normal range of the rest of the data. Because a scatter plot draws every data point, outliers stand out as isolated points.
Syntax:
seaborn.scatterplot(data=None, *, x=None, y=None, hue=None, style=None, ...)
The seaborn.scatterplot() function draws each observation as a point, so clusters of points reveal the relationship between the data variables. By convention, the dependent (response) variable goes on the y-axis and the independent variable on the x-axis.
Example 1:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Each car becomes one point: horsepower on x, cylinders on y
res = sn.scatterplot(data=data, x='hp', y='cyl')
plt.show()
Output:
Example 2:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Color and marker shape both encode the transmission type (am)
res = sn.scatterplot(data=data, x='hp', y='cyl', hue='am', style='am')
plt.show()
With the hue and style parameters, we can visualize an additional data variable (here, am) using different colors and marker styles.
Output:
Categorical Data visualization with Seaborn and Pandas
Before getting started with the categorical data distribution, it is necessary for us to understand certain terms related to data analysis and visualization.
- Continuous variable: It is a data variable that contains continuous and numeric values. For example: Age is a continuous variable whose value can lie between 1 – 100
- Categorical variable: It is a data variable containing discrete values, i.e. values in the form of groups or categories. For example: Gender can be categorized into three groups: ‘Male’, ‘Female’ and ‘Others’.
Having understood the basic terminologies, let us dive into the visualization of categorical data variables.
Box Plot
Seaborn Boxplot is used to visualize the distribution of a numeric data variable, optionally split by a categorical variable. It is extensively used to detect outliers in the data cleansing process.
The seaborn.boxplot() method is used to create a boxplot for a particular data variable. The box spans the interquartile range (IQR), i.e. from the first quartile (Q1) to the third quartile (Q3). The line inside the box marks the median.
Syntax:
seaborn.boxplot(data=None, *, x=None, y=None, hue=None, ...)
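The two whiskers represent the lower and the upper range. By default they extend to the most extreme data points that lie within 1.5 × IQR of the box. Any data point below the lower range or above the upper range is drawn as an individual point and considered an outlier.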
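Under the common 1.5 × IQR convention (Seaborn's default whis=1.5), the lower and upper range are Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The sketch below computes these fences with Python's statistics.quantiles (method="inclusive", i.e. linear interpolation) and returns the points outside them. Two things are assumptions: the helper name `iqr_outliers` and the sample values are made up, and the interpolation method is a choice that can shift the quartiles slightly on small samples.

```python
from statistics import quantiles


def iqr_outliers(values, whis=1.5):
    """Return (low_fence, high_fence, outliers) using the whis * IQR rule."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    outliers = [v for v in values if v < low or v > high]
    return low, high, outliers


sample = [10, 12, 12, 13, 14, 15, 15, 16, 40]  # made-up data
print(iqr_outliers(sample))  # (7.5, 19.5, [40])
```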
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Horizontal box plot of miles per gallon
res = sn.boxplot(data=data, x='mpg')
plt.show()
Output:
In the above boxplot, the data point lying above the upper range is drawn as an individual point and considered an outlier in the dataset.
Boxen Plot
Seaborn Boxenplot resembles the boxplot but has a slight difference in the presentation of the plot.
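The seaborn.boxenplot() function draws a letter-value plot. Beyond the central quartile box, it adds progressively narrower boxes at further quantiles (eighths, sixteenths, and so on). This gives a more detailed picture of the data values, especially in the tails of the distribution.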
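The idea behind the extra boxes can be sketched in a few lines. Each successive box cuts off half of the remaining tail: first the quartiles, then the eighths, then the sixteenths. The helper names, the linear-interpolation quantile and the fixed number of levels below are illustrative assumptions. Seaborn decides on its own how many boxes to draw (see its k_depth parameter).

```python
def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already sorted list, 0 <= p <= 1."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def letter_value_boxes(values, levels):
    """Return (lower, upper) bounds for the first `levels` letter-value boxes."""
    data = sorted(values)
    boxes = []
    for k in range(1, levels + 1):
        tail = 1 / 2 ** (k + 1)  # 1/4, 1/8, 1/16, ...
        boxes.append((quantile(data, tail), quantile(data, 1 - tail)))
    return boxes


print(letter_value_boxes(range(17), 3))  # [(4.0, 12.0), (2.0, 14.0), (1.0, 15.0)]
```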
Syntax:
seaborn.boxenplot(data=None, *, x=None, y=None, hue=None, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Letter-value (boxen) plot of horsepower
res = sn.boxenplot(data=data, x='hp')
plt.show()
Output:
Violin Plot
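Seaborn Violin Plot represents the underlying distribution of a data variable. It combines a box plot with a kernel density estimate (KDE), so the width of the violin at each value reflects how densely the data is concentrated there.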
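A violin's outline is a kernel density estimate of the variable. In a minimal Gaussian KDE, every sample contributes a small bell curve and the curves are averaged. The sketch below makes two assumptions. The helper name `gaussian_kde_1d` is hypothetical. The default bandwidth uses Scott's rule of thumb; Seaborn's own bandwidth and smoothing options may differ in detail.

```python
import math
from statistics import stdev


def gaussian_kde_1d(samples, bandwidth=None):
    """Return a density function built from Gaussian kernels on the samples."""
    n = len(samples)
    if bandwidth is None:
        # Scott's rule of thumb in one dimension: sigma * n^(-1/5)
        bandwidth = stdev(samples) * n ** (-1 / 5)
    norm = 1 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


f = gaussian_kde_1d([15.0, 18.1, 19.2, 21.0, 22.8, 30.4])  # made-up values
# The estimate is a proper density: a Riemann sum over a wide range is close to 1
step = 0.01
print(round(sum(f(-20 + i * step) for i in range(8000)) * step, 3))  # 1.0
```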
Syntax:
seaborn.violinplot(data=None, *, x=None, y=None, hue=None, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Violin plot of horsepower
res = sn.violinplot(data=data, x='hp')
plt.show()
Output:
Swarm Plot
Seaborn Swarmplot is a categorical scatter plot. It gives a clearer picture of how the values of one variable are distributed within each category of another.
The seaborn.swarmplot() function draws every observation as a point and shifts the points along the categorical axis so they do not overlap. This depicts the relationship between the two data variables/columns.
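Seaborn's swarm layout places points by checking for marker overlap, which is too involved to reproduce here. As a deliberately simplified illustration of the idea, the sketch below spreads only exactly repeated values sideways, alternating right and left of the category's centre line. The helper name `simple_swarm_offsets` and the width value are hypothetical. This is not Seaborn's algorithm.

```python
from collections import defaultdict


def simple_swarm_offsets(values, width=0.1):
    """Sideways offsets that fan out repeated values: 0, +w, -w, +2w, -2w, ..."""
    seen = defaultdict(int)
    offsets = []
    for v in values:
        k = seen[v]  # how many times v has appeared before
        seen[v] += 1
        step = (k + 1) // 2  # 0, 1, 1, 2, 2, ...
        offsets.append(step * width if k % 2 == 1 else -step * width)
    return offsets


print(simple_swarm_offsets([4, 4, 4, 6]))  # [0.0, 0.1, -0.1, 0.0]
```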
Syntax:
seaborn.swarmplot(data=None, *, x=None, y=None, hue=None, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Cylinder counts of every car, grouped by transmission type (am)
res = sn.swarmplot(data=data, x='am', y='cyl')
plt.show()
Output:
Estimation of categorical data using Seaborn
In the field of data analysis and visualization, we often need plots that estimate the frequency, count or central tendency of values, for example from surveys or research studies. The following plots serve this purpose:
- Barplot
- Pointplot
- Countplot
1. Barplot
Seaborn Barplot shows an estimate of central tendency (the mean, by default) of a numeric variable for each category. Error bars indicate the uncertainty of that estimate.
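To make "central tendency" concrete, the sketch below computes what each bar and its error bar summarise: the mean of the values in each category, plus the standard error of that mean. The standard error is a simpler stand-in for Seaborn's default bootstrapped 95% confidence interval. Seaborn 0.12 and later draws this variant when called with errorbar='se'. The helper name `bar_estimates` and the data are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def bar_estimates(categories, values):
    """Map each category to (mean, standard error of the mean)."""
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    result = {}
    for cat in sorted(groups):
        vals = groups[cat]
        # A single observation has no spread, so its error bar is 0
        se = stdev(vals) / math.sqrt(len(vals)) if len(vals) > 1 else 0.0
        result[cat] = (mean(vals), se)
    return result


cyl = [4, 4, 6, 6, 8]   # made-up category labels
carb = [1, 3, 2, 2, 4]  # made-up values
print(bar_estimates(cyl, carb))
```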
Syntax:
seaborn.barplot(data=None, *, x=None, y=None, hue=None, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Mean number of carburetors for each cylinder count
res = sn.barplot(data=data, x='cyl', y='carb')
plt.show()
Output:
2. Pointplot
Seaborn Pointplot combines the ideas of the line and scatter plots. The seaborn.pointplot() function shows a point estimate (the mean, by default) of a variable for each category, with error bars, and joins the points with lines.
Syntax:
seaborn.pointplot(data=None, *, x=None, y=None, hue=None, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Mean cylinder count for each carburetor count, joined by lines
res = sn.pointplot(data=data, x='carb', y='cyl')
plt.show()
Output:
3. Countplot
Seaborn Countplot shows the count (frequency) of observations in each category of the data variable passed to it. Thus it can be considered a univariate data distribution plot: a histogram for a categorical variable.
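The bar heights of a count plot are simply category frequencies, which can be reproduced with collections.Counter. The helper name `category_counts` and the sample values below are hypothetical.

```python
from collections import Counter


def category_counts(values):
    """Return (category, count) pairs sorted by category, i.e. the bar heights."""
    return sorted(Counter(values).items())


carb = [4, 4, 1, 1, 2, 1, 4, 2]  # made-up sample
print(category_counts(carb))  # [(1, 3), (2, 2), (4, 3)]
```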
Syntax:
seaborn.countplot(data=None, *, x=None, y=None, hue=None, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Number of cars for each carburetor count
res = sn.countplot(data=data, x='carb')
plt.show()
Output:
Univariate distribution using Seaborn Distplot
The Seaborn Distplot is extensively used for univariate data distribution and visualization i.e. visualizing the data values of a single data variable.
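The seaborn.distplot() function depicts the distribution of a continuous variable as a histogram along with a kernel density estimate (KDE) curve. Note that distplot() has been deprecated since Seaborn 0.11. Newer code should use seaborn.histplot() (for example with kde=True) or seaborn.displot() instead.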
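The histogram part of the plot counts how many values fall into each of a set of equal-width bins. A plain-Python sketch of that binning step follows. The helper name `histogram_counts`, the sample values and the fixed number of bins are assumptions. Seaborn chooses the number of bins automatically unless told otherwise.

```python
def histogram_counts(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width) if width else 0
        # The maximum value lands exactly on the right edge; keep it in the last bin
        counts[min(idx, bins - 1)] += 1
    return counts


mpg = [12, 15, 15, 18, 21, 22, 30, 33]  # made-up values
print(histogram_counts(mpg, 3))  # [4, 2, 2]
```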
Syntax:
seaborn.distplot(a=None, bins=None, hist=True, kde=True, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# distplot() is deprecated and may emit a warning;
# modern equivalent: sn.histplot(data=data, x='mpg', kde=True)
res = sn.distplot(data['mpg'])
plt.show()
Output:
Bivariate distribution using Seaborn Kdeplot
Seaborn Kdeplot depicts the probability distribution of continuous variables using kernel density estimation (KDE). When both x and y are given, it draws the joint (bivariate) density of the two variables as contour lines.
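For two variables, the kernel density idea extends to a 2-D surface in which every (x, y) observation contributes a small two-dimensional bell curve. The sketch below uses a product of two 1-D Gaussian kernels with a separate, hand-picked bandwidth per axis. That simplification, the helper name `gaussian_kde_2d` and the sample pairs are assumptions. By default, Seaborn's own estimate scales its kernel with the covariance of the data.

```python
import math


def gaussian_kde_2d(points, bw_x, bw_y):
    """Return f(x, y): an average of product Gaussian kernels centred on the points."""
    n = len(points)
    norm = 1 / (n * 2 * math.pi * bw_x * bw_y)

    def density(x, y):
        return norm * sum(
            math.exp(-0.5 * (((x - px) / bw_x) ** 2 + ((y - py) / bw_y) ** 2))
            for px, py in points
        )

    return density


# Made-up (mpg, qsec)-style pairs
pts = [(21.0, 16.5), (22.8, 18.6), (18.1, 20.2), (30.4, 18.5)]
f = gaussian_kde_2d(pts, bw_x=2.0, bw_y=0.8)
print(f(21.0, 17.0) > f(40.0, 10.0))  # True: density is higher near the data
```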
Syntax:
seaborn.kdeplot(data=None, *, x=None, y=None, hue=None, ...)
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Joint density of miles per gallon and quarter-mile time, drawn as contours
res = sn.kdeplot(data=data, x='mpg', y='qsec')
plt.show()
Output:
Setting different backgrounds using Seaborn
The seaborn.set() function (an alias of seaborn.set_theme() in Seaborn 0.11 and later) can be used to set the style of the plots, such as ‘darkgrid’, ‘whitegrid’, ‘dark’, ‘white’ and ‘ticks’.
Syntax:
seaborn.set(style="darkgrid")
Example:
import seaborn as sn
import matplotlib.pyplot as plt
import pandas

data = pandas.read_csv("C:/mtcars.csv")

# Apply the dark grid style to every plot drawn afterwards
sn.set(style="darkgrid")
res = sn.lineplot(data=data, x='mpg', y='qsec')
plt.show()
Output:
Conclusion
Thus, the Seaborn module helps us visualize data with different plots suited to the purpose of the visualization.