In this tutorial, we’ll go over the steps to plot a histogram in R. A histogram is a graphical representation of the distribution of numeric data: the range of values is divided into intervals (bins), and the height of each bar shows how many values fall into that bin. It looks similar to a bar plot, but its bars represent ranges of a continuous variable rather than separate categories.
R offers the standard function hist() to plot a histogram in base R (for example, in RStudio). The ggplot2 package offers the function geom_histogram() to plot a histogram.
Advantages of Histograms
- A histogram shows the distribution of the data: how frequently values occur within each range.
- It is an easy way to summarize large data sets visually.
- The histogram also shows the skewness of the data.
Types of Histogram plots in R
Based on the distribution of the data, a histogram can take many different shapes. In this section, we will look at the common histogram shapes and what they mean.
The major types of histogram distributions are:
- Normal distribution
- Right-skewed distribution
- Left-skewed distribution
- Bimodal distribution
Basic Histogram in R
In this section, we will plot a simple histogram using the ‘airquality’ data set.
Execute the below code to plot this simple histogram.
```r
# airquality is one of R's built-in data sets, so no import is needed
head(airquality)

# creates the simple histogram of daily maximum temperature
hist(airquality$Temp, xlab = 'Temperature (degrees F)', ylab = 'Frequency',
     main = 'Simple histogram plot', col = "yellow", border = "black")
```
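By default, hist() chooses the bin boundaries itself (Sturges' rule). The sketch below sets them explicitly through the `breaks` argument instead; the 5-degree bin width is an arbitrary choice for illustration, and `plot = FALSE` returns the bin counts without drawing anything.

```r
# build 5-degree break points that cover the full temperature range
# (the bin width of 5 is an illustrative choice, not a recommendation)
lo <- floor(min(airquality$Temp) / 5) * 5
hi <- ceiling(max(airquality$Temp) / 5) * 5
temp_breaks <- seq(lo, hi, by = 5)

# plot = FALSE computes the histogram object without drawing it
h <- hist(airquality$Temp, breaks = temp_breaks, plot = FALSE)

# one count per bin: n break points define n - 1 bins
print(h$counts)

# draw the same histogram with the explicit bins
plot(h, xlab = 'Temperature (degrees F)', ylab = 'Frequency',
     main = 'Histogram with 5-degree bins', col = "yellow", border = "black")
```

Narrower bins show more detail but look noisier; wider bins smooth the shape but can hide features such as a second peak.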
Normal distribution
In a normal distribution, the histogram has a symmetric, bell-shaped profile.
Most of the values are concentrated around the center of the range.
The frequencies then taper off into tails of similar length on both sides.
Execute the below code to create a histogram that shows an approximately normal distribution.
```r
# loads the built-in iris data set
data("iris")

# shows the first five rows
head(iris, 5)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# 1          5.1         3.5          1.4         0.2  setosa
# 2          4.9         3.0          1.4         0.2  setosa
# 3          4.7         3.2          1.3         0.2  setosa
# 4          4.6         3.1          1.5         0.2  setosa
# 5          5.0         3.6          1.4         0.2  setosa

# creates the histogram bins based on 'sepal width'
hist(iris$Sepal.Width, xlab = 'Sepal width', ylab = 'Frequency',
     main = 'Normal distribution of the data', col = "brown")
```
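One informal way to judge whether a histogram is "bell-shaped" is to draw it on the density scale and overlay a normal curve with the same mean and standard deviation. This is a minimal sketch of that visual check, assuming the mean and standard deviation estimated from the sample are good stand-ins; it is not a formal normality test.

```r
# sample mean and standard deviation of sepal width
mu <- mean(iris$Sepal.Width)
sigma <- sd(iris$Sepal.Width)

# compute the histogram first so the y-axis can fit both the bars and the curve
h <- hist(iris$Sepal.Width, plot = FALSE)
y_max <- max(h$density, dnorm(mu, mean = mu, sd = sigma))

# freq = FALSE plots densities instead of counts, so the bar areas sum to 1
plot(h, freq = FALSE, ylim = c(0, y_max), xlab = 'Sepal width',
     main = 'Sepal width with a fitted normal curve', col = "brown")

# overlay the normal density with the same mean and standard deviation
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, lwd = 2)
```

If the bars follow the curve closely, with no strong lean to one side and no second hump, the data is roughly normal.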
Left or Negatively Skewed Histogram in R
In this section, we will plot a left or negatively skewed histogram.
Negatively skewed: If the histogram shows values concentrated on the right side, with the tail extending to the left (toward smaller values), the distribution is called negatively or left skewed.
Execute the below code to create a negatively skewed histogram in R.
Dataset: Google Play Store apps dataset from Kaggle (googleplaystore.csv)
```r
# imports the csv file (download googleplaystore.csv from Kaggle into the working directory first)
df <- read.csv("googleplaystore.csv")

# shows the first rows of the data
head(df)

# plots the histogram, which is negatively or left skewed
hist(df$Rating, xlab = 'Ratings', ylab = 'Frequency',
     main = 'Negative or left skewed distribution', col = "brown")
```
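The direction of skew can also be measured as a number. The sketch below uses a hypothetical helper, `sample_skewness()`, that computes the moment-based sample skewness (the mean cubed deviation divided by the cubed standard deviation); it is written here for illustration and is not a base R function. Because the Kaggle file may not be at hand, it uses simulated Beta(5, 1) data as an assumed stand-in for left-skewed values.

```r
# hypothetical helper: moment-based sample skewness
sample_skewness <- function(x) {
  x <- x[is.finite(x)]          # drop NA/NaN values
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))    # standard deviation with divisor n
  mean((x - m)^3) / s^3
}

# simulated stand-in for left-skewed data:
# a Beta(5, 1) sample piles up near 1 with a tail toward 0
set.seed(42)
left_skewed <- rbeta(1000, shape1 = 5, shape2 = 1)

hist(left_skewed, xlab = 'Simulated values', ylab = 'Frequency',
     main = 'Simulated left skewed distribution', col = "brown")

# a negative value indicates a longer left tail
print(sample_skewness(left_skewed))
```

With the Kaggle file loaded as above, the same helper can be applied to `df$Rating`. A value near 0 suggests a roughly symmetric shape, and a positive value indicates right skew.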
Right or Positively Skewed Histogram in R
In this section, we will plot a right or positively skewed histogram.
Positively skewed: If the histogram shows values concentrated on the left side, with the tail extending to the right (toward larger values), the distribution is called positively or right skewed.
Execute the below code to plot the right or positively skewed histogram.
```r
# attenu is one of R's built-in data sets, so no import is needed
head(attenu)

# plots the right or positively skewed distribution of peak acceleration
hist(attenu$accel, xlab = 'Peak acceleration (g)', ylab = 'Frequency',
     main = 'Right or positively skewed distribution', col = "brown")
```
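A quick, informal way to see skew on the plot itself is to mark the mean and the median. In a right-skewed distribution, the long right tail usually pulls the mean above the median. The sketch below draws both as vertical lines; the colors and line types are arbitrary choices.

```r
accel <- attenu$accel

hist(accel, xlab = 'Peak acceleration (g)', ylab = 'Frequency',
     main = 'Right skewed distribution with mean and median', col = "brown")

# vertical reference lines for the mean (solid) and the median (dashed)
abline(v = mean(accel), col = "blue", lwd = 2, lty = 1)
abline(v = median(accel), col = "darkgreen", lwd = 2, lty = 2)
legend("topright", legend = c("Mean", "Median"),
       col = c("blue", "darkgreen"), lwd = 2, lty = c(1, 2))

# compare the two numbers directly
print(c(mean = mean(accel), median = median(accel)))
```

Mean above median is a common sign of right skew, but it is a rule of thumb rather than a guarantee, so read it together with the histogram's shape.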
Bimodal Distribution of the data plotted using Histogram
In this section, we will plot a bimodal distribution of the data.
Bimodal distribution: A bimodal distribution has two distinct peaks (modes) in its histogram.
In the example below, the x-axis shows the depth (in km) of the earthquakes in the ‘quakes’ data set.
Execute the below code to plot the bimodal distribution.
```r
# quakes is one of R's built-in data sets, so no import is needed
head(quakes)

# plots the bimodal histogram of earthquake depth
hist(quakes$depth, xlab = 'Depth (km)', ylab = 'Frequency',
     main = 'Bimodal distribution', col = "brown")
```
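To locate the two peaks rather than just eyeball them, one option is to smooth the data with a kernel density estimate and look for local maxima of the curve. This is a minimal sketch using base R's density() with its default bandwidth. The number of local maxima depends on that bandwidth, so small extra bumps may also appear; increasing the `adjust` argument of density() smooths them away.

```r
# kernel density estimate of earthquake depth (default bandwidth)
d <- density(quakes$depth)

# a local maximum is where the curve stops rising and starts falling
peak_idx <- which(diff(sign(diff(d$y))) == -2) + 1
peak_depths <- d$x[peak_idx]
print(round(peak_depths))

# histogram on the density scale, with the smoothed curve and the detected peaks
hist(quakes$depth, freq = FALSE, xlab = 'Depth (km)',
     main = 'Bimodal distribution with density peaks', col = "brown")
lines(d, lwd = 2)
abline(v = peak_depths, lty = 2)
```

For this data, one peak should fall among the shallow earthquakes and another among the deep ones, matching the two humps in the histogram.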
Plotting a Histogram using ggplot2 in R
ggplot2 is one of the most widely used visualization packages in R. It offers themes and layered functions for building visually appealing graphs.
In this section, we will plot a histogram of the values in the ‘diamonds’ data set, which ships with the ggplot2 package.
Execute the below code to plot the histogram using ggplot2.
```r
# install ggplot2 once (uncomment if it is not installed yet)
# install.packages('ggplot2')

# loads the library; the 'diamonds' data set ships with ggplot2
library(ggplot2)

# shows the data
head(diamonds)

# plots the histogram (30 bins by default; ggplot2 prints a message suggesting a better binwidth)
ggplot(diamonds, aes(carat)) + geom_histogram()

# changes the bin width
ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.01)

# fills the bars by cut (stacked) and adds the x, y and main labels of the graph
ggplot(diamonds, aes(carat, fill = cut)) +
  geom_histogram() +
  labs(x = 'Carats', y = 'Frequency of carats') +
  ggtitle("Distribution of diamonds' carat by cut values")

# changes the theme for a cleaner graph (theme_classic() is part of ggplot2)
ggplot(diamonds, aes(carat, fill = cut)) +
  geom_histogram() +
  labs(x = 'Carats', y = 'Frequency of carats') +
  ggtitle("Distribution of diamonds' carat by cut values") +
  theme_classic()
```
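geom_histogram() draws binned counts, while geom_density() draws a smoothed density curve; the two are different plots but can share one set of axes. This is a minimal sketch, assuming ggplot2 3.3.0 or later (needed for `after_stat()`), that puts the histogram on the density scale and overlays the density curve. The bin width of 0.05 carats is an illustrative choice.

```r
library(ggplot2)

# histogram on the density scale so it shares the y-axis with the density curve
p <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.05,
                 fill = "grey80", colour = "black") +
  geom_density(colour = "red") +
  labs(x = 'Carats', y = 'Density') +
  theme_classic()

print(p)
```

On the density scale, the bar areas sum to 1, which is what makes the bars directly comparable with the curve.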
Conclusion
A histogram looks similar to a bar plot, but it represents the distribution of numeric data across ranges (bins).
Base R offers the hist() function, and the ggplot2 package offers geom_histogram() for plotting histograms.
Histograms can take many shapes. The major ones are the normal, positively skewed, negatively skewed, and bimodal distributions.
This tutorial explained each of these shapes and, at the end, showed how to plot a histogram with ggplot2.
I hope you now understand how to plot histograms and how to interpret their different shapes.
Try practicing with different data sets. If you have any questions, post them in the comments section. Keep going!