Bar plots are among the most frequently used plots in elementary statistics. They consist of horizontal or vertical bars, each representing a quantity associated with one category in the dataset: for example, attendance days per student, number of cars sold per model, or number of votes per party.

R has some very useful functions to generate and customize bar plots. To illustrate them, let us use R's built-in mtcars dataset, which contains observations of 11 vehicle properties for 32 different cars.

This dataset is stored as a data frame. More information about it is available on its R documentation page, which you can also open from your R console with ?mtcars.
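To confirm this from the console, a quick check like the following can be used. class() and str() are standard base R functions; the snippet is our own illustration rather than code from the original tutorial.

```r
# mtcars ships with R's datasets package, which is attached by default
class(mtcars)   # "data.frame"
str(mtcars)     # 32 observations of 11 variables, all numeric
```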

Basic Bar Plot in R

Let us take a look at the first six rows of the mtcars dataset.
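The code for this step is not shown in the original; a minimal equivalent uses head(), which returns the first six rows by default:

```r
head(mtcars)   # first 6 rows, same as head(mtcars, n = 6)
```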

The gear column gives the number of forward gears in each car. Let us look at this column on its own.
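Extracting the column with $ gives a plain numeric vector with one value per car. Again this is a sketch, since the original code is not shown:

```r
mtcars$gear                # 32 values, one per car
sort(unique(mtcars$gear))  # the distinct gear counts: 3 4 5
```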

We would like to plot the frequency of each gear count (how many cars have 3, 4 or 5 gears) as a bar plot. Bar plots are drawn with R's barplot() function, so a first attempt is to pass the gear column to it directly.
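A sketch of this first attempt:

```r
# Passing the raw numeric column draws one bar per car (32 bars),
# each as tall as that car's number of gears
barplot(mtcars$gear)
```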

Basic Bar Plot

But as you can see, this bar plot reveals little about the data. barplot() simply drew one bar for each of the 32 entries, with each bar's height equal to that car's number of gears, and left it to the reader to make sense of the plot.

Bar Plots Using Data Frequency

A bar plot is most meaningful when the data is grouped into frequency classes, i.e., when each bar shows how many entries fall into one class. Here it would be far more useful to know how many cars have 3, 4 or 5 gears, so that these counts can be compared directly.

For this purpose, we need to plot the frequencies of the values in mtcars$gear rather than the raw column. The table() function computes these counts.
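A minimal sketch of the counting step (the variable name gear_counts is our own choice):

```r
gear_counts <- table(mtcars$gear)
gear_counts   # 15 cars with 3 gears, 12 with 4, 5 with 5
```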

The resulting table can then be passed straight to barplot().
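A sketch of this step, recomputing the table so the snippet runs on its own:

```r
gear_counts <- table(mtcars$gear)
# One bar per class; heights are the counts and the table's
# names ("3", "4", "5") are used as the bar labels
barplot(gear_counts)
```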

Barplot With Frequency

This looks far more useful than the previous plot. It shows that most cars in the dataset have 3 gears and relatively few have 5. Later in this tutorial we will annotate these graphs to make them even more useful.

In the same way, we can plot the frequency of the cyl column, which gives the number of cylinders in each car.
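The same two steps, count with table() and then plot, applied to cyl (a sketch; the variable name is ours):

```r
cyl_counts <- table(mtcars$cyl)
cyl_counts            # 11 cars with 4 cylinders, 7 with 6, 14 with 8
barplot(cyl_counts)
```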

Bar Plots in R Using More Than One Variable

It is also possible to split these bars into sub-bars based on another categorical variable in the dataset. Two such variables are cyl, the number of cylinders, and am, the transmission type (0 = automatic, 1 = manual). To split the bars, we build a single table of cyl and am together and plot it. The result, in which each bar is made up of stacked segments, is commonly known as a stacked bar plot.
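A minimal sketch follows. The original code is not shown, so the orientation of the table is our assumption: we put am in the rows and cyl in the columns, which makes barplot() draw one bar per cylinder count, stacked by transmission type.

```r
# Two-way frequency table: rows are am (0 = automatic, 1 = manual),
# columns are cyl (4, 6, 8)
cyl_am <- table(mtcars$am, mtcars$cyl)
cyl_am

# For a two-way table, barplot() draws one bar per column and stacks
# the row counts inside each bar
barplot(cyl_am)
```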

Bar Plot With Categories

Beautifying and Adding More Info

To make this graph more informative and more attractive, we can change a few arguments of the barplot() function.

  1. To draw a separate bar for each am value (0 and 1) instead of stacking them, set the beside argument to TRUE.
  2. To orient the bars horizontally instead of the default vertical orientation, set the horiz argument to TRUE.
  3. The title and axis labels are added with main, xlab and ylab, in the same way as with the plot() function.
  4. The names of the classes can be supplied as a character vector through names.arg.
  5. Colors are set with the col argument.
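Putting these arguments together gives something like the sketch below. It rebuilds the am-by-cyl table (rows am, columns cyl, our assumed orientation); the title, axis labels, bar names and colors are illustrative choices, not values from the original tutorial.

```r
cyl_am <- table(mtcars$am, mtcars$cyl)   # rows: am, columns: cyl

barplot(cyl_am,
        beside    = TRUE,    # side-by-side bars for am = 0 and am = 1
        horiz     = FALSE,   # set to TRUE for horizontal bars
        main      = "Cars by number of cylinders and transmission",
        xlab      = "Number of cylinders",
        ylab      = "Number of cars",
        names.arg = c("4 cyl", "6 cyl", "8 cyl"),  # one name per column
        col       = c("steelblue", "orange"))      # one color per row
```

If you switch to horiz = TRUE, swap the xlab and ylab text, since the counts then run along the x axis.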

Adding a Legend to Bar Plots in R

Let us take this a step further and add a legend to the plot. The legend() call goes immediately after the barplot() call, because it draws on the plot that already exists.

Setting pch = 15 draws a filled square as the legend symbol. The legend is drawn on top of the graph that has already been plotted.
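A self-contained sketch of the legend step, redrawing the grouped plot first because legend() needs an existing plot to draw on. The legend position, labels and colors are our assumptions, not the original tutorial's values:

```r
cyl_am   <- table(mtcars$am, mtcars$cyl)   # rows: am, columns: cyl
bar_cols <- c("steelblue", "orange")       # one color per am value

barplot(cyl_am, beside = TRUE, col = bar_cols,
        names.arg = c("4 cyl", "6 cyl", "8 cyl"),
        xlab = "Number of cylinders", ylab = "Number of cars")

# Called after barplot(): legend() draws on top of the current plot
legend("top",
       legend = c("Automatic (am = 0)", "Manual (am = 1)"),
       col    = bar_cols,
       pch    = 15)   # 15 = filled square
```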

Barplot With Legend

We have now created bar plots using base R graphics. Much more functionality is available in the ggplot2 package, which we will discuss in future tutorials.

By admin
