In this tutorial, we'll go over how to create time series plots in R. Time series data consists of observations of a variable recorded at successive points in time, usually at regular intervals.
Time series data is widely used in stock market analysis, weather analysis, market trend analysis and any other scenario where variation over time matters.
R has several packages to perform time-series plotting and analysis tasks. Let us begin by acquiring some standard time series data for our work.
Acquiring Data
Several data scientists and organizations have open-sourced time series datasets that can be loaded directly into the R environment. Two of these sources are:
The packages can be installed into your R environment using the install.packages("packagename") command. Other relevant instructions are available on the websites given above.
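A minimal sketch of an install-if-missing pattern follows. It assumes the package is available from CRAN; a package hosted elsewhere needs the installation instructions from its own site. The helper name ensure_installed is hypothetical and not part of any package.

```r
# Hypothetical helper: install a package only when it is not already available.
# Assumes the package can be fetched with install.packages(), i.e. it is on CRAN.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can be loaded after the attempt
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so this call never triggers a download
ensure_installed("stats")
```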
Let us proceed with some data from the tsdl package for illustrating time series plotting.
Viewing Time Series Data
The tsdl package has numerous data series across several categories. Let us try accessing some of these sets. The first step is to load the package into memory.
    library(tsdl)
    tsdl
    Time Series Data Library: 648 time series

                          Frequency
    Subject               0.1 0.25   1  4 5 6  12 13 52 365 Total
    Agriculture             0    0  37  0 0 0   3  0  0   0    40
    Chemistry               0    0   8  0 0 0   0  0  0   0     8
    Computing               0    0   6  0 0 0   0  0  0   0     6
    Crime                   0    0   1  0 0 0   2  1  0   0     4
    Demography              1    0   9  2 0 0   3  0  0   2    17
    Ecology                 0    0  23  0 0 0   0  0  0   0    23
    Finance                 0    0  23  5 0 0  20  0  2   1    51
    Health                  0    0   8  0 0 0   6  0  1   0    15
    Hydrology               0    0  42  0 0 0  78  1  0   6   127
    Industry                0    0   9  0 0 0   2  0  1   0    12
    Labour market           0    0   3  4 0 0  17  0  0   0    24
    Macroeconomic           0    0  18 33 0 0   5  0  0   0    56
    Meteorology             0    0  18  0 0 0  17  0  0  12    47
    Microeconomic           0    0  27  1 0 0   7  0  1   0    36
    Miscellaneous           0    0   4  0 1 1   3  0  1   0    10
    Physics                 0    0  12  0 0 0   4  0  0   0    16
    Production              0    0   4 14 0 0  28  1  1   0    48
    Sales                   0    0  10  3 0 0  24  0  9   0    46
    Sport                   0    1   1  0 0 0   0  0  0   0     2
    Transport and tourism   0    0   1  1 0 0  12  0  0   0    14
    Tree-rings              0    0  34  0 0 0   1  0  0   0    35
    Utilities               0    0   2  1 0 0   8  0  0   0    11
    Total                   1    1 300 64 1 1 240  3 16  21   648
Let us choose a time series to plot. We first take a subset of the library for one category using the subset() function.
    crime <- subset(tsdl, "Crime")
To access an individual time series, we index the list created above. The second series in it records the number of armed robberies per month in Boston from January 1966 to October 1975.
    crime[[2]]
    > crime[[2]]
         Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    1966  41  39  50  40  43  38  44  35  39  35  29  49
    1967  50  59  63  32  39  47  53  60  57  52  70  90
    1968  74  62  55  84  94  70 108 139 120  97 126 149
    1969 158 124 140 109 114  77 120 133 110  92  97  78
    1970  99 107 112  90  98 125 155 190 236 189 174 178
    1971 136 161 171 149 184 155 276 224 213 279 268 287
    1972 238 213 257 293 212 246 353 339 308 247 257 322
    1973 298 273 312 249 286 279 309 401 309 328 353 354
    1974 327 324 285 243 241 287 355 460 364 487 452 391
    1975 500 451 375 372 302 316 398 394 431 431
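We now create a time series object from this data using the ts() function. Called without further arguments, ts() discards the original dates and simply numbers the observations from 1 to 118.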
    series <- ts(crime[[2]])
    series
    Time Series:
    Start = 1
    End = 118
    Frequency = 1
      [1]  41  39  50  40  43  38  44  35  39  35  29  49  50  59  63  32  39  47  53  60
     [21]  57  52  70  90  74  62  55  84  94  70 108 139 120  97 126 149 158 124 140 109
     [41] 114  77 120 133 110  92  97  78  99 107 112  90  98 125 155 190 236 189 174 178
     [61] 136 161 171 149 184 155 276 224 213 279 268 287 238 213 257 293 212 246 353 339
     [81] 308 247 257 322 298 273 312 249 286 279 309 401 309 328 353 354 327 324 285 243
    [101] 241 287 355 460 364 487 452 391 500 451 375 372 302 316 398 394 431 431
    attr(,"source")
    [1] McCleary & Hay (1980)
    attr(,"description")
    [1] Monthly Boston armed robberies Jan.1966-Oct.1975 Deutsch and Alt (1977)
    attr(,"subject")
    [1] Crime
The ts() function converts a numeric vector into a time series object. The syntax is as follows:
    ts(vector, start, end, frequency)
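As a small illustration of these arguments, the sketch below builds a monthly series from the first 18 values of the robbery data shown above, typed in by hand so that the snippet runs without the tsdl package. The variable names are illustrative.

```r
# First 18 monthly values of the Boston robbery series, starting January 1966
values <- c(41, 39, 50, 40, 43, 38, 44, 35, 39, 35, 29, 49,
            50, 59, 63, 32, 39, 47)

# start = c(year, period): period 1 of 1966; frequency = 12 periods per year
monthly <- ts(values, start = c(1966, 1), frequency = 12)

start(monthly)      # first (year, month) of the series
end(monthly)        # last (year, month), derived by ts() from the vector length
frequency(monthly)  # observations per year
print(monthly)      # monthly series print as a year-by-month table
```

Leaving out end lets ts() derive the end date from the length of the vector, which avoids an end date that disagrees with the data.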
The start and end arguments of ts() attach dates to the observations; they do not select observations by date. To extract part of a series, use the window() function on a series that already carries the correct dates and frequency, as crime[[2]] does.
We can retrieve only the crime data from January 1970 to December 1972 using the following command:
    > shortseries <- window(crime[[2]], start = c(1970, 1), end = c(1972, 12))
    > shortseries
         Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
    1970  99 107 112  90  98 125 155 190 236 189 174 178
    1971 136 161 171 149 184 155 276 224 213 279 268 287
    1972 238 213 257 293 212 246 353 339 308 247 257 322
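One point worth checking on a toy example: passing start and end to ts() labels values with dates, whereas window() selects observations by date. The sketch below uses the numbers 1 to 36 as hypothetical monthly data, so each value shows its own position in the series.

```r
vals <- 1:36                                        # hypothetical data: 3 years of months
x <- ts(vals, start = c(1966, 1), frequency = 12)   # Jan 1966 to Dec 1968

# window() selects by date: the 12 observations that belong to 1967
selected <- window(x, start = c(1967, 1), end = c(1967, 12))

# ts() with start/end only labels: it keeps the FIRST 12 values and calls them 1967
relabelled <- ts(vals, start = c(1967, 1), end = c(1967, 12), frequency = 12)

as.numeric(selected)    # positions 13 to 24
as.numeric(relabelled)  # positions 1 to 12
```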
The frequency argument gives the number of observations per unit of time, which is usually a year: 1 indicates annual data, 4 quarterly, 12 monthly and so on. The default is 1. ts() does not aggregate or average the values; it only attaches a time to each observation.
Since our data is monthly, we need to specify 12 as the frequency (one observation every month).
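Because frequency only labels the observations, yearly averages have to be computed explicitly if they are wanted. A minimal sketch, assuming base R's aggregate() is an acceptable way to do this, using the first two years of the robbery data typed in by hand:

```r
values <- c(41, 39, 50, 40, 43, 38, 44, 35, 39, 35, 29, 49,   # 1966
            50, 59, 63, 32, 39, 47, 53, 60, 57, 52, 70, 90)   # 1967
monthly <- ts(values, start = c(1966, 1), frequency = 12)
length(monthly)    # all 24 monthly observations are kept

# Collapse to one observation per year by averaging each block of 12 months
annual <- aggregate(monthly, nfrequency = 1, FUN = mean)
frequency(annual)  # now one observation per year
annual
```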
Creating Time Series Plots in R
R provides the plot.ts() function to plot time series. Let us re-examine our series data.
    series <- ts(crime[[2]])
    plot.ts(series)
Because this series was created without a start date, the x-axis of the plot shows observation numbers rather than years.
We are now going to redefine the series object with starting and ending dates and frequency set to 12.
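    # The data ends in October 1975 (118 monthly observations), so end is c(1975, 10);
    # a later end date would make ts() recycle values to fill the extra months
    series <- ts(crime[[2]], start = c(1966, 1), end = c(1975, 10), frequency = 12)
    plot.ts(series)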
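Axis labels and a title make such a plot easier to read. The sketch below uses AirPassengers, a monthly series that ships with R, as a stand-in so that it runs without the tsdl package; the label text is illustrative.

```r
series <- AirPassengers   # built-in monthly ts, 1949 to 1960, used as a stand-in

# plot() dispatches to the time series method for ts objects,
# so the x-axis is labelled with the years stored in the series
plot(series,
     xlab = "Year",
     ylab = "Passengers (thousands)",
     main = "Monthly airline passengers")
```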
Decomposing Time Series
A time series can be analyzed further by decomposing it into three components, each of which can be plotted separately alongside the observed series:
- Seasonal: How patterns repeat over certain intervals of time
- Trend: The general direction of the time series progress – whether rising or falling.
- Random: The inherent irregularity present in the data when the trend and seasonality are removed.
This information can be derived from a series using the decompose() function as follows.
    decseries <- decompose(series)
The result is a list containing the above components of the series. These can be plotted directly using the plot() function.
    plot(decseries)
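The components can also be inspected individually. The following sketch uses the built-in AirPassengers series as a stand-in and assumes the default additive model, in which the observed values equal trend plus seasonal plus random.

```r
series <- AirPassengers      # stand-in monthly series shipped with R
dec <- decompose(series)     # type = "additive" is the default

names(dec)     # components stored in the result
dec$figure     # the 12 estimated seasonal effects, one per month

# In the additive model the parts add back up to the observed series.
# The trend is NA at both ends because it comes from a centred moving average.
rebuilt <- dec$trend + dec$seasonal + dec$random
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(series[ok]))
```

For a series whose seasonal swings grow with its level, decompose(series, type = "multiplicative") may be a better fit.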
The plot shows a seasonal pattern in the number of robberies and a trend that is generally rising.
Time series plots are an important means of data analysis for sequential and time-varying data. R functionalities like those mentioned above make the tasks easier.