The first part of this beginning-to-end tutorial on time series analysis covered a few historical examples of time series data and analysis:
Time Series and Machine Learning – An introduction
This is the second part, so let’s get into the math and statistics of time series.
Also, the diagrams will be hand-drawn by me, so maybe a little different from the graphs in other articles. 🙂
Time series classification
A time series can be classified in three ways:
- Discrete & Continuous
- Deterministic & Non-deterministic
- Stationary & Non-stationary
Short definitions of these can be as follows:
Discrete: observations are taken at specific times, usually equally spaced.
Continuous: observations are taken continuously through time.
Deterministic: If it can be determined/predicted exactly.
Non-deterministic: (aka stochastic) Exact predictions are not possible; future values have a probability distribution conditioned on past values.
Stationary: If there is no systematic change in the mean or the variance, and no strictly periodic variation.
Non-stationary: If the properties of one period are different from those of another.
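To make the last two definitions concrete, here is a rough sketch that compares the mean and variance of the first and second half of a series. The two series are simulated and the `half_summary` helper is a hypothetical name of my own. This is only an eyeball check under those assumptions, not a formal test of stationarity.

```python
import random
from statistics import fmean, pvariance

def half_summary(series):
    """Return (mean, variance) for the first and the second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (fmean(first), pvariance(first)), (fmean(second), pvariance(second))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 1000

# Stochastic but stationary: pure noise around a constant level.
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Non-stationary: the same kind of noise on top of a rising trend.
trended = [0.05 * t + rng.gauss(0.0, 1.0) for t in range(n)]

(noise_m1, noise_v1), (noise_m2, noise_v2) = half_summary(noise)
(trend_m1, trend_v1), (trend_m2, trend_v2) = half_summary(trended)

print(f"noise:   mean {noise_m1:.2f} -> {noise_m2:.2f}, "
      f"variance {noise_v1:.2f} -> {noise_v2:.2f}")
print(f"trended: mean {trend_m1:.2f} -> {trend_m2:.2f}")
```

The noise series keeps roughly the same mean and variance in both halves, while the mean of the trended series clearly shifts from one half to the other.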
Components of a time series
A time series function is made up of the following hierarchy of components:
- A random element/irregularity, also called noise. It can’t be predicted and is always present to some degree.
- A systematic component, which has two parts:
  - Trend
  - Periodic elements
    - Short-term periodic component (seasonality)
    - Long-term periodic component (cyclical)
Mathematical Models for time-series
In statistics, a model is a representation of a system: it expresses an unknown function in terms of known functions or variables.
There are two classical time series models:
- Additive : $Y_t = T_t + S_t + C_t + I_t$
- Multiplicative : $Y_t = T_t \times S_t \times C_t \times I_t$
Here:
- $Y_t$ = the time series function
- $T_t$ = trend
- $S_t$ = seasonality
- $C_t$ = cyclical component
- $I_t$ = irregular (random) component
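As a small illustration of the two models, the sketch below builds a toy series both ways. All the numbers (24 monthly points, a period of 12, the trend and seasonal sizes) are made up for the example, and the cyclical and irregular components are switched off so the arithmetic stays visible.

```python
import math

n = 24        # hypothetical: two years of monthly observations
period = 12   # hypothetical seasonal period

trend = [100 + 2 * t for t in range(n)]

# Additive model: every component is in the units of the series.
season_add = [10 * math.sin(2 * math.pi * t / period) for t in range(n)]
cycle_add = [0.0] * n       # no cyclical component in this toy example
irregular_add = [0.0] * n   # no noise either
y_add = [T + S + C + I
         for T, S, C, I in zip(trend, season_add, cycle_add, irregular_add)]

# Multiplicative model: the trend carries the units,
# the other components are ratios around 1.
season_mul = [1 + 0.1 * math.sin(2 * math.pi * t / period) for t in range(n)]
cycle_mul = [1.0] * n
irregular_mul = [1.0] * n
y_mul = [T * S * C * I
         for T, S, C, I in zip(trend, season_mul, cycle_mul, irregular_mul)]

# Taking logs turns the multiplicative model into an additive one.
log_y_mul = [math.log(y) for y in y_mul]
log_sum = [math.log(T) + math.log(S) + math.log(C) + math.log(I)
           for T, S, C, I in zip(trend, season_mul, cycle_mul, irregular_mul)]
```

In the additive series the seasonal swing stays the same size throughout, while in the multiplicative series it grows with the trend. That difference is usually what decides which model to pick.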
Methods of trend enumeration/trend component determination
One of the most important things to learn is how to handle the trend of your series. Generally, there are two reasons for studying the trend:
- to eliminate the trend from the series
- to study the trend and attempt to forecast future behavior of said trend.
There are four methods for the determination of trend component:
- Freehand curve fitting
- Method of Semi-averages
- Fitting mathematical curves
- Moving Averages method
We’ll go over three of these methods to explain the concept well.
1. Method of semi-averages
Assumption: The underlying trend is linear.
The data is divided into two equal parts with respect to time.
Then we compute the arithmetic mean of each part and plot these two averages against the mid-points of the periods covered by each part.
The line obtained by joining these two points is the required trend line, and it may be extended both ways to estimate intermediate or future values.
For a series of 2m observations at t = 1, 2, ..., 2m, the two points are $\left(\frac{m+1}{2}, \bar{y}_1\right)$
and $\left(\frac{3m+1}{2}, \bar{y}_2\right)$.
Equation of the straight line : $y - \bar{y}_1 = \frac{\bar{y}_2 - \bar{y}_1}{m}\left(t - \frac{m+1}{2}\right)$
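The two-point construction above can be sketched in a few lines of Python. This is a minimal version that assumes an even number of observations taken at t = 1, 2, ..., 2m. The function name and the sample figures are hypothetical.

```python
from statistics import fmean

def semi_average_trend(y):
    """Fit a straight trend line by the method of semi-averages.

    Assumes y[0], y[1], ... were observed at t = 1, 2, ..., 2m
    (an even number of equally spaced points).
    Returns (intercept, slope) of the line y = intercept + slope * t.
    """
    if len(y) < 2 or len(y) % 2 != 0:
        raise ValueError("this sketch expects an even number of observations")
    m = len(y) // 2
    mean_1, mean_2 = fmean(y[:m]), fmean(y[m:])
    t_1, t_2 = (m + 1) / 2, (3 * m + 1) / 2   # mid-times of the two halves
    slope = (mean_2 - mean_1) / (t_2 - t_1)   # t_2 - t_1 equals m
    intercept = mean_1 - slope * t_1
    return intercept, slope

# Hypothetical yearly figures for t = 1..8
sales = [12, 15, 14, 18, 20, 19, 23, 25]
a, b = semi_average_trend(sales)
forecast_t9 = a + b * 9   # extend the line one step beyond the data
print(a, b, forecast_t9)
```

For these figures the half-means are 14.75 and 21.75, so the slope is 7/4 = 1.75 per period and the line extends to 26.125 at t = 9.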
2. Fitting mathematical curves
Assumption: The trend is of polynomial form – $y_t = T_t + I_t$,
where $T_t \in \{\, a + bt,\; a + bt + ct^2,\; \dots \,\}$
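We estimate the best-fitting polynomial by least squares: we minimise the sum of squared errors by setting its first derivatives with respect to the coefficients to zero, and use the Hessian matrix to confirm that the solution is a minimum. The full calculation would be quite difficult for me to type here, but you can read about it in some of the books mentioned in the recommended section.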
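For the simplest case, a linear trend $T_t = a + bt$ fitted to n observations, the calculation can be sketched as follows. This assumes the usual least-squares criterion, and the symbols $S$ and $H$ are my own labels for the sum of squared errors and its Hessian.

```latex
S(a, b) = \sum_{t=1}^{n} \left( y_t - a - b\,t \right)^2

\frac{\partial S}{\partial a} = 0 \;\Rightarrow\; \sum_{t} y_t = n\,a + b \sum_{t} t

\frac{\partial S}{\partial b} = 0 \;\Rightarrow\; \sum_{t} t\,y_t = a \sum_{t} t + b \sum_{t} t^2

H = 2 \begin{pmatrix} n & \sum_t t \\ \sum_t t & \sum_t t^2 \end{pmatrix},
\qquad \det H = 4 \left( n \sum_t t^2 - \Big( \sum_t t \Big)^2 \right) > 0
```

The two middle lines are the normal equations. Since the Hessian is positive definite whenever there are at least two distinct time points, their solution is a minimum.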
We can similarly fit exponential curves of the form $Y_t = a b^t$
by taking the logarithm of both sides and then fitting a straight line (a first-degree curve) to the logarithm, since $\log Y_t = \log a + t \log b$ is linear in t.
Demerits : quite tedious to perform. Also, it completely ignores seasonal, cyclic and irregular fluctuations.
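Tedious by hand, but short in code. The sketch below fits a straight-line trend with the closed-form least-squares solution, then reuses it for an exponential trend, because log(a·b^t) = log a + t·log b is linear in t. The function names and the sample series are hypothetical, and the data is noise-free so the recovered coefficients are easy to check.

```python
import math

def fit_line(t, y):
    """Least-squares fit of y = a + b*t; returns (a, b)."""
    n = len(t)
    sum_t, sum_y = sum(t), sum(y)
    sum_tt = sum(ti * ti for ti in t)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    b = (n * sum_ty - sum_t * sum_y) / (n * sum_tt - sum_t ** 2)
    a = (sum_y - b * sum_t) / n
    return a, b

def fit_exponential(t, y):
    """Fit y = a * b**t by fitting a straight line to log(y); needs y > 0."""
    log_a, log_b = fit_line(t, [math.log(yi) for yi in y])
    return math.exp(log_a), math.exp(log_b)

t = list(range(1, 9))
linear_series = [5 + 2 * ti for ti in t]       # exactly a + b*t
growth_series = [3 * 1.2 ** ti for ti in t]    # exactly a * b**t

a_lin, b_lin = fit_line(t, linear_series)
a_exp, b_exp = fit_exponential(t, growth_series)
print(a_lin, b_lin, a_exp, b_exp)
```

With real data the fitted values will not match any "true" coefficients exactly, and fitting on the log scale minimises errors in log(y) rather than in y itself.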
3. Method of moving averages
Let us consider a time series $\{ y_t \mid t = 1, 2, 3, \dots \}$.
The k-point weighted moving average replaces each value with a weighted sum of k neighbouring observations, using weights $w_1, w_2, \dots, w_k$.
Here, $\sum_j w_j = 1$, and $M[w_1, w_2, \dots, w_k]$ is called the k-point moving average operator.
For example, if a time series $\{x_t\}$ changes base and scale as $y_t = (x_t - a)/b$, then, because the weights sum to 1, its moving average changes in the same way: $M y_t = (M x_t - a)/b$.
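Here is a minimal sketch of a k-point weighted moving average, together with a numerical check of the base-and-scale example. The indexing convention (each output value is the weighted sum of the next k observations, with no padding at the ends) is an assumption of mine, and the function name and data are hypothetical.

```python
def weighted_moving_average(y, weights):
    """k-point weighted moving average, with k = len(weights).

    Assumed convention: output i is sum_j weights[j] * y[i + j], so the
    result has len(y) - k + 1 values and no padding at the ends.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    k = len(weights)
    return [sum(w * v for w, v in zip(weights, y[i:i + k]))
            for i in range(len(y) - k + 1)]

x = [12, 15, 14, 18, 20, 19, 23, 25]   # hypothetical observations
simple_3pt = weighted_moving_average(x, [1 / 3, 1 / 3, 1 / 3])

# Change of base and scale: y_t = (x_t - a) / b
a, b = 10, 5
y = [(xt - a) / b for xt in x]
weights = [0.25, 0.5, 0.25]
ma_x = weighted_moving_average(x, weights)
ma_y = weighted_moving_average(y, weights)

# Because the weights sum to 1, ma_y[i] equals (ma_x[i] - a) / b up to rounding.
rescaled = [(mx - a) / b for mx in ma_x]
print(ma_y)
print(rescaled)
```

Equal weights of 1/k give the simple moving average. Unequal weights such as 0.25, 0.5, 0.25 put more emphasis on the middle observation.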
Ending Note
Those are the absolute basics of time series, but there’s still another section on math to cover before we go on to working on a real dataset.
If you have any questions, mention them in the comments. Bookmark the website, and keep yourself updated. Here’s the third part of the series, so check that out:
Time Series & Machine Learning – Autocorrelation, Heteroskedasticity, ARMA, ARIMA and more