11  SARIMA Model Building and Diagnostics

11.1 Model building and selection

The richness of the class of (S)AR(I)MA models is a double-edged sword – they are flexible but also difficult to use:

  • ARMA has two orders: AR order p, MA order q;
  • ARIMA has one more order: order of difference d;
  • seasonal ARMA has another two orders: seasonal AR order P, seasonal MA order Q;
  • SARIMA has one more order: order of seasonal difference D.

Now we need sensible model-building strategies and model adequacy criteria. Thus we will honor the principle of parsimony: models should be as simple as possible but not simpler.

“All models are wrong but some are useful.” (George Box)

Also, recall that over-fitting and/or over-differencing leads to overly complicated models and an increased variance of the estimates. Thus we should first plot the data. Then we may transform it to make the variance constant if necessary, and possibly remove deterministic components (trends, seasonality) if appropriate. What remains is to model the stochastic component, which can be achieved by (a short R sketch follows the list):

  • differencing appropriately (until stationary, not too much, use unit root tests if uncertain);
  • starting with a simple model (white noise);
  • performing diagnostics on the residuals (ACF, PACF, Box test);
  • identifying orders and modifying the model (increasing its complexity) to remedy any inadequacy;
  • continuing until approximately white noise;
  • possibly checking the model by slightly overfitting.
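To make this loop concrete, here is a minimal R sketch of one build-and-diagnose iteration; the simulated series x and the candidate ARMA(1,1) order are purely illustrative.

    set.seed(1)
    x <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 200)  # illustrative data

    fit <- arima(x, order = c(1, 0, 1))  # candidate model
    res <- residuals(fit)

    acf(res)   # sample ACF of the residuals
    pacf(res)  # sample PACF of the residuals
    Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 2)  # portmanteau test, fitdf = p + q

If the Box test does not reject and the ACF/PACF show no remaining structure, the residuals are approximately white noise; otherwise we increase the model complexity and repeat.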

Last, but not least, we use the model for prediction.

11.2 Unit root tests

A question might arise whether we should difference or not:

  • if the series is non-stationary, differencing may make it stationary;
  • if the series is already stationary, differencing preserves stationarity but complicates the autocorrelation structure (it introduces non-invertibility), as illustrated below.
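As a small illustration (base R only): differencing white noise, which is already stationary, produces the non-invertible MA(1) process Xt = et − et−1 with θ = −1, whose lag-1 autocorrelation is θ/(1 + θ²) = −0.5.

    set.seed(2)
    e  <- rnorm(500)  # white noise: stationary, nothing to remove
    de <- diff(e)     # over-differenced series: MA(1) with theta = -1
    acf(de)           # shows a spike of about -0.5 at lag 1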

Recall that the random walk is an AR sequence with Φ(z) = 1 − z, hence Φ has a unit root (Φ(1) = 0). The presence of a unit root among the roots of the AR polynomial implies non-stationarity and thus the need to difference. Methods exist for hypothesis testing about unit roots with hypotheses:

  • null hypothesis H0: 1 is a root;
  • alternative H1: all roots are outside the unit circle.

One can use, for example, the Dickey-Fuller test or the Phillips-Perron test. Both use non-standard distributions of the test statistic (the data are non-stationary under the null hypothesis); in R they are available as PP.test in base stats, or as adf.test and pp.test in the tseries package, as sketched below.
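A short sketch (assuming the tseries package is installed) applying these tests to a simulated random walk, for which the null hypothesis of a unit root is true:

    library(tseries)

    set.seed(3)
    rw <- cumsum(rnorm(200))  # random walk: Phi(1) = 0, so H0 holds

    adf.test(rw)  # augmented Dickey-Fuller test (tseries)
    pp.test(rw)   # Phillips-Perron test (tseries)
    PP.test(rw)   # Phillips-Perron test (base stats)

    adf.test(diff(rw))  # after one difference, H0 is typically rejected

Large p-values for rw indicate that the unit root cannot be rejected, so the series should be differenced.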

11.3 Model order selection

We need an objective criterion that quantitatively measures model adequacy, because bigger models always provide a better fit – the likelihood or least-squares criterion always improves in more complex models. However, there is a price to pay, as a more complex model increases the variance of the estimates. We therefore need a good compromise between fit and variance, penalizing complexity – that is, we maximize (model fit) − (model complexity penalty), or equivalently minimize (model error) + (model complexity penalty). To perform this optimization, we search a set of candidate models and optimize the selection criterion over it. It remains to choose the selection criterion – one of the most widespread is Akaike's Information Criterion (AIC), defined as AIC = −2 l(β^) + 2r, where

  • β^ denotes collectively all parameters (AR, MA coefficients, white noise variance, possibly mean and regression coefficients);
  • r = dim(β) is the number of parameters (e.g. r = 1 + 1 + p + q for an ARMA(p,q) model with a mean);
  • l(β) = log f(X1, …, Xn; β) is the log-likelihood.

Our goal is then to select the model that has the smallest value of AIC (within a set of candidate models, e.g., with orders up to some limits).
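A minimal sketch of such a search in R, fitting all ARMA(p, q) models with p, q ≤ 3 to an illustrative simulated series and keeping the fit with the smallest AIC (the data and the order limits are assumptions of the example):

    set.seed(4)
    x <- arima.sim(model = list(ar = c(0.5, -0.3)), n = 300)  # illustrative data

    best <- NULL
    for (p in 0:3) {
      for (q in 0:3) {
        fit <- try(arima(x, order = c(p, 0, q)), silent = TRUE)  # some fits may fail
        if (!inherits(fit, "try-error") &&
            (is.null(best) || AIC(fit) < AIC(best)))
          best <- fit
      }
    }
    best  # candidate model with the smallest AIC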

Caution

Note that selection criteria may only be used to compare models estimated from the same data; in particular, do not compare models with different orders of differencing, since differencing changes the data (and hence the likelihood)!

The use of pure AIC is discouraged, as it does not penalize large models enough. Hence the corrected AIC, denoted AICc, is recommended: AICc = −2 l(β^) + 2rn / (n − r − 1).

One can also use the BIC (Bayesian Information Criterion, or Schwarz's selection rule), defined as BIC = −2 l(β^) + r log n, which penalizes large models more strictly and as such tends to select smaller models.
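All three criteria are easy to compute for a fitted model; a minimal sketch using the built-in series lh, where r counts all estimated parameters (AR and MA coefficients, the mean, and the white noise variance):

    fit <- arima(lh, order = c(1, 0, 1))  # illustrative ARMA(1,1) fit
    n   <- length(lh)
    r   <- length(coef(fit)) + 1          # + 1 for the white noise variance

    aic  <- AIC(fit)                             # -2 l(beta^) + 2r
    aicc <- aic + 2 * r * (r + 1) / (n - r - 1)  # equivalent form of AICc
    bic  <- BIC(fit)                             # -2 l(beta^) + r log(n)
    c(AIC = aic, AICc = aicc, BIC = bic)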

Tip

As a rule of thumb, AIC or AICc is recommended when the goal is forecasting, whereas BIC is better suited when the goal is to identify the true model.

[Flowchart: a general ARIMA modelling procedure.]

  1. Plot the data: identify unusual observations and understand the patterns.
  2. If necessary, use a Box-Cox transformation to stabilize the variance.
  3. Either use an automated algorithm (auto.arima finds the best ARIMA model for your time series), or select the model order yourself:
     • if necessary, difference the data until it appears stationary (use unit-root tests if you are unsure);
     • plot the ACF/PACF of the differenced data and try to determine possible candidate models;
     • try your chosen model(s) and use AICc to search for a better model.
  4. Check the residuals from your chosen model by plotting the ACF of the residuals and doing a portmanteau test of the residuals.
  5. Do the residuals look like white noise? If no, return to step 3; if yes, calculate forecasts.
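An end-to-end sketch of the automated branch of the flowchart, assuming a recent version of the forecast package is installed (its auto.arima, checkresiduals and forecast functions are used):

    library(forecast)

    fit <- auto.arima(AirPassengers, lambda = "auto")  # Box-Cox transform + SARIMA order search
    checkresiduals(fit)          # residual ACF plot and Ljung-Box portmanteau test
    plot(forecast(fit, h = 24))  # point forecasts and intervals 24 months ahead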