Cointegration and Error Correction

Introduction to Cointegration Analysis

Integration and Cointegration

A univariate time series yt is integrated if it can be brought to stationarity through differencing. The number of differences required to achieve stationarity is called the order of integration. Time series of order d are denoted I(d). Stationary series are denoted I(0).
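As a quick illustration of the order of integration, the following base-MATLAB sketch builds an I(2) series by cumulating white noise twice, then recovers the noise with two differences. The variable names, seed, and sample size are illustrative assumptions rather than part of the toolbox examples.

```matlab
% Illustrative sketch: an I(2) series built from white noise (assumed setup)
rng(0);                      % Fix the seed so the run is repeatable
T = 200;
e = randn(T,1);              % I(0): stationary white noise
y1 = cumsum(e);              % I(1): one difference restores e
y2 = cumsum(y1);             % I(2): two differences are required

d1 = diff(y2);               % Still a random walk: equals y1(2:end)
d2 = diff(y2,2);             % Second difference: equals e(3:end)

maxErr1 = max(abs(d1 - y1(2:end)));   % Zero up to rounding
maxErr2 = max(abs(d2 - e(3:end)));    % Zero up to rounding
```

Here a single difference of y2 still leaves a random walk, so y2 is I(2) rather than I(1).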

An n-dimensional time series yt is cointegrated if some linear combination β1y1t + … + βnynt of the component variables is stationary. The combination is called a cointegrating relation, and the coefficients β = (β1 , … , βn)′ form a cointegrating vector. Cointegration is usually associated with systems of I(1) variables, since any I(0) variables are trivially cointegrated with other variables using a vector with coefficient 1 on the I(0) component and coefficient 0 on the other components. The idea of cointegration can be generalized to systems of higher-order variables if a linear combination reduces their common order of integration.

Cointegration is distinguished from traditional economic equilibrium, in which a balance of forces produces stable long-term levels in the variables. Cointegrated variables are generally unstable in their levels, but exhibit mean-reverting "spreads" (generalized by the cointegrating relation) that force the variables to move around common stochastic trends. Cointegration is also distinguished from the short-term synchronies of positive covariance, which only measures the tendency to move together at each time step. Modification of the VAR model to include cointegrated variables balances the short-term dynamics of the system with long-term tendencies.
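The idea of a common stochastic trend can be sketched with simulated data. In the following base-MATLAB example, two I(1) series are driven by the same random walk, and a linear combination removes it. The loadings on the trend and the vector beta are hypothetical values chosen so that the cancellation is exact.

```matlab
% Illustrative sketch: two I(1) series driven by one common stochastic trend
rng(1);
T = 300;
w  = cumsum(randn(T,1));     % Common stochastic trend (random walk)
e1 = randn(T,1);             % Stationary idiosyncratic noise
e2 = randn(T,1);
Y = [w + e1, 2*w + e2];      % Both columns are I(1)

beta = [2; -1];              % Hypothetical cointegrating vector
spread = Y*beta;             % 2*(w+e1) - (2*w+e2) = 2*e1 - e2

trendLeft = max(abs(spread - (2*e1 - e2)));   % Zero up to rounding
```

The levels in Y wander without bound, while spread is a combination of stationary noise terms only.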

Cointegration and Error Correction

The tendency of cointegrated variables to revert to common stochastic trends is expressed in terms of error-correction. If yt is an n-dimensional time series and β is a cointegrating vector, then the combination β′yt−1 measures the "error" in the data (the deviation from the stationary mean) at time t−1. The rate at which series "correct" from disequilibrium is represented by a vector α of adjustment speeds, which are incorporated into the VAR model at time t through a multiplicative error-correction term αβ′yt−1.
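A small numerical sketch may help fix ideas. The values of alpha, beta, and the levels below are hypothetical. The example computes the error β′yt−1 and the resulting correction, ignoring all other dynamics and innovations.

```matlab
% Illustrative numbers only: a two-variable system with one relation
beta  = [1; -1];             % Cointegrating vector: spread y1 - y2
alpha = [-0.5; 0.2];         % Adjustment speeds (hypothetical)
yPrev = [3; 1];              % Levels at time t-1

errPrev = beta'*yPrev;       % "Error" at t-1 (here 2)
ecTerm  = alpha*errPrev;     % Error-correction term alpha*beta'*y(t-1)
yNext   = yPrev + ecTerm;    % Levels at t with no other dynamics or noise

spreadNext = beta'*yNext;    % Spread after one correction step
shrink = 1 + beta'*alpha;    % Factor by which the spread contracts
```

With these numbers the spread falls from 2 to 0.6, a contraction by the factor 1 + β′α = 0.3. Both variables move toward each other, at different speeds.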

In general, there may be multiple cointegrating relations among the variables in yt, in which case the vectors α and β become matrices A and B, with each column of B representing a specific relation. The error-correction term becomes AB′yt−1 = Cyt−1. Adding the error-correction term to a VAR model in differences produces the vector error-correction (VEC) model:

$$\Delta y_t = Cy_{t-1}+\displaystyle\sum_{i=1}^qB_i\Delta y_{t-i}+\varepsilon_t.$$

If the variables in yt are all I(1), the terms involving differences are stationary, leaving only the error-correction term to introduce long-term stochastic trends. The rank of the impact matrix C determines the long-term dynamics. If C has full rank, the system yt is stationary in levels. If C has rank 0, the error-correction term disappears, and the system is stationary in differences. These two extremes correspond to standard choices in univariate modeling. In the multivariate case, however, there are intermediate choices, corresponding to reduced ranks between 0 and n. If C is restricted to reduced rank r, then C factors into (nonunique) n-by-r matrices A and B with C = AB′, and there are r independent cointegrating relations among the variables in yt.
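The reduced-rank factorization and its nonuniqueness can be checked directly. The matrices below are hypothetical values for a three-variable system with a single cointegrating relation.

```matlab
% Hypothetical 3-variable system with one cointegrating relation (r = 1)
A = [-0.4; 0.1; 0.2];        % n-by-r adjustment speeds
B = [1; -1; 0.5];            % n-by-r cointegrating vector
C = A*B';                    % n-by-n impact matrix
r = rank(C);                 % Reduced rank of C

% The factorization is not unique: rescaling B and compensating in A
% leaves C unchanged.
k = 2;
A2 = A*k;
B2 = B/k;
C2 = A2*B2';
factorGap = norm(C - C2);    % Zero up to rounding
```

For r > 1 the same argument applies with any invertible r-by-r matrix in place of the scalar k, which is why estimated cointegrating vectors are usually normalized before they are compared.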

By collecting differences, a VEC(q) model can be converted to a VAR(p) model in levels, with p = q+1:

$$y_t = A_1y_{t-1}+\dots+A_py_{t-p}+\varepsilon_t.$$

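Conversion between the VEC(q) and VAR(p) representations of an n-dimensional system is carried out by the functions vectovar and vartovec using the formulas:

$$A_1 = C+I_n+B_1,\qquad A_i = B_i-B_{i-1}\ \ (i=2,\dots,q),\qquad A_p = -B_q \qquad \text{(VEC to VAR)}$$

$$C = \displaystyle\sum_{i=1}^pA_i-I_n,\qquad B_i = -\displaystyle\sum_{j=i+1}^pA_j\ \ (i=1,\dots,q) \qquad \text{(VAR to VEC)}$$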

Because of the equivalence of the two representations, a VEC model with a reduced-rank error-correction coefficient is often called a cointegrated VAR model. In particular, cointegrated VAR models can be simulated and forecast using standard VAR techniques.
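The equivalence can be verified numerically by expanding the differences in the VEC model and collecting the coefficient on each lag of yt. The following base-MATLAB sketch does this by hand for hypothetical coefficients and then converts back. It is not the implementation of vectovar or vartovec, only a check of the algebra.

```matlab
% Sketch of the conversion algebra in base MATLAB (not the toolbox code).
% All coefficient values are hypothetical.
n = 2;  q = 2;  p = q + 1;
C = [-0.5 0.5; 0.1 -0.1];                     % Impact matrix (rank 1)
B = {[0.2 0; 0.1 0.3], [0.05 -0.1; 0 0.1]};   % B{1},...,B{q}

% VEC(q) to VAR(p): collect the coefficient on each lag of y
A = cell(1,p);
A{1} = C + eye(n) + B{1};
for i = 2:q
    A{i} = B{i} - B{i-1};
end
A{p} = -B{q};

% VAR(p) back to VEC(q)
Cback = -eye(n);
for i = 1:p
    Cback = Cback + A{i};
end
Bback = cell(1,q);
for i = 1:q
    Bback{i} = zeros(n);
    for j = (i+1):p
        Bback{i} = Bback{i} - A{j};
    end
end

gapC = norm(C - Cback);                               % Zero up to rounding
gapB = max(cellfun(@(X,Y) norm(X - Y), B, Bback));    % Zero up to rounding
```

The round trip recovers C and each Bi, so nothing is lost in moving between the two representations.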

The Role of Deterministic Terms

The cointegrated VAR model is often augmented with exogenous terms Dx:

$$\Delta y_t = AB^\prime y_{t-1}+\displaystyle\sum_{i=1}^qB_i\Delta y_{t-i}+Dx+\varepsilon_t.$$

Variables in x may include seasonal or interventional dummies, or deterministic terms representing trends in the data. Since the model is expressed in differences ∆yt, constant terms in x represent linear trends in the levels of yt and linear terms represent quadratic trends. In contrast, constant and linear terms in the cointegrating relations have the usual interpretation as intercepts and linear trends, although restricted to the stationary variable formed by the cointegrating relation. Johansen [56] considers five cases for AB′yt−1 + Dx, which cover the majority of observed behaviors in macroeconomic systems:

| Case | Form of AB′yt−1 + Dx | Model Interpretation |
| --- | --- | --- |
| H2 | AB′yt−1 | There are no intercepts or trends in the cointegrating relations and there are no trends in the data. This model is only appropriate if all series have zero mean. |
| H1* | A(B′yt−1 + c0) | There are intercepts in the cointegrating relations and there are no trends in the data. This model is appropriate for nontrending data with nonzero mean. |
| H1 | A(B′yt−1 + c0) + c1 | There are intercepts in the cointegrating relations and there are linear trends in the data. This is a model of deterministic cointegration, where the cointegrating relations eliminate both stochastic and deterministic trends in the data. |
| H* | A(B′yt−1 + c0 + d0t) + c1 | There are intercepts and linear trends in the cointegrating relations and there are linear trends in the data. This is a model of stochastic cointegration, where the cointegrating relations eliminate stochastic but not deterministic trends in the data. |
| H | A(B′yt−1 + c0 + d0t) + c1 + d1t | There are intercepts and linear trends in the cointegrating relations and there are quadratic trends in the data. Unless quadratic trends are actually present in the data, this model may produce good in-sample fits but poor out-of-sample forecasts. |

In Econometrics Toolbox™, deterministic terms outside of the cointegrating relations, c1 and d1, are identified by projecting constant and linear regression coefficients, respectively, onto the orthogonal complement of A.

Cointegration Modeling

Integration and cointegration both present opportunities for transforming variables to stationarity. Integrated variables, identified by unit root and stationarity tests, can be differenced to stationarity. Cointegrated variables, identified by cointegration tests, can be combined to form new, stationary variables. In practice, it must be determined if such transformations lead to more reliable models, with variables that retain an economic interpretation.

Generalizing from the univariate case can be misleading. In the standard Box-Jenkins [15] approach to univariate ARMA modeling, stationarity is an essential assumption. Without it, the underlying distribution theory and estimation techniques become invalid. In the corresponding multivariate case, where the VAR model is unrestricted and there is no cointegration, choices are less straightforward. If the goal of a VAR analysis is to determine relationships among the original variables, differencing loses information. In this context, Sims, Stock, and Watson [89] advise against differencing, even in the presence of unit roots. If, however, the goal is to simulate an underlying data-generating process, integrated levels data can cause a number of problems. Model specification tests lose power due to an increase in the number of estimated parameters. Other tests, such as those for Granger causality, no longer have standard distributions, and become invalid. Finally, forecasts over long time horizons suffer from inconsistent estimates, due to impulse responses that do not decay. Enders [32] discusses modeling strategies.

In the presence of cointegration, simple differencing is a model misspecification, since long-term information appears in the levels. Fortunately, the cointegrated VAR model provides intermediate options, between differences and levels, by mixing them together with the cointegrating relations. Since all terms of the cointegrated VAR model are stationary, problems with unit roots are eliminated.

Cointegration modeling is often suggested, independently, by economic theory. Examples of variables that are commonly described with a cointegrated VAR model include:

  • Money stock, interest rates, income, and prices (common models of money demand)

  • Investment, income, and consumption (common models of productivity)

  • Consumption and long-term income expectation (Permanent Income Hypothesis)

  • Exchange rates and prices in foreign and domestic markets (Purchasing Power Parity)

  • Spot and forward currency exchange rates and interest rates (Covered Interest Rate Parity)

  • Interest rates of different maturities (Term Structure Expectations Hypothesis)

  • Interest rates and inflation (Fisher Equation)

Since these theories describe long-term equilibria among the variables, accurate estimation of cointegrated models may require large amounts of low-frequency (annual, quarterly, monthly) macroeconomic data. As a result, these models must consider the possibility of structural changes in the underlying data-generating process during the sample period.

Financial data, by contrast, is often available at high frequencies (hours, minutes, microseconds). The mean-reverting spreads of cointegrated financial series can be modeled and examined for arbitrage opportunities. For example, the Law of One Price suggests cointegration among the following groups of variables:

  • Prices of assets with identical cash flows

  • Prices of assets and dividends

  • Spot, future, and forward prices

  • Bid and ask prices

Identifying Single Cointegrating Relations

The Engle-Granger Test for Cointegration

Modern approaches to cointegration testing originated with Engle and Granger [34]. Their method is simple to describe: regress the first component y1t of yt on the remaining components of yt and test the residuals for a unit root. The null hypothesis is that the series in yt are not cointegrated, so if the residual test fails to find evidence against the null of a unit root, the Engle-Granger test fails to find evidence that the estimated regression relation is cointegrating. Note that you can write the regression equation as $y_{1t} - b_1y_{2t} - \dots - b_{n-1}y_{nt} - c_0 = \beta^\prime y_t - c_0 = \varepsilon_t$, where $\beta = [1, -b^\prime]^\prime$ is the cointegrating vector and c0 is the intercept. A complication of the Engle-Granger approach is that the residual series is estimated rather than observed, so the standard asymptotic distributions of conventional unit root statistics do not apply. Augmented Dickey-Fuller tests (adftest) and Phillips-Perron tests (pptest) cannot be used directly. For accurate testing, distributions of the test statistics must be computed specifically for the Engle-Granger test.
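The two steps can be sketched by hand in base MATLAB. The example below uses simulated data with a single common trend (an assumption made only for illustration) and a Dickey-Fuller regression without lagged differences. The resulting statistics are of the same type as those discussed above, but they would have to be compared with Engle-Granger critical values, which this sketch does not compute.

```matlab
% Sketch of the two Engle-Granger steps on simulated data (assumed setup)
rng(2);
T = 250;
w = cumsum(randn(T,1));                  % Common stochastic trend
Y = [w + randn(T,1), 0.5*w + randn(T,1), 2*w + randn(T,1)];

% Step 1: cointegrating regression of y1 on an intercept and Y2
y1 = Y(:,1);
Y2 = Y(:,2:end);
X  = [ones(T,1), Y2];
coeff = X\y1;                            % [c0; b] by ordinary least squares
c0 = coeff(1);
b  = coeff(2:end);
beta = [1; -b];                          % Hypothesized cointegrating vector
res = Y*beta - c0;                       % Residual series y1 - Y2*b - c0

% Step 2: Dickey-Fuller regression on the residuals, no intercept:
% res(t) - res(t-1) = rho*res(t-1) + u(t)
dres = diff(res);
lres = res(1:end-1);
rho  = lres\dres;
u    = dres - lres*rho;
s2   = (u'*u)/(numel(u) - 1);            % Residual variance estimate
tau  = rho/sqrt(s2/(lres'*lres));        % t-type statistic
z    = numel(dres)*rho;                  % z-type statistic
```

Because res is built from estimated coefficients, it has exactly zero sample mean and is orthogonal to the regressors, which is one reason its unit root statistics do not follow the usual Dickey-Fuller distributions.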

The Engle-Granger test is implemented in Econometrics Toolbox by the function egcitest. To demonstrate its use, load MacKinnon's data [70] on the term-structure of Canadian interest rates:

load Data_Canada
Y = Data(:,3:end); % Interest rate data

figure
plot(dates,Y,'LineWidth',2)
xlabel('Year')
ylabel('Percent')
names = series(3:end);
legend(names,'location','NW')
title('{\bf Canadian Interest Rates, 1954-1994}')
axis tight
grid on

The plot shows evidence of cointegration among the three series, which move together with a mean-reverting spread. To test for cointegration, we compute both the τ (t1) and z (t2) Dickey-Fuller statistics, which egcitest compares to tabulated Engle-Granger critical values:

[h,pValue,stat,cValue] = egcitest(Y,'test',{'t1','t2'})
h =

     0     1


pValue =

    0.0526    0.0202


stat =

   -3.9321  -25.4538


cValue =

   -3.9563  -22.1153

The τ test fails to reject the null of no cointegration, but just barely, with a p-value only slightly above the default 5% significance level, and a statistic only slightly above the left-tail critical value. The z test does reject the null of no cointegration.
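The decisions follow from a left-tail comparison of each statistic with its critical value. As a check, the following lines reproduce h from the stat and cValue output shown above. The name hCheck is introduced here only for illustration.

```matlab
% Values copied from the egcitest output above
stat   = [-3.9321 -25.4538];   % [tau, z] statistics
cValue = [-3.9563 -22.1153];   % Left-tail critical values at the 5% level

% Reject the null of no cointegration when the statistic falls
% below (to the left of) the critical value.
hCheck = stat < cValue;
```

The first comparison fails by a margin of about 0.02, which is why the τ test is described as only barely failing to reject.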

The test regresses y1 = Y(:,1) on Y2 = Y(:,2:end) and (by default) an intercept c0. The residual series is [y1 Y2]*beta – c0 = y1 – Y2*b – c0. Regression coefficients c0 and b are returned in a fifth output argument (together with other regression statistics). You can use the regression coefficients to examine the hypothesized cointegrating vector beta = [1; -b]:

[~,~,~,~,reg] = egcitest(Y,'test','t2');

c0 = reg.coeff(1);
b = reg.coeff(2:3);
beta = [1;-b];
COrd = get(gca,'ColorOrder');
set(gca,'NextPlot','ReplaceChildren','ColorOrder',...
    circshift(COrd,3))
plot(dates,Y*beta-c0,'LineWidth',2)
title('{\bf Cointegrating Relation}')
axis tight
legend off;
grid on

The combination appears relatively stationary, as the test confirms.

Estimate VEC Model Parameters

Once a cointegrating relation has been determined, remaining VEC model coefficients can be estimated by ordinary least-squares. Suppose, for example, that a model selection procedure has indicated the adequacy of q = 2 lags in a VEC(q) model, and we wish to estimate:

$$\Delta y_t = \alpha(\beta^\prime y_{t-1}-c_0)+\displaystyle\sum_{i=1}^2B_i\Delta y_{t-i}+c_1+\varepsilon_t.$$

Since c0 and $\beta$ = [1; -b] have already been determined, we conditionally estimate $\alpha$ , B1, B2, and c1 by first forming the required lagged differences before performing the regression:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
[~,~,~,~,reg] = egcitest(Y,'test','t2');
c0 = reg.coeff(1);
b = reg.coeff(2:3);
beta = [1;-b];

q = 2;
[numObs,numDims] = size(Y);
tBase = (q+2):numObs; % Commensurate time base, all lags
T = length(tBase); % Effective sample size
YLags = lagmatrix(Y,0:(q+1)); % Y(t-k) on observed time base
LY = YLags(tBase,(numDims+1):2*numDims);
% Y(t-1) on commensurate time base

% Form multidimensional differences so that
% the kth numDims-wide block of
% columns in DeltaYLags contains (1-L)Y(t-k+1):

DeltaYLags = zeros(T,(q+1)*numDims);
for k = 1:(q+1)
    DeltaYLags(:,((k-1)*numDims+1):k*numDims) = ...
               YLags(tBase,((k-1)*numDims+1):k*numDims) ...
             - YLags(tBase,(k*numDims+1):(k+1)*numDims);
end

DY = DeltaYLags(:,1:numDims); % (1-L)Y(t)
DLY = DeltaYLags(:,(numDims+1):end); % [(1-L)Y(t-1),...,(1-L)Y(t-q)]

% Perform the regression:
X = [(LY*beta-c0),DLY,ones(T,1)];
P = (X\DY)'; % [alpha,B1,...,Bq,c1]
alpha = P(:,1);
B1 = P(:,2:4);
B2 = P(:,5:7);
c1 = P(:,end);

% Display model coefficients
alpha,b,c0,B1,B2,c1
alpha =

   -0.6336
    0.0595
    0.0269


b =

    2.2209
   -1.0718


c0 =

   -1.2393


B1 =

    0.1649   -0.1465   -0.0416
   -0.0024    0.3816   -0.3716
    0.0815    0.1790   -0.1528


B2 =

   -0.3205    0.9506   -0.9514
   -0.1996    0.5169   -0.5211
   -0.1751    0.6061   -0.5419


c1 =

    0.1516
    0.1508
    0.1503

We also estimate the residual covariance matrix for purposes of simulation and forecasting:

res = DY-X*P';
EstCov = cov(res);

Simulation and Forecasting

Once model coefficients have been estimated, the underlying data-generating process can be simulated. For example, the following code generates a single Monte Carlo forecast path for a horizon 10 years beyond the data:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
[~,~,~,~,reg] = egcitest(Y,'test','t2');
c0 = reg.coeff(1);
b = reg.coeff(2:3);
beta = [1; -b];
q = 2;
[numObs,numDims] = size(Y);
tBase = (q+2):numObs; % Commensurate time base, all lags
T = length(tBase); % Effective sample size
DeltaYLags = zeros(T,(q+1)*numDims);
YLags = lagmatrix(Y,0:(q+1)); % Y(t-k) on observed time base
LY = YLags(tBase,(numDims+1):2*numDims);
for k = 1:(q+1)
    DeltaYLags(:,((k-1)*numDims+1):k*numDims) = ...
               YLags(tBase,((k-1)*numDims+1):k*numDims) ...
             - YLags(tBase,(k*numDims+1):(k+1)*numDims);
end

DY = DeltaYLags(:,1:numDims); % (1-L)Y(t)
DLY = DeltaYLags(:,(numDims+1):end); % [(1-L)Y(t-1),...,(1-L)Y(t-q)]
X = [(LY*beta-c0),DLY,ones(T,1)];
P = (X\DY)'; % [alpha,B1,...,Bq,c1]
alpha = P(:,1);
B1 = P(:,2:4);
B2 = P(:,5:7);
c1 = P(:,end);
res = DY-X*P';
EstCov = cov(res);

numSteps = 10;

% Preallocate, including the q+1 presample rows:
YSim = zeros(numSteps+q+1,numDims);
innov = zeros(numSteps+q+1,numDims);

% Specify q+1 presample values:
YSim(1,:) = Y(end-2,:);
YSim(2,:) = Y(end-1,:);
YSim(3,:) = Y(end,:);

% Simulate numSteps postsample values:
rng('default'); % For reproducibility
for t = 4:numSteps+3

    innov(t,:) = mvnrnd([0 0 0],EstCov,1); % Normal innovations

    % The error-correction regressor in X is (LY*beta-c0), so the
    % intercept of the cointegrating relation enters as -alpha*c0:
    YSim(t,:) = YSim(t-1,:) ...
                + YSim(t-1,:)*beta*alpha'...
                + (YSim(t-1,:)-YSim(t-2,:))*B1'...
                + (YSim(t-2,:)-YSim(t-3,:))*B2'...
                + (-alpha*c0 + c1)'...
                + innov(t,:);

end

% Plot sample and forecast path:
plot(dates,Y,'LineWidth',2)
xlabel('Year')
ylabel('Percent')
title('{\bf Forecast Path}')
hold on
D = dates(end);
plot(D:(D+numSteps),YSim(3:end,:),'-.','LineWidth',2)
Ym = min([Y(:);YSim(:)]);
YM = max([Y(:);YSim(:)]);
fill([D D D+numSteps D+numSteps],[Ym YM YM Ym],'b','FaceAlpha',0.1)
axis tight
grid on
hold off

As described in VAR Model Forecasting, Simulation, and Analysis, the mean and standard deviation of multiple realizations of the forecast path can be used to generate mean forecasts with confidence intervals. Alternatively, the VEC model can be converted to a VAR representation using vectovar, after which vgxpred and vgxsim can be used to generate forecasts.
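The Monte Carlo approach can be sketched as follows. To keep the example short and self-contained, it uses a bivariate VEC model with no lagged differences and hypothetical values for alpha, beta, the innovation covariance, and the starting levels, rather than the estimates obtained above. Innovations are drawn with chol and randn so that only base MATLAB is required.

```matlab
% Sketch: mean forecast and pointwise bands from many simulated paths.
% All model values are hypothetical.
rng(3);
alpha = [-0.3; 0.1];             % Adjustment speeds (assumed)
beta  = [1; -1];                 % Cointegrating vector (assumed)
Sigma = [0.04 0.01; 0.01 0.09];  % Innovation covariance (assumed)
L = chol(Sigma,'lower');         % Sigma = L*L'

y0 = [5; 4];                     % Last observed levels (assumed)
numSteps = 10;
numPaths = 500;
numDims  = numel(y0);

YSim = zeros(numSteps,numDims,numPaths);
for j = 1:numPaths
    yPrev = y0;
    for t = 1:numSteps
        innov = L*randn(numDims,1);                  % N(0,Sigma) draw
        yNext = yPrev + alpha*(beta'*yPrev) + innov; % One VEC step
        YSim(t,:,j) = yNext';
        yPrev = yNext;
    end
end

YMean  = mean(YSim,3);           % Mean forecast, numSteps-by-numDims
YStd   = std(YSim,0,3);          % Pointwise standard deviation
bandLo = YMean - 1.96*YStd;      % Approximate 95% band under normality
bandHi = YMean + 1.96*YStd;
```

The bands here reflect innovation uncertainty only. They do not account for estimation error in the model coefficients.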

Limitations of the Engle-Granger Test

The Engle-Granger method has several limitations. First of all, it identifies only a single cointegrating relation, among what might be many such relations. This requires one of the variables, $y_{1t}$ , to be identified as "first" among the variables in $y_t$ . This choice, which is usually arbitrary, affects both test results and model estimation. To see this, permute the three interest rates in the Canadian data and estimate the cointegrating relation for each choice of a "first" variable:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
P0 = perms([1 2 3]);
[~,idx] = unique(P0(:,1));
    % Rows of P0 with unique regressand y1
P = P0(idx,:); % Unique regressions
numPerms = size(P,1);

% Preallocate:
T0 = size(Y,1);
H = zeros(1,numPerms);
PVal = zeros(1,numPerms);
CIR = zeros(T0,numPerms);

% Run all tests:
for i = 1:numPerms

    YPerm = Y(:,P(i,:));
    [h,pValue,~,~,reg] = egcitest(YPerm,'test','t2');
    H(i) = h;
    PVal(i) = pValue;
    c0i = reg.coeff(1);
    bi = reg.coeff(2:3);
    betai = [1;-bi]
    CIR(:,i) = YPerm*betai-c0i;

end

% Display the test results:
H,PVal
betai =

    1.0000
   -2.2209
    1.0718


betai =

    1.0000
   -0.6029
   -0.3472


betai =

    1.0000
   -1.4394
    0.4001


H =

     1     1     0


PVal =

    0.0202    0.0290    0.0625

For this data, two regressands identify cointegration while the third regressand fails to do so. Asymptotic theory indicates that the test results will be identical in large samples, but the finite-sample properties of the test make it cumbersome to draw reliable inferences.

A plot of the identified cointegrating relations shows the previous estimate (Cointegrating relation 1), plus two others. There is no guarantee, in the context of Engle-Granger estimation, that the relations are independent. Plot the cointegrating relations:

COrd = get(gca,'ColorOrder');
set(gca,'NextPlot','ReplaceChildren','ColorOrder',...
    circshift(COrd,3))
plot(dates,CIR,'LineWidth',2)
title('{\bf Multiple Cointegrating Relations}')
legend(strcat({'Cointegrating relation  '}, ...
     num2str((1:numPerms)')),'location','NW');
axis tight
grid on

Another limitation of the Engle-Granger method is that it is a two-step procedure, with one regression to estimate the residual series, and another regression to test for a unit root. Errors in the first estimation are necessarily carried into the second estimation. The estimated, rather than observed, residual series requires entirely new tables of critical values for standard unit root tests.

Finally, the Engle-Granger method estimates cointegrating relations independently of the VEC model in which they play a role. As a result, model estimation also becomes a two-step procedure. In particular, deterministic terms in the VEC model must be estimated conditionally, based on a predetermined estimate of the cointegrating vector.

Identifying Multiple Cointegrating Relations

The Johansen Test for Cointegration

The Johansen test for cointegration addresses many of the limitations of the Engle-Granger method. It avoids two-step estimators and provides comprehensive testing in the presence of multiple cointegrating relations. Its maximum likelihood approach incorporates the testing procedure into the process of model estimation, avoiding conditional estimates. Moreover, the test provides a framework for testing restrictions on the cointegrating relations B and the adjustment speeds A in the VEC model.

At the core of the Johansen method is the relationship between the rank of the impact matrix C = AB′ and the size of its eigenvalues. The eigenvalues depend on the form of the VEC model, and in particular on the composition of its deterministic terms (see The Role of Deterministic Terms). The method infers the cointegration rank by testing the number of eigenvalues that are statistically different from 0, then conducts model estimation under the rank constraints. Although the method appears to be very different from the Engle-Granger method, it is essentially a multivariate generalization of the augmented Dickey-Fuller test for unit roots. See, e.g., [32].

The Johansen test is implemented in Econometrics Toolbox by the function jcitest. To demonstrate its use, we return to the data on the term-structure of Canadian interest rates. The function's calling syntax, and the structure of its output arguments, are best illustrated by running multiple tests in a single function call. Here, for example, we test for the cointegration rank using the default H1 model with two different lag structures:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
[h,pValue,stat,cValue] = jcitest(Y,'model','H1','lags',1:2);
************************
Results Summary (Test 1)

Data: Y
Effective sample size: 39
Model: H1
Lags: 1
Statistic: trace
Significance level: 0.05


r  h  stat      cValue   pValue   eigVal   
----------------------------------------
0  1  35.3442   29.7976  0.0104   0.3979  
1  1  15.5568   15.4948  0.0490   0.2757  
2  0  2.9796    3.8415   0.0843   0.0736  

************************
Results Summary (Test 2)

Data: Y
Effective sample size: 38
Model: H1
Lags: 2
Statistic: trace
Significance level: 0.05


r  h  stat      cValue   pValue   eigVal   
----------------------------------------
0  0  25.8188   29.7976  0.1346   0.2839  
1  0  13.1267   15.4948  0.1109   0.2377  
2  0  2.8108    3.8415   0.0937   0.0713  

The default "trace" test assesses null hypotheses H(r) of cointegration rank less than or equal to r against the alternative H(n), where n is the dimension of the data. The summaries show that the first test rejects a cointegration rank of 0 (no cointegration) and just barely rejects a cointegration rank of 1, but fails to reject a cointegration rank of 2. The inference is that the data exhibit 1 or 2 cointegrating relationships. With an additional lag in the model, the second test fails to reject any of the cointegration ranks, providing little by way of inference. This example illustrates the importance of determining a reasonable lag length for the VEC model (as well as the general form of the model) before testing for cointegration.
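The trace statistic for H(r) has the usual form −T Σ log(1 − λi), summed over the n − r smallest eigenvalues. As a check, the following sketch recomputes the Test 1 statistics from the eigenvalues and effective sample size displayed above. Because the displayed eigenvalues are rounded to four digits, the results agree with the stat column only to about two decimal places. The names traceStat, hCheck, and rFirstKept are introduced here for illustration.

```matlab
% Values copied from the Test 1 summary above
T = 39;                                  % Effective sample size
eigVal = [0.3979 0.2757 0.0736];         % Eigenvalues, largest first
cValue = [29.7976 15.4948 3.8415];       % 5% critical values, r = 0,1,2

% Trace statistic for H(r): -T * sum of log(1 - lambda_i), i = r+1,...,n
logTerms  = -T*log(1 - eigVal);
traceStat = fliplr(cumsum(fliplr(logTerms)));   % Entries for r = 0,1,2

hCheck = traceStat > cValue;             % Right-tail rejection rule
rFirstKept = find(~hCheck,1) - 1;        % Smallest rank not rejected
```

Read sequentially, the smallest rank that is not rejected is r = 2. Since the rejection at r = 1 is marginal, a rank of 1 remains plausible, consistent with the discussion above.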

Because the Johansen method, by its nature, tests multiple rank specifications for each specification of the remaining model parameters, results from jcitest are returned in the form of dataset arrays, indexed by null rank and test number. For example, the output h has the form:

h
h = 

          r0       r1       r2   
    t1    true     true     false
    t2    false    false    false

Column headers indicate tests r0, r1, and r2, respectively, of H(0), H(1), and H(2) against H(3). Row headers t1 and t2 indicate the two separate tests (two separate lag structures) specified by the input parameters. To access, for example, the result for the second test at null rank r = 0, use dataset indexing:

h20 = h.r0(2)
h20 =

     0

Estimate VEC Model Parameters Using jcitest

In addition to testing for multiple cointegrating relations, jcitest produces maximum likelihood estimates of VEC model coefficients under the rank restrictions on B. Estimation information is returned in an optional fifth output argument, and can be displayed by setting an optional input parameter. For example, the following estimates a VEC(2) model of the data, and displays the results under each of the rank restrictions r = 0, r = 1, and r = 2:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
[~,~,~,~,mles] = jcitest(Y,'model','H1','lags',2,...
    'display','params');
****************************
Parameter Estimates (Test 1)

r = 0
------
B1 =
   -0.1848    0.5704   -0.3273
    0.0305    0.3143   -0.3448
    0.0964    0.1485   -0.1406

B2 =
   -0.6046    1.6615   -1.3922
   -0.1729    0.4501   -0.4796
   -0.1631    0.5759   -0.5231

c1 =
    0.1420
    0.1517
    0.1508


r = 1
------
A =
   -0.6259
   -0.2261
   -0.0222

B =
    0.7081
    1.6282
   -2.4581

B1 =
    0.0579    1.0824   -0.8718
    0.1182    0.4993   -0.5415
    0.1050    0.1667   -0.1600

B2 =
   -0.5462    2.2436   -1.7723
   -0.1518    0.6605   -0.6169
   -0.1610    0.5966   -0.5366

c0 =
    2.2351

c1 =
   -0.0366
    0.0872
    0.1444


r = 2
------
A =
   -0.6259    0.1379
   -0.2261   -0.0480
   -0.0222    0.0137

B =
    0.7081   -2.4407
    1.6282    6.2883
   -2.4581   -3.5321

B1 =
    0.2438    0.6395   -0.6729
    0.0535    0.6533   -0.6107
    0.1234    0.1228   -0.1403

B2 =
   -0.3857    1.7970   -1.4915
   -0.2076    0.8158   -0.7146
   -0.1451    0.5524   -0.5089

c0 =
    2.0901
   -3.0289

c1 =
   -0.0104
    0.0137
    0.1528

The mles output is a dataset array of structure arrays, with each structure containing information for a particular test under a particular rank restriction. Since both dataset and structure arrays use similar indexing, you can access the dataset and then the structure using dot notation. For example, to access the rank 2 matrix of cointegrating relations:

B = mles.r2.paramVals.B
B =

    0.7081   -2.4407
    1.6282    6.2883
   -2.4581   -3.5321

Compare Approaches to Cointegration Analysis

Comparing inferences and estimates from the Johansen and Engle-Granger approaches can be challenging, for a variety of reasons. First of all, the two methods are essentially different, and may disagree on inferences from the same data. The Engle-Granger two-step method for estimating the VEC model, first estimating the cointegrating relation and then estimating the remaining model coefficients, differs from Johansen's maximum likelihood approach. Secondly, the cointegrating relations estimated by the Engle-Granger approach may not correspond to the cointegrating relations estimated by the Johansen approach, especially in the presence of multiple cointegrating relations. It is important, in this context, to remember that cointegrating relations are not uniquely defined, but depend on the decomposition $C = AB^\prime$ of the impact matrix.

Nevertheless, the two approaches should provide generally comparable results, if both begin with the same data and seek out the same underlying relationships. Properly normalized, cointegrating relations discovered by either method should reflect the mechanics of the data-generating process, and VEC models built from the relations should have comparable forecasting abilities.

As the following shows in the case of the Canadian interest rate data, Johansen's H1* model, which is the closest to the default settings of egcitest, discovers the same cointegrating relation as the Engle-Granger test, assuming a cointegration rank of 2:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
[~,~,~,~,reg] = egcitest(Y,'test','t2');
c0 = reg.coeff(1);
b = reg.coeff(2:3);
beta = [1; -b];

[~,~,~,~,mles] = jcitest(Y,'model','H1*');
BJ2 = mles.r2.paramVals.B;
c0J2 = mles.r2.paramVals.c0;

% Normalize the 2nd cointegrating relation with respect to
%  the 1st variable, to make it comparable to Engle-Granger:
BJ2n = BJ2(:,2)/BJ2(1,2);
c0J2n = c0J2(2)/BJ2(1,2);

% Plot the normalized Johansen cointegrating relation together
%  with the original Engle-Granger cointegrating relation:

COrd = get(gca,'ColorOrder');

plot(dates,Y*beta-c0,'LineWidth',2,'Color',COrd(4,:))
hold on
plot(dates,Y*BJ2n+c0J2n,'--','LineWidth',2,'Color',COrd(5,:))
legend('Engle-Granger OLS','Johansen MLE','Location','NW')
title('{\bf Cointegrating Relation}')
axis tight
grid on
hold off
************************
Results Summary (Test 1)

Data: Y
Effective sample size: 40
Model: H1*
Lags: 0
Statistic: trace
Significance level: 0.05


r  h  stat      cValue   pValue   eigVal   
----------------------------------------
0  1  38.8360   35.1929  0.0194   0.4159  
1  0  17.3256   20.2619  0.1211   0.2881  
2  0  3.7325    9.1644   0.5229   0.0891  

Testing Cointegrating Vectors and Adjustment Speeds

A separate Econometrics Toolbox function, jcontest, uses the Johansen framework to test linear constraints on cointegrating relations B and adjustment speeds A, and estimates VEC model parameters under the additional constraints. Constraint testing allows you to assess the validity of relationships suggested by economic theory.

Constraints imposed by jcontest take one of two forms. Constraints of the form R′A = 0 or R′B = 0 specify particular combinations of the variables to be held fixed during testing and estimation. These constraints are equivalent to parameterizations A = Hφ or B = Hφ, where H is the orthogonal complement of R (in MATLAB®, null(R')) and φ is a vector of free parameters. The second constraint type specifies particular vectors in the column space of A or B. The number of constraints that jcontest can impose is restricted by the rank of the matrix being tested, which can be inferred by first running jcitest.
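The equivalence between the constraint and the parameterization through null(R') can be checked directly in base MATLAB. In the sketch below, R forces the first row of a 4-by-1 matrix A to zero, and phi holds arbitrary, hypothetical free parameters.

```matlab
% Constraint that the first row of a 4-by-1 matrix A is zero
R = [1 0 0 0]';              % Constraint vector, R'*A = 0
H = null(R');                % Orthonormal basis for the complement of R

phi = [0.5; -1; 2];          % Arbitrary free parameters (hypothetical)
A = H*phi;                   % Any such A satisfies the constraint

constraintGap = norm(R'*A);  % Zero up to rounding
```

Every choice of phi gives an A with a zero first entry, so estimating phi freely is the same as estimating A subject to the constraint.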

Test Cointegrating Vectors

Tests on B answer questions about the space of cointegrating relations. The column vectors in B, estimated by jcitest, do not uniquely define the cointegrating relations. Rather, they estimate a space of cointegrating relations, given by the span of the vectors. Tests on B allow you to determine if other potentially interesting relations lie in that space. When constructing constraints, interpret the rows and columns of the n-by-r matrix B as follows:

  • Row i of B contains the coefficients of variable $y_{it}$ in each of the r cointegrating relations.

  • Column j of B contains the coefficients of each of the n variables in cointegrating relation j.

One application of jcontest is to pretest variables for their order of integration. At the start of any cointegration analysis, trending variables are typically tested for the presence of a unit root. These pretests can be carried out with combinations of standard unit root and stationarity tests such as adftest, pptest, kpsstest, or lmctest. Alternatively, jcontest lets you carry out stationarity testing within the Johansen framework. To do so, specify a cointegrating vector that is 1 at the variable of interest and 0 elsewhere, and then test to see if that vector is in the space of cointegrating relations. The following tests all of the variables in Y in a single call:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
[h0,pValue0] = jcontest(Y,1,'BVec',{[1 0 0]',[0 1 0]',[0 0 1]'})
h0 =

     1     1     1


pValue0 =

   1.0e-03 *

    0.3368    0.1758    0.1310

The second input argument specifies a cointegration rank of 1, and the third and fourth input arguments are a parameter/value pair specifying tests of specific vectors in the space of cointegrating relations. The results strongly reject the null of stationarity for each of the variables, returning very small p-values.

Another common test of the space of cointegrating vectors is to see if certain combinations of variables suggested by economic theory are stationary. For example, it may be of interest to see if interest rates are cointegrated with various measures of inflation (and, via the Fisher equation, if real interest rates are stationary). In addition to the interest rates already examined, Data_Canada.mat contains two measures of inflation, based on the CPI and the GDP deflator, respectively. To demonstrate the test procedure (without any presumption of having identified an adequate model), we first run jcitest to determine the rank of B, then test the stationarity of a simple spread between the CPI inflation rate and the short-term interest rate:

y1 = Data(:,1); % CPI-based inflation rate
YI = [y1,Y];

% Test if inflation is cointegrated with interest rates:
[h,pValue] = jcitest(YI);
% Test if y1 - y2 is stationary:
[hB,pValueB] = jcontest(YI,1,'BCon',[1 -1 0 0]')
************************
Results Summary (Test 1)

Data: YI
Effective sample size: 40
Model: H1
Lags: 0
Statistic: trace
Significance level: 0.05


r  h  stat      cValue   pValue   eigVal   
----------------------------------------
0  1  58.0038   47.8564  0.0045   0.5532  
1  0  25.7783   29.7976  0.1359   0.3218  
2  0  10.2434   15.4948  0.2932   0.1375  
3  1  4.3263    3.8415   0.0376   0.1025  

hB =

     1


pValueB =

    0.0242

The first test provides evidence of cointegration, and fails to reject a cointegration rank r = 1. The second test, assuming r = 1, rejects the hypothesized cointegrating relation. Of course, reliable economic inferences would need to include proper model selection, with corresponding settings for the 'model' and other default parameters.

Test Adjustment Speeds

Tests on A answer questions about common driving forces in the system. When constructing constraints, interpret the rows and columns of the n-by-r matrix A as follows:

  • Row i of A contains the adjustment speeds of variable $y_{it}$ to disequilibrium in each of the r cointegrating relations.

  • Column j of A contains the adjustment speeds of each of the n variables to disequilibrium in cointegrating relation j.

For example, an all-zero row in A indicates a variable that is weakly exogenous with respect to the coefficients in B. Such a variable may affect other variables, but does not adjust to disequilibrium in the cointegrating relations. Similarly, a standard unit vector column in A indicates a variable that is exclusively adjusting to disequilibrium in a particular cointegrating relation.

To demonstrate, we test for weak exogeneity of the inflation rate with respect to interest rates:

load Data_Canada
Y = Data(:,3:end); % Interest rate data
y1 = Data(:,1); % CPI-based inflation rate
YI = [y1,Y];

[hA,pValueA] = jcontest(YI,1,'ACon',[1 0 0 0]')
hA =

     0


pValueA =

    0.3206

The test fails to reject the null hypothesis. Again, the test is conducted with default settings. Proper economic inference would require a more careful analysis of model and rank specifications.

Constrained parameter estimates are accessed via a fifth output argument from jcontest. For example, the constrained, rank 1 estimate of A is obtained by referencing the fifth output with dot (.) indexing:

[~,~,~,~,mles] = jcontest(YI,1,'ACon',[1 0 0 0]');
Acon = mles.paramVals.A
Acon =

         0
    0.1423
    0.0865
    0.2862

The first row of A is 0, as specified by the constraint.
