Estimating long-term effects in an error correction model

Hi everyone,

I am fitting an error correction model (ECM) in Stan and trying to estimate the long-term effects of my regressors.

To check whether my model is right, I use the ecm package in R and the example from the package’s reference manual.

library(ecm)
library(rstan)
library(tidyverse)

data(Wilshire)
trn <- Wilshire[Wilshire$date<='2015-12-01',]

#Assume all predictors are needed in the equilibrium and transient terms of ecm
xeq <- xtr <- trn[c('CorpProfits', 'FedFundsRate', 'UnempRate')]
model1 <- ecm(trn$Wilshire5000, xeq, xtr, includeIntercept=TRUE)
summary(model1)

Call:
lm(formula = dy ~ ., data = x, weights = weights)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.9223 -0.6088  0.0210  0.6822  3.9381 

Coefficients:
                    Estimate Std. Error t value     Pr(>|t|)    
(Intercept)       -0.4160213  0.8236705  -0.505     0.614155    
deltaCorpProfits   0.0119853  0.0020332   5.895 0.0000000197 ***
deltaFedFundsRate -0.1231619  0.1547487  -0.796     0.427210    
deltaUnempRate    -1.4841457  0.4389805  -3.381     0.000896 ***
CorpProfitsLag1    0.0027077  0.0008258   3.279     0.001265 ** 
FedFundsRateLag1   0.0655636  0.0494706   1.325     0.186849    
UnempRateLag1     -0.0532751  0.1040916  -0.512     0.609448    
yLag1             -0.0337028  0.0192679  -1.749     0.082066 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.711 on 170 degrees of freedom
Multiple R-squared:  0.2919,	Adjusted R-squared:  0.2628 
F-statistic: 10.01 on 7 and 170 DF,  p-value: 0.0000000001885

ecm() transforms the ECM

\Delta y_t = \alpha + \sum_{k} \beta_{k,\text{short-term}} \, \Delta x_{k,t} + \gamma \left( y_{t-1} - \sum_{l} \beta_{l,\text{long-term}} \, x_{l,t-1} \right) + \epsilon_t

so that the model can be estimated in a single regression:

\Delta y_t = \alpha + \sum_{k} \beta_{k,\text{short-term}} \, \Delta x_{k,t} + \gamma \, y_{t-1} + \sum_{l} \gamma_{l,\text{long-term}} \, x_{l,t-1} + \epsilon_t

with \gamma_{l,\text{long-term}} = -\beta_{l,\text{long-term}} \, \gamma.

I can fit this model easily in Stan and recover the same parameters as in the example:

# build lagged (*_lag) and first-differenced (*_fd) versions of the outcome and
# all predictors, drop incomplete rows, then keep only the lag/fd columns
data <- data.frame(y=trn$Wilshire5000,
                   trn %>% 
                     dplyr::select(CorpProfits,
                                   FedFundsRate,
                                   UnempRate)) %>% 
  mutate_all(list(lag=~dplyr::lag(.),
                  fd=~.-dplyr::lag(.))) %>% 
  .[complete.cases(.),]%>% 
  dplyr::select(-CorpProfits,
                -FedFundsRate,
                -UnempRate,
                -y)


# Stan data

standata = list(N = NROW(data$y_fd),
                K = ncol(data %>% 
                           dplyr::select(ends_with("fd"),
                                         -y_fd)), #number of coefficients for short-term effects
                L = ncol(data %>%
                           dplyr::select(ends_with("lag"),
                                         -y_lag)), #number of coefficients for long-term effects
                Y_fd = as.numeric(data$y_fd), #differenced outcome
                Y_lag = as.numeric(data$y_lag), #lagged outcome
                X = as.matrix(data %>% 
                                dplyr::select(ends_with("fd"),
                                              -y_fd)), #matrix of differenced predictors
                Z = as.matrix(data %>% 
                                dplyr::select(ends_with("lag"),
                                              -y_lag)) #matrix of lagged predictors
                )

# Stan

stan_code_ecm="data {

int<lower=0> N; //number of observations
int<lower=0> K; //number of coefficients for short term effects
int<lower=0> L; //number of coefficients for long term effects
matrix[N, K] X; //matrix of differenced predictors
matrix[N, L] Z; //matrix of lagged predictors
vector[N] Y_fd; //differenced outcome
vector[N] Y_lag; //lagged outcome
}

parameters {
real alpha_st;
vector [K] beta_st;
vector [L] gamma_lt;
real gamma;
real<lower=0> sigma; 
}

model {

//priors
alpha_st ~ student_t(4,0,10);
beta_st ~ student_t(4,0,10);
gamma ~ student_t(4,0,10);
gamma_lt ~ student_t(4,0,10);
sigma ~ student_t(4,0,10);

//likelihood
Y_fd ~ normal(alpha_st + X*beta_st + gamma*Y_lag + Z*gamma_lt, sigma);
}"

fit_ecm = stan(model_code = stan_code_ecm,
               data = standata,
               chains = 4, 
               iter = 2000,
               cores = 4)

print(fit_ecm, digits=3)

Inference for Stan model: 72ebbdea57dae7e762831d6c92d0a799.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean    sd     2.5%      25%      50%      75%    97.5% n_eff  Rhat
alpha_st      -0.441   0.021 0.827   -2.063   -0.996   -0.432    0.122    1.175  1613 1.001
beta_st[1]     0.012   0.000 0.002    0.008    0.011    0.012    0.013    0.016  5832 1.000
beta_st[2]    -0.126   0.003 0.158   -0.432   -0.232   -0.128   -0.019    0.179  2031 1.002
beta_st[3]    -1.492   0.010 0.448   -2.357   -1.791   -1.485   -1.195   -0.625  1999 1.001
gamma_lt[1]    0.003   0.000 0.001    0.001    0.002    0.003    0.003    0.004  2204 1.000
gamma_lt[2]    0.066   0.001 0.048   -0.029    0.034    0.066    0.100    0.159  2362 1.000
gamma_lt[3]   -0.050   0.003 0.105   -0.252   -0.123   -0.051    0.021    0.153  1670 1.001
gamma         -0.033   0.000 0.019   -0.070   -0.046   -0.033   -0.021    0.006  1894 1.000
sigma          1.724   0.002 0.091    1.553    1.661    1.721    1.783    1.911  2813 1.000
lp__        -184.642   0.054 2.159 -189.825 -185.819 -184.270 -183.044 -181.532  1594 1.001

Samples were drawn using NUTS(diag_e) at Wed Dec 11 20:49:52 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

But since \gamma_{l,\text{long-term}} is the product of -\beta_{l,\text{long-term}} and \gamma, I am also interested in \beta_{l,\text{long-term}} itself and its CI.

I have already tried a couple of things:

(1) Directly defining the likelihood as the untransformed model:

model {

  ...

  //likelihood
  Y_fd ~ normal(alpha_st + X*beta_st + gamma*(Y_lag - Z*beta_lt), sigma);
}

This gives me weird results, with divergent transitions, high Rhat, and low n_eff:

Warning messages:
1: There were 138 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 
2: Examine the pairs() plot to diagnose sampling problems
 
3: The largest R-hat is 1.54, indicating chains have not mixed.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#r-hat 
4: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#bulk-ess 
5: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
Running the chains for more iterations may help. See
http://mc-stan.org/misc/warnings.html#tail-ess

Inference for Stan model: ebeaa5664f6c938ba1ee5875fa240a00.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

               mean se_mean     sd     2.5%      25%      50%      75%    97.5% n_eff  Rhat
alpha_st     -0.520   0.015  0.515   -1.707   -0.825   -0.441   -0.172    0.300  1156 1.014
beta_st[1]    0.012   0.000  0.002    0.008    0.010    0.012    0.013    0.016  5019 1.000
beta_st[2]   -0.086   0.003  0.154   -0.380   -0.191   -0.087    0.019    0.217  2259 1.000
beta_st[3]   -1.368   0.010  0.441   -2.212   -1.665   -1.361   -1.079   -0.485  2032 1.001
beta_lt[1]    0.067   0.380  0.705   -2.025    0.033    0.175    0.363    1.319     3 1.557
beta_lt[2]    3.256   4.267 11.698  -22.506   -1.332    4.060    9.095   25.032     8 1.184
beta_lt[3]    0.437   0.292 10.265  -20.389   -4.819    0.236    5.551   22.165  1238 1.015
gamma        -0.005   0.003  0.008   -0.024   -0.009   -0.004    0.000    0.006     5 1.282
sigma         1.733   0.002  0.095    1.559    1.668    1.727    1.794    1.934  2474 1.001
lp__       -186.473   0.415  2.477 -192.042 -187.865 -186.213 -184.748 -182.358    36 1.056

Samples were drawn using NUTS(diag_e) at Wed Dec 11 22:34:39 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

(2) Calculating \beta_{l,\text{long-term}} in generated quantities:

generated quantities {
  vector[L] beta_lt; //long-run coefficients recovered from the one-step parameterisation

  beta_lt = gamma_lt / gamma;
}

This also gives me weird results, with very broad CIs:

Inference for Stan model: a99af2ac032d43240be78c06c0fe10b0.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                mean se_mean       sd     2.5%      25%      50%      75%    97.5% n_eff  Rhat
alpha_st      -0.396   0.022    0.834   -2.057   -0.958   -0.399    0.160    1.231  1402 1.002
beta_st[1]     0.012   0.000    0.002    0.008    0.011    0.012    0.013    0.016  6873 1.000
beta_st[2]    -0.127   0.003    0.156   -0.429   -0.235   -0.125   -0.019    0.175  2086 1.002
beta_st[3]    -1.497   0.009    0.441   -2.333   -1.805   -1.508   -1.202   -0.615  2665 1.001
gamma_lt[1]    0.003   0.000    0.001    0.001    0.002    0.003    0.003    0.004  2485 1.000
gamma_lt[2]    0.065   0.001    0.050   -0.032    0.031    0.064    0.098    0.163  2109 1.001
gamma_lt[3]   -0.055   0.003    0.104   -0.258   -0.125   -0.055    0.015    0.149  1554 1.002
gamma         -0.034   0.000    0.019   -0.072   -0.047   -0.034   -0.021    0.003  1944 1.000
sigma          1.723   0.002    0.095    1.551    1.655    1.716    1.785    1.919  2678 1.000
beta_lt[1]    -0.651   0.375   23.776   -0.353   -0.099   -0.078   -0.065    0.083  4010 1.000
beta_lt[2]   -38.847  25.799 1633.578  -19.133   -3.424   -1.724   -0.675    7.102  4009 1.000
beta_lt[3]   -14.000  10.351  645.256  -14.341   -0.354    1.708    3.398   12.546  3886 1.000
lp__        -184.625   0.054    2.172 -189.815 -185.794 -184.295 -183.081 -181.407  1630 1.002

Samples were drawn using NUTS(diag_e) at Wed Dec 11 22:24:07 2019.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Now I wonder what the best way is to estimate \beta_{l,\text{long-term}} and also get CIs for these parameters.

Thank you very much in advance!

Option (2), where you calculate \beta_{l,\text{long-term}} in the generated quantities block, is probably the right way to do it. With option (1), the model contains the product of two parameters, \gamma \beta_{l,\text{long-term}}. Stan will have trouble sampling a product like that near, for instance, \gamma = 0. I would not be surprised if in option (1) some chains get (temporarily) stuck at \gamma > 0 or \gamma < 0.
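A quick way to check this, just as a sketch (I am assuming the option (1) fit is stored in an object called fit_ecm_untransformed; the post does not name it):

# "fit_ecm_untransformed" is a placeholder name for the option (1) fit object
draws <- rstan::extract(fit_ecm_untransformed, pars = "gamma", permuted = FALSE)
# proportion of post-warmup draws with gamma > 0, per chain; values near 0 or 1
# for a single chain suggest that chain is stuck on one side of zero
apply(draws[, , "gamma"], 2, function(g) mean(g > 0))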

The broad CIs for the generated quantities in option (2) could be because there is not enough information in the data to estimate these effects. You can “solve” this by using narrower priors, if that is warranted.
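For example, here is a minimal sketch of the option (1) parameterisation with narrower priors on gamma and beta_lt (the student_t scales of 1 are purely illustrative assumptions, not recommendations):

stan_code_ecm_tight = "data {
  int<lower=0> N;
  int<lower=0> K;
  int<lower=0> L;
  matrix[N, K] X;
  matrix[N, L] Z;
  vector[N] Y_fd;
  vector[N] Y_lag;
}

parameters {
  real alpha_st;
  vector[K] beta_st;
  vector[L] beta_lt;
  real gamma;
  real<lower=0> sigma;
}

model {
  //priors (the narrower scale of 1 on gamma and beta_lt is illustrative only)
  alpha_st ~ student_t(4, 0, 10);
  beta_st ~ student_t(4, 0, 10);
  gamma ~ student_t(4, 0, 1);
  beta_lt ~ student_t(4, 0, 1);
  sigma ~ student_t(4, 0, 10);

  //likelihood: untransformed ECM, as in option (1)
  Y_fd ~ normal(alpha_st + X*beta_st + gamma*(Y_lag - Z*beta_lt), sigma);
}"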

My preferred next step would be to check whether there are strong correlations in the posterior between the parameters. That might give you some idea of which parameters are not uniquely identified and whether you can reparametrise the model further.
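As a minimal sketch of that check, using the fit_ecm object from above and only standard rstan functions:

# posterior correlations between gamma, the long-run terms, and the intercept
post <- as.matrix(fit_ecm, pars = c("gamma", "gamma_lt", "alpha_st"))
round(cor(post), 2)

# or inspect the joint posterior geometry directly
pairs(fit_ecm, pars = c("gamma", "gamma_lt"))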