Modeling a GJR GARCH with skewed innovations

Hi, I am trying to estimate the following model:
image

Where g(.|υt,ηt) is a skewed t distribution (see code) and υ is the degrees of freedom, η is the asymmetry parameter. Σt^(1/2) is the Cholesky decomposition of the variance-covariance matrix – but this is not important in the one asset case. The variance of the asset is estimated via a GJR GARCH model, while the degrees of freedom and asymmetry parameter are also estimated using GARCH like models – they are stated as:
image

I have tried by implementing the following code - it runs, but it cannot start the iteration correctly, i.e. it cannot find appropriate starting values - I have tried also incorporating initial value as suggest by Stan, but it does not fix it. Any suggestions on, how to optimize the code?

functions {
  //Multivariate skewed t-distribution (Sahu, Dey & Branco, 2003)
  real MskT_lpdf(real z, real nu, real xi){
      real a = exp((lgamma((nu-1)/2)+log(sqrt(nu-2))) - (log(sqrt(pi()))+lgamma(nu/2))+log((xi-1/xi)));
      real b = sqrt(pow(xi,2)+1/pow(xi,2)-1-pow(a,2));
      real k = (-a/b>= z ? (b*z+a)*xi : (b*z+a)/xi);
      return log(2*b/(xi+1/xi)) + lgamma((nu+1)/2)-(log(sqrt(pi()*(nu-2)))+lgamma(nu/2))-((nu+1)/2)*log(1+(pow(k,2)/(nu-2)));
  }
}

data {
  int<lower=0> T;                               //number of observations 
  real r[T];                                    //return data - notice, cannot med -100%, as this will results in negative prices
  real<lower=0> sigma1;                         //First sigma
  //GARCH prior parameters
  real omg_p;
  real omg_q;
  real psi_p;
  real psi_q;
  real alpha_q;
  real alpha_p;
  real lam_p;
  real lam_q;
  real mu_mu;
  real mu_sigma;
  //evelution of nu and xi parameters
  real c0_p;
  real c0_q;
  real c1_plus_p;
  real c1_plus_q;
  real c1_minus_p;
  real c1_minus_q;
  real c2_p;
  real c2_q;
  real d0_p;
  real d0_q;
  real d1_plus_p;
  real d1_plus_q;
  real d1_minus_p;
  real d1_minus_q;
  real d2_p;
  real d2_q;
}

parameters {
  real mu; 
  real<lower=0> omg;                            //Cannot be negative to ensure positive variance
  real<lower=0,upper=1> alpha;                  //Cannot be negative to ensure positive variance  
  real<lower=0> psi;                            //Cannot be negative to ensure positive variance
  real<lower=0> lam;                            //Cannot be negative to ensure positive variance
  real c0;
  real c1_minus;
  real c1_plus;
  real c2;
  real d0;
  real d1_minus;
  real d1_plus;
  real d2;
}

transformed parameters{
  real<lower=0> sigma[T];
  real<lower=0, upper=(1-alpha-psi/2)> bet;     //Upper limit to ensure stationarity
  real nu[T];
  real xi[T];
  real inn[T];
  real eps[T];
  sigma[1] = sigma1;
  bet = lam - alpha - psi/2;
  inn[1] = 0.2;
  nu[1] = 4;
  xi[1] = 0.5;

  for (t in 2:T){ 
  sigma[t] = sqrt(omg + (alpha + psi*step(mu-r[t - 1]))*pow(r[t - 1] - mu,2) + bet*pow(sigma[t-1],2));
  nu[t] = exp(log(c0)+c1_minus*fabs(inn[t-1])*step(-inn[t-1])+c1_plus*fabs(inn[t-1])*(1-step(-inn[t-1]))+c2*log(nu[t-1]-4))-4;
  xi[t] = exp(log(d0)+d1_minus*inn[t-1]*step(-inn[t-1])+d1_plus*inn[t-1]*(1-step(-inn[t-1]))+d2*log(xi[t-1]));
  inn[t] = (r[t]-mu) / sigma[t];
  }
   
}

model{
  //Data distribution - model
  for(i in 2:T){
  inn[i] ~ MskT(nu[i], xi[i]);  
  }
  
  //prior distributions
  omg~beta(omg_p, omg_q);
  alpha~beta(psi_p, psi_q);
  psi~beta(alpha_q, alpha_q);
  lam~beta(lam_p, lam_q);
  mu~normal(mu_mu, mu_sigma);
  c0~lognormal(c0_p, c0_q);
  c1_minus~normal(c1_minus_p, c1_minus_q);
  c1_plus~normal(c1_plus_p, c1_plus_q);
  c2~normal(c2_p, c2_q)T[-1,1];
  d0~lognormal(d0_p, d0_q);
  d1_minus~normal(d1_minus_p, d1_minus_q);
  d1_plus~normal(d1_plus_p, d1_plus_q);
  d2~normal(d2_p, d2_q)T[-1,1];
}

[edit: escaped code]

Do you get any warning messages indicating what is failing? If not, what version/interface of Stan are you using?

Is there a way to start smaller and build up to something this complicated? It’s hard to diagnose large programs.

The other thing you can do is print intermediate results.

Usually the problem is lack of constraints in parameter declarations—Stan requires every value of the parameters meeting the declared constraints to have a finite log density (positive density). I’ll edit your post to escape the code and maybe there are constraints there.

Also, it looks like you haven’t accounted for the Jacobians in defining transformed variables like inn and then giving them distributions.

Thank you for the replies and your time. When running the full model (the one above - now with a few adjustment), I first obtain the warning:
“DIAGNOSTIC(S) FROM PARSER:
Warning (non-fatal):
Left-hand side of sampling statement (~) may contain a non-linear transform of a parameter or local variable.
If it does, you need to include a target += statement with the log absolute determinant of the Jacobian of the transform.
Left-hand-side of sampling statement:
inn[i] ~ MskT(…)”

It no longer has a problem with initializing - but out of 4 chains of 1000 minus 4x500 burnins, it states:
Warning messages:
1: There were 2000 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
2: Examine the pairs() plot to diagnose sampling problems

I have tried increasing adapt_delta above 0.8, but without effect - In general i tried following: http://mc-stan.org/misc/warnings.html about the warning, but cannot seem to eliminate the problem.

I model i build from a more simpel one i.e. I started with only GARCH GJR effects assuming normality, then expanded to the skew t (univariate case - which is also then one in this code) with constant nu and xi. Both worked like a charm - i.e. based on simulated data, they were spot on. However, as soon as time-variation in nu and xi is introduced, the divergent transitions occurs.

This is a bit of a long shot - I have included my R program and Stan - program 1 is for GARCH GJR with skewed t but constant nu and xi. Program 2 is with time variation in nu and xi (modelled using an exponential GARCH like model). The full and adjusted Stan model is given as:

//functions {
//Univariate skewed t-distribution (Sahu, Dey & Branco, 2003)
real MskT_lpdf(real z, real nu, real xi){
real a = exp((lgamma((nu-1)/2)+log(sqrt(nu-2))) - (log(sqrt(pi()))+lgamma(nu/2))+log((xi-1/xi)));
real b = sqrt(pow(xi,2)+1/pow(xi,2)-1-pow(a,2));
real k = (-a/b>= z ? (bz+a)xi : (bz+a)/xi);
return log(2
b/(xi+1/xi)) + lgamma((nu+1)/2)-(log(sqrt(pi()*(nu-2)))+lgamma(nu/2))-((nu+1)/2)*log(1+(pow(k,2)/(nu-2)));
}
}

data {
int<lower=0> T; //number of observations
real r[T]; //return data - notice, cannot med -100%, as this will results in negative prices
real<lower=0> sigma1; //First sigma
//GARCH prior parameters
real omg_p;
real omg_q;
real psi_p;
real psi_q;
real alpha_q;
real alpha_p;
real lam_p;
real lam_q;
real mu_mu;
real mu_sigma;
//evelution of nu and xi parameters
real c0_p;
real c0_q;
real c1_plus_p;
real c1_plus_q;
real c1_minus_p;
real c1_minus_q;
real c2_p;
real c2_q;
real d0_p;
real d0_q;
real d1_plus_p;
real d1_plus_q;
real d1_minus_p;
real d1_minus_q;
real d2_p;
real d2_q;
}

parameters {
real mu;
real<lower=0> omg; //Cannot be negative to ensure positive variance
real<lower=0,upper=1> alpha; //Cannot be negative to ensure positive variance
real<lower=-alpha> psi; //Cannot be negative to ensure positive variance
real<lower=0> lam; //Cannot be negative to ensure positive variance
real c0;
real c1_minus;
real c1_plus;
real c2;
real d0;
real d1_minus;
real d1_plus;
real d2;
}

transformed parameters{
real<lower=0> sigma[T];
real<lower=0, upper=(1-alpha-psi/2)> bet; //Upper limit to ensure stationarity
real nu[T];
real xi[T];
real inn[T];
real eps[T];
sigma[1] = sigma1;
bet = lam - alpha - psi/2;
inn[1] = 0.2;
nu[1] = 4;
xi[1] = 0.5;

for (t in 2:T){
sigma[t] = sqrt(omg + (alpha + psistep(mu-r[t - 1]))pow(r[t - 1] - mu,2) + betpow(sigma[t-1],2));
nu[t] = exp(log(c0)+c1_minus
fabs(inn[t-1])step(-inn[t-1])+c1_plusfabs(inn[t-1])(1-step(-inn[t-1]))+c2log(nu[t-1]-4))-4;
xi[t] = exp(log(d0)+d1_minusinn[t-1]step(-inn[t-1])+d1_plusinn[t-1](1-step(-inn[t-1]))+d2*log(xi[t-1]));
inn[t] = (r[t]-mu) / sigma[t];
}

}

model{
//Data distribution - model
for(i in 2:T){
inn[i] ~ MskT(nu[i], xi[i]);
target += -log(sigma[t])
}

//prior distributions
omg~beta(omg_p, omg_q);
alpha~beta(psi_p, psi_q);
psi~beta(alpha_q, alpha_q);
lam~beta(lam_p, lam_q);
mu~normal(mu_mu, mu_sigma);
c0~lognormal(c0_p, c0_q);
c1_minus~normal(c1_minus_p, c1_minus_q);
c1_plus~normal(c1_plus_p, c1_plus_q);
c2~normal(c2_p, c2_q)T[-1,1];
d0~lognormal(d0_p, d0_q);
d1_minus~normal(d1_minus_p, d1_minus_q);
d1_plus~normal(d1_plus_p, d1_plus_q);
d2~normal(d2_p, d2_q)T[-1,1];
}

Program_1.R (5.8 KB)
Program_1.stan (2.3 KB)
Program_2.stan (3.1 KB)
Program_2_R.R (3.5 KB)Univariate skewed t distribution.stan (801 Bytes)
Test_data.csv (36.3 KB)

That’s what I meant when I said it looks like you’re missing a Jacobian term. If you apply a non-linear transform to a parameter (or function of a parameter) and then give that a distribution, you need the Jacobian adjustment to account for the change of variables on the density.

So that’s probably a non-centered vs. centered parameterization thing if you have a latent sequence of parameters. They wind up all being very highly correlated with each other.

Regarding the last. is there a way to solve it in Stan, or is the model just not fit for Stan?

Yes, you can non-center just as with everything else if there are latent parameters.

Instead of

parameters {
  vector[N] z;
  real<lower = 0> sigma;
}
model {
  z[1] ~ normal(0, sigma);
  for (n in 2:N)
    z[n] ~ normal(z[n - 1], sigma)
}

the non-centered version defines a standardized innovation epsilon_std as the parameter, gives it a normal(0, 1) prior, and then scales and shifts the values in the transformed parameter block.

parameters {
  vector[N] epsilon_std;  // standardized innovation
  real<lower = 0> sigma;
}
transformed parameters {
  vector[N] epsilon = sigma * epsilon_std;  // innovation
  vector[N] z;  ...
  z[1] = epsilon[1];
  for (n in 2:N)
    z[n] = z[n - 1] + epsilon[n];
 ...
}
model {
  epsilon_std ~ normal(0, 1);