Bayesian BEKK models

Hi everyone, I am trying to fit a Bayesian BEKK model, but I cannot get the sampler to produce a symmetric covariance matrix. How can I reduce the number of rejected proposals during sampling?


When including Stan code in your post, it really helps to make it as readable as possible by using Stan code chunks (```stan) with clear spacing and indentation. For example:

```stan
data {
  int<lower=0> n;     // number of data items
  int<lower=1> d;     // number of dimensions
  int<lower=0> s;     // number of ARCH predictors
  int<lower=0> k;     // number of GARCH predictors
  row_vector[d] y[n]; // outcome matrix
}
transformed data {
  row_vector[d] zero = rep_row_vector(0, d);
  matrix[d,d] Id = diag_matrix(rep_vector(1.0, d));
}
parameters {
  row_vector[d] mu;              // location parameter
  cholesky_factor_cov[d] alpha0; // ARCH constant
  matrix[d,d] alpha[s];          // ARCH coefficients
  matrix[d,d] beta[k];           // GARCH coefficients
}
transformed parameters {
  cov_matrix[d] sigma[n];
  for (i in 1:n) {
    if (i <= k) {
      sigma[i] = Id;
    } else {
      sigma[i] = multiply_lower_tri_self_transpose(alpha0);
      for (j in 1:s) {
        sigma[i] = sigma[i] + quad_form((y[i-j] - mu)' * (y[i-j] - mu), alpha[j]);
      }
      for (j in 1:k) {
        sigma[i] = sigma[i] + quad_form(sigma[i-j], beta[j]);
      }
    }
  }
}
model {
  // Likelihood
  for (i in 1:n)
    y[i] ~ multi_normal_cholesky(mu, cholesky_decompose(sigma[i]));
}
```

In a GP one would usually add some small value to the diagonal (e.g. delta = 1e-9) to overcome the numerical inaccuracies.
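To see why the jitter trick works, here is a minimal NumPy sketch (the matrix and the delta value are invented for illustration): a covariance matrix that is only positive semi-definite fails a Cholesky factorization, while adding a tiny constant to the diagonal restores strict positive definiteness.

```python
import numpy as np

# A covariance matrix that is positive semi-definite but numerically
# singular, so a plain Cholesky factorization fails.
S = np.array([[1.0, 1.0],
              [1.0, 1.0]])

try:
    np.linalg.cholesky(S)
except np.linalg.LinAlgError:
    print("Cholesky failed on the exact matrix")

# Adding a small constant (delta) to the diagonal makes every
# eigenvalue strictly positive, so the factorization goes through.
delta = 1e-9
L = np.linalg.cholesky(S + delta * np.eye(2))
print(L.shape)  # (2, 2)
```

In Stan the equivalent would be adding `delta` to the diagonal of `sigma[i]` before calling `cholesky_decompose`.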

What do you mean by GP?

Gaussian Process

See https://github.com/stan-dev/example-models/blob/master/misc/gaussian-process/gp-fit-latent.stan

Thanks, I tried that, but I still get this error:

"Chain 1: Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Chain 1: Exception: validate transformed params: sigma[k0__] is not positive definite. (in ‘model21e462e22048_VarBekk’ at line 30)
Chain 1: If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
Chain 1: but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."

And it's because the sampling of sigma generates a non-symmetric matrix, which I think is what makes my sampling so ill-behaved and slow.
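The asymmetry described here is usually only floating-point rounding from accumulating many matrix products; a common fix (sketched below in NumPy with an invented matrix) is to average the matrix with its transpose, which forces exact symmetry without meaningfully changing the values.

```python
import numpy as np

# A matrix that is symmetric only up to floating-point noise, as can
# happen after accumulating many quadratic-form terms.
S = np.array([[2.0, 0.5 + 1e-15],
              [0.5, 1.0]])
print(np.array_equal(S, S.T))  # False: off-diagonals differ by ~1e-15

# Averaging with the transpose makes the matrix exactly symmetric.
S_sym = 0.5 * (S + S.T)
print(np.array_equal(S_sym, S_sym.T))  # True
```

The same idea in Stan would be `sigma[i] = 0.5 * (sigma[i] + sigma[i]');` at the end of the recursion.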

Have you tried changing cov_matrix[d] sigma[n]; to matrix[d,d] sigma[n];? The cov_matrix type just enforces some strict validation checks at the end of the transformed parameters block (which is what your error message is referring to).

Also, it looks like you don’t have any priors, which could help the sampler avoid regions where the covariance matrix is really nasty. Additionally, it’s probably better to decompose alpha0 into separate Cholesky correlation and scale factors, which are easier to put priors on.

Hi Christopher, thanks for your advice.

Yes, I tried that, but then the normal sampling has problems (because the Cholesky decomposition needs a symmetric matrix). I put priors on all my parameters except alpha0, because I don’t know how to do what you suggested. After adding priors the sampling got much better, but I still have the same errors with sigma[i].

I also tried putting a Wishart prior on sigma, and LKJ(d) priors on alpha[s] and beta[k], but I am not sure about the Wishart.

I’m not particularly familiar with BEKK models, but I’m under the impression that alpha and beta aren’t required to be positive definite; in that case, an LKJ prior on them is inappropriate and could lead to sampling problems (also, the LKJ parameter probably shouldn’t be set to the matrix’s dimensionality). It’s also probably not a good idea to put a prior directly on sigma.

Here’s an example with the alpha0 decomposition; I also threw in some tentative priors, which you should tweak or replace with better domain-specific knowledge.

```stan
data {
  int<lower=0> n;     // number of data items
  int<lower=1> d;     // number of dimensions
  int<lower=0> s;     // number of ARCH predictors
  int<lower=0> k;     // number of GARCH predictors
  row_vector[d] y[n]; // outcome matrix
}
transformed data {
  row_vector[d] zero = rep_row_vector(0, d);
  matrix[d,d] Id = diag_matrix(rep_vector(1.0, d));
  matrix[d,d] L_Id = cholesky_decompose(Id);
}
parameters {
  row_vector[d] mu;                      // location parameter
  cholesky_factor_corr[d] L_alpha0_corr; // ARCH constant correlation
  vector<lower=0>[d] alpha0_scale;       // ARCH constant scale
  matrix[d,d] alpha[s];                  // ARCH coefficients
  matrix[d,d] beta[k];                   // GARCH coefficients
}
transformed parameters {
  // This would also be a place to add some small number to the
  // diagonal to help with numerical stability.
  matrix[d,d] alpha0 = multiply_lower_tri_self_transpose(
                         diag_pre_multiply(alpha0_scale, L_alpha0_corr));
  matrix[d,d] L_sigma[n];
  matrix[d,d] sigma[n];
  for (i in 1:n) {
    if (i <= k) {
      sigma[i] = Id;
      L_sigma[i] = L_Id; // defined as the Cholesky factor of Id in transformed data
    } else {
      sigma[i] = alpha0;
      for (j in 1:s) {
        sigma[i] = sigma[i] + quad_form((y[i-j] - mu)' * (y[i-j] - mu), alpha[j]);
      }
      for (j in 1:k) {
        sigma[i] = sigma[i] + quad_form(sigma[i-j], beta[j]);
      }
      L_sigma[i] = cholesky_decompose(sigma[i]);
    }
  }
}
model {
  // Priors (feel free to change the hyperparameters, or pass them in as data)
  alpha0_scale ~ student_t(7, 0, 1);
  L_alpha0_corr ~ lkj_corr_cholesky(2); // parameter should probably be between 1 and 2
  for (i in 1:s) to_vector(alpha[i]) ~ normal(0, 5);
  for (i in 1:k) to_vector(beta[i]) ~ normal(0, 5);
  mu ~ normal(0, 10);

  // Likelihood
  for (i in 1:n)
    y[i] ~ multi_normal_cholesky(mu, L_sigma[i]);
}
```
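As a sanity check on the recursion itself, here is a small NumPy sketch of a BEKK(1,1)-style update (random data and made-up coefficient values, not from the thread). It shows that sigma stays exactly symmetric and positive definite at every step as long as the constant term is positive definite, since every term in the update is a quadratic form:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 50

# Made-up coefficients: alpha0 positive definite, alpha/beta arbitrary.
alpha0 = np.array([[0.5, 0.1],
                   [0.1, 0.5]])
alpha = 0.3 * np.eye(d)
beta = 0.4 * np.eye(d)

mu = np.zeros(d)
y = rng.normal(size=(n, d))

sigma = [np.eye(d)]
for i in range(1, n):
    e = (y[i - 1] - mu).reshape(-1, 1)
    # BEKK(1,1) update: each term is symmetric positive semi-definite,
    # and alpha0 is positive definite, so the sum is positive definite.
    s = alpha0 + alpha.T @ (e @ e.T) @ alpha + beta.T @ sigma[-1] @ beta
    sigma.append(s)

ok = all(np.allclose(s, s.T) and np.all(np.linalg.eigvalsh(s) > 0)
         for s in sigma)
print(ok)  # True
```

If this property fails in the Stan model, the culprit is almost always floating-point asymmetry rather than a genuinely indefinite matrix.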

Hi Chris,

thanks for your help. I have been checking your updates, and the code is faster now; I still have the same warnings, but it is faster.

I really appreciate your help.

Asael