Estimation issue in dynamic multivariate stochastic volatility (DMSV) model

[edit: fixed LaTeX error]

Hello Everyone,

I’m trying to fit a dynamic multivariate stochastic volatility (DMSV) model in stan. The model is as follows:

\boldsymbol{r}_{t}= \boldsymbol{L}_{t} \boldsymbol{H}_{t}^{1/2} \epsilon_{t}, \epsilon_{t}\sim \text{MVN}(\boldsymbol{0,I_p}), \\ h_{i,1}=\mu+\eta_{i,1}, \eta_{i,1}\sim N(0,\frac{\sigma_i^2}{1-\phi_i^2}), \\ h_{i,t}=\mu+\phi_i(h_{i,t-1}-\mu)+\eta_{i,t},\eta_{i,t} \sim N(0,\sigma_i^2), t=2,\cdots,n; i=1,\cdots,p, \\ \textbf{H}_{t}=diag(\exp(h_{1,t}), \dots,\exp(h_{p,t})),

where
\boldsymbol{L}_{t}= \begin{pmatrix} 1 & 0 & \cdots & 0\\ q_{21,t} & 1 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0 \\ q_{p1,t} & \cdots & q_{p,p-1,t} & 1 \end{pmatrix}

and q_{it} is defined as an AR(1) process:

q_{i,t}=\beta_{i,0}+\beta_{i,1}(q_{i,t-1}-\beta_{i,0})+\sigma_{i,\rho}\nu_{it}, \nu_{it} \sim N(0,1), i = 1,\cdots,p(p-1)/2


The stan code is as follows:

data {
  int<lower=1> T;            // Number of time points
  int<lower=0> p;
  matrix[T, p] y;            // Observations
  int<lower=1> q_dim;
}

parameters {
  vector[p] mu;              // Mean of log-volatility process
  vector<lower=0>[p] sigmasq;  // Volatility of log-volatility process
  vector<lower=0, upper=1>[p] phistar;  // Transformed persistence parameter
  matrix[T, p] h_std;        // Standardized log-volatility process
  vector<lower=0, upper=1>[q_dim] betastar; // Transformed AR(1) process persistence parameter
  vector<lower=0>[q_dim] sigmasq_rho; // Volatility of AR(1) process for q
  matrix[T, q_dim] nu_raw; // Non-centered innovations for q
}

transformed parameters {
  vector<lower=-1, upper=1>[p] phi;
  phi = 2 * phistar - 1;
  
  matrix[T, p] mu_sv;    
  matrix[T, p] h;
  for (j in 1:p) {
    mu_sv[1, j] = mu[j];
    h[,j]=h_std[,j]*sqrt(sigmasq[j]);
    h[1,j] /= sqrt(1 - phi[j]*phi[j]);
    h[1,j] += mu_sv[1, j];
    for (t in 2:T) {
      mu_sv[t, j] = mu[j] + phi[j]* (h[t - 1, j] - mu[j]);
      h[t,j] += mu_sv[t, j];
    }
  }

  vector<lower=-1, upper=1>[q_dim] beta1;
  beta1 = 2 * betastar - 1;

  matrix[q_dim, T] mu_ar;
  matrix[q_dim, T] q_ar; // Transformed lower triangular elements of L_t
    for (j in 1:(q_dim)) {
    mu_ar[j, 1] = 0;
    q_ar[j,]=to_row_vector(nu_raw[,j]*sqrt(sigmasq_rho[j]));
    q_ar[j,1] /= sqrt(1 - beta1[j]*beta1[j]);
    q_ar[j,1] += mu_ar[j, 1];
    for (t in 2:T) {
      mu_ar[j, t] = beta1[j]* (q_ar[j, t - 1]);
      q_ar[j,t] += mu_ar[j, t];
    }
  }

  array[T] matrix[p, p] L;  
  for (t in 1:T) {
    int idx = 1;
    for (i in 1:p) {
      for (j in 1:(i-1)) {
        L[t, i, j] = q_ar[idx, t];
        idx += 1;
      }
      L[t, i, i] = 1.0;
      for (j in (i+1):p) {
        L[t, i, j] = 0.0;
      }
    }
  }
}

model {
  // Priors
  mu ~ normal(0, 10);
  sigmasq ~ inv_gamma(1, 1);
  phistar ~ beta(1, 1);
  betastar ~ beta(1, 1);
  sigmasq_rho ~ inv_gamma(1, 1);
  
  // Non-centered parameterization priors
  to_vector(h_std) ~ std_normal();
  to_vector(nu_raw) ~ std_normal();
  
  // Likelihood
  for (t in 1:T) {
  vector[p] h_t_exp = exp(to_vector(h[t, ]/2));
  matrix[p, p] H_t = diag_matrix(h_t_exp);
  matrix[p, p] cov_mat = L[t]*H_t *H_t'*L[t]';
  target += multi_normal_lpdf(y[t] | rep_vector(0, p), cov_mat); // Use cov_mat
}
}


I’m having trouble recovering the parameters in q_ar (i.e, beta1, sigmasq_rho). I have tried a few different approaches and priors but none of them seem to help. I’m not sure what I’m doing wrong. Any help with this will be greatly appreciated. The simulation code is as follows:

library(cmdstanr)
set.seed(1)

# Simulation parameters:
m <- 1
T <- 5000
p <- 4 # dimension of the multivariate model
y <- array(0, c(m, T, p))
Sigma <- matrix(1, nrow = p, ncol = p)

require(MASS)

for (j in 1:m){
  mu=c(-9,-9.5,-8.5,-8)
  itau2=c(0.9,0.7,0.5,0.3)
  phi=c(0.7,0.5,0.3,0.9)
  mu_ar <- c(0,0,0,0,0,0) # setting this to 0 for convenience
  beta_ar <- c(0.45, 0.4, 0.15, 0.15, 0.15, 0.2)
  sigmasq_rho <- c(0.1, 0.1, 0.1, 0.15, 0.15, 0.2)
  h=matrix(0,T,p)
  sigma_y=array(0,c(T,p,p))
  h[1,]=rnorm(p,mean=mu,sd=sqrt(itau2/(1-phi^2)))
  q_ar <- matrix(0, T, p*(p-1)/2)
  q_ar[1,]=rnorm(p*(p-1)/2,mean=mu_ar,sd=sqrt(sigmasq_rho/(1-beta_ar^2)))
  cor_matrix <- diag(p)
  idx <- 1
  for (f in 2:p) {
    for (e in 1:(f-1)) {
      cor_matrix[f, e] <- q_ar[1, idx]
      idx <- idx + 1
    }
  }

  sigma_y[1,,]=cor_matrix%*%diag(exp(h[1,]/2))%*%t(diag(exp(h[1,]/2)))%*%t(cor_matrix)
  y[j,1,]=mvrnorm(n = 1, rep(0,p),sigma_y[1,,])
  for (i in 2:T){
    h[i,]=rnorm(p,mean=mu+phi*(h[i-1,]-mu),sd=sqrt(itau2))
    q_ar[i, ] <- rnorm(p*(p-1)/2, mean = mu_ar + beta_ar* (q_ar[i - 1, ]-mu_ar), 
                       sd = sqrt(sigmasq_rho)) 
    cor_matrix <- diag(p)
    idx <- 1
    for (f in 2:p) {
      for (e in 1:(f-1)) {
        cor_matrix[f, e] <- q_ar[i, idx]
        idx <- idx + 1
      }
    }
    sigma_y[i,,]=cor_matrix%*%diag(exp(h[i,]/2))%*%t(diag(exp(h[i,]/2)))%*%t(cor_matrix)
    y[j,i,]=mvrnorm(n = 1, rep(0,p),sigma_y[i,,] )
  }
}

#######################
# Stan estimation:
#######################

y <- data.frame(y1=y[,,1],y2=y[,,2],y3=y[,,3],y4=y[,,4])
y=as.matrix(y)

data_list <- list(
  T = T,
  p = p,
  y = y,
  q_dim = p*(p-1)/2
)

## HMC
fit=model$sample(data=data_list,seed=1,chains = 4,iter_sampling = 1000,parallel_chains = 4,threads_per_chain = 12,iter_warmup=1000,thin=4)
par_summary=fit$summary(variables = c("sigmasq[1]", "sigmasq[2]"
                                      ,"sigmasq[3]", "sigmasq[4]","phi[1]", "phi[2]","phi[3]","phi[4]"
                                      ,"mu[1]", "mu[2]", "mu[3]","mu[4]","beta1[1]", "beta1[2]","beta1[3]", "beta1[4]"
                                      ,"beta1[5]", "beta1[6]","sigmasq_rho[1]", "sigmasq_rho[2]","sigmasq_rho[3]", "sigmasq_rho[4]"
                                      ,"sigmasq_rho[5]", "sigmasq_rho[6]"))

summary=cbind(par_summary,fit$summary(c("sigmasq[1]", "sigmasq[2]"
                                        ,"sigmasq[3]", "sigmasq[4]","phi[1]", "phi[2]","phi[3]","phi[4]"
                                        ,"mu[1]", "mu[2]", "mu[3]","mu[4]","beta1[1]", "beta1[2]","beta1[3]", "beta1[4]"
                                        ,"beta1[5]", "beta1[6]","sigmasq_rho[1]", "sigmasq_rho[2]","sigmasq_rho[3]", "sigmasq_rho[4]"
                                        ,"sigmasq_rho[5]", "sigmasq_rho[6]"), quantile, .args = list(probs = c(0.025, .975)))[,c(2,3)])
summary

Thank you in advance.

Hi, @Sreeram and welcome to the Stan forums. I’m sorry we haven’t answered this sooner, but it’s generally really time-consuming to try to understand and help debug someone else’s model. Especially when it has as much code as yours.

As top-level advice, I’d suggest starting with smaller models and building up to something complex like this. That kind of scaffolded development makes it a lot easier to detect problems if it’s possible with your model.

I’m afraid I don’t have time to work through the details of the model specification. How can q_{i,t} be an AR(1) process—it’s two dimensional? Also, q seems to be taking three indexes, so what does q_{i,t} mean?

Is the function you write as diag creating a diagonal matrix from a vector? I wasn’t sure because there are two indexes. Is this meant to flatten the matrix h into a vector?

In the likelihood, it’s much more efficient and numerically stable to use

y[t] ~ multi_normal_cholesky(zeros, diag_post_multiply(L[t], h_t_exp));

where zeros is defined as rep_vector(0, p) in the transformed data block. This way, it’s only \mathcal{O}(p^2) per eval instead of \mathcal{O}(p^3), and it’s much more arithmetically stable to boot. Defining zeros once and reusing avoids vector allocation eery usage, which reduces memory pressure, which is a huge bottleneck for Stan code.

You can also save lines and make the code clearer by using a compound declaration and definition.

vector<lower=-1, upper=1>[q_dim] beta1 = 2 * betastar - 1;

There are also a lot of vectorization opportunities here.

P.S. It helps to keep the indentation in code consistent.

Thank you for your response. Forgive my notational issues. To clarify, we assume that each element “q” in the L_t matrix follows its own AR(1) process. It depends on stock “i” and time point “t”. Regarding h, we observe it as a vector and then convert it to a diagonal matrix. Thank you for your tips on improving the model I will be sure to try those out. Thank you again for your feedback.