Trouble recreating SUG MVN prior example

Hiya folks. I’ve been trying to better understand the use of the MVN distribution for priors in a hierarchical model. Ultimately, I’m hoping to develop models with MVN priors on regression coefficients and covariance priors informed by group similarity. For now, however, I am starting with trying whats in the SUG here. I first tried to modify the example code to get rid of the group-level predictors because I don’t often deal with those and wanted to keep it simple. I simulated some code form that model and tried running it, only to face fine convergence on reasonable estimates but many divergent transitions. Then, I decided to try to recreate exactly whats in the SUG (including the group-level predictors), but similarly ran into many (180ish) divergent transitions. Here is the Stan code (copied from the above link) that I am working with:

data {
  int<lower=0> N;              // num individuals
  int<lower=1> K;              // num ind predictors
  int<lower=1> J;              // num groups
  int<lower=1> L;              // num group predictors
  array[N] int<lower=1, upper=J> jj;  // group for individual
  matrix[N, K] x;              // individual predictors
  array[J] row_vector[L] u;    // group predictors
  vector[N] y;                 // outcomes
parameters {
  corr_matrix[K] Omega;        // prior correlation
  vector<lower=0>[K] tau;      // prior scale
  matrix[L, K] gamma;          // group coeffs
  array[J] vector[K] beta;     // indiv coeffs by group
  real<lower=0> sigma;         // prediction error scale
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  to_vector(gamma) ~ normal(0, 5);
    array[J] row_vector[K] u_gamma;
    for (j in 1:J) {
      u_gamma[j] = u[j] * gamma;
    beta ~ multi_normal(u_gamma, quad_form_diag(Omega, tau));
  for (n in 1:N) {
    y[n] ~ normal(x[n] * beta[jj[n]], sigma);

Here is the code that I wrote to try to simulate an appropriate dataset in R:


# data
N = 600
K = 3
J = 3
L = 3
jj = rep(1:J, each = N/J)
x = matrix(data = rnorm(N*K), nrow = N, ncol = K)
u = matrix(data = rnorm(J*L), nrow = J, ncol = L)

# parameters
Omega <- rlkjcorr(1, K, 2)
tau <- rhalfcauchy(K, 0.25)
Sigma <- diag(tau) %*% Omega %*% diag(tau)
gamma <- matrix(data = rnorm(L*K, 0, 5), nrow = L, ncol = K)
sigma <- 1

beta <- matrix(NA, nrow = J, ncol = K)
for (j in 1:J) {
  u_gamma <- u[j, ] %*% gamma
  beta[j, ] <- mvrnorm(1, mu = u_gamma, Sigma = Sigma)

y <- numeric(N)
for (n in 1:N) {
  y[n] <- rnorm(1, mean = sum(x[n, ] * beta[jj[n], ]), sd = sigma)

stan_data <- list(
  N = N,
  K = K,
  J = J,
  L = L,
  jj = jj,
  x = x,
  u = u,
  y = y

test <- stan(file = "~/Downloads/MVN_from_guide.stan",
             data = stan_data,
             chains = 2, cores = 2)

I suspect that the issue has to do with my simulation as I am pretty new to working with MVN distributions. Any help in identifying why I might be struggling to fit the model in the SUG would be greatly appreciated!

Without looking at too much else of the code, often when the chains converge but there are divergences it has to do with suboptimal posterior geometry. Especially with random effects, the noncentered parameterisation often works better. For univariate normals it looks like this:

parameters {
  real<lower=0> tau;  // scale of random effects
  vector[N] z;  // standard-normal z-scores
transformed parameters {
  vector[N] mu = tau * z;  // equivalent to mu ~ normal(0, tau)

For multivariate normals, it’s pretty well the same thing except now you use the Cholesky factor of the covariance matrix, which is the multivariate analog of the standard deviation to the variance. The Cholesky factor of the covariance matrix can be obtained as follows:

parameters {
  vector<lower=0>[D] tau;  // scales of D dimensions
  cholesky_factor_corr[D] Omega_L;  // lower triangular Cholesky factor of correlation matrix
  matrix[D, N] z;  // standard-normal z-scores
transformed parameters {
  matrix[D, N] mu = diag_pre_multiply(tau, Omega_L) * z;  // equivalent to mu ~ multi_normal(0, Sigma)

Maybe trying this would help the sampler?

1 Like

This is a helpful suggestion, but implementing still landed me with a bunch of divergent transitions. Because I seem to be running into the same problem whether I use the SUG’s more complicated model or my reduced version, I’ll share my reduced version here. The main difference being that I took out the group-level predictors and simplified the problem down to just one individual-level predictor variable. So what I’m shooting for is something like:

y \sim Normal(\beta_0 + x\beta_j, \sigma) \\ \beta \sim MVN(\mu,\Sigma)

where \beta_j is the effect of x on y in for group j, and these effects are MVN with mean group effects \mu and covariance matrix \Sigma. I tried to implement your suggested uncentered method, modifying your code such that there was just group level deviances (ie z is a vector with length equal to the number of groups) rather than what looks like individual-level deviances (I think that is why your z is a matrix?). I also tried executing this method with the means of the MVN equal to zero as in your code as well as with the means coming from their own distribution, and both resulted in similar convergence on decent estimates but still with many divergences. Here is the code that I landed on that is still resulting in divergences:

data {
  int<lower=0> N; // number of samples
  int<lower=0> J; // number of groups
  vector[N] y; // response variable
  vector[N] x; // predictor variable
  int j[N]; // group identity

parameters {
  real b; // intercept
  vector<lower=0>[J] tau; // scale parameter for MVN of coefficients
  vector[J] mu; // mean coefficient values for each group
  cholesky_factor_corr[J] Omega_L;  // lower triangular Cholesky factor of correlation matrix
  vector[J] z;  // standard-normal z-scores
  real sigma; // variance in response variable

transformed parameters{
  vector[J] beta; // group coefficients
  beta = mu + diag_pre_multiply(tau,Omega_L) * z;

model {
  b ~ normal(0, 1);
  z ~ normal(0, 1);
  tau ~ exponential(1);
  Omega_L ~ lkj_corr_cholesky(1);
  sigma ~ exponential(1);
  mu ~ normal(0, 1);
  for(i in 1:N){
      y[i] ~ normal(b + x[i]*beta[j[i]], sigma);

and here is the R code I used to simulate the data for this model:

N = 1000
J = 10
j = rep(1:J, each = N/J)
x = rnorm(N)
sigma = .2
eta <- 1
lam <- 2

mu_mu = .5
sig_mu = .2

Omega <- rlkjcorr(1, J, eta)
tau <- rexp(J, lam)

Sigma <- round(diag(tau, J, J) %*% Omega %*% diag(tau, J, J), 10)

b = 0
mu = rnorm(J, mu_mu, sig_mu)
beta = rmvn(1, mu, Sigma)

y = rep(NA, N)
for(i in 1:N){
  y[i] = rnorm(1, b + x[i] * beta[j[i]], sigma)

stan_dat <- list(N = N, J = J, y = y, x = x, j = j)

Still stumped as to why I’m getting divergences. Your recommendations have been helpful and insightful (and certainly let me know if I made an error trying to implement these recommendations)! I’ll keep digging, but anyone’s help would be more than appreciated in trying to get this running smoothly.

Oh wait, can you try restricting sigma to be positive?

Ah good catch! Though that didn’t seem to resolve the issue. <lower=0> on sigma still results in divergent transitions.