Multivariate hierarchical outcomes correlated within a group

Tags: techniques, performance

#1

I’m trying to fit a fairly straightforward hierarchical model. For clarity, I’ll use a “toy” example with simulated data, since my actual use case is part of a much larger model.

I observe 3 variables; each observation comes from one of two groups, and the group labels are known. For samples in group 1, the predictors are correlated, so I model them with a multivariate Gaussian distribution. In group 2, the predictors are independent and are modeled as 3 independent Gaussians.

I run into the following problem: the posterior for the covariance matrix strongly suggests there is no covariance between the predictors in group 1. My R code for simulating data and my Stan model are below.

set.seed(8)
N_half <- 30
N <- N_half * 2
M <- 3

# group indexes:
g1_idx <- c(rep_len(1, N_half), rep_len(0, N_half))
g2_idx <- 1 - g1_idx

mu1 <- 1:3
Sigma1 <- rbind(
  c(1, 0.9, 0.8),
  c(0.9, 1, 0.7),
  c(0.8, 0.7, 1)
)
y1 <- MASS::mvrnorm(n = N_half, mu = mu1, Sigma = Sigma1)

mu2 <- 1:3
y2 <- MASS::mvrnorm(n = N_half, mu = mu2, Sigma = diag(1, nrow = 3))

y <- rbind(y1, y2)
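To pass this to Stan, the simulated objects can be bundled into a list whose names match the data block below (the file name `model.stan` is just a placeholder):

```r
# data list for Stan; names must match the data block of the model
stan_data <- list(
  N = N,
  M = M,
  g1_idx = g1_idx,
  g2_idx = g2_idx,
  y = y
)
# fit <- rstan::stan("model.stan", data = stan_data)
```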
data {
  int<lower=1> N; // number of samples
  int<lower=1> M; // number of predictors
  int<lower=0, upper=1> g1_idx[N]; // group 1 indicator
  int<lower=0, upper=1> g2_idx[N]; // group 2 indicator
  vector[M] y[N]; // response
}
parameters {
  // group 1 params:
  vector[M] mu_g1;
  cholesky_factor_corr[M] L_Omega;
  vector<lower=0>[M] L_sigma; // scales must be constrained positive
  vector[M] g1; // group 1 effect

  // group 2 params:
  vector[M] mu_g2;
  vector<lower=0>[M] sigma_g2;
  vector[M] g2; // group 2 effect

  real<lower=0> sigma; // response sigma
}
model {
  // group 1: correlated predictors within the group
  matrix[M, M] L_Sigma;
  L_Omega ~ lkj_corr_cholesky(1);
  L_sigma ~ cauchy(0, 2.5);
  L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  mu_g1 ~ normal(0, 5);
  g1 ~ multi_normal_cholesky(mu_g1, L_Sigma);

  // group 2: uncorrelated predictors
  mu_g2 ~ normal(0, 5);
  sigma_g2 ~ normal(0, 5);
  g2 ~ normal(mu_g2, sigma_g2); 

  // likelihood: 
  for (n in 1:N)
    if (g1_idx[n] == 1)
      y[n] ~ normal(g1, sigma);
    else
      y[n] ~ normal(g2, sigma);
}
generated quantities {
  // recover correlation matrix
  corr_matrix[M] Omega = multiply_lower_tri_self_transpose(L_Omega);
}

I also have a couple of implementation questions about the model above:

  1. Instead of declaring and assigning a local variable L_Sigma in the model block, can I define it in the transformed parameters block as shown below? Should I expect any performance difference between the two implementations?
...
transformed parameters {
  matrix[M, M] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
}
model {
  L_Omega ~ lkj_corr_cholesky(1);
  L_sigma ~ cauchy(0, 2.5);
  mu_g1 ~ normal(0, 5);
  g1 ~ multi_normal_cholesky(mu_g1, L_Sigma);

  ...
}
  2. Originally, I implemented the likelihood as follows. Are there performance advantages to either version, and which implementation is preferred?
model {
  ...

  // likelihood:
  for (n in 1:N)
    y[n] ~ normal(g1_idx[n] * g1 + g2_idx[n] * g2, sigma);
}

Thanks for your help!


#2

The first thing that sticks out to me here is that your model is more complicated than your data-generating process.

You’re drawing from two different multivariate normals in your test data, but the model has all sorts of latent parameters and such.

There could be a number of things happening here, but the place to start, if you want to evaluate whether a model is fittable, is to generate data from the exact model you are fitting. Doing both the data generation and the fit in Stan can cut down on accidental bugs from writing the same model in two different languages.
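For example (a sketch, not a drop-in replacement), you can simulate data inside Stan itself by putting the generative model in generated quantities and running with `algorithm = "fixed_param"`:

```stan
// simulate correlated data with known parameters supplied as data;
// run with a fixed_param sampler, e.g. rstan::stan(..., algorithm = "fixed_param")
data {
  int<lower=1> N;
  int<lower=1> M;
  vector[M] mu;
  cholesky_factor_cov[M] L_Sigma; // true Cholesky factor of the covariance
}
generated quantities {
  vector[M] y_sim[N];
  for (n in 1:N)
    y_sim[n] = multi_normal_cholesky_rng(mu, L_Sigma);
}
```

Fitting your model to `y_sim` then tells you whether the model can recover parameters it is guaranteed to match.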

I’d actually start with your simpler just-a-multivariate model (like the one you have in R), get that exact thing working in Stan, and build up from there.
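Concretely, a minimal version of that (just the multivariate piece, nothing else) might look like:

```stan
// fit a single multivariate normal to y and recover its correlation matrix
data {
  int<lower=1> N;
  int<lower=1> M;
  vector[M] y[N];
}
parameters {
  vector[M] mu;
  cholesky_factor_corr[M] L_Omega;
  vector<lower=0>[M] L_sigma;
}
model {
  L_Omega ~ lkj_corr_cholesky(1);
  L_sigma ~ cauchy(0, 2.5);
  mu ~ normal(0, 5);
  y ~ multi_normal_cholesky(mu, diag_pre_multiply(L_sigma, L_Omega));
}
generated quantities {
  corr_matrix[M] Omega = multiply_lower_tri_self_transpose(L_Omega);
}
```

If this recovers your Sigma1 from the simulated y1, you know the multivariate machinery is fine and the problem is in how the groups are wired together.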

Also, 30 data points doesn’t seem like a lot? I dunno. Maybe it’s enough. Hope that helps! Good luck!
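One more thing on the 30-points question: you can get a feel for how noisy the sample is by looking at the empirical correlation of your own simulated draws (reusing the simulation code from your post):

```r
# how far do 30 draws wander from the true Sigma1?
set.seed(8)
Sigma1 <- rbind(
  c(1, 0.9, 0.8),
  c(0.9, 1, 0.7),
  c(0.8, 0.7, 1)
)
y1 <- MASS::mvrnorm(n = 30, mu = 1:3, Sigma = Sigma1)
round(cor(y1), 2)
```

If the sample correlations already sit close to Sigma1, the posterior pulling them toward zero points at the model rather than the data size.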