I’m trying to fit a fairly straightforward hierarchical model. For clarity, I will provide a “toy” example with simulated data, since my actual use case is part of a much larger model.

I make observations of 3 variables; each observation comes from one of two groups, and the group labels are known. For samples in group 1, the predictors are correlated, so I model them as a multivariate Gaussian distribution. In group 2, the predictors are independent and are modeled as 3 independent Gaussians.

I run into the following problem: the posterior for the covariance matrix strongly suggests that there is no covariance between the predictors in group 1, even though the data were simulated with strong correlations. My R snippet for the simulated data and my Stan code are below.

```r
set.seed(8)
N_half <- 30
N <- N_half * 2
M <- 3
# group indicators (1 = in the group, 0 = not):
g1_idx <- c(rep(1, N_half), rep(0, N_half))
g2_idx <- 1 - g1_idx
mu1 <- 1:3
Sigma1 <- rbind(
  c(1.0, 0.9, 0.8),
  c(0.9, 1.0, 0.7),
  c(0.8, 0.7, 1.0)
)
y1 <- MASS::mvrnorm(n = N_half, mu = mu1, Sigma = Sigma1)
mu2 <- 1:3
y2 <- MASS::mvrnorm(n = N_half, mu = mu2, Sigma = diag(1, nrow = M))
y <- rbind(y1, y2)
```
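As a sanity check on the simulation (not part of the model), the empirical correlations of the group 1 draws should roughly match `Sigma1`, which here is also the correlation matrix since all marginal variances are 1. The snippet below is self-contained and just repeats the relevant lines from above:

```r
# Sanity check: empirical correlations of the group 1 draws should
# roughly match Sigma1 (with n = 30, expect deviations of ~0.1).
set.seed(8)
Sigma1 <- rbind(
  c(1.0, 0.9, 0.8),
  c(0.9, 1.0, 0.7),
  c(0.8, 0.7, 1.0)
)
y1 <- MASS::mvrnorm(n = 30, mu = 1:3, Sigma = Sigma1)
round(cor(y1), 2)
```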

```stan
data {
  int<lower=1> N;                  // number of samples
  int<lower=1> M;                  // number of predictors
  int<lower=0, upper=1> g1_idx[N]; // group 1 indicator
  int<lower=0, upper=1> g2_idx[N]; // group 2 indicator
  vector[M] y[N];                  // response
}
parameters {
  // group 1 params:
  vector[M] mu_g1;
  cholesky_factor_corr[M] L_Omega;
  vector<lower=0>[M] L_sigma;      // scales must be constrained to be non-negative
  vector[M] g1;                    // group 1 effect
  // group 2 params:
  vector[M] mu_g2;
  vector<lower=0>[M] sigma_g2;
  vector[M] g2;                    // group 2 effect
  real<lower=0> sigma;             // response sigma
}
model {
  // group 1: correlated predictors within the group
  matrix[M, M] L_Sigma;
  L_Omega ~ lkj_corr_cholesky(1);
  L_sigma ~ cauchy(0, 2.5);
  L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
  mu_g1 ~ normal(0, 5);
  g1 ~ multi_normal_cholesky(mu_g1, L_Sigma);
  // group 2: uncorrelated predictors
  mu_g2 ~ normal(0, 5);
  sigma_g2 ~ normal(0, 5);
  g2 ~ normal(mu_g2, sigma_g2);
  // likelihood:
  for (n in 1:N) {
    if (g1_idx[n] == 1)
      y[n] ~ normal(g1, sigma);
    else
      y[n] ~ normal(g2, sigma);
  }
}
generated quantities {
  // recover the correlation matrix from its Cholesky factor
  corr_matrix[M] Omega = multiply_lower_tri_self_transpose(L_Omega);
}
```
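For reference, this is the shape of the data list I pass to the sampler. The snippet is self-contained (it repeats the simulation above), and the commented-out `rstan::stan` call with the file name `model.stan` is just a placeholder for however the program is compiled and run:

```r
# Build the data list matching the Stan data block above
# ("model.stan" is a placeholder file name).
set.seed(8)
N_half <- 30
N <- N_half * 2
M <- 3
g1_idx <- c(rep(1, N_half), rep(0, N_half))
g2_idx <- 1 - g1_idx
Sigma1 <- rbind(
  c(1.0, 0.9, 0.8),
  c(0.9, 1.0, 0.7),
  c(0.8, 0.7, 1.0)
)
y <- rbind(
  MASS::mvrnorm(n = N_half, mu = 1:3, Sigma = Sigma1),
  MASS::mvrnorm(n = N_half, mu = 1:3, Sigma = diag(1, nrow = M))
)
stan_data <- list(N = N, M = M, g1_idx = g1_idx, g2_idx = g2_idx, y = y)
# fit <- rstan::stan(file = "model.stan", data = stan_data, chains = 4, seed = 8)
```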

I also have a couple of implementation questions about the model above:

- Instead of declaring and assigning to a local variable `L_Sigma` in the model block, can I define it in the transformed parameters block as shown below? Should I expect any performance differences between the two implementations?

```stan
...
transformed parameters {
  matrix[M, M] L_Sigma = diag_pre_multiply(L_sigma, L_Omega);
}
model {
  L_Omega ~ lkj_corr_cholesky(1);
  L_sigma ~ cauchy(0, 2.5);
  mu_g1 ~ normal(0, 5);
  g1 ~ multi_normal_cholesky(mu_g1, L_Sigma);
  ...
}
```

- Originally, I implemented the likelihood as shown below, without the branch. Are there performance advantages to this approach, and which implementation is preferred?

```stan
model {
  ...
  // likelihood:
  for (n in 1:N)
    y[n] ~ normal(g1_idx[n] * g1 + g2_idx[n] * g2, sigma);
}
```

Thanks for your help!