Hierarchical model for multi_normal response w/per-group covariance

I’d like to fit a hierarchical model in stan, where the response is not univariate. However, I can only find examples
for when the response is univariate. Here’s a super simple example model for the univariate case, just to be clear about what I mean:

set.seed(42)
obs_per_group <- 50
data <- list(num_groups = 10)
data$num_obs <- data$num_groups*obs_per_group
data$Y <- rnorm(data$num_obs)
data$group_inds <- rep(1:data$num_groups,each=obs_per_group)
data$Y <- data$Y + rnorm(data$num_groups)[data$group_inds]

library('rstan')
compiled_model <- stan_model(model_code = 
"
data {
  int<lower=1> num_obs;
  int<lower=1> num_groups;
  vector[num_obs] Y;
  int<lower=1> group_inds[num_obs];
}
parameters {
  real mu;
  real sigma_offset;
  real sigma_obs;
  vector[num_groups] offset_per_group;
}
model {
  vector[num_obs] yhat;

  mu ~ normal(0,1); 
  sigma_offset ~ lognormal(0,1);
  offset_per_group ~ normal(0, sigma_offset);

  sigma_obs ~ lognormal(0,1);
  yhat = mu + offset_per_group[group_inds];
  Y ~ normal(yhat, sigma_obs);
}
")

Now, I’d like to fit a similar model, but where Y is a matrix with K columns, one for each response-per-row.

The tricky thing is that, not only do I want to model the population-level covariance/correlation between the responses-- I also want to capture any per-group differences from this average amount of covariance/correlation. I.e. this group is especially correlated between his Y[,1] and Y[,2] responses, this other group is not very correlated between his Y[,1] and Y[,2] responses, etc.

So in the univariate case, including per-group offsets is as easy as add mu + offset_per_group. But I’m not sure how this would work in the case of correlation or covariance matrices? How do I generate the per-group covariance/correlation matrices?

This case is slightly simpler than the discussion of multi-output GPs at the end of chapter 18 of the 2.17 Stan User’s Manual. Basically, you still have multiple outcomes but dependence within groups of observations instead of the whole dataset.

I’ll admit I found that section hard to follow, and I’m not entirely seeing the connection. In that section, the dependencies across the whole dataset are captured with a multivariate normal whose covariance matrix has entries for each row in the dataset. In my case, would the idea be to have a separate covariance matrix for each group, with entries for each row within each group? That seems like overkill here, right? I don’t care about dependencies across rows (except insofar as there’s a common group mean), I care about dependencies across columns.

And even then, I’m not seeing how I could capture the “population parameter / group offsets from that parameter” aspect for the correlation across response-dimensions. That is, I don’t want the estimation of the correlation matrices for each group to be done entirely separately, I want their estimates to be partially pooled towards some population level estimate.

Sorry if I’m missing something completely obvious here.

Rather than a scalar offset, you get a vector offset. The typical thing to do would be to give the vectors for the groups a multivariate hierarchical prior:

vector[K] offsets_per_group[num_groups];
...
offsets_per_group ~ multi_normal_cholesky(mu, L_Sigma);

There’s an example in the manual using the LKJ correlation prior with a Cholesky factor that shows the best way to code it.

I don’t think there are any per-group covariance/correlation matrices involved. It’s at the level of prior.

The model you’ve described would capture the fact that the responses are correlated overall. What I’m hoping for, though, is a model that captures the fact that the responses can be more correlated within some groups, and less correlated in others. Does this make sense?

(And forgot to say: thanks to both of you for the replies!)

Yes, but I’m not sure what kind of hierarchical prior to give the covariance matrix in that case. You could have no pooling if there’s enough data. Or someone else may have a suggestion.

Right, that was where I got stuck as well-- it’s not clear what kind of ‘offset’ could be used that would keep (e.g.) a positive-definite constraint.

I think Wisharts sort of have the right properties, being conjugate priors. They’re not very efficient in Stan, though, as we don’t have Cholesky-based versions.

Would it make sense to place the offsets on the elements of the cholesky-factorized-matrix itself? I know the diagonal needs to be positive, but I wasn’t sure what the constraints on the off-diagonal elements are; or if there’s some other reason this would be a bad idea.

For covariance matrices, there are no constraints on the off-diagonals. For correlation matrices, the rows have to have L2 norms of 1.

The issue is what it means it to put priors on the components. The LKJ priors break that down into partial correaltions so the whole thing fits back together to give a reasonable prior on correlation matrices.

Bumping this up again, since I’m trying to build a similar model. @jwdink , did you ever find a solution to this?

I did not, sorry.