Hi everyone, I am using a hierarchical model for a meta analysis, with a model very similar to the original Rubin’s Randome Effect model. Some observations that we have in our dataset come from the same RCT, either different treatment arms or different timings of measurement and so I would like to put a correlation structure, while maintaining independece across different RCTs.
I have an issue in the way I declare the block covariance matrix, because Stan does not allow dynamic arrays and groups are of different sizes. I tried some work-arounds by declaring matrices of excessive sizes and then only using the dimension I actually need, but the model does not perform well and I suspect it is because of my cave man programming and approach.
Does anyone have a good fix for it? I am sure the problem is not new, having within group block covariance matrices.
data {
int<lower=0> N; // num observations
int<lower=0> J; // num RCTs (groups)
int<lower=1, upper=J> group[N]; // group indicator (which RCT each observation belongs to)
vector[N] y; // outcome
vector[N] se; // vector of standard error estimates
int<lower=0> group_size[J]; // number of observations per group (size of each RCT)
}
parameters {
real t; // common mean for the theta's
real<lower=0> sigma_theta; // standard deviation of group means
// Latent variables for within-group correlation for groups with multiple observations
vector[N] z; // uncorrelated latent variables for observations
cholesky_factor_corr[max(group_size)] L_Omega[J]; // Cholesky factor of correlation matrix for each group
}
transformed parameters {
vector[N] theta; // Observation-level theta with group correlations
// Apply index trick to theta[j] directly for groups with multiple vs single observations
for (n in 1:N) {
theta[n] = t + z[n] * sigma_theta;
}
}
model {
// Priors
t ~ normal(0, 5); // Prior for the common mean
sigma_theta ~ normal(0, 5); // Prior for the group-level standard deviation
for (j in 1:J) {
L_Omega[j] ~ lkj_corr_cholesky(1); // Prior for the Cholesky factor of each group
}
// Correlation for groups with multiple observations
for (j in 1:J) {
if (group_size[j] > 1) {
vector[group_size[j]] z_group;
int start_idx = sum(head(group_size, j - 1)) + 1;
z_group = z[start_idx:(start_idx + group_size[j] - 1)];
// Extract submatrix from L_Omega to match group size
matrix[group_size[j], group_size[j]] L_sub = L_Omega[j][1:group_size[j], 1:group_size[j]];
z_group ~ multi_normal_cholesky(rep_vector(0, group_size[j]), L_sub); // Correlated deviations within groups
} else {
z[j] ~ normal(0, 1); // Single observation groups get standard normal deviations
}
}
// Likelihood for observations
for (n in 1:N) {
y[n] ~ normal(theta[n], se[n]);
}
}
Any suggestions? (secretly hoping @betanalpha has an answer)
Yes, that’s not going to work because of the complicated constraint structure of covariance matrices.
If you really don’t know the number of groups ahead of time and you really want covariance matrices per group and they’re of different sizes, then the only recourse of which I’m aware is to serialize the coefficients and define your own covariance matrix transforms. The ragged data structure chapter of the User’s Guide shows how to do that.
The transform for the Cholesky factors isn’t bad for covariance because it’s just a log transform of the diagonal. But if you want to make it a Cholesky factor for a correlation matrix, that is going to be painful as the construction is much more involved to ensure a unit diagonal in the correlation matrix. The math’s in the constraint transforms chapter of the Reference Manual.
We plan to add ragged arrays at some point which will make this easier, but I’m afraid we don’t have it yet.
As an alternative to adding correlation matrices for groups, which are a kind of fixed effect, you can also model correlation by adding group-level random effects.
Hi @Bob_Carpenter, thank you very much for your reply. I am not sure if I completely understood what you said in the second paragraph.
I do know the number of groups and how many observations I have per group, but I have different outcomes and it feels a bit inefficient to code a model ad hoc for a specific dataset. But I guess what you are saying is that I could manually code all y_1,..., y_G as separate inputs and then assign distributions this way, right?
The other alternative, using the ragged data structure is to have one big covariance matrix and then slice it when I assign group level distributions. But then how do I ensure that the off block diagonal elements are all zeros?
I am intrigued by the group-level random effect suggestion. You are thinking of something like \theta_{n,g} = t + \gamma\eta_g + \epsilon_{n}, so that cov(\theta_{n,g}, \theta_{m,g}) = \gamma var(\eta_g) and then \gamma determines the sign of the intra-group correlation? It could also be \gamma_g.
Try using a list structure for your covariance matrices instead of fixed sizes. This lets you define each group’s covariance dynamically. Also, consider using array types in your parameters block for more flexibility.
Hi Aida, thanks a lot for the suggestion. Do you have some reference code I could look at, as an example of what you are saying? A simple example will do.
Yes. Not pretty, but neither is packing things into an array and doing your own transforms. This is something where we really need ragged arrays.
Not quite. The other alternative I was sketching is to declare a bunch of unconstrained parameters the total size of the Cholesky factors of all your matrices (each has n choose 2 parameters if it’s n x n). Then you need to manually slice out enough parameters per factor that you can transform them into an appropriate Cholesky factor for correlation (or covariance) matrices. That’s not so bad for covariance matrices, but it’s challenging for correlation matrices. That’s why I was suggesting looking at the ragged array chapter of the Stan User’s Guide:
Even then, you’re going to have to use something like a function that doesn’t have fixed sizes to process and then it’s tricky to assign other than to use directly. I wouldn’t recommend this approach other than as a last resort.
That’d work in R or Python, but I’m afraid we don’t have a list data structure in Stan. We have tuples, but those require size declarations and type declarations up front.