Is covariance structure necessary in multi-level models?

I am trying to implement a hierarchical multinomial logistic model. The data are behavioural choices in a visual discrimination task, spread over several subjects and several sessions per subject. In my model I hope to incorporate global parameters as well as per-subject and per-session deviations from these global parameters.

I’ve been following a tutorial cited on the Stan website. The tutorial distinguishes two kinds of model:

  1. Varying Intercepts, Varying Slopes Mixed Effects Model
  2. Correlated Varying Intercepts, Varying Slopes Mixed Effects Model

In other texts/lectures (e.g. Richard McElreath’s), multi-level models always seem to incorporate a covariance structure. I’ve never come across an example of a varying intercepts and varying slopes model where you do NOT have covariance structure.

Therefore my question is: how necessary is the covariance structure? Would it be misleading to model varying intercepts and varying slopes without it?

Thank you

If there is “no” covariance structure, then you are assuming the variation in the slopes and/or intercepts is independent across groups. That does not generally make substantive sense, but many people do it anyway, either as a first step toward a more complex model that relaxes the strong independence assumptions, or as the last step because they do not know how to specify dependence structures.
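To make the two assumptions concrete, here is a minimal sketch in plain Python, on one reading of the above: the per-group intercept and slope deviations are either drawn independently of each other or drawn jointly with a correlation. The names `sd_a`, `sd_b`, `rho`, and `draw_group_effects` are made up for illustration, and the numbers are arbitrary.

```python
import random

random.seed(0)

def pearson(u, v):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

n_groups = 5000        # hypothetical number of groups
sd_a, sd_b = 1.0, 0.5  # hypothetical sds of intercept and slope deviations
rho = -0.6             # hypothetical intercept/slope correlation

def draw_group_effects(r):
    """Draw (intercept, slope) deviations for every group with correlation r."""
    a, b = [], []
    for _ in range(n_groups):
        z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
        a.append(sd_a * z1)
        # 2x2 Cholesky factor written out by hand; r = 0 gives independent draws
        b.append(sd_b * (r * z1 + (1 - r ** 2) ** 0.5 * z2))
    return a, b

corr_independent = pearson(*draw_group_effects(0.0))  # "no covariance structure"
corr_correlated = pearson(*draw_group_effects(rho))   # correlated varying effects

print(corr_independent, corr_correlated)
```

In the first case the model says that knowing a group's intercept tells you nothing about its slope; in the second, `rho` is an extra parameter you would estimate.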

Thank you for clearing that up! I realise now that the basis for my confusion about whether it’s really necessary was because I confused covariation in intercept and slope across groups, with covariation in the posterior for those same parameters but without any explicit model of that covariation. I realise now that the covariation of intercept/slope across groups is referring to the posterior modes for each member of that group.

It refers to the joint posterior distribution of the group-level parameters, not merely the mode. The posterior mode is not relevant for much.

Even if there is no covariance structure, the different groups do share common level error residuals, which makes it better than the no-pooling case, where independent models are fit to each group separately.

There are so many options for covariance structures that I also get confused about how to justify using one over another. A better fit is not in itself a sound justification, as it may just reflect level-2 variables being correlated.

Now I’m confused; I think I have some fundamental misunderstanding going on here.

Let’s say we have a simple non-hierarchical model, y ~ normal(intercept + slope*x, sigma).
Even if your priors are independent (i.e. intercept ~ normal(0, 5) and slope ~ normal(0, 5)), the joint posterior distribution p(intercept, slope | data) can still have correlation within it (inherited from the likelihood function). So in this case the posterior seems to capture the parameter covariance despite the priors not containing any covariance structure.
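A quick check of that claim, as a minimal sketch in plain Python. It assumes sigma is known (my simplification), so the posterior for (intercept, slope) is Gaussian with precision equal to the prior precision plus X'X / sigma^2, and its covariance does not depend on y at all. The function name `posterior_correlation` is made up for illustration.

```python
import random

random.seed(0)

n = 50
sigma = 1.0     # residual sd, treated as known (an assumption of this sketch)
prior_sd = 5.0  # intercept ~ normal(0, 5), slope ~ normal(0, 5), independent

def posterior_correlation(x):
    """Posterior correlation of (intercept, slope) given predictor values x."""
    # Posterior precision = diagonal prior precision + X'X / sigma^2, X = [1, x]
    p00 = 1.0 / prior_sd ** 2 + len(x) / sigma ** 2
    p11 = 1.0 / prior_sd ** 2 + sum(xi * xi for xi in x) / sigma ** 2
    p01 = sum(x) / sigma ** 2
    # Inverting the 2x2 precision flips the sign of the off-diagonal term,
    # so the correlation is -p01 / sqrt(p00 * p11).
    return -p01 / (p00 * p11) ** 0.5

x = [random.uniform(0.0, 10.0) for _ in range(n)]  # predictor with non-zero mean
x_mean = sum(x) / n
x_centered = [xi - x_mean for xi in x]

corr_raw = posterior_correlation(x)                 # strongly negative
corr_centered = posterior_correlation(x_centered)   # essentially zero

print(corr_raw, corr_centered)
```

So the correlation here comes entirely from the design (the mean of x), not from the priors, and centering x removes it.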

Maybe I’m misunderstanding something here, but isn’t the multi-level case similar? All parameters can have independent priors, but the joint posterior can contain all the covariance you need? But apparently now we have to add group-level covariance explicitly, and estimate the correlation as an extra parameter. Perhaps I’m conflating two different kinds of covariance here (1: covariance in the posterior distribution of two parameters, 2: covariance across members of a group)? I would appreciate it if anyone could shed some light on this, as my thinking is evidently confused.

Thank you very much!

The posterior dependence reflects both the (in)dependence in the prior on the group-specific parameters and the data. If you use independent priors, then you nudge the posterior dependence toward zero. This may not be noticeable if you have many groups and thus the posterior distribution of the group-specific parameters and the hyperparameters is almost entirely determined by the data. But often in hierarchical models, there is not a lot of information in the data on every group-specific parameter, so the hyperpriors can make a pretty big difference.
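One crude way to see this effect is the following sketch in plain Python. It is heavily simplified: the hyperparameters are held fixed rather than estimated, sigma is known, and each group has only three observations, so this is a conjugate calculation and not a full hierarchical fit. All names and numbers are invented for illustration. It compares how strongly the per-group posterior means of intercept and slope correlate across groups under an independent prior versus a prior with the true correlation.

```python
import random

random.seed(1)

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pearson(u, v):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5

n_groups = 2000
x = [-1.0, 0.0, 1.0]  # only three observations per group
sigma = 1.0           # known residual sd (assumption)
rho_true = 0.8        # true intercept/slope correlation across groups

# X'X / sigma^2 for the design matrix X = [1, x]
xtx = [[len(x) / sigma ** 2, sum(x) / sigma ** 2],
       [sum(x) / sigma ** 2, sum(xi * xi for xi in x) / sigma ** 2]]

def posterior_mean(prior_cov, y):
    """Posterior mean of (intercept, slope) for one group, zero prior mean."""
    prior_prec = inv2(prior_cov)
    post_prec = [[prior_prec[i][j] + xtx[i][j] for j in range(2)]
                 for i in range(2)]
    post_cov = inv2(post_prec)
    xty = [sum(y) / sigma ** 2,
           sum(xi * yi for xi, yi in zip(x, y)) / sigma ** 2]
    return [post_cov[i][0] * xty[0] + post_cov[i][1] * xty[1] for i in range(2)]

independent_prior = [[1.0, 0.0], [0.0, 1.0]]
correlated_prior = [[1.0, rho_true], [rho_true, 1.0]]

means_indep, means_corr = [], []
for _ in range(n_groups):
    # True group parameters, correlated with rho_true
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    a = z1
    b = rho_true * z1 + (1 - rho_true ** 2) ** 0.5 * z2
    y = [a + b * xi + random.gauss(0, sigma) for xi in x]
    means_indep.append(posterior_mean(independent_prior, y))
    means_corr.append(posterior_mean(correlated_prior, y))

corr_indep = pearson([m[0] for m in means_indep], [m[1] for m in means_indep])
corr_corr = pearson([m[0] for m in means_corr], [m[1] for m in means_corr])

print(corr_indep, corr_corr)
```

With this little data per group, the independent prior leaves the across-group correlation of the estimates well below both the true value and what the correlated prior gives. Giving each group more observations should shrink that gap, since the data then dominate the prior.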
