A question about hierarchical models in cognitive science

Dear experts in Bayesian modeling,

Hierarchical model fitting is becoming more and more popular in cognitive modeling. Psychological and behavioural neuroscience studies are often limited in the number of trials per participant, and hierarchical models can improve the accuracy of model fitting through the shrinkage effect. However, I have a problem when studying individual differences. I think it’s not wise to use these individual parameters in a subsequent regression or correlation analysis (e.g., using individual parameter values to predict questionnaire data), right? Individual parameters are surely influenced by the group-level parameters, which violates the assumption that they are independent and identically distributed (IID). However, I have read many papers that simply correlate individual parameters with questionnaire scores or other kinds of data. For example: Liu, L., et al. (2021). “Long-Term Stress and Trait Anxiety Affect Brain Network Balance in Dynamic Cognitive Computations.” Cereb Cortex.
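To make the shrinkage effect concrete, here is a minimal sketch of partial pooling in the simplest hierarchical case. It assumes a normal-normal model in which each participant’s raw estimate `ybar_i` has standard error `se_i`, and it treats the group mean `mu` and group SD `tau` as known (in a real fit they are estimated). The function name is hypothetical. It also shows the dependence the question is about: every individual estimate is pulled toward the same shared `mu`.

```python
def partial_pooling(subject_means, std_errors, mu, tau):
    """Posterior means of theta_i under a normal-normal hierarchical model.

    Assumed model (illustrative only): theta_i ~ Normal(mu, tau**2) and
    ybar_i | theta_i ~ Normal(theta_i, se_i**2), with mu and tau known.
    """
    shrunk = []
    for ybar, se in zip(subject_means, std_errors):
        # Weight on the participant's own data; it falls as se grows (fewer trials)
        w = tau**2 / (tau**2 + se**2)
        shrunk.append(w * ybar + (1 - w) * mu)
    return shrunk


# Two participants with the same raw estimate but different amounts of data
print(partial_pooling([2.0, 2.0], [0.5, 2.0], mu=0.0, tau=1.0))
```

The well-measured participant keeps most of the raw value (1.6), while the sparsely measured one is pulled much closer to the group mean (0.4).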
Did they do the wrong analysis, or is it my own misunderstanding? Thank you very much.

It seems that you’re comfortable with the idea that the random-effects specification can improve parameter estimates (assuming that the model is not badly misspecified). I don’t precisely understand what your concern is about why these improved estimates (assuming the random effects model is not misspecified) might not be appropriate for downstream use. You seem concerned that the individual-level parameters are not independent and identically distributed (IID), but this confuses me for two reasons:

  • You discuss a scenario in which you are using the parameter values as predictors in a further regression. But we do not, in general, care whether our predictors are independent or identically distributed. In most simple cases we don’t treat our predictors as random variables at all. In your example, we might treat the predictors as random since we know them only with some posterior uncertainty. If your question is about how to compose inference in this way, and propagate uncertainty in the first model through the second model, then that’s a worthwhile question but it has nothing to do with the fact that the first model includes a random effect term; you would see the same problems of non-independence if the first model treated the effects as fixed.
  • In the context of the random effects assumption, we are assuming that the effects are IID (this assumption is closely related to the notion of exchangeability in random effects models). So the random effects model builds in stronger assumptions about IID effects than the equivalent fixed-effects model.
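One way to act on the first point (propagating first-stage uncertainty instead of correlating point estimates) is to compute the correlation once per posterior draw. This is a minimal sketch with made-up data: `param_draws` stands in for posterior draws of each participant’s parameter. In a real analysis these should come from a single joint posterior (one row per MCMC iteration), so that dependence induced by the shared group-level parameters is preserved; the simulated draws below are independent across participants purely for illustration, and all names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_over_draws(param_draws, questionnaire):
    """param_draws[d][i] is draw d of participant i's parameter.
    Returns one correlation per draw, i.e. a distribution rather than a point value."""
    return [pearson(draw, questionnaire) for draw in param_draws]


# Hypothetical data: 30 participants, 1000 posterior draws
rng = random.Random(1)
n_subj, n_draws = 30, 1000
questionnaire = [rng.gauss(0, 1) for _ in range(n_subj)]
post_means = [0.5 * q + rng.gauss(0, 1) for q in questionnaire]
post_sds = [rng.uniform(0.2, 1.0) for _ in range(n_subj)]  # per-participant posterior SD
param_draws = [
    [rng.gauss(m, s) for m, s in zip(post_means, post_sds)]
    for _ in range(n_draws)
]

rs = sorted(correlation_over_draws(param_draws, questionnaire))
tail = n_draws // 40  # 2.5% in each tail
r_lo, r_hi = rs[tail], rs[n_draws - 1 - tail]
r_point = pearson(post_means, questionnaire)
print(f"r from point estimates: {r_point:.2f}")
print(f"95% interval over draws: [{r_lo:.2f}, {r_hi:.2f}]")
```

This only carries first-stage uncertainty forward; it does not let the questionnaire inform the parameter estimates, which a single joint model would.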

Thank you for your response. The question I asked is not whether the fixed/pooling model is better than the mixed model for studying individual differences; instead, it is whether a random-effect model is more appropriate than a mixed model for studying individual differences. For example, is it okay to use the random-effect coefficients from a mixed model to predict, or be predicted by, other kinds of data? My concern originates from this paper: Thomas, A. W., et al. (2019). “Gaze bias differences capture individual choice behaviour.” Nature Human Behaviour 3(6): 625-635.
In this paper, they use PyMC3 to fit their decision model, called GLAM. Instead of using hierarchical model fitting, they fit the model separately for each participant. In the discussion section of the paper, they make the following claim:

The distinction that you are drawing between a random-effect model and a mixed model is unfamiliar to me. In the terminology that is familiar to me, mixed models are the subset of random-effect models that also include at least some fixed effects in addition to the intercept.

It is completely fine and standard to include covariates to predict the random effect mean. It is fine but nonstandard to include covariates to modify the random effect variance. Conditional on the random effect mean and variance, it is literally impossible to allow the random coefficients to be predicted by other kinds of data; random effects modeling requires an assumption of exchangeability. On the other hand, the random coefficients are always used to predict the observed data, so I’m not sure where the concern that it’s not ok to use them to predict data comes from.
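In symbols, a sketch of what this describes, assuming one participant-level parameter theta_i and one covariate x_i (the notation is illustrative, not from the thread): the covariate shifts the random-effect mean (standard) and, optionally, the random-effect scale (nonstandard, delta ≠ 0). Conditional on alpha, beta, gamma, delta and x_i, the theta_i are treated as exchangeable.

```latex
\begin{aligned}
y_{ij} \mid \theta_i &\sim p(y \mid \theta_i)
  && \text{trial } j \text{ of participant } i \\
\theta_i \mid \alpha, \beta, \gamma, \delta, x_i &\sim
  \mathcal{N}\!\big(\alpha + \beta x_i,\ [\exp(\gamma + \delta x_i)]^2\big)
  && \text{covariate on the mean (and optionally the scale)}
\end{aligned}
```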

For what it’s worth, I don’t detect in the paper of Thomas et al. any expressions of concern about whether it is possible or appropriate to make inference about the random-effect coefficients in a hierarchical model.

Okay, thank you for your response. Sorry, I had the wrong concept. By “random effect model” above I meant a model that has only random effects and no fixed effects. Maybe I chose the wrong example. The analysis in the Cereb Cortex paper works like this: first fit a mixed-effects model, extract the MAP estimate of each random-effect parameter, and then build a second model in which these random-effect parameters predict, or are predicted by, other kinds of data, rather than combining the two models into a single model.
The passage in the picture I posted is from the discussion section of the Thomas et al. paper. It claims that hierarchical fitting gives better group-level estimates (i.e., the fixed-effect parameters) but does not better capture each individual’s choice behavior. So instead of fitting their GLAM model hierarchically, they chose to fit it separately for each participant and then correlate the GLAM parameters with other kinds of data.

Rather late on this, but @mingqian.guo, you are correct. The issue is that the shrinkage built into these models means the random-effect parameters are no longer independent and cannot be treated as such in further analyses. We found a pretty big difference between this ‘two-stage’ approach and estimating the effects along with the model; see paper here. You won’t always see much of an effect (it depends on how much shrinkage is occurring), but people tend to ignore this entirely, which can be an issue.
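A small simulation can make the shrinkage point concrete. This is a sketch under simplifying assumptions, not the method of the paper referred to above: a normal-normal model, one common standard error `se` for every participant, and empirical-Bayes shrinkage with method-of-moments hyperparameters instead of a full Bayesian fit. It compares (a) shrinking toward one common mean that ignores the covariate and then regressing the shrunken estimates on the covariate (the ‘two-stage’ route) with (b) shrinking toward a covariate-dependent mean, i.e., putting the covariate into the hierarchical prior. All settings are hypothetical.

```python
import random


def ols_fit(x, y):
    """Intercept and slope of an ordinary least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(0)
n, beta, tau, se = 2000, 1.0, 0.5, 1.0  # hypothetical simulation settings
x = [rng.gauss(0, 1) for _ in range(n)]              # covariate, e.g. a questionnaire score
theta = [beta * xi + rng.gauss(0, tau) for xi in x]  # true individual parameters
ybar = [t + rng.gauss(0, se) for t in theta]         # noisy per-participant estimates

# (a) Two-stage: shrink toward one common mean that ignores x
mu_hat = sum(ybar) / n
var_theta = max(sum((v - mu_hat) ** 2 for v in ybar) / (n - 1) - se**2, 0.0)
k = var_theta / (var_theta + se**2)
shrunk_common = [mu_hat + k * (v - mu_hat) for v in ybar]

# (b) Covariate in the prior: shrink toward a_hat + b_hat * x_i instead
a_hat, b_hat = ols_fit(x, ybar)
resid = [v - (a_hat + b_hat * xi) for v, xi in zip(ybar, x)]
var_resid = max(sum(r**2 for r in resid) / (n - 2) - se**2, 0.0)
k_x = var_resid / (var_resid + se**2)
shrunk_cov = [a_hat + b_hat * xi + k_x * r for xi, r in zip(x, resid)]

_, slope_two_stage = ols_fit(x, shrunk_common)
_, slope_with_cov = ols_fit(x, shrunk_cov)
print(f"true slope {beta}; two-stage {slope_two_stage:.2f}; covariate in prior {slope_with_cov:.2f}")
```

Under these settings the two-stage slope is attenuated by roughly the shrinkage factor k (about 1.25 / 2.25 ≈ 0.56 in expectation), while the covariate-aware version recovers a slope near 1. With a single `se` for everyone, this uniform shrinkage leaves the Pearson correlation unchanged, so the distortion shows up in slopes and group differences; when participants differ in trial counts, the shrinkage differs across participants and correlations can change too.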
For the second paper, one argument is that variation in parameters often doesn’t affect secondary measures (like prediction errors) much; see this paper for a discussion related to fMRI measures. But for authors to claim that this approach is valid, they should show that their results are not sensitive to variation in the parameters.
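A quick illustration of why prediction errors can be insensitive to parameter values, under assumptions made here for illustration only (not taken from the linked paper): a delta-rule (Rescorla–Wagner) learner, binary rewards delivered with probability 0.7, and three arbitrary learning rates. The same loop doubles as a simple sensitivity check: recompute the downstream quantity over a range of parameter values and see whether the conclusion changes.

```python
import math
import random


def prediction_errors(rewards, alpha, v0=0.5):
    """Delta-rule prediction errors r_t - V_t, updating V after each trial."""
    v, pes = v0, []
    for r in rewards:
        pe = r - v
        pes.append(pe)
        v += alpha * pe
    return pes


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(2)
rewards = [1.0 if rng.random() < 0.7 else 0.0 for _ in range(500)]  # hypothetical reward sequence
pe_by_alpha = {a: prediction_errors(rewards, a) for a in (0.1, 0.2, 0.4)}
for a in (0.1, 0.4):
    r = pearson(pe_by_alpha[0.2], pe_by_alpha[a])
    print(f"corr(PE | alpha=0.2, PE | alpha={a}) = {r:.3f}")
```

Because the prediction error on each trial is dominated by the reward itself, halving or doubling the learning rate changes the regressor very little in this setting.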


Hi Vanessa, thank you for your reply. I am thrilled to see you respond to me. I have actually read several of your works, and they are all amazing. I also downloaded the first paper you suggested, but I haven’t finished reading it yet. I recently watched Quentin Huys’s tutorial video. He uses hierarchical model fitting via expectation-maximization (EM) and reports a similar result: the ‘two-stage’ approach is much worse than the integrated model.

Chiming in late here. I recommend these two posts from Nate Haines. I don’t think he explicitly speaks to the workflow you are thinking of, where you use the outputs of a hierarchical model as inputs to a correlation model, but he does show how to do both in a single model, which is what I strongly recommend.
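For reference, one common way to write ‘both in a single model’ is sketched below, assuming one choice-model parameter theta_i and one continuous questionnaire score q_i per participant (the notation is illustrative, not taken from the linked posts). The correlation rho is estimated together with everything else, so the shrinkage of theta_i and its uncertainty are accounted for in the inference about rho.

```latex
\begin{aligned}
y_{ij} \mid \theta_i &\sim p(y \mid \theta_i) \\
\begin{pmatrix} \theta_i \\ q_i \end{pmatrix} &\sim
  \mathcal{N}\!\left(
    \begin{pmatrix} \mu_\theta \\ \mu_q \end{pmatrix},
    \begin{pmatrix}
      \sigma_\theta^2 & \rho\,\sigma_\theta \sigma_q \\
      \rho\,\sigma_\theta \sigma_q & \sigma_q^2
    \end{pmatrix}
  \right)
\end{aligned}
```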


Glad to help! And good eye; I didn’t realize that was covered in that video. Nathaniel Daw also briefly mentions it in his 2011 paper, but people have been slow to pick up on it.