Hi Stan community,
I have a couple of theoretical questions regarding hierarchical multi-level modelling for multiple groups with repeated measures, when we do not know whether the groups are truly distinct or lie on the same continuum.
Let me give an example:
Say we have a 4-group study in medicine/psychiatry (Group 1: healthy controls; Group 2: healthy controls at increased familial risk of addiction; Group 3: currently addicted patients; Group 4: patients remitted from addiction).
Aim: The goal of such a study would be to fit behavioural data from all participants on a task and test whether latent variables of interest (say, from our best-fitting model) differ between groups.
Question 1:
What do you think is the most appropriate way of modelling these 4 groups hierarchically?
Option 1: Should I fit my multi-level model with priors over parameters that are shared across all 4 groups (i.e. partial pooling across all groups)? As in:
model_option1 {
  // one shared population prior for all subjects, regardless of group
  mu_group ~ normal(0, 1);
  sd_group ~ half_cauchy(0, 5);
  param_subject ~ normal(mu_group, sd_group);
  y[group, subject, trial] ~ normal(param_subject * x[group, subject], …);
}
The problem with such a model is that I'm forcing the different groups to be more similar to each other, when they might not be (i.e. group estimates will be shrunk towards each other, so the comparison may be too conservative?).
Option 2: Should I fit each group separately and compare the 'group-level' priors instead of the 'subject-level' parameters?
The problem with option 2 is that it is more likely to make the groups dissimilar and potentially inflate the differences between them. Theoretically this could be justified if we assume that the groups really do come from distinct populations, since they carry different diagnoses (but not everyone agrees with this; some researchers argue that all subjects, regardless of diagnosis, lie on a continuum from 'healthy' to 'pathological'). That is, with option 2, are the parameter estimates going to be 'too liberal', finding significant group differences more often than is warranted?
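For concreteness, what I have in mind for option 2 is fitting the same model independently four times, once per group g, so that no parameters are shared between the fits (a sketch in the same pseudocode as above):

model_option2 {  // fitted separately for each group g; nothing pooled across groups
  mu_group[g] ~ normal(0, 1);
  sd_group[g] ~ half_cauchy(0, 5);
  param_subject ~ normal(mu_group[g], sd_group[g]);
  y[g, subject, trial] ~ normal(param_subject * x[g, subject], …);
}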
Option 3: Should I add another layer to the hierarchy (as in the radon-levels case study), where I enforce the group-level priors to come from an underlying shared distribution? As in:
model_option3 {
  mu_shared_groupwise ~ normal(0, 1);
  sd_shared_groupwise ~ half_cauchy(0, 5);
  // group-level parameters drawn from a shared population distribution
  for (i_group in 1:4) {
    mu_group[i_group] ~ normal(mu_shared_groupwise, …);
    sd_group[i_group] ~ half_cauchy(sd_shared_groupwise, …);
    param_subject_group[i_group, :] ~ normal(mu_group[i_group], sd_group[i_group]);
  }
  for (i_group in 1:4) {
    y[i_group, subject] ~ normal(param_subject_group[i_group, subject] * x[i_group, subject], …);
  }
}
Is option 3 the least biased (i.e. neither too conservative nor too liberal)? If I use option 3, would you compare the 'subject-level' parameters (as in param_subject_group[1,:] vs param_subject_group[2,:]) or the 'group-level' parameters?
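(For the group-level comparison I was imagining computing the posterior of the difference directly, e.g. in a generated quantities block like the sketch below, with mu_group1 and mu_group2 as defined in option 3, but I'm not sure this is the right quantity to look at:

generated quantities {
  // posterior distribution of the group-level difference
  real mu_diff_12 = mu_group1 - mu_group2;
}
)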
Thank you so much for your help
Best regards,
Feez