I’m struggling to compute meaningful Bayes Factors or model comparison metrics for a hierarchical model, possibly because my background is more frequentist. Here’s my problem:
I have repeated measurements per individual and fit a hierarchical model where:
Each individual has a parameter (\theta_i)
These \theta_i depend on group-level parameters (\nu, \sigma)
I estimate the posteriors for \nu, \sigma, and all \theta_i using a Gibbs sampler.
A covariate suggests there might be two subpopulations, but I’m unsure.
Goal
I want to determine whether:
Model 1: A single group (\theta_i \sim \mathcal{N}(\nu, \sigma^2) for all individuals.
Model 2: Two separate groups (each with their own \nu_k, \sigma_k, k=1,2).
is better supported by the data.
Problem with LOO
Model 1 (single group) estimates a high \sigma, so \theta_i fit individual data closely → high likelihood p(y_i|\theta_i)
Model 2 (two groups) estimates low \sigma_k for each group → \theta_i are more shrunk → lower likelihood.
Issue: LOO only evaluates p(y_i|\theta_i) and ignores the group-level likelihood p(\theta_i|\nu, \sigma). This favors the flexible Model 1 even if Model 2 is theoretically more appropriate.
Questions
Is there a better way to compare these models? (e.g., Bayes Factors, WAIC, or a different parametrization?)
Should I modify Model 2 to account for this shrinkage effect?
Are there alternatives to LOO for hierarchical models that penalize overfitting more effectively?
Any advice or references would be greatly appreciated!
it’s not easy to answer as the model is not completely clear, but then you are also using Gibbs, which indicates you are not using Stan, so there is no help in asking for you to show your model code. So I’m answering only the questions I’m confident answering
If you use Bayesian LOO it evaluates p(y_i | y_{-i}) which is obtained by integrating over the posterior and thus it takes into account both individual level and group-level.
You also use posterior or cross-validation predictive checking, to check which model can better model the data. See, e.g. Visualization in Bayesian workflow
I think you gave me the reason why indeed the PSIS-LOO was not working since I am more interested in the global effect than predicting at the individual level.
I believe what confused you is that in reality my groups are my individuals, so I do have a lot of groups and I should be using LOGO, so thanks for this idea, however I still have some problems because it is takes a lot of time to compute.
In a way what I have is group and supra-group, so if anyone has another criteria to compare different configurations of the groups in a hierarchical model it will be really useful !