Help with Bayesian Model Comparison for Hierarchical Models

Hi everyone,

I’m struggling to compute meaningful Bayes Factors or model comparison metrics for a hierarchical model, possibly because my background is more frequentist. Here’s my problem:

  • I have repeated measurements per individual and fit a hierarchical model where:
    • Each individual has a parameter (\theta_i)
    • These \theta_i depend on group-level parameters (\nu, \sigma)
  • I estimate the posteriors for \nu, \sigma, and all \theta_i using a Gibbs sampler.
  • A covariate suggests there might be two subpopulations, but I’m unsure.

Goal
I want to determine whether:

  1. Model 1: A single group (\theta_i \sim \mathcal{N}(\nu, \sigma^2) for all individuals.
  2. Model 2: Two separate groups (each with their own \nu_k, \sigma_k, k=1,2).

is better supported by the data.

Problem with LOO

  • Model 1 (single group) estimates a high \sigma, so \theta_i fit individual data closely → high likelihood p(y_i|\theta_i)
  • Model 2 (two groups) estimates low \sigma_k for each group → \theta_i are more shrunk → lower likelihood.
  • Issue: LOO only evaluates p(y_i|\theta_i) and ignores the group-level likelihood p(\theta_i|\nu, \sigma). This favors the flexible Model 1 even if Model 2 is theoretically more appropriate.

Questions

  1. Is there a better way to compare these models? (e.g., Bayes Factors, WAIC, or a different parametrization?)
  2. Should I modify Model 2 to account for this shrinkage effect?
  3. Are there alternatives to LOO for hierarchical models that penalize overfitting more effectively?

Any advice or references would be greatly appreciated!