LOO hierarchical models: very low p_loo without warnings nor errors

I am struggling to understand the estimate of p_loo after fitting a hierarchical model and performing leave-one-out cross-validation. The model is built to fit biological data that follow a skew normal distribution. This distribution has four parameters which have been transformed into hyperparameters in order to understand more precisely how each hyperparameter depends on the biological variables of interest. The biological variables are seven and five are dummy, one is continuous and one is the interaction between a dummy variable and the continuous variable.

Each of the four hyperparameters is defined as a linear regression with one intercept and seven slopes (there are seven biological variables), so eight parameters for each hyperparameter. In total, the model contains 32 parameters (four hyperparameters times eight parameters). All of this is translated into the stan file:

data {
  int<lower=0> N;                //sample size
  int<lower=0> K;                //number of dummy variables
  matrix[N, K] X;                //matrix with dummy variables 
  vector[N] ratio;               //continuous independent variable
  int<lower=0, upper=1> y[N];    //binary dependent variable

parameters {
  real<lower=-4, upper=4> alpha_intercept;
  real<lower=0, upper=1> b_intercept;
  real<lower=-2, upper=2> c_intercept;
  real<lower=0.0001, upper=3> d_intercept;
  vector[K] alpha_coeff;       //slopes hyperparameter alpha
  vector[K] b_coeff;           //slopes hyperparameter b
  vector[K] c_coeff;           //slopes hyperparameter c
  vector[K] d_coeff;           //slopes hyperparameter d

transformed parameters {
  vector[N] alpha_hyp;
  vector[N] b_hyp;
  vector[N] c_hyp;
  vector[N] d_hyp;
  vector[N] y_hat;
  alpha_hyp = alpha_intercept + X * alpha_coeff;    //linear regression alpha
  b_hyp = b_intercept + X * b_coeff;                //linear regression b
  c_hyp = c_intercept + X * c_coeff;                //linear regression c
  d_hyp = d_intercept + X * d_coeff;                //linear regression d
  for (i in 1:N) {
    y_hat[i] = 0.01 + b_hyp[i] * exp(-0.5 * ((ratio[i] - c_hyp[i]) / d_hyp[i])^2) * (1 + erf(alpha_hyp[i] * (ratio[i] - c_hyp[i]) / (1.414214 * d_hyp[i])));

model {
  alpha_coeff ~ normal(0, 1);        //prior regression coefficients hyperparameter alpha
  b_coeff ~ normal(0, 1);            //prior regression coefficients hyperparameter b
  c_coeff ~ normal(0, 1);            //prior regression coefficients hyperparameter c
  d_coeff ~ normal(0, 1);            //prior regression coefficients hyperparameter d
  alpha_hyp ~ gamma(12, 7);          //prior hyperparameter alpha
  b_hyp ~ beta(2, 2);                //prior hyperparameter b
  c_hyp ~ normal(-0.1, 0.5);         //prior hyperparameter c
  d_hyp ~ gamma(7, 8);               //prior hyperparameter d
  y ~ bernoulli_logit(logit(y_hat));

generated quantities {
  vector[N] log_lik;
  for (n in 1:N) {
    log_lik[n] = bernoulli_logit_lpmf(y[n] | logit(0.01 + b_hyp[n] * exp(-0.5 * ((ratio[n] - c_hyp[n]) / d_hyp[n])^2) * (1 + erf(alpha_hyp[n] * (ratio[n] - c_hyp[n]) / (1.414214 * d_hyp[n])))));

The fitting ends without warnings nor errors as well as the loo cross-validation (I have followed this vignette. And, this is the loo output:

Computed from 24000 by 4330 log-likelihood matrix

. Estimate SE
elpd_loo -2224.2 27.1
p_loo 4.7 0.1
looic 4448.5 54.2

Monte Carlo SE of elpd_loo is 0.0.
All Pareto k estimates are good (k < 0.5).

My reasoning behind this large discrepancy between the p_loo (4.7) and the real number of parameters (32) is:

  1. Most of the posterior distributions are centred around zero.
  2. Only the four intercepts show a posterior distribution whose 95% confidence intervals do not overlap zero.

Nevertheless, there are also five-six parameters that are estimated to be close to zero but without the 95% CI spanning over zero. If my reasoning is correct, I would have expected the p_loo to be around seven or eight. Am I making sense?

  • What diagnostics can I run to explore this issue (no divergent transitions and low Pareto k estimates)?
  • Does the stan file reflect what is described in the first paragraph?

I have tried to reduced the number of hyperparameters and thus the number of parameters. For example, with only one hyperparameter, the model contains the three parameters that were not transformed into hyperparameters plus one intercept and seven slopes of the hyperparameters. So, the total number of parameters for the model with one hyperparameter is 11 (3 + 8). This returns a very similar p_loo of the one above which would make sense if my earlier reasoning is correct.

Any comments, suggestions or requests for further explanations are very appreciated.

I think everything is fine.

It’s more relevant how the posterior is different from prior and how strong posterior dependencies are.

Compare prior to posterior and examine posterior dependencies.

Looking at your model, you have very strong constraints on intercepts (lower and upper should be used only for logical or physical constraints like variances being positive and probabilities between 0 and 1, and otherwise smooth priors should be used)

What is the equation for y_hat? Couldn’t figure out, but seems like something with strong dependency between parameters.

In generated quantities you could re-use y_hat as it is transformed parameter.

Why do you have logit twice in y ~ bernoulli_logit(logit(y_hat)); ?