I am struggling to understand the estimate of p_loo after fitting a hierarchical model and performing leave-one-out cross-validation. The model is built to fit biological data that follow a skew normal distribution. This distribution has four parameters which have been transformed into hyperparameters in order to understand more precisely how each hyperparameter depends on the biological variables of interest. The biological variables are seven and five are dummy, one is continuous and one is the interaction between a dummy variable and the continuous variable.
Each of the four hyperparameters is defined as a linear regression with one intercept and seven slopes (there are seven biological variables), so eight parameters for each hyperparameter. In total, the model contains 32 parameters (four hyperparameters times eight parameters). All of this is translated into the stan file:
data {
int<lower=0> N; //sample size
int<lower=0> K; //number of dummy variables
matrix[N, K] X; //matrix with dummy variables
vector[N] ratio; //continuous independent variable
int<lower=0, upper=1> y[N]; //binary dependent variable
}
parameters {
real<lower=-4, upper=4> alpha_intercept;
real<lower=0, upper=1> b_intercept;
real<lower=-2, upper=2> c_intercept;
real<lower=0.0001, upper=3> d_intercept;
vector[K] alpha_coeff; //slopes hyperparameter alpha
vector[K] b_coeff; //slopes hyperparameter b
vector[K] c_coeff; //slopes hyperparameter c
vector[K] d_coeff; //slopes hyperparameter d
}
transformed parameters {
vector[N] alpha_hyp;
vector[N] b_hyp;
vector[N] c_hyp;
vector[N] d_hyp;
vector[N] y_hat;
alpha_hyp = alpha_intercept + X * alpha_coeff; //linear regression alpha
b_hyp = b_intercept + X * b_coeff; //linear regression b
c_hyp = c_intercept + X * c_coeff; //linear regression c
d_hyp = d_intercept + X * d_coeff; //linear regression d
for (i in 1:N) {
y_hat[i] = 0.01 + b_hyp[i] * exp(-0.5 * ((ratio[i] - c_hyp[i]) / d_hyp[i])^2) * (1 + erf(alpha_hyp[i] * (ratio[i] - c_hyp[i]) / (1.414214 * d_hyp[i])));
}
}
model {
alpha_coeff ~ normal(0, 1); //prior regression coefficients hyperparameter alpha
b_coeff ~ normal(0, 1); //prior regression coefficients hyperparameter b
c_coeff ~ normal(0, 1); //prior regression coefficients hyperparameter c
d_coeff ~ normal(0, 1); //prior regression coefficients hyperparameter d
alpha_hyp ~ gamma(12, 7); //prior hyperparameter alpha
b_hyp ~ beta(2, 2); //prior hyperparameter b
c_hyp ~ normal(-0.1, 0.5); //prior hyperparameter c
d_hyp ~ gamma(7, 8); //prior hyperparameter d
y ~ bernoulli_logit(logit(y_hat));
}
generated quantities {
vector[N] log_lik;
for (n in 1:N) {
log_lik[n] = bernoulli_logit_lpmf(y[n] | logit(0.01 + b_hyp[n] * exp(-0.5 * ((ratio[n] - c_hyp[n]) / d_hyp[n])^2) * (1 + erf(alpha_hyp[n] * (ratio[n] - c_hyp[n]) / (1.414214 * d_hyp[n])))));
}
}
The fitting ends without warnings nor errors as well as the loo cross-validation (I have followed this vignette. And, this is the loo output:
Computed from 24000 by 4330 log-likelihood matrix
. | Estimate | SE |
---|---|---|
elpd_loo | -2224.2 | 27.1 |
p_loo | 4.7 | 0.1 |
looic | 4448.5 | 54.2 |
Monte Carlo SE of elpd_loo is 0.0.
All Pareto k estimates are good (k < 0.5).
My reasoning behind this large discrepancy between the p_loo (4.7) and the real number of parameters (32) is:
- Most of the posterior distributions are centred around zero.
- Only the four intercepts show a posterior distribution whose 95% confidence intervals do not overlap zero.
Nevertheless, there are also five-six parameters that are estimated to be close to zero but without the 95% CI spanning over zero. If my reasoning is correct, I would have expected the p_loo to be around seven or eight. Am I making sense?
- What diagnostics can I run to explore this issue (no divergent transitions and low Pareto k estimates)?
- Does the stan file reflect what is described in the first paragraph?
I have tried to reduced the number of hyperparameters and thus the number of parameters. For example, with only one hyperparameter, the model contains the three parameters that were not transformed into hyperparameters plus one intercept and seven slopes of the hyperparameters. So, the total number of parameters for the model with one hyperparameter is 11 (3 + 8). This returns a very similar p_loo of the one above which would make sense if my earlier reasoning is correct.
Any comments, suggestions or requests for further explanations are very appreciated.