Below is my Stan code for a meta-regression model. Basically, I want each Y_{2i} to come from a normal distribution with known standard deviation \sigma_{2i} (se[i] in the code) whose mean depends on Y_{1i}, such that \mu_{2i} = \lambda_{0i} + \lambda_{1}Y_{1i}. Priors are then placed on each parameter. I wanted to compute WAIC using the loo package, but I got not only these warnings from running loo:
1: Relative effective sample sizes (‘r_eff’ argument) not specified.
For models fit with MCMC, the reported PSIS effective sample sizes and
MCSE estimates will be over-optimistic.
2: Some Pareto k diagnostic values are too high. See help(‘pareto-k-diagnostic’) for details.
but also this warning from running the waic function:
(100.0%) p_waic estimates greater than 0.4. We recommend trying loo instead.
What does this mean? Will the WAIC or loo output not be useful for this model? And if so, why?
data {
  int<lower=0> n;
  real y1[n];
  real y2[n];
  real<lower=0> se[n];
}
parameters {
  real lambda0[n];
  real lambda1;
  real beta;
  real<lower=0> psi;
}
transformed parameters {
  real mu2[n];
  for (i in 1:n) {
    mu2[i] = lambda0[i] + lambda1 * y1[i];
  }
}
model {
  // Priors
  beta ~ normal(0, 100);
  lambda1 ~ normal(0, 100);
  psi ~ uniform(0, 100);
  lambda0 ~ normal(beta, psi);
  // Data
  for (i in 1:n) {
    y2[i] ~ normal(mu2[i], se[i]);
  }
}
generated quantities {
  vector[n] log_lik;
  for (i in 1:n) {
    log_lik[i] = normal_lpdf(y2[i] | mu2[i], se[i]);
  }
}
Rhat is around 1.516252 and ESS_bulk is around 35.65368. I have 20 data points and there are no divergent transitions. I suppose this indicates a convergence/efficiency problem? How could I improve it?
Clearly the posterior has a nasty shape, and you need to solve the inference problems before trying to solve the WAIC/loo computation. Currently, you have more parameters than observations and very wide priors. Also, psi is declared with <lower=0>, but you are giving it a uniform(0,100) prior, which is bounded above at 100; this mismatch can also cause problems.
Thanks for the info. May I ask how setting a lower bound for psi (which I think we should, since it's a variance component) and assigning a uniform(0,100) prior could be problematic? Or are you suggesting that if we already set a lower bound for psi, we don't need to worry about the bounds of its prior?
As you don't have upper=100 in the parameter declaration, the unconstrained parameter space has a sharp edge where psi reaches 100, due to the uniform prior. The lower/upper constraints are used to transform from the constrained space to the unconstrained space. If you really need to use a uniform(0,100) prior, you should use lower=0, upper=100 in the parameter declaration.
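To make that concrete, here is a minimal standalone sketch (it only samples psi from its prior and is not the full model) in which the declared bounds match the support of the uniform prior:

```stan
parameters {
  // Bounds match the support of the uniform prior below, so the
  // transform to the unconstrained space has no hard edge inside it.
  real<lower=0, upper=100> psi;
}
model {
  // With matching bounds this statement is optional: a bounded
  // parameter with no prior statement is implicitly uniform.
  psi ~ uniform(0, 100);
}
```

In the model from the question, only the declaration of psi in the parameters block would need to change.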
On the other hand, the uniform(0,100) prior is likely to be too wide, and adding upper=100 will probably not remove all the sampling problems. This very wide prior, combined with a hierarchical model that has only one observation per group parameter, leads to a very thick-tailed posterior that is challenging for HMC.
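As a rough sketch of what tighter priors could look like: the scales below (10 for beta and lambda1, 1 for psi) are assumptions for illustration only and should be chosen from the actual scale of y1 and y2. The data and parameters are written as vectors here, which differs from the array declarations in the question but defines the same model structure. This may not be enough on its own to fix the sampling problems.

```stan
data {
  int<lower=0> n;
  vector[n] y1;
  vector[n] y2;
  vector<lower=0>[n] se;   // known standard errors
}
parameters {
  vector[n] lambda0;       // one intercept per observation
  real lambda1;
  real beta;
  real<lower=0> psi;       // between-group standard deviation
}
transformed parameters {
  vector[n] mu2 = lambda0 + lambda1 * y1;
}
model {
  // Illustrative prior scales (assumptions, not recommendations)
  beta ~ normal(0, 10);
  lambda1 ~ normal(0, 10);
  psi ~ normal(0, 1);      // half-normal, since psi has lower=0
  lambda0 ~ normal(beta, psi);
  // Data
  y2 ~ normal(mu2, se);
}
generated quantities {
  // Pointwise log-likelihood for loo/waic
  vector[n] log_lik;
  for (i in 1:n) {
    log_lik[i] = normal_lpdf(y2[i] | mu2[i], se[i]);
  }
}
```

The half-normal prior on psi has no upper edge, so the <lower=0> declaration alone is consistent with it.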