I wrote a stan program to model count data with a negative-binomial distribution, with the log mean parameter depending on a group association ‘xHOUSE’ (with hierarchy, 12 levels) and an explanatory variable ‘xTREAT’ (a treatement, 4 levels), and the dispersion parameter depending on the explanatory variable ‘xTREAT’. The data points (n=675) can be seen as individuals examined at different time points with different treatments, while I am not interested in correlation by time at the moment. There are two different count measures (different nature of data, ‘outdoor’ and ‘indoor’) per data point, so the model is essentially doubled with independent parameters by adding the log-likelihoods. The model code is below (tested). There are generated quantities which I excluded here. I wrote my own model to be more flexible, and since I am interested in a measure whose distribution is unknown (at least to me) and which I estimate by a function of forward simulated data points. There are 44 parameters in total.
I want to analyse this model with the loo-package and compare to an alternative model including an additional hierarchical layer (data points grouped by weeks). However, the Pareto-k value is so high (above 1 for all data points,see plot below) that I suspect that there is a conceptual problem in the way I wrote the model. Thank you very much for any help.
There were no sampling problems and my posterior predictive checks (forward simulation by ‘generated quantities’, for each group separately and for the general population) looked OK to me. I read or at least skimmed the following resources https://avehtari.github.io/modelselection/roaches.html, https://avehtari.github.io/modelselection/CV-FAQ.html, https://avehtari.github.io/modelselection/rats_kcv.html, Vehtari, A., Gelman, A., and Gabry, J. (2017a). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5), but I have little theoretical background in the corresponding methods. I can of course provide more background and explanation.
data{
int<lower=0> n; // number of 'house-days' (experimental days * houses)
int<lower=0> yHLC[n]; // data HLC
int<lower=0> yCDC[n]; // data CDC
int xHOUSE[n]; // feature HOUSE
int xTREAT[n]; // feature TREATment
int<lower=0> priorscale; // to control scale of prior
}
parameters{
real logaIN; // mean of lograte indoors
real logaOUT; // mean of lograte outdoors
real<lower=0> sigma_alphaIN; // standard deviation of lograte indoors
real<lower=0> sigma_alphaOUT; // standard deviation of lograte outdoors
vector[12] thetaIN; // standardised deviation of indoor lograte from mean of lograte indoors
vector[12] thetaOUT; // standardised deviation of outdoor lograte from mean of lograte outdoors
vector[3] logbetaIN; // treatment effect indoor
vector[3] logbetaOUT; // treatment effect outdoor
vector[4] logpsiIN; // one scale parameter per treatment group and per indoor, so far no hierarchy
vector[4] logpsiOUT; // one scale parameter per treatment group and per outdoor, so far no hierarchy
}
transformed parameters{
vector[n] loglambdaIN; // lograte indoor
vector[n] loglambdaOUT; // lograte indoor
vector[n] phiIN; // scale parameter indoor
vector[n] phiOUT; // scale parameter outdoor
{
vector[4] logbetaINall = append_row(0, logbetaIN);
vector[4] logbetaOUTall = append_row(0, logbetaOUT);
loglambdaIN = logaIN + thetaIN[xHOUSE] * sigma_alphaIN + logbetaINall[xTREAT];
loglambdaOUT = logaOUT + thetaOUT[xHOUSE] * sigma_alphaOUT + logbetaOUTall[xTREAT];
phiIN = exp(logpsiIN[xTREAT]);
phiOUT = exp(logpsiOUT[xTREAT]);
}
}
model{
//priors and hyperpriors
target += normal_lpdf(logaIN | 0, priorscale);
target += normal_lpdf(logaOUT | 0, priorscale);
target += 2*cauchy_lpdf(sigma_alphaIN | 0,priorscale);
target += 2*cauchy_lpdf(sigma_alphaOUT | 0,priorscale);
target += normal_lpdf(logbetaIN| 0, priorscale);
target += normal_lpdf(logbetaOUT| 0, priorscale);
target += normal_lpdf(logpsiIN| 0, priorscale);
target += normal_lpdf(logpsiOUT| 0, priorscale);
// hierarchical parameters
target += normal_lpdf(thetaIN| 0, 1);
target += normal_lpdf(thetaOUT| 0, 1);
// model
target += neg_binomial_2_log_lpmf(yCDC | loglambdaIN, phiIN);
target += neg_binomial_2_log_lpmf(yHLC | loglambdaOUT, phiOUT);
}
generated quantities{
vector[n] log_lik;
for (j in 1:n) {
log_lik[j] = neg_binomial_2_log_lpmf(yCDC[j] | loglambdaIN, phiIN) + neg_binomial_2_log_lpmf(yHLC[j] | loglambdaOUT, phiOUT);
}
}