I ran a simulation study comparing two models, and their fits come out essentially equivalent in terms of LOO. However, when I fit both models to real data, the comparison doesn't line up as cleanly and I get some high Pareto k values.
I think this is because there are some outliers in the data, but those data points are not mistakes, so I shouldn't remove them. My assumption is that with this kind of LOO result I shouldn't be comparing models with it. I then looked into K-fold cross-validation, since I saw it is an option that can handle more extreme data points, but I am struggling to set it up for my data.
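For reference, this is roughly how I am computing LOO and checking the Pareto k diagnostics (fit_ecmp and fit_simple are just placeholder names for my two cmdstanr fits, and I am assuming 4 chains here):

library(cmdstanr)
library(loo)

ll <- fit_ecmp$draws("log_lik", format = "matrix")
r_eff <- relative_eff(exp(ll), chain_id = rep(1:4, each = nrow(ll) / 4))  # assumes 4 chains
loo_ecmp <- loo(ll, r_eff = r_eff)
print(loo_ecmp)                          # shows the Pareto k diagnostic table
pareto_k_ids(loo_ecmp, threshold = 0.7)  # indices of the problematic observations
# loo_compare(loo_simple, loo_ecmp)      # the comparison that looks off on the real data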
In short, my model is a count model with a linear predictor of the form theta[j] + beta[i] + gamma[k], where theta is a person parameter, beta are individual item effects, and gamma are item-type effects. The real data have 1140 people and 16 items, with 3-5 items per item type (3 types in total).
My cmdstan code for the more complex model is as follows:
mod_ecmp <- write_stan_file('
functions {
  // asymptotic approximation to the log normalising constant of the
  // Conway-Maxwell-Poisson distribution (first 4 correction terms)
  real approximation(real log_lambda, real nu) {
    real log_mu = log_lambda / nu;
    real nu_mu = nu * exp(log_mu);
    real nu2 = nu^2;
    // first 4 terms of the residual series
    real log_sum_resid = log1p(
      nu_mu^(-1) * (nu2 - 1) / 24 +
      nu_mu^(-2) * (nu2 - 1) / 1152 * (nu2 + 23) +
      nu_mu^(-3) * (nu2 - 1) / 414720 * (5 * nu2^2 - 298 * nu2 + 11237) +
      nu_mu^(-4) * (nu2 - 1) / 39813120 * (5 * nu2^3 - 1887 * nu2^2 - 241041 * nu2 + 2482411)
    );
    return nu_mu + log_sum_resid -
           ((log(2 * pi()) + log_mu) * (nu - 1) / 2 + log(nu) / 2);
  }

  // brute-force summation of the log normalising constant;
  // stops once the running log-sum has converged
  real summation(real log_lambda, real nu) {
    real z = negative_infinity();
    real z_last = 0;
    real t = 0;
    for (j in 0:1000000) {
      t = t + 1;
      z_last = z;
      z = log_sum_exp(z, j * log_lambda - nu * lgamma(j + 1));
      if (abs(z - z_last) < 1e-05) {
        break;
      }
      if (t > 999998) {
        reject("max terms hit, prob bad value");
      }
    }
    return z;
  }

  // partial log-likelihood for reduce_sum; slicing is over seq_N, so the
  // full y, jj, ii, kk arrays are indexed directly by the global index n
  real partial_sum(array[] int slice_n, int start, int end,
                   array[] int y_slice,
                   array[] int jj_slice, array[] int ii_slice, vector theta,
                   vector beta, vector nu, vector alpha, array[] int kk_slice, vector gamma) {
    real partial_target = 0.0;
    for (n in start : end) {
      real log_lambda = (alpha[ii_slice[n]] * theta[jj_slice[n]] + beta[ii_slice[n]] + gamma[kk_slice[n]]) * nu[ii_slice[n]];
      real log_prob = 0;
      // use the asymptotic approximation only in the region where it is
      // accurate, otherwise fall back on direct summation
      if ((log_lambda > 8.25 && nu[ii_slice[n]] > 4 && log_lambda < 40) ||
          (log_lambda > 1 && log_lambda < 11 && nu[ii_slice[n]] > 1.5 && nu[ii_slice[n]] < 3.1)) {
        log_prob = y_slice[n] * log_lambda -
                   nu[ii_slice[n]] * lgamma(y_slice[n] + 1) -
                   approximation(log_lambda, nu[ii_slice[n]]);
      } else {
        log_prob = y_slice[n] * log_lambda -
                   nu[ii_slice[n]] * lgamma(y_slice[n] + 1) -
                   summation(log_lambda, nu[ii_slice[n]]);
      }
      partial_target += log_prob;
    }
    return partial_target;
  }
}
data {
  int<lower=0> I;
  int<lower=0> J;
  int<lower=1> N;
  int<lower=1> K;
  array[N] int<lower=1, upper=I> ii;
  array[N] int<lower=1, upper=J> jj;
  array[N] int<lower=1, upper=K> kk;
  array[I] int<lower=1, upper=K> item_type_for_beta;
  array[N] int<lower=0> y;
  array[N] int seq_N;
  int<lower=0> grainsize;
}
parameters {
  vector[J] theta;
  vector[K] gamma;
  vector[I] beta;
  vector<lower=0>[I] alpha;
  real<lower=0, upper=3> sigma_alpha;
  vector<lower=0.0001, upper=4>[I] nu; //.2
  real<lower=0, upper=5> sigma_nu;
  vector<lower=0, upper=3>[K] sigma_beta_k;
  real<lower=0, upper=5> sigma_gamma;
  real<lower=0, upper=5> mu_gamma;
}
model {
  sigma_nu ~ cauchy(0, 5);
  nu ~ lognormal(0, sigma_nu);
  theta ~ normal(0, .3);
  alpha ~ lognormal(0, sigma_alpha);
  sigma_alpha ~ cauchy(0, 5);
  gamma ~ normal(mu_gamma, sigma_gamma);
  mu_gamma ~ normal(0, 5);
  sigma_gamma ~ cauchy(0, 5);
  sigma_beta_k ~ cauchy(0, 5);
  for (i in 1:I) {
    beta[i] ~ normal(gamma[item_type_for_beta[i]], sigma_beta_k[item_type_for_beta[i]]);
  }
  target += reduce_sum(partial_sum, seq_N, grainsize, y, jj, ii, theta,
                       beta, nu, alpha, kk, gamma);
}
generated quantities {
  array[N] real log_lik;
  for (n in 1:N) {
    real log_lambda = nu[ii[n]] * (alpha[ii[n]] * theta[jj[n]] + beta[ii[n]] + gamma[kk[n]]);
    if ((log_lambda > 8.25 && nu[ii[n]] > 4 && log_lambda < 40) ||
        (log_lambda > 1 && log_lambda < 11 && nu[ii[n]] > 1.5 && nu[ii[n]] < 3.1)) {
      log_lik[n] = y[n] * log_lambda - nu[ii[n]] * lgamma(y[n] + 1) - approximation(log_lambda, nu[ii[n]]);
    } else {
      log_lik[n] = y[n] * log_lambda - nu[ii[n]] * lgamma(y[n] + 1) - summation(log_lambda, nu[ii[n]]);
    }
  }
}
')
where, as above, theta are the person parameters, beta the individual item effects, gamma the item-type effects, alpha the item slopes on theta, and nu the item-level dispersion parameters.
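The data list I pass in looks roughly like this, where d stands for my long-format data frame (one row per person-item response; the column names and grainsize are just placeholders):

stan_data <- list(
  I = 16, J = 1140, K = 3,
  N = nrow(d),
  ii = d$item,                                            # item index, 1-16
  jj = d$person,                                          # person index, 1-1140
  kk = d$item_type,                                       # item-type index, 1-3
  item_type_for_beta = d$item_type[match(1:16, d$item)],  # type of each item
  y = d$count,                                            # observed counts
  seq_N = 1:nrow(d),                                      # index array that reduce_sum slices over
  grainsize = 1                                           # placeholder grainsize
)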
How can I do a K-fold analysis with this data? Is it as simple as taking observations and putting them into folds, or does the structure of my data change how I should go about this? I would also like to explore mixture IS leave-one-out cross-validation (the approach described in the loo vignette "Mixture IS leave-one-out cross-validation for high-dimensional Bayesian models"). However, I am not sure I am able to do that while using reduce_sum, which may cause issues with runtime (I could be wrong about the implementation here, though).
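To make the K-fold part concrete, here is a rough sketch of what I had in mind, using the fold helpers from the loo package. None of this is working yet: mod is the compiled model above, stan_data is the data list from before, and holdout is an extra data flag I assume I would have to add to the Stan program.

library(loo)

K_folds <- 10
# option 1: random folds over the N observations
folds <- kfold_split_random(K = K_folds, N = stan_data$N)
# option 2: keep all of a person's observations in the same fold
# folds <- kfold_split_grouped(K = K_folds, x = stan_data$jj)

log_lik_heldout <- NULL
for (k in 1:K_folds) {
  holdout_k <- as.integer(folds == k)
  # holdout is an assumed extra data flag: rows with holdout == 1 would be
  # skipped in the reduce_sum likelihood but still get log_lik in
  # generated quantities
  fit_k <- mod$sample(data = c(stan_data, list(holdout = holdout_k)),
                      chains = 4, parallel_chains = 4)
  ll_k <- fit_k$draws("log_lik", format = "matrix")
  if (is.null(log_lik_heldout)) {
    log_lik_heldout <- matrix(NA_real_, nrow = nrow(ll_k), ncol = stan_data$N)
  }
  log_lik_heldout[, holdout_k == 1] <- ll_k[, holdout_k == 1]
}
elpd_kfold <- elpd(log_lik_heldout)  # loo::elpd() on the held-out pointwise log-lik
print(elpd_kfold)

On the Stan side I assume I would add array[N] int<lower=0, upper=1> holdout; to the data block and only add rows with holdout == 0 to the target inside partial_sum, while keeping log_lik for every row in generated quantities, and then run both models on the same folds so the resulting elpd estimates are comparable.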
Thanks!