I am trying to use loo to see if I could use a linear instead of a logit link function for a beta-binomial regression.
However, if I simulate data consistent with the generative model in the Stan code, I can recover the regression weights, but consistently get this warning message when using loo
Some Pareto k diagnostic values are too high. See help(‘pareto-k-diagnostic’) for details.
because all Pareto k diagnostic values are between 1 and Inf.
The code to reproduce this is below.
I would be grateful if somebody could have a look and explain what is going on (or point out an error in my code/thinking).
Simulate data in R:
library(boot)
library(VGAM)
library(rstan)
library(loo)
library(HDInterval)
options(mc.cores = 4)
set.seed(1234)
N = 100
K = 4
phi = 50
X = matrix(rnorm(N*K),ncol = K)
b = matrix(rnorm(K),ncol = 1)/2
p = inv.logit(-.8 + X %*% b)
Y = rbetabinom.ab(N,size = 22, shape1 = p*phi, shape2 = (1-p)*phi)
hist(Y,breaks = (0:23)-.5)
standata = list(N = N,
K = K,
X = X,
Y = Y)
Stan model (logit link):
data {
int<lower=1> N; // number of observations
int Y[N]; // response variable
int<lower=1> K; // number of predictors
matrix[N, K] X; // design matrix
}
parameters {
vector[K] b;
real<lower=0, upper=1> theta;
real alpha;
}
transformed parameters {
real<lower = 0> phi = (1-theta)/theta;
vector[N] p = inv_logit(X * b + alpha);
}
model {
b ~ normal(0,2);
alpha ~ normal(0,2);
target += beta_binomial_lpmf(Y | 22, p * phi, (1-p) * phi);
}
generated quantities {
int y_rep[N];
real log_lik[N];
for (n in 1:N) {
y_rep[n] = beta_binomial_rng(22, p[n] * phi, (1-p[n]) * phi);
log_lik[n] = beta_binomial_lpmf(Y | 22, p[n] * phi, (1-p[n]) * phi);
}
}
Sample, compare true and estimated b, try loo
sm_logit = stan_model("bebi_logit.stan")
sf_logit = sampling(sm_logit,
data = standata,
seed = 123)
## plot b with 90% HDIs of estimated b ##
b_hdis = hdi(extract(sf_logit,"b")$b,
credMass = .90)
plot(0, type = "n",
ylim = c(range(b_hdis)),
xlim = c(1,4),
xaxt = "n",ylab = "", xlab = "")
axis(1,1:4)
segments(x0 = 1:4,
y0 = b_hdis[1,],
y1 = b_hdis[,2])
points(1:4,b, pch = 16, cex = 2, col = "green4")
loo(extract_log_lik(sf_logit))