# Loo: High Pareto k diagnostic values for beta binomial regression

I am trying to use loo to see if I could use a linear instead of a logit link function for a beta-binomial regression.

However, if I simulate data consistent with the generative model in the Stan code, I can recover the regression weights, but consistently get this warning message when using loo

Some Pareto k diagnostic values are too high. See help(‘pareto-k-diagnostic’) for details.

because all Pareto k diagnostic values are between 1 and Inf.

The code to reproduce this is below.

I would be grateful if somebody could have a look and explain what is going on (or point out an error in my code/thinking).

Simulate data in R:

``````library(boot)
library(VGAM)
library(rstan)
library(loo)
library(HDInterval)
options(mc.cores = 4)

set.seed(1234)

N = 100
K = 4
phi = 50
X = matrix(rnorm(N*K),ncol = K)
b = matrix(rnorm(K),ncol = 1)/2
p = inv.logit(-.8 + X %*% b)
Y = rbetabinom.ab(N,size = 22, shape1 = p*phi, shape2 = (1-p)*phi)
hist(Y,breaks = (0:23)-.5)
standata = list(N = N,
K = K,
X = X,
Y = Y)
``````

``````data {
int<lower=1> N;  // number of observations
int Y[N];        // response variable
int<lower=1> K;  // number of predictors
matrix[N, K] X;  // design matrix
}

parameters {
vector[K] b;
real<lower=0, upper=1> theta;
real alpha;
}

transformed parameters {
real<lower = 0> phi = (1-theta)/theta;
vector[N] p = inv_logit(X * b + alpha);
}

model {
b ~ normal(0,2);
alpha ~ normal(0,2);
target += beta_binomial_lpmf(Y | 22, p * phi, (1-p) * phi);
}

generated quantities {
int y_rep[N];
real log_lik[N];
for (n in 1:N) {
y_rep[n] = beta_binomial_rng(22, p[n] * phi, (1-p[n]) * phi);
log_lik[n] = beta_binomial_lpmf(Y | 22, p[n] * phi, (1-p[n]) * phi);
}
}
``````

Sample, compare true and estimated b, try loo

``````sm_logit = stan_model("bebi_logit.stan")
sf_logit = sampling(sm_logit,
data = standata,
seed = 123)

## plot b with 90% HDIs of estimated b ##
b_hdis = hdi(extract(sf_logit,"b")\$b,
credMass = .90)
plot(0, type = "n",
ylim = c(range(b_hdis)),
xlim = c(1,4),
xaxt = "n",ylab = "", xlab = "")
axis(1,1:4)
segments(x0 = 1:4,
y0 = b_hdis[1,],
y1 = b_hdis[,2])
points(1:4,b, pch = 16, cex = 2, col = "green4")

loo(extract_log_lik(sf_logit))
``````

should be

``````log_lik[n] = beta_binomial_lpmf(Y[n] | 22, p[n] * phi, (1-p[n]) * phi);
``````

The way I figured where the problem is that I checked the log_lik values. For discrete model with uniform probability for 22 categories lpmf would be \log(1/22)\approx -3.1, but lpmf’s from your code were around -461, so the were clear two orders of magnitude error in log scale! With the fixed code mean lpmf is about -2.3>-3.1, ie the model is beating the uniform distribution. loo works fine.

1 Like

Thanks!