Mixis and E_loo

hello

E_loo uses the weights generated by loo to estimate the leave-one-out mean (and quantiles). However, “A small Pareto k estimate is necessary, but not sufficient, for E_loo() to give reliable estimates.” Mixis is more robust for estimating elpd. Can the log-weights generated by mixis be used to improve the estimate of the leave-one-out mean?
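
For reference, my understanding is that E_loo computes a self-normalized importance-weighted mean for each observation, roughly as in this sketch (not the internal implementation; psis_object and pospred are defined in the code below, and w and eloo_manual are just names used here):

w<-weights(psis_object,log=FALSE)  #normalized PSIS weights, draws x observations
eloo_manual<-colSums(w*pospred)    #importance-weighted mean of the epred draws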

Below is a simple Gaussian example with synthetic data including one clear outlier. With eloov denoting the vector of mean values obtained from E_loo, I compare the true (brute force) expected value at the outlier, eloov at the outlier, and a proposed way of using mixis to give another estimate, which is better at the outlier in this example. Is this a valid method for this purpose? Is there a better way? Does it perform better than eloov in general?

thanks

Greg

R

library(brms)
library(rstan)
rstan_options(auto_write = TRUE)
library(cmdstanr)
library(posterior) #for as_draws_matrix()
options(mc.cores=2, brms.normalize=TRUE,brms.backend="cmdstanr")
library(ggplot2)
library(loo)
library("matrixStats")

set.seed(1)
N=400
x<-runif(N)
f<-function(t) 1.8+sin(12*t)*exp(-3*t)
y<-f(x)+0.3*rnorm(N)
y[1]=-1


dat<-data.frame(x,y)
bf1<-bf(y~s(x))
mod1<-brm(bf1,data=dat,family="gaussian",adapt_delta=0.9)

LL<-log_lik(mod1)
eLL<-exp(LL)

cid<-c(rep(1,1000),rep(2,1000),rep(3,1000),rep(4,1000))
reff<-relative_eff(eLL,chain_id=cid)

loo1<-loo(LL,r_eff=reff,save_psis=TRUE)
loo1
loopt1<-loo1$pointwise
head(loopt1)

pospred<-posterior_epred(mod1)

psis_object<-loo1$psis_object
#for leave-one-out, the raw importance ratios are 1/p(y_i|theta), so log_ratios = -log_lik
log_ratios<- -1*LL
eloo<-E_loo(pospred,psis_object,type="mean",log_ratios=log_ratios)
eloov<-eloo$value

#brute force estimate

datx1<-dat[-1,]
modx1<-brm(bf1,data=datx1,family="gaussian",adapt_delta=0.9)
newd<-data.frame(x=x[1])

ppd<-posterior_predict(modx1,newdata=newd)
mean(ppd)
#1.787043
eloov[1]
#1.795551

#mixis

cat(mod1$model,file="mod1.stan")
#copy mod1.stan to mod1mix.stan
#amend the model block with the mixis code lines
#amend the generated quantities block to generate mu and log_lik
#save the amended file as mod1mix.stan

The two amended blocks now read:

model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] log_lik = rep_vector(0.0, N);
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept + Xs * bs + Zs_1_1 * s_1_1;
    for (n in 1:N) {
      log_lik[n] = normal_lpdf(Y[n] | mu[n], sigma);
    }
    target += sum(log_lik);
    // mixis: target the mixture proposal proportional to p(theta, y) * sum_n 1 / p(y_n | theta)
    target += log_sum_exp(-log_lik);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(zs_1_1);
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
  vector[N] log_lik = rep_vector(0.0, N);
  vector[N] mu = rep_vector(0.0, N);
  mu += Intercept + Xs * bs + Zs_1_1 * s_1_1;
  for (n in 1:N) {
    log_lik[n] = normal_lpdf(Y[n] | mu[n], sigma);
  }
}

#make Cmdstanr model and run it

mod1.mix<-cmdstan_model("mod1mix.stan")
class(mod1.mix)
mod1.mix$print()

stand<-standata(mod1)

mix1<-mod1.mix$sample(data=stand,chains=4,parallel_chains=2,refresh=100,
iter_warmup=1000,iter_sampling=1000,save_warmup=TRUE,max_treedepth=13,adapt_delta=0.9)
mix1$save_object(file="mix1.RDS")
#mix1<-readRDS("mix1.RDS")

#elpd from mixis

draws<-mix1$draws()
drawmat<-as_draws_matrix(draws)
dim(drawmat)

grep("log_lik",colnames(drawmat))[c(1,N)]
dwLL<-drawmat[,24:423]
dim(dwLL)
colnames(dwLL)[c(1,N)]

#per-draw common mixture term: l_common_mix[s] = log sum_n 1/p(y_n | theta_s)
l_common_mix <- rowLogSumExps(-dwLL)
#per-observation log importance ratios under the mixture proposal
log_weights <- -dwLL - l_common_mix
#mixture-IS estimate of elpd for each observation
elpd_mixis <- logSumExp(-l_common_mix) - rowLogSumExps(t(log_weights))
saveRDS(elpd_mixis,file="elpd_mixis.rds")

#compare elpd_loo
elpd_loo<-loopt1[,1]
plot(elpd_mixis,elpd_loo)

#proposed method for E_mix

lw<-log_weights
dim(lw)
#make normalised weights
#log normalising constant for each observation (column)
lse<-apply(lw,2,logSumExp)
#4000 = total number of post-warmup draws (4 chains x 1000)
lseo<-outer(rep(1,4000),lse)
lwn<-lw-lseo
w<-exp(lwn)
dim(w)
range(w)
apply(w,2,sum)[1:10]
#columns 424:823 hold mu[1]..mu[N] in this run
dwmu<-drawmat[,424:823]
dim(dwmu)
colnames(dwmu)[c(1,N)]

#weighted mean of the mu draws for each observation
emix<-apply(dwmu*w,2,sum)
plot(emix,eloov)

mean(ppd)
#1.787043
eloov[1]
#1.795551
emix[1]
#1.783195

Thanks for bringing this up; it turns out we forgot to update the documentation. The text continues with

Additional diagnostic checks for gauging the reliability of the estimates are in development and will be added in a future release.

and these additional diagnostics have been added!

The current version computes an h-specific Pareto k (see the Pareto Smoothed Importance Sampling paper), and if you don't get warnings when running E_loo(), that h-specific Pareto k is also small enough.
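
For example, these h-specific khat values are returned by E_loo() and can be checked directly (using the eloo object from the original post):

#h-specific Pareto k diagnostics, one per observation
eloo$pareto_k[1]
range(eloo$pareto_k)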

Yes.

Using Mixture-IS is valid and can be expected to perform better, but it can also fail. Unfortunately, I don’t have time to check your code in detail right now.
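
A quick way to probe for such failures in this example (just a sketch of one possible check, not something verified for this thread): run the per-observation mixture log-weights through psis() and look at the resulting Pareto k diagnostics.

psis_mix<-loo::psis(log_weights,r_eff=NA)
range(psis_mix$diagnostics$pareto_k)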

Moment matching might work, too.
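
A minimal sketch of what that could look like with brms (not run for this example; whether the stored PSIS object can then be passed to E_loo() together with the original draws would still need checking):

loo_mm<-loo(mod1,moment_match=TRUE,save_psis=TRUE)
loo_mm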

I made a PR

thanks very much for replying.

The key bits of the code are at the end. The idea is to take the log_weights generated during mixis, normalise and exponentiate them, and apply the normalised weights to the mu values from the generated quantities block included when running the mixis. It improved the estimate in this example; hopefully someone has tested it out more generally, or could do? This is analogous to how E_loo generates the mean by applying the loo weights, but the mixis weights come from a different context: they are the probability that a value sampled from the mixture actually arose from leaving out the point in question.
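
Packaged as a small helper (purely a restatement of the code above; e_mix is just a name used here):

#normalise the mixis log-weights per observation, exponentiate, and take the
#weighted mean of the corresponding draws (here the mu columns)
e_mix<-function(log_weights,h_draws){
  lwn<-sweep(log_weights,2,apply(log_weights,2,matrixStats::logSumExp))
  colSums(exp(lwn)*h_draws)
}
#emix<-e_mix(log_weights,dwmu)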

thanks

Greg