Loo_moment_match failure beta_binomial example

gregd · June 15, 2024, 7:38am

Short summary of the problem
example in which loo_moment_match appears to run but does not alter the pareto values.

#E with nested factors, spline and outliers
#a simpler example - not shown - without the nested factors, does work

library(extraDistr)
library(brms)
library(cmdstanr)
options(mc.cores=2, brms.normalize=TRUE,brms.backend="cmdstanr")
library(loo)

set.seed(4)
N=1000
x<-runif(N)
fac1<-sample(LETTERS[1:20],N,replace=TRUE)
fac1<-factor(fac1)
ifac1<-as.integer(fac1)
g<-sample(c("South","West","North","East"),20,replace=TRUE)
g<-factor(g)
fac2<-g[as.integer(fac1)]
ifac2<-as.integer(fac2)

rn1<-0.5*rnorm(20)
rn2<-rnorm(4)
lp<--0.5+sin(12*x)*exp(-3*x)+rn1[ifac1]+rn2[ifac2]
plot(x,lp,col=ifac1,pch=ifac2)
h<-function(t) (1+exp(-t))^-1
mu<-h(lp)
phi=50
T=1000
alpha<-mu*phi
beta<-(1-mu)*phi
y<-rbbinom(N,T,alpha=alpha,beta=beta)
plot(x,y/T)
#now alter some y values to create outliers
oyt<-order(y/T)
wh<-oyt[998:1000]
x[wh]
y[wh]
y[wh]<-30
fac1[wh]
fac2[wh]
plot(x,y/T)
points(x[wh],y[wh]/T,pch=16,col=2)

bfE<-bf(y|trials(T)~-1+(1|fac2/fac1)+s(x),family="beta_binomial")
datE<-data.frame(x,y,T,fac1,fac2)
mE<-brm(bfE,data=datE,save_pars=save_pars(all=TRUE),file="bbtest_E")
summary(mE)

fitE<-fitted(mE)[,1]
plot(fitE/T,y/T,col=rgb(0,0,1,0.3))
points((fitE/T)[wh],(y/T)[wh],pch=16,col=2)
lines(c(0,1),c(0,1),col="darkgreen",lty=2)

looE<-loo(mE)
#three outliers
plot(looE)
looE$diag$par[wh]

loommE<-loo_moment_match(mE,looE,cores=getOption("mc.cores",2))
loommE
plot(loommE)
loommE$diag$par[wh]
loommE$pointwise[wh,]
identical(loommE$diag$par,loommE$pointwise[,5])
#runs, but does not alter the pareto values

Operating System: linux 5.15.0-91-generic Linux Mint 21.2 Victoria
brms Version: 2.21.0
loo Version: 2.7.0.9000

avehtari · June 17, 2024, 5:27pm

This can happen. Moment matching is not guaranteed to improve. You could try Mixture IS leave-one-out cross-validation for high-dimensional Bayesian models • loo, but at the moment it requires modifying the Stan model code (not yet directly supported by brms)

gregd · June 17, 2024, 6:23pm

thanks. I may try this but not immediately!

I got the impression from another post Add_criterion with moment_match=TRUE failing even when save_pars(all=TRUE) was set during model fit - #13 by mtenan that some problems with moment_match were due to failure to catch exceptions. So I thought it might get resolved at some point.

thanks

Greg

gregd · June 26, 2024, 6:42am

hi

I’ve looked at this now using mixis, as suggested.

The vignette Mixture IS leave-one-out cross-validation for high-dimensional Bayesian models • loo is helpful, but I couldn’t run it as data(voice) gave an error. But hopefully the code below is the right idea in this example.

For both mE1 and mE2 I used reloo to obtain the true values of elpd at the exceptional points, and then plotted against the pointwise loo-psis and elpd_mixis for those points. I also compared the pointwise loo-psis and elpd_mixis across the rest of the data, and globally. I compared the results from loo_compare(looE1,looE2) with the values obtained using elpd_mixis for the two models.

Although mixis requires an additional fit of the modified code, the theory (glug) seems to put mixis on a firmer footing, with guaranteed asymptotic finite variance. psis is faster but the theory breaks down when pareto-k > 0.7 and particularly when pareto-k > 1. Despite that, loo-psis and mixis give similar answers in this example.

Lastly (not shown below), running mixis in cmdstanr rather than rstan gave unexpected output - maybe my error but has anyone else had problems?

thanks

Greg

#obtain mE1,looE1,mE2,looE2 as previously

wh<-which(looE1$diag$par>0.7)
#450 634 677

relooE1<-reloo(mE1,looE1,k_threshold=0.7)
save(relooE1,file="relooE1.rds")
#load("relooE1.rds",verbose=TRUE)

library(rstan)
library(matrixStats)

scod<-stancode(mE1)
cat(scod,file="scod.txt")
#now alter and save as "scodm.stan"
#the altered portion is at the end, and is shown immediately below
#it includes the mixis line and also generates log_lik
#####
    *vector[N] log_lik;*
*    vector[N] mu = rep_vector(0.0, N);*
*    mu += Xs * bs + Zs_1_1 * s_1_1;*
*    for (n in 1:N) {*
*      // add more terms to the linear predictor*
*      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];*
*    }*
*    mu = inv_logit(mu);*
*    for (n in 1:N) {*
*      log_lik[n] = beta_binomial_lpmf(Y[n] | trials[n], mu[n] * phi, (1 - mu[n]) * phi);*
*    }*
*    target += sum(log_lik);*
*    target += log_sum_exp(-log_lik);*
*  }*
*  // priors including constants*
*  target += lprior;*
*  target += std_normal_lpdf(zs_1_1);*
*  target += std_normal_lpdf(z_1[1]);*
*  target += std_normal_lpdf(z_2[1]);*
*}*
*generated quantities {*
*  vector[N] log_lik;*
*  vector[N] mu = rep_vector(0.0, N);*
*    mu += Xs * bs + Zs_1_1 * s_1_1;*
*    for (n in 1:N) {*
*      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];*
*    }*
*    mu = inv_logit(mu);*
*  for (n in 1 : N) {*
*    log_lik[n] = beta_binomial_lpmf(Y[n] | trials[n], mu[n] * phi, (1 - mu[n]) * phi);*
*    }*
*}*
#####

fitE1ms<-stan(file="scodm.stan",data=sdatE1,cores=getOption("mc.cores",2))
save(fitE1ms,file="fitE1ms.rds")
#load("fitE1ms.rds",verbose=TRUE)

log_lik_mix <- extract(fitE1ms)$log_lik
l_common_mix <- rowLogSumExps(-log_lik_mix)
log_weights <- -log_lik_mix - l_common_mix
elpd_mixis <- logSumExp(-l_common_mix) - rowLogSumExps(t(log_weights))

plot(relooE1$pointwise[wh,1],relooE1$pointwise[wh,1],ylim=c(-77,-65),xlim=c(-77,-65),ylab="elpd")
lines(c(-80,-60),c(-80,-60),col=2)
points(relooE1$pointwise[wh,1],elpd_mixis[wh],pch=16)
points(relooE1$pointwise[wh,1],looE1$pointwise[wh],pch=16,col=2)

points(-76,-66,pch=16,col=2)
points(-76,-65.5,pch=16)
text(-75.5,-66,pos=4,label="looE1")
text(-75.5,-65.5,pos=4,label="elpd_mixis")

#as expected, elpd_mixis is (slightly) closer to the true values at the exceptional points

looE1$est
            Estimate        SE
elpd_loo -5887.67703 111.57827
p_loo       45.41424  15.40389
looic    11775.35406 223.15654

sum(elpd_mixis)
# -5887.423

1000^0.5*sd(looE1$pointwise[,1])
# 111.5783
1000^0.5*sd(elpd_mixis)
# 111.8122

plot(elpd_mixis[-wh],looE1$pointwise[-wh,1],col=rgb(0,0,1,0.5))
lines(c(-10,-5),c(-10,-5),col=2)

#with looE2, all pareto-k < 0.7
#but use reloo anyway to obtain the true values

relooE2<-reloo(mE2,looE2,k_threshold=0.65,cores=getOption("mc.cores",2),
control=list(adapt_delta=0.95))
save(relooE2,file="relooE2.rds")
#load("relooE2.rds",verbose=TRUE)

scod2<-stancode(mE2)
cat(scod2,file="scod2.txt")
#now alter analogous to scodm.stan (not shown) and save as "scod2m.stan"

sdatE2<-sdatE1

fitE2ms<-stan(file="scod2m.stan",data=sdatE2,cores=getOption("mc.cores",2),
control=list(adapt_delta=0.95))
save(fitE2ms,file="fitE2ms.rds")
#load("fitE2ms.rds",verbose=TRUE)

log_lik_mix2 <- extract(fitE2ms)$log_lik
l_common_mix2 <- rowLogSumExps(-log_lik_mix2)
log_weights2 <- -log_lik_mix2 - l_common_mix2
elpd_mixis2 <- logSumExp(-l_common_mix2) - rowLogSumExps(t(log_weights2))

plot(relooE2$pointwise[wh,1],relooE2$pointwise[wh,1],ylim=c(-45,-35),xlim=c(-45,-35),ylab="elpd")
lines(c(-45,-35),c(-45,-35),col=2)
points(relooE2$pointwise[wh,1],elpd_mixis2[wh],pch=16)
points(relooE2$pointwise[wh,1],looE2$pointwise[wh],pch=16,col=2)

points(-44,-36,pch=16,col=2)
points(-44,-35.5,pch=16)
text(-43.5,-36,pos=4,label="looE2")
text(-43.5,-35.5,pos=4,label="elpd_mixis2")

#both loo-psis and mixis give values very close to the true elpd

#use elpd_mixis and elpd_mixis2 to compare mE1 with mE2

sum(elpd_mixis-elpd_mixis2)
# 173.5925
1000^0.5*sd(elpd_mixis-elpd_mixis2)
# 54.04471

loo_compare(looE1,looE2)
    elpd_diff se_diff
mE1    0.0       0.0 
mE2 -172.5      53.8 

#very similar conclusions from model comparison by loo-psis and mixis

sum(elpd_mixis-looE1$pointwise[,1])
1000^0.5*sd(elpd_mixis-looE1$pointwise[,1])
#no significant difference between pointwise estimates

sum(elpd_mixis2-looE2$pointwise[,1])
1000^0.5*sd(elpd_mixis2-looE2$pointwise[,1])
#no significant difference between pointwise estimates

plot(elpd_mixis2[-wh],looE2$pointwise[-wh,1],col=rgb(0,0,1,0.5))
lines(c(-10,-5),c(-10,-5),col=2)

Edit by Aki: added code block backticks for easier reading of the code

avehtari · June 27, 2024, 8:46am

In practice, it may require more draws than the number of atoms in the universe, to reach that asymptotic regime. Even the Mix-IS-LOO vignette has not reached the asymptotic regime, but it helps that non-weighted result itself is not that way optimistic as the posterior lpd. There is a need for investigating a bit more about how to assess the reliability of Mix-IS-LOO (yes, I’m working on it).

That is always a good sign that the results are not unstable.

Topic		Replies	Views
Error using Loo with moment matching brms	9	1277	April 14, 2021
Error: "object 'storage' not found" when running loo with moment matching brms loo	7	1896	August 24, 2020
Difficulty comparing models with loo brms fitting-issues , loo	24	5085	October 30, 2020
Brms::loo(moment_match = T) does not appear to work General loo , brms	13	2471	February 6, 2023
Error with loo_moment_match brms loo	15	2416	August 12, 2021

Loo_moment_match failure beta_binomial example

Related topics