Trouble using loo::psis for importance sampling

I’m having some trouble making the loo::psis function do what I think it’s supposed to do when using it to build an importance sampler. (NB: Yes. I know this is not the intended use case, but I can’t see from the code why it wouldn’t work, so I’m clearly missing something.)

My expectation for the following code is that it takes the log ratios log_r and produces the smoothed log weights log_w. If I ask for unnormalized weights, I expect all of these weights to be the same except for the last attr(out,"tail_len") (which is 300 in this case) which should be the modified weights according to the fitted pareto distribution.

Hence if I fit a regression of the log-weights vs the log-ratios, I expect to get an intercept of around zero and a slope of around 1.This did not happen. Instead if I run lm on the samples I get

Call:
lm(formula = log_w ~ log_r)

Coefficients:
(Intercept)        log_r  
    -5.2179       0.9997  

I’m clearly missing something here, but that -5.21 doesn’t make sense to me. If I ask for normalized weights I get an intercept of -9.17, which also doesn’t make sense.

Any idea what’s going on? (Specifically @avehtari but someone else may know)

Reproducible code and session info below:

library(loo)
set.seed(9183)
S = 10000
x = rexp(S,rate=3)
log_r = dexp(x = x,rate=1,log=TRUE) - dexp(x=x,rate=3,log=TRUE) 
out = loo::psis(log_r,r_eff=1)
log_w = weights(out,log=TRUE,normalize=FALSE)  
lm(log_w~log_r)
mean(exp(log_r)) #[1] 0.9603866
mean(exp(log_w)) #[1] 0.005216566

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] loo_2.1.0

loaded via a namespace (and not attached):
[1] compiler_3.5.1     parallel_3.5.1     tools_3.5.1       
[4] yaml_2.1.19        matrixStats_0.54.0

To make computations more stable we subtract the maximum log_r before exponentiating required by Pareto fit. In this case

> max(log_r)  #[1] 5.217747

We don’t add that back, but you can get what you want by

log_w = weights(out,log=TRUE,normalize=FALSE)  + max(log_r)
lm(log_w~log_r)
mean(exp(log_r)) #[1] 0.9603866
mean(exp(log_w)) #[1] 0.9625504

Normalized weights sum to 1.

log_w = weights(out,log=TRUE,normalize=TRUE)  
sum(exp(log_r)) #[1] 9603.866
sum(exp(log_w)) #[1] 1
3 Likes