Hi Stanimals,
I was wondering whether anything (experiments/heuristics/theory) speaks against using an optimiser to minimise \hat{k} (for example in combination with ADVI, by automatically adjusting parameters like tol_rel and eta) instead of tuning those parameters systematically/manually (i.e. decreasing them) until \hat{k} drops below, say, 0.7 for the first time (I think I remember @avehtari mentioning this to me).
Short experiment: ignoring, for now, that there are other ways to tune the parameters below, I came up with the following toy example (not ADVI):
library(loo)
library(tidyverse)
set.seed(42)
mu_true <- pi
sigma_true <- 1/sqrt(2)
N <- 10000
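target_func <- function(x) {
  mu_trial <- x[1]
  # Draw from the proposal N(mu_trial, sigma_true); fresh draws on every call
  samps <- rnorm(N, mu_trial, sigma_true)
  # Log importance ratios: log target density minus log proposal density
  log_ratios <- dnorm(samps, mu_true, sigma_true, log = TRUE) -
    dnorm(samps, mu_trial, sigma_true, log = TRUE)
  # Pareto k-hat diagnostic of the importance ratios
  psis(log_ratios = log_ratios, r_eff = NA, cores = 2)$diagnostics$pareto_k
}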
# Repeat the optimisation 5000 times, always starting from mu_trial = 2
rslts <- purrr::map_dbl(1:5000, ~ optim(c(2), target_func, method = "BFGS")$par)
ggplot(tibble(x = rslts)) +
  geom_histogram(aes(x = x), binwidth = 0.05, fill = NA, color = "black") +
  geom_vline(xintercept = pi, color = "green")
This produces the following figure
The peak around 2 (the optimiser's starting value) might be an artefact of my not being careful with the optimiser options. Nevertheless, I wanted to check whether something else could cause minimising \hat{k} to go wrong (after all, a low \hat{k} is not a sufficient condition for a good approximation).
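In case it helps when reasoning about the toy example, the log ratios computed in target_func have a simple closed form, because target and proposal share the same variance. This is only a sketch of the algebra; the symbols r(x), \mu_{\text{true}}, \mu_{\text{trial}} and \sigma are my shorthand for the importance ratio and for mu_true, mu_trial and sigma_true in the code:

$$
\begin{aligned}
\log r(x) &= \log \mathcal{N}(x \mid \mu_{\text{true}}, \sigma^2) - \log \mathcal{N}(x \mid \mu_{\text{trial}}, \sigma^2) \\
&= \frac{(x - \mu_{\text{trial}})^2 - (x - \mu_{\text{true}})^2}{2\sigma^2} \\
&= \frac{(\mu_{\text{true}} - \mu_{\text{trial}})\,(2x - \mu_{\text{true}} - \mu_{\text{trial}})}{2\sigma^2}
\end{aligned}
$$

With \sigma = 1/\sqrt{2} the denominator equals 1, so the log ratio is linear in x with slope 2(\mu_{\text{true}} - \mu_{\text{trial}}). At \mu_{\text{trial}} = \mu_{\text{true}} the log ratios are identically zero, i.e. the importance weights are constant.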
Thank you!