The current ADVI implementation can sometimes (stochastically) fail to fit even the simplest models. I think this might indicate that some defaults for algorithm parameters are suspicious.
Context: I am helping @hyunji.moon build an example of using SBC with ADVI for the SBC package. This means that we need to do a lot of ADVI runs. Some of the runs stochastically result in an error instead of giving us output, even for a simple model.
Here’s a simple reproducible example with the latest cmdstanr and CmdStan 2.27.0:
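library(cmdstanr)

# Stan program: normal likelihood with unknown location and scale
tn <- "
data {
  int N;
  vector[N] y;
}
parameters {
  real loc;
  real<lower = 0> scale;
}
model {
  loc ~ normal(0, 1);
  scale ~ lognormal(0, 1);
  y ~ normal(loc, scale);
}
"
simple_mod <- cmdstan_model(stan_file = write_stan_file(tn))

# Simulate 20 observations from normal(1, 1)
set.seed(48752)
loc <- 1
scale <- 1
data <- list(
  N = 20,
  y = rnorm(20, loc, scale)
)

# Fit the same data 100 times with default ADVI settings and count failed runs
n_errors <- 0
sink("all_outputs.txt", type = "output")  # keep the console clean
for (i in 1:100) {
  fit <- simple_mod$variational(data = data)
  # a non-zero return code means the run ended with an error
  if (all(fit$return_codes() != 0)) {
    n_errors <- n_errors + 1
  }
}
sink(NULL)
cat("Total errors: ", n_errors, "\n")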
I get 4 errors. The exact number of errors varies slightly with seed, but there are almost always some. All the error messages are:
Chain 1 stan::variational::normal_meanfield::calc_grad:
The number of dropped evaluations has reached its maximum amount (10).
Your model may be either severely ill-conditioned or misspecified.
This doesn’t appear to be really documented, but looking at the code, the number of dropped evaluations that triggers the message depends on the grad_samples parameter. And indeed, when I add grad_samples = 20 to the $variational call, I quite reliably get no errors.
To reliably get no errors across 1000 fits, I need to ramp up grad_samples even higher…
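For anyone who wants to repeat the comparison, here is a minimal sketch of how the counting loop could be wrapped so that extra arguments such as grad_samples are forwarded to $variational. Note that count_advi_failures is a hypothetical helper name of mine, not part of cmdstanr, and it assumes (as in the example above) that a failed run shows up as a non-zero value in fit$return_codes().

```r
# Hypothetical helper (not part of cmdstanr): run ADVI n_runs times on the
# same data and count the runs that end with a non-zero return code.
# Anything passed via ... is forwarded to $variational(),
# e.g. grad_samples = 20.
count_advi_failures <- function(mod, data, n_runs = 100, ...) {
  n_errors <- 0
  for (i in seq_len(n_runs)) {
    fit <- mod$variational(data = data, ...)
    if (any(fit$return_codes() != 0)) {
      n_errors <- n_errors + 1
    }
  }
  n_errors
}

# Usage with simple_mod and data from the example above (not run here):
# count_advi_failures(simple_mod, data, n_runs = 100)
# count_advi_failures(simple_mod, data, n_runs = 100, grad_samples = 20)
```

The two commented calls correspond to the default run and the grad_samples = 20 run described above; the counts will vary from run to run.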
I don’t understand the internals of ADVI very well, but if even such a simple model has a non-negligible (a few percent) chance of failing with default settings, maybe the defaults should be made more conservative? Or is there a downside, other than performance, to setting grad_samples larger? Or is there something else one should do to make the ADVI results a bit more stable?
I’ve noted two previous mentions of the error: Stan::variational::normal_meanfield::calc_grad - can be falsely driven by tranformed parameters? and "stan::variational::normal_meanfield::calc_grad: The number of dropped evaluations has reached its maximum amount (10)." Is 10 a reasonable number? In both cases the recommendation was to make the model better behaved. However, I think that in the case I present here, the model is about as well behaved as you can get.