Hello,
I am using pathfinder to find initialisation values for sampling, but the warmup phase seems still very slow. Now I am wondering if I am doing something wrong and the user-submitted initial value are not used.
From the cmdstanr documentation it seems that initial values are saved if one does not submit initial values?
Is there another way for me to check which initial values were used?
library(cmdstanr)
scode <- "
data {
int N;
array[N] real y;
}
parameters {
real mu;
real<lower=0> sigma;
}
model {
target += normal_lpdf(y | mu, sigma);
}
"
m <- cmdstan_model(write_stan_file(scode))
data <- list(N = 1000, y = rnorm(1000, mean = 3, sd = 2))
fit <- m$sample(data = data,
chains = 1,
init = list(list(mu = 20, sigma = 5)))
fit$init()
> fit$init()
[[1]]
[[1]]$mu
[1] 20
[[1]]$sigma
[1] 5
You could also save the warmup draws and check the values for the initial few draws:
fit <- m$sample(data = data,
chains = 1,
init = list(list(mu = 20, sigma = 5)),
save_warmup = TRUE)
fit$draws(inc_warmup = TRUE)[,,c('mu')]
# A draws_array: 2000 iterations, 1 chains, and 1 variables
, , variable = mu
chain
iteration 1
1 20
2 20
3 20
4 20
5 20
1 Like
As for the speed of the warmup - that depends not only on the initial values, but for complex models much more on adapting the inverse mass matrix (MCMC Sampling). So if the parameters posteriors have very different variances, warmup will be slow
1 Like
With the current Stan implementation it is quite unlikely to see much difference in warmup speed except for very challenging posteriors or if the default initialization would start in the region, where e.g. ODE solver would struggle.
Stan warmup uses a fixed number of iterations and the default number of iterations has been chosen to be safe and conservative. It would be possible to use adaptive warmup length as demonstrated in [1905.11916] Selecting the Metric in Hamiltonian Monte Carlo and New adaptive warmup proposal (looking for feedback)!, and with these you would more often see faster warmup given Pathfinder initialization. Unfortunately, these adaptive warmup methods have not yet been implemented in Stan.
In addition that Pathfinder is useful for quickly testing your model code as you can check that quick result looks somewhat reasoanble, you may get better mass matrix and step size adaptation with very short warmup. For example, in case of one nasty posterior iter_warmup=100, iter_sampling=100
takes already 5mins per chain, but we can get much better ESSs with Pathfinder inits (running Pathfinder takes 5s). Look at the rhat’s and ess’s. First the posterior summary with default inits:
> print(f1, digits=1)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -7509.0 -7426.0 202.0 53.0 -7945.8 -7350.8 1.4 9 18
b[1] -1.2 -1.2 0.1 0.1 -1.3 -1.0 1.0 544 258
b[2] -2.4 -2.4 0.1 0.2 -2.6 -2.1 1.1 52 295
b[3] -0.8 -0.8 0.1 0.1 -1.0 -0.7 1.0 452 374
b[4] -1.3 -1.3 0.1 0.1 -1.4 -1.2 1.0 464 348
Intercept[1] 0.2 0.2 0.1 0.1 0.1 0.4 1.1 32 115
Intercept[2] 2.4 2.4 0.1 0.1 2.2 2.6 1.2 18 38
sd_1[1] 1.6 1.9 0.8 0.2 0.0 2.2 1.4 10 22
sd_1[2] 1.3 1.3 0.1 0.1 1.1 1.5 1.2 17 48
sd_1[3] 1.9 1.9 0.2 0.1 1.6 2.2 1.2 14 52
# showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
and then the posterior summary with Pathfinder inits:
> print(f1i, digits=1)
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ -7404.4 -7409.1 43.0 42.7 -7468.8 -7335.9 1.0 94 111
b[1] -1.2 -1.2 0.1 0.1 -1.3 -1.0 1.0 397 341
b[2] -2.4 -2.4 0.1 0.1 -2.7 -2.2 1.0 424 352
b[3] -0.8 -0.8 0.1 0.1 -1.0 -0.7 1.0 388 255
b[4] -1.3 -1.3 0.1 0.1 -1.4 -1.1 1.0 506 400
Intercept[1] 0.2 0.2 0.1 0.1 0.1 0.4 1.0 480 297
Intercept[2] 2.5 2.5 0.1 0.1 2.3 2.6 1.0 421 305
sd_1[1] 2.0 2.0 0.1 0.1 1.8 2.2 1.0 216 397
sd_1[2] 1.3 1.2 0.1 0.1 1.1 1.4 1.0 190 270
sd_1[3] 2.0 2.0 0.1 0.1 1.7 2.2 1.1 95 178
# showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
But if in this same case the default 1000 warmup iterations is used, there is no visible benefit from Pathfinder inits as the default, which is good in that sense that the default 1000 warmup iteration approach is not sensitive to the initial values.
2 Likes
Thanks Aki,
I think tbe posterior of my model might be quite complex (I am estimating a multivariate regressions model with imputarion of multiple variables) and thought after reading your birthday case study that i could speed up things.
Pathfinder was/is indeed very useful for model testing!