Checking if sampling uses initial values?

Guido_Biele · March 9, 2024, 1:07pm

Hello,

I am using pathfinder to find initialisation values for sampling, but the warmup phase seems still very slow. Now I am wondering if I am doing something wrong and the user-submitted initial value are not used.
From the cmdstanr documentation it seems that initial values are saved if one does not submit initial values?

Is there another way for me to check which initial values were used?

Ven_Popov · March 9, 2024, 1:26pm

library(cmdstanr)

scode <- "
data {
  int N;
  array[N] real y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  target += normal_lpdf(y | mu, sigma);
}
"

m <- cmdstan_model(write_stan_file(scode))
data <- list(N = 1000, y = rnorm(1000, mean = 3, sd = 2))

fit <- m$sample(data = data, 
                chains = 1, 
                init = list(list(mu = 20, sigma = 5)))
fit$init()

> fit$init()
[[1]]
[[1]]$mu
[1] 20

[[1]]$sigma
[1] 5

You could also save the warmup draws and check the values for the initial few draws:

fit <- m$sample(data = data, 
                chains = 1, 
                init = list(list(mu = 20, sigma = 5)),
                save_warmup = TRUE)
fit$draws(inc_warmup = TRUE)[,,c('mu')]
# A draws_array: 2000 iterations, 1 chains, and 1 variables
, , variable = mu

         chain
iteration  1
        1 20
        2 20
        3 20
        4 20
        5 20

Ven_Popov · March 9, 2024, 1:33pm

As for the speed of the warmup - that depends not only on the initial values, but for complex models much more on adapting the inverse mass matrix (MCMC Sampling). So if the parameters posteriors have very different variances, warmup will be slow

avehtari · March 9, 2024, 6:43pm

With the current Stan implementation it is quite unlikely to see much difference in warmup speed except for very challenging posteriors or if the default initialization would start in the region, where e.g. ODE solver would struggle.

Stan warmup uses a fixed number of iterations and the default number of iterations has been chosen to be safe and conservative. It would be possible to use adaptive warmup length as demonstrated in [1905.11916] Selecting the Metric in Hamiltonian Monte Carlo and New adaptive warmup proposal (looking for feedback)!, and with these you would more often see faster warmup given Pathfinder initialization. Unfortunately, these adaptive warmup methods have not yet been implemented in Stan.

In addition that Pathfinder is useful for quickly testing your model code as you can check that quick result looks somewhat reasoanble, you may get better mass matrix and step size adaptation with very short warmup. For example, in case of one nasty posterior iter_warmup=100, iter_sampling=100 takes already 5mins per chain, but we can get much better ESSs with Pathfinder inits (running Pathfinder takes 5s). Look at the rhat’s and ess’s. First the posterior summary with default inits:

> print(f1, digits=1)
     variable    mean  median    sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__         -7509.0 -7426.0 202.0 53.0 -7945.8 -7350.8  1.4        9       18
 b[1]            -1.2    -1.2   0.1  0.1    -1.3    -1.0  1.0      544      258
 b[2]            -2.4    -2.4   0.1  0.2    -2.6    -2.1  1.1       52      295
 b[3]            -0.8    -0.8   0.1  0.1    -1.0    -0.7  1.0      452      374
 b[4]            -1.3    -1.3   0.1  0.1    -1.4    -1.2  1.0      464      348
 Intercept[1]     0.2     0.2   0.1  0.1     0.1     0.4  1.1       32      115
 Intercept[2]     2.4     2.4   0.1  0.1     2.2     2.6  1.2       18       38
 sd_1[1]          1.6     1.9   0.8  0.2     0.0     2.2  1.4       10       22
 sd_1[2]          1.3     1.3   0.1  0.1     1.1     1.5  1.2       17       48
 sd_1[3]          1.9     1.9   0.2  0.1     1.6     2.2  1.2       14       52

 # showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

and then the posterior summary with Pathfinder inits:

> print(f1i, digits=1)
     variable    mean  median   sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__         -7404.4 -7409.1 43.0 42.7 -7468.8 -7335.9  1.0       94      111
 b[1]            -1.2    -1.2  0.1  0.1    -1.3    -1.0  1.0      397      341
 b[2]            -2.4    -2.4  0.1  0.1    -2.7    -2.2  1.0      424      352
 b[3]            -0.8    -0.8  0.1  0.1    -1.0    -0.7  1.0      388      255
 b[4]            -1.3    -1.3  0.1  0.1    -1.4    -1.1  1.0      506      400
 Intercept[1]     0.2     0.2  0.1  0.1     0.1     0.4  1.0      480      297
 Intercept[2]     2.5     2.5  0.1  0.1     2.3     2.6  1.0      421      305
 sd_1[1]          2.0     2.0  0.1  0.1     1.8     2.2  1.0      216      397
 sd_1[2]          1.3     1.2  0.1  0.1     1.1     1.4  1.0      190      270
 sd_1[3]          2.0     2.0  0.1  0.1     1.7     2.2  1.1       95      178

 # showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

But if in this same case the default 1000 warmup iterations is used, there is no visible benefit from Pathfinder inits as the default, which is good in that sense that the default 1000 warmup iteration approach is not sensitive to the initial values.

Guido_Biele · March 10, 2024, 10:51am

Thanks Aki,

I think tbe posterior of my model might be quite complex (I am estimating a multivariate regressions model with imputarion of multiple variables) and thought after reading your birthday case study that i could speed up things.

Pathfinder was/is indeed very useful for model testing!

Topic		Replies	Views
Using Pathfinder or other method to set initial values for sampling Modeling	32	1200	July 19, 2024
Provinding Pathfinder with Initial Values Modeling	2	147	March 29, 2024
Initializing larger models using samples from smaller models Interfaces techniques , cmdstanr	1	439	August 24, 2023
Stan warm-up/initialization phase CmdStan	1	500	April 30, 2020
Initialization CmdStan fitting-issues , cmdstanr	1	400	October 19, 2023

Checking if sampling uses initial values?

Related topics