Simulation-based calibration case study with RStan

Thank you for letting me know. I learn a lot form you many times.

Thanks. I should’ve run it rather than guessing. I’m very bad at guessing what R functions are going to do.

Is there a reason not to use 0:4 instead of these highly spread out numbers? Do seeds that are closer somehow windup generating draws that are closer? My ignorance of PRNGs runs deep!

1 Like

If your data block has something like

transformed data {
  real mu_ = std_normal_rng();
  real sigma_ = exponential_rng();
  vector[N] y;
  for (n in 1:N) y[n] = normal_rng(mu_, sigma_);
}

Then my intuition is that if you started one chain with a seed of 0 and another with a seed of 1, then the y realizations would be the same except offset by one index position. So, that is why I went with maximal spacing of the seeds.

2 Likes

Generally pseudo random number generators don’t work like that. That would indicate that the sequence would repeat after .Machine$integer.max steps. The better prngs have much more states than .Machine$integer.max and integer seeds represent only small portion of those states and those states are very far away. Stan uses ecuyer1988, which has length of cycle 2^{61} and .Machine$integer.max is 2^{31}, so there are plenty of pseudo random numbers between each seed.

Mathworks did some carefully analysis years ago choosing their default prng, which is now Mersenne twister. Mersenne twister mt19937 is also what boost recommends as the first choice. It has length of cycle 2^{19937}-1. Use of ecuyer1988 in Stan has been questioned before Quality of Pseudo-Random Number Generators, but for some reason only other than boost library solutions (e.g. mt19937) were considered https://github.com/stan-dev/stan/issues/403
mt19937 would be also 13 times faster than ecuyer1988, but I guess speed has not been a problem.

1 Like

Mersenne is the typical default for most implementations for the reasons that @avehtari mentions, despite its relatively large state.

My understanding of why we went with ecuyer1988 instead of the twister is that it can be strided – offsets in the seed yield guarantees on how independent the resulting sequences appear. This is why we have chain_id – the actual seed used for each chain is seed + chain_id so that each chain will have independent PRNGs, at least within the first cycle.

For something like SBC we only need seed variation amongst chains – the changing data will ensure variation in the posterior samples from iteration to iteration.

If we wanted to consider moving beyond ecuyer1998 then it would be worthwhile to look into http://www.pcg-random.org which allows for PRNGs that can be strided (i.e. that admit multiple streams) with relatively small state and fast execution and configurable periods.

2 Likes

That was the reasoning, specifically because @bgoodri was concerned about using proper parallel RNGs (ones that can be strided ahead efficiently).

Neither speed nor randomness seem to be a problem with what we have. The other candidate that @bgoodri was recommending was

The only reason we never got around to adding it was that it was an extra dependency and the one we have seems to work fine. If we do need to change RNGs, it won’t be a big deal as it has a pretty simple interaction with the rest of the code.

In case someone would like to have a parallel version of Bob’s SBC function, I copied from the rstanarm implementation of sbc() (the latter does not support the automatic adaption of thinning based on n_eff while ensuring a fixed number of output samples). Here we go:

sbc_parallel <- function(model, data = list(),
                sbc_sims = 1000, stan_sims = 999,
                init_thin = 4, max_thin = 64,
                target_n_eff = 0.8 * stan_sims) {
    num_params <- num_monitored_params(model, data) 
    ranks <- matrix(nrow = sbc_sims, ncol = num_params) 
    thins <- rep(NA, sbc_sims)
    sbc_sim <- function(n) {
        seed <- seq(from = 0, to = .Machine$integer.max, length.out = sbc_sims)[n]
        n_eff <- 0
        thin <- init_thin 
        ranks <- rep(NA,num_params) #matrix(nrow = sbc_sims, ncol = num_params) 
        while (TRUE) {
            fit <- sampling(model,
                            data = data,
                            seed=seed,
                            chains = 1,
                            iter = 2 * thin * stan_sims,
                            thin = thin,
                            control = list(adapt_delta = 0.99), 
                            refresh = 0)
            fit_summary <- summary(fit, pars = c("lp__"), probs = c())$summary 
            n_eff <- fit_summary["lp__", "n_eff"]
            if (n_eff >= target_n_eff || (2 * thin) > max_thin) break;
            thin <- 2 * thin
        }
        
        lt_sim <- rstan::extract(fit)$I_lt_sim
        for (i in 1:num_params)ranks[i] <- sum(lt_sim[ , i]) + 1
        c(thin,ranks)
    }
    #rslt <- lapply(1:sbc_sims, sbc_sim)
    rslt <- parallel::mclapply(1:sbc_sims, sbc_sim)
    for (n in 1:sbc_sims) {
        thins[n] = rslt[[n]][1]
        ranks[n,] = rslt[[n]][-1]
    }
    pval <- rep(NA, num_params) 
    for (i in 1:num_params)
        pval[i] <- test_uniform_ranks(ranks[ , i],
                                      max_rank = stan_sims + 1)
    list(rank = ranks, p_value = pval, thin = thins) 
}

Moreover I also copied the seed specification from @bgoodri, albeit I understand from the discussion above that the “max-spacing” approach for seeds might not be needed (so we could have set seed=n above as well, I guess, or leave it as it is in Bob’s original sbc implementation).

3 Likes

Condition n_eff >= target_n_eff has the problem that it doesn’t recognize antithetic chains, which should be thinned, too. Instead of having target_n_eff = 0.8 * stan_sims, it would be better to have, e.g., n_eff_reltol=0.2 and then condition (n_eff/stan_sims >= 1-n_eff_reltol && n_eff/stan_sims <= 1+n_eff_reltol)

2 Likes

Thanks Aki, didn’t think about this, I guess implicitly I assumed antithetic chains would not cause any problems in SBC.

This is one reason why the sbc function returns all the draws and then you can choose the thinning interval afterward when you do the plots.

1 Like

For completeness, here the corresponding parallel version that also works with antithetic chains:

sbc_parallel_antithetic <- function(model, data = list(),
                sbc_sims = 1000, stan_sims = 999,
                init_thin = 4, max_thin = 64,
                n_eff_reltol=0.2) {
    num_params <- num_monitored_params(model, data) 
    ranks <- matrix(nrow = sbc_sims, ncol = num_params) 
    thins <- rep(NA, sbc_sims)
    sbc_sim <- function(n) {
        seed <- seq(from = 0, to = .Machine$integer.max, length.out = sbc_sims)[n]
        n_eff <- 0
        thin <- init_thin 
        ranks <- rep(NA,num_params) #matrix(nrow = sbc_sims, ncol = num_params) 
        while (TRUE) {
            fit <- sampling(model,
                            data = data,
                            seed=seed,
                            chains = 1,
                            iter = 2 * thin * stan_sims,
                            thin = thin,
                            control = list(adapt_delta = 0.99), 
                            refresh = 0)
            fit_summary <- summary(fit, pars = c("lp__"), probs = c())$summary 
            n_eff <- fit_summary["lp__", "n_eff"]
            if ((n_eff/stan_sims >= 1-n_eff_reltol && n_eff/stan_sims <= 1+n_eff_reltol) || (2*thin) > max_thin) break;
            thin <- 2 * thin
        }
        
        lt_sim <- rstan::extract(fit)$I_lt_sim
        for (i in 1:num_params)ranks[i] <- sum(lt_sim[ , i]) + 1
        c(thin,ranks)
    }
    rslt <- parallel::mclapply(1:sbc_sims, sbc_sim)
    for (n in 1:sbc_sims) {
        thins[n] = rslt[[n]][1]
        ranks[n,] = rslt[[n]][-1]
    }
    pval <- rep(NA, num_params) 
    for (i in 1:num_params)
        pval[i] <- test_uniform_ranks(ranks[ , i],
                                      max_rank = stan_sims + 1)
    list(rank = ranks, p_value = pval, thin = thins) 
}
1 Like

I’d suggest looking at @bgoodri’s implementation in RStan. Mine was more meant for tutorial purposes to explain how SBC works, not be the engine people use to actually do it in practice!