Comparing theory and stan implementation on sbc

I was comparing rstan sbc.R and stan user manual(+paper) and had some questions:

1. Rank consistency
vignettes has: int ranks_[1] = {pi > pi_};

but user manual and original paper, Validating Bayesian Inference Algorithms with Simulation-Based Calibration, uses

\sum_{m=1}^{M} \mathbf{I}\left[\theta_{k}^{(n, m)}<\theta_{k}^{\operatorname{sim}(n)}\right]

Would it be better to use int ranks_[1] = {pi < pi_}; as the paper shows diagnostic techniques, which depend on the inequality direction.

2. M, N
stan manual explains:

  • M: the number of posterior draws per simulated data set
  • N: total number of simulated data sets and fits

How many times of stan fit is supposed to happen in sbc? MN or N?
If the answer is N, does M in code, correspond to N in user guide, as M times of sampling is happening?

3. n_eff consideration
Considering n_eff, suggested by the above paper, would the following code improvement be o.k? if(n_eff/M >= 1-n_eff_reltol && n_eff/M <= 1+n_eff_reltol) is what @avehtari suggested here.

  • question: can M substituted as iter_sampling?
  • concern: n_eff/iter_sampling might have different value, once thinned. However, in current frame, thinning is not happening because of anesthetic chain as @bgoodri suggested here.
    out <- sampling(stanmodel, data, pars = c("ranks_", if (has_log_lik) "log_lik"), include = TRUE,
      chains = 1L, cores = 1L, seed = S, save_warmup = FALSE, thin = 1L, ...)

with n_eff (cmdstanr code)

    iter_sampling = 1000
      out <- stanmodel$sample(data, chains = 1, iter_sampling = iter_sampling, seed = floor(S), parallel_chains = 1, save_warmup = FALSE, thin = 1)
      n_eff <- out$summary("lp__") %>% pull("ess_bulk")
      if(n_eff/iter_sampling >= 1-n_eff_reltol && n_eff/iter_sampling <= 1+n_eff_reltol) break
      iter_sampling = 2 * iter_sampling



I don’t know. @jonah do you know this?

N is the number of Stan fits. M is the number of draws in the manual.

It looks like M in that code is number of stan fits.

I guess M is the number of draws after thinning, and iter_sampling is the number of draws before thinning.

I think we still do thinning in SBC, just what Bgoodrich is saying is you might choose the amount of thinning after you do all the calculations (because it is expensive to rerun the calculations and it won’t be obvious how to thin before you do them).

1 Like

I guess, you are correct. The inequality sign is the opposite between the paper and Stan.

Even if the interpretation of the histogram of ranks depends on the inequality direction, the uniformity dose not depend on the inequality.
However, when the histogram is not uniformly distributed, then maybe, we have to interpret it by modifying the paper’s manner according to the inequality. E.g., If a histogram gerated by rstan::sbc() has the asymmetry form in page 10 Fig 7 (b) of the paper, then the interpretation will be modified.

I guess the anwer is the product of the size of posteriors and the size of ranks, because we can specify the number of posterior samples and the number of samples of rank statistics as following manner. For each single rank statistic, the size iter of posterior samples are drawn, and thus, for all rank statistics, it is the product.

stanmodel = stanmodel,
      data = data,
          M = M, The sample size of rank statistics
        iter = iter, The sample size of the posteriors

I am a beginner, so, …dont trust me. I am also interested in this topic.
Please correct me if I am wrong.

1 Like

Good question @hyunji.moon. I’m not actually super familiar with the SBC implementation in RStan. I do think it would be better if the code and paper were consistent with each other, but I don’t think it’s wrong to do it either way. I’m not an SBC expert though! We can check with @bgoodri though as he implemented SBC in RStan. Was there a particular reason to do the inequality this way?

It doesn’t matter for whether the histograms (now freqpolys) look uniform.

In this case, the number of fit itself is N, right? We draw M samples from one fit which happens N times. Thanks for the reply.

I agree, but as I have mentioned above, ‘as the paper shows diagnostic techniques, which depend on the inequality direction’ the inequality direction might matter in some cases, the direction of bias for example.

1 Like

Hello everyone,

I am sorry I am jumping into this topic with a basic question, but why don’t we compute the rank statistics within the data averaged posterior? If we do it for each simulated dataset of SBC, wouldn’t the uniformity depend on the number of draws? If the number of draws goes to infinity the posterior should be centered on the ground truth that we sampled and then used to generate the data.

I am also following the example in the manual: 27.3 SBC in Stan | Stan User’s Guide, which is also confusing for me.

Thank you,