Simulation-based calibration case study with RStan

Bob_Carpenter · May 13, 2019, 5:51pm

I created a standalone case study for SBC:

Simulation-Based Calibration with Stan and RStan. sbc.pdf (268.1 KB)

I’d love to get feedback before putting it up with the rest of our case studies. Particularly on my less-than-fluent R, on anything that’s not clear in the presentation, and on the hypothesis testing for uniformity of ranks so this can all be automated.

P.S. @avehtari tells me I should be using this confidence bands code, but even after a couple emails and trying to read the paper on which it’s based, I can’t figure out how to turn bands into p-values. All I need are p-values for a bag of ranks for the null hypothesis that the bag of ranks is uniformly distributed.

avehtari · May 13, 2019, 7:15pm

Here’s the link to multidimensional uniformity test paper [1902.10142] A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions

I still don’t understand why do you need p-values, when you can directly control false positive rate (in some meaning of control), that is, you can choose alpha level for your test. If you make more tests you can include the multiplicity adjustment there. But if you really need individual p’s, I have time next week to make you code for the inversion.

Based on a quick look, it seems you are not thinning antithetic chains? Those can have high n_eff and high correlations messing uniformity. With the current dynamic HMC, it’s very likely that chain for lp__ has thetic behavior, while for other parameters the chain may have highly antithetic behvior. For antithetic cases it would be good to thin with odd lag.

Bob_Carpenter · May 15, 2019, 7:21pm

I don’t need the p-values if I can just flag ones that would have a p-value lower than a given alpha threshold. Is that what you’re saying I could do?

You’re right, I’m not.

By an odd lag, you mean something like thin = 3 or something? I’ll have to rethink the logic to have thinning be odd because I can’t just keep doubling it. But I think you’re saying that no matter what I do, I should always do an odd thinning to deal with the antithetical behavior. As a computer scientist, I like everything to be a factor of two :-)

avehtari · May 15, 2019, 9:20pm

Yes. The code let’s you define alpha threshold for one test. You can adjust that, e.g., using Bonferroni adjustment, for multiple comparison and the code will compute the envelope and then just test whether any part of ecdf (or ecdf difference) is outside the envelope.

Yes,

Double + 1?

bgoodri · May 15, 2019, 10:01pm

@Bob_Carpenter We have put a decent amount of work into the SBC.R function on GitHub to make it easier for people to do SBC and document the conventions that are needed to make it work. I asked everyone for suggestions on how to improve it if they disagreed with the conventions or the functions but no one said anything. I think it would be better if we could all get on the same page as to what people should be calling to do SBC.

ermeel · May 16, 2019, 4:56am

Where precisely?

ermeel · May 16, 2019, 5:04am

@avehtari, what I don’t understand about these bands derived from the 2013 paper by Sivan Aldor-Noiman: Aren’t they derived using the CDF’s of order statistics of [0,1]-valued (continuous) uniform RVs? So for fixed rank, say k, isn’t this proposing to use in the ECDF plot the CI interval for the k’th largest Uniform random number, out of n, where n is the total number of ranks. Am I missing a simple relationship here or am I completely getting this wrong?

avehtari · May 16, 2019, 3:03pm

Yes, which we approximate with beta for large n. Do have better idea? Additional step in the envelope algorithm is to take into account the multiple comparison as we test for all k. For MCMC rank plots we need a small change. We’ll provide more info soon,

ermeel · May 16, 2019, 3:34pm

By no means! I just try to understand.

avehtari · May 16, 2019, 3:46pm

I was already hoping for something better :)

Bob_Carpenter · May 20, 2019, 8:02pm

My goal was explaining SBC, not writing a reference implementation. I found SBC.R here:

github.com

stan-dev/rstan/blob/7f2de6f2dd748396825a139ed7b66aad0c2c66db/rstan/rstan/R/SBC.R

sbc <- function(stanmodel, data, M, ...) {
  stopifnot(is(stanmodel, "stanmodel"))
  
  # parameter names
  stan_code <- get_stancode(stanmodel)
  stan_code <- scan(what = character(), sep = "\n", quiet = TRUE, text = stan_code)
  pars_lines <- grep("[[:space:]]*(pars_)|(pars_\\[.*\\])[[:space:]]*=", stan_code, value = TRUE)
  pars_lines <- pars_lines[!grepl("^[[:space:]]*vector", pars_lines) & 
                           !grepl("^[[:space:]]*real", pars_lines)]   
  pars_names <- trimws(sapply(strsplit(pars_lines, split = "=", fixed = TRUE), tail, n = 1))
  pars_names <- unique(sub("^([a-z,A-Z,0-9,_]*)_.*;", "\\1", pars_names))
  noUnderscore <- grepl(";", pars_names, fixed=TRUE)
  if (any(noUnderscore)) {
    warning(paste("The following parameters were added to pars_ but did not",
      "have the expected underscore postfix:",
      paste(pars_names[noUnderscore], collapse = ' ')))
    return()
  }
  has_log_lik <- any(grepl("log_lik[[:space:]]*;[[:space:]]*", stan_code))

This file has been truncated. show original

Your code’s beyond my R reading comprehension. Could you point out where you make sure you have enough iterations after thinning?

Why is there Pareto K stuff in there? Is that part of SBC or just some other diagnostic for convergence that’s layered onto SBC?

bgoodri · May 21, 2019, 2:30am

The idea is to output unthinned draws and thin when plotting the histogram. That way you can change the thin argument, depending on whether it is antithetic or thetic. The Pareto k is optional, but if you put log_lik into the generated quantities of the Stan program, the print method will tell you what is the proportion greater than 0.7, etc.

ermeel · May 22, 2019, 7:33am

@avehtari, following up on the short discourse regarding CI’s, just to make sure to see where the analytic “difficulty” is located:

If we denote by \mathcal{C}_k the number of (integer) ranks that are less or equal to k (integer) with 0 \leq k \leq L. Don’t we have for \ell (integer) with 0\leq \ell \leq N, under the (Null)-hypothesis that the N ranks are i.i.d. uniform on the integers \{0,1,\dots, L\}:

\mathbb{P}[\mathcal{C}_k \leq \ell ] = I_{1-\frac{k+1}{L+1}}(N-\ell, 1+\ell)

where N is the number of synthetic datasets generated as part of the SBC algorithm and L the number of (uncorrelated) posterior samples generated, given a synthetic dataset. I is the regularized incomplete beta function. This comes via the cumulative distribution function of a Bionomial with number of trials equal to N and success probability equal to (k+1)/(L+1), evaluated at \ell.

Now, the empirical cumulative distribution function (ECDF) at rank k, denoted by \mathcal{E}_k\in[0,1]\subseteq \mathbb{R}, is distributed as \mathcal{C}_k/N thus

\mathbb{P}[\mathcal{E}_k\leq \frac{\ell}{N}] = \mathbb{P}[\mathcal{C}_k \leq \ell ] = I_{1-\frac{k+1}{L+1}}(N-\ell, 1+\ell)

In theory we could use this to get the intended CI, given the null hypothesis of N ranks uniformly distributed over the integers {0, 1,\dots,L}:

So, for each 0\leq k \leq L we need to find integers \ell_k \geq \ell_k' such that (say)

\mathbb{P}[\frac{\ell'_k}{N}\leq \mathcal{E}_k] = 0.05 \Leftrightarrow \mathbb{P}[\frac{\ell'_k}{N}\geq \mathcal{E}_k] = 0.95

and

\mathbb{P}[\frac{\ell_k}{N}\geq \mathcal{E}_k] = 0.05

to get a 90\% CI. (For each k): So for the first case, couldn’t we start at \ell_k'=N and stepwise decrease \ell_k' by one, until we find that \mathbb{P}[\frac{\ell'_k}{N}\geq \mathcal{E}_k]\leq 0.95? Similar, for the second case, we would start at \ell_k=0 and increase by one, until the first time we have \mathbb{P}[\frac{\ell_k}{N}\geq \mathcal{E}_k] \geq 0.05?

avehtari · May 22, 2019, 7:30pm

Thanks for the very clear presentation

It seems you are not certain of this? Well, I think you are right on what you present.

cumulative distribution function of the beta distribution

and the envelope is using beta distribution, but we were just ignoring discreteness C_k/N

The difficult part comes when want examine all k at the same time. Your derivation gives the interval for one k independently, but C_k’s are not independent. It would be great to have non-simulation based solution which would take into account the dependency between C_k’s. Due to the existence of simulation based approaches like in Aldor-Noiman paper, I’ve assumed that there is no easier solution.

Jean_Billie · May 29, 2019, 6:06am

I have question.
Is the following code of SBC available for an arbitrary stanfit object ?

I want to run the SBC for my fitted model object fit of class stanfit created with data dataList

I am not sure what is ‘M’ and what is suitable M and init_r
But I think the code like as follows;

sbc(
fit@stanmodel,
dataList,
M=11 # I am not sure, this is the sample size of pseudo uniform distributuion ?
)

github.com

stan-dev/rstan/blob/7f2de6f2dd748396825a139ed7b66aad0c2c66db/rstan/rstan/R/SBC.R

sbc <- function(stanmodel, data, M, ...) {
  stopifnot(is(stanmodel, "stanmodel"))
  
  # parameter names
  stan_code <- get_stancode(stanmodel)
  stan_code <- scan(what = character(), sep = "\n", quiet = TRUE, text = stan_code)
  pars_lines <- grep("[[:space:]]*(pars_)|(pars_\\[.*\\])[[:space:]]*=", stan_code, value = TRUE)
  pars_lines <- pars_lines[!grepl("^[[:space:]]*vector", pars_lines) & 
                           !grepl("^[[:space:]]*real", pars_lines)]   
  pars_names <- trimws(sapply(strsplit(pars_lines, split = "=", fixed = TRUE), tail, n = 1))
  pars_names <- unique(sub("^([a-z,A-Z,0-9,_]*)_.*;", "\\1", pars_names))
  noUnderscore <- grepl(";", pars_names, fixed=TRUE)
  if (any(noUnderscore)) {
    warning(paste("The following parameters were added to pars_ but did not",
      "have the expected underscore postfix:",
      paste(pars_names[noUnderscore], collapse = ' ')))
    return()
  }
  has_log_lik <- any(grepl("log_lik[[:space:]]*;[[:space:]]*", stan_code))

This file has been truncated. show original

bgoodri · May 29, 2019, 3:47pm

It will work for any Stan program that adheres to the SBC conventions listed in the helpfile and the vignette. M is the number of times to draw from the prior distribution and so it should be much more than 11.

Jean_Billie · May 30, 2019, 5:05am

Thank you for reply. I will try to use your SBC program.

I had also tried to implement SBC, but drawing step, I think my seed selection is not so mixed and cause the non-uniformity, or, my model contains actual bias.

Bob_Carpenter · June 4, 2019, 6:02pm

It’s best to control the seed (or chain ID) directly. The SBC.R code does this as follows (where S is the variable used for the seed) in the code:

    S <- seq(from = 0, to = .Machine$integer.max, length.out = M)[m]

I’m not an R expert, but that looks to me like a roundabout way to compute

  S <- m - 1

bgoodri · June 4, 2019, 6:44pm

If M were 5, then

seq(from = 0, to = .Machine$integer.max, length.out = 5)

yields

[1]          0  536870912 1073741824 1610612735 2147483647

Jean_Billie · June 7, 2019, 1:04pm

Thank you for letting me know the R scripts for the random seed selection !! Using your suggesting code, I will try to write SBC algorithm.

Topic		Replies	Views
Rewriting example models with SBC Developers simulation-based-calibration	8	1340	February 25, 2019
Best practices for Simulation Based Calibration with hierarchical models Algorithms simulation-based-calibration	10	1587	May 22, 2021
Comparing theory and stan implementation on sbc General simulation-based-calibration	7	957	October 29, 2021
SBC for cmdstanr General cmdstanr , simulation-based-calibration	22	1534	December 9, 2020
Need help on sbc uniform proof General simulation-based-calibration	14	1784	July 6, 2021

Simulation-based calibration case study with RStan

Related topics