Need help with the SBC uniformity proof

This is from Appendix B of the SBC paper.
I am a little confused about the following; could anyone please explain in more detail?

  1. the change-of-variables part: is it p or P?
  2. could P(\tilde{f(p)}) = p be explained more mathematically or intuitively?

According to the paper, \tilde{f} is the evaluation of the random variable with respect to the prior sample, with f_n = f(\theta_n) the evaluation of the random variable with respect to the posterior samples.


1 Like

If @betanalpha can’t explain this then nobody can :)

@hyunji.moon The result here is just the fact that the quantile function is the inverse of a cumulative distribution function and vice versa. It’s also why the null distribution of the ranks is in general uniform. Really the only insight in SBC is that the average posterior is equal to the prior, so that the quantile function of one is the inverse of the cumulative distribution function of the other.
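Just to spell out the fact being referenced: for a continuous random variable X with cumulative distribution function F and quantile function F^{-1},

\mathbb{P}\left[F(X) \le u\right] = \mathbb{P}\left[X \le F^{-1}(u)\right] = F\left(F^{-1}(u)\right) = u,

so F(X) is uniform on [0, 1], and running the argument in reverse shows that F^{-1}(U) has distribution function F when U is uniform.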

Note also that technically the proof doesn’t actually cover SBC as we describe it in the paper. The proof is valid if the prior sample and posterior samples are generated independently, but the procedure we recommend draws the posterior samples conditioned on the prior sample. Empirically it still works, but the proof is much more complicated and few have spent much time trying to extend it (which probably involves using the exchangeability of the posterior samples).

I had to stop and think about this for a minute because you don’t recommend using the prior sample \tilde{\theta} to inform the posterior samples \theta_{l}, right? The theoretical dependency is indirect, only through the posterior being conditional on the simulated data \tilde{y}, which in turn is conditional on \tilde{\theta}?
I think such indirect dependency becomes irrelevant if you think in terms of the joint distribution \pi\left(\theta,y\right) without worrying about splitting it into the prior and likelihood. It doesn’t seem like the general proof should be complicated.
Unless I’m missing something you can just argue that because

\pi\left(\theta\right)\pi\left(y\middle|\theta\right)=\pi\left(y\right)\pi\left(\theta\middle|y\right)=\pi\left(\theta,y\right)

then the SBC algorithm

  1. draw prior sample \tilde{\theta}\sim\pi\left(\theta\right)
  2. draw simulated data set \tilde{y}\sim\pi\left(y\middle|\tilde{\theta}\right)
  3. draw posterior samples \theta_{1},\dots,\theta_{L}\sim\pi\left(\theta\middle|\tilde{y}\right)

samples from the same joint distribution \left(\tilde{\theta},\tilde{y},\theta_{1},\dots,\theta_{L}\right)\sim\pi\left(\theta,y\right)\pi\left(\theta\middle|\tilde{y}\right)^{L} as

  1. draw prior predictive sample \tilde{y}\sim\pi\left(y\right)
  2. draw posterior sample \tilde{\theta}\sim\pi\left(\theta\middle|\tilde{y}\right)
  3. draw posterior samples \theta_{1},\dots,\theta_{L}\sim\pi\left(\theta\middle|\tilde{y}\right)

where it’s now obvious that \tilde{\theta} is exchangeable with each of \theta_{1},\dots,\theta_{L} and so the rank statistic must be uniformly distributed.
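As a quick empirical sanity check of this exchangeability argument, here is a minimal sketch assuming a conjugate Beta(1,1)/Bernoulli model with ten observations per data set and L = 9 posterior draws (all of these choices are just for illustration):

import numpy as np

rng = np.random.default_rng(1)
L = 9
ranks = []
for _ in range(20_000):
    theta_tilde = rng.beta(1, 1)                          # prior draw
    y_tilde = rng.binomial(10, theta_tilde)               # one simulated data set of 10 Bernoulli trials
    theta_post = rng.beta(1 + y_tilde, 1 + 10 - y_tilde, size=L)   # conjugate posterior draws
    ranks.append(int(np.sum(theta_post < theta_tilde)))   # rank of the prior draw

# under exchangeability each rank 0..L should appear with frequency close to 1/(L + 1)
print(np.bincount(ranks, minlength=L + 1) / len(ranks))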

3 Likes

Unfortunately that’s not sufficient. In SBC the expectations that we’re computing are actually with respect to the joint distribution

\pi(\theta', y, \theta) = \pi(\theta' \mid y) \pi(y \mid \theta) \pi(\theta).

As you note, the influence of \theta on \theta' is only indirect, through the intermediate data y, but that indirect influence is still enough to prevent the nested integrals from factoring in a clean way and allowing a straightforward calculation.

Another way of stating it is that the implicit function whose expectation we’re taking to get the ranks depends on both \theta and \theta' and hence the full conditional structure of \pi(\theta', y, \theta).
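Writing that expectation out explicitly for a single posterior sample,

\mathbb{P}\left[\theta' < \theta\right] = \int \mathrm{d}\theta \, \pi(\theta) \int \mathrm{d}y \, \pi(y \mid \theta) \int \mathrm{d}\theta' \, \pi(\theta' \mid y) \, \mathbb{I}\left[\theta' < \theta\right],

where the indicator couples \theta and \theta' across the nested integrals.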

1 Like

Here’s more detail in case anyone has suggestions: sbc_calcs.pdf (99.0 KB)

2 Likes

Side note, I hate the conventional notation that overloads the symbols so that every distribution is called \pi\left(\cdot\right). Let’s go with probability density functions \pi_{\mathrm{prior}}\left(\theta\right), \pi_{\mathrm{sample}}\left(y|\theta\right) and \pi_{\mathrm{post}}\left(\theta|y\right).

The equality f\left(u\right)=u does not hold even in the most elementary example.
Suppose \theta\sim\mathit{Uniform}\left(0,1\right) and y\sim\mathit{Bernoulli}\left(\theta\right). The density functions are

\pi_{\mathrm{prior}}\left(\theta\right)=1 \\ \pi_{\mathrm{sample}}\left(y|\theta\right)=\begin{cases} 1-\theta & y=0\\ \theta & y=1 \end{cases} \\ \pi_{\mathrm{post}}\left(\theta|y\right)=\begin{cases} 2-2\theta & y=0\\ 2\theta & y=1 \end{cases}

The prior quantile function \tilde{\theta}_{0}\left(u\right)=u is simply the identity and the y integral becomes a sum so that

\begin{equation} \begin{split} f\left(u\right) & = \int_{-\infty}^{\tilde{\theta}_{0}\left(u\right)}\mathrm{d}\theta\,\sum_{y}\,\pi_{\mathrm{post}}\left(\theta|y\right)\pi_{\mathrm{sample}}\left(y|\tilde{\theta}_{0}\left(u\right)\right)\\ & = \int_{0}^{u}\mathrm{d}\theta\,\left(\pi_{\mathrm{post}}\left(\theta|0\right)\pi_{\mathrm{sample}}\left(0|u\right)+\pi_{\mathrm{post}}\left(\theta|1\right)\pi_{\mathrm{sample}}\left(1|u\right)\right)\\ & = \int_{0}^{u}\mathrm{d}\theta\,\left(\left(2-2\theta\right)\left(1-u\right)+2\theta u\right) \\ & = \left(2u-u^{2}\right)\left(1-u\right)+u^{3}\\ & \neq u \end{split} \end{equation}
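As a quick numerical check of that algebra, here is a small sketch (same toy model, arbitrary grid of u values):

import numpy as np

def f_closed(u):
    # closed form derived above: (2u - u^2)(1 - u) + u^3
    return (2 * u - u ** 2) * (1 - u) + u ** 3

def f_numeric(u, n=200_000):
    # brute-force Riemann sum of the theta integral on (0, u)
    theta = np.linspace(0.0, u, n)
    integrand = (2 - 2 * theta) * (1 - u) + 2 * theta * u
    return integrand.mean() * u

for u in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(u, f_closed(u), f_numeric(u))

The last two columns agree with each other but not with u itself (except at u = 0.5, where they coincide by symmetry).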

Regardless, I don’t see why the full distribution wouldn’t factor in a clean way. The obstruction arises when integrating out \tilde{y}; but what if you condition on \tilde{y} instead?

The full joint density is

\pi_{\mathrm{prior}}\left(\tilde{\theta}_{0}\right)\pi_{\mathrm{sample}}\left(\tilde{y}|\tilde{\theta}_{0}\right)\prod_{l}\pi_{\mathrm{post}}\left(\theta_{l}|\tilde{y}\right)

and Bayes’ theorem tells us that

\pi_{\mathrm{prior}}\left(\tilde{\theta}_{0}\right)\pi_{\mathrm{sample}}\left(\tilde{y}|\tilde{\theta}_{0}\right)=\pi_{\mathrm{post}}\left(\tilde{\theta}_{0}|\tilde{y}\right)L\left(\tilde{y}\right)

where

L\left(y\right)=\int\mathrm{d}\theta\,\pi_{\mathrm{prior}}\left(\theta\right)\pi_{\mathrm{sample}}\left(y|\theta\right)

so that the joint density is in fact equal to

L\left(\tilde{y}\right)\pi_{\mathrm{post}}\left(\tilde{\theta}_{0}|\tilde{y}\right)\prod_{l}\pi_{\mathrm{post}}\left(\theta_{l}|\tilde{y}\right)

and conditional on \tilde{y} all thetas are IID.
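So, conditional on \tilde{y} and ignoring ties, exchangeability gives, for every r = 0, \dots, L,

\mathbb{P}\left[\sharp\{l : \theta_{l} < \tilde{\theta}_{0}\} = r \,\middle|\, \tilde{y}\right] = \frac{1}{L+1},

and averaging over \tilde{y} with weight L\left(\tilde{y}\right) leaves this unchanged.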

1 Like

Again, manipulating the joint density isn’t sufficient without considering the appropriate expectation. In particular, because the order statistic conditions on a prior sample, one has to be careful to respect that conditioning; otherwise the integrals will remain intractable.

In looking back on this again I realized that I was trying to prove the wrong thing. The marginal conditional

\pi(\theta' \mid \theta) = \int \mathrm{d} y \, \pi(\theta', y \mid \theta)

corresponds to simulating a single model configuration from the prior but then multiple observations and then multiple posterior samples before computing the rank, which is not how we construct the SBC algorithm in the paper. What we actually do is sample a single observation for each model configuration before generating multiple posterior samples and then computing the rank. This subtle difference implies that the probabilities in the order statistic are actually conditional on the data as well,

\pi(\theta' \mid \theta, y)

with an extra integral over y on the outside.

This conditional structure of the joint then allows us to simplify \pi(\theta' \mid \theta, y) = \pi(\theta' | y) from which the proof proceeds basically as intended.
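Written out, the relevant probability is then

\mathbb{P}\left[\theta' < \theta\right] = \int \mathrm{d}y \, \pi(y) \int \mathrm{d}\theta \, \pi(\theta \mid y) \int \mathrm{d}\theta' \, \pi(\theta' \mid y) \, \mathbb{I}\left[\theta' < \theta\right],

with both inner integrals taken over the same posterior \pi(\cdot \mid y), which is what lets the calculation go through.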

I’m attaching what I believe is a proper proof for anyone interested: sbc_calcs_proper.pdf (96.8 KB)

6 Likes

If anyone has the time, I’m super curious to see the result of an empirical study where the data are simulated multiple times (and only one posterior sample drawn from each posterior), i.e.

\begin{align*} \tilde{\theta} &\sim \pi(\theta) \\ \tilde{y}_{n} &\sim \pi(y \mid \tilde{\theta}) \\ \tilde{\theta}'_{n} &\sim \pi(\theta \mid \tilde{y}_{n}) \\ r &= \sharp [ \tilde{\theta} < \tilde{\theta}'_{n} ] \end{align*}

Are the ranks uniform, implying that there should be some way to get the old proof to work, or are they in fact non-uniform?

1 Like
from numpy import arange
from numpy.random import beta, binomial
from matplotlib.pyplot import hist, show

r = []
for _ in range(10_000):
  t = beta(1, 1)                  # prior draw, theta ~ Uniform(0, 1)
  y = binomial(1, t, size=10)     # ten single-observation data sets
  tt = beta(1 + y, 2 - y)         # one conjugate posterior draw per data set
  r.append(sum(t < tt))           # rank of the prior draw among the posterior draws

hist(r, bins=arange(12) - 0.5)    # histogram of the ranks 0..10
show()

[sbc: histogram of the simulated ranks]

4 Likes

Perfect. I spent some time after dinner putting together an R demonstration for anyone interested.

c_dark <- c("#8F2727")
c_dark_highlight <- c("#7C0000")

m <- 0
s <- 0.5
s2 <- s * s

sigma <- 1
sigma2 <- sigma * sigma

N <- 100000 # Number of replications
L <- 25     # Number of posterior samples
B <- 25     # Number of rank bins

# Correct SBC: one observation per prior draw, L posterior samples from that posterior
ranks <- rep(0, N)
for (n in 1:N) {
  theta <- rnorm(1, m, s)
  y <- rnorm(1, theta, sigma)
  post_m <- (y * s2 + m * sigma2) / (s2 + sigma2)
  post_s <- sqrt( s2 * sigma2 / (s2 + sigma2) )
  post_theta <- rnorm(L, post_m, post_s)
  ranks[n] <- sum(post_theta < theta)
}

sbc_hist <- hist(ranks, breaks=seq(0, L + 1, 1) - 0.5, plot=FALSE)
plot(sbc_hist, main="SBC (Correct)", xlab="Prior Rank", yaxt='n', ylab="")

# 99% central interval and median for each bin count under exact uniformity
low <- qbinom(0.005, N, 1 / (L + 1))
mid <- qbinom(0.5, N, 1 / (L + 1))
high <- qbinom(0.995, N, 1 / (L + 1))
bar_x <- c(-1, L + 1, L + 0.5, L + 1, -1, -0.5, -1)
bar_y <- c(high, high, mid, low, low, mid, high)

polygon(bar_x, bar_y, col=c("#DDDDDD"), border=NA)
segments(x0=-0.5, x1=(L + 0.5), y0=mid, y1=mid, col=c("#999999"), lwd=2)

plot(sbc_hist, col=c_dark, border=c_dark_highlight, add=T)

# Incorrect SBC: L observations per prior draw, one posterior draw from each resulting posterior
ranks <- rep(0, N)
for (n in 1:N) {
  theta <- rnorm(1, m, s)
  y <- rnorm(L, theta, sigma)
  post_m <- (y * s2 + m * sigma2) / (s2 + sigma2)
  post_s <- sqrt( s2 * sigma2 / (s2 + sigma2) )
  post_theta <- rnorm(L, post_m, post_s)
  ranks[n] <- sum(post_theta < theta)
}

sbc_hist <- hist(ranks, breaks=seq(0, L + 1, 1) - 0.5, plot=FALSE)
plot(sbc_hist, main="SBC (Incorrect)", xlab="Prior Rank", yaxt='n', ylab="")

low <- qbinom(0.005, N, 1 / (L + 1))
mid <- qbinom(0.5, N, 1 / (L + 1))
high <- qbinom(0.995, N, 1 / (L + 1))
bar_x <- c(-1, L + 1, L + 0.5, L + 1, -1, -0.5, -1)
bar_y <- c(high, high, mid, low, low, mid, high)

polygon(bar_x, bar_y, col=c("#DDDDDD"), border=NA)
segments(x0=-0.5, x1=(L + 0.5), y0=mid, y1=mid, col=c("#999999"), lwd=2)

plot(sbc_hist, col=c_dark, border=c_dark_highlight, add=T)
3 Likes