SBC histogram interpretation

Here are the least uniformly distributed histograms from a simulation-based calibration.

The alpha parameters seem to exhibit a slight inverted-U shape, like Figure 5 in the SBC paper. This is integrating over 500 datasets, and the pattern persists with 4 or 8 bins. How do I interpret this?

@seantalts The SBC paper says that this pattern indicates that the data-averaged posterior is overdispersed relative to the prior, but that doesn’t make sense to me. In the code, alpha is generated from lognormal_rng(log(1) - 0.15^2/2.0, 0.15), but the prior for parameter recovery is set to lognormal(log(2) - 0.5^2/2.0, 0.5). Any thoughts?


So in that model you’re both generating the random draws from a prior and then fitting to them, right? I think it makes sense to me - your posterior is wider than your prior for all of the alphas (look at the sigma parameter of the lognormal - it’s larger in the model than in the data generation), hence calling it over-dispersed relative to the prior. I might be missing something - can you give me some more context?
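Here’s a quick numerical sketch of that intuition (my own toy numbers, not the thread’s model): if the distribution you rank the ground truth against is wider than the one the ground truths are drawn from, the ranks pile up in the middle, giving the inverted-U shape.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, L = 2000, 99  # 99 draws per simulation -> 100 possible ranks

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta_true = rng.normal(0.0, 1.0)        # ground truth from a narrow "prior"
    draws = rng.normal(0.0, 2.0, size=L)     # stand-in "posterior": same center, twice as wide
    ranks[s] = np.sum(draws < theta_true)    # rank statistic in {0, ..., L}

counts, _ = np.histogram(ranks, bins=10, range=(0, L + 1))
print(counts)  # middle bins dominate the end bins: inverted U
```

This is only the marginal mechanism, of course - in real SBC the wide distribution would be the data-averaged posterior, not something you sample directly.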

Re: Binning - ideally you’d aim for 1 bin per rank to avoid any binning artifacts whatsoever.
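To illustrate the one-bin-per-rank point: with L posterior draws there are L + 1 possible ranks. A toy conjugate check (my own sketch, not the thread’s model) where the DGP and the model share the same prior, and where we can draw exactly from the posterior, gives a flat histogram over those L + 1 ranks:

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims, L = 20000, 9  # 9 posterior draws -> 10 possible ranks {0, ..., 9}

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta = rng.normal(0.0, 1.0)                      # theta ~ N(0, 1)
    y = rng.normal(theta, 1.0)                        # y | theta ~ N(theta, 1)
    post = rng.normal(y / 2.0, np.sqrt(0.5), size=L)  # exact conjugate posterior N(y/2, 1/2)
    ranks[s] = np.sum(post < theta)

counts = np.bincount(ranks, minlength=L + 1)
print(counts)  # each of the 10 ranks gets roughly n_sims / 10 = 2000 hits
```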


Oh! When you put it that way, yes, it makes sense.

Also, tangentially, I find Equation 1 from the SBC manuscript a bit confusing because it relies on definitions given in-line in the preceding two paragraphs. I think it’s worth using numbered displayed equations and grouping all the math together for clarity. My other complaint is that you use the same notation for the prior for ground truth parameters \tilde\theta \sim \pi(\theta) and the data averaged prior \pi(\theta) defined in Equation 1. How about using \pi_1(\theta) for the former and \pi_2(\theta) for the latter so then you can say that you expect \pi_1(\theta) = \pi_2(\theta)?


I like that - when we do a second revision I’ll make a note to incorporate that. Thanks!


Instead of lognormal, I put an exponential(1.0) prior on alpha. The histogram is slightly improved (see the attached sbc-exp histogram).
This parameter is analogous to \alpha in edstan. If I can’t think of a better way to parameterize the model then what? Put the histogram in the manuscript and say that alpha will be slightly biased toward zero?

Just to be clear, you are trying to put the same prior on alpha in both the data generating process and the model block now right?

No, I don’t see how that could work. Consider edstan, which has an exponential(0.1) prior on the scale of a normal distribution, \sigma \sim \text{Exp}(0.1) and \theta_j \sim \mathcal N(\dots, \sigma^2). This prior definitely doesn’t make sense for data generation, but seems to do fine in parameter recovery. In an SBC context, aren’t we just validating a subset of the parameter space if we do this?

The SBC paper seems to assume that data generation and parameter recovery use the same priors, but isn’t this too limiting?

SBC calibrates the degree to which your algorithm can fit your model assuming your data generating process matches your model specification. It doesn’t sound like that’s what you’re looking for - what are you aiming to show? It sounds like you might be looking for tools like posterior predictive checks and loo.

When you say ‘algorithm’, surely you don’t just mean HMC, ADVI, or INLA? My impression was that SBC is also an important part of model development, that is, “a critical part of a robust Bayesian workflow.” I took this phrase to mean that we should use SBC when developing models. Certainly SBC should work if the data generating process matches the model specification. If it doesn’t then there is a bug in the model specification. However, the use of SBC seems to extend beyond this narrow situation.

Here’s an example. My interpretation is that SBC verified that a subset of the parameter space is well calibrated.

Given that there are situations where the data generating process is somewhat different from the parameter recovery model, SBC seems like a useful procedure to investigate whether the posterior will be accurate or not. Why would SBC not help here?

Posterior predictive checks and loo are great once you actually have real data, but SBC seems like a useful procedure to validate your model before you have data.

Here are my two cents, @seantalts and @jpritikin please feel free to kick my butt if I misspeak.

Nope - only if the algorithm is working as expected, which is the point of running SBC: to find out whether it is. Here, “algorithm” can be understood as, for instance, (i) running your favourite MCMC to approximate the target and (ii) computing expectations and the associated MCSE. If there is substantial autocorrelation, for example, the rank histogram will be horn-shaped (Fig. 4), indicating that one needs to tweak the thinning step of the “algorithm”. This also helps to study the interaction of a given target with a given approximation algorithm, as for example when a funnel effect prevents proper exploration of the space. This will also likely show up in the SBC histogram as a horn pattern (Fig. 6), suggesting under-dispersion of the approximated target relative to the “true” one.
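A small illustration of the autocorrelation case (my own toy setup, not from the paper): the “posterior” draws below have the correct marginal but are strongly autocorrelated, so the ground truth tends to rank below or above almost all of them, piling ranks into the two end bins - the horn shape.

```python
import numpy as np

rng = np.random.default_rng(2)
n_sims, L, rho = 4000, 99, 0.99

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta = rng.normal(0.0, 1.0)
    y = rng.normal(theta, 1.0)
    mu, sd = y / 2.0, np.sqrt(0.5)   # exact posterior is N(mu, sd^2)
    x = rng.normal(mu, sd)           # start the chain at stationarity
    eps = rng.normal(size=L)
    draws = np.empty(L)
    for t in range(L):
        # AR(1) chain whose stationary distribution is the exact posterior
        x = mu + rho * (x - mu) + sd * np.sqrt(1.0 - rho**2) * eps[t]
        draws[t] = x
    ranks[s] = np.sum(draws < theta)

counts, _ = np.histogram(ranks, bins=10, range=(0, L + 1))
print(counts)  # end bins dominate: the two-horned histogram
```

Thinning the chain (or running it longer) would pull this back toward uniform.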

I’m not sure that it won’t help, but my impression is that Theorem 1 in the paper only works out if the data generating process matches the prior.


I do! I should have said inference algorithm. It’s important to see how well your inference algorithm can work with your model - for example, even with exactly the same DGP and model specification, you can see severe bias with HMC on the eight schools model.

If you wanted to verify that some parameter subspace is fit appropriately with your inference algorithm, you would change both the DGP and model in lock-step to verify that. I’m not sure I understand your use case, though. It’s an interesting idea to see how well an incorrect model can recover a mis-specified DGP, but 1) if you had some better guess about the DGP, you would just use that as the model in most cases; and 2) if you had a lot of uncertainty about the DGP, you would basically attempt to widen your priors in the model.

One case where it would be interesting is, e.g., discrete parameters - you might have a DGP that generates quickly and uses discrete parameters (or some other computationally impossible part), and then you might want to calibrate how well a model without those nasty parts can recover the rest of the parameters. Is that what you want to do here? It didn’t seem like it, as the DGP you had written down looked, at the surface level, to be just as computationally feasible as the model block - but maybe I missed something.

When I said that “SBC should work,” I meant that the SBC procedure to test the algorithm would be an appropriate test. So I agree with your post, but it doesn’t seem very relevant to the point that I was trying to discuss.

Oh! Yes, some additional clarification about the scope of the SBC paper might be helpful to readers.

Sure, and isn’t an SBC-like test a marvelous way to check whether the parameter subspace is fit appropriately?

I’m not sure you, or others here, have the time to understand the models that I’ve been playing with recently. That’s why I put in the link to the trivial lognormal / normal example (see above). Without getting into too much detail, I just wanted to provide some vague justification for mismatch between the DGP and parameter recovery prior. I like your example with discrete parameters. That’s another case where a mismatch seems unavoidable. The question I’m interested in getting answered is whether SBC is a reasonable procedure for validating parameter subspaces of these kinds of models.


I think the key thing is just to make sure that you definitely can’t just make your model match the DGP - you would always prefer to do that if it had reasonable computational properties. If not, we didn’t really intend the paper to show this as the use-case seemed pretty rare, but I do think SBC can give a decent sense for how well your fits are recovering parameters. @maxbiostat is right that the proof doesn’t work if the DGP doesn’t match the model, so use with caution.


Right – and it has to be the entire data generating process, including all priors.

Simulating data from one data generating process and fitting with another can be a useful calibration procedure, but unlike SBC there are no guarantees on what behaviors to expect; see the link for more.


Yes, of course. The trouble starts when this seems intractable.

@betanalpha Ah ha! I’m glad I was on the right track. I’ll cite your link.

@seantalts Regarding Figure 7 in the SBC paper, I still get confused reading the caption, “biased in the opposite direction relative to the prior distribution.” Can you give some example numbers so there is no risk of misunderstanding? Figure 7(b) has numbers along the x-axis. Is it correct to say that this histogram would result if the prior was centered at 5 and the data-averaged posterior was centered at 2? Why is this the most intuitive way to present the results? Wouldn’t it be easier to interpret if the comparison was reversed so the histogram showed the bias in the same direction instead of the opposite direction?


My thinking here is that you’re always computing the histogram of the rank of the prior draw within the posterior draws, so the resulting ranks always indicate the placement of the prior relative to the posterior. The histogram in 7b is then showing that the prior is concentrating to the right of the posterior. I agree that the caption does not present that information in a way that is easy to grasp.
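With the example numbers from the question (my own toy stand-in, not the paper’s figure): ground truths drawn around 5 and a “posterior” centered at 2 - the ground truth ranks above almost every posterior draw, so the mass piles into the rightmost bins.

```python
import numpy as np

rng = np.random.default_rng(3)
n_sims, L = 2000, 99

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    theta_true = rng.normal(5.0, 1.0)        # prior concentrated around 5
    draws = rng.normal(2.0, 1.0, size=L)     # "data-averaged posterior" around 2
    ranks[s] = np.sum(draws < theta_true)    # prior to the right -> high ranks

counts, _ = np.histogram(ranks, bins=10, range=(0, L + 1))
print(counts)  # most of the mass sits in the rightmost bin
```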