Questions and comments about SBC

Hi guys,

I have some questions / comments about the interesting SBC algorithm(s) / paper:

  1. I presume the random variable f is usually chosen to be the identity map, applied to each parameter separately?
  2. When we have many parameters, I am thinking it might be interesting to consider mappings from the joint parameter space to the reals (e.g. the sum of k'th powers of the parameters; see the sketch after this list). This is motivated by the observation that a random variable f can potentially have a completely different “correlation structure” than each single parameter, meaning that it can also have a very different N_{\rm eff}. Might this be a way to get an improved SBC algorithm in the autocorrelated setting? As a side remark: in the statistical physics community we have observed drastic improvements in N_{\rm eff} for high-dimensional parameter settings when considering these moments with k\geq 2, as compared to the N_{\rm eff}'s of the parameters separately.
  3. Would one gain additional insights when applying SBC to the posterior, that is, instead of sampling \theta from the prior, one actually samples it (after inference is completed) from a posterior? In other words, would this potentially allow one to identify further calibration problems? Or is it simply redundant and would not yield any additional calibration check?
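
To make point 2 concrete, here is a rough, untested NumPy sketch of what I have in mind (the AR(1) “chain” and all names are just placeholders, not anyone's actual sampler output): it compares a crude effective sample size per parameter against the one for the scalar summary f(\theta) = \sum_i \theta_i^k.

```python
import numpy as np

def crude_ess(x):
    """Very crude ESS: N / (1 + 2 * sum of positive-lag autocorrelations)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    x = x - x.mean()
    acov = np.correlate(x, x, mode="full")[n - 1:] / n
    rho = acov / acov[0]
    tau = 1.0
    for t in range(1, n):
        if rho[t] <= 0:          # truncate at the first non-positive lag
            break
        tau += 2.0 * rho[t]
    return n / tau

rng = np.random.default_rng(0)
n_draws, D, k = 2000, 10, 2

# Placeholder "posterior chain": an AR(1)-correlated chain over D parameters.
chain = np.empty((n_draws, D))
chain[0] = rng.normal(size=D)
for t in range(1, n_draws):
    chain[t] = 0.9 * chain[t - 1] + np.sqrt(1 - 0.9**2) * rng.normal(size=D)

# ESS of each parameter separately vs. ESS of the joint summary sum_i theta_i^k.
ess_per_param = [crude_ess(chain[:, i]) for i in range(D)]
ess_power_sum = crude_ess(np.sum(chain ** k, axis=1))
print(np.round(ess_per_param, 1), round(ess_power_sum, 1))
```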

Yep, or any other function from parameter space to the reals that maps more closely to the quantities you are actually interested in.

I haven’t tried out your 2nd point, but I would be inclined to use functions that line up closely with what you care about rather than ones chosen for their autocorrelation properties without problem-domain considerations.

I’m not sure I understand the 3rd question - we compare samples of the parameters \theta from the prior to samples of \theta from the posterior. Are you saying to go and do a 2nd round of fits where your initial \theta are drawn from the 1st round of fits? I’m not sure what this would buy you - seems like it could either amplify problems or hide them depending on the model/algorithm pair.

Yes! We are looking into this at the moment and hopefully we will have an answer for you (and code in rstanarm and rstantools) soon. Maybe even a vignette!

I suspect that it would be enough to do random linear combinations, but I’m not completely sure. The key thing is that it’s cheap to do more linear combinations relative to the cost of computing the posteriors.
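
To illustrate the “cheap” part: within a single SBC iteration you already have the prior draw and the L posterior draws, so extra linear combinations are just a couple of matrix products. A rough sketch, with placeholder arrays standing in for real fits:

```python
import numpy as np

rng = np.random.default_rng(1)
D, L, n_combos = 20, 100, 50

theta_prior = rng.normal(size=D)      # placeholder: the prior draw for this SBC iteration
theta_post = rng.normal(size=(L, D))  # placeholder: L posterior draws from the fit

A = rng.normal(size=(n_combos, D))    # random linear combinations
f_prior = A @ theta_prior             # shape (n_combos,)
f_post = theta_post @ A.T             # shape (L, n_combos)

# One SBC rank statistic per linear combination: the number of posterior
# draws of f that fall below the prior draw of f.
ranks = np.sum(f_post < f_prior, axis=0)
print(ranks)                          # integers in 0..L
```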

We’re thinking about the autocorrelation stuff, but it’s not completely clear how to do it. We need to look at the statistical properties of ranks of correlated samples, which I suspect will be hard…

No. Take Algorithm 1 of the paper and replace \pi in the first step by the posterior you have obtained through your previous inference on the real dataset, not the simulated one. So this is different from what you described.

My intuition is that this would move the calibration closer to “relevant” parameters and hence might allow you to identify further calibration issues of your algorithm in this area of the parameter space.
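
In rough Python pseudocode (just a sketch of the modification; simulate_data and fit_posterior are placeholders for your generative model and inference algorithm, and \theta is scalar for simplicity):

```python
import numpy as np

def sbc_ranks(generator_draws, simulate_data, fit_posterior,
              n_sims=200, L=100, seed=None):
    """SBC ranks, except that step 1 draws theta from `generator_draws`,
    e.g. stored draws from the real-data posterior instead of the prior."""
    rng = np.random.default_rng(seed)
    ranks = []
    for _ in range(n_sims):
        # Step 1 (modified): resample theta from the real-data posterior.
        theta_true = generator_draws[rng.integers(len(generator_draws))]
        y_sim = simulate_data(theta_true, rng)               # step 2: simulate data
        theta_post = fit_posterior(y_sim, L)                 # step 3: L posterior draws
        ranks.append(int(np.sum(theta_post < theta_true)))   # step 4: rank statistic
    return np.array(ranks)  # uniform on {0, ..., L} if well calibrated
```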

So you have a discrete-time Markov chain on a discrete and finite state space, i.e. the segment \{0,1,\dots,L\}. Any chance it could be reversible (at least for HMC-based inference)? This would open up a lot of possibilities, namely spectral methods for such chains…

If it were a chain such that the rank can only change by at most one, it would be a birth-and-death chain and hence reversible.

Or did I get this wrong?
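
For reference, the standard textbook argument (nothing specific to SBC): for a birth-and-death chain on \{0,1,\dots,L\} with p_i = P(i \to i+1) and q_i = P(i \to i-1), detailed balance \pi_i p_i = \pi_{i+1} q_{i+1} can always be solved recursively, giving the reversible stationary distribution \pi_i \propto \prod_{j=1}^{i} p_{j-1}/q_j (assuming the relevant q_j are positive), which is where the “hence reversible” comes from.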

Gotcha. I think to do that you’d essentially need to update your priors in both the generating and the fitting models. It could be worth doing as a secondary check just to really drill into that particular area of parameter space – this would be useful to the extent that your original priors did not generate fake data covering the area suggested by your real data + model, but it relies on the original fit procedure’s computational faithfulness.

Why? It’s just probing a different, hopefully more “relevant”, region. Why would it require more assumptions about the inference method than the original SBC does?

Sure - it’s just SBC with different priors specifying a different region. I’m saying the region’s relevance depends on the method used to choose it. If that method is good at choosing regions you care about, running SBC on just that area could give you further confidence. But to say that the new region is “more important” than the old one is to say that you believe your model usefully fits your data and that your inference algorithm is already capable of faithfully producing the posterior for that model and data. That’s all - I don’t think I’m saying anything you don’t already know here, sorry if it’s coming across unclearly.

No worries, your answers helped me. Thank you. If I find some time, I want to check various Bayesian neural network approximations, like Bayes by Backprop. It would be interesting to see what I can learn from the SBC rank histogram plots in this case. However, in this case there are many parameters (weights and biases), so I would likely want an automated uniformity test, or try using a “global” random variable f, maybe the sum of powers I mentioned or @anon75146577's linear combination suggestion.
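
For the automated uniformity check, I was picturing something as simple as binning the ranks and running a chi-square goodness-of-fit test, roughly like this (untested sketch, not anything from the SBC paper):

```python
import numpy as np
from scipy import stats

def rank_uniformity_pvalue(ranks, L, n_bins=20):
    """Chi-square goodness-of-fit p-value for SBC ranks taking values in {0, ..., L}."""
    ranks = np.asarray(ranks)
    edges = np.linspace(-0.5, L + 0.5, n_bins + 1)
    counts, _ = np.histogram(ranks, bins=edges)
    # Expected count per bin under exact uniformity on the L + 1 possible rank values.
    values_per_bin, _ = np.histogram(np.arange(L + 1), bins=edges)
    expected = len(ranks) * values_per_bin / (L + 1)
    return stats.chisquare(counts, expected).pvalue
```

This would give one p-value per parameter (or per “global” f), to be read with the usual multiple-testing caveats when there are thousands of weights and biases.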

@anon75146577, just following up on this again: would you consider a random convex combination of all parameters, a (random) convex combination of a random subset of parameters, or any random linear combination? In general, do you have any literature pointers that discuss this use of random linear combinations to reduce autocorrelations?

The intuition is that a Gaussian (multivariate or process) is specified uniquely by its one-dimensional projections, so this should give some idea of at least the correlation between parameters in a non-Gaussian setting.
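
For concreteness, here is roughly how I would generate the three variants I'm asking about (placeholder code, D-dimensional parameter vector):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 20

# (a) random convex combination of all parameters: Dirichlet weights
w_convex_all = rng.dirichlet(np.ones(D))

# (b) random convex combination of a random subset of the parameters
subset = rng.choice(D, size=5, replace=False)
w_convex_subset = np.zeros(D)
w_convex_subset[subset] = rng.dirichlet(np.ones(len(subset)))

# (c) general random linear combination: a uniformly random unit direction
w_linear = rng.normal(size=D)
w_linear /= np.linalg.norm(w_linear)

theta = rng.normal(size=D)       # placeholder parameter draw
for w in (w_convex_all, w_convex_subset, w_linear):
    print(float(w @ theta))      # the scalar test quantity f(theta) = w . theta
```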