Proposal: including a "canary" variable to illustrate poor exploration of the posterior

This idea was born out of some work I’ve been doing on a fairly complex model with sampling issues (in PyMC3, although I normally work in Stan).

As it happens, this model has a parameter that is not informed by the data at all: it has a prior distribution, and it interacts with some other parameters to create some deterministic outputs (the equivalent of generated quantities).

When I looked at the densities of the sampled parameters, it was immediately obvious when the sampler had taken exceedingly small steps and hadn’t really explored the posterior at all: compared with the known distribution of this parameter, the “posterior” from the sampler covered hardly any of the density. (I mention that this was in PyMC3 because Stan may have some smarter way of using different step sizes for different parameters.)

I know this sampling issue is probably already well covered by existing errors and diagnostics, but seeing it really drove the step-size issue home for me.

It made me think that one could include a “canary” variable in the model with a known distribution that is unaffected by the data, and use the posterior of this canary to at least rule out a very small step size.
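A minimal sketch of the canary check, assuming a plain random-walk Metropolis sampler as a stand-in for the real one (the point here is only the step size) and using the Kolmogorov-Smirnov distance to the known N(0, 1) as the summary. The function names `rwm_canary` and `ks_to_std_normal` are hypothetical, not part of Stan or PyMC3; `statistics.NormalDist` needs Python 3.8+.

```python
import math
import random
from statistics import NormalDist  # Python 3.8+


def rwm_canary(n_draws, step_size, seed=0):
    """Random-walk Metropolis draws for a N(0, 1) canary.

    Hypothetical stand-in for the real sampler: only the step size matters here.
    """
    rng = random.Random(seed)
    x = 0.0
    draws = []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step_size)
        # Log density ratio for N(0, 1): -(proposal^2 - x^2) / 2.
        log_ratio = -0.5 * (proposal * proposal - x * x)
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)
    return draws


def ks_to_std_normal(draws):
    """Kolmogorov-Smirnov distance between the empirical CDF of draws and N(0, 1)."""
    phi = NormalDist().cdf
    xs = sorted(draws)
    n = len(xs)
    return max(
        max((i + 1) / n - phi(x), phi(x) - i / n) for i, x in enumerate(xs)
    )


tiny = rwm_canary(5000, step_size=0.001, seed=1)  # sampler "stuck" near its start
good = rwm_canary(5000, step_size=2.4, seed=1)    # reasonably tuned step size
print(f"KS distance, tiny steps:  {ks_to_std_normal(tiny):.3f}")
print(f"KS distance, tuned steps: {ks_to_std_normal(good):.3f}")
```

With the tiny step size the draws never leave a narrow neighborhood of the starting point, so the distance to N(0, 1) is large; with a tuned step size it is small. The KS distance is only one possible summary; overlaying the canary density on the N(0, 1) curve shows the same thing visually.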

This is trivial for a user to do, but I thought it might be worth including more generally. It could lead to some very straightforward visuals and diagnostics that may be easier to understand.


@bbbales2, @avehtari and @betanalpha

Are you saying I should include a std normal in any of my Stan programs that is not tied to any other part of the data and model, and then check whether this variable comes back std normal? Is that the idea here?

Yeah, that’s right. If it doesn’t come back std normal, then you know something went very wrong.

I see the idea, but won’t you already know something has gone wrong just from the Rhats and n_effs of the other variables?

So we could sample a std normal in the parameters block and draw another one in generated quantities. Then we could also munge these two together in a number of ways in generated quantities and get all sorts of diagnostics. Like that?
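One way to read the “munge these two together” idea, sketched under assumptions: compare the canary draws from the sampler with exact draws (what a `normal_rng(0, 1)` call in generated quantities would give) using a two-sample Kolmogorov-Smirnov distance. A hypothetical AR(1) chain stands in for the MCMC canary: `rho` close to 1 mimics a sampler taking tiny steps, and `rho = 0` gives independent draws. Both helper names are illustrative only.

```python
import math
import random


def sticky_chain(n_draws, rho, seed):
    """AR(1) chain whose stationary distribution is N(0, 1).

    Hypothetical stand-in for MCMC canary draws: rho near 1 mimics tiny steps.
    """
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)  # keeps the stationary variance at 1
    x = 0.0
    draws = []
    for _ in range(n_draws):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        draws.append(x)
    return draws


def ks_two_sample(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


# Exact independent draws, playing the role of the generated-quantities canary.
exact_rng = random.Random(99)
exact = [exact_rng.gauss(0.0, 1.0) for _ in range(2000)]

stuck = sticky_chain(2000, rho=0.999999, seed=1)  # barely moves
mixing = sticky_chain(2000, rho=0.0, seed=2)      # independent draws
print(f"stuck vs exact:  {ks_two_sample(stuck, exact):.3f}")
print(f"mixing vs exact: {ks_two_sample(mixing, exact):.3f}")
```

Other combinations (differences of means, ratios of variances) would fit the same pattern; the two-sample form avoids needing the closed-form CDF.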

I wonder why the usual Rhats and n_effs wouldn’t already do this, as @bbbales2 suggests…

I think the empirical question is whether this canary diagnostic would be more sensitive than rhat and the others already available.
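To set up that empirical comparison, the canary’s own draws can also be fed to R-hat. Below is a sketch of the classic split-R-hat (without the rank normalization of the newer version), applied to hypothetical AR(1) canary chains started from dispersed initial values. In this toy setup a sticky chain that has not forgotten its starting point is flagged by R-hat as well, so the sketch only shows how one might run both checks on the same draws; it does not settle which is more sensitive. Function names are illustrative.

```python
import math
import random


def split_rhat(chains):
    """Classic split-R-hat (no rank normalization) for equal-length chains."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])  # drops the last draw if length is odd
    n = len(halves[0])
    m = len(halves)
    means = [sum(h) / n for h in halves]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    b = n * sum((mu - grand_mean) ** 2 for mu in means) / (m - 1)
    w = sum(
        sum((x - mu) ** 2 for x in h) / (n - 1) for h, mu in zip(halves, means)
    ) / m
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


def ar1_chain(n_draws, rho, start, seed):
    """AR(1) chain with a N(0, 1) stationary distribution (hypothetical stand-in)."""
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - rho * rho)
    x = start
    draws = []
    for _ in range(n_draws):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        draws.append(x)
    return draws


starts = [-2.0, -1.0, 1.0, 2.0]  # dispersed initializations
stuck = [ar1_chain(1000, 0.999, s, seed=k) for k, s in enumerate(starts)]
mixed = [ar1_chain(1000, 0.0, s, seed=k) for k, s in enumerate(starts)]
print(f"split R-hat, stuck canary: {split_rhat(stuck):.2f}")
print(f"split R-hat, mixed canary: {split_rhat(mixed):.3f}")
```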


They may! At the time I was working with PyMC3 and wasn’t getting the kind of reporting I was used to. But seeing this visually was really helpful for me.



Adding a new variable isn’t free: if the rest of the model isn’t scaled to have variation around 1, then introducing an independent variable with a normal(0, 1) density can, for example, stress adaptation. Indeed, adding an independent variable can even make the model with the “canary” more pathological and hence make it harder to identify the pathologies inherent to the original model.
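One concrete mechanism behind the adaptation concern, sketched under the assumption of a unit (identity) metric and a single shared step size: for a Gaussian target with standard deviation sigma, leapfrog integration is stable only when the step size is below 2 * sigma. A step size that suits parameters on a scale of 10 therefore makes the trajectory of a unit-scale canary blow up, and step-size adaptation would have to shrink the step for everything. Stan’s default diagonal metric adaptation rescales each parameter, which mitigates this; the sketch ignores that. `leapfrog_energy_error` is a hypothetical helper.

```python
def leapfrog_energy_error(sigma, step_size, n_steps, q0=1.0, p0=1.0):
    """|H(end) - H(start)| for leapfrog on a N(0, sigma^2) target, unit metric."""

    def hamiltonian(q, p):
        # Potential energy of N(0, sigma^2) plus unit-mass kinetic energy.
        return 0.5 * (q / sigma) ** 2 + 0.5 * p * p

    q, p = q0, p0
    h_start = hamiltonian(q, p)
    for _ in range(n_steps):
        p -= 0.5 * step_size * q / sigma**2  # half step in momentum
        q += step_size * p                   # full step in position
        p -= 0.5 * step_size * q / sigma**2  # half step in momentum
    return abs(hamiltonian(q, p) - h_start)


# A step size that suits a parameter with scale 10 keeps the energy error small...
print(leapfrog_energy_error(sigma=10.0, step_size=5.0, n_steps=20))
# ...but diverges on a unit-scale canary, because 5.0 > 2 * 1.0.
print(leapfrog_energy_error(sigma=1.0, step_size=5.0, n_steps=20))
```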

The real power of Hamiltonian Monte Carlo comes from its sensitive diagnostics, like divergences and the E-BFMI. Combined with the multimodality sensitivity of \hat{R}, we should be able to identify all of the pathologies that we have encountered so far. While we can’t prove that every pathology will manifest as a failure of at least one of these diagnostics, empirically they have been sufficient. Exceptions would be very exciting from a research perspective, so if you encounter any, please do share!
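The energy diagnostic can be sketched from the per-iteration Hamiltonian energies that Stan reports in the `energy__` sampler column. It compares the variance of the energy changes between successive iterations with the overall variance of the energy, using the usual estimator \sum_n (E_n - E_{n-1})^2 / \sum_n (E_n - \bar{E})^2. Low values suggest that momentum resampling is not exploring the energy distribution efficiently. The function name is hypothetical.

```python
def e_bfmi(energies):
    """E-BFMI estimate from the per-iteration Hamiltonian energies of one chain."""
    n = len(energies)
    mean = sum(energies) / n
    # Squared changes between successive iterations...
    numerator = sum((energies[i] - energies[i - 1]) ** 2 for i in range(1, n))
    # ...relative to the total spread of the energies.
    denominator = sum((e - mean) ** 2 for e in energies)
    return numerator / denominator


print(e_bfmi([0.0, 1.0, 2.0, 3.0, 4.0]))  # 0.4
```

Energies that drift slowly, like the toy input, give a small value; independent energies give a value near 2.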

Ultimately, we can’t add new diagnostic procedures that don’t offer additional sensitivity without considering their cost: the unfortunate addition of the very expensive and unavoidable bulk and tail ESS checks to RStan provides a clear demonstration.
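To give a sense of the extra per-parameter work behind bulk ESS: it first rank-normalizes the pooled draws of every parameter, which needs a full sort, and only then runs the usual autocorrelation-based ESS on the normalized values (tail ESS adds further passes over indicators of the 5% and 95% quantiles). Below is a sketch of just the rank-normalization step, following the (r - 3/8) / (S + 1/4) offset described by Vehtari et al.; the function name is hypothetical and this is not RStan’s implementation.

```python
from statistics import NormalDist  # Python 3.8+


def rank_normalize(draws):
    """Map pooled draws to normal scores via average ranks (ties share a rank)."""
    s = len(draws)
    order = sorted(range(s), key=lambda i: draws[i])  # the O(S log S) sort
    ranks = [0.0] * s
    start = 0
    while start < s:
        end = start
        # Extend over a run of tied values.
        while end + 1 < s and draws[order[end + 1]] == draws[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the run
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    inverse_cdf = NormalDist().inv_cdf
    # Offset keeps every argument strictly inside (0, 1).
    return [inverse_cdf((r - 0.375) / (s + 0.25)) for r in ranks]


print(rank_normalize([3.0, 1.0, 2.0]))
```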


Is there anything I can read on this anywhere (specifically, why it’s unfortunate)?


I’ll just repeat that the additional diagnostics are very expensive (this is why RStan 2.19+ takes forever to return a fit object even after the Markov chains have finished) and there have been no demonstrations of failures not caught by other diagnostics.


Ah, that makes sense. I was worried that there were other kinds of costs than “mere” time (which I agree is important). Thanks for clarifying!