Specifically, I’m wondering about the use of ‘validated’. Also, if anyone has time, I’m curious how someone determines whether they are using a ‘valid’ R package in general.

Sorry about the general terms here.

Difference in variance Gibbs (R,C++) vs HMC (Stan) for simple normal gamma model

It is perhaps fair to say that Stan consists of a library of validated inference algorithms, plus ADVI. If you search the Discourse forums for the name Cook, you will find a few threads about how we go about validating MCMC (although the technique is being rebranded SBAC or something, and there were some mistakes in the original paper about it).

The L-BFGS-B, Newton, etc. optimization algorithms are self-validating in a manner of speaking, in the sense that you can check that the gradient is sufficiently close to zero at the optimum. But all those optimization algorithms can give you is a local optimum and curvature at the local optimum, which are not generally sufficient to do statistical inference with.
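To make the "self-validating but only local" point concrete, here is a minimal sketch in plain Python. It is not Stan's optimizer: the objective `f` is a made-up two-well function standing in for a negative log density, and `newton` is a bare-bones 1-D Newton iteration written for illustration. The gradient check passes at both stopping points, even though only one of them is the global minimum.

```python
import math


def f(x):
    """Toy objective (hypothetical): two wells, the left one slightly deeper."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad(x):
    """First derivative of f."""
    return 4.0 * x ** 3 - 4.0 * x + 0.3


def hess(x):
    """Second derivative of f (the curvature)."""
    return 12.0 * x * x - 4.0


def newton(x, tol=1e-10, max_iter=50):
    """Plain Newton iteration; stops once the gradient is close to zero."""
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x -= g / hess(x)
    return x


x_right = newton(1.5)    # start in the right-hand well
x_left = newton(-1.5)    # start in the left-hand well

for name, x in (("right", x_right), ("left", x_left)):
    # The "self-validation": gradient near zero and positive curvature.
    # If f were a negative log density, 1/sqrt(curvature) would be the
    # scale of a normal approximation centred at this optimum.
    local_scale = 1.0 / math.sqrt(hess(x))
    print(name, x, f(x), grad(x), local_scale)
```

Both runs report a gradient of essentially zero, so each looks converged when checked in isolation. Which well you end up in depends only on the starting point, and the curvature describes only that well, which is why this check says nothing about the rest of the distribution.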

A couple of people are working on diagnostics for ADVI, but I think it will always be the case that you don’t know how close the variational approximation is to the posterior distribution unless you have already obtained a bunch of draws from the posterior distribution using NUTS (and in that situation, why would you bother approximating?). And it seems to be the case that if you take the ADVI draws and apply the technique formerly known as Cook-Gelman-Rubin, they will get flagged as invalid.
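For anyone who wants to see what that kind of check looks like, here is a rough sketch of the rank-based idea in plain Python, under some loud assumptions. The model is a toy conjugate one (mu ~ Normal(0, 1), y_i ~ Normal(mu, 1)), so the exact posterior is known and stands in for a correct sampler. The "approximation" is just that posterior with its standard deviation shrunk by `sd_scale`; it is a stand-in for an over-confident approximation, not ADVI itself. All function names are made up for this example.

```python
import math
import random


def rank_statistics(n_sims, n_obs, n_draws, sd_scale, seed):
    """For each simulation, the rank of the true mu among the posterior draws."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        mu = rng.gauss(0.0, 1.0)                        # parameter from the prior
        y = [rng.gauss(mu, 1.0) for _ in range(n_obs)]  # data simulated given mu
        # Exact posterior for this toy model: Normal(sum(y)/(n+1), sd = sqrt(1/(n+1))).
        post_mean = sum(y) / (n_obs + 1)
        post_sd = math.sqrt(1.0 / (n_obs + 1))
        # sd_scale = 1.0 gives exact draws; sd_scale < 1.0 is too narrow.
        draws = [rng.gauss(post_mean, sd_scale * post_sd) for _ in range(n_draws)]
        ranks.append(sum(d < mu for d in draws))        # integer in 0..n_draws
    return ranks


def chi_square_vs_uniform(ranks, n_bins):
    """Chi-square distance between the rank histogram and a uniform one."""
    counts = [0] * n_bins
    for r in ranks:
        counts[r] += 1
    expected = len(ranks) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)


N_DRAWS = 19  # ranks take 20 possible values, one histogram bin each
exact_ranks = rank_statistics(2000, 10, N_DRAWS, sd_scale=1.0, seed=1)
narrow_ranks = rank_statistics(2000, 10, N_DRAWS, sd_scale=0.5, seed=1)

exact_stat = chi_square_vs_uniform(exact_ranks, N_DRAWS + 1)
narrow_stat = chi_square_vs_uniform(narrow_ranks, N_DRAWS + 1)
print("exact posterior draws:", exact_stat)
print("too-narrow draws:     ", narrow_stat)
```

If the draws really come from the posterior, the ranks are uniform and the statistic should land near its degrees of freedom (19 here). With the too-narrow draws the true value falls in the tails far too often, the ranks pile up at 0 and 19, and the statistic becomes very large, which is the sense in which an approximation gets flagged.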

Reference for SBAC validation?

Not validated in the code-validation sense. We do a lot of testing, but your inference is also going to depend on your data and model. Just using Stan (or even RStanArm) isn’t enough to make sure things work.

We do try to set things up so that you get diagnostic error messages when something goes wrong that we can detect, which covers most problems.