Possible catches in interfacing Stan with black-box software?
Suppose, for instance, that someone has a model expressed as black-box software, e.g. some numerical Fortran code, a proprietary binary, or whatnot. I know that Stan can be used with external C++ code, and in principle that code could call the outside software via exec, etc. However, from what I've understood, Stan also needs gradients, and that might not play nicely with calling outside code this way. I've seen this issue come up in a post on the old stan-users Google group, but that post was from 2015, and I wasn't sure whether the information there had gone stale over the last couple of years.
Would there be a way to make calling this black box work, or would one be better off looking at different MCMC implementations for this use case?
(The main reason I'm asking is that I'm cataloging the capabilities and limitations of various UQ software, including MCMC codes, and if I say that Stan isn't good for a particular application, I want to make sure I'm not blowing smoke.)
In principle there's no reason you can't use the black-box software to calculate a log density for Stan, but you would also need a black box that gives you the gradient of the log density with respect to the parameters. If you don't have that second black box, you can't use gradient-based algorithms, and those happen to be all (or almost all) of Stan's algorithms.
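To make the "second black box" requirement concrete, here is a minimal Python sketch (not Stan code) of one leapfrog step, the building block of Hamiltonian Monte Carlo; this naive version evaluates the gradient twice per step. `black_box_log_density` is a hypothetical stand-in (a standard normal) for external code that returns only a value. The central-difference fallback is a workaround added here for illustration, not something Stan does or that was suggested above: it costs 2N black-box calls per gradient for N parameters and is only approximate, which is why a real gradient black box is much preferable.

```python
# Hypothetical stand-in for the external code: returns only log p(theta)
# (a standard normal, up to a constant) and no gradient.
def black_box_log_density(theta):
    return -0.5 * sum(t * t for t in theta)


calls = {"n": 0}  # counts how often the black box is invoked


def counted_log_density(theta):
    calls["n"] += 1
    return black_box_log_density(theta)


def finite_difference_grad(f, theta, h=1e-5):
    """Central-difference gradient: 2 * len(theta) calls to f, and only approximate."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += h
        down = list(theta)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def leapfrog(theta, momentum, grad_fn, step):
    """One naive leapfrog step of HMC; grad_fn must return the gradient of log p."""
    g = grad_fn(theta)
    p = [pi + 0.5 * step * gi for pi, gi in zip(momentum, g)]    # half step in momentum
    theta = [ti + step * pi for ti, pi in zip(theta, p)]         # full step in position
    g = grad_fn(theta)
    p = [pi + 0.5 * step * gi for pi, gi in zip(p, g)]           # second half step
    return theta, p
```

With a 10-dimensional `theta` and `grad_fn = lambda t: finite_difference_grad(counted_log_density, t)`, a single leapfrog step calls the black box 40 times, and an HMC trajectory chains many such steps per iteration.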
You need to implement a C++ interface to the external function, and it has to supply both the value and the full Jacobian.
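Why the full Jacobian: if the external code maps the parameters theta to a vector of outputs y (say, predictions at several time points), reverse-mode autodiff obtains the gradient of the log density as the vector-Jacobian product grad log p = J^T (d log p / d y). The Python sketch below shows only that arithmetic, not Stan's C++ API; `external_model` is a made-up function with a hand-coded Jacobian standing in for the Fortran code, and the normal likelihood with known `sigma` is an assumption for illustration.

```python
import math


def external_model(theta):
    """Hypothetical black box: returns predictions y and Jacobian J[i][j] = dy_i/dtheta_j."""
    a, b = theta
    y = [a * b, a + b * b, math.exp(b)]
    jac = [
        [b, a],
        [1.0, 2.0 * b],
        [0.0, math.exp(b)],
    ]
    return y, jac


def log_likelihood(theta, data, sigma):
    """Normal log likelihood of data given the black-box predictions (up to a constant)."""
    y, _ = external_model(theta)
    return -0.5 * sum(((d - yi) / sigma) ** 2 for d, yi in zip(data, y))


def grad_log_likelihood(theta, data, sigma):
    """Chain rule: gradient = J^T times the adjoint d loglik / d y."""
    y, jac = external_model(theta)
    adj_y = [(d - yi) / sigma ** 2 for d, yi in zip(data, y)]  # d loglik / d y_i
    return [sum(adj_y[i] * jac[i][j] for i in range(len(y)))
            for j in range(len(theta))]
```

The value alone is not enough: without `jac`, there is no way to push the adjoint `adj_y` back to the parameters except by re-running the black box many times.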
MCMC algorithms that don't use gradients, like Gibbs, Metropolis-Hastings, and ensemble methods, scale poorly with dimension and often fail in high-dimensional problems. So you can run that black box with them, but I wouldn't trust that you're exploring the posterior unless it's very simple. I'd want to generate fake data according to the priors (hopefully the black box will support that) and make sure you can recover the right posteriors (e.g., using the Cook-Gelman-Rubin procedure).
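A minimal sketch of the core idea behind that fake-data check, assuming a toy conjugate model (theta ~ normal(0, 1), y_k ~ normal(theta, 1)) so the exact posterior is known. `posterior_draws` is a hypothetical placeholder for whatever sampler you are testing; here it draws from the exact posterior, so the check should pass. For each replication, the quantile of the true theta among the posterior draws is recorded, and if the sampler is correct those quantiles should be roughly uniform on [0, 1].

```python
import math
import random


def posterior_draws(y, n_draws, rng):
    # Placeholder for the sampler under test. Here: exact draws from the
    # conjugate posterior normal(sum(y) / (n + 1), sqrt(1 / (n + 1))).
    n = len(y)
    mean = sum(y) / (n + 1)
    sd = math.sqrt(1.0 / (n + 1))
    return [rng.gauss(mean, sd) for _ in range(n_draws)]


def simulate_quantiles(n_reps=500, n_obs=10, n_draws=200, seed=1):
    rng = random.Random(seed)
    quantiles = []
    for _ in range(n_reps):
        theta_true = rng.gauss(0.0, 1.0)                          # draw from the prior
        y = [rng.gauss(theta_true, 1.0) for _ in range(n_obs)]    # simulate fake data
        draws = posterior_draws(y, n_draws, rng)                  # fit with the sampler
        quantiles.append(sum(d < theta_true for d in draws) / n_draws)
    return quantiles


qs = simulate_quantiles()
```

Swapping in a sampler that, say, underestimates the posterior spread would pile the quantiles up near 0 and 1, which a histogram of `qs` makes easy to spot.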
Stan also doesn't support discrete parameters (unless they are marginalized out). That is less of a limitation than it sounds: whenever discrete parameters lead to combinatorially intractable posteriors, other software tends not to fit them well either. See the chapter on latent discrete parameters in the Stan manual for why sampling-based methods are inefficient for them and can't calculate tail probabilities. So you should be marginalizing no matter which sampling method you use.
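To make "marginalize no matter what" concrete, here is a sketch for the standard example, a two-component normal mixture: each observation has a discrete indicator z_n, but summing it out with log-sum-exp leaves a likelihood in the continuous parameters only, which a gradient-based sampler can handle. In a Stan program the same sum would typically be written with `log_sum_exp` or `log_mix`; the Python function names and parameter values here are illustrative. The brute-force version enumerates all 2^N indicator vectors purely as a check, and it blows up combinatorially as N grows.

```python
import itertools
import math


def normal_lpdf(y, mu, sigma):
    """Log density of normal(mu, sigma) at y."""
    return (-0.5 * ((y - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2.0 * math.pi))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mixture_log_lik(ys, lam, mu1, mu2, sigma):
    """Mixture log likelihood with each indicator z_n summed out: N log-sum-exps."""
    return sum(
        log_sum_exp([math.log(lam) + normal_lpdf(y, mu1, sigma),
                     math.log1p(-lam) + normal_lpdf(y, mu2, sigma)])
        for y in ys)


def brute_force_log_lik(ys, lam, mu1, mu2, sigma):
    """Same quantity by enumerating all 2^N indicator vectors (for checking only)."""
    terms = []
    for z in itertools.product((0, 1), repeat=len(ys)):
        lp = 0.0
        for y, zn in zip(ys, z):
            if zn == 0:
                lp += math.log(lam) + normal_lpdf(y, mu1, sigma)
            else:
                lp += math.log1p(-lam) + normal_lpdf(y, mu2, sigma)
        terms.append(lp)
    return log_sum_exp(terms)
```

Because the indicators are independent given the mixing proportion, the sum over 2^N configurations factors into N two-term sums, which is what makes the marginalized version cheap.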