Hi everybody,
I’m trying to understand exactly what Stan does when it has to deal with constrained parameters.
I read the Stan Reference Manual, but things are still a little obscure to me.
I understand how and why constrained parameters have to be transformed to unconstrained parameters (thanks to chapter 10: “constraint transforms”), but I have difficulty picturing Hamiltonian Monte Carlo dealing with the constrained and unconstrained forms of the parameters at the same time.
Intuitively, it would be easier to unconstrain the constrained parameters once, use the unconstrained forms in HMC until the end of the sampling process, and only then apply the inverse transformation to get back the desired constrained parameters. However, chapter 8, “Program blocks”, states that during the sampling process parameters are constrained and unconstrained on a regular basis.
Could anyone explain this topic a little more or point me to other references?
Thank you in advance,
Deramaka
From the perspective of HMC algorithms, including the version of NUTS currently used in Stan, only the unconstrained space matters. From the perspective of Bayesian inference, the unconstrained space does not matter and the constrained parameters (that you declare in the parameters block) or functions thereof (that you declare in the transformed parameters or generated quantities blocks) might matter. So when using HMC algorithms to do Bayesian inference, you have to think about both the unconstrained and the constrained space. Fortunately, the transformation from one to the other is mostly bijective and Stan does this for you, so you don’t have to think much about the details of the transformations (although they are listed in the documentation). Basically, what you write in the model block of a Stan program defines a (log) density function over what you declare in the parameters block given what you declare in the data and transformed data blocks. And then the transformations from the unconstrained space to the constrained space alter that density function to obtain a (log) density function for the unconstrained parameters conditional on the data and transformed data that is easier to sample from for a wide class of models.
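As a sketch of that last step, in notation that is assumed here for illustration rather than taken from the manual: write \theta for the constrained parameters, u for the unconstrained ones, g for the map from unconstrained to constrained space (so \theta = g(u)), and y for the data. The usual change-of-variables formula then gives the log density on the unconstrained space:

```latex
\log p_U(u \mid y)
  = \log p_\Theta\bigl(g(u) \mid y\bigr)
  + \log \left| \det J_g(u) \right|
  + \text{const},
\qquad
J_g(u) = \frac{\partial g(u)}{\partial u}
```

The first term is what the model block defines, evaluated at the constrained value g(u). The second term is the Jacobian adjustment that comes from the transformation.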
Thanks a lot bgoodri, it helps a lot!! :))
In Stan, the HMC algorithm (like the optimization and ADVI algorithms) works on the unconstrained space, whereas the model block works in the constrained space.
To see how that is possible, suppose we declare
parameters {
  real<lower=0> sigma;
}
What happens under the hood is that the sampler works on $\log \sigma$, which is unconstrained. It transforms the unconstrained value $\log \sigma$ to $\sigma = \exp(\log \sigma)$ and applies the Jacobian correction to the target log density behind the scenes. The model then works on $\sigma$, but the connection is kept to $\log \sigma$ through the transform.
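Here is a minimal Python sketch of that round trip for the lower-bounded case. This is not Stan's actual code. The function names are hypothetical, and the exponential(1) density on sigma is an assumption standing in for whatever the model block defines. Each evaluation of the log density starts from the unconstrained value, maps it to sigma, runs the "model" on sigma, and adds the log Jacobian. That would be why the parameters appear to be constrained and unconstrained repeatedly during sampling.

```python
import math


def model_log_density(sigma):
    # Stand-in for the model block (assumed): sigma ~ exponential(1), sigma > 0.
    return -sigma


def unconstrained_log_density(u):
    # u plays the role of log(sigma), the value the sampler moves around.
    sigma = math.exp(u)   # inverse transform back to the constrained value
    log_jacobian = u      # log |d sigma / d u| = log(exp(u)) = u
    return model_log_density(sigma) + log_jacobian


def integrate(f, lo, hi, n=40_000):
    # Plain midpoint rule, good enough for a sanity check.
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# With the Jacobian term, the density over u is a proper density.
with_jacobian = integrate(
    lambda u: math.exp(unconstrained_log_density(u)), -30.0, 10.0
)

# Without it, the "density" over u does not integrate to 1.
without_jacobian = integrate(
    lambda u: math.exp(model_log_density(math.exp(u))), -30.0, 10.0
)

print(with_jacobian, without_jacobian)
```

With the Jacobian term, the integral over u comes out very close to 1, as it should for a density. Without it, the result is far from 1 and keeps growing as the lower integration limit is pushed further left, which shows why the correction is needed.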
The JStatSoft paper on Stan has more details, and there are complete details in the reference manual, including all the transforms and Jacobians.
Thanks a lot Bob_Carpenter ;) It has helped as well ;)