Hello, Stan community,
My question is: in general, is it acceptable to exclude posterior draws that fall outside the interpretable range of a parameter as a way of conditioning the posterior distribution? For example, one may choose to model some strictly positive quantity using a model that allows it to take negative values as well, either for the sake of increased efficiency or flexibility (in my case, using a Gaussian process). Is omitting from the posterior any draws that happen to take negative values a valid way to constrain the model? Is this the same as constraining the variable in the Stan model to only take positive values, since Stan would just reject any draws that didn't meet that criterion?
Thanks in advance for any input/feedback!
I’d say that it is OK.
Consider an example I have encountered in my own work: the aim is to model probability distributions that are consistent with a small number of observed quantiles. The model generates realisations of Gaussian processes that meet the required quantiles, but in addition to satisfying those constraints we want each realisation to be a valid probability distribution function.
One approach is to examine each posterior draw and check whether it is a valid probability distribution function. If it is, accept it; if not, reject it.
There is nothing invalid about such an approach, but it can be an inefficient way of modelling, depending on how often invalid draws are generated. In my own research, for example, I am developing ways of ensuring that the posterior draws are all valid probability distribution functions.
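If it helps to mechanise that check, one option is to compute a validity flag inside Stan and subset the saved draws on it afterwards. A minimal sketch, assuming the latent GP values live in a vector f defined in transformed parameters, and using positivity as a stand-in for whatever validity condition you need (the flag name is made up):

```stan
generated quantities {
  // 1 if this draw satisfies the validity condition (here: the latent
  // GP f is strictly positive everywhere), 0 otherwise; filter the
  // saved draws on this flag after sampling
  int<lower=0, upper=1> is_valid = min(f) > 0;
}
```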
- This would imply an effective prior with some weird properties. For example, suppose the "unrealistic" values are \theta \le 0 (you want \theta > 0), but for some reason you use the unconstrained prior normal(1, 2). Your effective prior is then the piecewise function
p(\theta) = \begin{cases} 0 & \text{if } \theta \le 0 \\ \alpha \, \text{normal}(\theta \mid 1, 2) & \text{otherwise} \end{cases}
where \alpha is a renormalization constant. That prior is funky, and it can lead to some funky inferences.
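For concreteness, with \Phi the standard normal CDF, the renormalization constant is one over the mass the unconstrained prior puts on the allowed region:
\alpha = \frac{1}{P(\theta > 0)} = \frac{1}{1 - \Phi\left(\frac{0 - 1}{2}\right)} = \frac{1}{1 - \Phi(-0.5)} \approx \frac{1}{0.691} \approx 1.45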
- Regarding "for the sake of increased efficiency or flexibility": when you filter the posterior draws this way, you're reducing your effective sample size by throwing away draws, which increases the MCMC error that propagates into your final inference. So take that into account when you're thinking about efficiency.
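As a rough rule of thumb (assuming the surviving draws are otherwise representative), if a fraction p of the draws passes the filter, the Monte Carlo standard error inflates by about 1/\sqrt{p}:
\text{MCSE}_{\text{filtered}} \approx \text{MCSE} / \sqrt{p}
so keeping only a quarter of the draws roughly doubles it.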
Using prior modeling and constraints seems like the right way to deal with this problem; indeed, prior modeling is half the point of Bayesian computation to begin with. Can you elaborate on why you want to avoid it?
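For a scalar parameter, the constrained version is just a declaration. A minimal sketch (the parameter name and prior are illustrative):

```stan
parameters {
  // Stan samples on an unconstrained scale and transforms back,
  // so no draws are ever rejected
  real<lower=0> theta;
}
model {
  theta ~ normal(1, 2);  // effectively normal(1, 2) truncated to theta > 0
}
```

Unlike the post-hoc filter, this never wastes draws.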
-
In the case of constraining a Gaussian process (GP) to be positive, Stan sometimes fails because it rejects too many samples. One can get around this by first transforming the data, fitting with Stan, and then transforming back (e.g., fitting on the log scale and exponentiating so that all values are positive). But in my experience, this performs worse in leave-one-out cross-validation than just letting the model run unconstrained and discarding any posterior draws that contain negative values.
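For what it's worth, here is a minimal sketch of the transform-and-back version as a complete model: the GP lives on the log scale, and exp(f) is the strictly positive process. The kernel, priors, and all names are illustrative choices, not something from this thread:

```stan
data {
  int<lower=1> N;
  array[N] real x;
  vector<lower=0>[N] y;  // strictly positive observations
}
parameters {
  real<lower=0> rho;     // GP length-scale
  real<lower=0> alpha;   // GP marginal standard deviation
  real<lower=0> sigma;   // observation noise on the log scale
  vector[N] eta;         // non-centred GP innovations
}
transformed parameters {
  // latent GP on the log scale; the jitter keeps the Cholesky stable
  matrix[N, N] L_K
      = cholesky_decompose(add_diag(gp_exp_quad_cov(x, alpha, rho), 1e-9));
  vector[N] f = L_K * eta;
}
model {
  rho ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  eta ~ std_normal();
  y ~ lognormal(f, sigma);  // exp(f) is the positive process on the data scale
}
```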
-
In some cases, I am not aware of any way to constrain the model so that it yields realistic results. For example, I am using a GP to model the drift function of a stochastic differential equation. I know that, realistically, the drift should have a certain type of end behaviour, but the GP allows for all types of end behaviour even though only one is feasible.
Of course, the data drives the posterior mean of the drift function to have the correct end behaviour, but many of the posterior draws do not have such nice properties. This becomes problematic when you want to calculate other quantities derived from the drift function, e.g. the stationary density of the process. The stationary density calculated from the mean drift function will be reasonable, but if I want a credible interval for it using the posterior draws, I may get unrealistic results.
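For reference, in the scalar case dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t, the stationary density (when it exists) is
\pi(x) \propto \frac{1}{\sigma^2(x)} \exp\left( \int^x \frac{2\,\mu(u)}{\sigma^2(u)}\,du \right)
so each posterior draw of the drift \mu yields a draw of \pi, and a draw of \mu with the wrong end behaviour can make the integral, and hence the interval, blow up.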
Have you come across any ways or have you any ideas about how to constrain a GP in such a way?
Seems like you could explicitly model the GP mean function. For example, the mean function could be a simple linear function \mu(x) = \alpha x, and constraining \alpha < 0 would make it go to negative infinity in the limit as x \to \infty.
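A minimal sketch of that idea (a fragment only: N, x, and the kernel Cholesky factor L_K are assumed to be defined as in a standard latent GP model, and alpha_mean is a made-up name for the constrained slope):

```stan
parameters {
  real<upper=0> alpha_mean;  // slope of the linear mean function, forced negative
  vector[N] eta;
}
transformed parameters {
  // GP draws fluctuate around the mean function mu(x) = alpha_mean * x,
  // which goes to negative infinity as x grows
  vector[N] f = alpha_mean * to_vector(x) + L_K * eta;
}
```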