I would like to generate and infer values from a gamma distribution that need to stay within a certain range, say, between 50 and 200 (this need not be exact, just rough), which is what makes sense for my domain. I came up with a way of doing it where I construct the parameters from uniform distributions and then build the gamma distribution from there. However, I don't think this is the most principled way of doing it for SBC.
Stan lets you constrain variables to intervals. So if all you want to do is replace the gamma with a truncated gamma, it’s as simple as changing this line:
values ~ gamma(alpha, beta) T[50, 200];
You can use this to estimate alpha and beta, and also to generate posterior predictive draws from the truncated distribution.
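A rough Python sketch of one way to generate those truncated posterior predictive draws, using rejection (the `(alpha, beta)` pairs below are made-up stand-ins; real ones would come from the fit):

```python
import random

def truncated_gamma_rng(shape, rate, lo, hi, rng):
    """Draw from gamma(shape, rate) restricted to [lo, hi] by rejection."""
    while True:
        # random.gammavariate takes a *scale* parameter, so invert Stan's rate
        y = rng.gammavariate(shape, 1.0 / rate)
        if lo <= y <= hi:
            return y

rng = random.Random(1234)
# stand-ins for posterior draws of (alpha, beta)
posterior_draws = [(10.0, 0.1), (12.0, 0.11), (9.0, 0.09)]
y_new = [truncated_gamma_rng(a, b, 50.0, 200.0, rng) for a, b in posterior_draws]
```

An inverse-CDF (quantile) approach would avoid the rejection loop entirely, but rejection is fine as long as the interval carries most of the mass.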
You might want to reparameterize the gamma distribution into something friendlier like a mean and scale for fitting and use zero-avoiding priors. There’s a discussion in BDA3 in chapter 5 around beta distributions that shows why not doing this can be problematic.
If we had quantile functions for all our distributions, that’d be a better way of generating y_new, but what I have used above should work.
Having said all this, we generally discourage people from imposing hard interval constraints, as they often have counterintuitive behavior, especially if your intuitions were built up around the MLE or around constraints that are very wide. If the data are consistent with boundary values, mass will pile up near one or both boundaries and you'll get a different answer than you would have without the constraint, and that answer is sensitive to exactly where you put the boundaries.
Thank you! I am just not sure what the "best practice" is here. My data runs fine with the model, but I am trying to generate priors and simulated data that are somewhat consistent with my data, which is hard to do when generating alpha and beta directly (or even when reparameterizing with mean and scale as you mention; I just didn't do that above for simplicity), as they can generate values in the thousands. Ultimately I want to do SBC and make sure things are OK. Do people just use truncated gammas in these kinds of situations?
If you really need values to stay in a fixed range, you have two options. One, make everything so low variance that the chance of falling outside the range will be so small as to never come up. Or two, add truncation.
I would again stress that unless there are physical constraints on the system, it's probably going to work out better for just about everything, from the statistical description to the sampling, to use a distribution that can be parameterized to roughly stick to that range.
I guess the real question I should have led with is why do you want to put these constraints on a gamma distribution? And why a gamma (vs. any of the other positive-constrained continuous distributions)?
The reason is that the actual data in the natural world more or less stays within those constraints, 50-200; it can go somewhat above, but never into the thousands and never to zero. It is a gamma because that is the actual shape the data takes: most of the mass sits toward the left with a modest tail to the right, such as this.
(the fitted line corresponds to a gamma distribution). Believe me, I know this would be so much easier with a normal… Do you see an alternative to a gamma distribution? I don't need things to be strictly bounded at 200, just for the tail to extend toward it.
I still don’t understand why you want to impose hard interval constraints. Isn’t that gamma distribution you plotted going to do this in the same way? It doesn’t sound like the [50, 200] interval is a logical constraint, but just a soft constraint.
If the gamma distribution isn’t right, why use a gamma? The other obvious candidates are inverse gamma, lognormal, and Pareto—they’re all continuous distributions on positive values.
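For comparison's sake, all four candidates are easy to simulate with the Python standard library (the inverse gamma via the 1/gamma trick); the parameter values below are purely illustrative, chosen to put the bulk of each distribution very roughly near 110:

```python
import random
from statistics import mean, quantiles

rng = random.Random(0)
n = 20000

gamma_ys = [rng.gammavariate(19.36, 5.68) for _ in range(n)]           # shape, scale
lognorm_ys = [rng.lognormvariate(4.67, 0.22) for _ in range(n)]        # log-mean, log-sd
invgamma_ys = [1.0 / rng.gammavariate(20.0, 1.0 / 2090.0) for _ in range(n)]  # 1/gamma
pareto_ys = [50.0 * rng.paretovariate(3.0) for _ in range(n)]          # support >= 50

# compare centers and upper tails across the candidates
for name, ys in [("gamma", gamma_ys), ("lognormal", lognorm_ys),
                 ("inv-gamma", invgamma_ys), ("pareto", pareto_ys)]:
    p99 = quantiles(ys, n=100)[98]  # 99th percentile
    print(f"{name}: mean={mean(ys):.0f}, 99th pct={p99:.0f}")
```

The Pareto's upper tail is much heavier than the others at comparable centers, which matters if values "never in the thousands" is a real feature of the data.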
To go back to your original question, with SBC, you want to define the generative process for the parameters and data and use that. The uniform and division thing is going to place alpha in [(60/40)^2, (100/20)^2] = [2.25, 25] and beta in [60/40^2, 100/20^2] = [0.0375, 0.25].
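Spelling out that arithmetic (reading those endpoints as mu ~ uniform(60, 100) and sigma ~ uniform(20, 40), which is my inference, not something stated explicitly):

```python
# alpha = (mu / sigma)^2 and beta = mu / sigma^2 are both monotone in mu and sigma,
# so the extremes occur at the corners of the (mu, sigma) rectangle.
alpha_lo, alpha_hi = (60 / 40) ** 2, (100 / 20) ** 2
beta_lo, beta_hi = 60 / 40 ** 2, 100 / 20 ** 2
print(alpha_lo, alpha_hi)  # 2.25 25.0
print(beta_lo, beta_hi)    # 0.0375 0.25
```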
I think you’ll find it easier to first convert to a mean/variance parameterization of the gamma distribution.
I’ll just note that if you get some proportion of unrealistic datasets for SBC, then the best thing to do is just to ignore the unrealistic datasets and generate new ones (effectively truncating your prior) - see Rejection sampling in simulations • SBC
If, on the other hand, your prior is too vague and the vast majority of your datasets are unrealistic, you may need to tighten the prior a bit.
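Concretely, the rejection idea might look like this; the priors and the "realistic" band below are invented for illustration:

```python
import random

rng = random.Random(7)

def prior_draw():
    # deliberately vague, illustrative priors on the mean/sd scale
    mu = rng.uniform(60.0, 160.0)
    sigma = rng.uniform(10.0, 60.0)
    return mu, sigma

def simulate_dataset(mu, sigma, n=30):
    shape = mu ** 2 / sigma ** 2
    scale = sigma ** 2 / mu      # gammavariate wants a scale, not a rate
    return [rng.gammavariate(shape, scale) for _ in range(n)]

def realistic(ys, lo=20.0, hi=400.0):
    # loose plausibility check; anything escaping this broad band is discarded
    return all(lo <= y <= hi for y in ys)

kept = []
while len(kept) < 50:
    mu, sigma = prior_draw()
    ys = simulate_dataset(mu, sigma)
    if realistic(ys):            # rejecting here is equivalent to truncating the prior
        kept.append((mu, sigma, ys))
```

The SBC results are then valid for the truncated prior, i.e. the prior restricted to parameter values that tend to produce realistic datasets.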
If doing the prior by hand is hard (it often is), you can also do “posterior SBC”: pick a small dataset to represent your extra prior information (in your case, e.g., 2-4 values in the target range), fit the model with vague priors to this small dataset, and then use the posterior draws as your prior for SBC. You then add the artificial dataset to each dataset generated from the posterior draws.