RNG for truncated distributions

So I have a truncated normal in my model. But how do I do a posterior predictive check with such a model? I know there are two ways to force a truncated distribution in the model block, either by parameter bounds or by explicit truncation wtih the T[a,b] modifier. But the explicit truncation approach does not work for the _rng functions. So the question is: how do I draw samples from truncated distributions using _rng functions in the generated quantities block?

Or is there a simple trick to transform draws from non-truncated distribution to a truncated one, similar to the desugaring of the T[a,b] syntax for sampling statements shown in the docs? The best I have so far is rejection sampling which luckily is not slowing my model down a lot, but it feels hacky.

Thanks for any ideas.

2 Likes

I would use brute force. You sample from the rng and if it violates your truncation, then you sample again until you are happy. Not nice, but should work.

2 Likes

Yes, that’s what I am doing (maybe I should’ve clarified this is what I mean by “rejection sampling”), I just wondered whether there is a better way.

If the PRNG uses the inverse CDF method of drawing randomly from a standard uniform and then transforming that by the inverse CDF of the desired distribution, then it is often relatively straightforward to draw uniformally from some subset of the [0,1] interval and transform that by the inverse CDF of the distribution in order to obtain a truncated draw.

3 Likes

But I guess Stan does not support that right now - is it correct?

There is no syntax for it in the Stan language, but you can draw from a uniform distribution and there are some inverse CDFs implemented.

For concreteness, I’ll give the example I worked on this morning. I want to draw from a left-truncated Weibull to impute right-censored event times. Let’s say that I want a Weibull(alpha, sigma) truncated so that my random draws are never below some lower boundary t.

Then I should:
(1) Figure out the quantile corresponding to lower cutoff t. Let’s say it’s 0.2, or the 20th percentile. Call that p.
(2) Draw from a continuous uniform(0.2, 1). Call that draw u.
(3) Apply the inverse transform for the cdf, or “percent point” function. I did not feel like doing algebra on a Monday morning so I found mine at https://www.wolframalpha.com/input/?i=inverse+cdf+weibull. This will return a draw from the correct truncated weibull.

(The unstated step 4 is to post on the Stan discourse to check my logic. So far the draws look good when I plot them, but I have not checked it much beyond that.)

For a double truncation, you would add a step to calculate the cdf at the upper truncation point and draw from a uniform(p1, p2) instead of uniform(p, 1).

functions {
  real tweibull_rng(real alpha, real sigma, real t) {
    real p;
    real u;
    real x;
    p = weibull_cdf(t, alpha, sigma);
    u = uniform_rng(p, 1);
    x = sigma * (-log1m(u))^(1/alpha);
    return x;
  }
}

9 Likes

Thanks, that’s exactly what I need to do and with a lot of detail.

This is very nice. I’m going to add this example to the manual after a bit of renaming and collapsing:

real weibull_lb_rng(real alpha, real sigma, real lb) {
    real p = weibull_cdf(lb, alpha, sigma);  // cdf for bounds
    real u = uniform_rng(p, 1);
    return sigma * (-log1m(u))^(1 / alpha);  // inverse cdf for value
  }

Here’s the issue comment

3 Likes

I’m working on the same problem (truncated normal in RNG block). The Weibull example makes sense, but would you be able to share your code for the truncated normal? I’d like to check my work.

1 Like

It’d be something like this for a normal RNG with truncation at a lower bound lb:

real normal_lb_rng(real mu, real sigma, real lb) {
    real p = normal_cdf(lb, mu, sigma);  // cdf for bounds
    real u = uniform_rng(p, 1);
    return (sigma * inv_Phi(u)) + mu;  // inverse cdf for value
}

The inv_Phi function gives the inverse of a Normal(0,1) cdf, and you can apply the scale parameter and location shift to get back to a Normal(mu, sigma).

7 Likes

Hi! Would a normal rng with double truncation be correct like this?:

real normal_lb_ub_rng(real mu, real sigma, real lb, real ub) {
    real p1 = normal_cdf(lb, mu, sigma);  // cdf with lower bound
    real p2 = normal_cdf(ub, mu, sigma);  // cdf with upper bound
    real u = uniform_rng(p1, p2);
    return (sigma * inv_Phi(u)) + mu;  // inverse cdf 
}

Thanks!

4 Likes

Works fine like this, never mind!

I started to work with this, and one problem that I notice is that the normal cdf overflows quite easily. Is there another alternative? Will there be truncated rngs built-in in stan in the near future?

Where does it overflow? My best guess is that you have overflows when you are trying to sample at the very tails of the distribution (as on normal(0,1) truncated between 7 and 9. If this is the case, then I think it is likely you can find a better model for your data than truncated normal (in this case a right-truncated exp might work just fine). If you absolutely need to work with truncation at far tails you might be able to get some leverage from moving to log scale with normal_lcdf and normal_lccdf but then you’ll have to figure out how to sample on the log-transformed scale, and how to compute inv_Phi(exp(u)) without the explicit exponentiation - this should be possible, but would require some math.

I’m trying to generate data for the LBA with a truncated normal distribution as the model of the drifts (https://github.com/stan-dev/stancon_talks/tree/master/2018-helsinki/Contributed-Talks/nicenboim). So I’m using a version with only lb=0. I know that the model is failing when mu<0 and mu/sigma < 8.25 (https://mc-stan.org/docs/2_18/functions-reference/normal-distribution.html). But I can’t avoid this.

I’ll try the second approach, I’m sure that someone out there in the internet did the math for me. In any case, it’ll be nice to have this built in in stan…

Just a note that I ran into the cdf overflow issue. But if mu and sigma are fixed, there is another solution to generating truncated normal RVs: specify something like

parameters {
vector[n_random_numbers]<lower = lb, upper = ub> dummy_parameter;
}

model {
dummy_parameter ~ normal(mu, sigma);
}

That needs explicit truncation if mu and / or sigma are unknown parameters. If they are both data, then it is ok.

1 Like

Because the truncated distribution does not integrate to 1 (or some other constant) if mu and / or sigma are not constant.

1 Like

Ah, of course; thanks! Will edit my post to avoid tripping people up.