Help specifying a triangular prior

Hi all,

I am trying to specify a prior probability distribution in R Stan.
For e.g. rv ~ triangular(mid=1, lower=0, upper=2).
Am I right that there is currently no triangular distribution within the API.

Therefore I wanted to ask if anyone has any solutions to modeling this in Stan.


I know nothing about the triangular distribution, but if you have the explicit form of the density, you can update the log-posterior using

target += [log density for triangular distribution goes here]

Although it does sound like the sort of thing where the density is not differentiable at the apex of a triangle, in which case an algorithm that uses derivatives (like NUTS) may give some weird results.

The density for the triangle distribution is discussed in section 22.1 of the Stan User Manual, which does not imply using it as a prior is a good idea.

Thank you so much for the reference. I was looking through an older version of the user manual, which was why I hadn’t been able to find anything on a Triangle distribution.

Stronger warning than Ben gave: it’s very hard to sample from and if you read our prior recommendations, they strongly advise against truncating unless absolutely necessary to enforce logical or physical constraints (like a probalby being between 0 and 1, or a scale being positive). Otherwise, you get both computational and statistical problems if there winds up being mass near the boundaries.

So the question is, why that prior?

Bob, thanks for the heads up. The reason I want to implement a triangular distribution is because it is a prior commonly used in my field of study. I believe the reason is because the constraints actually represents true physical constraints (e.g. fan efficiency).

Nonetheless, what I am trying to do in my study is to compare different priors (triangular vs uniform vs weakly informative (normal with wide s.d) vs informative (normal with small s.d.)) and see the impact on the output.

Since we are on this topic, I would like to ask if you have any intuition as to the difference between a triangle prior and a truncated normal prior.

Also, the issue of being hard to sample from, is that specific to HMC due to gradient computations as suggested by Daniel. I am asking because previous studies have used random-walk Metropolis instead of NUTS and I am considering if I should just remove the thought of using a triangle distribution because it is just not a good prior to use with NUTS.

Thank you in advance.

The triangle can be fit okay with both RWM and NUTS provided that they operate on the immediate constrained space and not on the transformed, unconstrained space that Stan typically uses. That said, both algorithms have to be carefully adjusted to work with the boundaries.

The main problem with the triangle is that is is not smooth, and smoothness is required for nearly all of the proofs demonstrating that RWM or HMC or the like work sufficiently well, even in simple problems like this. We’d recommend trying to approximate the triangle with something more smooth, such as a Beta(2, 2) transformed from (0, 1) to (-1, 1).

The truncated normal’s going to be smoother, and it’s going to have non-zero density at the boundaries. In contrast, the rescaled Beta(a, a) Michael suggestions will also go to zero at the boundaries when a > 1. The way to code that is:

parameters {
  real<lower=0, upper=1> alpha_raw;
transformed parameters {
  real<lower=0, upper = 2> alpha = 2 * alpha_raw;
model {
  alpha_raw ~ beta(10, 10);  // or whatever number makes sense here

Is it popular becaue it’s simple or is there something about the discontinuity at the point somehow important?

What is the scale of (0, 2) that measures fan efficiency? And why can’t efficiency go below 0 or above 2?

Thank you Bob and Michael for the suggestions.

Sorry I might have misled you with my first post. I was actually trying to model it within the boundaries (0,1) not (0,2), which makes its specification even easier given that the Beta bounds it (0,1). Based on the replies, it does seem that I should try using the Beta to replace the triangle instead of a normal.

Last question, would truncating a normal distribution results in large number of samples being rejected since most of the samples would be outside the bounds and therefore more iterations need to be run.
For instance, a normal(0.8,0.5) prior with (0,1) bounds:

parameters {
  real<lower=0, upper=1> alpha;
model {
  alpha ~ normal(0.8, 0.5); 

and therefore something like a beta(2,0.5) might be more appropriate

In response to these questions:

Fan efficiency is the ratio between the power transferred to the air and the power used by the fan. Therefore it has a range of (0,1) since power cannot be less than 0 and the power transferred cannot be more than the maximum power of the fan.

I think it is popular because it bounds the distribution and it is easy to specify where the most likely value is going to be.

Thank you all so much for these insights.

What you’re describing is a rejection sampler, and no, that’s not what we do! That would be terribly inefficient. I do think our ~ notation is causing confusion yet again among our users, who seem to want to read it as “draw a random sample then do something with it (like rejection)”.

We transform a variable x constrained to (a, b), to y = a + (b - a) * inv_logit(x), so y is unconstrained. So if (a, b) == (0, 1), then y = inv_logit(x).

The main computational problem we run into with this is when mass piles up at the boundaries, because it forces y to plus or minus infinity.

This is a common confusion of density and probabilty mass. In general, posterior modes are not equal to posterior means and the region of highest probabilty mass isn’t necessary near either one. I wrote a case study to help people understand this rather subtle measure-theoretic concept with examples:

In general, the mean, mode, and median of a distirbution don’t necessarily line up, and none of these is about regions of highest probability mass. For instance, with Beta(a, b) the mean is a / (a + b) whereas the mode is at a - 1 / (a + b - 2). Even with the traingle distribution Triangle(low, mid, high), the mean isn’t aligned with the mode.

Thank you Bob for the clarifications and for sharing the paper.

@Bob_Carpenter I looked into your solution for defining a uniform prior, which seems very smart, but is the formula correct? If x is in [a,b], then define y as a + (b-a) * inv_logit(x)? I guess y and x should be replaced since y lies in [a,b]. Am I right or I have missed sth?

You’re correct that there’s a typo in the formula in @Bob_Carpenter’s post. For details of the unconstraining transforms used in Stan, see here 10.4 Lower and upper bounded scalar | Stan Reference Manual

Note, however, that this transformation by itself does not yield a uniform prior on x! It requires an appropriate Jacobian adjustment to do so. And on the other hand, it’s more general, in the sense that it can yield any desired prior on x (that respects the constraint) simply by applying the Jacobian adjustment and then incrementing the target density by the appropriate function of x corresponding to the desired sampling statement.

@jsocolar Thanks for the comment.

Can you elaborate on “not yielding a uniform distribution”? Do you mean that the posterior might be more concentrated on the intervals which are more likely?
It would be also very good if you could clarify the Jacobian adjustment as well.


The following Stan program:

parameters {
  real y; // unconstrained
transformed parameters {
  real x = logit(y); // constrained to the interval [0, 1]
model {
  x ~ uniform(0,1);

Does not yield a uniform posterior for x. The reason is because it is missing a Jacobian adjustment for the nonlinear logit transform. For more on Jacobian adjustments, see here Jacobian adjustments explained simply.

When you do real<lower=0, upper=1> x; in Stan, Stan automatically samples on the unconstrained scale and automatically applies the relevant Jacobian adjustment to make everything work out.


Thanks a lot @jsocolar! That is indeed an important point