# Help specifying a triangular prior

Hi all,

I am trying to specify a prior probability distribution in Stan (via RStan), e.g.
rv ~ triangular(mid=1, lower=0, upper=2).
Am I right that there is currently no triangular distribution in the API?

Therefore I wanted to ask if anyone has any solutions to modeling this in Stan.

Thanks.

I know nothing about the triangular distribution, but if you have the explicit form of the density, you can update the log-posterior using

`target += [log density for triangular distribution goes here]`

Although it does sound like the sort of thing where the density is not differentiable at the apex of a triangle, in which case an algorithm that uses derivatives (like NUTS) may give some weird results.
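If it helps, here is one way the piecewise triangle log density could be written out, as a sketch in Python (the function name and the assumption lower < mid < upper are mine); the same piecewise expression is what would go after `target +=` in a Stan model block.

```python
import math

def triangular_lpdf(x, lower, mid, upper):
    """Log density of a Triangular(lower, mid, upper) distribution,
    assuming lower < mid < upper (illustrative sketch)."""
    if x <= lower or x >= upper:
        return float("-inf")
    if x < mid:
        # rising edge: density 2(x - lower) / ((upper - lower)(mid - lower))
        return math.log(2 * (x - lower)) - math.log((upper - lower) * (mid - lower))
    # falling edge: density 2(upper - x) / ((upper - lower)(upper - mid))
    return math.log(2 * (upper - x)) - math.log((upper - lower) * (upper - mid))
```

Note the kink at `mid`: the density is continuous there, but its derivative is not, which is exactly the non-differentiability concern raised above.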

The density for the triangle distribution is discussed in section 22.1 of the Stan User Manual, which does not imply using it as a prior is a good idea.

Thank you so much for the reference. I was looking through an older version of the user manual, which was why I hadn't been able to find anything on a triangle distribution.

Stronger warning than Ben gave: it's very hard to sample from, and if you read our prior recommendations, they strongly advise against truncation unless absolutely necessary to enforce logical or physical constraints (like a probability being between 0 and 1, or a scale being positive). Otherwise, you get both computational and statistical problems if there winds up being mass near the boundaries.

So the question is, why that prior?

Bob, thanks for the heads up. The reason I want to implement a triangular distribution is that it is a prior commonly used in my field of study. I believe that's because the constraints actually represent true physical constraints (e.g. fan efficiency).

Nonetheless, what I am trying to do in my study is to compare different priors (triangular vs uniform vs weakly informative (normal with wide s.d.) vs informative (normal with small s.d.)) and see the impact on the output.

Since we are on this topic, I would like to ask if you have any intuition as to the difference between a triangle prior and a truncated normal prior.

Also, is the issue of being hard to sample from specific to HMC, due to the gradient computations Daniel mentioned? I ask because previous studies have used random-walk Metropolis instead of NUTS, and I am wondering whether I should drop the idea of using a triangle distribution altogether because it is simply not a good prior to use with NUTS.

The triangle can be fit okay with both RWM and NUTS provided that they operate on the immediate constrained space and not on the transformed, unconstrained space that Stan typically uses. That said, both algorithms have to be carefully adjusted to work with the boundaries.

The main problem with the triangle is that it is not smooth, and smoothness is required for nearly all of the proofs demonstrating that RWM, HMC, and the like work sufficiently well, even in simple problems like this one. We'd recommend approximating the triangle with something smoother, such as a Beta(2, 2) rescaled from (0, 1) to (-1, 1).
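To make that suggestion concrete: if U ~ Beta(2, 2), then V = 2U - 1 lives on (-1, 1) with a smooth density that vanishes at both endpoints. A small Python sketch (the function name is mine):

```python
def beta22_rescaled_pdf(v):
    """Density of V = 2U - 1 where U ~ Beta(2, 2).
    Beta(2, 2) has pdf 6u(1 - u); the factor 1/2 is the Jacobian
    of the linear rescaling from (0, 1) to (-1, 1)."""
    if v <= -1 or v >= 1:
        return 0.0
    u = (v + 1) / 2
    return 6 * u * (1 - u) / 2
```

This works out to (3/4)(1 - v^2): zero at the boundaries like a symmetric triangle on (-1, 1), but with no kink at the center.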

The truncated normal's going to be smoother, and it's going to have non-zero density at the boundaries. In contrast, the rescaled `Beta(a, a)` Michael suggested will go to zero at the boundaries when `a > 1`. The way to code that is:

```
parameters {
  real<lower=0, upper=1> alpha_raw;
  ...
}
transformed parameters {
  real<lower=0, upper=2> alpha = 2 * alpha_raw;
  ...
}
model {
  alpha_raw ~ beta(10, 10);  // or whatever number makes sense here
  ...
}
```
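For what it's worth, the prior this implies on `alpha` can be checked outside Stan. A Python sketch of the implied density (function names are mine; the factor 1/2 is the Jacobian of the linear rescaling, which Stan can drop because it is constant):

```python
import math

def beta_pdf(u, a, b):
    """Beta(a, b) density, computed via the gamma function."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * u ** (a - 1) * (1 - u) ** (b - 1)

def alpha_prior_pdf(alpha, a=10.0, b=10.0):
    """Implied density of alpha = 2 * alpha_raw with alpha_raw ~ Beta(a, b)."""
    if not 0.0 < alpha < 2.0:
        return 0.0
    return beta_pdf(alpha / 2.0, a, b) / 2.0
```

A quick midpoint-rule integration confirms the density integrates to 1 over (0, 2), with its mode at 1 when `a == b`.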

Is it popular because it's simple, or is there something about the discontinuity at the apex that's somehow important?

What is the scale of (0, 2) that measures fan efficiency? And why canâ€™t efficiency go below 0 or above 2?

Thank you Bob and Michael for the suggestions.

Sorry, I might have misled you with my first post. I was actually trying to model it within the boundaries (0, 1), not (0, 2), which makes its specification even easier, given that the Beta is already bounded to (0, 1). Based on the replies, it does seem that I should try using the Beta to replace the triangle instead of a normal.

Last question: would truncating a normal distribution result in a large number of samples being rejected, since most of the samples would be outside the bounds, and therefore more iterations would need to be run?
For instance, a normal(0.8, 0.5) prior with (0, 1) bounds:

```
parameters {
  real<lower=0, upper=1> alpha;
  ...
}
model {
  alpha ~ normal(0.8, 0.5);
  ...
}
```

and therefore something like a beta(2, 0.5) might be more appropriate.
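As a rough check on how much mass actually falls outside the bounds (a sketch using the normal CDF via `erf`; names are mine):

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal(mu, sigma) distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Mass of a normal(0.8, 0.5) distribution inside (0, 1):
inside = normal_cdf(1.0, 0.8, 0.5) - normal_cdf(0.0, 0.8, 0.5)
# About 0.60, so a naive rejection sampler would indeed waste
# roughly 40% of its draws.
```

As it turns out (explained later in the thread), Stan does not reject at all; it samples on an unconstrained scale, so no iterations are wasted this way.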

In response to these questions:

Fan efficiency is the ratio between the power transferred to the air and the power used by the fan. Therefore it has a range of (0,1) since power cannot be less than 0 and the power transferred cannot be more than the maximum power of the fan.

I think it is popular because it bounds the distribution and it is easy to specify where the most likely value is going to be.

Thank you all so much for these insights.

What you're describing is a rejection sampler, and no, that's not what we do! That would be terribly inefficient. I do think our `~` notation is causing confusion yet again among our users, who seem to want to read it as "draw a random sample, then do something with it (like rejection)".

We transform a variable `x` constrained to `(a, b)`, to `y = a + (b - a) * inv_logit(x)`, so `y` is unconstrained. So if `(a, b) == (0, 1)`, then `y = inv_logit(x)`.

The main computational problem we run into with this is when mass piles up at the boundaries, because it forces `y` to plus or minus infinity.

This is a common confusion of density and probability mass. In general, posterior modes are not equal to posterior means, and the region of highest probability mass isn't necessarily near either one. I wrote a case study to help people understand this rather subtle measure-theoretic concept with examples:

http://mc-stan.org/users/documentation/case-studies/curse-dims.html

In general, the mean, mode, and median of a distribution don't necessarily line up, and none of them is about regions of highest probability mass. For instance, with `Beta(a, b)` the mean is `a / (a + b)` whereas the mode is at `(a - 1) / (a + b - 2)`. Even with the triangle distribution `Triangle(low, mid, high)`, the mean isn't aligned with the mode.
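A quick numeric illustration of the mean/mode gap (helper names are mine):

```python
def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)

def beta_mode(a, b):
    """Mode of Beta(a, b); only defined for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)

# For Beta(2, 5): mean = 2/7 (about 0.286), mode = 1/5 = 0.2
```

Only in the symmetric case `a == b` do the two coincide.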

Thank you Bob for the clarifications and for sharing the case study.

@Bob_Carpenter I looked into your solution for defining a uniform prior, which seems very smart, but is the formula correct? If x is in [a, b], then define y as a + (b - a) * inv_logit(x)? I guess y and x should be swapped, since it is y that lies in [a, b]. Am I right, or have I missed something?

You're correct that there's a typo in the formula in @Bob_Carpenter's post. For details of the unconstraining transforms used in Stan, see section 10.4, "Lower and upper bounded scalar", of the Stan Reference Manual.
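With the variables in the roles the Reference Manual uses (unconstrained input, constrained output), the transform and its inverse look like this as a Python sketch (function names are mine):

```python
import math

def inv_logit(u):
    """Standard inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-u))

def constrain(y_unc, a, b):
    """Map an unconstrained value to the interval (a, b)."""
    return a + (b - a) * inv_logit(y_unc)

def unconstrain(x, a, b):
    """Inverse transform: logit of x rescaled to (0, 1)."""
    u = (x - a) / (b - a)
    return math.log(u / (1.0 - u))
```

The two functions round-trip exactly, and `constrain` can never produce a value outside (a, b), which is why no rejection is ever needed.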

Note, however, that this transformation by itself does not yield a uniform prior on x! It requires an appropriate Jacobian adjustment to do so. On the other hand, it's more general, in the sense that it can yield any desired prior on x (one that respects the constraint) simply by applying the Jacobian adjustment and then incrementing the target density by the appropriate function of x corresponding to the desired sampling statement.

@jsocolar Thanks for the comment.

Can you elaborate on "not yielding a uniform distribution"? Do you mean that the posterior might be more concentrated on the intervals that are more likely?
It would also be very good if you could clarify the Jacobian adjustment.

Thanks!

The following Stan program:

```
parameters {
  real y; // unconstrained
}
transformed parameters {
  real x = inv_logit(y); // constrained to the interval (0, 1)
}
model {
  x ~ uniform(0, 1);
}
```

does not yield a uniform posterior for `x`, because it is missing a Jacobian adjustment for the nonlinear inverse-logit transform. For more on Jacobian adjustments, see the post "Jacobian adjustments explained simply".

When you do `real<lower=0, upper=1> x;` in Stan, Stan automatically samples on the unconstrained scale and automatically applies the relevant Jacobian adjustment to make everything work out.
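For intuition, the adjustment for x = inv_logit(y) is log|dx/dy| = log(x) + log(1 - x), since dx/dy = x(1 - x). In a Stan model this would be written `target += log(x) + log1m(x);`, which is exactly what the automatic adjustment supplies for a declared bound. A Python sketch checking the identity by finite differences (names are mine):

```python
import math

def inv_logit(y):
    """Inverse logit, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

def log_jacobian(y):
    """log |dx/dy| for x = inv_logit(y): equals log(x) + log(1 - x)."""
    x = inv_logit(y)
    return math.log(x) + math.log(1.0 - x)

# Finite-difference check of the derivative at an arbitrary point:
y, eps = 0.7, 1e-6
numeric = (inv_logit(y + eps) - inv_logit(y - eps)) / (2 * eps)
analytic = math.exp(log_jacobian(y))
```

The two agree to within finite-difference error, confirming that the adjustment term is the log of the transform's derivative.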


Thanks a lot @jsocolar! That is indeed an important point