Understanding truncation in Stan

I’m trying to understand how truncation works in Stan.

I used to just define the truncation by setting a lower (or upper) bound on the parameter:

m1:

parameters {
    real<lower=0> sigma;
}
model {
    sigma ~ normal(0,1);
}

Then I started to use the lpdf notation and, copying from brms, wrote it like this:

m2:

parameters {
    real<lower=0> sigma;
}
model {
    target += normal_lpdf(sigma | 0, 1)  -
                  normal_lccdf(0 | 0, 1);
}

(I do understand why the ccdf is used, as explained here: https://mc-stan.org/docs/2_20/reference-manual/sampling-statements-section.html)

But that section of the manual says that truncation needs to be indicated with T. I had the impression that Stan could figure that out just by looking at the bounds. Is m1 wrong? Should it be like m3 below instead?

m3:

parameters {
    real<lower=0> sigma;
}
model {
    sigma ~ normal(0,1)T[0,];
}

But then why do I need to declare the lower bound at all? Why not just this:

m4:

parameters {
    real sigma;
}
model {
    sigma ~ normal(0,1)T[0,];
}

And if T[] is unnecessary in general (I don’t really understand when I need it), then is the following model m5 fine?

m5:

parameters {
    real<lower=0> sigma;
}
model {
    target += normal_lpdf(sigma | 0, 100);
}

I have tried all 5 models, and I get roughly the same distribution for sigma (but different values of lp__), plus some divergent transitions for m4. But I’d like to understand what Stan’s sampler is really doing in each case. (I do know that the difference between the _lpdf and ~ versions is whether they keep all of the constant normalizing terms or not.)


Without the T notation, Stan doesn’t do the cdf adjustment. The bounds declared on the parameters are only used to transform them for the sampler (which is a separate thing).

The thing to keep in mind is that, because we sample from a distribution proportional to the thing we want, we can drop constant terms from target (the log of a distribution proportional to the thing we want) without worrying.

In all the models above, the parameters of the distribution are constant, so the truncation adjustment is constant, so it can be dropped.
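
One way to check that the adjustment really is constant is to compute it in generated quantities. This is only a minimal sketch built on m1 (the name log_norm_const is made up for illustration). Nothing in normal_lccdf(0 | 0, 1) depends on sigma, so it should come out as log(0.5), about -0.693, in every draw:

```stan
parameters {
  real<lower=0> sigma;
}
model {
  // same as m1: no T[0,], so no cdf adjustment is added to target
  sigma ~ normal(0, 1);
}
generated quantities {
  // log P(X > 0) for X ~ normal(0, 1); does not involve sigma,
  // so subtracting it would only shift target by a constant
  real log_norm_const = normal_lccdf(0 | 0, 1);
}
```

Subtracting that value from target, as m2 does, shifts lp__ by a fixed amount but leaves the draws of sigma unchanged.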

If you do something like this with and without the T, you should see a difference because now the truncation adjustment is a function of the parameter:

parameters {
  real<lower = 0.0> a;
}
model {
  1 ~ normal(a, 1) T[0,];
}
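
Written out by hand, the truncated statement above should correspond (up to the constants that ~ drops) to the sketch below. The observed value 1 is hard-coded, as in the example. Here the lccdf term contains a, so it changes the shape of the posterior and cannot be dropped:

```stan
parameters {
  real<lower=0> a;
}
model {
  // untruncated normal log density of the observed value 1
  target += normal_lpdf(1 | a, 1);
  // truncation adjustment: subtract log P(Y > 0 | a), which varies with a
  target += -normal_lccdf(0 | a, 1);
}
```

Commenting out the second target += line gives the model without T[0,], which is a quick way to compare the two posteriors for a.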

Aha, the T[0,] is equivalent to - normal_lccdf(0 | 0, 1), right?
And in all my cases they were constants, so they only matter if I need the normalizing constants (as in bridge sampling).

It may be worthwhile to explain in the manual when T[] or the - *_lccdf term is needed…

But thanks, now it’s clear to me!

y ~ normal(0, 1) T[0, ];

is equivalent to

target += normal_lpdf(y | 0, 1) - normal_lccdf(0.0 | 0, 1);

Gotta have all the numbers there or it’s ambiguous :D
