Truncated data, log probability increment

Loni92 · August 2, 2024, 3:51pm

Hi all,

I have truncated data (lower bound = 2) and therefore want to use a truncated normal likelihood in my model. I only found out how to do this if you use the sampling notation (y ~ normal(mu, sigma)): Truncated or Censored Data. However, I am using the log probability incrementation expression (target += normal_lpdf(y | mu, sigma)) in my model. Does anyone know how to use a truncated normal in this case?

Thanks!

Garren_Hermanus · August 2, 2024, 5:08pm

You can just define y as a lower-bounded data type and continue to use normal_lpdf(y | mu, sigma). The log posterior would differ only up to an additive constant (based on the truncation), and hence sampling would not be affected.

ssp3nc3r · August 2, 2024, 11:19pm

To get the equivalent in the target notation, you divide by the permissible portion of the distribution. For your example, normal_lcdf(2 | mu, sigma) is the log of the probability mass below 2, which you don’t want, so divide by the complement (subtract on the log scale), something like so:

target += normal_lpdf(y | mu, sigma) - normal_lccdf(2 | mu, sigma);

Edit: fixed the cutoff to 2 as Bob corrected below. Need to read closer. And as Garren explains, you won’t find a difference if mu and sigma are not parameters, but will need this if either are, as Bob points out below.

Garren_Hermanus · August 2, 2024, 11:38pm

This only introduces additional overhead in Stan, with no benefit gained by adding it. That is, for the truncated prior we have

$$
P(y)=\frac{\mathcal{N}(y\mid\mu,\sigma^2)}{\int_{2}^{\infty}\mathcal{N}(y\mid\mu,\sigma^2)\,dy}=\frac{\mathcal{N}(y\mid\mu,\sigma^2)}{C}, \quad y>2
$$

where we define $C=\int_{2}^{\infty}\mathcal{N}(y\mid\mu,\sigma^2)\,dy$, since it is finite. Here $C$ can be grouped together with the unknown normalization constant. Hence the truncated normal is equivalent to the normal distribution with support restricted to $y>2$. The only time we would include the truncation is if the upper or lower bound of the truncation depends on other variables, not when the bounds are constant.

EDIT: @maxbiostat edited the LaTeX

Loni92 · August 5, 2024, 8:48am

Thanks everyone for your input!

Bob_Carpenter · August 5, 2024, 9:04pm

Thanks @Garren_Hermanus. This is true if mu and sigma are constants, because then normal_lccdf is also a constant. If either mu or sigma is a parameter, you also need to include the truncation adjustment:

target += -normal_lccdf(2 | mu, sigma);

This is close to what @ssp3nc3r wrote, but uses 2 as the lower bound as requested by @Loni92.

Just be sure to also declare y with <lower=2> for error checking.
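
Putting the pieces of this thread together, a complete program might look like the sketch below. It is only an illustration under assumptions: the name N, the vector form of y, and both priors are placeholders that do not come from the posts above. With N observations, the adjustment applies once per observation, hence the factor of N.

```stan
data {
  int<lower=0> N;            // number of observations (assumed name)
  vector<lower=2>[N] y;      // lower bound declared for error checking
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Placeholder priors, chosen only so the sketch is complete.
  target += normal_lpdf(mu | 0, 10);
  target += exponential_lpdf(sigma | 1);

  // Untruncated normal log density, summed over all elements of y.
  target += normal_lpdf(y | mu, sigma);

  // Truncation adjustment: subtract log Pr[Y > 2] once per observation.
  target += -N * normal_lccdf(2 | mu, sigma);
}
```

For a single observation (N = 1) the last two statements reduce to the one-line form given earlier in the thread.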

I’m unmarking the solution until @Loni92 clarifies whether mu and sigma are data.

Loni92 · August 6, 2024, 9:28am

Hi and thanks!

sigma and mu are indeed parameters, so I’m going to add this term to my model. I’m not sure I understand the word “data” in your last sentence though.

(Also not sure which post to mark as the solution now.)

Bob_Carpenter · August 13, 2024, 7:27pm

By “data” I mean something whose value is known and which is declared in the data (or transformed data) block of a Stan program. If either mu or sigma is a parameter, then you need the explicit truncation adjustment.
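
A short derivation may help show why the distinction matters. Writing $\Phi$ for the standard normal CDF and $u$ for the integration variable (both are notation assumed here, not used earlier in the thread), the truncated log density for one observation $y \ge 2$ is roughly:

```latex
\begin{aligned}
\log p(y \mid \mu, \sigma, y \ge 2)
  &= \log \mathcal{N}(y \mid \mu, \sigma^2)
   - \log \int_{2}^{\infty} \mathcal{N}(u \mid \mu, \sigma^2)\, du \\
  &= \log \mathcal{N}(y \mid \mu, \sigma^2)
   - \log\!\left(1 - \Phi\!\left(\frac{2 - \mu}{\sigma}\right)\right)
\end{aligned}
```

The second term is what normal_lccdf(2 | mu, sigma) computes. When mu and sigma are data, it is a fixed number and can be dropped; when either is a parameter, it changes with every draw and has to stay in target.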

Loni92 · August 14, 2024, 11:35am

Ok, thanks for the clarification! Both mu and sigma are parameters.
