Truncated data, log probability increment

Hi all,

I have truncated data (lower bound = 2) and therefore want to use a truncated normal likelihood in my model. I have only found how to do this with the sampling notation (y ~ normal(mu, sigma)), in "Truncated or Censored Data". However, I am using the log probability increment notation (target += normal_lpdf(y | mu, sigma)) in my model. Does anyone know how to use a truncated normal in this case?

Thanks!

You can just define y as a lower-bounded data type and continue to use normal_lpdf(y | mu, sigma). The log posterior would differ only by an additive constant (determined by the truncation), and hence sampling would not be affected.

To get the equivalent of the truncated sampling statement with the target notation, divide by the permissible portion of the distribution. For your example, normal_lcdf(2 | mu, sigma) is the log of the probability mass below 2, which you don’t want, so divide by the complement (subtract on the log scale), something like so:

target += normal_lpdf(y | mu, sigma) - normal_lccdf(2 | mu, sigma);

Edit: fixed the cutoff to 2 as Bob corrected below. Need to read closer. And as Garren explains, you won’t find a difference if mu and sigma are not parameters, but will need this if either are, as Bob points out below.
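
For reference, here is a minimal sketch of how the adjustment above might sit in a complete program when y is a vector. The names N and the priors are assumptions for illustration only and are not from the original model. Because the vectorized normal_lpdf sums over all observations, the normalizing term has to be subtracted once per observation:

```stan
data {
  int<lower=0> N;          // number of observations (hypothetical name)
  vector<lower=2>[N] y;    // data, all above the truncation point 2
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Placeholder priors, not taken from the thread.
  mu ~ normal(0, 10);
  sigma ~ normal(0, 5);

  // Untruncated normal log density, summed over the N observations.
  target += normal_lpdf(y | mu, sigma);
  // Truncation adjustment: subtract log P(Y > 2 | mu, sigma) once per observation.
  target += -N * normal_lccdf(2 | mu, sigma);
}
```

Up to constants that do not depend on mu or sigma, this should match looping over n with y[n] ~ normal(mu, sigma) T[2, ]; in the sampling notation.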

This only introduces additional overhead in Stan, with no benefit gained. That is, for the truncated density we have

P(y)=\frac{\mathcal{N}(y|\mu,\sigma^2)}{\int_{2}^{\infty}{\mathcal{N}(t|\mu,\sigma^2)\,dt}}=\frac{\mathcal{N}(y|\mu,\sigma^2)}{C}, \quad y>2

where we define C=\int_{2}^{\infty}{\mathcal{N}(t|\mu,\sigma^2)\,dt}, which is finite. Here C can be grouped together with the unknown normalization constant. Hence the truncated normal is equivalent to the normal distribution with support y>2. The only time we would include the truncation is if the upper or lower bound of the truncation depends on other variables, not when the bounds are constant.

EDIT: @maxbiostat edited the LaTeX

Thanks everyone for your input!

Thanks @Garren_Hermanus. This is true if mu and sigma are constants, because then normal_lccdf is also a constant. If either mu or sigma is a parameter, you also need to include the truncation adjustment:

target += -normal_lccdf(2 | mu, sigma);

This is close to what @ssp3nc3r wrote, but uses 2 as the lower bound as requested by @Loni92.

Just be sure to also declare y with <lower=2> for error checking.
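
To spell out why the adjustment cannot be dropped when mu or sigma is a parameter, here is the log density of a single observation under the truncated normal, written with the standard normal CDF \Phi (a symbol not used elsewhere in the thread):

```latex
\log p(y \mid \mu, \sigma, y > 2)
  = \log \mathcal{N}(y \mid \mu, \sigma^2)
  - \log\left[ 1 - \Phi\!\left( \frac{2 - \mu}{\sigma} \right) \right],
  \qquad y > 2
```

The second term is what normal_lccdf(2 | mu, sigma) computes. It changes whenever mu or sigma changes, so it is constant only when both are fixed data.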

I’m unmarking the solution until @Loni92 clarifies whether mu and sigma are data.

Hi and thanks!

sigma and mu are indeed parameters, so I’m going to add this term to my model. I’m not sure I understand the word “data” in your last sentence though.

(Also not sure which post to mark as the solution now.)

By “data” I mean something whose value is known and which is declared in the data (or transformed data) block of a Stan program. If either mu or sigma is a parameter, then you need the explicit truncation adjustment.

Ok, thanks for the clarification! Both mu and sigma are parameters.