Reparameterizing a Truncated Normal

I’m working with pipeline crack depths measured by an inspection tool. The tool error is normally distributed with a nonzero bias, and it cannot report a value <= 0 since that doesn’t make sense. As such, the tool measurements follow a truncated normal distribution.

The minimal example below (written in R) attempts to recover the bias and SD of the error distribution. It works reasonably well for 500 cracks, but for 15 cracks, it complains about divergent transitions after warmup. I’ve reparameterized other models following the Neal’s Funnel example before, but am not sure how to go about it here.

Interestingly, the same model without the T[-trueVal[i],] truncation does not complain about divergent transitions, but it overestimates the bias and underestimates the SD as expected.

Attached is the accompanying pairs plot, which shows there is a problem, but again I’m not sure how to solve it.

Any help is appreciated.


rm(list = ls())


measBias <- 0.2
measSD <- 1.2
N <- 15

modelStr <- "
  data { 
      int<lower=0> N; 
      real measVal[N];
      real trueVal[N];
    parameters {
      real measBias;
      real<lower=0> measSD;
    model {
      real error[N];
      for (i in 1:N) {
        error[i] = measVal[i] - trueVal[i];
        error[i] ~ normal(measBias, measSD) T[-trueVal[i],];

stanModel <- stan_model(model_code = modelStr)

trueVal <- qlnorm(p = (1:N)/(N+1), meanlog = -0.112, sdlog = 0.472)  # equivalent to mean = 1, SD = 0.5
measVal <- rtruncnorm(n = 1, a = 0, mean = trueVal + measBias, sd = measSD)

stanData <- list(N = N, trueVal = trueVal, measVal = measVal)

result <- sampling(stanModel, data = stanData, iter = 500,
                        chains = 1, cores = 1)

I think that the problem might actually be (at least partly) due to the fact that you don’t have a prior on measBias and measSD - which is likely to manifest only when you have too little data. Try figuring out a weakly informative prior (as in

Also note that you can move the computation of error to the transformed data block to save computing time.

This doesn’t make sense in Stan.

I would code it like this:

 data {
          int<lower=0> N; 
          vector[N] truevalue;
          vector[N] measVal;
    parameters {
      real measBias;
      real<lower=0> measSD;
    model {
       bias ~ normal(0, 10); // something big
       meaSd ~ cauchy(0, 10);
       truevalue ~ normal(measVal + bias, meaSd);
1 Like

Thanks for the suggestion. I will give it a shot and report back.

Unfortunately this wouldn’t work for my situation because (1) it doesn’t account for the truncation and (2) each measurement has its own lower bound truncation value. The only way I know how to model this is with the for loop. I’m definitely open to other ways of doing this.

You are right - AFAIK the truncated distributions cannot be vectorized in the current version of Stan.

Is using a truncated normal distribution without constrain a problem?

1 Like

Not sure what you mean - care to elaborate? AFAIK your code sample uses the untruncated normal distribution - unless I am missing something about the way Stan works…

Normally, when you truncate a distribution

a ~ normal(0, 1)T[-4,]

you also have to define it a as:

real<lower=-4> a;

And you could also omit the T[-4,].
In your case, I don’t know how to constrain a vector with each Element separately.

I have an idea, you could create a add the constraints by using normal_lccdf
If you adjust the log-likelihood by

target += - normal_lccdf(measBias | -truevalue, meaSd);
and change
meabias to
vector[N] measBias;

Ah, I see where the misunderstanding is - your approach is correct when the truncated distribution relates to parameters - then indeed you need parameter bounds and the truncation is implicit. But in the case here, it is data that is distributed according to a truncated distribution and this requires explicit truncation.


measBias ~ normal(0, 2);
measSD ~ normal(0, 1);

helped with convergence. I am going with this until the next round of refinement. Thanks for the help.

That’s true in 2.17 and isn’t going to change in 2.18. I know how to do it, but just haven’t had time to run it through the language.

The explicit truncation is required when the distribution being truncated involves parameters. Otherwise, the normalization for the truncation is constant and can be dropped. So usually if it’s data distributed according to some parameters, you need the truncation as @martinmodrak mentioned. Similarly, if it’s a prior with fixed (data) parameters, you don’t need the truncation. If everything’s variable, you need the truncation, too.

You mean the tool error looks like a half-normal or a normal with mean other than 0 that’s truncated?

Have you tried increasing the target acceptance rate (adapt_delta)? If the divergences are minor numerical errors, that can get rid of them.

It can be more efficient to vectorize the density and separately vectorize the truncation points. The material in the manual on truncated and censored distribution goes over the math, but doesn’t show how to do this ad-hoc vectorization.

I guess I should show you how to do it here:

for (n in 1:N)
  y[n] ~ normal(mu, sigma) T[L[n], ];

can be replaced with

y ~ normal(mu, sigma);
target -= normal_lccdf(L | mu, sigma);

Note the double cc in the lccdf as you need the complementary form for lower truncation.

1 Like

Yes to the latter - normal with nonzero mean that’s truncated at -TrueVal.

Yes, I’ve tried going up to 0.99, but I still get the divergent transitions complaint.

Thanks for the suggestion. I am a neophyte when it comes to stan, so I’ll need some time to digest this.

The CCDF is just one minus the CDF. The density of a lower truncation at L is

p(y | \theta, L) = \frac{\mathsf{foo}(y | \theta)}{1 - \mathsf{foo\_cdf}(L | \theta)}


\begin{array}{rcl} \log p(y | \theta, L) & = & \log \mathsf{foo}(y | \theta) - \log(1 - \mathsf{foo\_cdf}(L | \theta) \\ & = & \log \mathsf{foo}(y | \theta) - \log \mathsf{foo\_ccdf}(L | \theta) \end{array}

So when you have a whole bunch of y_n, you can vectorize the density and truncations seperately and add them together.

Sorry @Bob_Carpenter if I revive this topic, but I would like to have a confirmation

According to

The definitions for CDFs and CCDFs is:

    CDF: Pr[X <= x]
    CCDF: Pr[X > x]

I would assume that for a lower truncation L we would need to subtract CDF: Pr[X <= x]. Can you confirm that CCDF is the correct use for lower truncation?

[EDIT] I indeed verified on a dummy model that - CCDF is right for lower bound truncation (although I am still puzzled)

1 Like