Interval-censored normal model: integration vs estimation

mdonoghoe · July 1, 2025, 6:04am

Suppose y_i \sim N(\mu, \sigma) (i = 1, \ldots, N) but we observe y^*_i = \lfloor y_i \rfloor.

A model for inference about \mu, \sigma can be based on integration, i.e.:

p(y^*_i \mid \mu, \sigma) = \int^{y^*_i + 1}_{y^*_i} \mathrm{normal}(y \mid \mu, \sigma) dy

In Stan code:

data {
  
  int<lower=1> N;
  array[N] int y_star;
  
}

parameters {
  
  real mu;
  real<lower=0> sigma;
  
}

model {
  
  for (n in 1:N) {
    
    target += log_diff_exp(normal_lcdf(y_star[n] + 1 | mu, sigma), 
                           normal_lcdf(y_star[n] | mu, sigma));
    
  }
  
}

However, because I want to eventually adapt this idea to a different problem, I would like to “estimate” the unobserved y_i, similar to the approach taken in the Stan user’s guide.

In that example there is a fixed censoring value, so it uses a simple constraint in the parameter declaration. I note that it is possible to set varying constraints, but again because of my planned extension to another case, I would like to do it in a more complicated way: basically using an inverse transform sample:

y_i = \mu + \sigma \Phi^{-1}\left( \Phi(\frac{y^*_i - \mu}{\sigma}) + u_i (\Phi(\frac{y^*_i + 1 - \mu}{\sigma}) - \Phi(\frac{y^*_i - \mu}{\sigma})) \right) where u_i \sim U(0,1).

If I want y_i \sim N(\mu, \sigma), I need to compute the Jacobian:

\frac{dy_i}{du_i} = \sigma \frac{\Phi\left(\frac{y^*_i + 1 - \mu}{\sigma}\right) - \Phi\left(\frac{y^*_i - \mu}{\sigma}\right)}{\phi\left(\frac{y_i - \mu}{\sigma}\right)}

The other parameters aren’t transformed and the Jacobian is triangular, so its log-determinant is just the log of the value above. In Stan:

data {
  
  int<lower=1> N;
  array[N] int y_star;
  
}

parameters {
  
  real mu;
  real<lower=0> sigma;
  array[N] real<lower=0,upper=1> u;
  
}

transformed parameters {
  
  array[N] real y;
  
  for (n in 1:N) {
    
    y[n] = mu + sigma * inv_Phi(normal_cdf(y_star[n] | mu, sigma) + 
      u[n] * (normal_cdf(y_star[n] + 1 | mu, sigma) - normal_cdf(y_star[n] | mu, sigma)));
    
  }
  
}

model {
  
  y ~ normal(mu, sigma);  // (0)
  
  for (n in 1:N) {
    
    // Jacobian adjustment
    target += log(sigma);  // (1)
    
    target += log_diff_exp(normal_lcdf(y_star[n] + 1 | mu, sigma), 
                           normal_lcdf(y_star[n] | mu, sigma));  // (2)

    target += -normal_lpdf(y[n] | mu, sigma);  // (3)
    
  }
  
}

This model doesn’t work well. Line (2) is the same as in the integrated case. Line (3) cancels out Line (0). So the problem is Line (1) (the model works if I remove it), but I am not sure why.

Does my logic make sense? Where have I gone wrong?

aakhmetz · July 2, 2025, 4:22am

I think you just introduce a new parameter, y_raw between 0 and 1, y = y_raw + y_star[n] with prior y_raw ~ beta(1, 1). Then you fit it to your data such as (0). Maybe if this simpler model works you can move to more complicated variations

mdonoghoe · July 9, 2025, 5:45am

Thanks, I assume you meant that y_raw is a vector so y[n] = y_raw[n] + y_star[n]? That worked and is a nice approach that I hadn’t thought of, and although it doesn’t fit exactly into the more complex model that I’m working towards, it has given me some ideas.

In the meantime, I found the error I made above. My derivation of the Jacobian was correct, but I translated the -\log \phi((y_i - \mu) / \sigma) as -normal_lpdf(y[n] | mu, sigma) in line (3) when it should have been -std_normal_lpdf((y[n] - mu) / sigma) , or equivalently -normal_lpdf(y[n] | mu, sigma) - log(sigma), which would have cancelled out line (1).

Topic		Replies	Views
Jacobian on fandamental examples Modeling	6	509	July 23, 2020
Log normal distribution in Stan General	4	1298	January 15, 2021
Non-invertible change of variables? Modeling	24	7202	October 2, 2017
Clarification regarding Jacobians - BYM2 model Modeling	5	395	October 26, 2023
ICDF and Jacobian Modeling jacobian-adjustment	8	721	June 1, 2021

Interval-censored normal model: integration vs estimation

Related topics