Interval-censored normal model: integration vs estimation

Suppose y_i \sim N(\mu, \sigma) (i = 1, \ldots, N) but we observe y^*_i = \lfloor y_i \rfloor.

A model for inference about \mu, \sigma can be based on integration, i.e.:

p(y^*_i \mid \mu, \sigma) = \int^{y^*_i + 1}_{y^*_i} \mathrm{normal}(y \mid \mu, \sigma) dy

In Stan code:

data {
  
  int<lower=1> N;
  array[N] int y_star;
  
}

parameters {
  
  real mu;
  real<lower=0> sigma;
  
}

model {
  
  for (n in 1:N) {
    
    target += log_diff_exp(normal_lcdf(y_star[n] + 1 | mu, sigma), 
                           normal_lcdf(y_star[n] | mu, sigma));
    
  }
  
}

However, because I want to eventually adapt this idea to a different problem, I would like to “estimate” the unobserved y_i, similar to the approach taken in the Stan user’s guide.

In that example there is a fixed censoring value, so it uses a simple constraint in the parameter declaration. I note that it is possible to set varying constraints, but again because of my planned extension to another case, I would like to do it in a more complicated way: basically using an inverse transform sample:

y_i = \mu + \sigma \Phi^{-1}\left( \Phi(\frac{y^*_i - \mu}{\sigma}) + u_i (\Phi(\frac{y^*_i + 1 - \mu}{\sigma}) - \Phi(\frac{y^*_i - \mu}{\sigma})) \right) where u_i \sim U(0,1).

If I want y_i \sim N(\mu, \sigma), I need to compute the Jacobian:

\frac{dy_i}{du_i} = \sigma \frac{\Phi\left(\frac{y^*_i + 1 - \mu}{\sigma}\right) - \Phi\left(\frac{y^*_i - \mu}{\sigma}\right)}{\phi\left(\frac{y_i - \mu}{\sigma}\right)}

The other parameters aren’t transformed and the Jacobian is triangular, so its log-determinant is just the log of the value above. In Stan:

data {
  
  int<lower=1> N;
  array[N] int y_star;
  
}

parameters {
  
  real mu;
  real<lower=0> sigma;
  array[N] real<lower=0,upper=1> u;
  
}

transformed parameters {
  
  array[N] real y;
  
  for (n in 1:N) {
    
    y[n] = mu + sigma * inv_Phi(normal_cdf(y_star[n] | mu, sigma) + 
      u[n] * (normal_cdf(y_star[n] + 1 | mu, sigma) - normal_cdf(y_star[n] | mu, sigma)));
    
  }
  
}

model {
  
  y ~ normal(mu, sigma);  // (0)
  
  for (n in 1:N) {
    
    // Jacobian adjustment
    target += log(sigma);  // (1)
    
    target += log_diff_exp(normal_lcdf(y_star[n] + 1 | mu, sigma), 
                           normal_lcdf(y_star[n] | mu, sigma));  // (2)

    target += -normal_lpdf(y[n] | mu, sigma);  // (3)
    
  }
  
}

This model doesn’t work well. Line (2) is the same as in the integrated case. Line (3) cancels out Line (0). So the problem is Line (1) (the model works if I remove it), but I am not sure why.

Does my logic make sense? Where have I gone wrong?

1 Like

I think you just introduce a new parameter, y_raw between 0 and 1, y = y_raw + y_star[n] with prior y_raw ~ beta(1, 1). Then you fit it to your data such as (0). Maybe if this simpler model works you can move to more complicated variations

2 Likes

Thanks, I assume you meant that y_raw is a vector so y[n] = y_raw[n] + y_star[n]? That worked and is a nice approach that I hadn’t thought of, and although it doesn’t fit exactly into the more complex model that I’m working towards, it has given me some ideas.

In the meantime, I found the error I made above. My derivation of the Jacobian was correct, but I translated the -\log \phi((y_i - \mu) / \sigma) as -normal_lpdf(y[n] | mu, sigma) in line (3) when it should have been -std_normal_lpdf((y[n] - mu) / sigma) , or equivalently -normal_lpdf(y[n] | mu, sigma) - log(sigma), which would have cancelled out line (1).

1 Like