Integrating out Censored Values - Error following manual's example

Hello everyone,

I am trying to replicate John Kruschke’s blog post on censored data, by following the examples on the Stan manual (pages 188-190).

I have no issues when I try to estimate a model by imputing censored values, same approach as Kruschke’s blog post.

However, when I try to integrate them out by using target += N_censored * normal_lccdf(Censored Limit | mean, sigma), following Stan’s manual approach, I always run into this error:

Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.

Do you have any idea why this might be happening?

Bellow you’ll find the model code, a script to simulate data and replicate the problem.

Thank you for your help!

Stan model


data{
int<lower = 0> N_obs;
int<lower = 0> N_cens;
real y_obs[N_obs];
real<lower = max(y_obs)> U;
}

parameters {

real mu;
real<lower = 0> sigma;

}

model {
  y_obs ~ normal(mu, sigma);
  target += N_cens * normal_lccdf(U | mu, sigma);
}

generated quantities{
  real y_obs_hat[N_obs];

  for (i in 1:N_obs){
    y_obs_hat[i] = normal_rng(mu, sigma);
  }
}

Generate data

library(tidyverse)
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

set.seed(123)

# generate data from unit normal
y_raw <- rnorm(500)

actualMean <- mean(y_raw)
actualSD <- sd(y_raw)

desiredMean = 100
desiredSD = 15

#compute z-scores from unit normal

z_scores = (y_raw - actualMean)/actualSD 

# rescale unit normal to have desired mean and standard deviation

dy <- data.frame(value = desiredMean + z_scores * desiredSD)

# Get the censoring probability = mean + 1 SD

CensoringProb <- pnorm(1) 

# define censoring limit in our data set
CensoringLim <- quantile(dy$value, probs = CensoringProb)

# classify censored data points according to the censoring limit
dy <- dy %>% 
  mutate(censored = ifelse(value > CensoringLim, 1,0))

dy_cens <- filter(dy, censored == 0)

# Stan data
standata <- list(
  N_obs = nrow(filter(dy, censored == 0)),
  N_cens = nrow(filter(dy, censored == 1)),
  y_obs = filter(dy, censored == 0)$value, 
  U = CensoringLim[1]
)

Fit Stan model


IC_fit <- sampling(object = IntegratedCensored, 
                   data = standata, 
                   iter = 2000, 
                   chains = 4)

With init = list(list(mu = 100), list(mu = 100), list(mu = 100), list(mu = 100)), this works. The difficulty is that without a non-centered prior on mu and random initialization, it starts in the vicinity of zero and the normal_lccdf underflows.

2 Likes

Thank you very much.

In fact, a non-centered prior on mu also seems to work to.

So, basically, if mu initializes in a very small value, close to zero, the normal_lccdf will fail? Is it because there will be a very small initial normal_lccdf, given the censoring limit thatI am feeding in? (e.g., if the mean of the distribution initializes at 0 and my censoring limit is 120, the lccdf will be very unlikely?) Maybe this is totally wrong, but I am just trying to understand what is going on.

Thanks again!

The normal CDF will overflow if the argument is about 8.5 standard deviations above the mean. So, it was adding log(1 - 1) to your target for initial values in the unconstrained space on the (-2,2) interval.

1 Like

Ok, I understand it now. Thank you for the quick answers!