Hello everyone,
I am trying to replicate John Kruschke’s blog post on censored data, by following the examples on the Stan manual (pages 188-190).
I have no issues when I try to estimate a model by imputing censored values, same approach as Kruschke’s blog post.
However, when I try to integrate them out by using target += N_censored * normal_lccdf(Censored Limit | mean, sigma)
, following Stan’s manual approach, I always run into this error:
Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Do you have any idea why this might be happening?
Bellow you’ll find the model code, a script to simulate data and replicate the problem.
Thank you for your help!
Stan model
data{
int<lower = 0> N_obs;
int<lower = 0> N_cens;
real y_obs[N_obs];
real<lower = max(y_obs)> U;
}
parameters {
real mu;
real<lower = 0> sigma;
}
model {
y_obs ~ normal(mu, sigma);
target += N_cens * normal_lccdf(U | mu, sigma);
}
generated quantities{
real y_obs_hat[N_obs];
for (i in 1:N_obs){
y_obs_hat[i] = normal_rng(mu, sigma);
}
}
Generate data
library(tidyverse)
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
set.seed(123)
# generate data from unit normal
y_raw <- rnorm(500)
actualMean <- mean(y_raw)
actualSD <- sd(y_raw)
desiredMean = 100
desiredSD = 15
#compute z-scores from unit normal
z_scores = (y_raw - actualMean)/actualSD
# rescale unit normal to have desired mean and standard deviation
dy <- data.frame(value = desiredMean + z_scores * desiredSD)
# Get the censoring probability = mean + 1 SD
CensoringProb <- pnorm(1)
# define censoring limit in our data set
CensoringLim <- quantile(dy$value, probs = CensoringProb)
# classify censored data points according to the censoring limit
dy <- dy %>%
mutate(censored = ifelse(value > CensoringLim, 1,0))
dy_cens <- filter(dy, censored == 0)
# Stan data
standata <- list(
N_obs = nrow(filter(dy, censored == 0)),
N_cens = nrow(filter(dy, censored == 1)),
y_obs = filter(dy, censored == 0)$value,
U = CensoringLim[1]
)
Fit Stan model
IC_fit <- sampling(object = IntegratedCensored,
data = standata,
iter = 2000,
chains = 4)