Integrating out censored data in negative binomial model

http://discourse.mc-stan.org/t/right-censored-data-in-a-negative-binomial-model/562?source_topic_id=566&source_topic_id=567

By accident, my initial post did not go to a new topic. Please click on above the link.

I’m afraid that link’s broken.

Thank you Bob for letting me know that the link is broken. I can still access it, so it may be permission issue.
I have reformatted my post below.

I suspect that I am missing something really basic. This is my first Stan model. I am trying to apply the two procedures for right censored data in section 11.3 in the manual. I adapted the example for continuous data to my count data and have run into trouble.

The first procedure imputes the censored data while

Contents of SimpleNBrightCensored.stan

data{
    int<lower=0> N;
    int<lower=0> N_obs;
    int<lower=0> N_cens;
    int y_obs[N_obs];
    int y_cens[N_cens];
    int<lower=max(y_obs)> U;
}
parameters{
  real mu;
  real<lower=0.001> scale;
}
model{
  scale ~ exponential( 2 );
  y_obs ~ neg_binomial_2( mu, scale );
  y_cens ~ neg_binomial_2( mu, scale );
}
generated quantities{
  real dev;
  dev = 0;
  dev = dev + (-2)*neg_binomial_2_lpmf( y_obs | mu , scale );
}

Rcode for Model1

model_file = 'simpleNBrightCensored.stan'
iterations = 10000

# inits
N = 54
N_obs = 44
N_cens = 10
U = 601
mu = 256
scale = 1.7

# data 
y_obs=c(7,23,54,59,61,76,79,91,99,127,129,133,141,147,147,150,
151,155,176,184,187,189,207,225,238,239,251,276,280,300,305,332,
369,391,425,432,462,481,483,506,518,544,569,581)
y_cens=c(601,601,601,601,601,601,601,601,601,601)

stan_data = list(N=N, N_obs=N_obs, y_obs=y_obs,  N_cens=N_cens, y_cens=y_cens) # data passed to stan
# set up the model
stan_model = stan(model_file, data = stan_data, chains = 1)

# fit model
stanfit2 = stan(fit = stan_model, data = stan_data,
                iter=iterations) # run the model

print(stanfit2,digits=3)

The second procedure involves integrating out the censored data.

The model simpleNBrightCensored2.stan:

data{
    int<lower=0> N;
    int<lower=0> N_obs;
    int<lower=0> N_cens;
    int y_obs[N_obs];
    int y_cens[N_cens];
    int<lower=max(y_obs)> U;
}
parameters{
  real mu;
  real<lower=0.1> scale;
}
model{
  scale ~ exponential( 2 );
  y_obs ~ neg_binomial_2( mu , scale );
  target += N_cens * neg_binomial_2_lccdf(U | mu, scale);
}
generated quantities{
  real dev;
  dev = 0;
  dev = dev + (-2)*neg_binomial_2_lpmf( y_obs | mu , scale );
}

The Rcode to run this model:

model_file = 'simpleNBrightCensored2.stan'
iterations = 10000
# inits
N = 54
N_obs = 44
N_cens = 10
#U = 601
#mu = 256
#scale = 1.7
# data 
y_obs=c(7,23,54,59,61,76,79,91,99,127,129,133,141,147,147,150,151,155,176,184,187,189,207,225,238,239,251,276,280,300,305,332,369,391,425,432,462,481,483,506,518,544,569,581)
y_cens=c(601,601,601,601,601,601,601,601,601,601)

stan_data = list(N=N, N_obs=N_obs, y_obs=y_obs,  N_cens=N_cens, y_cens=y_cens) # data passed to stan
# set up the model
stan_model2 = stan(model_file, data = stan_data, chains = 1)

# fit model
stanfit3 = stan(fit = stan_model2, data = stan_data,
                iter=iterations) # run the model
print(stanfit3,digits=3)

I get the following error message:

"Rejecting initial value:
Log probability evaluates to log(0), i.e. negative infinity.
Stan can't start sampling from this initial value."

When I tried to run the second model I got an error about U missing. I added a value for U in the stan_data and got the error you’re describing.

I added bounds on mu and scale as specified in the back of the manual and things seem to work:

real<lower=0.0> mu;
real<lower=0.0> scale;

Hope that helps!

Thank you very much bbbales2!

In addition to updating the model and uncommenting U, I had to uncomment mu and scale in the R code and it then ran to completion.

The corrected model code is here

data{
  int<lower=0> N;
  int<lower=0> N_obs;
  int<lower=0> N_cens;
  int y_obs[N_obs];
  int y_cens[N_cens];
  int<lower=max(y_obs)> U;
  }
  parameters{
    real<lower=0.0> mu;
    real<lower=0.0> scale;
  }
  model{
    scale ~ exponential( 2 );
    y_obs ~ neg_binomial_2( mu , scale );
    target += N_cens * neg_binomial_2_lccdf(U | mu, scale);
  }
  generated quantities{
    real dev;
   dev = 0;
   dev = dev + (-2)*neg_binomial_2_lpmf( y_obs | mu , scale );
}

and the corrected R code is here:

rm(list=ls())
library(rstan)
model_file = 'simpleNBrightCensored2.stan'
iterations = 10000
# inits
N = 54
N_obs = 44
N_cens = 10
U = 601
mu = 256
scale = 1.7

# data 
y_obs=c(7,23,54,59,61,76,79,91,99,127,129,133,141,147,147,150,151,155,176,184,187,189,207,225,238,239,251,276,280,300,305,332,369,391,425,432,462,481,483,506,518,544,569,581)
y_cens=c(601,601,601,601,601,601,601,601,601,601)

stan_data = list(N=N, N_obs=N_obs, y_obs=y_obs,  N_cens=N_cens, y_cens=y_cens) # data passed to stan
# set up the model
stan_model2 = stan(model_file, data = stan_data, chains = 1)

# fit the model
stanfit3 = stan(fit = stan_model2, data = stan_data,
            iter=iterations)

But how can I get WAIC and AIC for censored data??

@ABUJARAD Your question deserves it’s own thread I think. People might not notice it here.

If possible help solve this problem

@ABUJARAD I don’t know a good answer or where to point you to for one. I can at least tell you the answer probably isn’t going to be simple, haha. Starting a new thread is your best bet.