Interval censored data fails with Weibull but not Gamma?

I attempt to run survival models with interval-censored data.

#War Data from Morrison and Schmittlein 1980

Dropped <- c(165, 46, 33, 22, 11, 11, 5, 4, 4, 3)
Tstart <- c(0:9);
Tend <- Tstart+1;

data_list <- list(
  Dropped = Dropped, 
  Tstart = Tstart,
  Tend = Tend,
  T = 10,
  N = 315,
  R = 315-sum(Dropped)
)

I tried the Gamma distribution which runs well:


//  Fit a Gamma model with interval censored data
//  


data {

  int<lower=0> T; // calibration period
  
  vector[T] Tstart;
  
  vector[T] Tend; 
  
  vector[T] Dropped; 
  
  int<lower=0> N; 
  
  int<lower=0> R; // remaining buyers

}


parameters {
  
  real<lower=0> lambda;
  real<lower=0> c;
  
}


model {


  lambda ~ normal(10,10);

  c ~ normal(1,10);
  
  for (i in 1:T) {
    target +=  Dropped[i] * log(gamma_cdf(Tend[i] | c, lambda) -  
                           gamma_cdf(Tstart[i] | c, lambda) ); 
                                                     
  }
  target += R * gamma_lccdf(T | c, lambda) ;

}

generated quantities{
 
   vector[11] predicted;
   
   for (i in 1:10) {
       predicted[i]=N*(gamma_cdf(Tend[i] | c, lambda) -  
                           gamma_cdf(Tstart[i] | c, lambda) );
   }
   predicted[11]=N*(1-gamma_cdf(10 | c, lambda));
     
}

I omitted the convergence diagnostics but they are really good. The fit of the Gamma, using the posterior mean of each histogram bar, is also good.

But when I tried to use the Weibull, the sampling cannot even get started. The code is nearly identical so that cannot be the issue.

//  Fit a Weibull model with interval censored data
//  


data {

  int<lower=0> T; // calibration period
  
  vector[T] Tstart;
  
  vector[T] Tend; 
  
  vector[T] Dropped; 
  
  int<lower=0> N; 
  
  int<lower=0> R; // remaining buyers

}


parameters {
  
  real<lower=0> lambda;
  real<lower=0> c;
  
}


model {


  lambda ~ normal(10,10);

  c ~ normal(1,10);
  
  for (i in 1:T) {
  target +=  Dropped[i] * log(weibull_cdf(Tend[i] | c, lambda) -  
                           weibull_cdf(Tstart[i] | c, lambda) ); 
  }
  target += R * weibull_lccdf(T | c, lambda) ;

}

Can someone try and see if they can determine the reason?

1 Like

It’s most likely a floating point error in the underlying cdf or lccdf. We updated the gamma distribution to be more stable in the last release so I’m glad to see that’s working as expected.

@rok_cesnovar @andrjohns
This line on the lcdf may be updated to use log1m_exp:
T_partials_return cdf_log = sum(log(1 - exp_n));

Try this. Double check that I got alpha and sigma right and not switched around.

functions {
  real my_weibull_lcdf(real y, real alpha, real sigma) {
    return log1m_exp(-pow(y / sigma, alpha));
  }
}
data {
  int<lower=0> T; // calibration period 
  vector[T] Tstart;
  vector[T] Tend;
  vector[T] Dropped;
  int<lower=0> N;
  int<lower=0> R; // remaining buyers
}
parameters {
  real<lower=0> lambda;
  real<lower=0> c;
}
model {
  lambda ~ normal(10, 10);
  c ~ normal(1, 10);
  
  for (i in 1 : T) {
    target += Dropped[i] * log_diff_exp(my_weibull_lcdf(Tend[i] | c, lambda), my_weibull_lcdf(Tstart[i] | c, lambda));
  }
  target += R * weibull_lccdf(T | c, lambda);
}

2 Likes

Thank you and the changes work!

Will the developers update the underlying Weibull functions? Thanks

Yes, issue that you can follow is update weibull lcdf/cdf to use `log1m_exp` · Issue #2810 · stan-dev/math · GitHub

@sonicking the 0 in your first interval is what’s causing the issue. I’m not sure how the UDF I made fixes the log(0) issue. The derivative of the weibull cdf at y = 0 is undefined because

\frac{\partial}{\partial \alpha} \bigg ( 1 - \exp \bigg(- \bigg(\frac{y}{\sigma}\bigg)^\alpha \bigg) \bigg) = \exp\bigg( -\bigg(\frac{y}{\sigma}\bigg)^{\alpha}\bigg) \bigg(\frac{y}{\sigma}\bigg)^{\alpha} \log\bigg(\frac{y}{\sigma}\bigg)

the \log(y/\sigma) is the problem. Maybe @nhuurre or someone else has a clever way to deal with this.

1 Like

Since \alpha > 0 we have \left(\frac{y}{\sigma}\right)^\alpha \log\left(\frac{y}{\sigma}\right) = 0. I think your UDF works because pow(0,...) probably handles zero just fine. (The code is rather complicated but has a comment to that effect:

1 Like

Thank you for letting me know about this issue. In light of it, another solution to circumvent this issue without resorting the user-defined function is the following:

//  Fit a Weibull model with interval censored data
//  


data {

  int<lower=0> T; // calibration period
  
  vector[T] Tstart;
  
  vector[T] Tend; 
  
  vector[T] Dropped; 
  
  int<lower=0> N; 
  
  int<lower=0> R; // remaining buyers

}


parameters {
  
  real<lower=0> lambda;
  real<lower=0> c;
  
}


model {

  lambda ~ normal(10,10);
  c ~ normal(1,10);
  
  //HANDLE THE FIRST INTERVAL INVOLVING 0 DIFFERENTLY
  target += Dropped[1] * weibull_lcdf(Tend[1] | c, lambda);
  
  for (i in 2:T) {
                          
   target += Dropped[i] * log_diff_exp(weibull_lcdf(Tend[i] | c, lambda),weibull_lcdf(Tstart[i] | c, lambda));
                           
  }
  target += R * weibull_lccdf(T | c, lambda) ;

}