Cumulative probability within interval [t, t+1]

For a time-to-event (survival) analysis, I want to calculate the probability of an event occurring within an interval, e.g. between times t and t+1, where the time to event follows a continuous distribution.

The time-to-event distribution could be Exponential, Gamma or Weibull. For this example I will use the simpler exponential distribution:

The probability of the event occurring within the interval is the integral of the density over that interval, \int_t^{t+1} \lambda e^{- \lambda x} \, dx, which can be written as the difference of CDFs: (1-e^{- \lambda (t+1)}) - (1-e^{- \lambda t}).
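Expanding that difference gives the equivalent closed form e^{- \lambda t} - e^{- \lambda (t+1)} = e^{- \lambda t}(1 - e^{- \lambda}), so on the log scale \log p = - \lambda t + \log(1 - e^{- \lambda}); this form is useful for numerical stability, as discussed below.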

Or in Stan syntax: exponential_cdf(t + 1 | lambda) - exponential_cdf(t | lambda)


The key part of model code on the log scale is therefore:

transformed parameters {
  vector[N] lp;  //log probability for an interval

  for (i in 1:N)
    lp[i] = log_diff_exp(exponential_lcdf(t[i] + 1 | lambda),
                         exponential_lcdf(t[i]     | lambda));
}

However, fitting this model results in an inability to initialise model parameters:

Chain 1 Rejecting initial value:
Chain 1 Gradient evaluated at the initial value is not finite.
Chain 1 Stan can’t start sampling from this initial value.
Chain 1 Rejecting initial value:
Chain 1 Log probability evaluates to log(0), i.e. negative infinity.
Chain 1 Stan can’t start sampling from this initial value.
Chain 1 Initialization between (-2, 2) failed after 100 attempts.

I have fitted many variations of this model and it seems that subtracting the cumulative probabilities is causing the difficulties. A simplified version of the full code is given below.

Can anyone explain this behaviour?


data {
  int <lower=1> N;                  //number of age classes
  array[N] int<lower=1> tested;     //number tested per age class
  array[N] int<lower=0> positive;   //number positive per age class
  array[N] int<lower=1> age_lower;  //lower interval of age class
  array[N] int<lower=age_lower> age_upper;  //upper interval of age class
}

parameters {
  real<lower=0> lambda;  //rate of exponential distribution
}

transformed parameters {
  vector[N] lp;  //log probability within an interval

  for (i in 1:N)
    lp[i] = log_diff_exp(exponential_lcdf(age_upper[i] | lambda),
                         exponential_lcdf(age_lower[i] | lambda));
}

model {
  for (i in 1:N)
    target += binomial_lpmf(positive[i] | tested[i], exp(lp[i]));
}

Part of the problem may be that our log CDFs are, for the most part, implemented naively by taking the log of the CDF, which is unstable in the tails.
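For the exponential specifically, you can sidestep the log CDF entirely, since P(l < X < u) = e^{- \lambda l}(1 - e^{- \lambda (u - l)}). A minimal sketch of the transformed parameters loop rewritten this way (my reformulation, not tested against the original data):

for (i in 1:N)
  // log P = -lambda * age_lower + log(1 - exp(-lambda * (age_upper - age_lower)))
  lp[i] = -lambda * age_lower[i]
          + log1m_exp(-lambda * (age_upper[i] - age_lower[i]));

Here log1m_exp(x) evaluates log(1 - exp(x)) stably, so lp stays finite even when lambda * age_lower is large enough that the CDF rounds to 1.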

You can use print(target()) to track when the target becomes -infinity. Did it not tell you where there was an error in the Stan code?
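For example, a sketch of that kind of instrumentation in the model block (the print labels are made up):

model {
  for (i in 1:N) {
    target += binomial_lpmf(positive[i] | tested[i], exp(lp[i]));
    // report the running log density to see which term drives it to -inf
    print("i = ", i, " lp = ", lp[i], " target = ", target());
  }
}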

Also, it’s much more efficient to compute each exponential lcdf only once. As is, you calculate most of them twice. That’d be something like this:

vector[N] exp_lcdf;
for (n in 1:N) {
  exp_lcdf[n] = exponential_lcdf(t[n] | lambda);
}

Did you really intend to write t[i] + 1 rather than t[i + 1]? Usually, you measure between time points in your time series, not between discrete indexes. So that’d be this:

vector[N - 1] lp;
for (n in 1:N - 1) {
  lp[n] = log_diff_exp(exp_lcdf[n + 1], exp_lcdf[n]);
}

with only N - 1 entries. Or maybe there’s a log(0) log CDF on the boundary?
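To make that concrete (my own illustration, assuming the naive log-of-CDF implementation described above): once lambda * t is moderately large, the CDF rounds to exactly 1 in double precision, so both lcdfs return 0 and the log-difference is log(0):

real lcdf_lo = exponential_lcdf(20 | 2);   // 1 - exp(-40) rounds to 1.0, so lcdf is 0
real lcdf_hi = exponential_lcdf(21 | 2);   // likewise 0
real lp = log_diff_exp(lcdf_hi, lcdf_lo);  // log(exp(0) - exp(0)) = log(0) = -inf

That matches the "Log probability evaluates to log(0)" message in the initialization output.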