Gompertz distribution cdf under-/overflow

Hi,

I am trying to write custom functions for the _cdf/_lcdf for the gompertz distribtion Wikipedia

I have been successful with other standard parametric survival functions which are either implemented already or by custom functions. The gompertz is however challenging for me and I am unable to avoid under-/overflow issues. I have tried algebraic manipulation and using log1m/log1p and logsumexp without any success. I have done this in JAGS previously with the zeroes trick and modeling survival and hazard separately. Translating the JAGS-code may be an option, however it would be neat if this has already been done or if there are obvious ways to solve this.

I have written the gompertz _lpdf and _lccdf which does not seem to under-/overflow. I assume it is due to the double exp in the cdf.

I have not been successful in translating the discussions on BRMS in this thread and have tried several alternative ways, including splitting up the equation into multiple steps. Copy-pasting the simplest version of my _cdf function below. Any help is highly appreciated!

functions {
  real gompertz_cdf (real y, real b, real eta) {   // Not implemented in STAN, custom cdf-function
        return 1 - exp(-eta*(exp(b*y) - 1)); 
        }
} 

There are more composed functions. I believe the best you can do is

real gompertz_lcdf(real y, real b, real eta) {
  return log1m_exp(-eta*expm1(b*y));
}
2 Likes

Thanks for the proposed code and for confirming this is the best I can do. I am only able to avoid the under-/overflow with unreasonable boundaries for b as shown below. I’ll try to translate my JAGS code instead and see whether that’s successful. I really appreciate the help as I can then park this approach as a dead end!

parameters {
      real <lower=0, upper=0.01> b; 
} 

Can you post a mwe with some simulated data?

Below is a mwe with some suggested input-data. The under-/overflow isn’t an issue here as the cdf is not included. However it doesn’t work properly for the mwe either. Hopefully something is obvious to you. Thanks!

OS.event.times <- c(1,1,1,3,3,4,6,7,7,8) 
OS.event <- c(0,0,0,0,1,0,1,0,1,0) 
trial.data <- data.frame(OS.months = OS.event.times, OS.event = OS.event)
trial.data

Folder for STAN models

Stanfolder <- "./R/STAN/STAN_models/" # What folder to create for JAGS-model-file
if(!dir.exists(Stanfolder)) {dir.create(Stanfolder)} # checks if dir exists, creates it if not

Setting up values for STAN

trial.N <- sum(trial.data$OS.event) # Number of datapoints
trial.events.N <- sum(trial.data$OS.event==1) # Number of events
trial.censored.N <- sum(trial.data$OS.event==0) # Number of censored
Trial.censored.vect <- rep(0, trial.censored.N) # Vector of censored events 
trial.event.times <- trial.data$OS.months[trial.data$OS.event==1] # Trial event times
trial.censored.times <- trial.data$OS.months[trial.data$OS.event==0] # Trial censored times
# Stan code
cat("functions {

      // shape b, scale eta
      real gompertz_cdf (real y, real b, real eta) {
        return 1 - exp(-eta*(exp(b*y) - 1));
      }
      
      real gompertz_lpdf (real y, real b, real eta) { 
        return log(b*eta) + eta + b*y - eta*exp(b*y);  
      }
      
     real gompertz_lcdf (real y, real b, real eta) {
          return log1m_exp(-eta*expm1(b*y));
     }
      
      real gompertz_lccdf (real y, real b, real eta){
        return -eta*(exp(b*y) - 1);
      }
      
    }
    data { 
        int N; 
        int<lower=0> Nobs; 
        int<lower=0> Ncen; 
        vector[Nobs] yobs; 
        vector[Ncen] ycen; 
    } 
    parameters {
      real <lower=0, upper=0.1> b;     // param 1
      real <lower=0, upper=0.1> eta;   // param 2 
    }
    model {
    
    // priors
    b ~ gamma(1, 20);     
    eta ~ gamma(1, 20); 
    
    for (i in 1:Nobs) {
    target += gompertz_lpdf(yobs[i] | b, eta);
    }
    for (j in 1:Ncen) {
    target += gompertz_lccdf(ycen[j] | b, eta);
    }
    
    }
", file=paste0("./R/STAN/STAN_models/Stan_Model_gompertz.stan")) 
fit <- stan("./R/STAN/STAN_models/Stan_Model_gompertz.stan", data=list(N = trial.N, Nobs = trial.events.N, 
                                                                       Ncen = trial.censored.N, yobs = trial.event.times, ycen = trial.censored.times, 
                                                                       trial_cen_vector = Trial.censored.vect), iter=5000, chains=4)
fit

[edit: escaped code—it requires back-ticks, not apostrophes (```)]

You may want to jointly constrain b * max(y) to be below whatever would overflow.

P.S. I’d strongly recommend putting your Stan code in standalone files. It then lets you use prints with quotes and transposes with apostrophes and gives you an easy way to find lines by line number in error messages. It also prints more neatly with syntax highlighting.

Thanks a lot for the help and for the tips on using standalone files. I’ll make sure to look into that, rather than creating the file from within R.

I rechecked my algebra, but even though I failed to find errors I was still unsuccessful with the equations I started with above. I ended up using another parametrization of the gompertz similar to the one used by Jackson in the flexsurv package github. I’ll paste the working functions for future readers if anyone is interested in my solution.

The cdf in particular over-/underflows without constraints. Using reasonable constraints on b, eta and potentially also b * max(y) to avoid this is necessary, your proposal above was simple but enlightening.

functions {

      real gompertz_lpdf (real y, real b, real eta) { 
        return log(eta) + (b * y) + (-eta/b * (expm1(b * y)));
      }
      
      real gompertz_cdf (real y, real b, real eta) {  
         return 1 - exp(-eta/b * (expm1(b * y))); 
      }
      
     real gompertz_lcdf (real y, real b, real eta) {  
          return log1m_exp(-eta/b * (expm1(b * y))); 
     }
      
      real gompertz_lccdf (real y, real b, real eta){ 
        return (-eta/b * (expm1(b * y)));
      }
      
    }

Thanks for following up. I’ll mark your own response as the solution.