Issue with Survival Models

phcooney · November 1, 2021, 3:06pm

I am running into an issue fitting a three parameter generalized gamma survival model.

I am using the parameterization from the flexsurv package and have checked that survival and density functions are correct. It seems that the model runs into convergence issues when evaluating the survival function because of the integral relating to the gamma cumulative density function? For other survival models (i.e. Weibull) which have survival functions which do not require a cumulative density function do not have such an issue.

I know that there is a way around this as per the survHE package, however, I am wondering if there is a way to still specify the model likelihood using the density and survival functions?

functions {
  real gen_gamma_lden(real x, real mu, real sigma, real Q) {
    // Uses the same parameterisation as flexsurv
    // mu = location
    // sigma = scale
    // Q = shape
    real lden;
     real w;
    // Constructs the log-density for each observation
    w = ((log(x)-mu))/sigma;
  
    lden = -log(sigma*x)+log(fabs(Q))+pow(Q,-2)*log(pow(Q,-2))+pow(Q,-2)*(Q*w-exp(Q*w))-lgamma(pow(Q,-2));
    
    return lden;
    
  }
  \\Survival function
    real Surv_gengamma(real time, real mu, real sigma, real Q){
    real Sind;
    real qi;
    real w;
    real expnu;
    
    qi = 1/(Q*Q);
    w = (log(time)-mu)/sigma;
    expnu = exp(Q*w)*qi;
    
    Sind =  1- gamma_cdf(expnu, qi,1);
    return Sind;
    
  }
  
  
  // Defines the sampling distribution
  real gengamma_lpdf (real t, real d, real d2,  real mu, real sigma, real Q) {
    real log_lik;
    // Issue relates to the inclusion of the survival function;=
    log_lik = d*gen_gamma_lden(t,mu,sigma, Q) + d2*log(Surv_gengamma(t,mu,sigma, Q));
    //log_lik = d*gen_gamma_lden(t,mu,sigma, Q) ;
    return log_lik;
  }
  
  
}

data {
  int<lower=1> n;                      // number of observations
  vector[n] t;               // fully observed times
  vector[n] d;              // censoring indicators
  int<lower=1> H;                         // number of covariates (including intercept)
  matrix[n,H] X;                  // matrix of categorical covariates for the valid cases (0/1 => dummy variables)
  vector[H] mu_beta;                      // vector of means for the covariates
  vector<lower=0>[H] sigma_beta;          // vector of sd for the covariates
  real mu_Q;                              // mean for the parameter Q
  real<lower=0> sigma_Q;                  // sd for the parameter Q
  real<lower=0> a_sigma;                  // first parameter for the scale distribution
  real<lower=0> b_sigma;                  // second parameter for the scale distribution
  
}

transformed data{
vector[n] d2;              // censoring indicators
for (i in 1:n){
  if(d[i] == 1){ 
   d2[i] = 0;
  }else{
   d2[i] =1;
  }
}


}

parameters {
  real Q;                                 // shape of the Generalised Gamma distribution
  real<lower=0> sigma;                    // scale of the Generalised Gamma distribution
  vector[H] beta;                         // coefficients for the covariates
 }

transformed parameters{
  vector[n] linpred;               // rescaled predictor (mu)

  linpred = X*beta;
  
}

model {
  // Prior distributions
  Q ~ normal(mu_Q,sigma_Q);
  sigma ~ gamma(a_sigma,b_sigma);
  beta ~ normal(mu_beta,sigma_beta);
  
  for(i in 1:n){
    t[i] ~ gengamma(d[i],d2[i], linpred[i],sigma,Q);
  }

}

generated quantities {
  real mu;
  mu = beta[1];
}

set.seed(123)

data.example = data.frame( times = flexsurv::rgengamma(100, 1.2,.9, .7),
                           status = 1)
#data.example$status[1:5] <-0



stan.data.example <- list()

stan.data.example$t <- data.example$times
stan.data.example$d <- data.example$status
stan.data.example$n <- length(data.example$times)


stan.data.example$H <- 2

stan.data.example$X <- matrix(c(rep(1,length(data.example$times)), 
                                rep(0,length(data.example$times))), ncol = 2)

stan.data.example$mu_beta <- c(0,0)
stan.data.example$sigma_beta <- c(100,100)
stan.data.example$a_sigma <- 0.1
stan.data.example$b_sigma <- 0.1
stan.data.example$mu_Q<- 0
stan.data.example$sigma_Q <- 100


fit <- rstan::stan(model_code = GenGamma.model, model_name = "GenGammaTest", 
                   data = stan.data.example, iter = 2000, chains = 3)

flexsurvreg(Surv(times,status)~1,data =data.example, dist = "gengamma")

martinmodrak · November 10, 2021, 1:41pm

Can’t really delve deeply into the problem, but it appears that there are two separate modes for Q, and chains get stuck in the individual modes:

In fact it appears, the model prohibits a range of Q values near zero:

(plots via ShinyStan)

Understanding why this is the case is probably important for moving forward.

Best of luck with your model!

maxbiostat · November 16, 2021, 12:31am

@hhau seems to have encountered something similar before. Any suggestions?

hhau · November 26, 2021, 11:16pm

~~Q is a shape parameter and must be positive, thus should have a <lower = 0> constraint. That would eliminate the symmetry / second mode~~. Edit: Perhaps not, I see now that flexsurv also permits negative values for Q. It also states that

a further class of models with negative q, and survivor function I(\gamma, u), where z is replaced by −z

but I don’t see this replacement in the code anywhere?

I am also suspicious of the numerical stability of this line:

lden = -log(sigma*x)+log(fabs(Q))+pow(Q,-2)*log(pow(Q,-2))+pow(Q,-2)*(Q*w-exp(Q*w))-lgamma(pow(Q,-2));

I would compute some of these terms separately and print out their value. Terms like log(pow(Q,-2)) will increase very quickly for small Q.

Topic		Replies	Views
Generalized Gamma Implementation Leads To Divergences Modeling	4	475	December 8, 2023
Weibull Survival model for strongly censored data - Stan does not recover sample distribution values Modeling	2	1243	May 11, 2019
Generalized Gamma distribution with absolute value on a parameter Modeling fitting-issues	0	656	September 21, 2022
Competing risks survival models Modeling	4	1322	April 25, 2018
Convergence issues with model for predicting survival counts based on increasing treatments Modeling rstan , fitting-issues , specification , binomial-response	9	958	February 10, 2022

Issue with Survival Models

Related topics