Generalized F distribution can reduce to Generalized Gamma distribution - but what prior to use?

Hello, I am working on the Generalized F distribution. I am following how the R package flexsurv handles it but try to code it up in Stan.

In R, you can call and run the following flexsurv code:

library("flexsurv")
fs2 <- flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc, dist = "genF")
fs2

The coolest thing about the Generalized F distribution is that it nests many distributions, and specifically, it reduces to the Generalized Gamma distribution when P approaches 0 (source). But P should in theory be positive. This is where my question is.

I coded up this Stan code for the Generalized F distribution:

// Generalized F Distribution
// No covariate in this version
//https://mpn.metworx.com/packages/flexsurv/1.1.1/reference/GenF.html

functions {

  real dgenf(real x, real mu, real sigma, real Q, real P) {

    real delta=sqrt(Q^2+2*P);
    
    real s1 = 2*inv((Q^2+2*P+Q*delta));
    
    real s2 = 2*inv((Q^2+2*P-Q*delta));

    real w = (log(x) - mu)*delta/sigma ; 

    real expw = exp(w);

    real logdens = log(delta) + s1*(log(s1) + w - log(s2)) -log(sigma*x)  - (s1+s2)*log(1+s1*expw/s2) - lbeta(s1, s2);
     
    return logdens;

   }

  real pgenf_S (real q, real mu, real sigma, real Q, real P) {

    real delta=sqrt(Q^2+2*P);
    
    real s1 = 2*inv((Q^2+2*P+Q*delta));
    
    real s2 = 2*inv((Q^2+2*P-Q*delta));
  
    real w = (log(q) - mu)*delta/sigma;
  
    // CDF = 1 - beta_cdf(s2/(s2 + s1*exp(w)) | s2, s1)
  
    real logprob = log(beta_cdf(s2/(s2 + s1*exp(w)) | s2, s1));
  
    return logprob;
  
  }

}

data{
  
  int Nuc;
  real yuc[Nuc];
  int Nc;
  real yc[Nc];

}

parameters{
  
  real mu;
  real sigma_log;
  real Q;
//  real P_log;  // I am unsure which version of P I should model and what prior to use
  real<lower=0> P;
}
  
  
transformed parameters{
  
  real sigma=exp(sigma_log);
//  real P = exp(P_log);  
  
}
  
model{
  
  mu ~ std_normal();
  sigma_log ~ std_normal();
  Q ~ std_normal();
//  P_log ~ std_normal();
  P ~ exponential(1);
  
  for (i in 1:Nuc) {
    target += dgenf(yuc[i], mu, sigma, Q, P ) ;
  }
  
  for (i in 1:Nc){

      target += pgenf_S(yc[i], mu, sigma, Q, P  );
     
              
  }
  
}


I use two ways to model P:

  1. I parameterize on P_log and use a std_normal() as prior. Everything runs nicely and fast.
  2. I parameterize on P and use a exponential(1) as prior. It runs a bit slowly.

Both parameterizations give nearly identical results to the point-estimate as the flexsurv. I also try simulation and obtain similar results. So far so good.

To run my Stan code, I use cmdstanr:

data_list <- list(
  Nuc = length(which(bc$censrec==1)), 
  yuc = bc[which(bc$censrec==1), c("recyrs")],
  Nc = length(which(bc$censrec==0)),
  yc = bc[which(bc$censrec==0),c("recyrs")]
)

file2 <- file.path("GenF.stan")
mod_f2 <- cmdstan_model(file2, stanc_options = list("O1"), quiet=TRUE)

fit_f2 <- mod_f2$sample(
  data = data_list,
  seed = 123,
  init = 1,
  chains = 4,
  parallel_chains = 4,
  refresh = 500,
  iter_warmup = 1000,
  iter_sampling = 1000
)

However, I would like to parameterize and choose a prior for either P or P_log so that it can inform the reduction from GenF to GenGamma. I would test by simulating GenGamma data and then run the GenF model on it to see where P is approaching 0.

Does anyone have a good suggestion? Thank you

1 Like

Nice!

If I understand correctly, you have a positively constrained variable and you want to allow values near zero (which is where it reduces to generalized gamma). For this, about the only choice you can make is something like a half-normal or half-t or exponential. The difference is that half-normal/half-t bounds the density at zero, whereas with something like exponential(1), the density is unbounded at zero. After that, it really just depends what you think large values might look like to be consistent to give you a sense of what scale to use.

As always, I’d recommend simulating and see how it works in cases where you know the answer. It can be very numerically unstable to try to push a positive-constrained value to zero, as it entails pushing the log of the value to negative infinity. If you try to put a prior on P_log, you won’t be able to make it consistent with zero and proper.

2 Likes