Hello, I am working on the Generalized F distribution. I am following how the R package flexsurv handles it but try to code it up in Stan.
In R, you can call and run the following flexsurv code:
library("flexsurv")
fs2 <- flexsurvreg(Surv(recyrs, censrec) ~ 1, data = bc, dist = "genF")
fs2
The coolest thing about the Generalized F distribution is that it nests many distributions, and specifically, it reduces to the Generalized Gamma distribution when P approaches 0 (source). But P should in theory be positive. This is where my question is.
I coded up this Stan code for the Generalized F distribution:
// Generalized F Distribution
// No covariate in this version
//https://mpn.metworx.com/packages/flexsurv/1.1.1/reference/GenF.html
functions {
real dgenf(real x, real mu, real sigma, real Q, real P) {
real delta=sqrt(Q^2+2*P);
real s1 = 2*inv((Q^2+2*P+Q*delta));
real s2 = 2*inv((Q^2+2*P-Q*delta));
real w = (log(x) - mu)*delta/sigma ;
real expw = exp(w);
real logdens = log(delta) + s1*(log(s1) + w - log(s2)) -log(sigma*x) - (s1+s2)*log(1+s1*expw/s2) - lbeta(s1, s2);
return logdens;
}
real pgenf_S (real q, real mu, real sigma, real Q, real P) {
real delta=sqrt(Q^2+2*P);
real s1 = 2*inv((Q^2+2*P+Q*delta));
real s2 = 2*inv((Q^2+2*P-Q*delta));
real w = (log(q) - mu)*delta/sigma;
// CDF = 1 - beta_cdf(s2/(s2 + s1*exp(w)) | s2, s1)
real logprob = log(beta_cdf(s2/(s2 + s1*exp(w)) | s2, s1));
return logprob;
}
}
data{
int Nuc;
real yuc[Nuc];
int Nc;
real yc[Nc];
}
parameters{
real mu;
real sigma_log;
real Q;
// real P_log; // I am unsure which version of P I should model and what prior to use
real<lower=0> P;
}
transformed parameters{
real sigma=exp(sigma_log);
// real P = exp(P_log);
}
model{
mu ~ std_normal();
sigma_log ~ std_normal();
Q ~ std_normal();
// P_log ~ std_normal();
P ~ exponential(1);
for (i in 1:Nuc) {
target += dgenf(yuc[i], mu, sigma, Q, P ) ;
}
for (i in 1:Nc){
target += pgenf_S(yc[i], mu, sigma, Q, P );
}
}
I use two ways to model P:
- I parameterize on P_log and use a
std_normal()
as prior. Everything runs nicely and fast. - I parameterize on P and use a
exponential(1)
as prior. It runs a bit slowly.
Both parameterizations give nearly identical results to the point-estimate as the flexsurv. I also try simulation and obtain similar results. So far so good.
To run my Stan code, I use cmdstanr
:
data_list <- list(
Nuc = length(which(bc$censrec==1)),
yuc = bc[which(bc$censrec==1), c("recyrs")],
Nc = length(which(bc$censrec==0)),
yc = bc[which(bc$censrec==0),c("recyrs")]
)
file2 <- file.path("GenF.stan")
mod_f2 <- cmdstan_model(file2, stanc_options = list("O1"), quiet=TRUE)
fit_f2 <- mod_f2$sample(
data = data_list,
seed = 123,
init = 1,
chains = 4,
parallel_chains = 4,
refresh = 500,
iter_warmup = 1000,
iter_sampling = 1000
)
However, I would like to parameterize and choose a prior for either P or P_log so that it can inform the reduction from GenF to GenGamma. I would test by simulating GenGamma data and then run the GenF model on it to see where P is approaching 0.
Does anyone have a good suggestion? Thank you