I am trying to optimise a Stan model and would appreciate any insights.
The model is an exponential time-to-event (i.e. constant hazard) model with a subject frailty term, which allows the individual subject rates to vary multiplicatively according to a gamma distribution centered at mean 1. This model produces a negative binomial counting process.
I know that fitting a count model using say neg_binomial_2_glm is very efficient, but I’m exploring some ideas for which I need a TTE frailty specification.
Additionally, I think that specifying the gamma mixing distribution might be slowing things down somewhat compared to using say a log-normal distribution, but I need to keep the marginal count outcome exactly negative binomial.
The model code, which runs fairly well but which I am trying to optimise, is here:
data {
  int N;                      // number of observations
  vector[N] y;                // observed or censored times
  int K;                      // number of covariates
  vector[N] Censor;           // censoring indicator (1 = right-censored)
  matrix[N, K] X;             // design matrix
  int id[N];                  // index of ID categories
  int<lower = 0> M;           // number of ID categories
  // priors
  vector[K] betaMean;
  vector[K] betaSD;
  real iphiSD;
}
parameters {
  vector[K] beta;
  real<lower=0> iphi;         // frailty variance
  vector<lower=0>[M] etaID;   // ID effect (frailty)
}
transformed parameters {
  vector[N] lambda;           // linear predictor (log rate)
  vector[N] lPDF;             // log PDF
  lambda = X*beta + log(etaID[id]);
  lPDF = lambda - y .* exp(lambda);
}
model {
  // Frailty
  etaID ~ gamma(1/iphi, 1/iphi);
  // Priors
  iphi ~ normal(0, iphiSD);
  beta ~ normal(betaMean, betaSD);
  // Likelihood: log PDF for events, log survival for censored rows
  target += lPDF - Censor .* lambda;
}
I have tried to fit a likelihood using the exponential density function and an adjustment for censored patients.
y ~ exponential(lambda);
target += -Censor .* lambda;
Or, equivalently, y_obs ~ exponential(…) and target += exponential_lccdf(y_cen | …).
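Concretely, the pattern I have in mind is something like this (a sketch only; the _obs/_cen vectors are hypothetical names for pre-split observed and censored rows):

model {
  y_obs ~ exponential(rate_obs);                  // observed event times
  target += exponential_lccdf(y_cen | rate_cen);  // right-censored times
}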
However, the probability functions do not seem to play well with the log() function. If I remove the + log(etaID[id]) part, or even just drop the log function so the linear predictor contains + etaID[id], then I can put lambda inside the probability functions; but Stan doesn’t seem to let them accept lambda when it contains a logged term.
Additionally, the lambda vector which I have defined seems to have to go in the transformed parameters block and can’t go in the model block, I think because of the log function too.
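For concreteness, the kind of thing that fails for me looks like this (a sketch of the failing attempt, with the frailty term included in the log rate):

model {
  vector[N] lambda = X*beta + log(etaID[id]);  // log rate as a local variable
  y ~ exponential(lambda);                     // this is where it fails for me
  target += -Censor .* lambda;
}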
To get around all of this I have tried to write an exp_gamma_lpdf function, but I don’t think this can work for the frailty constraint, because user-defined distribution functions can’t (?) be vectorised. Can anyone confirm?
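For reference, the change-of-variables density I was trying to encode is roughly this (a sketch; exp_gamma is my own naming, not a built-in):

functions {
  // log density of x = log(eta) when eta ~ gamma(a, b),
  // by change of variables: p(x) = gamma_pdf(exp(x) | a, b) * exp(x)
  real exp_gamma_lpdf(vector x, real a, real b) {
    return gamma_lpdf(exp(x) | a, b) + sum(x);
  }
}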
I would appreciate any suggestions about whether this code can be sped up at all!
First, a general comment that may help members here reply to your question in a quicker and more effective way (I speak for myself, but also from experience here on the Stan Discourse):
while R is one of the more popular languages among Stan users, sometimes there won’t be someone with specific knowledge of both your model and your preferred language (there usually is, but they may not be around much or have enough time to go through all of the code).
My suggestion is to state the problem briefly in mathematical notation (you can use \LaTeX directly between $ characters); sometimes you will get help from different people on different aspects of the model, implementation, etc.
So some specific comments:
the gamma distribution is coming in only as a prior, right? So it’s unlikely that the specific distribution is slowing down the chain
where are these coming from?
This is where a simple mathematical formulation would help: maybe it’s some odd spacing somewhere, an unconventional formulation, or simply my lack of familiarity with this specific model, but it’s not clear to me at a glance why you are specifying custom PDFs.
Maybe you can briefly clarify what the data looks like and what a non-frailty model would look like, and it will be easier to see what is needed.
In this model, the hazard or intensity h_i(t) of an event, indexed by patient i, is assumed to be of the form: h_i(t) = h_0\exp(X_i\beta)\eta_i
where the patient-level random effect \eta_i \sim \text{Gamma}(\text{shape}=\phi^{-1}, \text{rate}=\phi^{-1}), so that E[\eta_i]=1 and V[\eta_i]=\phi, and \log(\eta_i), which can be written inside the linear part of the model, is exp-gamma distributed: \log(\eta_i) \sim \text{expGamma}(\text{shape}=\phi^{-1}, \text{rate}=\phi^{-1}).
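For reference, the change of variables makes this density explicit: writing a=b=\phi^{-1} and x=\log(\eta_i), the exp-gamma density is p(x) = \frac{b^a}{\Gamma(a)}\exp(ax - be^x).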
[The hazard function is time-homogeneous h_i(t)=h_i, and we can choose to write h_0=\exp(\beta_0) to include an intercept term in the linear part of the model]
This hazard function gives rise to a negative binomial counting process with mean proportional to \exp(X\beta) and a dispersion parameter \phi (sometimes 1/\phi is called the dispersion, which is confusing).
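Sketching the marginalisation (where t_i is the follow-up time, notation I’m introducing here for clarity, and \mu_i = t_i\exp(X_i\beta)): P(Y_i = y) = \int_0^\infty \text{Poisson}(y \mid \mu_i\eta_i)\,\text{Gamma}(\eta_i \mid \phi^{-1}, \phi^{-1})\,d\eta_i = \text{NegBinomial}(y \mid \mu_i, \phi), with E[Y_i]=\mu_i and V[Y_i]=\mu_i+\phi\mu_i^2.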
So in answer to your questions: without a frailty term, the model would be just h_i(t) = h_0\exp(X_i\beta), an exponential time model (or a constant hazard model), which gives rise to a Poisson counting process. The PDF which I’ve written out isn’t a custom PDF; it’s the (log) PDF of the exponential distribution. This is rather the point of the question: I want to use y_obs ~ exponential() and target += exponential_lccdf for censored observations in the usual way, but I can’t seem to do this with a + log(etaID[id]) term included in the log rate. The only way I can get the model to fit is to write out the PDF “by hand”, which I would guess is not efficient compared with using the above functions. When I use the native functions I get a “Stan model … does not contain samples” error.
[I realise now that in the model code I’ve used \lambda to represent a log rate (\lambda=\beta_0+x_1\beta_1+x_2\beta_2+x_3\beta_3+\log(\eta)), which is confusing since \lambda is generally used to represent a rate - sorry about this.]
The gamma distribution is not a prior (or at least it shouldn’t be), but is a constraint. The random effects are drawn from a centered gamma distribution.
I don’t think that the priors on the betas and inverse phi are important to the question, but I’ve chosen to use the following hyperparameters in this instance: \beta \sim N(0, \text{sd}=10), \phi^{-1} \sim N(0, \text{sd}=10).
The way it’s written, you are making etaID an M-dimensional parameter and putting a gamma prior on it. If you need it to be part of the model skeleton, I’d say you have to formulate the PDF explicitly in there (an example of introducing heterogeneity in this way that is reminiscent of frailty models can be found here); since you cannot attribute individual frailties, they would have to be integrated out at some point to match the observations. It seems like this is different from what you are doing, which is treating frailty as a random effect; in this case I guess you’d have to rely on introducing the shape of the distribution as a prior (similar to a mixed-effects linear regression, where the normality imposed on the random effects appears as a Gaussian prior).
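Schematically, what I mean by the mixed-model analogy is this (a sketch, not your model):

data {
  int<lower=0> M;          // number of groups
}
parameters {
  vector[M] u;             // group-level effects
  real<lower=0> sigma_u;   // their scale
}
model {
  u ~ normal(0, sigma_u);  // the random-effects assumption enters as a prior
}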
I wouldn’t say the approach is not valid, just that the assumptions need to be clear, as do the consequences for implementation and interpretation.
Yes, that is clear. I meant custom in the sense that you wrote it yourself instead of using the built-in options.
Typically, the process of building a model is to declare the variables in data and parameters, string together a model (either under transformed parameters or in model itself), and then assign the priors and the likelihood within the model block. For this kind of model you should be able to specify the hazard rate and put it into a survival function that outputs your predictor; it should be straightforward to wrap that in an exponential likelihood like the one you propose.
I’m still not clear on why you cannot do this. It may stem from you trying to assign individual frailties to individuals (i.e. estimating a vector of individual frailties), which I don’t know whether you intend for some reason, but which may increase the dimensionality of the parameter space unnecessarily. I think what you are missing is writing the linear predictor in a way that includes the frailty effects in aggregate; then, instead of writing a custom lPDF, you’d just put it into a y ~ exponential(y_pred) or something.
I hope that makes sense; it’s been a while since I worked with frailty models, and I cannot tease apart all of your formulation/implementation right now.
I think there is some miscommunication of terminology here. Sorry for any confusion.
I know that I’m drawing M etaID variates from a gamma distribution. The gamma distribution is centered (at mean 1) and has variance phi, and I’m (indirectly) putting a prior on phi. So there is a prior on the variance of the M gamma-distributed etaID variates. This is a sort of hierarchical model, and it’s the standard way to specify a random effect. I’m referring to the random effect as a frailty, which is standard in survival models: A tutorial on frailty models - PubMed. I can confirm it is entirely valid, and that I do understand the consequences for interpretation etc. The particular model which I am specifying is exactly equivalent to a negative binomial count model. The negative binomial process arises as a result of putting a gamma-distributed mean parameter into a Poisson process; this result is well known: Negative binomial distribution - Wikipedia.
Your intuition about integrating out the individual frailties is a good one: this is exactly what a negative binomial count model actually does (only the dispersion parameter, i.e. iphi, is estimated in such a model, and it is indeed much faster!). However, I’m specifically interested in estimating them.
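For reference, that marginal model is essentially the following (a sketch; y_count is a hypothetical integer count outcome, the intercept is assumed to be folded into X so alpha = 0, and mapping Stan’s NB2 dispersion to 1/iphi is my reading of how the parameters line up):

model {
  // NB2 variance is mu + mu^2/phi_stan, so phi_stan = 1/iphi here
  y_count ~ neg_binomial_2_log_glm(X, 0, beta, 1/iphi);
}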
The model I have fits and provides the correct estimates. I have checked that an equivalent negative binomial model (using say neg_binomial_2_glm) provides exactly the same results. My question is about optimisation of the existing code, and in particular why I can’t write:

y ~ exponential(lambda);  // with lambda = X*beta + log(etaID[id])
Seems like a simpler problem to solve. There’s a priori (no pun intended) no reason why you couldn’t.
The issue I can see here is that X\beta has dimension N and \eta has dimension M; some dimension shortcuts can be used, but in this case it seems inconsistent. Are you getting an error about that, or more generally, what is the Stan error/warning or issue you are having when writing that directly?
Yeah sorry, I wrote that too fast - I have stated the problem correctly in my question:
You will see if you run the model that lambda is a vector of length N, so there is no issue with dimensions. As I explained in the question, a (very different) model will actually fit fine if the log function is not used in the definition of lambda. And as I explained, this is why I tried to program an exp_gamma distribution; but I don’t think it can be vectorised, so this approach doesn’t, I think, work.
However, I’ve now solved the problem, and it was indeed simple: I was passing the log rate to the likelihood y ~ exponential(), instead of the rate. This led to some negative values being passed to the exponential distribution, which obviously won’t work.
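For reference, the fix is just to exponentiate the log rate before it reaches the likelihood, along these lines (a sketch; the _obs/_cen vectors are the same hypothetical pre-split rows as above):

model {
  y_obs ~ exponential(exp(lambda_obs));                  // rate, not log rate
  target += exponential_lccdf(y_cen | exp(lambda_cen));  // right-censored
}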