Dear Stan community - I’m hoping for some competent eyes on a modeling problem that I haven’t been able to figure out. First some context for the modeling problem and the data:
Therapist-assisted internet-based cognitive behavioral therapy is increasingly being used to provide treatment for a number of mental disorders. However, dropout from treatment is quite high. Identifying patients at risk of dropout would allow treatment providers to target these patients with extra intervention to increase adherence to the treatment. On the other hand, some patients who drop out may simply be rapid responders, who stop using the programme because they have benefited sufficiently from less treatment activity. Sudden gains in therapy is a known phenomenon, and related to good outcomes. Our data are repeated symptom ratings completed by patients over time, until the time they stop using the programme (drop-out).
We’d like to try to model this with a mixture model, where there is both a latent growth curve and a survival component, and with a set of predictor variables on group membership. The idea is to couple different trajectories of symptoms over time to different probabilities of dropping out for each group in the mixture, and then having variables that predict group membership. As exchangeable mixture models have well known difficulties (e. g. Forcing separation in location parameters for Gaussian mixture models - Modeling - The Stan Forums (mc-stan.org)), we’ll be using non-exchangeable prior information, making assumptions about the number and kind of groups based on theory and clinical observation, and comparing the fit of various model configurations.
My current problem is that while this model will compile and run for a very small N (say 5 patients, of course returning a posterior pretty much like the priors), but with a larger N it underflows to negative infinity. I’ve tested it with some print() statements, and I think there is something about the way the mixture model iterates over N and G. There is either some error in the implementation here, or I’m trying to do something impossible. Given that I’m primarily a child psychologist, and not a programmer or statistician, it’s likely the former, but I’m not able to figure it out, so I’d be grateful for any comments or thoughts!
data{
int N; // subjects in dataset
int G; // number of groups in mixture
int D; // number of predictor variables on group membership
int nrow; // number of observations across subjects and time
array[N] int ind_pos; // index of the first observation of each subject
array[N] int ind_nobs; // index of the number of observations of each subject
vector[nrow] obs; // symptom observations in long format
vector[nrow] time; // time variable
vector[N] last_obs; // standardized last observation time for dropout part of the model
vector[N] cens; // censoring indicator for the dropout part of the model
matrix[N, D] x; // matrix of predictor variables for group membership
matrix[G, 2] priors_slopes;
matrix[G, 2] priors_shape;
}
parameters{
// parameters for multi-logit model on class
matrix [D, G-1] beta_raw;
row_vector[G-1] alpha_raw;
// parameters for growth curve model
ordered[G] slopes;
vector[N] icpt_subj;
real<lower=0> subj_var;
vector<lower=0> [G] errors;
// parameters for survival model
vector<lower=0> [G] drop_shape;
vector<lower=0> [G] drop_scale;
}
transformed parameters{
// constructing coefficient matrix, with all coefficients for the class with
// the lowest slope parameter set to 0
matrix[D + 1, G] beta = append_col(
zeros_vector(D + 1),
append_row(alpha_raw, beta_raw));
}
model{
// adding intercept predictor term to predictor matrix and
// multiplying by coefficient matrix
matrix[N, G] beta_x = append_col(ones_vector(N), x) * beta;
alpha_raw ~ normal(0, 3);
to_vector(beta_raw) ~ normal(0, 3);
// priors on growth model parameters and dropout model parameters
for (g in 1:G){
target += normal_lpdf(slopes[g] | priors_slopes[g,1], priors_slopes[g,2]);
target += normal_lpdf(drop_shape[g] | priors_shape[g,1], priors_shape[g,2]);
target += normal_lpdf(drop_scale[g] | 1, 1); // days to dropout has been standardized
}
icpt_subj ~ normal(0, subj_var);
subj_var ~ student_t(3, 0, 1);
errors ~ inv_gamma(1, 1);
// multinomial/mixture model
// the approach here is to first take the log of the mixture weights predicted
// by the multinomial model for each subject, connecting the model for predicting
// group membership to the mixture model
for (n in 1:N){
vector[G] lpn = log_softmax(beta_x[n]'); // then the mixture weights per case is initiated
// then the observations and the time variable are segmented for each subject
vector[ind_nobs[n]] obs_segm = segment(obs, ind_pos[n], ind_nobs[n]);
vector[ind_nobs[n]] time_segm = segment(time, ind_pos[n], ind_nobs[n]);
// and then these segmented variables are used to calculate predicted values
// conditional on group membership, and used in the growth curve model. The
// resulting log-density is added to added to the vector of log mixture weights
for(g in 1:G){
vector[ind_nobs[n]] obs_pred =
rep_vector(icpt_subj[n], ind_nobs[n]) + time_segm * slopes[g];
lpn[g] += normal_lpdf(obs_segm | obs_pred, errors[g]);
// dropout component of the model
if(cens[n] == 0){
lpn[g] += weibull_lpdf(last_obs[n] | drop_shape[g], drop_scale[g]);
}
else { // log complementary cumulative distribution function to account for censoring
lpn[g] += weibull_lccdf(last_obs[n] | drop_shape[g], drop_scale[g]);
}
}
// and then the weighted log-likelihoods are added to target, still within the loop over N
target += log_sum_exp(lpn);
}
} // end of model block
generated quantities{
// computing the predicted probabilities of group membership per case
array[N] vector[G] group_pred;
matrix[N, G] beta_x = append_col(ones_vector(N), x) * beta;
for (i in 1:N){
group_pred[i] = softmax(beta_x[i]');
}
} // end of generated quantities
Edits: Update a bit of code to current version, clarify somwhat more about modeling approach.