Non-exchangeable mixture model. Problem with underflow?

Dear Stan community - I’m hoping for some competent eyes on a modeling problem that I haven’t been able to figure out. First some context for the modeling problem and the data:

Therapist-assisted internet-based cognitive behavioral therapy is increasingly being used to provide treatment for a number of mental disorders. However, dropout from treatment is quite high. Identifying patients at risk of dropout would allow treatment providers to target these patients with extra intervention to increase adherence to the treatment. On the other hand, some patients who drop out may simply be rapid responders, who stop using the programme because they have benefited sufficiently from less treatment activity. Sudden gains in therapy is a known phenomenon, and related to good outcomes. Our data are repeated symptom ratings completed by patients over time, until the time they stop using the programme (drop-out).

We’d like to model this with a mixture model that combines a latent growth curve with a survival component, plus a set of predictor variables on group membership. The idea is to couple different trajectories of symptoms over time to different probabilities of dropping out for each group in the mixture, with covariates predicting group membership. As exchangeable mixture models have well-known difficulties (see e.g. the Stan Forums thread “Forcing separation in location parameters for Gaussian mixture models”), we’ll be using non-exchangeable prior information, making assumptions about the number and kind of groups based on theory and clinical observation, and comparing the fit of various model configurations.
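To make the structure explicit, the per-subject contribution the model targets is (in my notation: \(b_n\) the subject intercept, \(s_g\) and \(\sigma_g\) the group slope and error, \(k_g, \lambda_g\) the group Weibull parameters, \(J_n\) the number of ratings for subject \(n\)):

$$\log p(y_n, t_n^{\text{last}}) \;=\; \log \sum_{g=1}^{G} \operatorname{softmax}\!\big(\alpha + x_n \beta\big)_g \;\prod_{j=1}^{J_n} \mathcal{N}\!\big(y_{nj} \mid b_n + s_g\, t_{nj},\, \sigma_g\big)\;\text{Weibull}\!\big(t_n^{\text{last}} \mid k_g, \lambda_g\big),$$

with the Weibull density replaced by its complementary CDF (the survival function) when the dropout time is censored.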

My current problem is that the model compiles and runs for a very small N (say 5 patients, of course returning a posterior pretty much like the priors), but with a larger N the log density underflows to negative infinity. I’ve tested it with some print() statements, and I suspect something about the way the mixture model iterates over N and G. Either there is an error in the implementation, or I’m trying to do something impossible. Given that I’m primarily a child psychologist, and not a programmer or statistician, it’s likely the former, but I haven’t been able to figure it out, so I’d be grateful for any comments or thoughts!
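For anyone wanting to see the underflow mechanism in isolation: this is not the Stan code, just a NumPy sketch (with made-up numbers) of why summed per-component log densities become hugely negative as the number of observations grows, so that exponentiating them directly underflows to exactly zero, while the log-sum-exp trick stays finite:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=200)  # 200 observations for one subject

def normal_lpdf(y, mu, sigma):
    # summed log density of independent normal observations
    return np.sum(-0.5 * ((y - mu) / sigma) ** 2
                  - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))

# log density under a badly matched mixture component is hugely negative
lp = normal_lpdf(y, mu=5.0, sigma=1.0)   # around -2800 here

# exponentiating directly underflows to exactly 0.0 in double precision
naive_weight = np.exp(lp)

def log_sum_exp(v):
    # numerically stable log(sum(exp(v))), as in Stan's log_sum_exp()
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

# weighted per-group log densities, analogous to the lpn vector in the model
lpn = np.array([lp + np.log(0.5), -300.0 + np.log(0.5)])
stable = log_sum_exp(lpn)  # finite: dominated by the better component
```

So as long as every component is mixed through log_sum_exp, very negative values are fine; a hard -inf usually means some single lpdf term itself evaluated to -inf (e.g. a zero scale or an out-of-support value).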


data {
  int N; // subjects in dataset
  int G; // number of groups in mixture
  int D; // number of predictor variables on group membership
  int nrow; // number of observations across subjects and time
  array[N] int ind_pos; // index of the first observation of each subject
  array[N] int ind_nobs; // number of observations for each subject
  vector[nrow] obs; // symptom observations in long format
  vector[nrow] time; // time variable
  vector[N] last_obs; // standardized last observation time for dropout part of the model
  vector[N] cens; // censoring indicator for the dropout part of the model
  matrix[N, D] x; // matrix of predictor variables for group membership
  matrix[G, 2] priors_slopes; // prior location and scale for the group slopes
  matrix[G, 2] priors_shape; // prior location and scale for the Weibull shapes
}


parameters {
  // parameters for multi-logit model on class
  matrix[D, G-1] beta_raw;
  row_vector[G-1] alpha_raw;
  // parameters for growth curve model
  ordered[G] slopes;
  vector[N] icpt_subj;
  real<lower=0> subj_var;
  vector<lower=0>[G] errors;
  // parameters for survival model
  vector<lower=0>[G] drop_shape;
  vector<lower=0>[G] drop_scale;
}

transformed parameters{

  // constructing coefficient matrix, with all coefficients for the class with
  // the lowest slope parameter set to 0
  matrix[D + 1, G] beta = append_col(
                                zeros_vector(D + 1),
                                append_row(alpha_raw, beta_raw));

  // adding intercept predictor term to predictor matrix and
  // multiplying by coefficient matrix
  matrix[N, G] beta_x = append_col(ones_vector(N), x) * beta;
} // end of transformed parameters

model {
  // priors on class membership coefficients
  alpha_raw ~ normal(0, 3);
  to_vector(beta_raw) ~ normal(0, 3);
  // priors on growth model parameters and dropout model parameters
  for (g in 1:G){
    target += normal_lpdf(slopes[g] | priors_slopes[g,1], priors_slopes[g,2]);
    target += normal_lpdf(drop_shape[g] | priors_shape[g,1], priors_shape[g,2]);
    target += normal_lpdf(drop_scale[g] | 1, 1); // days to dropout has been standardized
  }
  // vectorized priors, outside the loop over G
  icpt_subj ~ normal(0, subj_var);
  subj_var ~ student_t(3, 0, 1);
  errors ~ inv_gamma(1, 1);
   // multinomial/mixture model
  // the approach here is to first take the log of the mixture weights predicted 
  // by the multinomial model for each subject, connecting the model for predicting
  // group membership to the mixture model
  for (n in 1:N){
    vector[G] lpn = log_softmax(beta_x[n]'); // log mixture weights for this subject
  // then the observations and the time variable are segmented for each subject
    vector[ind_nobs[n]] obs_segm = segment(obs, ind_pos[n], ind_nobs[n]);
    vector[ind_nobs[n]] time_segm = segment(time, ind_pos[n], ind_nobs[n]);
  // and then these segmented variables are used to calculate predicted values
  // conditional on group membership in the growth curve model. The resulting
  // log-density is added to the vector of log mixture weights
    for(g in 1:G){
      vector[ind_nobs[n]] obs_pred = 
        rep_vector(icpt_subj[n], ind_nobs[n]) + time_segm * slopes[g];
      lpn[g] += normal_lpdf(obs_segm | obs_pred, errors[g]);
  // dropout component of the model
      if (cens[n] == 0) {
        lpn[g] += weibull_lpdf(last_obs[n] | drop_shape[g], drop_scale[g]);
      } else { // log complementary cumulative distribution function to account for censoring
        lpn[g] += weibull_lccdf(last_obs[n] | drop_shape[g], drop_scale[g]);
      }
    } // end of loop over G
  // and then the weighted log-likelihoods are added to target, still within the loop over N
    target += log_sum_exp(lpn);
  } // end of loop over N

} // end of model block

generated quantities{
  // computing the predicted probabilities of group membership per case
  array[N] vector[G] group_pred;
  // beta_x is already in scope from the transformed parameters block,
  // so it must not be redeclared here
  for (i in 1:N){
    group_pred[i] = softmax(beta_x[i]');
  }
} // end of generated quantities
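A side note on the generated quantities: group_pred gives the covariate-based prior probabilities of membership. If per-subject posterior responsibilities are also wanted, they can be recovered from the same lpn vector by normalizing in log space. A minimal NumPy sketch of that normalization (the lpn values below are made up for illustration):

```python
import numpy as np

def log_sum_exp(v):
    # numerically stable log(sum(exp(v)))
    m = np.max(v)
    return m + np.log(np.sum(np.exp(v - m)))

# hypothetical per-group weighted log densities for one subject (lpn in the model)
lpn = np.array([-842.3, -851.7, -847.9])

# posterior probability of each group, normalized in log space to avoid underflow
responsibilities = np.exp(lpn - log_sum_exp(lpn))
```

The result sums to one, with nearly all the mass on the best-fitting group.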

Edits: Updated the code to the current version; clarified the modeling approach somewhat.

Just to give some generic advice: I recommend simulating fake data from your model and checking that the program can recover the parameters when you have enough data. Also, I recommend using informative priors for everything in the model. Also first try G=1 and then G=2 to make sure that the basics work, before going and fitting the general mixture model. Good luck!
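To sketch what such a fake-data simulation might look like, here is a minimal NumPy version of the data-generating process the model assumes, for G = 2 groups. All parameter values, the censoring time, and the measurement grid are invented for illustration; the real workflow would push the simulated data through the Stan program and check parameter recovery:

```python
import numpy as np

rng = np.random.default_rng(42)
N, G, D = 100, 2, 2                        # subjects, mixture groups, covariates

# made-up "true" parameters, mirroring the structure of the Stan program
alpha = np.array([0.0, -0.5])              # multi-logit intercepts
beta = np.array([[0.0, 1.0],
                 [0.0, -0.8]])             # covariate effects on membership
slopes = np.array([-0.2, -1.0])            # per-group symptom slopes
sigma = np.array([0.5, 0.5])               # per-group residual sd
drop_shape = np.array([1.2, 1.5])          # Weibull dropout parameters
drop_scale = np.array([1.5, 0.6])

# group membership from a multinomial logit on the covariates
x = rng.normal(size=(N, D))
logits = alpha + x @ beta
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
group = np.array([rng.choice(G, p=p) for p in probs])

# Weibull dropout times, administratively censored at t = 2
dropout = rng.weibull(drop_shape[group]) * drop_scale[group]
cens = (dropout > 2.0).astype(int)
last_obs = np.minimum(dropout, 2.0)

# repeated symptom ratings on a standardized time grid until dropout
icpt = rng.normal(0.0, 1.0, size=N)        # subject-level intercepts
obs, time, ids = [], [], []
for n in range(N):
    t = np.arange(0.0, last_obs[n] + 0.1, 0.1)
    y = icpt[n] + slopes[group[n]] * t + rng.normal(0, sigma[group[n]], t.size)
    obs.extend(y)
    time.extend(t)
    ids.extend([n] * t.size)

# ragged-array bookkeeping in the form the Stan data block expects (1-based)
ind_nobs = np.bincount(ids, minlength=N)
ind_pos = np.concatenate(([0], np.cumsum(ind_nobs)[:-1])) + 1
```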


Thanks, @andrewgelman! That’s solid and always useful advice. Fortunately for me, I’ve been following your blog for some time, so I was already using that sort of workflow. In fact, I haven’t yet looked at the actual data I’d like to model; I just didn’t post my code for simulating data. I guess I took it to be self-evident that I was developing the model on synthetic data!