Growth Mixture Modeling (Latent Class Linear Mixed Model)

I’m trying to fit what I would call a “growth mixture model” (which I think is sometimes called a “latent class linear mixed model”). A binary outcome is measured at multiple timepoints for multiple participants. Each participant contributes a different number of observations, and the observations are not evenly spaced. The goal is to model trends in the probability of experiencing the binary outcome over time. We believe that there are a finite number of common trajectory classes, but we don’t know which participants fall in which class. So we need to combine mixture modeling (to handle the latent classes) with multilevel modeling (to handle repeated observations on the same participants).

I will try to describe the model likelihood formally: Let i index the participant and j index the observation. Let y_{ij} be the binary outcome and X the covariate matrix (here, the only covariates are polynomials of time). Let k be a categorical variable indicating membership in a particular latent class, and p be a simplex vector of latent class proportions. Let \beta_{ik} be the vector of regression coefficients for participant i if s/he is in class k. Finally, let \gamma_k and \sigma_k be the mean vector and covariance matrix of the multivariate normal from which participants’ \beta_{ik} are drawn if they are in class k. Then:

y_{ij}|k \sim Bernoulli(\mu_{ijk})
logit(\mu_{ijk}) = X_{ij} \beta_{ik}
\beta_{ik} \sim Gaussian(\gamma_{k},\sigma^2_{k})
k \sim Categorical(p)

Now here is some Stan code attempting to fit the model.

data {
  int<lower=0> I; // number of observations
  int<lower=0> J; // number of participants
  int<lower=1> K; // number of mixture components
  int<lower = 1,upper = J> id[I]; // id vector that maps observations to participants
  int<lower=0,upper=1> use[I]; // outcome
  matrix[I,4] model_matrix; // time covariates
parameters {
  simplex[K] lambda; // weight for each component of mixture
  matrix[J,4] beta[K]; // each participants’ coefficients, conditional on mixture
  vector[4] beta_mean[K]; // grand mean of coefficients, conditional on mixture
  cov_matrix[4]<lower=0> beta_variance[K]; // variance of coefficients, conditional on mixture
model {
  real lps[K];
  for(i in 1:I) { // loop over participants
    for(k in 1:K) { // loop over mixture components
      las[k] = log(lambda[k]);
      lps[k] += bernoulli_logit_lpmf(use[i]|model_matrix[i]*beta[k][id[i]]) +
      // add log probability of mixture
    target += log_sum_exp(lps); // update log probability of model by combining both mixture components

My questions:

  1. Does this actually fit the model I described? I can’t quite figure out how to loop over the participants, or correctly marginalize out the latent classes.
  2. Is there a way to make this code more efficient?
  3. Are there smart, non-informative prior choices? (Right now, I think the code gives everything a flat prior by default?)

This is my first Stan project, so thanks in advance for your guidance and your patience.

(P.S. If anyone knows any Stan users at Johns Hopkins, please let me know.)

I was having trouble connecting the text description to the generative model. I’m pretty sure each participant i gets a class k_i and each participant only gets one \beta_i (rather than one for each k). So I think the generative story should start with priors on the parameters:

p \sim \ ?

\gamma_k \sim \ ?

\sigma_k \sim \ ?

Then move onto generating the latent classes and coefficients

k_i \sim \mathsf{Categorical}(p)

\beta_i \sim \mathsf{Normal}(\gamma_{k[i]}, \sigma_{k[i]})

y_{i, j} \sim \mathsf{BernoulliLogit}(X_{i, j} \cdot \beta_{i})

I’m using Stan’s normal with scale (standard deviation) parameterization and the built-in Bernoulli on the log-odds scale.

Was that your intention?

If so, the normal for beta needs to be pulled out and then indexed with beta[id[i]].

was this resolved? I am interested these types of models but can’t follow that code.