How to fit Latent Class Regression using Stan

Hi everyone, I am trying to build a simple latent class regression model, but my MCMC never converges.

I have a latent class membership model in which membership is determined by a covariate X:

\Pr(C_i = c) = \frac{\exp(X_i \gamma_c)}{\sum_{c'=1}^{K} \exp(X_i \gamma_{c'})}

I also have an outcome Y whose mean is determined by the latent class:

E(Y_i \mid C_i = c) = \mu_c

I am trying to estimate \mu_c and \gamma_c jointly using Stan. Any suggestions would be greatly appreciated. When I add the regression part to the latent class analysis, the problem is not only label switching: the parameter estimates blow up to huge values, on the order of -5000 to 5000.
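The model block below marginalizes out the discrete class indicator, so the per-subject likelihood accumulated via log_sum_exp is

p(Y_i \mid X_i) = \sum_{c=1}^{K} \Pr(C_i = c \mid X_i) \, \text{Normal}(Y_i \mid \mu_c, \sigma_Y)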

data {
  int<lower=1> G;       // Number of subjects
  int<lower=1> K;       // Number of latent classes
  vector[G] Y;            // Final Outcome
  matrix[G, 2] X1;        // Design matrix for covariate-latent
  matrix[G, 1] X3;        // Design matrix for latent-outcome (Intercept)
}


parameters {
  matrix[K-1, 2] gamma_raw;  
  
  matrix[1, K] mu;        
  real<lower=0> sigma_Y;  // Residual standard deviation
  
}

transformed parameters {
  matrix[G, K] Yhat;
  matrix[K, 2] gamma;
  matrix[G, K] logits; 
  row_vector[2] zeros = rep_row_vector(0, 2);
  matrix[G, K] prob;
  
  gamma = append_row(zeros, gamma_raw);
  logits = X1*gamma';

  // prob of latent class
  for (g in 1:G) {
    prob[g, ] = to_row_vector(softmax(to_vector(logits[g, ])));
  }
  
  // mean of Y
  Yhat = X3*mu;
}


model {
  for (g in 1:G){
    real lmix[K];
    for (k in 1:K) {
      lmix[k] = log(prob[g, k]) + normal_lpdf(Y[g] | Yhat[g, k], sigma_Y);
    }
    target += log_sum_exp(lmix);
  }
}

Everything looks good to me (minus the lack of priors). Is there clear separation between the classes in your data? And which parameter estimates go crazy? Do they correspond to a class that is getting 0% of the respondents allocated to it? If that’s the case, then there’s nothing stopping those parameters from going wherever they please, since they really aren’t changing the likelihood of the model.

Latent class models (Bayesian or otherwise) can run into all sorts of issues, partly driven by the fact that the likelihood can diverge to infinity at multiple points (i.e. there is no maximum likelihood solution). This is one case where priors can be helpful to keep the solution within a reasonable range. As a start, I would suggest shrinking the latent class probability intercepts towards zero so that the model prefers solutions where each class is at least somewhat represented; a sketch is below.
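A minimal sketch of that prior, assuming the gamma_raw parameterization from your first model (rows are the non-reference classes, column 1 is the intercept):

model {
  gamma_raw[, 1] ~ normal(0, 1);  // shrink class-membership intercepts toward zero
  gamma_raw[, 2] ~ normal(0, 2);  // weakly informative prior on the covariate effect
  // ... the mixture likelihood from your model block stays unchanged
}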


I have a few comments on the Stan code, not so much on the latent class model itself, for which @simonbrauer provided good first steps.

For efficiency, you want to minimize reshaping, transposition, promotion to autodiff, etc.

  1. Move zeros to transformed data.
transformed data {
  row_vector[2] zeros = [0, 0];  // rep_row_vector(0, 2) also OK
}

This makes zeros a data variable (double in C++), so it won’t undergo expensive and unnecessary autodiff.

  2. Use vector types rather than 1 x N or M x 1 matrices.
vector[G] X3;  // was matrix[G, 1]

row_vector[K] mu; // was matrix[1, K] mu;   

matrix[G, K] Yhat = X3 * mu;  // this is just an outer product
  3. Declare types to avoid transposition.

Rather than matrix[K, 2] gamma, use matrix[2, K] gamma and then you don’t need to transpose gamma.

  4. You can append a column of zeros after you multiply for logits in transformed parameters.

  5. In the model, initialize lmix to log(prob[g]) above the k loop and then use += in the loop. (A sketch combining all five points follows this list.)
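Putting those together, here is an untested sketch with all of those changes (variable names carried over from yours; note the reference class becomes the last class instead of the first):

data {
  int<lower=1> G;
  int<lower=1> K;
  vector[G] Y;
  matrix[G, 2] X1;
  vector[G] X3;                          // was matrix[G, 1]
}
transformed data {
  vector[G] zeros = rep_vector(0, G);    // data, so no autodiff
}
parameters {
  matrix[2, K - 1] gamma;                // already transposed, so no gamma' needed
  row_vector[K] mu;                      // was matrix[1, K]
  real<lower=0> sigma_Y;
}
transformed parameters {
  matrix[G, K] logits = append_col(X1 * gamma, zeros);  // zero column appended after the multiply
  matrix[G, K] Yhat = X3 * mu;                          // outer product
}
model {
  for (g in 1:G) {
    vector[K] lmix = log_softmax(logits[g]');  // = log(prob[g]), computed stably
    for (k in 1:K)
      lmix[k] += normal_lpdf(Y[g] | Yhat[g, k], sigma_Y);
    target += log_sum_exp(lmix);
  }
}

I left priors out here; see @simonbrauer’s suggestion above.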


Appreciate your reply!

  1. I simulated the data myself in R, so I can see the size of each class (none of the classes is very small).

  2. I tried a normal(0, 2) prior on the gamma intercept, but the estimates still go crazy for both beta and gamma.

  3. I found one paper that discusses latent class regression using multiple-imputation MCMC (“Latent class regression: inference and estimation with two-stage multiple imputation”). I haven’t seen any Stan example of latent class regression, so I guess there may be some problem with the regression part. Thanks again!

  4. I think this post may be helpful for anyone else trying Stan for latent class regression.

Here I provide my simulation code.

library(tidyverse)  # for %>%, as_tibble, pivot_longer, filter, ggplot
set.seed(1)         # make the simulation reproducible

# True class-membership coefficients: rows = classes, cols = (intercept, A1)
gamma = matrix(c(0, 0, 0, 0, 1.5, 2), nrow = 3, ncol = 2)
# True class-specific outcome means
beta = matrix(c(0, 1, 2), nrow = 3, ncol = 1)
sigma_Y = 1
# Number of subjects
n = 500
# Exposure
A1 = rbinom(n, 1, 0.5)
# X1 matrix
X1 = cbind(intercept = rep(1, n), A1)
# Softmax: exponentiated logits, normalized by row
explogits = exp(X1 %*% t(gamma))
prob = explogits / rowSums(explogits)
# Latent Class Membership
lc = as.matrix(t(apply(prob, 
                       1, 
                       function(p) rmultinom(1, size = 1, prob = p))))
colnames(lc) = c("lc1", "lc2", "lc3")
# X2 matrix
X2 = rep(1, n)
Y_hat = rowSums(X2  * (lc %*% beta))
res_Y = rnorm(n, 0, sigma_Y) 
Y = Y_hat + res_Y
data = cbind(A1, Y, lc) %>% 
  as_tibble() %>% 
  pivot_longer(cols = "lc1":"lc3", names_to = "lc", values_to = "lc_10") %>% 
  filter(lc_10 != 0)

ggplot(data,
       aes(y = Y, color = lc)) +
  geom_boxplot() +
  theme_minimal()

data %>% group_by(lc, A1) %>% 
  summarise(n = n()) %>% 
  group_by(A1) %>% 
  mutate(sum = sum(n),
         prop = n/sum)

My updated Stan code:

data {
  int<lower=1> G;       // Number of subjects
  int<lower=1> K;       // Number of latent classes
  vector[G] Y;            // Final Outcome
  matrix[G, 2] X1;        // Design matrix for covariate-latent
  matrix[G, 1] X3;        // Design matrix for latent-outcome (Intercept)
}


parameters {
  matrix[2, K-1] gamma_raw;  
  
  matrix[1, K] beta;        
  real<lower=0> sigma_Y;  // Residual standard deviation
  
}

transformed parameters {
  matrix[G, K] Yhat;
  matrix[G, K] logits; 
  vector[G] zeros = rep_vector(0, G);
  matrix[K, G] prob;
  
  logits = append_col(X1*gamma_raw, zeros);
  
  // prob of latent class
  for (g in 1:G) {
    prob[, g] = softmax(to_vector(logits[g, ]));
  }
  
  // mean of Y
  Yhat = X3*beta;
}


model {
  // Prior
  gamma_raw[1,] ~ normal(0, 2);
  
  for (g in 1:G){
    vector[K] lmix = log(prob[,g]);
    for (k in 1:K) {
      lmix[k] += normal_lpdf(Y[g] | Yhat[g, k], sigma_Y);
    }
    target += log_sum_exp(lmix);
  }
}
  

Thanks for your reply. I have updated my code accordingly.

The data generation code is really helpful. I think what’s going on here is that the classes are poorly defined (see the boxplot from your simulation code), so there isn’t much cost to having only 1 or 2 classes. So if the sampler starts with, effectively, 2 classes, there won’t be much to pull that third class in.

So from a data perspective, you could try making the classes more distinct, either by shrinking the residual variance or by increasing the mean differences. From a model perspective, I think the only solution is to put stronger priors on the coefficients. Both approaches work for me. As a gut check, you might fit a quick frequentist LCA with 1-, 2-, and 3-class solutions and compare the fits; a sketch is below. My guess is that your original data doesn’t actually need 3 classes, which would explain why one class’s parameters are exploding.
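For that gut check, something like the flexmix package should work (a sketch, not tested; it assumes the long-format data frame from your simulation code):

library(flexmix)

# Fit 1-, 2-, and 3-class mixture regressions, with class membership
# depending on A1 via a multinomial concomitant model; nrep restarts
# each fit several times to avoid bad local optima
fits <- stepFlexmix(Y ~ 1, data = as.data.frame(data), k = 1:3, nrep = 5,
                    concomitant = FLXPmultinom(~ A1))
BIC(fits)  # if 2 classes fit about as well as 3, the third class is redundant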