# How to fit Latent Class Regression using Stan

Hi everyone, I am trying to build a simple latent class regression model, but my MCMC never converges.

I have a latent class membership model in which class membership is determined by covariates X:

Pr(C_i = c) = \frac{\exp(X_i \gamma_c)}{\sum_{c'} \exp(X_i \gamma_{c'})}

I also have an outcome Y whose mean is determined by the latent class:

E(Y_i \mid C_i = c) = \mu_c

I am trying to estimate \mu_c and \gamma_c jointly using Stan. Any suggestions would be greatly appreciated. When I add the regression part to the latent class analysis, the problem is not just label switching: the parameter estimates blow up to huge values, on the order of -5000 to 5000.
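For concreteness, the marginal (class-summed) likelihood the Stan model below targets — softmax class probabilities, then a log-sum-exp over the class-specific normal densities — can be sketched in plain Python. All names and numbers here are illustrative, not part of my real code:

```python
import math

def normal_lpdf(y, mu, sigma):
    # Log density of Normal(mu, sigma) at y
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def marginal_loglik(Y, X, gamma, mu, sigma):
    # gamma: one coefficient row per class; mu: class-specific means
    total = 0.0
    for y, x in zip(Y, X):
        logits = [sum(xj * gj for xj, gj in zip(x, g)) for g in gamma]
        prob = softmax(logits)
        lmix = [math.log(p) + normal_lpdf(y, m, sigma) for p, m in zip(prob, mu)]
        total += log_sum_exp(lmix)  # marginalize out the discrete class
    return total
```

This mirrors the `target += log_sum_exp(lmix)` construction in the model block below.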

```stan
data {
  int<lower=1> G;       // Number of subjects
  int<lower=1> K;       // Number of latent classes
  vector[G] Y;          // Final outcome
  matrix[G, 2] X1;      // Design matrix for covariate -> latent class
  matrix[G, 1] X3;      // Design matrix for latent class -> outcome (intercept)
}

parameters {
  matrix[K - 1, 2] gamma_raw;
  matrix[1, K] mu;
  real<lower=0> sigma_Y;  // Residual standard deviation
}

transformed parameters {
  matrix[G, K] Yhat;
  matrix[K, 2] gamma;
  matrix[G, K] logits;
  row_vector[2] zeros = rep_row_vector(0, 2);
  matrix[G, K] prob;

  gamma = append_row(zeros, gamma_raw);
  logits = X1 * gamma';

  // Probability of latent class membership
  for (g in 1:G) {
    prob[g, ] = to_row_vector(softmax(to_vector(logits[g, ])));
  }

  // Mean of Y
  Yhat = X3 * mu;
}

model {
  for (g in 1:G) {
    real lmix[K];
    for (k in 1:K) {
      lmix[k] = log(prob[g, k]) + normal_lpdf(Y[g] | Yhat[g, k], sigma_Y);
    }
    target += log_sum_exp(lmix);
  }
}
```
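(Aside on the `zeros` row: the softmax is invariant to adding a constant to every logit, which is why one class's coefficients must be pinned at zero for gamma to be identified. A quick check in plain Python, with made-up numbers:)

```python
import math

def softmax(logits):
    # Numerically stable softmax
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

base = softmax([0.0, 1.5, 2.0])
shifted = softmax([7.0, 8.5, 9.0])  # same logits plus a constant 7
# The probabilities are identical, so only logit differences are identified
print(all(abs(a - b) < 1e-12 for a, b in zip(base, shifted)))  # prints True
```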


Everything looks good to me (minus the lack of priors). Is there clear separation between the classes in your data? And which parameter estimates go crazy? Do they correspond to a class that has 0% of the respondents allocated to it? If that's the case, then there's nothing stopping those parameters from going wherever they please, since they aren't really changing the likelihood of the model.

Latent class models (Bayesian or otherwise) can run into all sorts of issues, partly driven by the fact that the likelihood can be unbounded at multiple points (i.e. there is no maximum likelihood solution). This is one case where priors can be helpful to keep the solution within a reasonable range. As a start, I would suggest shrinking the latent class probability intercepts towards zero so that the model prefers solutions where each class is at least somewhat represented.
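The "going wherever they please" point can be checked numerically: when a class's probability is essentially zero, its mean barely enters the mixture likelihood, so nothing pins it down. A plain-Python illustration with toy numbers (not tied to the model above):

```python
import math

def normal_lpdf(y, mu, sigma=1.0):
    # Log density of Normal(mu, sigma) at y
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def log_mix(probs, lpdfs):
    # Stable log of a two-term mixture: log(sum_k probs[k] * exp(lpdfs[k]))
    terms = [math.log(p) + l for p, l in zip(probs, lpdfs)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

y = 0.3
probs = [0.999999, 0.000001]  # the second class is almost empty
ll_a = log_mix(probs, [normal_lpdf(y, 0.0), normal_lpdf(y, 5.0)])
ll_b = log_mix(probs, [normal_lpdf(y, 0.0), normal_lpdf(y, 5000.0)])
# Moving the empty class's mean from 5 to 5000 changes the log-likelihood
# by a negligible amount, so the data cannot constrain that parameter
print(abs(ll_a - ll_b))
```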


I have a few comments on the Stan code itself; on the modeling side, @simonbrauer has provided good first steps.

For efficiency, you want to minimize reshaping, transposition, promotion to autodiff, etc.

1. Move `zeros` to `transformed data`:

   ```stan
   transformed data {
     row_vector[2] zeros = [0, 0];  // rep_row_vector(0, 2) also OK
   }
   ```

   This will make them data variables (`double` in C++) so they won't undergo expensive and unnecessary autodiff.

2. Use vector types rather than 1 x N or M x 1 matrices:

   ```stan
   vector[G] X3;                 // was matrix[G, 1]
   row_vector[K] mu;             // was matrix[1, K]
   matrix[G, K] Yhat = X3 * mu;  // this is just an outer product
   ```

3. Declare types to avoid transposition. Rather than `matrix[K, 2] gamma`, use `matrix[2, K] gamma`; then you don't need to transpose `gamma`.

4. You can append a column of zeros after you multiply for `logits` in `transformed parameters`.

5. In the model, initialize `lmix` to `log(prob[g])` above the `k` loop and then use `+=` in the loop.


1. I simulated the data myself in R, so I can check the size of each class (none of the classes is very small).

2. I tried a normal(0, 2) prior for the gamma intercepts, but the values still go crazy for both beta and gamma.

3. I found one paper that discusses latent class regression using multiple-imputation MCMC ("Latent class regression: inference and estimation with two-stage multiple imputation"). I don't see any Stan example of latent class regression, so I guess there may be some problem with the regression part. Thanks again!

4. I think this post may be helpful for others also trying Stan for latent class regression.

Here I provide my simulation code.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

gamma = matrix(c(0, 0, 0, 0, 1.5, 2), nrow = 3, ncol = 2)
beta = matrix(c(0, 1, 2), nrow = 3, ncol = 1)
sigma_Y = 1
# Number of subjects
n = 500
# Exposure
A1 = rbinom(n, 1, 0.5)
# X1 matrix
X1 = cbind(intercept = rep(1, n), A1)
# Class probabilities via softmax of the logits
exp_logits = exp(X1 %*% t(gamma))
prob = exp_logits / rowSums(exp_logits)
# Latent class membership
lc = as.matrix(t(apply(prob,
                       1,
                       function(p) rmultinom(1, size = 1, prob = p))))
colnames(lc) = c("lc1", "lc2", "lc3")
# X2 matrix (intercept only)
X2 = rep(1, n)
Y_hat = rowSums(X2 * (lc %*% beta))
res_Y = rnorm(n, 0, sigma_Y)
Y = Y_hat + res_Y

data = cbind(A1, Y, lc) %>%
  as_tibble() %>%
  pivot_longer(cols = "lc1":"lc3", names_to = "lc", values_to = "lc_10") %>%
  filter(lc_10 != 0)

ggplot(data,
       aes(y = Y, color = lc)) +
  geom_boxplot() +
  theme_minimal()

data %>%
  group_by(lc, A1) %>%
  summarise(n = n()) %>%
  group_by(A1) %>%
  mutate(sum = sum(n),
         prop = n / sum)
```
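As a sanity check on this simulation, the class probabilities implied by the gamma values above can be computed directly (a plain-Python translation of the softmax step; the variable names are just for illustration):

```python
import math

def softmax(logits):
    # Numerically stable softmax
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Rows are classes 1..3; columns are (intercept, A1), matching the R gamma matrix
gamma = [(0.0, 0.0), (0.0, 1.5), (0.0, 2.0)]
for a1 in (0, 1):
    logits = [g0 + g1 * a1 for g0, g1 in gamma]
    print(a1, [round(p, 3) for p in softmax(logits)])
```

When A1 = 0 the three classes are equally likely, and when A1 = 1 class 3 is the most likely, so none of the classes should be empty in the simulated data.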


My updated Stan code:

```stan
data {
  int<lower=1> G;       // Number of subjects
  int<lower=1> K;       // Number of latent classes
  vector[G] Y;          // Final outcome
  matrix[G, 2] X1;      // Design matrix for covariate -> latent class
  matrix[G, 1] X3;      // Design matrix for latent class -> outcome (intercept)
}

parameters {
  matrix[2, K - 1] gamma_raw;
  matrix[1, K] beta;
  real<lower=0> sigma_Y;  // Residual standard deviation
}

transformed parameters {
  matrix[G, K] Yhat;
  matrix[G, K] logits;
  vector[G] zeros = rep_vector(0, G);
  matrix[K, G] prob;

  // Reference class (last column) has its logit pinned to zero
  logits = append_col(X1 * gamma_raw, zeros);

  // Probability of latent class membership
  for (g in 1:G) {
    prob[, g] = softmax(to_vector(logits[g, ]));
  }

  // Mean of Y
  Yhat = X3 * beta;
}

model {
  // Prior on the class-membership intercepts
  gamma_raw[1, ] ~ normal(0, 2);

  for (g in 1:G) {
    vector[K] lmix = log(prob[, g]);
    for (k in 1:K) {
      lmix[k] += normal_lpdf(Y[g] | Yhat[g, k], sigma_Y);
    }
    target += log_sum_exp(lmix);
  }
}
```