Identification issues with Latent Class Analysis/Mixture model

Hi all

I am working on creating a general framework for working with mixture models, specifically Latent Class Analysis (LCA) and its extensions.
Right now I am working on an LCA model using the bernoulli_logit_lpmf function from Stan. This model works well when I estimate 2 classes, but when I estimate more than 2 classes it generates bimodal posteriors.
I have looked at the post about Mixture Models Identification, and the solutions there don't work so far.
What I have done so far is to constrain the mixing proportions to be ordered, as in this example: ordered simplex. In addition to this, I have defined hyperpriors for the parameters.
With these definitions, the model provides good results for 2 classes, but I can't figure out how to define it for a larger number of classes.


Here is the Stan model so far, with a 9-item example:

data{ 
  int<lower=1> I;                    // number of items
  int<lower=1> J;                    // number of respondents
  int<lower=0, upper=1> y[J,I];      // score for obs n
  int<lower=1> C;           // # of attribute profiles (latent classes) 
}
parameters {
  matrix[I,C] pi;   // log-odds of answering 1 for each item in each class
  positive_ordered[C] lambda; 
  real mu_pi; // prior mean for the log-odds
  real<lower=0> sigma_pi; // prior sd for the log-odds
  real<lower=0> alpha_lambda; // prior alpha for lambda
  real<lower=0> beta_lambda; // prior beta for lambda
}

transformed parameters {
  vector[C] Ys[J];      // sum log for each subject for each class
  matrix[I,C] pi_probs; // probability of answering 1 for each item on each class
  simplex[C] nu = lambda / sum(lambda); // ordered mixing proportions

  for (c in 1:C) {
    for (j in 1:J) {
      Ys[j,c] = 0;
      for (i in 1:I) {
        Ys[j,c] += bernoulli_logit_lpmf(y[j,i] | pi[i,c]);
      }
    }
  }

  pi_probs = inv_logit(pi);
}

model {

  // Priors
  for (i in 1:I) {
    pi[i] ~ normal(mu_pi, sigma_pi);
  }
  target += gamma_lpdf(lambda | alpha_lambda, beta_lambda);
  mu_pi ~ normal(0,2);
  sigma_pi ~ cauchy(0,2.5);
  alpha_lambda ~ cauchy(1,2.5);
  beta_lambda ~ cauchy(1,2.5);

  target += log_mix(nu, Ys);

}

And here is the R code to run this example with a data set

library(poLCA)
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

### https://stats.idre.ucla.edu/mplus/dae/latent-class-analysis/

dat <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/lca1.dat",
                  header=F)

colnames(dat) <- c("id", paste0("item",1:9))
head(dat)

apply(dat[,2:10], 2, table)

## 9 item data
dat2 <- dat[,2:10]
head(dat2)
apply(dat2, 2, table)

wide_data <- list(
  I = ncol(dat2),
  J = nrow(dat2),
  y = dat2,
  C = 3
)

pars <- c("nu","pi","pi_probs")
iter <- 6000
warmup <- iter - 1000
# lca_wide_bn is a string holding the Stan model above
wide_model_bn <- stan(model_code=lca_wide_bn, 
                   data = wide_data,
                   pars=pars,
                   chains = 3, 
                   iter = iter,
                   warmup = warmup,
                   control = list(adapt_delta = 0.99))
wide_model_bn

sm_lca <- data.frame(summary(wide_model_bn)$summary)
sm_lca

bayesplot::mcmc_areas(wide_model_bn, 
                      pars = rownames(sm_lca)[4:10])

### Here is the solution from the poLCA package
ch3 <- poLCA(cbind(item1, item2, item3, item4, item5, item6, item7, item8, item9)~1,
             dat2+1,nclass=3, maxiter=20000)
ch3

Some of the bimodal posteriors look like this

One last thing I tried to identify the model, which made no difference, was to restrict the log-odds for one item to be ordered, intending for it to define the direction of the classifications.

Any help would be appreciated!


I’ve never worked with latent class analysis, so I might be missing something, but if I understand the model correctly, I fear this type of model (including also clustering etc.) is unfortunately hopelessly multimodal in full generality. Enforcing an ordering of the components prevents “label switching” (posteriors that are qualitatively the same, but differ in the labels assigned to classes/clusters/components). But even this gets tricky with more than one latent dimension, as there is no general ordering. If your only problem is label switching, there is an R package (which I have never used personally) to post-process your chains and make the labels match: CRAN - Package label.switching

But there can be more problems than label switching.
A simple example: imagine the data being generated from three latent classes roughly equidistant in the latent space (or by chance looking as if they were generated this way) and you fit a two-class model. Then you are likely to get three modes, each squashing together a different pair of classes. Those are three qualitatively different modes that imply different conclusions.

So unfortunately, proper uncertainty quantification can be very hard for such models, as the uncertainty might encompass multiple qualitatively different patterns, and I am not aware of any method to fit such models reliably (except maybe for exhaustive enumeration of all possible class assignments).

In general these will be difficult, as @martinmodrak points out. You already know that constraints can help, like adding the positive_ordered constraint on lambda.

Looking at the bayesplots suggests an additional ordered constraint on pi. If I do that, the bimodality disappears, and I was able to run the program with adapt_delta = 0.8 without any issues. I removed the cauchy priors to make sure they weren't an additional issue, and made some other cosmetic adjustments.

data{ 
  int<lower=1> I;                    // number of items
  int<lower=1> J;                    // number of respondents
  int<lower=0, upper=1> y[J,I];      // score for obs n
  int<lower=1> C;           // # of attribute profiles (latent classes) 
}
parameters {
  ordered[C] pi[I];   // log-odds of answering 1 for each item in each class
  positive_ordered[C] lambda; 
  real mu_pi; // prior mean for the log-odds
  real<lower=0> sigma_pi; // prior sd for the log-odds
  real<lower=0> alpha_lambda; // prior alpha for lambda
  real<lower=0> beta_lambda; // prior beta for lambda
}

transformed parameters {
vector[C] Ys[J]; // sum log for each subject for each class
matrix[I, C] pi_probs; // probability of answering 1 for each item on each class
simplex[C] nu = lambda / sum(lambda); // ordered mixing proportions

for(c in 1:C) {
  pi_probs[, c] = inv_logit(to_vector(pi[, c])); // convert log-odds to probabilities
  for(j in 1:J) {
    Ys[j, c] = 0;
    for (i in 1:I) {
      Ys[j, c] += bernoulli_logit_lpmf(y[j, i] | pi[i, c]);
    }
  }
}

}

model{
// Priors
  for (i in 1:I)
    pi[i] ~ normal(mu_pi, sigma_pi);

  lambda ~ gamma(alpha_lambda, beta_lambda);
  
  mu_pi ~ normal(0,2);
  sigma_pi ~ exponential(1);
  alpha_lambda ~ gamma(2, 0.1);
  beta_lambda ~ gamma(2, 0.1);
  
  target += log_mix(nu, Ys);
}

The bayesplot looks like:

Though nu becomes much less spread out (due to the reduction in bimodality, I’m guessing) and is quite different from poLCA.

> mod_out$summary("nu")
# A tibble: 3 x 10
  variable  mean median     sd    mad    q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 nu[1]    0.301  0.306 0.0237 0.0199 0.259 0.328  1.00     560.     695.
2 nu[2]    0.340  0.337 0.0137 0.0110 0.321 0.363  1.00     978.     612.
3 nu[3]    0.360  0.357 0.0168 0.0154 0.339 0.388  1.00     550.     570.

I ran this with CmdStan 2.27.

library(cmdstanr)

dat <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/lca1.dat",
                header=F)

colnames(dat) <- c("id", paste0("item",1:9))
head(dat)

apply(dat[,2:10], 2, table)

## 9 item data
dat2 <- dat[,2:10]
head(dat2)
apply(dat2, 2, table)

wide_data <- list(
  I = ncol(dat2),
  J = nrow(dat2),
  y = dat2,
  C = 3
)

fp <- file.path("~/latent_class.stan")
lc_mod <- cmdstan_model(fp, force_recompile = T)
mod_out <- lc_mod$sample(
  data = wide_data,
  chains = 2,
  init = 2,
  adapt_delta = 0.8,
  parallel_chains = 4,
  iter_warmup = 400,
  iter_sampling = 400
)

mod_out$summary("nu")
dr <- mod_out$draws(c("pi[3,1]", "pi[3,2]", "pi[3,3]", 
                "pi[4,1]", "pi[4,2]", "pi[4,3]"))

bayesplot::mcmc_areas(dr)

### Here is the solution from the poLCA package
ch3 <- poLCA(cbind(item1, item2, item3, item4, item5, item6, item7, item8, item9)~1,
             dat2+1,nclass=3, maxiter=20000)
ch3


I have been working with this type of model in Stan for the last year. I haven’t found a single solution that always works. It is always about fiddling around with the model for a particular dataset. I’ve had some luck recently with ordered priors rather than ordered constraints but I suspect that it is also problem-specific.
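For example, one way to set up an ordered prior (a sketch with hypothetical prior values; it nudges rather than forces an ordering, and assumes C >= 2) is to give class c's item logits a prior mean that increases with c:

parameters {
  matrix[I, C] pi;  // unconstrained log-odds
}
model {
  for (c in 1:C)
    pi[, c] ~ normal(-2 + 4.0 * (c - 1) / (C - 1), 1);  // prior means spread from -2 to 2
}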

I do wonder whether your shared, estimated prior on pi might be squeezing the classes together and making them switch more. Maybe worth testing your current model against one that presets the prior to something sufficiently wide.

Note that this is not just an identifying assumption but a much more restricted model. It assumes the logits are ordered across classes for all items. That might be why the results differ from poLCA.

A middle ground would be to order only the first item and allow the others to be freely estimated. In practice, you might then want to reorder the items so that the most distinctive item is first. Otherwise you could end up putting an ordering constraint on an item whose means overlap significantly.
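A minimal sketch of that middle ground, reusing the data block and lambda setup from the model above (only item 1's log-odds are ordered across classes; the rest of the model is unchanged):

parameters {
  ordered[C] pi_first;       // log-odds for item 1, ordered across classes
  matrix[I - 1, C] pi_rest;  // log-odds for the remaining items, unconstrained
  positive_ordered[C] lambda;
}
transformed parameters {
  matrix[I, C] pi;
  simplex[C] nu = lambda / sum(lambda);
  pi[1, ] = pi_first';       // the ordered item pins the class labels
  pi[2:I, ] = pi_rest;
}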


@martinmodrak I was afraid something like this would be the case. In the book Bayesian Psychometric Modeling, Roy Levy and Robert J. Mislevy describe that either ordering the latent classes or ordering a single item would identify the model. But this has only worked for smaller, simpler examples.

I will check this package in case the label-switching post-processing helps enough. We have used a trick like this for factor analysis, but it is easier to adjust for continuous factors, where one only needs to switch the sign of the factor loadings; the adjustment is more complicated with categorical latent variables.

@spinkney constraining all the pi to be ordered would be a really strong constraint, as we expect different items to have different relations with the latent class allocation. A suggestion has been to fix only one item to be ordered, but this hasn't worked consistently either. This is probably why the results are so different from poLCA; for this case we would expect to see similar results.

@simonbrauer this is not encouraging, but good to know I am not alone

I have also tried this, as suggested by Levy and Mislevy's book, but it hasn't worked consistently. I have also seen it to be sensitive to starting values: the same model worked on one run, then failed on the next, with the only meaningful difference being the random starting values.


Mixture models are evil and difficult to identify. This rabbit hole is deep but I found myself sucked in again.

@simonbrauer thanks for clarifying the model and why that restriction is too much.

I played around more and got something to work without that restriction; it seems to work on different data sets with 3 and 4 classes. This most likely will fail if the classes collapse on top of each other (though that suggests fewer classes, maybe?), which means it will be harder to identify as the number of classes goes up.

I added pi as a simplex and fairly strong priors on lambda to keep the classes separated. I tested on the data provided by @Mauricio_Garnier-Villarre and on the values and carcinoma data in the poLCA package. To compare with the poLCA package, I rebased the “item” probs to sum to 1 over the classes, i.e.

ch3 <- poLCA(cbind(item1, item2, item3, item4, item5, item6, item7, item8, item9)~1,
             dat2+1,nclass=3, maxiter=20000)
ch3
pol_output <- matrix(0, 9, 3)

j <- 1
for (i in ch3$probs){
  pol_output[j, ] <- t(i[, 1] / sum(i[, 1]))
  j <- j + 1
}
# now compare to stan
pol_output
stan_out <- matrix(mod_out$summary("pi")$mean, 9 ,3)
stan_out

However, I did change the sum-to-1 to use a softmax instead of the rebasing in the Stan program. Apparently it works, though I’ve seen something in the forums about this being unidentified (a softmax is invariant to adding a constant to lambda, so the scale of lambda is only pinned down by its prior)…

data{ 
  int<lower=1> I;                    // number of items
  int<lower=1> J;                    // number of respondents
  int<lower=0, upper=1> y[J,I];      // score for obs n
  int<lower=1> C;           // # of attribute profiles (latent classes) 
}
parameters {
  simplex[C] pi[I]; 
  positive_ordered[C] lambda;
}

transformed parameters {
  vector[C] Ys[J]; // sum log for each subject for each class
  vector[C] nu = softmax(lambda);

  for(c in 1:C) {
    for(j in 1:J) {
      Ys[j, c] = 0;
      for (i in 1:I) {
        Ys[j, c] += bernoulli_lpmf(y[j, i] | pi[i, c]);
      }
    }
  }
}
model{
  lambda[1] ~ normal(0, 0.05);
  for (c in 2:C)
    lambda[c] ~ normal(lambda[c - 1], 0.1) T[lambda[c - 1], ];

  target += log_mix(nu, Ys);
}

Below is the R code to run the different examples:

fp <- file.path("~/latent_class_pi.stan")
lc_mod <- cmdstan_model(fp, force_recompile = T)
mod_out <- lc_mod$sample(
  data = wide_data,
  chains = 4,
  init = 0.5,
  adapt_delta = 0.9,
  parallel_chains = 4,
  iter_warmup = 400,
  iter_sampling = 400
)
dr_nu <- mod_out$draws("nu")
bayesplot::mcmc_areas(dr_nu)
dr <- mod_out$draws(c("pi[3,1]", "pi[3,2]", "pi[3,3]", 
                "pi[4,1]", "pi[4,2]", "pi[4,3]"))

bayesplot::mcmc_areas(dr)

### Here is the solution from the poLCA package
ch3 <- poLCA(cbind(item1, item2, item3, item4, item5, item6, item7, item8, item9)~1,
             dat2+1,nclass=3, maxiter=20000)

pol_output <- matrix(0, 9, 3)

j <- 1
for (i in ch3$probs){
  pol_output[j, ] <- t(i[, 1] / sum(i[, 1]))
  j <- j + 1
}
row.names(pol_output) <- paste0("item", 1:9)

pol_output
stan_out <- matrix(mod_out$summary("pi")$mean, 9 ,3)
stan_out

### example 2 with "values" data
data(values)
f <- cbind(A,B,C,D)~1

wide_data <- list(
  I = ncol(values),
  J = nrow(values),
  y = values - 1,
  C = 3
)

mod_out <- lc_mod$sample(
  data = wide_data,
  chains = 4,
  init = 0.5,
  adapt_delta = 0.9,
  parallel_chains = 4,
  iter_warmup = 400,
  iter_sampling = 400
)

M2 <- poLCA(f,values,nclass=3,maxiter=8000)
M2
mod_out$summary("nu")

#### example 3 with "carcinoma" data

data(carcinoma)
f <- cbind(A, B, C, D, E, F, G) ~ 1
lc4 <- poLCA(f, carcinoma, nclass = 4)


wide_data <- list(
  I = ncol(carcinoma),
  J = nrow(carcinoma),
  y = carcinoma - 1,
  C = 4
)

mod_out <- lc_mod$sample(
  data = wide_data,
  chains = 4,
  init = 0.5,
  adapt_delta = 0.9,
  parallel_chains = 4,
  iter_warmup = 400,
  iter_sampling = 400
)

mod_out$summary("nu")

pol_output <- matrix(0, 7, 4)

j <- 1
for (i in lc4$probs){
  pol_output[j, ] <- t(i[, 1] / sum(i[, 1]))
  j <- j + 1
}
pol_output
stan_out <- matrix(mod_out$summary("pi")$mean, 7 ,4)
stan_out

Unfortunately, that is another identifying assumption that substantially changes the meaning of the model. It assumes that the class probabilities for each item sum to 1. For example, if Class 1 has a 90% chance of answering item 1 correctly, then Class 2 must have a 10% chance (in a 2-class solution). As you might imagine, this does not usually follow from the theoretical models motivating LCA. Nor would flipping the two indices so that pi is a simplex within classes across items.

I get unimodal distributions when I drop the model-estimated prior for pi in favor of a preset one. So that remains my best guess as to why your specific example ends up multimodal. I doubt that it fully solves the problem of identifying the model, though. (Note that the x-axis should actually read pi, i.e. the item probabilities in log-odds.)

model{
    // Priors
    to_vector(pi) ~ normal(0, 5);
    target += gamma_lpdf(lambda | alpha_lambda, beta_lambda);
    alpha_lambda ~ cauchy(1,2.5);
    beta_lambda ~ cauchy(1,2.5);
    
    target += log_mix(nu, Ys);
}


@spinkney sadly the ordering of pi doesn't work theoretically, as it changes the structure of the model.

Thanks @simonbrauer, I tried the change in priors, and it results in some runs working fine and others not. It is still sensitive to starting values between runs.

Coming back to this, I saw another post in the forum, Latent class previous post, where they adjust the DINA model code.

After testing, I found that this method of estimating latent classes from binary data works well for increasing numbers of classes, with a small change in the constraints between C = 2 and C > 2.

I am not clear on why this approach works while using the bernoulli_lpmf distribution does not. I wanted to use the Stan distributions because that way it is easier to combine different types of items; this code, for example, only works for binary items.

Here is the code for C=2

data {
  int<lower=1> I;                                     // number of items
  int<lower=1> J;                                     // number of respondents
  matrix[J, I] y;                                     // score for obs n
  int<lower=1> C;           // # of attribute profiles (latent classes) 
}
parameters {
  simplex[C] nu;
  vector<lower=0>[C] alpha_dir;
  // average intercept and main effect
  real mean_intercept;
  real<lower=0> mean_maineffect;
  // item level deviations from average effects
  real dev_intercept[I];
  real<lower=-mean_maineffect> dev_maineffect[I];
}

transformed parameters {
  matrix[I,C] pi;    // Probability of correct response for each class on each item
  vector[C] ps[J];
  real log_items[I];
  
  real intercept[I];
  real maineffect[I];
  vector[I] master_pi;
  vector[I] nonmaster_pi;
  
  for (i in 1:I) {
    intercept[i] = mean_intercept + dev_intercept[i];
    maineffect[i] = mean_maineffect + dev_maineffect[i];
    
    nonmaster_pi[i] = inv_logit(intercept[i]);
    master_pi[i] = inv_logit(intercept[i] + maineffect[i]);
  }
  
  // Probability of correct response for each class on each item
  for (c in 1:C) {
    for (i in 1:I) {
      pi[i,c] = master_pi[i]^(c - 1) * nonmaster_pi[i]^(1 - (c - 1));
    }
  }

  // Log-likelihood contribution for each subject under each class
  for (j in 1:J) {
    for (c in 1:C) {
      for (i in 1:I) {
        log_items[i] = y[j,i] * log(pi[i,c]) + (1 - y[j,i]) * log(1 - pi[i,c]);
      }
      ps[j][c] = sum(log_items);
    }
  }

}
model{

  // Priors
  nu ~ dirichlet(alpha_dir);
  alpha_dir ~ gamma(1, 0.5);
   
  mean_intercept ~ normal(0, 5);
  mean_maineffect ~ normal(0, 5);
  
  dev_intercept ~ normal(0, 3);
  dev_maineffect ~ normal(0, 3);
  
  target += log_mix(nu, ps);

}

And when C > 2, the parameters block changes to:

parameters {
  simplex[C] nu;
  vector<lower=0>[C] alpha_dir;
  // average intercept and main effect
  real mean_intercept;
  real mean_maineffect; 
  // item level deviations from average effects
  real dev_intercept[I];
  real dev_maineffect[I]; 
}

And here is the R code for running it:

library(poLCA)
library(loo)
library(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

dat <- read.csv("https://stats.idre.ucla.edu/wp-content/uploads/2016/02/lca1.dat",
                  header=F)

colnames(dat) <- c("id", paste0("item",1:9))
head(dat)

apply(dat[,2:10], 2, table)

dat2 <- dat[,2:10]
head(dat2)

wide_data <- list(
  I = ncol(dat2),
  J = nrow(dat2),
  y = dat2,
  C = 3
)
  
pars <- c("nu","pi")
iter <- 1000
warmup <- iter - 500
# lca_wide is a string holding the Stan model above
wide_model <- stan(model_code=lca_wide, 
                   data = wide_data,
                   pars=pars,
                   chains = 3, 
                   iter = iter,
                   warmup = warmup)
wide_model

## results with poLCA
ch2 <- poLCA(cbind(item1, item2, item3, item4, item5, item6, item7, item8, item9)~1,
             dat2+1,nclass=4, maxiter=20000)
ch2

Now I am trying to adjust this approach to include categorical indicators with more than two response categories, and continuous items.


Interesting! I’m surprised it works better than other methods too. Interested in hearing what you learn as you extend it.

I would be leery of this line for C > 2. It assumes an odd structure for how item probabilities are related across classes that I suspect will be too constraining. Take the C=3 example where P_1 = \text{master\_pi}_i and P_2 = \text{nonmaster\_pi}_i. The item probability for class 1 would be P_1^{1-1} \times P_2^{1 - (1 - 1)} = P_2. The item probability for class 2 would be P_1^{2-1} \times P_2^{1 - (2 - 1)} = P_1. And the item probability for class 3 would be P_1^{3-1} \times P_2^{1 - (3 - 1)} = P_1^2 \times P_2^{-1} = \frac{P_1^2}{P_2}.


Removing that restriction and recoding the model seems to recover the poLCA parameters with no issues detected. But it's much slower (takes about 14 min on my laptop).

data {
  int<lower=1> I;                                     // number of items
  int<lower=1> J;                                     // number of respondents
  matrix[J, I] y;                                     // score for obs n
  int<lower=1> C;           // # of attribute profiles (latent classes) 
}
parameters {
  simplex[C] nu;
  // average intercept and main effect
  real mean_intercept;
  // item level deviations from average effects
  vector[I] dev_intercept;
  vector<lower=rep_array(-fabs(dev_intercept), C), upper=rep_array(fabs(dev_intercept), C)>[I] class_interaction[C];
  ordered[C] class_effect_non;
  
}

transformed parameters {
  matrix[I, C] log_prob;    // log probability of correct response for each class on each item
  for (c in 1:C) 
     log_prob[, c] = log_inv_logit(mean_intercept + dev_intercept + class_interaction[c] + class_effect_non[c]);
}
model{
  // Priors
  mean_intercept ~ normal(0, 5);
  
  for (c in 1:C)
    class_interaction[c] ~ std_normal();
  
  class_effect_non ~ normal(0, 3);
  dev_intercept ~ normal(0, 3);
 
  {
    for (j in 1:J) {
      row_vector[C] ps = log(nu') + y[j] * log_prob + (1 - y[j]) *  log1m_exp(log_prob);
      target += log_sum_exp(ps);
    }
  }
}
generated quantities {
  matrix[I, C] probs;
  
  for (c in 1:C)
    probs[, c] = inv_logit(mean_intercept + dev_intercept + class_interaction[c] + class_effect_non[c]);
}

The poLCA probs with 3 latent classes:

Estimated class population shares 
 0.5576 0.3631 0.0793 

> matrix(do.call("rbind", ch2$probs)[, 2], nrow = 9, ncol = 3, byrow = T)
            [,1]       [,2]      [,3]
 [1,] 0.90827110 0.31215950 0.9232933
 [2,] 0.33740620 0.16402789 0.5461474
 [3,] 0.06674344 0.03573206 0.4263940
 [4,] 0.06548031 0.05604210 0.4179348
 [5,] 0.21915046 0.04436150 0.7654777
 [6,] 0.31963272 0.18293996 0.4710176
 [7,] 0.11281159 0.09768093 0.5123801
 [8,] 0.13989171 0.10987954 0.6192120
 [9,] 0.32487311 0.18780498 0.3488297

the Stan model (swap classes 1 and 2 to match):

> wide_model$summary("nu")
# A tibble: 3 x 10
  variable  mean median     sd    mad     q5   q95  rhat ess_bulk ess_tail
  <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
1 nu[1]    0.312 0.291  0.137  0.136  0.123  0.563  1.02     212.     281.
2 nu[2]    0.585 0.608  0.134  0.134  0.329  0.772  1.02     223.     393.
3 nu[3]    0.102 0.0980 0.0317 0.0302 0.0590 0.160  1.03     113.     600.
> matrix(wide_model$summary("probs")$mean, nr = 9, nc = 3, byrow = F)
            [,1]       [,2]      [,3]
 [1,] 0.34482077 0.82991176 0.9347736
 [2,] 0.12325590 0.31869247 0.5406846
 [3,] 0.03166788 0.06463380 0.3658653
 [4,] 0.04844336 0.06352205 0.3757007
 [5,] 0.04460407 0.20768053 0.6291819
 [6,] 0.15387589 0.32565031 0.5479768
 [7,] 0.08637647 0.11032362 0.4652101
 [8,] 0.10067586 0.13870967 0.5374320
 [9,] 0.14795669 0.31827212 0.4234829

This simplifies the model and only tries to find parameters for the item-by-class interactions. It takes about 2 min on my machine. Though you still may not like the ordered vector restriction.

data {
  int<lower=1> I;                                     // number of items
  int<lower=1> J;                                     // number of respondents
  matrix[J, I] y;                                     // score for obs n
  int<lower=1> C;           // # of attribute profiles (latent classes) 
}
parameters {
  simplex[C] nu;
  ordered[C] class_interaction[I];
}
transformed parameters {
  matrix[I, C] log_prob;    // log probability of correct response for each class on each item
  for (c in 1:C) 
    log_prob[, c] = log_inv_logit( to_vector(class_interaction[ , c]));
}
model{
  // Priors
  for (c in 1:C)
     to_vector(class_interaction[ , c]) ~ std_normal();
  {
    for (j in 1:J) {
      row_vector[C] ps = log(nu') + y[j] * log_prob + (1 - y[j]) *  log1m_exp(log_prob);
      target += log_sum_exp(ps);
    }
  }
}
generated quantities {
  matrix[I, C] probs;
  
  for (c in 1:C)
    probs[, c] = inv_logit( to_vector(class_interaction[ , c]));
}


@simonbrauer thanks for pointing out the issue with that part of the code.

@spinkney these are very interesting changes to the model code. I will play with them, thank you very much.


@Mauricio_Garnier-Villarre Did you happen to find a solution for polytomous variables?

@Kevin_Linares no, I haven't been able to extend this to polytomous items. Also, to be fair, I dropped this for the sake of the project I am working on, so I haven't worked on it for a bit.

If you want to try this, I would suggest taking this code, which is written for the log probability of answering 1 (like logistic regression):

  for (c in 1:C) 
     log_prob[, c] = log_inv_logit(mean_intercept + dev_intercept + class_interaction[c] + class_effect_non[c]);

And change it to a multinomial logistic approach, for example.
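A minimal sketch of that change for a single polytomous item (assumed names and priors; the first category's logit is pinned to 0 so the softmax is identified):

data {
  int<lower=1> J;                  // number of respondents
  int<lower=2> K;                  // number of response categories
  int<lower=1, upper=K> y[J];      // responses to one polytomous item
  int<lower=1> C;                  // number of latent classes
}
parameters {
  simplex[C] nu;                   // mixing proportions
  vector[K - 1] beta_free[C];      // free category logits per class
}
transformed parameters {
  vector[K] beta[C];
  for (c in 1:C)
    beta[c] = append_row(0, beta_free[c]);  // pin the first category's logit to 0
}
model {
  for (c in 1:C)
    beta_free[c] ~ normal(0, 2);
  for (j in 1:J) {
    vector[C] lps = log(nu);
    for (c in 1:C)
      lps[c] += categorical_logit_lpmf(y[j] | beta[c]);
    target += log_sum_exp(lps);
  }
}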

@Kevin_Linares I did a quick test, and if you use the modified code shared by @spinkney with polytomous items, it works as long as you separate them into all possible binary/dummy variables.

Something like this example

data {
  int<lower=1> I;                                     // number of items
  int<lower=1> J;                                     // number of respondents
  matrix[J, I] y;                                     // score for obs n
  int<lower=1> C;           // # of attribute profiles (latent classes) 
}
parameters {
  simplex[C] nu;
  // average intercept and main effect
  real mean_intercept;
  // item level deviations from average effects
  vector[I] dev_intercept;
  vector<lower=-7, upper=7>[I] class_interaction[C];
  ordered[C] class_effect_non;
}

transformed parameters {
  matrix[I, C] log_prob;    // log probability of correct response for each class on each item
  for (c in 1:C) 
     log_prob[, c] = log_inv_logit(mean_intercept + dev_intercept + class_interaction[c] + class_effect_non[c]);
}
model{
  // Priors
  mean_intercept ~ normal(0, 5);
  
  for (c in 1:C)
    class_interaction[c] ~ std_normal();
  
  class_effect_non ~ normal(0, 3);
  dev_intercept ~ normal(0, 3);
 
  {
    for (j in 1:J) {
      row_vector[C] ps = log(nu') + y[j] * log_prob + (1 - y[j]) *  log1m_exp(log_prob);
      target += log_sum_exp(ps);
    }
  }
}
generated quantities {
  matrix[I, C] probs;
  
  for (c in 1:C)
    probs[, c] = inv_logit(mean_intercept + dev_intercept + class_interaction[c] + class_effect_non[c]);
}
And here is the R code for this example:

library(psych)
library(poLCA)
library(rstan)
data(election)


f2a <- cbind(MORALG,CARESG,KNOWG)~1
nes2a <- poLCA(f2a,election,nclass=2,nrep=10,  maxiter=20000)
nes2a


d2 <- election[,c("MORALG","CARESG","KNOWG")]
head(d2)
apply(d2, 2, table)
summary(d2)

table(d2[,1],dummy.code(d2[,1])[,1] )

d3 <- cbind(dummy.code(d2[,1]),
            dummy.code(d2[,2]),
            dummy.code(d2[,3]) )

head(d3)
d4 <- na.omit(d3)
dim(d4)
head(d4)

wide_data <- list(
  I = ncol(d4),
  J = nrow(d4),
  y = d4,
  C = 2
)

pars <- c("nu","probs")
iter <- 1000
warmup <- iter - 500
# lca_wide is a string holding the Stan model above
wide_model <- stan(model_code=lca_wide, 
                   data = wide_data,
                   pars=pars,
                   chains = 3, 
                   iter = iter,
                   warmup = warmup)
wide_model


I finally had a chance to test this Stan code with polytomous variables, and as long as I use all categories dummy-coded into binary data it works well. Has anybody extended this code that spinkney shared to write out predicted class memberships in the generated quantities block?


Kevin

This added section in the generated quantities should work, with probs_sub giving the probability of belonging to each class for each subject:

generated quantities {
  matrix[I, C] probs;
  matrix[J, C] probs_sub;

  for (c in 1:C) {
    probs[, c] = inv_logit(mean_intercept + dev_intercept + class_interaction[c] + class_effect_non[c]);
  }

  {
    matrix[J, C] temp_log;
    for (j in 1:J) {
      temp_log[j, ] = log(nu') + y[j] * log_prob + (1 - y[j]) * log1m_exp(log_prob);
      // normalize on the log scale: posterior membership is the softmax of the log joint
      probs_sub[j, ] = exp(temp_log[j, ] - log_sum_exp(temp_log[j, ]));
    }
  }
}

It can definitely be improved to be faster; I am not great at optimizing Stan code, and I rely too much on loops.
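For example, the per-subject log weights could be built in one matrix operation (a sketch, untested, assuming the same y, log_prob, and nu as above):

{
  matrix[J, C] temp_log = rep_matrix(log(nu'), J)
                          + y * log_prob
                          + (1 - y) * log1m_exp(log_prob);
  for (j in 1:J)
    probs_sub[j, ] = exp(temp_log[j, ] - log_sum_exp(temp_log[j, ]));
}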

@Kevin_Linares / @Mauricio_Garnier-Villarre would either of you (or both) like to put together a case study for this with me?
