I am working on a Latent Class model following the previous LCA post. The model works fine and identifies parameter estimates that make sense, replicating the example from LCA ML with the respective data set.
The issue I am stuck on now is how to define the log_lik object in the generated quantities block, because I want to use LOO to compare models and find the best number of classes.
The problem is that ps, the quantity accumulated via target += log_sum_exp(ps), has size C (the number of classes) instead of the size of the data set (matrix[J, I] y).
Thanks for any guidance
data {
  int<lower=1> I;              // number of items
  int<lower=1> J;              // number of respondents
  matrix[J, I] y;              // binary item responses (respondent j, item i)
  int<lower=1> C;              // number of attribute profiles (latent classes)
}
parameters {
  simplex[C] nu;               // class membership proportions
  // average intercept and main effect
  real mean_intercept;
  real mean_maineffect;
  // item-level deviations from average effects
  real dev_intercept[I];
  real dev_maineffect[I];
}
transformed parameters {
  matrix[I, C] pi;             // probability of correct response for each class on each item
  vector[C] log_nu = log(nu);
  real intercept[I];
  real maineffect[I];
  vector[I] master_pi;
  vector[I] nonmaster_pi;
  for (i in 1:I) {
    intercept[i] = mean_intercept + dev_intercept[i];
    maineffect[i] = mean_maineffect + dev_maineffect[i];
    nonmaster_pi[i] = inv_logit(intercept[i]);
    master_pi[i] = inv_logit(intercept[i] + maineffect[i]);
  }
  // probability of correct response for each class on each item
  // (note: this parameterisation assumes C = 2, i.e. c - 1 is either 0 or 1)
  for (c in 1:C) {
    for (i in 1:I) {
      pi[i, c] = master_pi[i]^(c - 1) * nonmaster_pi[i]^(1 - (c - 1));
    }
  }
}
model {
  real ps[C];                  // log joint of class c and respondent j's responses
  real log_items[I];
  // priors
  mean_intercept ~ normal(0, 5);
  mean_maineffect ~ normal(0, 5);
  dev_intercept ~ normal(0, 3);
  dev_maineffect ~ normal(0, 3);
  // likelihood, marginalising over the latent class of each respondent
  for (j in 1:J) {
    for (c in 1:C) {
      for (i in 1:I) {
        log_items[i] = y[j, i] * log(pi[i, c]) + (1 - y[j, i]) * log(1 - pi[i, c]);
      }
      ps[c] = log_nu[c] + sum(log_items);
    }
    target += log_sum_exp(ps);
  }
}
generated quantities {
  matrix[J, C] prob_resp_class;   // posterior probabilities of respondent j being in latent class c
  real log_items[I];
  row_vector[C] prob_joint;
  for (j in 1:J) {
    for (c in 1:C) {
      for (i in 1:I) {
        log_items[i] = y[j, i] * log(pi[i, c]) + (1 - y[j, i]) * log(1 - pi[i, c]);
      }
      prob_joint[c] = nu[c] * exp(sum(log_items));
    }
    prob_resp_class[j] = prob_joint / sum(prob_joint);
  }
}
Usually the number of latent classes is only weakly identified, and I expect you would see a smooth elbow in the ELPD values as the number of classes is increased.
target += log_sum_exp(ps) is inside the loop over the J respondents. If in the generated quantities block you replace that with
log_lik[j] = log_sum_exp(ps);
and use those log_lik values for LOO, then you get leave-one-subject-out cross-validation.
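For the model posted above, a sketch of that generated quantities block (reusing the same loop and variable names as the model block, and folding in the class-membership probabilities already computed there) could look like this:

generated quantities {
  vector[J] log_lik;              // pointwise (per-respondent) log-likelihood for LOO
  matrix[J, C] prob_resp_class;   // posterior probabilities of respondent j being in latent class c
  {
    real ps[C];
    real log_items[I];
    for (j in 1:J) {
      for (c in 1:C) {
        for (i in 1:I) {
          log_items[i] = y[j, i] * log(pi[i, c]) + (1 - y[j, i]) * log(1 - pi[i, c]);
        }
        ps[c] = log_nu[c] + sum(log_items);
      }
      log_lik[j] = log_sum_exp(ps);
      // equivalent to prob_joint / sum(prob_joint) in the original block,
      // but computed from ps on the log scale
      prob_resp_class[j] = to_row_vector(softmax(to_vector(ps)));
    }
  }
}

You can then extract log_lik from the fit and pass it to the loo package in the usual way.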
In the LCA literature the methods for selecting the number of classes are not clear cut; it ends up being a judgement call based on information criteria. I haven't seen comparisons of how ELPD behaves at this step of LCA, or how easy it is to identify the elbow.
I should have been more specific: the weak identifiability doesn’t depend on the method used. It’s possible to add an additional cost for a higher number of classes so that the criterion has a clearer mode, but that doesn’t mean it would identify the “true” number of classes.
There’s also a vectorised implementation of the log_mix function (that I still need to add to the manual), which you can use with more than 2 latent classes (i.e., no need for log_sum_exp).
For the case where you have N individuals and K latent classes, you just need an array of vectors with the log-likelihood contribution of each observation to each latent class, which you pass to the log_mix function together with the class probabilities. For example:
model {
  vector[K] contrib[N];
  theta ~ dirichlet(rep_vector(10, K));
  for (n in 1:N)
    for (k in 1:K)
      contrib[n][k] = normal_lpdf(u[n] | u_mu[k], u_var[k]);
  target += log_mix(theta, contrib);
}
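To relate that back to the LCA model above, a sketch using the same y, pi and nu as in the posted code (with the priors left unchanged and omitted here) could be:

model {
  vector[C] contrib[J];   // log-likelihood of respondent j's responses under class c
  // ... priors on mean/dev intercepts and main effects as before ...
  for (j in 1:J) {
    for (c in 1:C) {
      real log_items[I];
      for (i in 1:I) {
        log_items[i] = y[j, i] * log(pi[i, c]) + (1 - y[j, i]) * log(1 - pi[i, c]);
      }
      contrib[j][c] = sum(log_items);   // no log_nu[c] here: log_mix adds the mixing weights
    }
  }
  target += log_mix(nu, contrib);
}

For LOO you would still recompute contrib[j] in generated quantities and set log_lik[j] = log_sum_exp(log_nu + contrib[j]), which is the same quantity as log_sum_exp(ps) above.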
I am not clear on how to do this for this type of model. In regression I have seen it done by adding regularizing priors, but in this case I don't have priors for the classes. Could it be done by adding priors on the θ mixing proportions that go into the log_mix function?
Some people do it with priors, but priors should be about what you know, and not about what you prefer. I’m just leaving for vacation, and this starts to get a bit too deep into the differences between priors, inference, cost functions, decision theory, and the myriad of approximate information criteria which, due to the approximation, may penalize more complex models more than the exact criterion would. So unfortunately I’ll have to leave you waiting for me to get back in a few weeks; you can then remind me to get back to this.
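(For concreteness, the kind of prior discussed in the question is just a Dirichlet on the mixing proportions; the concentration value below is purely illustrative, not a recommendation:)

model {
  // illustrative only: a sparsity-favouring prior on the class proportions;
  // a concentration below 1 pulls unused classes towards near-zero weight
  nu ~ dirichlet(rep_vector(0.5, C));
  // ... rest of the model block as before ...
}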