I’m attempting to fit a mixed membership model a la Erosheva, Feinberg, and Lafferty (2004). In the model, I persons answer J items and each item has Lj response levels (i.e., not all items must have the same number of response levels). Each person belongs probabilistically to one of K response profiles and each profile has as distribution of possible responses to each item.
I borrowed quite a bit from the Stan manual to set up marginalization and avoid data augmentation. I also hacked a ragged array into a vector of values and one of indices, (cs and Slj), but that technique has worked for me in the past.
Code:
data {
int<lower=1> K; // profiles
int<lower=1> I; // persons
int<lower=1> J; // items
int<lower=2> L[J]; // response levels per item
int<lower=2> Slj; // sum of L, J; total number of response levels across aSlj items
int<lower=1> cs[J + 1]; // cumulative sum of levels per item, with '1' added at index 1.
int<lower=0> y[I,J]; // responses
}
transformed data {
real<lower=0> pg;
pg = 1.0 / K;
}
parameters {
positive_ordered[K] eta1;
vector<lower=0>[K] eta[I - 1];
vector<lower=0>[Slj] gamma[K];
}
transformed parameters {
vector[K] log_q_z[I];
vector<lower=0>[Slj] beta[K];
vector<lower=0>[K] theta[I];
theta[1] = eta1 / sum(eta1);
for (i in 2:I) {
theta[i] = eta[i - 1] / sum(eta[i - 1]);
}
for (k in 1:K) {
for (j in 1:J) {
beta[k, cs[j]:cs[j + 1]] = gamma[k, cs[j]:cs[j + 1]] / sum(gamma[k, cs[j]:cs[j + 1]]);
}
}
for (i in 1:I) {
log_q_z[i] = log(theta[i]);
for (j in 1:J) {
for (k in 1:K) {
log_q_z[i,k] = log_q_z[i,k] + log(beta[k,cs[j] + y[i,j]]);
}
}
}
}
model {
for (i in 1:I) {
if (i == 1) {
target += exponential_lpdf(eta1 | pg);
} else {
target += exponential_lpdf(eta[i - 1] | pg);
}
target += log_sum_exp(log_q_z[i]);
}
for (k in 1:K) {
target += exponential_lpdf(gamma[k] | pg);
}
}
I believe the problem I’m having is label-switching – I consistently get reasnoably N_eff and Rhat for all parameters in up to K-2 of the possible profiles, but then terrible mixing for at least two of the remaining profiles.
I attempted to constrain the estimates a bit. First, I made the probabilities for K at i = 1 ordered, hoping that the ordering the profiles relative to the probability of i = 1’s membership in them would help. I also attempted initializing profile membership by first running K-means on the responses and setting the first value on each subject i’s membership in a category based on that. I also tried refactoring to only cover cases where all items have the same number of responses, which made a lot of the model simpler, but had the same problem.
Have I misspecified the model? Are there other tricks I’m missing?
Code:
mm.stan (1.3 KB)
Simulated Data and R script:
Thanks.