I am trying to fit a mixture of two Bernoulli, I know I have to marginalize out the discrete latent variable this case.
However, the estimate is far from correct.
I just assume each of the four binary items is generated from a mixture of Bernoulli.
The code below was just to simulate the data.
# simulate data with four items and two classes
N <- 500
components <- sample(1:2,prob=c(0.2,0.8),size=N,replace=TRUE)
#parameters
p1 <- c(0.6,0.7)
p2 <- c(0.2,0.7)
p3 <- c(0.4,0.6)
p4 <- c(0.35,0.8)
item1 <- rbinom(n=N, size = 1, prob = p1[components])
item2 <- rbinom(n=N, size = 1, prob = p2[components])
item3 <- rbinom(n=N, size = 1, prob = p3[components])
item4 <- rbinom(n=N, size = 1, prob = p4[components])
items <- tibble::tibble(item1, item2, item3, item4)
data <- list(
y= items,
J = 500,
I = 4,
C = 2
)
stan_fit <- stan(file = "2d.stan", data = lca_data,
chains = 1, iter = 2000)
stan code
data {
int<lower=1> I; // # of items
int<lower=1> J; // # of respondents
int<lower=1> C; // # of groups
int y[J,I]; // response matrix
}
parameters {
real <lower = 0, upper = 1> lambda; // probabilities of being in one group
real <lower = 0, upper = 1> p[I, C];
}
model {
for (j in 1:J){
for (i in 1:I){
target += log_mix(lambda,
bernoulli_lpmf(y[j, i] | p[i, 1]),
bernoulli_lpmf(y[j, i] | p[i, 2]));
}
}
}