# Mixture Bernoulli

I am trying to fit a mixture of two Bernoulli, I know I have to marginalize out the discrete latent variable this case.
However, the estimate is far from correct.
I just assume each of the four binary items is generated from a mixture of Bernoulli.
The code below was just to simulate the data.

``````# simulate data with four items and two classes
N <- 500

components <- sample(1:2,prob=c(0.2,0.8),size=N,replace=TRUE)
#parameters
p1 <- c(0.6,0.7)
p2 <- c(0.2,0.7)
p3 <- c(0.4,0.6)
p4 <- c(0.35,0.8)

item1 <- rbinom(n=N, size = 1, prob = p1[components])
item2 <- rbinom(n=N, size = 1, prob = p2[components])
item3 <- rbinom(n=N, size = 1, prob = p3[components])
item4 <- rbinom(n=N, size = 1, prob = p4[components])

items <- tibble::tibble(item1, item2, item3, item4)

data <- list(
y= items,
J = 500,
I = 4,
C = 2
)

stan_fit <- stan(file = "2d.stan", data = lca_data,
chains = 1, iter = 2000)
``````

stan code

``````data {
int<lower=1> I;			// # of items
int<lower=1> J;			// # of respondents
int<lower=1> C;			// # of groups
int y[J,I];			// response matrix
}

parameters {
real <lower = 0, upper = 1> lambda; 					    // probabilities of being in one group
real <lower = 0, upper = 1> p[I, C];
}

model {
for (j in 1:J){
for (i in 1:I){

target += log_mix(lambda,
bernoulli_lpmf(y[j, i] | p[i, 1]),
bernoulli_lpmf(y[j, i] | p[i, 2]));
}
}
}
``````

any ideals? or anything that I can clarify?

I think it would be appreciated if you formatted your code using markdown using code blocks (triple quotes ```).

How does the model fail? I am somewhat skeptical that this model is identifiable - even if I observe a lot of items drawn from a Bernoulli mixture with 0.2 and 0.7, won’t it be identical to draws from a single Bernoulli with some probability in-between?

Note that at the moment you draw each item `i` independently from a univariate two-component mixture. Given that you use `C` to denote group, don’t you actually want to draw all `I` items jointly from one 4-dimensional mixture component?

I haven’t verified that the below syntax works, but I am thinking something like:

``````model {
for (j in 1:J){
target += log_mix(lambda,
bernoulli_lpmf(y[j, ] | p[, 1]),
bernoulli_lpmf(y[j, ] | p[, 2]));
}
}
``````

i.e. draw all items simultaneously before mixing.

1 Like

Hello @Random ,
Did you get the solution finaly ?

Thanks,

Yes, I am having an exam right now, will follow up with you soon.

1 Like

I’m using the following model with one class. I would like to integrate mixture parameters to isolate specific groups.

Would you have an idea of how to change the code below so that it include groups parameters?

``````model <- "
data {
int<lower=0> n; // number of subjects
int<lower=0> k; // number of items
int<lower=0, upper=1> y1[n]; // manifestation variables
int<lower=0, upper=1> y2[n];
int<lower=0, upper=1> y3[n];
int<lower=0, upper=1> y4[n];
int<lower=0, upper=1> y5[n];
int<lower=0, upper=1> y6[n];
int<lower=0, upper=1> y7[n];
int<lower=0, upper=1> y8[n];

}
parameters {
vector[k] alpha;
real<lower=0> beta[k];
vector[n] x;    // this is theta but its easier to see as x in the code
}

model {
// priors
x ~ normal(0,1);
alpha ~ normal(0,4);
beta ~ gamma(4,3);

// items
y1 ~ bernoulli_logit(alpha + beta * x);
y2 ~ bernoulli_logit(alpha + beta * x);
y3 ~ bernoulli_logit(alpha + beta * x);
y4 ~ bernoulli_logit(alpha + beta * x);
y5 ~ bernoulli_logit(alpha + beta * x);
y6 ~ bernoulli_logit(alpha + beta * x);
y7 ~ bernoulli_logit(alpha + beta * x);
y8 ~ bernoulli_logit(alpha + beta * x);

}

"``````