Mixture Bernoulli

I am trying to fit a mixture of two Bernoulli, I know I have to marginalize out the discrete latent variable this case.
However, the estimate is far from correct.
I just assume each of the four binary items is generated from a mixture of Bernoulli.
The code below was just to simulate the data.

# simulate data with four items and two classes
N <- 500

components <- sample(1:2,prob=c(0.2,0.8),size=N,replace=TRUE)
#parameters
p1 <- c(0.6,0.7)
p2 <- c(0.2,0.7)
p3 <- c(0.4,0.6)
p4 <- c(0.35,0.8)

item1 <- rbinom(n=N, size = 1, prob = p1[components])
item2 <- rbinom(n=N, size = 1, prob = p2[components])
item3 <- rbinom(n=N, size = 1, prob = p3[components])
item4 <- rbinom(n=N, size = 1, prob = p4[components])

items <- tibble::tibble(item1, item2, item3, item4)

data <- list(
  y= items,
  J = 500,
  I = 4,
  C = 2
)

stan_fit <- stan(file = "2d.stan", data = lca_data,
     chains = 1, iter = 2000)

stan code

data {
  int<lower=1> I;			// # of items
  int<lower=1> J;			// # of respondents	
  int<lower=1> C;			// # of groups
  int y[J,I];			// response matrix
}

parameters {
real <lower = 0, upper = 1> lambda; 					    // probabilities of being in one group
real <lower = 0, upper = 1> p[I, C];
}


model {
for (j in 1:J){
  for (i in 1:I){

target += log_mix(lambda,
bernoulli_lpmf(y[j, i] | p[i, 1]),
bernoulli_lpmf(y[j, i] | p[i, 2]));
}
}
}

any ideals? or anything that I can clarify?

I think it would be appreciated if you formatted your code using markdown using code blocks (triple quotes ```).

How does the model fail? I am somewhat skeptical that this model is identifiable - even if I observe a lot of items drawn from a Bernoulli mixture with 0.2 and 0.7, won’t it be identical to draws from a single Bernoulli with some probability in-between?

Note that at the moment you draw each item i independently from a univariate two-component mixture. Given that you use C to denote group, don’t you actually want to draw all I items jointly from one 4-dimensional mixture component?

I haven’t verified that the below syntax works, but I am thinking something like:

model {
for (j in 1:J){
    target += log_mix(lambda,
                      bernoulli_lpmf(y[j, ] | p[, 1]),
                      bernoulli_lpmf(y[j, ] | p[, 2]));
}
}

i.e. draw all items simultaneously before mixing.

1 Like

Hello @Random ,
Did you get the solution finaly ?

Thanks,

Yes, I am having an exam right now, will follow up with you soon.

1 Like

I’m using the following model with one class. I would like to integrate mixture parameters to isolate specific groups.

Would you have an idea of how to change the code below so that it include groups parameters?

model <- "
  data {
      int<lower=0> n; // number of subjects
      int<lower=0> k; // number of items
      int<lower=0, upper=1> y1[n]; // manifestation variables
      int<lower=0, upper=1> y2[n];
      int<lower=0, upper=1> y3[n];
      int<lower=0, upper=1> y4[n];
      int<lower=0, upper=1> y5[n];
      int<lower=0, upper=1> y6[n];
      int<lower=0, upper=1> y7[n];
      int<lower=0, upper=1> y8[n];
  
  }
  parameters {
      vector[k] alpha;
      real<lower=0> beta[k];
      vector[n] x;    // this is theta but its easier to see as x in the code
  }
  
  model {
      // priors
      x ~ normal(0,1); 
      alpha ~ normal(0,4); 
      beta ~ gamma(4,3); 
      
      // items
  y1 ~ bernoulli_logit(alpha[1] + beta[1] * x);
  y2 ~ bernoulli_logit(alpha[2] + beta[2] * x);
  y3 ~ bernoulli_logit(alpha[3] + beta[3] * x);
  y4 ~ bernoulli_logit(alpha[4] + beta[4] * x);
  y5 ~ bernoulli_logit(alpha[5] + beta[5] * x);
  y6 ~ bernoulli_logit(alpha[6] + beta[6] * x);
  y7 ~ bernoulli_logit(alpha[7] + beta[7] * x);
  y8 ~ bernoulli_logit(alpha[8] + beta[8] * x);
  
  }
  
  "