Mixture model for person-level effects

Our data are longitudinal, with multiple observations per person i over time t.

First, we fit a standard multilevel logistic model:
E[y_{it} | x_{it}, \eta_i] = \text{logit}^{-1}(\beta_0 + \beta_1 x_{1,it} + \beta_2 x_{2,it} + \beta_3 x_{3,it} + \eta_i)
\eta_i \sim N(0,\sigma_\eta)

This model appeared to converge (no obvious fitting issues). But our posterior predictive check failed. We got advice to try a mixture model for \eta_i:

z_i \sim \text{Categorical}(\theta_1,\theta_2)
\eta_i | z_i \sim N(\mu_{z_i},\sigma_{z_i})

where we enforce:
E[\eta_i] = \theta_1\mu_1 + \theta_2\mu_2 = 0

However, the model (as coded below) does not converge (we used 4 chains, 2000 iterations). We cannot share the data, but will try simulated data next. Any advice just based on the code? Errors?

Thanks so much!

data {
  int<lower=1> num_obs;
  int<lower=1> num_people;
  int y[num_obs];
  int<lower=1,upper=num_people> ii[num_obs]; // person
  real x1[num_obs];
  real x2[num_obs]; 
  real x3[num_obs];
}
parameters {
  real person_effect[num_people];
  simplex[2] theta;          // mixing proportions
  real mu_subset;      // first location of mixture components
  vector<lower=0>[2] sigma;  // scales of mixture components
  real intercept; 
  real beta_1; 
  real beta_2; 
  real beta_3; 
}
model {
  vector[2] log_theta = log(theta);  // cache log calculation
  vector[2] mu;      // locations of mixture components
  real y_mean_logit[num_obs]; 
  mu[1] = mu_subset;
  mu[2] = (- theta[1] * mu[1])/theta[2];
  intercept~normal(0,4);
  beta_1~normal(0,4);
  beta_2~normal(0,4);
  beta_3~normal(0,4);
  mu~normal(0,4);
  sigma~normal(0,4);
  for (i in 1:num_people) {
    vector[2] lps = log_theta;
    for (k in 1:2)
      lps[k] += normal_lpdf(person_effect[i] | mu[k], sigma[k]);
    target += log_sum_exp(lps);
  }
  for(n in 1:num_obs)
    y_mean_logit[n] = intercept+beta_1*x1[n]+beta_2*x2[n]+beta_3*x3[n]+person_effect[ii[n]];
  y ~ bernoulli_logit(y_mean_logit);
}

Having the non-centered parameterization on person_effect is probably pretty important. I’m assuming that was non-centered in the previous code? If not then I guess I’m wrong haha.

Anyway I don’t know how to do something like that with a mixture. I’m assuming that’s where the problem is coming from.

Yup, you are correct, we used non-centered parameterization with the previous (non-mixture, single component) model. But I also don’t know what parameterization makes sense here !

What is the exact reason to introduce the mixture model? I wonder whether there are no better ways to introduce whatever the mixture model is trying to capture. They are hard to estimate/make identifiable. Maybe you can get away with fixing the standard deviations in the components or having a mixture of “logit precision parameters”.

3 Likes

great question !

The person effects \eta_i in the standard multilevel logistic model are not even close to Normally distributed. There appear to be at least two components, so it is reasonable to consider a mixture model. However, as you point out, they are hard to estimate.

what do you mean by “a mixture of “logit precision parameters”"?

thanks!!

  • If you see empirically that there are two subsamples than maybe your initial approach with fixed standard deviations might be the best next step.

  • In behavioral economic type models, multiplying the whole linear part with a parameter is quite popular. If you let the parameter vary by individual, it can capture the idea that people some people are more decisive in their decisions. That’s why I asked for the reason to expand the model.

Sorry, I am in a hurry. If things are not clear, just respond here.

You probably need to impose a constraint to identify the mixture. E.g., require that mu_subset is negative: real<upper=0> mu_subset.
Otherwise, it can oscillate between or within-chain between mu[1] >0, mu[2] < 0; and mu[1] < 0, mu[2] > 0. There’s no difference whatsoever to the likelihood, or to the posterior, so you get a multimodal, aliased posterior. You have to break that apart using priors (which never worked for me), hard constraints (usually works for me, if data can separate it enough), or ordered simplex constraints (which never works for me, and is excruciatingly slow).

3 Likes