Mixture of latent variables with a reference class


Hi All,

I’m trying to estimate a mixture of latent variables, but I’m having trouble parameterising the mixing proportions.

In these models, for identification the factor mean in one class is fixed to zero (i.e. a reference class) and the factor means in the other classes are freely estimated. The mixing proportions are then parameterised as the probability of belonging to a given class over the reference class.
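Concretely, with class K as the reference and \pi_k the mixing proportions, what gets estimated are the log-odds against the reference class:

\log\left(\frac{\pi_k}{\pi_K}\right) = \lambda_k, \quad k = 1, \dots, K-1,

with \lambda_K fixed to zero.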

Let’s say I begin with the ‘raw’ mixing proportions:

parameters {
  simplex[K] theta;
}
model {
  theta ~ dirichlet(rep_vector(10, K));
}
How would I transform these to the probabilities that I need? My intuition would be softmax(theta - theta[K]), but I’m not sure if that’s what I’m after.



Why would the mixing proportions need to depend explicitly on assumptions about the parameters of the components being mixed? Can’t the mixture probabilities just be theta? Fixing the mean of one of the components shouldn’t really change anything, should it?


Right, the idea is that by parameterising the mixing proportions as the probability of belonging to a given class k relative to the reference class K, the estimated factor means become factor-mean differences with respect to the arbitrarily chosen reference class.

The background and math is covered in more detail here: http://statmodel2.com/download/Lubke1.pdf


Ack, it’ll take me a bit to get around to looking more closely at this. I shoulda known it wasn’t something simple if you were the one posting it :P.


Alright I had a look. Lemme make some mathematical errors with this delightful inline LaTeX.

So I think we want to set the column of A in eq. 3 that corresponds to our reference class equal to zero, right? That’s what you said in your first post?

And then the question is, if we’re going to estimate p(c_{ik} | x_{i}), how to parameterize that?

So they give the eqs:

\log \left( \frac{p(c_{ik} | x_{i})}{p(c_{iK} | x_{i})} \right) = \lambda_{c_{k}} + \Gamma_{c_{k}} x_{i}

That implies:

\log \left( \frac{p(c_{iK} | x_{i})}{p(c_{iK} | x_{i})} \right) = 0 = \lambda_{c_{K}} + \Gamma_{c_{K}} x_{i}

So can’t you choose \lambda_{c_{K}} = 0, \Gamma_{c_{K}} = 0, p_\text{unnormalized}(c_{iK} | x_{i}) = 1 and then just compute everything else from that?

So there’ll be K - 1 \lambdas and K - 1 \Gammas.
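If I’m reading that right, the zero-constrained parameterization is just a softmax with the reference class’s linear predictor pinned at zero. A quick numeric sketch in NumPy (the lambda/Gamma values here are made up purely for illustration):

```python
import numpy as np

def class_probs(lam, Gamma, x):
    """Class-membership probabilities with class K as the reference.

    lam:   (K-1,) intercepts lambda_{c_k}
    Gamma: (K-1, p) slopes Gamma_{c_k}
    x:     (p,) covariate vector

    The reference class gets linear predictor 0, so its
    unnormalized probability is exp(0) = 1.
    """
    eta = np.append(lam + Gamma @ x, 0.0)  # length K; last entry is the reference class
    eta -= eta.max()                       # subtract max for numerical stability
    p = np.exp(eta)
    return p / p.sum()

# hypothetical example: K = 3 classes, one covariate
lam = np.array([0.5, -1.0])
Gamma = np.array([[0.3], [0.2]])
x = np.array([1.0])

p = class_probs(lam, Gamma, x)
```

By construction, log(p[k] / p[K-1]) recovers exactly \lambda_{c_k} + \Gamma_{c_k} x, which is the log-odds relation in the equation above.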


Ah thanks so much for taking the time to look over that! So in the case where there aren’t predictors of the latent class, that is, where Eq. 3 becomes:

\eta_i = \mathbf{A}\mathbf{c}_i + \boldsymbol{\xi}_i

What’s the best way of parameterising that vector of probabilities \mathbf{c}_i?

Something along the lines of:

\ln\left[\frac{P(c_{ik} = 1)}{P(c_{iK} = 1)}\right] = \lambda_{c_k}


\ln\left[\frac{P(c_{iK} = 1)}{P(c_{iK} = 1)}\right] = 0

Does that look about right?


Looks good to me.

You could just use a simplex for the probabilities and back out what \lambda should be as a generated quantity if you care. A simplex[K] only uses K - 1 parameters, and I think they’re by default all centered up n’ such to sample well. I dunno which’d be better.
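For what it’s worth, going back and forth between the two parameterizations is just a log-ratio one way and a softmax the other, so backing out \lambda from the simplex is cheap. A sketch of the algebra in NumPy rather than Stan (theta here is an arbitrary example simplex):

```python
import numpy as np

K = 4
theta = np.array([0.1, 0.2, 0.3, 0.4])  # an example simplex; class K is the reference

# back out the K-1 log-odds intercepts from the simplex
lam = np.log(theta[:-1] / theta[-1])

# recover the simplex from (lambda_1, ..., lambda_{K-1}, 0) via softmax
eta = np.append(lam, 0.0)
theta_back = np.exp(eta) / np.exp(eta).sum()
```

In Stan, the `lam = log(theta[1:(K-1)] ./ theta[K])` step would live in `generated quantities`, which is what I meant by backing it out after the fact.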