Using log_mix with lupdf - does that make sense?

I’m trying to understand how I can use the very convenient log_mix function.
The Stan manuals have an example using x_lpdf functions to compute the likelihood of each component.
Does it make sense also to use x_lupdf functions, dropping constants?

I’m not quite sure how log_mix balances the different likelihoods, hence the question.



log_mix is actually conceptually reasonably easy if we work on the probability scale, not on the log scale. To express a mixture model on the probability/density scale, we have a mixing probability \theta and an indicator variable z which is 1 (with probability \theta) or 2 (with probability 1 - \theta), indicating which of the two components the observation came from, i.e.:

\rm{P}(Y = y | \theta) = \theta \rm{P}(Y = y | z = 1) + (1 - \theta) \rm{P}(Y = y | z = 2)
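
To make this concrete, here is a small Python sketch of the generative story: draw the indicator z first, then draw the observation from the corresponding component. The two-component normal mixture and all parameter values here are made up purely for illustration.

```python
import math
import random

# Illustrative two-component normal mixture (all values made up):
# with probability theta the observation comes from Normal(mu1, sigma),
# otherwise from Normal(mu2, sigma).
theta, mu1, mu2, sigma = 0.3, -2.0, 3.0, 1.0

def normal_pdf(y, mu, sigma):
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(y):
    # P(Y = y | theta) = theta P(Y = y | z = 1) + (1 - theta) P(Y = y | z = 2)
    return theta * normal_pdf(y, mu1, sigma) + (1 - theta) * normal_pdf(y, mu2, sigma)

def simulate():
    # Draw the component indicator z first, then y from that component.
    z = 1 if random.random() < theta else 2
    return random.gauss(mu1 if z == 1 else mu2, sigma)
```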

Now we need to move to the log scale, where Stan (for good reasons) works. If we say that \lambda_1 = \log {\rm{P}(Y = y | z = 1)} and \lambda_2 = \log {\rm{P}(Y = y | z = 2)} then we get Stan’s log_mix as:

\begin{eqnarray*} \mathrm{log\_mix}(\theta, \lambda_1, \lambda_2) = \log {\rm{P}(Y = y | \theta)} & = & \log \!\left( \theta \exp(\lambda_1) + \left( 1 - \theta \right) \exp(\lambda_2) \right) \\[3pt] & = & \mathrm{log\_sum\_exp}\!\left(\log(\theta) + \lambda_1, \ \log(1 - \theta) + \lambda_2\right). \end{eqnarray*}

I made some slight abuses of notation above, but I hope the idea is clear enough - feel free to ask for clarifications.
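
The identity above is also easy to check numerically. Here is a minimal Python sketch - it mirrors what Stan’s log_mix computes, but is my own illustration, not Stan’s actual implementation:

```python
import math

def log_sum_exp(a, b):
    # Numerically stable log(exp(a) + exp(b)): factor out the max
    # so neither exp can overflow.
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(theta, lam1, lam2):
    # log_mix(theta, lambda1, lambda2)
    #   = log_sum_exp(log(theta) + lambda1, log(1 - theta) + lambda2)
    return log_sum_exp(math.log(theta) + lam1, math.log1p(-theta) + lam2)

# Agreement with the naive probability-scale formula (arbitrary values):
theta, lam1, lam2 = 0.3, -1.2, -4.5
naive = math.log(theta * math.exp(lam1) + (1 - theta) * math.exp(lam2))
print(abs(log_mix(theta, lam1, lam2) - naive))  # effectively zero
```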

Next, I will assume that what you want is to plug the result of log_mix directly into the model log density, e.g. target += log_mix(something);. We know that Stan only needs the target up to an additive constant, so the question is: "If I use x_lupdf instead of x_lpdf, will the result of log_mix only change by a constant?"

So what happens when we use lupdf and modify \lambda_1 and \lambda_2 by constants c_1 and c_2? Using lupdf would be safe if the difference between the two versions is itself a constant:

d = \rm{log\_mix}(\theta, \lambda_1, \lambda_2) - \rm{log\_mix}(\theta, \lambda_1 + c_1, \lambda_2 + c_2) = \\ \log \!\left( \theta \exp(\lambda_1) + \left( 1 - \theta \right) \exp(\lambda_2) \right) - \log \!\left( \theta \exp(c_1)\exp(\lambda_1) + \left( 1 - \theta \right) \exp(c_2)\exp(\lambda_2) \right)

If we can assume c = c_1 = c_2, then we can pull the constant out of the \log, the last expression simplifies, and we have d = -c: the difference is constant and the posterior distribution of the model parameters will be the same. But if c_1 \neq c_2, then d depends on all of \theta, \lambda_1 and \lambda_2, and the posterior will differ.
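
Both cases are easy to verify numerically. A Python sketch, with log_mix reimplemented via log_sum_exp for illustration and all values arbitrary:

```python
import math

def log_sum_exp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(theta, lam1, lam2):
    return log_sum_exp(math.log(theta) + lam1, math.log1p(-theta) + lam2)

lam1, lam2 = -1.2, -4.5

# Equal constants: the two versions differ by the same amount for every theta.
c = 7.0
d1 = log_mix(0.3, lam1, lam2) - log_mix(0.3, lam1 + c, lam2 + c)
d2 = log_mix(0.8, lam1, lam2) - log_mix(0.8, lam1 + c, lam2 + c)
assert abs(d1 - d2) < 1e-9  # constant difference, posterior unchanged

# Unequal constants: the difference now depends on theta (and the lambdas),
# so the target is distorted and the posterior changes.
c1, c2 = 7.0, 2.0
e1 = log_mix(0.3, lam1, lam2) - log_mix(0.3, lam1 + c1, lam2 + c2)
e2 = log_mix(0.8, lam1, lam2) - log_mix(0.8, lam1 + c1, lam2 + c2)
assert abs(e1 - e2) > 1e-3  # not constant
```

With equal shifts the difference is a constant, which Stan’s sampler ignores; with unequal shifts it varies with the parameters, so the sampled posterior is genuinely different.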

So the TL;DR is: using x_lpdf with log_mix is always safe; swapping in x_lupdf is safe only if the constants that are dropped are identical for both terms. This is a) unlikely to hold for practical models, b) hard to check unless you understand a lot about how the functions are implemented, and c) liable to change as the implementations change with updates to Stan. Using x_lupdf with log_mix thus should be avoided unless you really know what you are doing and really need to squeeze out the last bits of performance - there are usually many safer modifications that will speed up your model much more than using x_lupdf.

Also note that if you want to do model comparison with loo or some other fancy post-processing that uses the samples of posterior log-density, you actually cannot omit the constants anyway.

EDIT: The above statement was misleading, see below.

Best of luck with your modelling work!


I just happened to read this. Are you sure that this is the case? So people using the tilde notation shouldn’t run loo?
The tilde notation is equivalent to lupdf, right?


Sorry, you are right - the way I wrote it is misleading. For loo the critical part is the per-point log likelihood. This is usually computed in generated quantities, not taken from the target directly, so it does not matter whether the target includes all the constants. (The per-point log likelihood itself, AFAIK, should include all the constants - or you need to make sure the dropped constants are the same for all models you compare.)

Where you do need all the constants in target is when computing Bayes factors with bridgesampling, and I presume there are other cases where it's important.

Sorry for any confusion.


Ok right! The tilde notation is irrelevant for loo; what matters is how one stores the log lik.
Sorry for my confusing statement as well!

Thank you, that was very informative!