Hi,

`log_mix` is actually conceptually reasonably easy if we work on the probability scale, not on the log scale. To express a mixture model on the probability/density scale, we have a mixing probability \theta and an indicator variable z which is 1 (with probability \theta) or 2 (with probability 1 - \theta), indicating which of the two components the observation came from, i.e.:

\rm{P}(Y = y | \theta) = \theta \rm{P}(Y = y | z = 1) + (1 - \theta) \rm{P}(Y = y | z = 2)
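
To make the generative reading concrete, here is a small simulation sketch in Python. The two Normal components are my own made-up choices, since the question does not fix any particular components:

```python
import random

def simulate_mixture(theta, n, seed=0):
    # Generative view of the mixture: draw the indicator z first,
    # then draw y from the component z points to.
    # Hypothetical components: Normal(0, 1) and Normal(3, 1).
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < theta:   # z = 1 with probability theta
            draws.append(rng.gauss(0.0, 1.0))
        else:                      # z = 2 with probability 1 - theta
            draws.append(rng.gauss(3.0, 1.0))
    return draws

samples = simulate_mixture(theta=0.3, n=10_000)
# With these components, draws below 1.5 come mostly from component 1,
# so roughly a third of the samples should fall there.
frac_low = sum(1 for s in samples if s < 1.5) / len(samples)
print(frac_low)
```

Marginalizing the indicator z out of exactly this process gives the weighted sum of densities above.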

Now we need to move to the log scale, where Stan (for good reasons) works. If we say that \lambda_1 = \log {\rm{P}(Y = y | z = 1)} and \lambda_2 = \log {\rm{P}(Y = y | z = 2)}, then we get Stan's `log_mix` as:

\begin{eqnarray*}
\mathrm{log\_mix}(\theta, \lambda_1, \lambda_2) = \log {\rm{P}(Y = y | \theta)} & = & \log \!\left( \theta \exp(\lambda_1) + \left( 1 - \theta \right) \exp(\lambda_2) \right) \\[3pt]
& = & \mathrm{log\_sum\_exp}\!\left(\log(\theta) + \lambda_1, \ \log(1 - \theta) + \lambda_2\right).
\end{eqnarray*}
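
As a quick numerical sanity check of the identity above (a Python sketch; the Normal components and all the numbers are my own assumptions, not from the question), both forms of the right-hand side give the same value:

```python
import math

def log_sum_exp(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(theta, l1, l2):
    # log_mix expressed via log_sum_exp, as in the derivation above
    return log_sum_exp(math.log(theta) + l1, math.log(1 - theta) + l2)

def normal_lpdf(y, mu, sigma):
    # log density of Normal(mu, sigma) at y
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) \
        - 0.5 * ((y - mu) / sigma) ** 2

# arbitrary made-up values for the check
theta, y = 0.3, 1.2
l1 = normal_lpdf(y, 0.0, 1.0)
l2 = normal_lpdf(y, 2.0, 0.5)

direct = math.log(theta * math.exp(l1) + (1 - theta) * math.exp(l2))
via_lse = log_mix(theta, l1, l2)
print(abs(direct - via_lse) < 1e-12)  # → True
```

The `log_sum_exp` form is what you want in practice, since exponentiating log densities directly underflows for small values.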

I made some slight abuses of notation above, but I hope the idea is clear enough - feel free to ask for clarifications.

Next, I will assume that what you want is to directly plug the result of `log_mix` into the model log density, e.g. `target += log_mix(something);`

We know that Stan only needs the target up to an additive constant, so the question is: "If I use `x_lupdf` instead of `x_lpdf`, will the result of `log_mix` only change by a constant?"

So what happens when we use `lupdf` and thereby modify \lambda_1 and \lambda_2 by constants c_1 and c_2? Using `lupdf` would be safe if the difference between the two versions is itself a constant:

d = \mathrm{log\_mix}(\theta, \lambda_1, \lambda_2) - \mathrm{log\_mix}(\theta, \lambda_1 + c_1, \lambda_2 + c_2) = \\
\log \!\left( \theta \exp(\lambda_1) + \left( 1 - \theta \right) \exp(\lambda_2) \right) - \log \!\left( \theta \exp(c_1)\exp(\lambda_1) + \left( 1 - \theta \right) \exp(c_2)\exp(\lambda_2) \right)

If we can assume c = c_1 = c_2, then \exp(c) factors out of both terms inside the second \log, the last expression simplifies to d = -c, the difference is constant, and the posterior distribution of the model parameters will be the same. But if c_1 \neq c_2, then d depends on all of \theta, \lambda_1 and \lambda_2, and the posterior will differ.
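
The same argument can be checked numerically (again a Python sketch; the \lambda and c values are arbitrary made-up numbers):

```python
import math

def log_sum_exp(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_mix(theta, l1, l2):
    return log_sum_exp(math.log(theta) + l1, math.log(1 - theta) + l2)

l1, l2 = -1.3, -2.7  # arbitrary component log densities

# Equal constants c_1 = c_2 = c: the difference is exactly -c,
# no matter what theta is.
c = 5.0
for theta in (0.2, 0.8):
    d = log_mix(theta, l1, l2) - log_mix(theta, l1 + c, l2 + c)
    print(abs(d + c) < 1e-12)  # → True for every theta

# Unequal constants: the difference changes with theta,
# so it is not a constant shift of the target.
d_a = log_mix(0.2, l1, l2) - log_mix(0.2, l1 + 1.0, l2 + 3.0)
d_b = log_mix(0.8, l1, l2) - log_mix(0.8, l1 + 1.0, l2 + 3.0)
print(abs(d_a - d_b) > 1e-6)  # → True: the shift depends on theta
```

Since \theta is a model parameter, a shift that depends on \theta changes the shape of the posterior, not just its normalization.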

So the TL;DR is: using `x_lpdf` with `log_mix` is always safe; swapping in `x_lupdf` is safe only if the constants that are dropped are identical for both terms. This is a) unlikely to hold for practical models, b) hard to check unless you understand a lot about how the functions are implemented, and c) liable to change if the implementation changes with updates to Stan. Using `x_lupdf` with `log_mix` should thus be avoided unless you really know what you are doing and really need to squeeze out the last bits of performance - there are usually many safer modifications that will speed up your model much more than using `x_lupdf`.

~~Also note that if you want to do model comparison with `loo` or some other fancy post-processing that uses the samples of the posterior log density, you actually cannot omit the constants anyway.~~

**EDIT:** The above statement was misleading, see below.

Best of luck with your modelling work!