The “unnormalized” version does not drop only the normalization constants but also everything else that is not relevant for computing the autodiff gradient. That means the output of an unnormalized pdf is nondeterministic and context dependent.
Normalized PDFs are easy.
target += normal_lpdf(x| mu, sigma);
is always equivalent to
target += -0.5*square(x-mu)/sigma - log(sigma) - 0.5*log(2*pi());
Now let’s consider unnormalized PDF.
target += normal_lupdf(x| mu, sigma);
When both mu
and sigma
are unknown parameters the above is equivelent to
target += -0.5*square(x-mu)/sigma - log(sigma);
The obvious constant 0.5*log(2*pi())
is removed for efficiency. So far so good.
But supposed that sigma
is known data. In that case the unnormalized PDF becomes
target += -0.5*square(x-mu)/sigma;
In the context of this particular model log(sigma)
is constant and can be dropped. Note that there is no way to decide whether a given variable is data or parameter other than to look up where it was originally defined, possibly following it through function calls. The same function may have different results when called from transformed data block and model block in the same model. The restrictions on _lupdf
s are there to avoid this weirdness.
Even 0.5*log(2*pi())
isn’t always dropped. RStan and PyStan expose the model’s log_prob
method to users. When called through the interface it doesn’t use autodiff so it wouldn’t make sense to calculate the target “up to a constant”–the result is a constant. Consequently the exposed method computes full normalization.
The unpredictable nature of _lupdf
s makes them a bit tricky to use correctly. Here’s an example.
real mixed1_lpdf(real x, real mu, real sigma1, sigma2) {
return log_mix(0.5,
normal_lpdf(x | mu, sigma1),
normal_lpdf(x | mu, sigma2));
}
real mixed2_lpdf(real x, real mu1, real mu2, real sigma) {
return log_mix(0.5,
normal_lupdf(x | mu1, sigma),
normal_lupdf(x | mu2, sigma));
}
mixed1
requires normalizing because the two normal
s have different normalizations if either sigma
is constant. Even if you don’t need to normalize the result of mixed1
you still need to normalize the two normal
s used in the calculation.
On the other hand, in mixed2
the two distributions have the same normalization for any possible inputs. If you don’t need to normalize mixed2
then these normal
s can also be left unnormalized.
Although the user-defined function is defined with _lpdf
suffix it can be invoked either as mixed2_lpdf
or mixed2_lupdf
. The former means that the caller needs full normalization (and thus forces the inner normal_lupdf
calls to become normal_lpdf
s) while the latter leaves the normalization optional.