Request for final feedback: User controlled unnormalized (propto) distribution syntax

The “unnormalized” version does not drop only the normalization constants but also everything else that is not relevant for computing the autodiff gradient. That means the output of an unnormalized pdf is nondeterministic and context dependent.

Normalized PDFs are easy.

target += normal_lpdf(x| mu, sigma);

is always equivalent to

target += -0.5*square(x-mu)/sigma - log(sigma) - 0.5*log(2*pi());

Now let’s consider unnormalized PDF.

target += normal_lupdf(x| mu, sigma);

When both mu and sigma are unknown parameters the above is equivelent to

target += -0.5*square(x-mu)/sigma - log(sigma);

The obvious constant 0.5*log(2*pi()) is removed for efficiency. So far so good.
But supposed that sigma is known data. In that case the unnormalized PDF becomes

target += -0.5*square(x-mu)/sigma;

In the context of this particular model log(sigma) is constant and can be dropped. Note that there is no way to decide whether a given variable is data or parameter other than to look up where it was originally defined, possibly following it through function calls. The same function may have different results when called from transformed data block and model block in the same model. The restrictions on _lupdfs are there to avoid this weirdness.

Even 0.5*log(2*pi()) isn’t always dropped. RStan and PyStan expose the model’s log_prob method to users. When called through the interface it doesn’t use autodiff so it wouldn’t make sense to calculate the target “up to a constant”–the result is a constant. Consequently the exposed method computes full normalization.

The unpredictable nature of _lupdfs makes them a bit tricky to use correctly. Here’s an example.

real mixed1_lpdf(real x, real mu, real sigma1, sigma2) {
  return log_mix(0.5,
                 normal_lpdf(x | mu, sigma1),
                 normal_lpdf(x | mu, sigma2));
}
real mixed2_lpdf(real x, real mu1, real mu2, real sigma) {
  return log_mix(0.5,
                 normal_lupdf(x | mu1, sigma),
                 normal_lupdf(x | mu2, sigma));
}

mixed1 requires normalizing because the two normals have different normalizations if either sigma is constant. Even if you don’t need to normalize the result of mixed1 you still need to normalize the two normals used in the calculation.
On the other hand, in mixed2 the two distributions have the same normalization for any possible inputs. If you don’t need to normalize mixed2 then these normals can also be left unnormalized.

Although the user-defined function is defined with _lpdf suffix it can be invoked either as mixed2_lpdf or mixed2_lupdf. The former means that the caller needs full normalization (and thus forces the inner normal_lupdf calls to become normal_lpdfs) while the latter leaves the normalization optional.

5 Likes