Target += vs. sampling: does dropping constants preserve sampling behavior?

Section 7.4 of the Stan reference manual says that the sampling statement y ~ normal(mu, sigma) and the direct increment target += normal_lpdf(y | mu, sigma) will “both lead to the same sampling behavior”, even though the sampling statement drops constant terms from the log probability function when adding to the target.
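
For concreteness, here is the kind of pair of statements I have in mind (a minimal sketch; as I understand it, with mu and sigma as parameters the ~ form drops the additive terms that do not depend on the parameters, e.g. the -0.5 * log(2 * pi) per observation for this normal):

data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  // Sampling statement: drops additive terms that do not depend on the
  // parameters (here, -0.5 * log(2 * pi) per observation).
  y ~ normal(mu, sigma);

  // Explicit increment: keeps the full log density, constants included.
  // target += normal_lpdf(y | mu, sigma);
}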

Can someone explain why this necessarily leads to the same sampling behavior as the manual suggests? I used to think that the argument was something like: dropping these additive constants would just change the normalizing constant of the model, and for the sampling procedures used in Stan, the model block only needs to specify the probability function up to a normalizing constant.
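
Written out, that argument is just that an additive constant c cancels in the posterior:

$$
\pi(\theta \mid y) = \frac{\exp\{\ell(\theta) + c\}}{\int \exp\{\ell(\theta') + c\}\, d\theta'} = \frac{\exp\{\ell(\theta)\}}{\int \exp\{\ell(\theta')\}\, d\theta'}
$$

where ℓ(θ) is whatever the model block accumulates into target.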

But that argument seems to assume that the sampling statement is executed unconditionally. If execution of the sampling statement is guarded by a conditional or a loop that depends upon parameter values, then dropping these additive constants would not just be changing the normalizing constant.

Concretely, consider the snippet

if (b) {
  y ~ normal(mu, sigma);
}
else {
  target += normal_lpdf(y | mu, sigma);
}

This does not seem equivalent to

target += normal_lpdf(y | mu, sigma);

because in the first snippet we drop the extra additive constant terms only on the subset of parameter/data values that causes b to evaluate to true.

Am I missing something or is the remark in the manual missing some caveats?

It’s helpful to remember that these constant values have exactly the same value at every iteration of sampling, and so they are simply shifting the log-probability by a fixed, constant value.

Additionally, HMC/NUTS uses the gradients of the log density with respect to the parameters to inform the sampling procedure. This means that any constant values in the likelihood that don’t interact with a parameter are not considered in the derivatives w.r.t. the parameters, and do not influence the sampling.
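
For example, for a single normal term, an additive constant c drops out of the gradient entirely:

$$
\frac{\partial}{\partial \mu}\left[\operatorname{normal\_lpdf}(y \mid \mu, \sigma) + c\right] = \frac{y - \mu}{\sigma^2} = \frac{\partial}{\partial \mu}\operatorname{normal\_lpdf}(y \mid \mu, \sigma)
$$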

For your example above, the same posterior and gradients are implied for the parameters regardless of the conditional, and so the sampling is identical.

It’s helpful to remember that these constant values have exactly the same value at every iteration of sampling, and so they are simply shifting the log-probability by a fixed, constant value.

But I think that assumes the sampling statement is executed in every evaluation of the log probability function specified by the model. If the sampling statement is in a conditional block, as in my example, that may not necessarily be true: the constant is only added for parameter values that cause the sampling statement to be executed.

For example, if we consider:

if (mu > 0) {
  target += 0;
}
else {
  target += 1;
}

this does not specify the same log probability function as

if (mu > 0) {
  target += 0;
}
else {
  target += 0;
}
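
In other words, the difference between the two programs' log densities is

$$
\operatorname{target}_1(\mu) - \operatorname{target}_2(\mu) = \mathbf{1}[\mu \le 0],
$$

which is a function of the parameter mu, not a constant.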

Right? But if we return to the original conditional example from my post, that example is equivalent to

if (b) {
  y ~ normal(mu, sigma);
}
else {
  y ~ normal(mu, sigma);
  target += c;
}

(where I’ve replaced target += normal_lpdf(y | mu, sigma) in the else branch with a sampling statement plus adding back the dropped constant c). So a similar effect occurs here. Even though it’s a “constant” being added or not added, whether the addition occurs is control-flow dependent, so it’s as if you’re adding a varying value.
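
For concreteness, assuming mu and sigma are both parameters, the only term the sampling statement drops from each normal term is the additive -0.5 * log(2 * pi), so for N observations

$$
c = -\tfrac{N}{2}\log(2\pi)
$$

(if sigma were data, the -log(sigma) terms would be dropped as well).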

That’s correct; the normalizing constants dropped by sampling statements are not truly constant if the statement is conditioned on the parameters and that does change sampling behaviour.
Note that if (mu > 0) (when mu is a parameter) also interacts badly with automatic differentiation and step size adaptation. I forget if the user guide tells you not to do it, but at least the compiler does emit a warning when invoked with the --warn-pedantic flag. (Conditionals depending only on data are always fine, however.)

Thanks for confirming! As your comment alludes to, it seems like the issue only arises today in models that are probably problematic for inference anyway.

But if, say, Stan were to support discrete parameters in the future, the current behavior would be a bit confusing. For example, if you have a mixture model where the functional form of the distribution you sample from depends on the mixture component, such as:

id[i] ~ mixture_distribution();

if (id[i] == 1) {
  x[i] ~ normal(mu1, sigma);
}
else {
  x[i] ~ cauchy(mu2, sigma2);
}

then dropping the additive constants means this does not really do what it looks like it does.
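
That is, to actually accumulate the full log densities the two branches appear to describe, you would need explicit increments, e.g. (a sketch reusing the same hypothetical names as above):

if (id[i] == 1) {
  target += normal_lpdf(x[i] | mu1, sigma);
}
else {
  target += cauchy_lpdf(x[i] | mu2, sigma2);
}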

I think it might be better to drop additive constants only when a static analysis shows there’s no control-flow dependency on parameters. But perhaps that would be too much of a change for backwards compatibility reasons (or maybe not, since most models should probably not be doing this today anyway).

I think that hypothetical future is very unlikely, but if it does happen then yes, static code analysis to find which constants can be dropped is a good idea.