Imposing constraint in transformed parameters block

Hi all,

I am doing an artificial exercise to learn how Stan deals with constraints in the transformed parameters block.

First, I generate data under a simple linear regression model with \beta=(1.0, 0.3)^T and \sigma = 0.5. Then I fit model_Ex3a.stan, a correct model. Everything is fine, of course. The estimates are:

          mean   2.5%  97.5%
beta[1]  1.005  0.957  1.049
beta[2]  0.316  0.271  0.361
sigma    0.509  0.478  0.542
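The attached model_Ex3a.stan is not reproduced in the thread. For readers following along without the attachment, here is a minimal sketch of what such a simple linear regression model might look like. The data names N, x and y, the reading of beta[1] as the intercept, and the absence of explicit priors are all assumptions, not the contents of the attachment.

```stan
// Sketch only: a plain linear regression in the spirit of model_Ex3a.stan.
// N, x and y are hypothetical data names; priors are left implicit (flat).
data {
  int<lower=0> N;        // number of observations
  vector[N] x;           // predictor
  vector[N] y;           // response
}
parameters {
  vector[2] beta;        // assumed: beta[1] intercept, beta[2] slope
  real<lower=0> sigma;   // residual SD
}
model {
  y ~ normal(beta[1] + beta[2] * x, sigma);
}
```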

Now I do an experiment. Suppose I have a constraint that \beta_1*\beta_2 + \sigma has to be greater than a fixed number. To impose it, I put the following into model_Ex3b.stan.

transformed parameters {
	real<lower=0.0> constraint;
	constraint = beta[1] * beta[2] + sigma;
}

The estimates in the output above obviously satisfy the constraint real<lower=0.0> constraint, so when I run this code everything is fine. The estimates remain the same.

The story changes when I do

transformed parameters {
	real<lower=0.9> constraint;
	constraint = beta[1] * beta[2] + sigma;
}

Obviously, the true values, \beta=(1.0, 0.3)^T and \sigma = 0.5, do not satisfy this constraint: they give 1.0*0.3 + 0.5 = 0.8, which is below the required 0.9. When I fit this (model_Ex3c), two things happen:

  • Biased estimates (of course)
  • There were 1903 divergent transitions after warmup.

I am curious about the second, so I have two questions:

  • How does Stan decide, in this case, whether to flag a transition as divergent?
  • Does Stan know that the space imposed by the constraint is smaller than the unconstrained parameter space?

Thank you!

Kind regards,
Trung Dung.

Ex3.txt (7.5 KB)
model_Ex3b.stan (397 Bytes)
model_Ex3c.stan (397 Bytes)
model_Ex3a.stan (296 Bytes)
R codes Ex3.R (1.8 KB)

I think you are misunderstanding some of the internal workings of Stan. Constraints in the parameters block play a different role than constraints in other blocks: only the constraints in the parameters block are enforced by the sampler.

Stan manual, section 6.6 (on transformed parameters block) says

Like the constraints on data, the constraints on transformed parameters is meant
to catch programming errors as well as convey programmer intent. They are not
automatically transformed in such a way as to be satisfied. What will happen if a
transformed parameter does not match its constraint is that the current parameter
values will be rejected.

What this means is that your model is not “avoiding” the prohibited part of parameter space, nor is it using the constraint to somehow truncate the posterior distribution. It just rejects any step that would land in the prohibited region. The weird thing (and this may actually be a bug) is that you don’t get any warning about failing a constraint.
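To make the rejections visible, the same check can be written by hand with Stan's reject() statement, which lets you attach a message. This is a sketch under assumptions (data names N, x and y are hypothetical, and this is not the contents of model_Ex3c.stan). It only makes the rejection explicit; it does not turn it into a proper truncation.

```stan
// Sketch only: explicit rejection instead of a declared bound.
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
transformed parameters {
  // No declared bound here; the check below is done by hand.
  real constraint = beta[1] * beta[2] + sigma;
  if (constraint < 0.9) {
    reject("constraint must be at least 0.9, found ", constraint);
  }
}
model {
  y ~ normal(beta[1] + beta[2] * x, sigma);
}
```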

Now for the divergences: Divergences happen when the log density changes in a drastically different way than would be expected from its derivative. Since failing a constraint is implemented by assigning -Inf to the log density, the difference between actual and expected is infinite (which is indeed considered drastic) and a divergence is signalled every time the constraint is broken, which is fairly often. (note: the last sentence is my best guess, it would be great if @Bob_Carpenter or someone else from the dev team checked it is correct).
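If that guess is right, the reasoning can be sketched as follows. The symbols q (parameters), p (momentum), M (mass matrix) and \Delta_{\max} (the energy-error threshold) are notation introduced here for illustration; the threshold value Stan actually uses is not stated in this thread. HMC simulates a trajectory that should approximately conserve the Hamiltonian

\[
H(q, p) = -\log \pi(q) + \tfrac{1}{2}\, p^\top M^{-1} p ,
\]

and a transition is flagged as divergent when the energy error along the trajectory grows too large:

\[
H(q_t, p_t) - H(q_0, p_0) > \Delta_{\max} .
\]

If a point that violates the constraint is treated as having \log \pi(q_t) = -\infty, then H(q_t, p_t) = +\infty and the inequality holds for any finite \Delta_{\max}.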

You don’t get divergences in the second model, because divergences are only signalled after warmup (as divergences before warmup may just mean the parameters of the sampler need to be adjusted) and in the second model, the chain stays in the permitted region for all the post-warmup samples.

If you want to learn more I wrote a “popular” explanation of divergences on my blog: http://www.martinmodrak.cz/2018/02/19/taming-divergences-in-stan-models/ and a more detailed understanding can be found in the conceptual introduction paper: https://arxiv.org/abs/1701.02434

Enforcing constraints
If you want to perform constrained regression, you need to update your parameters to make the constraint always true, e.g. a (quite stupid) solution would be:

parameters {
	vector[2] beta;			// regression coefficients
	real<lower=0> sigma_raw;		// SD
}

transformed parameters {
	real constraint = beta[1] * beta[2] + sigma_raw;
	real sigma;
	if(constraint < 0.9) {
		sigma = sigma_raw + (0.9 - constraint);
	} else {
		sigma = sigma_raw;
	}
}

This still has some divergences, probably because the transformation is not smooth and because the data are in gross disagreement with the model, but I hope you get the idea.
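A smoother alternative, offered as a sketch rather than a tested fix: Stan allows a declared bound to be an expression involving parameters declared earlier, so the lower bound on sigma can depend on beta. The constraint then holds by construction and nothing is rejected. The data names N, x and y are assumptions, and leaving the priors implicit means the implied prior is flat over the constrained region.

```stan
// Sketch only: enforce beta[1] * beta[2] + sigma >= 0.9 through the
// declared lower bound on sigma, which Stan handles with a smooth transform.
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  vector[2] beta;
  // sigma must be non-negative and large enough to satisfy the constraint
  real<lower=fmax(0.0, 0.9 - beta[1] * beta[2])> sigma;
}
model {
  y ~ normal(beta[1] + beta[2] * x, sigma);
}
```

With data generated from \beta=(1.0, 0.3)^T and \sigma = 0.5, the posterior will still pile up against the boundary, so the estimates remain biased relative to the truth; the point is only that the sampler never has to reject a proposal.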


Thank you so much, @martinmodrak, for a useful and detailed explanation. There are several things I would like to clarify:

  • The constraint in the transformed parameters block is used to reject the current proposal if the proposal does not satisfy it. I found this on page 96:

“Rejections in the transformed parameters and model blocks are not in and of themselves instantly fatal. The result has the same effect as assigning a −∞ log probability,
which causes rejection of the current proposal in MCMC samplers and adjustment of
search parameters in optimization.”

  • Thus, 1903 divergent transitions after warmup means that there are 1903 times that the proposal is rejected because it violates the constraint.

  • The second model (3b) is fine because no proposal violates the constraint.

  • In modeling, if the model has a constraint and fitting with Stan gives many divergent transitions, it might indicate that the data do not agree with the model, i.e. there might be another, better model.

  • If the parameter space (S) is a subset of the space (S1) implied by the constraint, then there might be no divergent transitions. In this case, the constraint might not be necessary.

Please correct me if I am wrong in some or all bullets!

Trung Dung.

I think you are mostly correct, only two things to stress:

  • Constraints in transformed parameters are designed for error checking, not for modelling. In particular, AFAIK there is no guarantee that the posterior Stan produces for a model with such a constraint matches the theoretically correct distribution, since (I think) some of the conditions NUTS needs to converge no longer hold.
  • Divergent transitions are only recorded after warmup. The constraint may also be violated during warmup, but that would not register as a divergent transition. Constraint violations during warmup might still have subtle effects on the fit, as warmup determines some parameters of the sampler.
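If the goal is only to see how often the quantity falls in the allowed region, one option that avoids rejections altogether is to compute it in generated quantities of the unconstrained model. This is a sketch under assumptions: the data names N, x and y and the name constraint_ok are hypothetical.

```stan
// Sketch only: monitor the constraint without enforcing it.
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  vector[2] beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(beta[1] + beta[2] * x, sigma);
}
generated quantities {
  real constraint = beta[1] * beta[2] + sigma;
  // 1 if this draw satisfies the constraint, 0 otherwise
  int<lower=0, upper=1> constraint_ok = constraint > 0.9;
}
```

The posterior mean of constraint_ok then estimates the posterior probability that the constraint holds under the unconstrained model.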

Stan assumes that every parameter value that satisfies the declared constraints will have a finite log density (i.e., positive density).

Using constraints on transformed parameters to reject will cause discontinuities and can seriously hurt or even completely destroy HMC’s ability to sample.