# How to impose a constraint on a parameter transformed from other parameters

I am fitting a nonlinear model with parameters A, B, C, t_c, m, \omega, \phi, \sigma to data pairs (t, y_t) as below:

y_t \sim \mathcal{N}\big(A + B(t_c - t)^m + C(t_c - t)^m \cos(\omega \log(t_c - t) + \phi),\ \sigma^2\big).
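For reference, a minimal Stan sketch of this likelihood — the bounds on t_c and m are my assumptions (t_c must exceed every observed t for log(t_c - t) to be defined), and priors are omitted:

```stan
data {
  int<lower=1> N;
  vector[N] t;
  vector[N] y;
}
parameters {
  real A;
  real B;
  real C;
  real<lower=max(t)> t_c;    // must exceed all observed t
  real<lower=0, upper=1> m;  // assumed range; adjust to your application
  real<lower=0> omega;
  real phi;
  real<lower=0> sigma;
}
model {
  for (n in 1:N) {
    real dt = t_c - t[n];
    y[n] ~ normal(A + B * dt^m + C * dt^m * cos(omega * log(dt) + phi),
                  sigma);
  }
}
```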

Yet I have an additional physical lower-bound constraint (≥ 1) on a quantity derived from the other parameters:

D = \frac{m |B|}{\omega |C|} \geq 1.

I wonder how I can properly impose this additional constraint on a combination of model parameters in Stan. I don’t think constraining D as a transformed parameter in the transformed parameters block is appropriate.

It seems that I may impose it as a soft constraint on D in the prior specification, but then I have no idea how the Jacobian adjustment term should be calculated.

Many thanks!


How come? What that achieves is that Stan works behind the scenes to define an unconstrained parameter and automatically adds the necessary Jacobians.

Oh, wait, sorry, I’m wrong.

Expressing a bound on a transformed parameter only induces a check that rejects the draw if the constraint is violated; it doesn’t do any automatic Jacobian adjustment as with bounds on parameters proper.
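For concreteness, declaring the bound in the transformed parameters block looks like this (using the parameter names from the question):

```stan
transformed parameters {
  // The declared lower bound is only validated, not transformed:
  // a draw with D < 1 is rejected with a warning, but no Jacobian is added
  real<lower=1> D = m * abs(B) / (omega * abs(C));
}
```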

I think you’re stuck with having to work out the Jacobian, unfortunately. I really wish that were automated, which surely it could be.

The issue here is more subtle than getting a Jacobian right. If you put a “prior” (there’s a reason for these quotation marks) on a derived quantity, then a lot of your intuition about the joint prior will immediately break. For example, the margins of the prior pushforward distribution for your untransformed parameters will no longer correspond to the sampling statements that you use to represent their “priors”. In general, the whole point of Jacobian adjustments is to ensure that the prior pushforward distribution corresponds to what the prior sampling statements appear to describe, and so it’s not clear what a Jacobian adjustment would be intended to achieve here.

If you want to venture into the territory of putting a prior over a derived quantity, it’s best not to think of the prior sampling statements as sampling statements at all, but rather as a language to build a joint prior distribution over the entire parameter space. Thinking this way, it’s up to you to figure out whether you’ve specified the right distribution or not. If you don’t have the right distribution, you can add more terms to the prior-generating-function (I just made up that term, BTW). Maybe some of those terms will look sorta like Jacobian adjustments; maybe they won’t. But all of the rules of thumb about when you do or don’t need a Jacobian adjustment go out the window.

Once you’ve built a prior in this manner, you need to check very carefully to ensure that it’s not implying anything you don’t intend. And this checking gets hard when the parameter space is high-dimensional. So for example, if you start with some prior distribution and then crop out all of the regions of parameter space where some derived quantity is greater than one, you might unintentionally end up with overwhelming probability mass within an overly-narrow range for one parameter. Or you might end up implying an unrealistic prior covariance between two parameters. Or you might end up implying some unrealistically constrained nonlinear relationship between pairs or even larger groups of parameters. Or…

With all of that said, if:

• You really do want to just take the prior implied by the sampling statements and crop out the region where D<1, and
• The constraint isn’t there for identifiability, but rather reflects a piece of domain expertise, and
• Without the constraint there’s substantial probability mass in regions where D > 1, and
• You don’t need to squeeze every last bit of efficiency out of your implementation,

THEN, you can achieve the constraint you want by simply fitting the model without the constraint and rejecting all posterior iterations that violate the constraint. To be safe, you’d need to make sure that you pass all diagnostics before throwing out the rejected iterations.
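As a sketch of that post-hoc rejection — the arrays here are hypothetical stand-ins for posterior draws you would extract from an actual fit:

```python
import numpy as np

# Hypothetical posterior draws for the relevant parameters,
# standing in for arrays extracted from a fitted model.
rng = np.random.default_rng(1)
n_draws = 4000
m = rng.uniform(0.1, 0.9, n_draws)
B = rng.normal(0.0, 1.0, n_draws)
C = rng.normal(0.0, 1.0, n_draws)
omega = rng.uniform(4.0, 12.0, n_draws)

# Keep only draws satisfying the constraint D = m|B| / (omega|C|) >= 1.
D = m * np.abs(B) / (omega * np.abs(C))
keep = D >= 1.0
constrained = {name: arr[keep]
               for name, arr in {"m": m, "B": B, "C": C, "omega": omega}.items()}
```

If only a tiny fraction of draws survives the filter, this approach becomes inefficient and you’re left with few usable draws — which ties back to the third condition above.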


Some good advice in here already. Also see http://models.street-artists.org/2017/05/30/once-more-about-stan-and-jacobian-warnings/ and Non-invertible change of variables? - #11 by dlakelan from @dlakelan for some thoughts on how to include this type of information in a model.


Haven’t logged in here in a long time but thanks for the references!

Here’s one way to think about the issue. Suppose, for example, you have the knowledge that “x, y are somewhere near the unit circle in the x, y plane”. One way you could handle this is to parameterize by radius and angle. But another way is to provide a prior over x, y which puts high probability near this circle:

p(x,y) ∝ normal(x, 0, 5) · normal(y, 0, 5) · normal(sqrt(x^2+y^2), 1, 0.1)

is such a prior. You can actually just get rid of the first two factors and use:

p(x,y) ∝ normal(sqrt(x^2+y^2), 1, 0.1)

by itself, but often in real-world problems there’s some reason to first provide some basic “bounds” on each variable and then tighten that with some joint information.

What’s the “Jacobian adjustment” required when you put “a prior” on sqrt(x^2+y^2)? There isn’t one. Why? Because it’s not a transformation of two variables x, y into two other variables foo, bar… it’s a direct expression of the joint prior on x, y. Literally, your prior knowledge is “x, y are distributed according to the density function normal(sqrt(x^2+y^2), 1, 0.1)”. Just be aware that if you calculate sqrt(x^2+y^2) you will not find that it has a normal(1, 0.1) distribution; rather, (x, y) pairs will have the density normal(sqrt(x^2+y^2), 1, 0.1), which is a different thing.
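In Stan, this circle prior is just a direct target increment — note that no Jacobian statement appears anywhere:

```stan
parameters {
  real x;
  real y;
}
model {
  // vague independent "bounds" on each coordinate
  x ~ normal(0, 5);
  y ~ normal(0, 5);
  // joint factor concentrating (x, y) near the unit circle;
  // this is a direct density over (x, y), not a change of variables,
  // so no Jacobian adjustment is involved
  target += normal_lpdf(sqrt(x^2 + y^2) | 1, 0.1);
}
```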

Often it’s worthwhile to express a prior on complicated parameter spaces as some vague independent priors over each parameter, multiplied by some “deformation factor” which squishes probability density in some regions of the space and expresses the dependency between the parameters.

For example, you might express a prior over spline knots as: each one is independently “near 0”, and each one is also “not that far from the previous one”, which expresses a bunch of dependencies between the dimensions. Doing this you could express priors over “smooth functions that don’t vary too quickly and stay near 0 everywhere”.
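A sketch of that knot prior in Stan — the scales 2 and 0.5 are placeholder values I’ve chosen for illustration:

```stan
data {
  int<lower=2> K;
}
parameters {
  vector[K] knot;
}
model {
  knot ~ normal(0, 2);                     // each knot independently "near 0"
  knot[2:K] ~ normal(knot[1:K - 1], 0.5);  // each knot near the previous one
}
```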

Sometimes this technique turns out to be hard to sample, and you should reparameterize.
