I often see people trying to impose hard interval constraints on parameters when those constraints don’t correspond to any physical constraint in the model.
Example
For instance, someone might declare a scale parameter to be uniform(0, 10) rather than exponential(0.2), which has the same mean. This corresponds to the following declaration, which I saw posted in the forums recently:
real<lower = 0, upper = 10> sigma_y;
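As a quick check (mine, not part of the forum post), both distributions really do have the same mean of 5. Here’s a minimal verification with scipy, assuming it’s available:

import scipy.stats as st

# uniform(0, 10) has mean (0 + 10) / 2 = 5
st.uniform(loc=0, scale=10).mean()   # 5.0

# exponential with rate 0.2 has mean 1 / 0.2 = 5
st.expon(scale=1 / 0.2).mean()       # 5.0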
Not a problem for the MLE
Suppose you have a likelihood and you fit a maximum likelihood estimate \widehat{\theta}. If you then constrain the parameters so that they are forced to fall in a hypercube containing \widehat{\theta}, it does not change the maximum likelihood estimate.
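In symbols: if \widehat{\theta} = \textrm{argmax}_{\theta} \, p(y \mid \theta) and \widehat{\theta} \in C, then \textrm{argmax}_{\theta \in C} \, p(y \mid \theta) = \widehat{\theta}, because the maximum over a subset that contains the unconstrained maximizer is attained at the same point.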
Changes behavior with Bayesian posteriors
Things work very differently with Bayes, where we apply a change-of-variables adjustment. If the distribution without the constraints has probability mass beyond the constraint boundaries, you will see strong effects from inserting the boundary.
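Concretely, imposing a lower bound L on a parameter with density p(\theta) and cumulative distribution function F restricts the support to \theta \geq L, and the normalized density becomes

p(\theta \mid \theta \geq L) = \frac{p(\theta)}{1 - F(L)}.

If F(L) is non-negligible, the mean and quantiles of the truncated distribution shift relative to the original, as the worked example below shows.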
Simple worked example
Let’s see how this works in practice, first with an encroaching lower bound, then with a distant one.
Stan model
I could separate this into two Stan models, but given the variables are independent here, it’s easier to just write one.
parameters {
real alpha;
real<lower=0> beta;
}
model {
alpha ~ normal(1, 1);
beta ~ normal(1, 1);
}
Note that alpha is unconstrained, but beta has a lower bound of zero. Both get the same normal distribution with location 1 and scale 1.
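For reference, Stan implements the lower bound with a change of variables: it samples an unconstrained value \beta^{\textrm{unc}} and sets \beta = \exp(\beta^{\textrm{unc}}), adding the log Jacobian \log \left| \frac{d}{d \beta^{\textrm{unc}}} \exp(\beta^{\textrm{unc}}) \right| = \beta^{\textrm{unc}} to the log density. That’s the change-of-variables adjustment mentioned above.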
Compile the model
First, we fire up Python and then compile the Stan program.
temp2$ python3
>>> import cmdstanpy as csp
>>> m = csp.CmdStanModel(stan_file='intervals.stan')
15:58:53 - cmdstanpy - INFO - compiling stan file /Users/bcarpenter/temp2/intervals.stan to exe file /Users/bcarpenter/temp2/intervals
...
MLE
Now let’s see what the maximum likelihood estimates look like.
>>> mle = m.optimize()
16:10:33 - cmdstanpy - INFO - Chain [1] start processing
...
>>> print(mle.optimized_params_dict)
OrderedDict({'lp__': np.float64(-6.52483e-12), 'alpha': np.float64(1.0), 'beta': np.float64(0.999997)})
Python’s default print format for dictionaries is awful, but if you dig in here, you can see that alpha is estimated as 1.0 and beta as 0.999997. Those are effectively the same to within the precision at which we’re running.
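If you want something more readable, a couple of lines of formatting in the same session will do it (the loop and format string here are my own, not a cmdstanpy feature):

>>> for name, value in mle.optimized_params_dict.items():
...     print(f'{name:10s} {float(value):12.6f}')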
My guess is this is where the intuition for the interval constraint comes from—it feels “uninformative” within that interval. But as we’ll see next, it’s not.
Bayes
Now let’s see what the Bayesian posterior estimates look like.
>>> s = m.sample()
15:59:07 - cmdstanpy - INFO - CmdStan start processing
...
>>> s.summary(sig_figs=2)
Mean MCSE StdDev MAD 5% 50% 95% ESS_bulk ESS_tail R_hat
lp__ -0.84 0.033 1.10 0.81 -3.00 -0.5 0.23 1200.0 1400.0 1.0
alpha 1.00 0.023 0.97 0.97 -0.59 1.0 2.60 1800.0 2000.0 1.0
beta 1.30 0.019 0.79 0.84 0.18 1.2 2.70 1300.0 1200.0 1.0
Now beta, with the lower bound of 0, gets a very different estimate. All of the probability mass for normal(1, 1) that falls below 0 (16% or so of the total) is pushed above zero, which in turn pushes the estimate higher.
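We can check the posterior mean against the exact mean of a normal(1, 1) truncated below at zero (again with scipy; this check is mine, not part of the original output):

from scipy.stats import truncnorm

# normal(1, 1) truncated below at 0; a and b are in standard-deviation units
truncnorm(a=(0 - 1) / 1, b=float('inf'), loc=1, scale=1).mean()  # about 1.29

That matches the posterior mean of 1.3 reported for beta, up to the two significant figures printed.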
Really wide intervals
If you change the lower=0 to lower=-10,

real<lower=-10> beta;

then the bound sits well into the tail of normal(1, 1), and it will not affect the posterior mean:
>>> s.summary(sig_figs=2)
Mean MCSE StdDev MAD 5% 50% 95% ESS_bulk ESS_tail R_hat
lp__ 1.40 0.023 0.99 0.74 -0.55 1.70 2.4 1900.0 2600.0 1.0
alpha 1.00 0.016 1.00 0.97 -0.67 1.00 2.7 3800.0 2500.0 1.0
beta 0.97 0.017 1.00 0.99 -0.70 0.98 2.6 3500.0 2800.0 1.0
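And indeed, the normal(1, 1) mass below -10 is vanishingly small, as a one-liner confirms (mine, with scipy):

from scipy.stats import norm

# probability that normal(1, 1) falls below the bound at -10
norm(loc=1, scale=1).cdf(-10)  # about 2e-28, utterly negligible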