Interval constraints affect posterior means

I often see people trying to impose hard interval constraints on parameters when those constraints are not physical constraints from the model.

Example

For instance, someone might declare a scale parameter to be uniform(0, 10) rather than, say, exponential(0.2), which has the same mean of 5. This corresponds to the following declaration, which I saw posted in the forums recently,

real<lower = 0, upper = 10> sigma_y;
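
As a quick sanity check on the claim about means (this uses scipy, which is not part of the original setup): uniform(0, 10) and exponential with rate 0.2 both have mean 5.

>>> from scipy import stats
>>> print(stats.uniform(loc=0, scale=10).mean())  # uniform(0, 10): mean (0 + 10) / 2
5.0
>>> print(stats.expon(scale=1 / 0.2).mean())      # exponential(0.2): mean 1 / 0.2; scipy uses scale = 1 / rate
5.0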

Not a problem for the MLE

Suppose you have a likelihood and you fit a maximum likelihood estimate \widehat{\theta}. If you then constrain the parameters so that they are forced to fall in a hypercube around \widehat{\theta}, it does not change the maximum likelihood estimate, because the unconstrained maximizer still satisfies the constraints.
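
Here is a minimal sketch of that claim using scipy (my addition, not from the post): maximizing a normal(1, 1) log likelihood with and without a bounding box that contains the optimum gives the same estimate.

>>> from scipy import optimize, stats
>>> nll = lambda theta: -stats.norm(1, 1).logpdf(theta[0])  # negative log likelihood
>>> print(optimize.minimize(nll, x0=[0.5]).x.round(4))                    # unconstrained
[1.]
>>> print(optimize.minimize(nll, x0=[0.5], bounds=[(0, 10)]).x.round(4))  # box containing the MLE
[1.]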

Changes behavior with Bayesian posteriors

Things work very differently with Bayes. With Bayes, we do the change-of-variables adjustment, and if the distribution without the constraints has probability mass beyond the constraint boundaries, you will see strong effects from inserting the boundary.
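
Concretely, and this is just standard truncation algebra rather than anything specific to Stan: if p(\theta) is the density without the constraint, imposing a lower bound L renormalizes it to

p(\theta \mid \theta > L) = p(\theta) / \Pr[\theta > L], \quad \theta > L.

All the mass that used to sit below L gets redistributed above it, so a lower truncation pulls the mean up.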

Simple worked example

Let’s see how this works in practice, first with an encroaching lower bound, then with a distant one.

Stan model

I could separate this into two Stan models, but given the variables are independent here, it’s easier to just write one.

parameters {
  real alpha;
  real<lower=0> beta;
}
model {
  alpha ~ normal(1, 1);
  beta ~ normal(1, 1);
}

Note that alpha is unconstrained, but beta has a lower bound of zero. Both get the same normal distribution with location 1 and scale 1.

Compile the model

First, we fire up Python and then compile the Stan program.

temp2$ python3

>>> import cmdstanpy as csp

>>> m = csp.CmdStanModel(stan_file='intervals.stan')
15:58:53 - cmdstanpy - INFO - compiling stan file /Users/bcarpenter/temp2/intervals.stan to exe file /Users/bcarpenter/temp2/intervals
...

MLE

Now let’s see what the maximum likelihood estimates look like.

>>> mle = m.optimize()
16:10:33 - cmdstanpy - INFO - Chain [1] start processing
...
>>> print(mle.optimized_params_dict)
OrderedDict({'lp__': np.float64(-6.52483e-12), 'alpha': np.float64(1.0), 'beta': np.float64(0.999997)})

Python’s default printing for dictionaries is awful, but if you dig in here, you can see that alpha is estimated as 1.0 and beta as 0.999997. Those are effectively the same to within the default precision we’re running.
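
If you want something more readable than the raw OrderedDict, a loop like this cleans it up (hypothetical formatting on my part, not from the original session):

>>> for name, value in mle.optimized_params_dict.items():
...     print(f'{name} = {value:.6g}')
...
lp__ = -6.52483e-12
alpha = 1
beta = 0.999997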

My guess is this is where the intuition for the interval constraint comes from—it feels “uninformative” within that interval. But as we’ll see next, it’s not.

Bayes

Now let’s see what the Bayesian posterior estimates look like.

>>> s = m.sample()
15:59:07 - cmdstanpy - INFO - CmdStan start processing
...

>>> s.summary(sig_figs=2)
       Mean   MCSE  StdDev   MAD    5%  50%   95%  ESS_bulk  ESS_tail  R_hat
lp__  -0.84  0.033    1.10  0.81 -3.00 -0.5  0.23    1200.0    1400.0    1.0
alpha  1.00  0.023    0.97  0.97 -0.59  1.0  2.60    1800.0    2000.0    1.0
beta   1.30  0.019    0.79  0.84  0.18  1.2  2.70    1300.0    1200.0    1.0

Now beta, with the lower bound of 0, gets a very different estimate. All the probability mass of normal(1, 1) that falls below 0 (16% or so of the total) is pushed above zero, which in turn pushes the estimate higher.
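
We can check that against the closed-form mean of a truncated normal, since the posterior for beta here is exactly normal(1, 1) truncated below at 0 (the scipy check is my addition):

>>> import numpy as np
>>> from scipy import stats
>>> print(f'{stats.norm(1, 1).cdf(0):.4f}')  # mass of normal(1, 1) below 0
0.1587
>>> print(f'{stats.truncnorm(-1, np.inf, loc=1, scale=1).mean():.4f}')  # standardized bound (0 - 1) / 1
1.2876

That matches the sampled posterior mean of 1.3 to within MCMC error.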

Really wide intervals

If you change the lower=0 to lower=-10,

  real<lower=-10> beta;

then the boundary is well out in the tail of normal(1, 1), and the bound will not noticeably affect the posterior mean,

>>> s.summary(sig_figs=2)
       Mean   MCSE  StdDev   MAD    5%   50%  95%  ESS_bulk  ESS_tail  R_hat
lp__   1.40  0.023    0.99  0.74 -0.55  1.70  2.4    1900.0    2600.0    1.0
alpha  1.00  0.016    1.00  0.97 -0.67  1.00  2.7    3800.0    2500.0    1.0
beta   0.97  0.017    1.00  0.99 -0.70  0.98  2.6    3500.0    2800.0    1.0
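
The same closed-form check shows why: the standardized bound is now at (-10 - 1) / 1 = -11, where normal(1, 1) has essentially no mass (again a scipy check of my own, not part of the original session):

>>> print(f'{stats.norm(1, 1).cdf(-10):.1e}')  # mass of normal(1, 1) below -10
1.9e-28
>>> print(f'{stats.truncnorm(-11, np.inf, loc=1, scale=1).mean():.4f}')
1.0000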