# How to fix unconstrained growth of deviation parameter?

Hello Stan community,

I’ve been trying to troubleshoot a complex Stan model for about 2.5 days, but some sigma fits just go crazy. Here’s a minimal example - can someone tell me what’s wrong with it? The problem being modeled: we weigh two boxes, twice each, and want estimates for the average box weight (mu), the standard deviation of box weights (sigma), and the measurement error (m_sd). For the sake of this example, let’s say that mu = 1000 g, box1 = 999 g, box2 = 1001 g, sigma ≈ 1, and m_sd ≈ 1.

Here’s the self-contained Stan model:

``````stan
transformed data {
  int<lower=0> N = 4;         // total number of measurements (2 per box)
  int<lower=0> boxCount = 2;  // number of boxes to measure
  int<lower=1, upper=boxCount> B[N] = {1, 1, 2, 2};  // index of the box measured
  real<lower=0> M[N] = {999.0 + 1, 999.0 - 1, 1001.0 - 1, 1001.0 + 1};  // measured weights
}
parameters {
  real<lower=0> W[boxCount];  // true weight of a given box
  real<lower=0> mu;           // average weight of boxes
  real<lower=0> sigma;        // standard deviation of box weights
  real<lower=0> m_sd;         // standard deviation of measurements
}
model {
  // True box weights
  for (b in 1:boxCount) {
    W[b] ~ normal(mu, sigma);
  }
  // Measured box weights
  for (m in 1:N) {
    M[m] ~ normal(W[B[m]], m_sd);
  }
}
``````

The result of 100 iterations is as follows (sigma goes towards infinity with more iterations):

OS: x86_64, darwin13.4.0; rstan: 2.15.1; Rcpp: 0.12.8; inline: 0.3.14

Inference for Stan model: temp2.
1 chains, each with iter=100; warmup=50; thin=1;
post-warmup draws per chain=50, total post-warmup draws=50.

``````
         mean se_mean      sd   2.5%    25%     50%     75%    97.5% n_eff Rhat
W[1]   999.87   18.61  118.90 755.96 957.54  984.58 1044.29  1230.22    41 0.98
W[2]   969.13   23.85  130.73 626.53 931.98  998.59 1016.34  1181.13    30 0.98
mu    1074.46  296.18 1435.16  51.14 421.16  621.86 1065.05  3803.69    23 0.99
sigma 2129.80  819.44 3836.89 501.56 875.56 1005.34 1503.88 15966.03    22 1.03
m_sd   137.75   52.68  134.06  37.96  42.18   83.51  155.66   480.19     6 0.99
lp__    -2.17    1.01    2.38  -6.94  -3.70   -2.18   -0.61     1.84     6 1.12
``````

Why do sigma and m_sd go crazy like that? They should be around 1 or so, not in the hundreds or thousands…

The analysis using linear mixed effects works just fine:

``````r
dfBox = data.frame(
  B = factor(c(1, 1, 2, 2)),
  M = c(999+1, 999-1, 1001-1, 1001+1)
)
library(lme4)
lmer1 = lmer(M ~ (1|B), data=dfBox)
> summary(lmer1)
Linear mixed model fit by REML ['lmerMod']
Formula: M ~ (1 | B)
Data: dfBox

REML criterion at convergence: 12.7

Scaled residuals:
Min      1Q  Median      3Q     Max
-1.0607 -0.5303  0.0000  0.5303  1.0607

Random effects:
Groups   Name        Variance Std.Dev.
B        (Intercept) 1        1.000
Residual             2        1.414
Number of obs: 4, groups:  B, 2

Fixed effects:
            Estimate Std. Error t value
(Intercept)     1000          1    1000
> fixef(lmer1)
(Intercept)
1000
> ranef(lmer1)
$B
(Intercept)
1        -0.5
2         0.5

> confint(lmer1)
Computing profile confidence intervals ...
                  2.5 %      97.5 %
.sig01        0.0000000    4.118386
.sigma        0.6739152    3.468133
(Intercept) 997.5865087 1002.413505
``````

Ellis

With so little data, it is essential to express your prior beliefs about all the parameters, particularly `sigma` and `m_sd`. For any measurement-error problem, you should have a pretty good idea of whether the measurement error is big or small.
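For example, adding priors to the question's model block could look like the sketch below. The specific distributions and scales here are illustrative assumptions only, chosen to match the "weights near 1000 g, deviations near 1 g" setup described in the question:

```stan
model {
  // Illustrative weakly informative priors (assumed scales, not recommendations)
  mu ~ normal(1000, 100);   // boxes are known to weigh roughly a kilogram
  sigma ~ exponential(1);   // box-to-box spread expected to be around 1 g
  m_sd ~ exponential(1);    // measurement error expected to be around 1 g

  // Likelihood, unchanged from the original model
  for (b in 1:boxCount)
    W[b] ~ normal(mu, sigma);
  for (m in 1:N)
    M[m] ~ normal(W[B[m]], m_sd);
}
```

With only four observations, these priors do most of the work of keeping `sigma` and `m_sd` away from implausibly large values.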

In addition, it is tough for Stan to sample efficiently from a posterior distribution where some parameters are on the order of 1000 and others are on the order of 1. In this case, I would make `mu` a transformed parameter, like

``````stan
transformed parameters {
  real mu = mu_loc + mu_scale * mu_raw;
}
``````

where `mu_raw` is declared in the `parameters` block instead of `mu`, `mu_loc` and `mu_scale` are passed into the `data` block, and `mu_raw` has a standard normal prior. This is more or less how we do hierarchical models in the rstanarm package, and you can confirm you get reasonable answers from it with reasonable prior information:
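Put together, the pieces described above might look something like this. The names `mu_loc`, `mu_scale`, and `mu_raw` come from the description; the surrounding block structure is a hedged reconstruction, not verbatim rstanarm code:

```stan
data {
  real mu_loc;             // prior guess for the average weight, e.g. 1000
  real<lower=0> mu_scale;  // how far off that guess might plausibly be, e.g. 10
}
parameters {
  real mu_raw;             // unit-scale parameter that Stan actually samples
}
transformed parameters {
  real mu = mu_loc + mu_scale * mu_raw;  // shifted/scaled back to grams
}
model {
  mu_raw ~ normal(0, 1);   // standard normal prior on the raw parameter
}
```

Because `mu_raw` lives on a unit scale, the sampler no longer has to cope with parameters that differ by three orders of magnitude.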

``````r
dfBox <- data.frame(
  B = factor(c(1, 1, 2, 2)),
  M = c(999+1, 999-1, 1001-1, 1001+1)
)
library(rstanarm)
post <- stan_lmer(M ~ (1|B), data = dfBox,
                  prior_intercept = normal(location = 1000, scale = 10),
                  prior_aux = exponential(1),
                  prior_covariance = decov(shape = 2, scale = 0.2))
``````

Thanks Ben, I really appreciate the information and suggestions! The result with stan_lmer looks promising. My attempts this morning to write Stan code equivalent to stan_lmer didn’t produce good results yet, though. I’ll see what I can figure out with that.

Parameter fits can easily “go crazy” if you don’t have priors.

You probably want to use a non-centered parameterization for `W` (draw `W_raw` from `normal(0, 1)`, then define `W = mu + W_raw * sigma`). This changes the distribution being sampled (it’s now `W_raw`), but the implied distribution for `W` stays the same.
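In Stan code, that non-centered parameterization might be sketched as follows (a sketch only; it uses a `vector` for `W` rather than the original array, and drops the `<lower=0>` constraint on `W`, which the scale of the data makes harmless here):

```stan
parameters {
  vector[boxCount] W_raw;   // unit-scale deviations, sampled directly
  real<lower=0> mu;
  real<lower=0> sigma;
  real<lower=0> m_sd;
}
transformed parameters {
  vector[boxCount] W = mu + W_raw * sigma;  // implied true box weights
}
model {
  W_raw ~ normal(0, 1);     // implies W ~ normal(mu, sigma), non-centered
  for (m in 1:N)
    M[m] ~ normal(W[B[m]], m_sd);
}
```

The geometry the sampler sees is now a standard normal in `W_raw`, which avoids the funnel-shaped posterior that a centered `W ~ normal(mu, sigma)` produces when `sigma` is poorly identified.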

You can also vectorize, as in:

``````stan
W ~ normal(mu, sigma);
M ~ normal(W[B], m_sd);
``````

Thanks, Bob. After doing some more reading and watching some lectures about Stan models, I was able to get the models to fit much better. One of the keys was using non-centered parameterization, as you and Ben described.