# Retrieve sigma on the original scale when sigma is a function of a predictor: sigma ~ x - 1

#1

I fit a model in brms that has heteroskedasticity, where sigma is a function of a predictor. But when I simulate data and then try to recover the parameter values, I'm not succeeding. This may be more of a statistical question than a technical one, but I'm not sure. Here is a very simple example to illustrate the problem:

R:

``````
library(brms)

# simulate data where the residual SD grows linearly with x
n <- 100
x <- seq(1, 100, length.out = n)
my_sigma_x <- 1.2
sigma <- my_sigma_x * x
y <- rnorm(n, 0, sigma)
simple <- data.frame(x, y)

# model sigma as a function of x (no intercept)
form <- bf(y ~ 1,
           sigma ~ x - 1)

m_simple <- brm(form, chains = 2, cores = 2, data = simple)
summary(m_simple)
``````

``````
 Family: gaussian
  Links: mu = identity; sigma = log
Formula: y ~ 1
         sigma ~ x - 1
   Data: simple (Number of observations: 100)
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 2000

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -0.25      0.48    -1.17     0.65        852 1.00
sigma_x       0.10      0.00     0.10     0.11       2000 1.00
``````

I expected sigma_x to be 1.2; instead it is 0.1. For other values of my_sigma_x, sigma_x is also off.

Below are some values of my_sigma_x that I tested and the corresponding sigma_x from Stan.

``````
my_sigma_x <- c(.0001, .001, .05, .1, .3, .6, .9, 1.2, 1.5, 2.5, 5)
stan_sigma_x <- c(-0.056, -0.03, .017, .02617, .05202, .076358, .0879, .10, .116, .14, .19)
d <- data.frame(my_sigma_x, stan_sigma_x)
``````

Here’s a plot of this data:
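For reference, a base-R call along these lines recreates the plot from `d`:

``````
# fitted sigma_x from Stan against the generating my_sigma_x
plot(d$my_sigma_x, d$stan_sigma_x,
     xlab = "my_sigma_x", ylab = "stan_sigma_x")
``````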

A square-root transformation of my_sigma_x makes the relationship linear for positive values of stan_sigma_x, but it is not one-to-one. I fit an OLS model to find the ratio between sqrt(my_sigma_x) and stan_sigma_x:

`summary(lm(stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11,]))`

``````
Call:
lm(formula = stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11, ])

Residuals:
       Min         1Q     Median         3Q        Max
-0.0064202 -0.0049437  0.0009736  0.0023291  0.0066590

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.003885   0.003362   1.156    0.286
sqrt(my_sigma_x) 0.086104   0.002893  29.759 1.25e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.005219 on 7 degrees of freedom
Multiple R-squared:  0.9922,	Adjusted R-squared:  0.991
F-statistic: 885.6 on 1 and 7 DF,  p-value: 1.247e-08
``````

This describes the relationship between my_sigma_x and stan_sigma_x very well. The sqrt would make sense if variance and standard deviation were being confused somewhere, but then what is the meaning of the coefficient on sqrt(my_sigma_x)?

What is going on here? Why is my_sigma_x not equal to sigma_x? Where is my misunderstanding?

• Operating System: mac os 10.13.6 High Sierra
• brms Version: 2.3.1

#2

If you check the Stan code with `make_stancode(form, data = simple)`, you'll see that brms exponentiates sigma before using it in the likelihood. If you try exp(0.1), you'll get a value of 1.11, which is pretty close to your value of 1.2.

#3

Thanks for the suggestion, mathesong. I noticed the exp(sigma) in the code too (I think that's to keep sigma from getting too close to 0?), but I don't think that's the solution. exp(0.1) is close to 1.2 by luck. For other values the difference is substantial: exp(0.19) ≈ 1.21 is nowhere near 5.
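For instance, exponentiating the fitted values for the two largest simulated slopes (1.5 and 5) in the table above:

``````
round(exp(c(0.116, 0.19)), 3)
#> [1] 1.123 1.209
``````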

#4

Here is the generated Stan code (with blank blocks removed).

``````
// generated with brms 2.3.1
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K_sigma;  // number of population-level effects
  matrix[N, K_sigma] X_sigma;  // population-level design matrix
  int prior_only;  // should the likelihood be ignored?
}
parameters {
  real temp_Intercept;  // temporary intercept
  vector[K_sigma] b_sigma;  // population-level effects
}
model {
  vector[N] mu = rep_vector(0, N) + temp_Intercept;
  vector[N] sigma = X_sigma * b_sigma;
  for (n in 1:N) {
    sigma[n] = exp(sigma[n]);
  }
  // priors including all constants
  target += student_t_lpdf(temp_Intercept | 3, 0, 10);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = temp_Intercept;
}
``````

It looks like sigma is exponentiated and then used as an argument to normal_lpdf.
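In R terms, that means the reported coefficient lives on the log scale; using the sigma_x estimate of 0.10 from the summary above and the `x` from the simulation, the fitted model implies:

``````
b <- 0.10                # fitted sigma_x, on the log scale
sigma_hat <- exp(b * x)  # per-observation residual SD implied by the fit
``````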

If I change the code from:

`target += normal_lpdf(Y | mu, sigma);`

to:

`target += normal_lpdf(Y | mu, log(sigma));`

then run the model with rstan:
`m2 <- stan("simple.stan", data = standata(m_simple))`

the sigma_x (here called “b_sigma[1]”) value is correctly recovered:

``````
Inference for Stan model: simple.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.

                  mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
temp_Intercept    1.14    0.02 0.91   -0.60    0.52    1.14    1.75    2.94  2863    1
b_sigma[1]        1.18    0.00 0.09    1.03    1.12    1.18    1.24    1.36  2857    1
b_Intercept       1.14    0.02 0.91   -0.60    0.52    1.14    1.75    2.94  2863    1
lp__           -524.96    0.02 0.99 -527.58 -525.37 -524.65 -524.24 -523.98  1630    1

Samples were drawn using NUTS(diag_e) at Thu Aug  2 10:49:13 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
``````

I think this is a bug. Should I post it to https://github.com/paul-buerkner/brms?

This also seems to be a problem in example 6 of `?brm`: `sigma_xb` should be 2, but comes out around 0.8.

#5

Changing to `target += normal_lpdf(Y | mu, log(sigma));` solves the problem in this simple example. But in the more complicated non-linear model that prompted this question, it causes problems when there is more than one `y` for a given value of `x`.

The chains fail to initialize because the scale parameter is not greater than 0. I'm trying to use the `init` argument to set initial values for sigma that are greater than 0, but have not yet had success.
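One way to attempt this with rstan (a sketch; `b_sigma` is the parameter name in the generated code above, and the starting value of 0.5 is arbitrary):

``````
library(rstan)
# start b_sigma at a positive value so the implied scale X_sigma * b_sigma > 0
init_fun <- function() list(b_sigma = as.array(0.5))
m2 <- stan("simple.stan", data = standata(m_simple), init = init_fun)
``````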

#6

That's intentional. To make sure that sigma is strictly positive regardless of what the linear predictor looks like, we have to apply the exponential function (or, equivalently, the log link). This is also documented in `?brmsfamily`.

You may change the applied link function via the family argument, for instance `brmsfamily("gaussian", link_sigma = "identity")`.
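A full call might look like this (a sketch using the simulated data from above; `m_ident` is just an illustrative name, and initial values may need care since the identity link no longer constrains sigma to be positive):

``````
m_ident <- brm(form, data = simple,
               family = brmsfamily("gaussian", link_sigma = "identity"),
               chains = 2, cores = 2)
``````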

#7

Needing sigma to be strictly positive makes sense to me. Thanks for pointing me to brmsfamily; I didn't realize you could set the link function for sigma.

Switching the link function for sigma to identity causes “Rejecting initial value” errors because the values are less than 0 (as you pointed out would happen, and which is the reason the default link for sigma is “log”).

I suppose my question is really, “How can I get sigma back to the original scale, after I fit a model with `brmsfamily("gaussian", link_sigma = "log")`?”

#8

Extract the posterior samples via `posterior_samples`, transform them via `exp`, and then summarize via `posterior_summary`. But keep in mind that a linear effect on the log scale becomes a multiplicative effect after back-transforming to the identity scale.
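A minimal sketch for the model above (assuming the coefficient is named `b_sigma_x` in the samples):

``````
post <- posterior_samples(m_simple, pars = "b_sigma_x")
# exp(b_sigma_x) is the multiplicative change in sigma per unit of x
posterior_summary(exp(as.matrix(post)))
``````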

#9

Perfect. Thank you very much for your help. brms is an incredible package.