Retrieving sigma on the original scale when sigma is a function of a predictor: sigma ~ x - 1

I fit a heteroskedastic model in brms where sigma is a function of a predictor. But when I simulate data and then try to recover the parameter values, I’m not succeeding. This may be more of a statistical question than a technical one; I’m not sure. Here is a very simple example that illustrates the problem:

R:

  library(brms)

  n <- 100
  x <- seq(1, 100, length.out = n)
  my_sigma_x <- 1.2
  sigma <- my_sigma_x * x    # true sigma grows linearly with x
  y <- rnorm(n, 0, sigma)
  simple <- data.frame(x, y)

  form <- bf(y ~ 1,
             sigma ~ x - 1)

  m_simple <- brm(form, chains = 2, cores = 2, data = simple)
  summary(m_simple)

Family: gaussian
Links: mu = identity; sigma = log
Formula: y ~ 1
sigma ~ x - 1
Data: simple (Number of observations: 100)
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 2000

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -0.25      0.48    -1.17     0.65        852 1.00
sigma_x       0.10      0.00     0.10     0.11       2000 1.00

I expected sigma_x to be 1.2; instead it is 0.1. For other values of my_sigma_x, the estimated sigma_x is also off.

Below are some values of my_sigma_x that I tested, along with the corresponding sigma_x estimates from Stan.

my_sigma_x <- c(.0001, .001,.05,.1,.3,.6,.9,1.2,1.5, 2.5, 5)
stan_sigma_x <- c(-0.056, -0.03, .017, .02617, .05202, .076358, .0879, .10, .116, .14, .19)
d <- data.frame(my_sigma_x, stan_sigma_x)

Here’s a plot of this data:

[Plot: stan_sigma_x against my_sigma_x]
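For reference, it can be reproduced from the d data frame above:

plot(d$my_sigma_x, d$stan_sigma_x,
     xlab = "my_sigma_x (true value)",
     ylab = "stan_sigma_x (brms estimate)")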

A square root transformation of my_sigma_x makes the relationship linear for positive values of stan_sigma_x, but it is not one-to-one. I fit an OLS model to find the ratio between sqrt(my_sigma_x) and stan_sigma_x:

summary(lm(stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11,]))

Call:
lm(formula = stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11, 
    ])

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0064202 -0.0049437  0.0009736  0.0023291  0.0066590 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.003885   0.003362   1.156    0.286    
sqrt(my_sigma_x) 0.086104   0.002893  29.759 1.25e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.005219 on 7 degrees of freedom
Multiple R-squared:  0.9922,	Adjusted R-squared:  0.991 
F-statistic: 885.6 on 1 and 7 DF,  p-value: 1.247e-08

This describes the relationship between my_sigma_x and stan_sigma_x very well. The sqrt makes sense if variance and standard deviation are being confused somewhere, but what is the meaning of the coefficient on sqrt(my_sigma_x)?

What is going on here? Why is my_sigma_x not equal to sigma_x? Where is my misunderstanding?

Thanks for your help.

  • Operating System: mac os 10.13.6 High Sierra
  • brms Version: 2.3.1

If you check the Stan code with make_stancode(form, data = simple), you’ll see that brms exponentiates sigma inside the model before using it. If you try exp(0.1), you get about 1.11, which is pretty close to your value of 1.2.
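For example:

make_stancode(form, data = simple)  # the model block applies exp() to sigma
exp(0.1)
#> [1] 1.105171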

Hope this is helpful!

Thanks for the suggestion, mathesong. I noticed the exp(sigma) in the code too (I think that’s to prevent sampling too close to 0?), but I don’t think that’s the solution. exp(0.1) is close to 1.2 by luck. For other values the difference is substantial: exp(0.19) is not close to 5.
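A quick check with the d data frame from above makes this clear:

# exponentiating the estimates does not recover the true values
cbind(true = d$my_sigma_x, exp_estimate = exp(d$stan_sigma_x))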

Here is the generated stan code (removing blank blocks).

// generated with brms 2.3.1
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K_sigma;  // number of population-level effects 
  matrix[N, K_sigma] X_sigma;  // population-level design matrix 
  int prior_only;  // should the likelihood be ignored? 
} 
parameters { 
  real temp_Intercept;  // temporary intercept 
  vector[K_sigma] b_sigma;  // population-level effects 
} 
model { 
  vector[N] mu = rep_vector(0, N) + temp_Intercept; 
  vector[N] sigma = X_sigma * b_sigma; 
  for (n in 1:N) { 
    sigma[n] = exp(sigma[n]); 
  } 
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma); 
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept; 
}

It looks like sigma is exponentiated and then passed as an argument to normal_lpdf.

If I change the code from:

target += normal_lpdf(Y | mu, sigma);

to:

target += normal_lpdf(Y | mu, log(sigma));

then run the model:

library(rstan)
m2 <- stan("simple.stan", data = standata(m_simple))

the sigma_x (here called “b_sigma[1]”) value is correctly recovered:

Inference for Stan model: simple.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                  mean se_mean   sd    2.5%     25%     50%     75%   97.5%
temp_Intercept    1.14    0.02 0.91   -0.60    0.52    1.14    1.75    2.94
b_sigma[1]        1.18    0.00 0.09    1.03    1.12    1.18    1.24    1.36
b_Intercept       1.14    0.02 0.91   -0.60    0.52    1.14    1.75    2.94
lp__           -524.96    0.02 0.99 -527.58 -525.37 -524.65 -524.24 -523.98
               n_eff Rhat
temp_Intercept  2863    1
b_sigma[1]      2857    1
b_Intercept     2863    1
lp__            1630    1

Samples were drawn using NUTS(diag_e) at Thu Aug  2 10:49:13 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

I think this is a bug. Should I post it to https://github.com/paul-buerkner/brms?

This also seems to be a problem in example 6 of ?brm: sigma_xb should be 2, but comes out around 0.8.

Changing to target += normal_lpdf(Y | mu, log(sigma)); solves the problem in this simple example. But in the more complicated non-linear model that prompted this question, it causes problems when there is more than one y for a given value of x.

The chains fail to initialize because the scale parameter is not greater than 0. I’m trying to set the initial values to be greater than 0 for sigma using the init argument, but have not yet had success.
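For reference, here is the kind of thing I’ve been trying (parameter names come from the generated Stan code above; the starting values are just guesses):

# start b_sigma strictly positive so that sigma = X_sigma * b_sigma > 0
# for all observations (x is positive in this example)
init_fun <- function() list(temp_Intercept = 0, b_sigma = array(1, dim = 1))
m2 <- stan("simple.stan", data = standata(m_simple), init = init_fun)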

That’s intentional. To make sure that sigma is strictly positive regardless of what the linear predictor looks like, we have to apply the exponential function (equivalently, the log link). This is also documented in ?brmsfamily.

You can change the applied link function via the family argument, for instance brmsfamily("gaussian", link_sigma = "identity").
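For example:

m_ident <- brm(form, data = simple,
               family = brmsfamily("gaussian", link_sigma = "identity"))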

Needing sigma to be strictly positive makes sense to me. Thanks for pointing me to brmsfamily, I didn’t realize you could set the link function for sigma.

Switching the link function for sigma to identity causes “Rejecting initial value” errors because the values are less than 0 (as you pointed out would happen, and which is why the default link for sigma is “log”).

I suppose my question is really, “How can I get sigma back to the original scale, after I fit a model with brmsfamily("gaussian", link_sigma = "log")?”

Extract the posterior samples via posterior_samples, transform them via exp, and then summarize via posterior_summary. But keep in mind that a linear effect on the log scale becomes a multiplicative effect after back-transforming to the identity scale.
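For example, a minimal sketch (assuming the coefficient is stored as b_sigma_x in the samples):

post <- posterior_samples(m_simple, pars = "b_sigma_x")
posterior_summary(exp(as.matrix(post)))
# exp(b_sigma_x) is the multiplicative change in sigma per one-unit
# increase in x, i.e. sigma(x) = exp(b_sigma_x * x)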


Perfect. Thank you very much for your help. brms is an incredible package.