Retrieve sigma on the original scale when sigma is a function of a predictor: sigma ~ x - 1


#1

I fit a heteroskedastic model in brms where sigma is a function of a predictor. But when I simulate data and then try to recover the parameter values, I’m not succeeding. This may be more of a statistical question than a technical one, but I’m not sure. Here is a very simple example to illustrate the problem:

R:

  library(brms)

  set.seed(1)  # seed chosen arbitrarily, for reproducibility
  n <- 100
  x <- seq(1, 100, length.out = n)
  my_sigma_x <- 1.2
  sigma <- my_sigma_x * x
  y <- rnorm(n, 0, sigma)
  simple <- data.frame(x, y)

  form <- bf(y ~ 1,
             sigma ~ x - 1)

  m_simple <- brm(form, chains = 2, cores = 2, data = simple)
  summary(m_simple)

Family: gaussian
Links: mu = identity; sigma = log
Formula: y ~ 1
sigma ~ x - 1
Data: simple (Number of observations: 100)
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 2000

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    -0.25      0.48    -1.17     0.65        852 1.00
sigma_x       0.10      0.00     0.10     0.11       2000 1.00

I expected sigma_x to be 1.2; instead it is 0.1. For other values of my_sigma_x, sigma_x is similarly off.

Below are some values of my_sigma_x that I tested and the corresponding sigma_x estimates from Stan.

my_sigma_x <- c(0.0001, 0.001, 0.05, 0.1, 0.3, 0.6, 0.9, 1.2, 1.5, 2.5, 5)
stan_sigma_x <- c(-0.056, -0.03, 0.017, 0.02617, 0.05202, 0.076358, 0.0879, 0.10, 0.116, 0.14, 0.19)
d <- data.frame(my_sigma_x, stan_sigma_x)

Here’s a plot of this data:

[figure: stan_sigma_x plotted against my_sigma_x]
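The plot can be reproduced from d above; a minimal base-R sketch:

plot(d$my_sigma_x, d$stan_sigma_x,
     xlab = "my_sigma_x", ylab = "stan_sigma_x")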

A square-root transformation of my_sigma_x makes the relationship linear for positive values of stan_sigma_x, but it is not one-to-one. I fit an OLS model to find the ratio between sqrt(my_sigma_x) and stan_sigma_x:

summary(lm(stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11,]))

Call:
lm(formula = stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11, 
    ])

Residuals:
       Min         1Q     Median         3Q        Max 
-0.0064202 -0.0049437  0.0009736  0.0023291  0.0066590 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.003885   0.003362   1.156    0.286    
sqrt(my_sigma_x) 0.086104   0.002893  29.759 1.25e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.005219 on 7 degrees of freedom
Multiple R-squared:  0.9922,	Adjusted R-squared:  0.991 
F-statistic: 885.6 on 1 and 7 DF,  p-value: 1.247e-08

This describes the relationship between my_sigma_x and stan_sigma_x very well. The sqrt makes sense if variance and standard deviation are being confused somewhere, but what is the meaning of the coefficient on sqrt(my_sigma_x)?

What is going on here? Why is my_sigma_x not equal to sigma_x? Where is my misunderstanding?

Thanks for your help.

  • Operating System: macOS 10.13.6 (High Sierra)
  • brms Version: 2.3.1

#2

If you check the Stan code with make_stancode(form, data = simple), you’ll see that brms exponentiates sigma before it enters the likelihood, so the reported coefficient is on the log scale. If you try exp(0.1), you’ll get a value of about 1.11, which is pretty close to your value of 1.2.

Hope this is helpful!


#3

Thanks for the suggestion, mathesong. I noticed the exp(sigma) in the code too (I think that’s to prevent sampling too close to 0?), but I don’t think that’s the solution. exp(0.1) is close to 1.2 by luck. For other values, the difference is substantial: exp(0.19) is not close to 5.
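For reference, a quick check of exp() across all of the tested values, using the two vectors from the table above:

data.frame(my_sigma_x, exp_stan_sigma_x = exp(stan_sigma_x))

The exponentiated values all fall between about 0.95 and 1.21, nowhere near the larger my_sigma_x values.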


#4

Here is the generated Stan code (blank blocks removed).

// generated with brms 2.3.1
data { 
  int<lower=1> N;  // total number of observations 
  vector[N] Y;  // response variable 
  int<lower=1> K_sigma;  // number of population-level effects 
  matrix[N, K_sigma] X_sigma;  // population-level design matrix 
  int prior_only;  // should the likelihood be ignored? 
} 
parameters { 
  real temp_Intercept;  // temporary intercept 
  vector[K_sigma] b_sigma;  // population-level effects 
} 
model { 
  vector[N] mu = rep_vector(0, N) + temp_Intercept; 
  vector[N] sigma = X_sigma * b_sigma; 
  for (n in 1:N) { 
    sigma[n] = exp(sigma[n]); 
  } 
  // priors including all constants 
  target += student_t_lpdf(temp_Intercept | 3, 0, 10); 
  // likelihood including all constants 
  if (!prior_only) { 
    target += normal_lpdf(Y | mu, sigma); 
  } 
} 
generated quantities { 
  // actual population-level intercept 
  real b_Intercept = temp_Intercept; 
}

It looks like sigma is exponentiated and then used as an argument in normal_lpdf.

If I change the code from:

target += normal_lpdf(Y | mu, sigma);

to:

target += normal_lpdf(Y | mu, log(sigma));

then run the model with rstan:

library(rstan)
m2 <- stan("simple.stan", data = standata(m_simple))

the sigma_x (here called “b_sigma[1]”) value is correctly recovered:

Inference for Stan model: simple.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

                  mean se_mean   sd    2.5%     25%     50%     75%   97.5%
temp_Intercept    1.14    0.02 0.91   -0.60    0.52    1.14    1.75    2.94
b_sigma[1]        1.18    0.00 0.09    1.03    1.12    1.18    1.24    1.36
b_Intercept       1.14    0.02 0.91   -0.60    0.52    1.14    1.75    2.94
lp__           -524.96    0.02 0.99 -527.58 -525.37 -524.65 -524.24 -523.98
               n_eff Rhat
temp_Intercept  2863    1
b_sigma[1]      2857    1
b_Intercept     2863    1
lp__            1630    1

Samples were drawn using NUTS(diag_e) at Thu Aug  2 10:49:13 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

I think this is a bug. Should I post it to https://github.com/paul-buerkner/brms?

This also seems to be a problem in example 6 of ?brm: sigma_xb should be 2, but is about 0.8.


#5

Changing to target += normal_lpdf(Y | mu, log(sigma)); solves the problem in this simple example. But on the more complicated non-linear model that prompted this question, it causes problems when there is more than one y for a given value of x.

The chains fail to initialize because the scale parameter is not greater than 0. I’m trying to set the initial values to be greater than 0 for sigma using the init argument, but have not yet had success.
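A minimal sketch of supplying positive initial values via rstan’s init argument, written against the simple model above (for the non-linear model, the names in the returned list would need to match that model’s parameters):

library(rstan)
# init, when given a function, is called once per chain and must return a
# named list matching the Stan program's parameters
init_fun <- function() {
  list(temp_Intercept = 0,
       b_sigma = array(1, dim = 1))  # positive start so sigma = X_sigma * b_sigma > 0 here
}
m2 <- stan("simple.stan", data = standata(m_simple), init = init_fun)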


#6

That’s intentional. To make sure that sigma is strictly positive regardless of what the linear predictor looks like, we have to apply the exponential function (or, equivalently, the log link). This is also documented in ?brmsfamily.

You may change the applied link function via the family argument, for instance brmsfamily("gaussian", link_sigma = "identity").
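For instance, refitting the simple model above with an identity link on sigma might look like this (a sketch reusing the form and simple objects defined earlier; expect initialization issues here, for the reason discussed in the next post):

m_identity <- brm(form,
                  family = brmsfamily("gaussian", link_sigma = "identity"),
                  chains = 2, cores = 2, data = simple)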


#7

Needing sigma to be strictly positive makes sense to me. Thanks for pointing me to brmsfamily, I didn’t realize you could set the link function for sigma.

Switching the link function for sigma to identity causes “Rejecting initial value” errors because the values are less than 0 (as you pointed out would happen, and which is the reason the default link for sigma is “log”).

I suppose my question is really, “How can I get sigma back to the original scale, after I fit a model with brmsfamily("gaussian", link_sigma = "log")?”


#8

Extract the posterior samples via posterior_samples, transform them via exp, and then summarize via posterior_summary. But keep in mind that a linear effect on the log scale becomes a multiplicative effect after back-transforming to the identity scale.
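A minimal sketch of this recipe, assuming the m_simple fit from above (where the coefficient appears as b_sigma_x among the posterior samples):

draws <- posterior_samples(m_simple, pars = "b_sigma_x")
# exp(b_sigma_x) is the multiplicative change in sigma per one-unit increase in x
posterior_summary(exp(as.matrix(draws)))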


#9

Perfect. Thank you very much for your help. brms is an incredible package.