I fit a model that has heteroskedasticity in brms where sigma is a function of a predictor. But when I generate data and then try to retrieve the parameter values, I’m not succeeding. This may be more of a statistical question than a technical one, but I’m not sure. Here is a very simple example to illustrate the problem:
R:
n <- 100
x <- seq(1,100,length.out = n)
my_sigma_x <- 1.2
sigma <- my_sigma_x * x
y <- rnorm(n,0,sigma)
simple <- data.frame(x,y)
form <- bf(y ~ 1,
sigma ~ x - 1)
m_simple <- brm(form, chains = 2, cores = 2, data = simple)
summary(m_simple)
Family: gaussian
Links: mu = identity; sigma = log
Formula: y ~ 1
sigma ~ x - 1
Data: simple (Number of observations: 100)
Samples: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup samples = 2000Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept -0.25 0.48 -1.17 0.65 852 1.00 sigma_x 0.10 0.00 0.10 0.11 2000 1.00
I expected sigma_x to be 1.2, instead it is 0.1. For other values of my_sigma_x, sigma_x is also off.
Below are some values of my_sigma_x that I tested and the corresponding sigma_x from stan.
my_sigma_x <- c(.0001, .001,.05,.1,.3,.6,.9,1.2,1.5, 2.5, 5)
stan_sigma_x <- c(-0.056, -0.03, .017, .02617, .05202, .076358, .0879, .10, .116, .14, .19)
d <- data.frame(my_sigma_x, stan_sigma_x)
Here’s a plot of this data:
A square root transformation of my_sigma_x makes the relationship linear for positive values of stan_sigma_x, but it is not one-to-one. I fit a ols model to find the ratio between sqrt(my_sigma_x) and stan_sigma_x:
summary(lm(stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11,]))
Call:
lm(formula = stan_sigma_x ~ sqrt(my_sigma_x), data = d[3:11,
])
Residuals:
Min 1Q Median 3Q Max
-0.0064202 -0.0049437 0.0009736 0.0023291 0.0066590
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.003885 0.003362 1.156 0.286
sqrt(my_sigma_x) 0.086104 0.002893 29.759 1.25e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.005219 on 7 degrees of freedom
Multiple R-squared: 0.9922, Adjusted R-squared: 0.991
F-statistic: 885.6 on 1 and 7 DF, p-value: 1.247e-08
This describes the relationship between my_sigma_x and stan_sigma_x very well. The sqrt makes sense if variance and standard deviation are being confused somewhere, but what is the meaning of the coefficient on sqrt(my_sigma_x)?
What is going on here? Why is my_sigma_x not equal to sigma_x? Where is my misunderstanding?
Thanks for your help.
- Operating System: mac os 10.13.6 High Sierra
- brms Version: 2.3.1