Dear all, I just started learning Stan/brms and I was wondering if it’s possible to define a regression model in brms where one of the covariates is unknown and, in its place, I measured another covariate that represents the true covariate plus some error. More or less, like this (6.1 Bayesian measurement error model | Stan User’s Guide) example reported in Stan’s user manual. If so, can you please provide an example?
I checked the me() and mi() functions and the corresponding examples, but, I think, they can only be used to specify the standard deviation of the error, not a different covariate.
When we talk about a measurement error, typically what we mean is that we have measured some values for a covariate, but we assume that the true values are not identical to the measured values. In particular, me() in brms assumes that the differences between the measured values and the true values are independent identically distributed Gaussian variates with a known standard deviation.
To map this onto your problem, the covariate that you measured could be treated as the “uncertain” measurement. If you are willing to assume that the true values are distributed about your measured covariate as an i.i.d. Gaussian with known standard deviation, then me() is what you need. This is the second Stan program given at the link you posted. Otherwise, I think some additional details about what you want will be useful. Do you want to relax the assumption of independence? The assumption of known standard deviation? The assumption of a Gaussian?
Thanks, now I think I understand it a little bit better. I tried to create an example with simulated data (see below) and I think everything works fine.
# packages
suppressPackageStartupMessages(library(brms))
# simulate data
set.seed(1)
n <- 250L
x <- rnorm(n) # the unmeasurable covariate
w <- rnorm(n, mean = x, sd = 1) # the observable proxy
log_lambda <- 1 + x # the log-linear regression model
y <- rpois(n, exp(log_lambda)) # the respons variable
# modelling step
my_bf <- bf(y ~ mi(w), family = poisson) +
bf(w | mi(se) ~ 1, family = gaussian) +
set_rescor(FALSE)
mod <- brm(
formula = my_bf,
data = list(y = y, w = w, se = rep(1, n)),
iter = 6000,
warmup = 2000,
cores = 4,
seed = 1L
)
summary(mod)
#> Family: MV(poisson, gaussian)
#> Links: mu = log
#> mu = identity; sigma = identity
#> Formula: y ~ mi(w)
#> w | mi(se) ~ 1
#> Data: list(y = y, w = w, se = rep(1, n)) (Number of observations: 250)
#> Draws: 4 chains, each with iter = 6000; warmup = 2000; thin = 1;
#> total post-warmup draws = 16000
#>
#> Population-Level Effects:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> y_Intercept 0.95 0.08 0.79 1.09 1.00 3908 7506
#> w_Intercept 0.05 0.09 -0.13 0.22 1.00 4591 8161
#> y_miw 0.96 0.09 0.80 1.15 1.00 1483 2815
#>
#> Family Specific Parameters:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma_w 1.01 0.09 0.85 1.18 1.00 1692 3360
#>
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
I use mi() instead of me() since the docs of me() says that it’s soft deprecated.
I would like to relax the assumption of known standard deviation, including that as an extra parameter. Do you know if it’s possible in brms? If so, can you suggest me some references or examples? After fixing that part, I would like to include a spatial random effect (i.e. CAR, PCAR or similar) in the second formula (i.e. the model defining the ME) since I think that in the “real scenario” there is also a spatial dimension in the ME [1]. In that case, I should simply add an extra term in the linear predictor, right?
Again, thanks.
[1] Bernadinelli, L., et al. “Disease mapping with errors in covariates.” Statistics in medicine 16.7 (1997): 741-752.
I’m not aware that this is possible in brms. If I am right that it’s not possible, then I think there’s a strong chance that there’s a good reason why it’s not possible. For example, maybe this standard deviation parameter is known to be extremely difficult to estimate. I suspect this is the case, because as this standard deviation parameter gets large, the model gains increasing flexibility to rearrange the “true” covariate such that the residual standard deviation gets tiny.
However, if you want to play around with this, you can use brms to generate your stan code, and then find and remove the fixed sd from the data declaration, and then add a parameter of the same name in the parameters block with an appropriate <lower=0> constraint in the declaration. You’d probably also want to add some suitable prior on this parameter in the model block.
Ok, thanks. I’ll take some days to learn the basics of Stan and then run a few tests/simulations. I’ll post something here in case I can develop something useful.