Thanks for the suggestion Martin. I am familiar with the vignette and was looking to do something different (because the methods shown there won’t work for my problem).
I finally had time to look into this and explore how to use the stanvars
function to do what I want to do.
Before presenting my (half working) solution, I’ll recap the problem:
I want to estimate the total effect of e on y, but y has missing data.
The full data generating process is (omitting error variances):
m \sim N(e,\sigma_m) \\ y \sim N(e\cdot l + m,\sigma_y) where l\cdot e stands for the interaction of variables l and e. Further, the probability of missing data is
P_{miss_y} \sim bern(inv.logit(m + l))
Inspired by the full information maximum likelihood (FIML) approach to dealing with missing data, one way to deal with missing data is to simultaneously
- imputed the missing values and
- estimate y \sim e
- estimate the co-variances of y,m,e,l
Such a model can be implemented in brms as follows (with all variables in the data.frame my_data
):
(a) Specify code to be added to the parameters and model blocks of the basic brms model.
parameters_code = "
corr_matrix[4] Omega;
vector<lower=0>[4] tau;
vector[4] mus;
"
Variables are now in capital letters. Yl
is the brms-generated variable with imputed missing values.
model_code = "
mus ~ normal(0,2);
tau ~ normal(0,2);
Omega ~ lkj_corr(2.0);
for (i in 1:N) {
target += multi_normal_lpdf(to_vector({M[i],iEL[i],Yl[i]}) |
mus,
quad_form_diag(Omega,tau));
}
"
put model code to be added and auxiliary variables in a stan_vars
object:
stanvars =
stanvar(x = as.vector(my_data$E),
name = "E") +
stanvar(x = as.vector(my_data$M),
name = "M") +
stanvar(x = as.vector(my_data$E*my_data$L),
name = "iEL") +
stanvar(block = "parameters",
scode = parameters_code) +
stanvar(block = "model",
scode = model_code)
(Try to) run the modified brms model-
sf <- brm(bf(Y | mi() ~ E),
data = mar_interaction_data,
stanvars = stanvars)
I write “try to run” because there remain 2 problems:
- brms puts the additional model code between the variable declarations and the rest of the original brms code, i.e. before missing values are put into the vector
Yl
. I can work around this by manually modifiying the brms code, but maybe @paul.buerkner has an idea that is more elegant than what is described here? - Even if there is an elegant way to solve this coding problem, I found that the estimation of the co-variance and the effect model interfere with each other. Only when I multiply the log posterior for the covariance estimation with a large number (e.g. 100) am I able to recover the correct parameter values. I’ll describe this problem in a different thread.