Estimation of covariance over variables in a regression model

Thanks for the suggestion Martin. I am familiar with the vignette and was looking to do something different (because the methods shown there won’t work for my problem).

I finally had time to look into this and explore how to use the stanvars function to do what I want to do.

Before presenting my (half working) solution, I’ll recap the problem:

I want to estimate the total effect of e on y, but y has missing data.
The full data generating process is (omitting error variances):
m \sim N(e,\sigma_m) \\ y \sim N(e\cdot l + m,\sigma_y) where l\cdot e stands for the interaction of variables l and e. Further, the probability of missing data is
P_{miss_y} \sim bern(inv.logit(m + l))

Inspired by the full information maximum likelihood (FIML) approach to dealing with missing data, one way to deal with missing data is to simultaneously

  • imputed the missing values and
  • estimate y \sim e
  • estimate the co-variances of y,m,e,l

Such a model can be implemented in brms as follows (with all variables in the data.frame my_data):

(a) Specify code to be added to the parameters and model blocks of the basic brms model.

parameters_code = "
corr_matrix[4] Omega;
vector<lower=0>[4] tau;
vector[4] mus;
"

Variables are now in capital letters. Yl is the brms-generated variable with imputed missing values.

model_code = "
mus ~ normal(0,2);
tau ~ normal(0,2);
Omega ~ lkj_corr(2.0);

for (i in 1:N) {
  target += multi_normal_lpdf(to_vector({M[i],iEL[i],Yl[i]}) | 
                              mus,
                              quad_form_diag(Omega,tau));
}
"

put model code to be added and auxiliary variables in a stan_vars object:


stanvars = 
  stanvar(x = as.vector(my_data$E),
          name = "E") +
  stanvar(x = as.vector(my_data$M),
          name = "M") +
  stanvar(x = as.vector(my_data$E*my_data$L),
          name = "iEL") +
  stanvar(block = "parameters",
          scode = parameters_code) +
  stanvar(block = "model",
          scode = model_code) 

(Try to) run the modified brms model-

sf <- brm(bf(Y | mi() ~ E),
          data = mar_interaction_data,
          stanvars = stanvars)

I write “try to run” because there remain 2 problems:

  1. brms puts the additional model code between the variable declarations and the rest of the original brms code, i.e. before missing values are put into the vector Yl. I can work around this by manually modifiying the brms code, but maybe @paul.buerkner has an idea that is more elegant than what is described here?
  2. Even if there is an elegant way to solve this coding problem, I found that the estimation of the co-variance and the effect model interfere with each other. Only when I multiply the log posterior for the covariance estimation with a large number (e.g. 100) am I able to recover the correct parameter values. I’ll describe this problem in a different thread.
1 Like