Question regarding the handling of missing data in brms

Hi all,

I am planning an individual participant data meta-analysis in brms, in which daily observations are nested in participants, which are further nested in studies - the syntax looks something like this:

brm(bf(y ~ 0 + Intercept + covariate + x1 + x2 + x1:x2 +
         (1 + x1 | study:pid) + (1 + x1 + x2 + x1:x2 | study),
       hu ~ 0 + Intercept + covariate + x1 + x2 + x1:x2 +
         (1 + x1 | study:pid) + (1 + x1 + x2 + x1:x2 | study)),
    data = metadata, family = hurdle_negbinomial(), prior = metaPriors, sample_prior = TRUE,
    iter = 3000, chains = 4, backend = "cmdstanr", threads = threading(7))

As you can tell, the dependent variable is a zero-inflated count variable. x1 is a continuous within-participant predictor and x2 is a continuous between-participant predictor. There is missingness in both: x1 is assessed in all studies, whereas x2 is assessed in only a subset (~50-60% of studies). My question is how to handle this missingness in the model. In particular, how do I avoid losing, through listwise deletion, the data from studies that did not assess x2 when estimating the x1 effect? My only solution right now is to run a separate model that does not include x2 at all. I don’t think multiple imputation is feasible here given the complexity of the outcome and model: the model will already run for multiple days on a supercomputing cluster because of the large amount of data, so imputation would make the computing time explode. Is there a way in brms to avoid listwise deletion, or possibly to treat missing data as parameters?
Help would be greatly appreciated!

Best,
Jonas

Have you looked at the brms vignette on missing data, “Handle Missing Values with brms”?

I believe the section “Imputation During Model Fitting” is what you’re after.
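For reference, the syntax from that vignette looks roughly like the sketch below. It uses the nhanes example data shipped with the mice package (as the vignette does), not the data from this thread, and it only builds the formula; the brm() call is left commented out because fitting compiles a Stan model.

```r
library(brms)

# Example data from the mice package; bmi and chl both contain missing values.
data("nhanes", package = "mice")

# mi() on the right-hand side marks a predictor whose missing values are
# estimated as parameters; `| mi()` on the left-hand side marks a response
# that has missing values and gets its own sub-model.
bform <- bf(bmi | mi() ~ age * mi(chl)) +
  bf(chl | mi() ~ age) +
  set_rescor(FALSE)

# Fitting step, left commented out because it compiles and samples a Stan model:
# fit <- brm(bform, data = nhanes)
```

Every variable wrapped in mi() on the right-hand side needs its own `| mi()` sub-model, so the model becomes multivariate.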

When I tried that I got the following error:

Error: Argument ‘mi’ is not supported for family ‘hurdle_negbinomial(log)’.

So I assumed that this form of imputation would also not work for this model.
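One possible reading of that error, and this is an assumption since the exact call is not shown: if family = hurdle_negbinomial() is passed once to brm(), it applies to every sub-model, including the ones written as x1 | mi() ~ ..., and mi() on the left-hand side is not supported for that family. A minimal sketch of attaching a family to each sub-model instead is below. The gaussian families for x1 and x2 and the simplified formulas (no interaction, no random slopes, no hu part) are illustrative choices, not the original model.

```r
library(brms)

# Outcome sub-model: the hurdle negative binomial family is attached to y only.
# mi(x1) and mi(x2) mark predictors whose missing values are estimated.
f_y <- bf(y ~ 0 + Intercept + covariate + mi(x1) + mi(x2) +
            (1 | study:pid) + (1 | study),
          family = hurdle_negbinomial())

# Sub-models for the continuous predictors with missing values.
# gaussian() is an assumption made for illustration.
f_x1 <- bf(x1 | mi() ~ 1 + (1 | study:pid), family = gaussian())
f_x2 <- bf(x2 | mi() ~ 1 + (1 | study), family = gaussian())

# Combine into one multivariate formula without residual correlations.
bform <- f_y + f_x1 + f_x2 + set_rescor(FALSE)

# Fitting step, left commented out (long-running):
# fit <- brm(bform, data = metadata, backend = "cmdstanr")
```

Two caveats: this imputes one value per row, so a between-participant variable such as x2 would get a separate imputed value for each daily observation of the same participant, and the extra parameters will add to an already long run time.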

Ah, I missed that you had a count outcome. In that case, brms cannot impute missing values of the count variable during model fitting. This is because brms treats each missing value as a parameter to be estimated, and Stan cannot sample discrete parameters. Your only option for missing counts (that I’m aware of, happy to be corrected) would be to impute them externally.
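External imputation could be sketched as follows, using mice to create the completed data sets and brm_multiple() to fit the model to each of them. This again uses the nhanes example data rather than the data from this thread, and m = 5 is an arbitrary choice for illustration.

```r
library(mice)
library(brms)

# Example data from the mice package, containing missing values.
data("nhanes", package = "mice")

# Create m completed copies of the data.
imp <- mice(nhanes, m = 5, printFlag = FALSE)

# brm_multiple() fits the same model to each completed data set and
# combines the posterior draws. Left commented out because every fit
# samples a Stan model.
# fit <- brm_multiple(bmi ~ age * chl, data = imp, chains = 2)
```

Note that the model is fitted once per imputed data set, so the total run time grows roughly in proportion to m.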
