Missing data in a monotonic variable

I have data that are missing in a monotonic predictor variable. Is it possible to deal with this in brms? If not in the brms interface, is this possible in Stan at all? And what would be the easiest route to go from brms-produced Stan code to the final code that would do this? I have a feeling that it might not be possible because of the discrete parameter, but I’m hoping I’m wrong.

The clear alternative is to treat it as a linear variable. This variable isn’t the type that in my experience has benefited from treating it explicitly as monotonic.

(Edited because I realized that mi() isn’t possible with a factor variable, which strongly suggests to me that mi() + mo() won’t work…)

Argh, I basically want to trash this entire question. Treating the factor variable as linear is clearly worse than treating it as a factor or as monotonic, both conceptually and in terms of model fit as evaluated with something like loo_ic.

In this case, it seems like it will be best to use a package like mice, which can impute categorical variables, and then fit the model with brm_multiple. I really didn’t want to do this, because I wanted to handle the uncertainty properly, the way imputation during model fitting does. But since I have missing data in both continuous and categorical variables, maybe it will be best to impute the categorical variables before model fitting and the continuous variables during model fitting.

Any thoughts on which of the approaches might be best?

In this case, I’d do multiple imputation first, and then fit the model with brm_multiple().

To clarify, @Solomon, would you do multiple imputation first on all variables, or just on the categorical variables, and then impute the continuous ones while fitting?

I would impute using the full data set.
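For anyone landing here later, the impute-first workflow suggested above could look roughly like the sketch below. The toy data and the variable names (`edu`, `age`, `y`) are made up for illustration, and the settings (`m = 5`, `chains = 2`) are arbitrary choices rather than recommendations. It assumes the monotonic predictor is stored as an ordered factor, so that mice imputes it with its default proportional-odds method and `mo()` can use it directly.

```r
library(mice)
library(brms)

# Hypothetical toy data: ordered predictor `edu`, continuous predictor `age`,
# continuous outcome `y`, with some predictor values set to missing.
set.seed(1)
n <- 200
dat <- data.frame(
  edu = factor(sample(1:4, n, replace = TRUE), ordered = TRUE),
  age = rnorm(n, 40, 10)
)
dat$y <- 0.5 * as.integer(dat$edu) + 0.02 * dat$age + rnorm(n)
dat$edu[sample(n, 20)] <- NA
dat$age[sample(n, 20)] <- NA

# Impute using the full data set. With default settings, mice uses a
# proportional-odds model ("polr") for ordered factors and predictive
# mean matching ("pmm") for numeric variables.
imp <- mice(dat, m = 5, printFlag = FALSE)

# Fit the same model to each imputed data set and combine the posterior draws.
fit <- brm_multiple(y ~ mo(edu) + age, data = imp, chains = 2)
summary(fit)
```

One thing to keep in mind: because the combined draws come from models fitted to different imputed data sets, the pooled Rhat values can look worse than they are. The per-model values are stored in `fit$rhats`, which is probably the better place to check convergence.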

By monotonic predictor variable, do you mean there’s a vector of increasing values and one of them is missing, like

ordered[5] x = [1.2, 3.7, 9.143, 14.2, 103]';

but say the second value is missing. Then you can code this in Stan as:

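data {
  // Declared as a plain vector: an ordered[5] data declaration would be
  // validated on input and reject the placeholder.
  vector[5] x_obs;  // x_obs[2] is missing, so it can just be NaN or anything on input
  ...
}

parameters {
  real<lower=x_obs[1], upper=x_obs[3]> x_obs_2;  // the missing value, bounded by its neighbors
  ...
}

transformed parameters {
  ordered[5] x = x_obs;  // the ordering is checked at the end of this block
  x[2] = x_obs_2;
  ...
}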

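This will implicitly give x_obs_2 a uniform prior between its lower and upper bounds; this could be replaced with another prior if you want it to behave more like imputation. If the missing value is at the boundary of the vector, it gets only one bound: an upper bound if it’s the first element, or a lower bound if it’s the last. In that case you will probably want to add an explicit prior, since a flat prior on a half-open interval is improper.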
