Missing data in a monotonic variable

CurtisAtkisson · March 11, 2025, 10:38pm

I have data that are missing in a monotonic predictor variable. Is it possible to deal with this in brms? If not in the brms interface, is this possible in Stan at all? And what would be the easiest route to go from brms-produced Stan code to the final code that would do this? I have a feeling that it might not be possible because of the discrete parameter, but I’m hoping I’m wrong.

The clear alternative is to treat it as a linear variable. This variable isn’t the type that in my experience has benefited from treating it explicitly as monotonic.

(edited because I realized that mi() with a factor variable isn’t possible. Which strongly indicates to me that mi + mo won’t work…)

CurtisAtkisson · March 11, 2025, 11:47pm

Argh, I basically want to trash this entire question. Treating the factor variable as linear is clearly worse than treating it as a factor or as monotonic, both conceptually and in terms of model fit as evaluated with something like loo_ic.

In this case, it seems like it will be best to use a package like mice that can impute categorical variables and then use brm_multiple to fit the model. I really didn’t want to do this, because I wanted to properly propagate the uncertainty the way imputation during model fitting does. But since I have missing values in both continuous and categorical variables, maybe it will be best to impute the categorical variables before model fitting and impute the continuous variables during model fitting.

Any thoughts on which of the approaches might be best?

Solomon · March 12, 2025, 2:33pm

In this case, I’d do multiple imputation first, and then fit the model with brm_multiple().

CurtisAtkisson · March 12, 2025, 3:22pm

To clarify, @Solomon, would you do multiple imputation first on all variables, or just on the categorical variables, and then do imputation while fitting for the continuous ones?

Solomon · March 12, 2025, 3:46pm

I would impute using the full data set.

Bob_Carpenter · March 13, 2025, 8:12pm

By monotonic predictor variable, do you mean there’s a vector of increasing values and one of them is missing, like

ordered[5] x = [1.2, 3.7, 9.143, 14.2, 103]';

but say the second value is missing. Then you can code this in Stan as:

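data {
  // Plain vector, so the placeholder in x_obs[2] need not respect the ordering.
  vector[5] x_obs;  // x_obs[2] is missing, so it can be NaN or any placeholder on input
  ...
}

parameters {
  real<lower=x_obs[1], upper=x_obs[3]> x_obs_2;  // missing value, bounded by its neighbors
  ...
}

transformed parameters {
  ordered[5] x = x_obs;  // copy the observed values
  x[2] = x_obs_2;        // overwrite the placeholder with the parameter
  ...
}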

This implicitly gives x_obs_2 a uniform prior between its lower and upper bounds. You can replace that with another prior if you want this to behave more like imputation. If the missing value is at a boundary, it gets only one bound: an upper bound if it is the first element, or a lower bound if it is the last.
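
For concreteness, here is a minimal sketch of how this might look as a complete program with a non-uniform prior on the missing value. It assumes the predictor is continuous and that exactly one interior value is missing. The names N, k_mis, y, alpha, beta, and sigma, the linear regression on y, and all prior choices are hypothetical and are there only to make the example self-contained.

```stan
data {
  int<lower=3> N;                   // number of ordered predictor values (hypothetical)
  vector[N] x_obs;                  // observed values; x_obs[k_mis] is an ignored placeholder
  int<lower=2, upper=N - 1> k_mis;  // index of the missing entry (interior positions only)
  vector[N] y;                      // hypothetical outcome, one per predictor value
}
parameters {
  // The missing value is constrained to lie between its observed neighbors.
  real<lower=x_obs[k_mis - 1], upper=x_obs[k_mis + 1]> x_mis;
  real alpha;
  real beta;
  real<lower=0> sigma;
}
transformed parameters {
  vector[N] x = x_obs;  // copy the observed values
  x[k_mis] = x_mis;     // replace the placeholder with the parameter
}
model {
  // Assumed prior: centered between the neighbors, replacing the implicit uniform.
  // The bounds are data, so no truncation normalization is needed.
  x_mis ~ normal(0.5 * (x_obs[k_mis - 1] + x_obs[k_mis + 1]),
                 0.25 * (x_obs[k_mis + 1] - x_obs[k_mis - 1]));
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  sigma ~ exponential(1);
  y ~ normal(alpha + beta * x, sigma);
}
```

Dropping the x_mis sampling statement recovers the implicit uniform prior described above. A missing first or last element would need only an upper or only a lower bound on x_mis, so the declaration would have to change for those cases.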
