Losing observations when modelling missing values

Hi! I’m attempting to fit multiple outcomes with missing values using the mi() terms in brms. However, when fitting these models, the number of observations reported in the fit does not seem to reflect the number of observations after mi() took effect. In fact, after checking, it seems to count only the rows where all outcome variables (y3, y2, y1) are non-missing. Is this an issue with my specification (using “:”) or is it intended?

Thanks

formula <- bf(y3 ~ mi(y2):dummy2 + mi(y1):dummy1 + s(time)) +
   bf(y2 | mi() ~ mi(y1) + s(time)) +
   bf(y1 | mi() ~ X + s(time))

Please also provide the following information in addition to your question:

  • Operating System: RStudio on a multi-core cluster computer
  • brms Version: 2.2.0

Does brms warn about excluding observations due to NAs?
Perhaps there are also other variables with NAs in the model that lead to the exclusion.
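
A quick way to check is to count the NAs per column of the data you pass to brm(); a minimal sketch, assuming the data frame is called data_sub:

# count missing values in every column of the data passed to brm()
na_counts <- colSums(is.na(data_sub))

# show only the variables that actually contain NAs
na_counts[na_counts > 0]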

Nope. The only warning messages are:

Warning messages:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results! 
We recommend running more iterations and/or setting stronger priors. 
2: There were 4 divergent transitions after warmup. Increasing adapt_delta above 0.85 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Perhaps you accidentally removed the NAs before passing the data to brms?

If not, please provide a minimal reproducible example for me to try out.

Hi Paul, thanks for the quick reply. I’ve rerun the model with my full dataset and a subset, and I did find that after sampling there is a warning about observations being excluded due to NAs. However, this is strange, as the subset only has missing values in the variables that are to be imputed. Please find attached that same subset along with the brm code I’m using.

I realize the number of iterations is low, but you’ll see that even these three smaller chains take about 40 minutes each. The major issue remains: while this subset of my data has over 3,500 observations, the model output only shows 37 observations, which is also reflected in fit$data.
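
For reference, a rough sketch of the check I’m running (data_sub is the attached subset, fit is the object returned by brm()):

nrow(data_sub)                 # over 3,500 rows in the subset passed to brm()
nrow(fit$data)                 # only 37 rows end up in the fitted model
sum(complete.cases(data_sub))  # rows without an NA in any column, for comparison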

Really appreciate your help and insights.
Matt

formula <- bf(ln_value_pc ~ mi(ln_hc_lev_1_total_pc) * hc_lev_1 + mi(ln_hp_lev_1_total_pc) * hp_lev_1 + mi(ln_hc_lev_2_total_pc) * hc_lev_2 + mi(ln_hp_lev_2_total_pc) * hp_lev_2 + mi(ln_hc_lev_3_total_pc) * hc_lev_3 + mi(ln_hp_lev_3_total_pc) * hp_lev_3 + mi(ln_che_pc) * che_lev + s(year, bs = "ps")) + 
bf(ln_hc_lev_3_total_pc | mi() ~ mi(ln_hc_lev_2_total_pc):sha_hc_lev3_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hp_lev_3_total_pc | mi() ~ mi(ln_hp_lev_2_total_pc):sha_hp_lev3_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hc_lev_2_total_pc | mi() ~ mi(ln_hc_lev_1_total_pc):sha_hc_lev2_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hp_lev_2_total_pc | mi() ~ mi(ln_hp_lev_1_total_pc):sha_hp_lev2_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hc_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hc_lev1_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hp_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hp_lev1_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_che_pc | mi() ~ s(year, bs = "ps") + (1 | c_num) + ln_the_pc)

fit <- brm(formula, data = data_sub, chains = 4, iter = 300, warmup = 150, seed = 02152019, cores = 34, control = list(max_treedepth = 15, adapt_delta = 0.85))

Just for your quick reference, below is the summary of the fit from the above brm run.

Family: MV(gaussian, gaussian, gaussian, gaussian, gaussian, gaussian, gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: ln_value_pc ~ mi(ln_hc_lev_1_total_pc) * hc_lev_1 + mi(ln_hp_lev_1_total_pc) * hp_lev_1 + mi(ln_hc_lev_2_total_pc) * hc_lev_2 + mi(ln_hp_lev_2_total_pc) * hp_lev_2 + mi(ln_hc_lev_3_total_pc) * hc_lev_3 + mi(ln_hp_lev_3_total_pc) * hp_lev_3 + mi(ln_che_pc) * che_lev + s(year, bs = "ps") 
         ln_hc_lev_3_total_pc | mi() ~ mi(ln_hc_lev_2_total_pc):sha_hc_lev3_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hp_lev_3_total_pc | mi() ~ mi(ln_hp_lev_2_total_pc):sha_hp_lev3_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hc_lev_2_total_pc | mi() ~ mi(ln_hc_lev_1_total_pc):sha_hc_lev2_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hp_lev_2_total_pc | mi() ~ mi(ln_hp_lev_1_total_pc):sha_hp_lev2_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hc_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hc_lev1_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hp_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hp_lev1_num + s(year, bs = "ps") + (1 | c_num) 
         ln_che_pc | mi() ~ s(year, bs = "ps") + (1 | c_num) + ln_the_pc 
   Data: data_sub (Number of observations: 37) 
Samples: 4 chains, each with iter = 300; warmup = 150; thin = 1; 
         total post-warmup samples = 600
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Smooth Terms: 
                             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sds(lnvaluepc_syear_1)           0.13      0.15     0.00     0.52        186 1.01
sds(lnhclev3totalpc_syear_1)     0.26      0.15     0.07     0.64         38 1.09
sds(lnhplev3totalpc_syear_1)     0.03      0.04     0.00     0.13        241 1.01
sds(lnhclev2totalpc_syear_1)     0.07      0.05     0.03     0.17         37 1.08
sds(lnhplev2totalpc_syear_1)     0.05      0.06     0.00     0.22        173 1.01
sds(lnhclev1totalpc_syear_1)     0.01      0.02     0.00     0.07         31 1.12
sds(lnhplev1totalpc_syear_1)     0.21      0.07     0.12     0.37         98 1.04
sds(lnchepc_syear_1)             0.02      0.01     0.00     0.05        101 1.01

Group-Level Effects: 
~c_num (Number of levels: 1) 
                              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(lnhclev3totalpc_Intercept)     7.31      6.35     0.10    23.73        232 1.00
sd(lnhplev3totalpc_Intercept)     8.40      8.59     0.38    28.43        260 1.00
sd(lnhclev2totalpc_Intercept)     6.40      5.18     0.40    19.47        215 1.02
sd(lnhplev2totalpc_Intercept)     8.13      7.93     0.33    32.45        118 1.02
sd(lnhclev1totalpc_Intercept)     7.53      6.99     0.17    25.99        250 1.00
sd(lnhplev1totalpc_Intercept)    11.08      9.81     0.12    35.91        103 1.05
sd(lnchepc_Intercept)             7.73      7.28     0.14    28.40        158 1.03

Population-Level Effects: 
                                                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
lnvaluepc_Intercept                                       39.00    138.29  -150.69   308.95          3 3.20
lnhclev3totalpc_Intercept                                  8.04      5.28    -4.12    15.97        100 1.04
lnhplev3totalpc_Intercept                                 -0.28      5.86   -14.11     9.92         47 1.08
lnhclev2totalpc_Intercept                                  1.08      5.05    -8.76    11.88        105 1.01
lnhplev2totalpc_Intercept                                  0.02      6.34   -17.13    10.03         91 1.03
lnhclev1totalpc_Intercept                                  2.22      5.40    -9.99    12.80         78 1.06
lnhplev1totalpc_Intercept                                 -9.42     10.04   -24.78    10.53         57 1.10
lnchepc_Intercept                                          5.63      5.77    -7.86    15.68         54 1.10
lnvaluepc_hc_lev_1                                       -80.25     84.59  -233.64    63.33          3 3.58
lnvaluepc_hp_lev_1                                        18.22     61.40   -76.06   202.09          4 1.71
lnvaluepc_hc_lev_2                                        -0.50     87.49  -185.15   144.84          3 2.13
lnvaluepc_hp_lev_2                                       -13.52     90.81  -185.43   142.15          3 2.53
lnvaluepc_hc_lev_3                                       -10.58     98.83  -196.31   116.83          2 4.33
lnvaluepc_hp_lev_3                                       -29.89     60.25  -128.99    80.38          8 1.80
lnvaluepc_che_lev                                          2.30     36.07   -62.08    66.47          3 2.09
lnvaluepc_syear_1                                         -0.99     10.42   -19.77    18.60          5 1.38
lnhclev3totalpc_syear_1                                    6.72      0.31     6.12     7.36         52 1.08
lnhplev3totalpc_syear_1                                    1.15      0.30     0.49     1.74        440 1.01
lnhclev2totalpc_syear_1                                    0.99      0.26     0.42     1.50        166 1.02
lnhplev2totalpc_syear_1                                    1.70      0.28     1.20     2.33        317 1.01
lnhclev1totalpc_syear_1                                    1.28      0.25     0.79     1.75        214 1.01
lnhplev1totalpc_syear_1                                   -3.90      0.75    -5.24    -2.31        106 1.04
lnchepc_ln_the_pc                                         -0.26      0.16    -0.57     0.05        263 1.01
lnchepc_syear_1                                            1.45      0.15     1.16     1.74        265 1.00
lnvaluepc_miln_hc_lev_1_total_pc                          -4.28     11.88   -27.87    19.19          9 1.16
lnvaluepc_miln_hp_lev_1_total_pc                          -2.10      2.86    -8.05     3.43        124 1.03
lnvaluepc_miln_hc_lev_2_total_pc                           2.74      6.23   -11.90    14.00         34 1.21
lnvaluepc_miln_hp_lev_2_total_pc                           8.55     16.83   -19.63    44.89          4 2.59
lnvaluepc_miln_hc_lev_3_total_pc                           1.37      0.14     1.10     1.64        281 1.01
lnvaluepc_miln_hp_lev_3_total_pc                          -6.20     16.81   -42.80    21.63          4 2.58
lnvaluepc_miln_che_pc                                     35.03     57.67   -49.64   128.91          3 3.14
lnvaluepc_miln_hc_lev_1_total_pc:hc_lev_1                 30.25     82.05  -148.65   195.88          5 2.25
lnvaluepc_miln_hp_lev_1_total_pc:hp_lev_1                 -6.83     23.92   -46.23    42.14          7 1.98
lnvaluepc_miln_hc_lev_2_total_pc:hc_lev_2                 55.92     92.04   -74.53   262.06          2 2.92
lnvaluepc_miln_hp_lev_2_total_pc:hp_lev_2                 -6.37     58.04  -167.19    90.81          4 2.03
lnvaluepc_miln_hc_lev_3_total_pc:hc_lev_3                 -4.58     63.79  -148.86    70.82          3 3.07
lnvaluepc_miln_hp_lev_3_total_pc:hp_lev_3                -12.97     67.57  -148.93   118.21          4 2.90
lnvaluepc_miln_che_pc:che_lev                            -33.55     56.10  -132.15    52.73          3 3.00
lnhclev3totalpc_miln_hc_lev_2_total_pc:sha_hc_lev3_num    -1.09      0.01    -1.11    -1.07        600 1.01
lnhplev3totalpc_miln_hp_lev_2_total_pc:sha_hp_lev3_num     0.02      0.02    -0.01     0.06        404 1.01
lnhclev2totalpc_miln_hc_lev_1_total_pc:sha_hc_lev2_num     0.05      0.07    -0.08     0.19        140 1.03
lnhplev2totalpc_miln_hp_lev_1_total_pc:sha_hp_lev2_num    -0.04      0.02    -0.06    -0.00        520 1.01
lnhclev1totalpc_miln_che_pc:sha_hc_lev1_num                0.17      0.20    -0.20     0.55        216 1.00
lnhplev1totalpc_miln_che_pc:sha_hp_lev1_num                1.53      0.22     1.05     1.94        107 1.04

Family Specific Parameters: 
                      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma_lnvaluepc           1.13      0.15     0.87     1.46        227 1.01
sigma_lnhclev3totalpc     0.08      0.01     0.05     0.11         60 1.06
sigma_lnhplev3totalpc     0.27      0.04     0.21     0.35        600 1.00
sigma_lnhclev2totalpc     0.01      0.00     0.01     0.01        119 1.04
sigma_lnhplev2totalpc     0.26      0.03     0.20     0.33        266 1.00
sigma_lnhclev1totalpc     0.01      0.00     0.01     0.02        105 1.04
sigma_lnhplev1totalpc     0.03      0.00     0.02     0.04        234 1.01
sigma_lnchepc             0.01      0.00     0.01     0.01        135 1.01

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning messages:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results! 
We recommend running more iterations and/or setting stronger priors. 
2: There were 197 divergent transitions after warmup. Increasing adapt_delta above 0.85 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

data_subset.r (624.4 KB)
 

Making sure the dataset is here for my above comment.

data_subset.r (681.1 KB)

The data format seems to be invalid. In any case, I would expect that you forgot to wrap some variable in mi() although it contains missing values. If not, please try to provide a minimal reproducible example of your problem that strips away all the complexity of your model that is unrelated to the problem. Otherwise, it’s very hard to diagnose what the problem is.

Apologies for the complex model. I’ve created a more minimal version below and have attached a CSV of the data.


formula <- bf(ln_value_pc ~ mi(ln_hc_lev_3_total_pc) + mi(ln_hp_lev_3_total_pc) + mi(ln_che_pc)) + 
   bf(ln_hc_lev_3_total_pc | mi() ~ mi(ln_che_pc)*sha_hc_lev3_num + (1 | c_num)) +
   bf(ln_hp_lev_3_total_pc | mi() ~ mi(ln_che_pc)*sha_hp_lev3_num + (1 | c_num)) +
   bf(ln_che_pc | mi() ~ (1 | c_num) + ln_the_pc)

fit <- brm(formula, data = data_sub, chains = 1, iter = 300, warmup = 150, seed = 02152019, cores = c, control = list(max_treedepth = 15, adapt_delta = 0.85))
 

data_subset.csv (486.1 KB)

Your example works fine for me and returns 3625 observations, just as desired. Perhaps updating brms to the latest version could help, as I see you are using an outdated version that is a year old.
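
For example, something along these lines should be enough, assuming you can install packages on your cluster (the GitHub route requires the remotes package):

install.packages("brms")
# or the development version from GitHub:
# remotes::install_github("paul-buerkner/brms")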

Thanks, Paul. Yes, I spent many hours yesterday trying to figure out why brms 2.8.0 wouldn’t install on R 3.5.0, but I finally got an updated version of R on my cluster and am now working with your latest GitHub version. Thanks so much for verifying that the model works for you.