Losing observations when modelling missing values

Hi! I’m attempting to fit multiple outcomes with missing values using the mi() terms in brms. However, when fitting these models, the number of observations reported in the fit does not seem to reflect the number of observations after mi() took effect. In fact, after checking, it seems to count only the rows where all outcome variables (y3, y2, y1) are non-missing. Is this an issue with my specification (using “:”) or is it intended?

Thanks

formula <- bf(y3 ~ mi(y2):dummy2 + mi(y1):dummy1 + s(time)) +
   bf(y2 | mi() ~ mi(y1) + s(time)) +
   bf(y1 | mi() ~ X + s(time))

Please also provide the following information in addition to your question:

  • Operating System: RStudio on a multi-core cluster computer
  • brms Version: 2.2.0

Does brms warn about excluding observations due to NAs?
Perhaps there are also other variables with NAs in the model that lead to the exclusion.
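
A quick way to check is to count the NAs per column of the data you pass to brm(); a minimal sketch, assuming the data frame is called data_sub:

# count missing values in every column of the data passed to brm()
na_counts <- colSums(is.na(data_sub))

# show only the variables that actually contain NAs
na_counts[na_counts > 0]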

Nope. The only warning messages are:

Warning messages:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results! 
We recommend running more iterations and/or setting stronger priors. 
2: There were 4 divergent transitions after warmup. Increasing adapt_delta above 0.85 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Perhaps you accidentally removed the NAs before passing the data to brms?

If not, please provide a minimal reproducible example for me to try out.

Hi Paul, thanks for the quick reply. I’ve rerun the model with my full dataset and a subset, and I did find that after sampling there is a warning about observations being excluded due to NAs. However, this is strange, as the subset only has missing values in the variables that are to be imputed. Please find attached that same subset along with the brm code I’m using.

I realize the number of iterations is low, but you’ll see that even these three smaller chains take about 40 minutes each. The major issue remains: while this subset of my data has over 3,500 observations, the model output only shows 37 observations, which is also reflected in fit$data.
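
For reference, a rough sketch of the check I’m running (data_sub is the attached subset, fit is the object returned by brm()):

nrow(data_sub)                 # over 3,500 rows in the subset passed to brm()
nrow(fit$data)                 # only 37 rows end up in the fitted model
sum(complete.cases(data_sub))  # rows without an NA in any column, for comparison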

Really appreciate your help and insights.
Matt

formula <- bf(ln_value_pc ~ mi(ln_hc_lev_1_total_pc) * hc_lev_1 + mi(ln_hp_lev_1_total_pc) * hp_lev_1 + mi(ln_hc_lev_2_total_pc) * hc_lev_2 + mi(ln_hp_lev_2_total_pc) * hp_lev_2 + mi(ln_hc_lev_3_total_pc) * hc_lev_3 + mi(ln_hp_lev_3_total_pc) * hp_lev_3 + mi(ln_che_pc) * che_lev + s(year, bs = "ps")) + 
bf(ln_hc_lev_3_total_pc | mi() ~ mi(ln_hc_lev_2_total_pc):sha_hc_lev3_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hp_lev_3_total_pc | mi() ~ mi(ln_hp_lev_2_total_pc):sha_hp_lev3_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hc_lev_2_total_pc | mi() ~ mi(ln_hc_lev_1_total_pc):sha_hc_lev2_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hp_lev_2_total_pc | mi() ~ mi(ln_hp_lev_1_total_pc):sha_hp_lev2_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hc_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hc_lev1_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_hp_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hp_lev1_num + s(year, bs = "ps") + (1 | c_num)) +
bf(ln_che_pc | mi() ~ s(year, bs = "ps") + (1 | c_num) + ln_the_pc)

fit <- brm(formula, data = data_sub, chains = 4, iter = 300, warmup = 150, seed = 02152019, cores = 34, control = list(max_treedepth = 15, adapt_delta = 0.85))

Just for your quick reference, below is the summary of the fit from the above brm run.

Family: MV(gaussian, gaussian, gaussian, gaussian, gaussian, gaussian, gaussian, gaussian) 
  Links: mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity
         mu = identity; sigma = identity 
Formula: ln_value_pc ~ mi(ln_hc_lev_1_total_pc) * hc_lev_1 + mi(ln_hp_lev_1_total_pc) * hp_lev_1 + mi(ln_hc_lev_2_total_pc) * hc_lev_2 + mi(ln_hp_lev_2_total_pc) * hp_lev_2 + mi(ln_hc_lev_3_total_pc) * hc_lev_3 + mi(ln_hp_lev_3_total_pc) * hp_lev_3 + mi(ln_che_pc) * che_lev + s(year, bs = "ps") 
         ln_hc_lev_3_total_pc | mi() ~ mi(ln_hc_lev_2_total_pc):sha_hc_lev3_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hp_lev_3_total_pc | mi() ~ mi(ln_hp_lev_2_total_pc):sha_hp_lev3_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hc_lev_2_total_pc | mi() ~ mi(ln_hc_lev_1_total_pc):sha_hc_lev2_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hp_lev_2_total_pc | mi() ~ mi(ln_hp_lev_1_total_pc):sha_hp_lev2_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hc_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hc_lev1_num + s(year, bs = "ps") + (1 | c_num) 
         ln_hp_lev_1_total_pc | mi() ~ mi(ln_che_pc):sha_hp_lev1_num + s(year, bs = "ps") + (1 | c_num) 
         ln_che_pc | mi() ~ s(year, bs = "ps") + (1 | c_num) + ln_the_pc 
   Data: data_sub (Number of observations: 37) 
Samples: 4 chains, each with iter = 300; warmup = 150; thin = 1; 
         total post-warmup samples = 600
    ICs: LOO = NA; WAIC = NA; R2 = NA
 
Smooth Terms: 
                             Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sds(lnvaluepc_syear_1)           0.13      0.15     0.00     0.52        186 1.01
sds(lnhclev3totalpc_syear_1)     0.26      0.15     0.07     0.64         38 1.09
sds(lnhplev3totalpc_syear_1)     0.03      0.04     0.00     0.13        241 1.01
sds(lnhclev2totalpc_syear_1)     0.07      0.05     0.03     0.17         37 1.08
sds(lnhplev2totalpc_syear_1)     0.05      0.06     0.00     0.22        173 1.01
sds(lnhclev1totalpc_syear_1)     0.01      0.02     0.00     0.07         31 1.12
sds(lnhplev1totalpc_syear_1)     0.21      0.07     0.12     0.37         98 1.04
sds(lnchepc_syear_1)             0.02      0.01     0.00     0.05        101 1.01

Group-Level Effects: 
~c_num (Number of levels: 1) 
                              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(lnhclev3totalpc_Intercept)     7.31      6.35     0.10    23.73        232 1.00
sd(lnhplev3totalpc_Intercept)     8.40      8.59     0.38    28.43        260 1.00
sd(lnhclev2totalpc_Intercept)     6.40      5.18     0.40    19.47        215 1.02
sd(lnhplev2totalpc_Intercept)     8.13      7.93     0.33    32.45        118 1.02
sd(lnhclev1totalpc_Intercept)     7.53      6.99     0.17    25.99        250 1.00
sd(lnhplev1totalpc_Intercept)    11.08      9.81     0.12    35.91        103 1.05
sd(lnchepc_Intercept)             7.73      7.28     0.14    28.40        158 1.03

Population-Level Effects: 
                                                       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
lnvaluepc_Intercept                                       39.00    138.29  -150.69   308.95          3 3.20
lnhclev3totalpc_Intercept                                  8.04      5.28    -4.12    15.97        100 1.04
lnhplev3totalpc_Intercept                                 -0.28      5.86   -14.11     9.92         47 1.08
lnhclev2totalpc_Intercept                                  1.08      5.05    -8.76    11.88        105 1.01
lnhplev2totalpc_Intercept                                  0.02      6.34   -17.13    10.03         91 1.03
lnhclev1totalpc_Intercept                                  2.22      5.40    -9.99    12.80         78 1.06
lnhplev1totalpc_Intercept                                 -9.42     10.04   -24.78    10.53         57 1.10
lnchepc_Intercept                                          5.63      5.77    -7.86    15.68         54 1.10
lnvaluepc_hc_lev_1                                       -80.25     84.59  -233.64    63.33          3 3.58
lnvaluepc_hp_lev_1                                        18.22     61.40   -76.06   202.09          4 1.71
lnvaluepc_hc_lev_2                                        -0.50     87.49  -185.15   144.84          3 2.13
lnvaluepc_hp_lev_2                                       -13.52     90.81  -185.43   142.15          3 2.53
lnvaluepc_hc_lev_3                                       -10.58     98.83  -196.31   116.83          2 4.33
lnvaluepc_hp_lev_3                                       -29.89     60.25  -128.99    80.38          8 1.80
lnvaluepc_che_lev                                          2.30     36.07   -62.08    66.47          3 2.09
lnvaluepc_syear_1                                         -0.99     10.42   -19.77    18.60          5 1.38
lnhclev3totalpc_syear_1                                    6.72      0.31     6.12     7.36         52 1.08
lnhplev3totalpc_syear_1                                    1.15      0.30     0.49     1.74        440 1.01
lnhclev2totalpc_syear_1                                    0.99      0.26     0.42     1.50        166 1.02
lnhplev2totalpc_syear_1                                    1.70      0.28     1.20     2.33        317 1.01
lnhclev1totalpc_syear_1                                    1.28      0.25     0.79     1.75        214 1.01
lnhplev1totalpc_syear_1                                   -3.90      0.75    -5.24    -2.31        106 1.04
lnchepc_ln_the_pc                                         -0.26      0.16    -0.57     0.05        263 1.01
lnchepc_syear_1                                            1.45      0.15     1.16     1.74        265 1.00
lnvaluepc_miln_hc_lev_1_total_pc                          -4.28     11.88   -27.87    19.19          9 1.16
lnvaluepc_miln_hp_lev_1_total_pc                          -2.10      2.86    -8.05     3.43        124 1.03
lnvaluepc_miln_hc_lev_2_total_pc                           2.74      6.23   -11.90    14.00         34 1.21
lnvaluepc_miln_hp_lev_2_total_pc                           8.55     16.83   -19.63    44.89          4 2.59
lnvaluepc_miln_hc_lev_3_total_pc                           1.37      0.14     1.10     1.64        281 1.01
lnvaluepc_miln_hp_lev_3_total_pc                          -6.20     16.81   -42.80    21.63          4 2.58
lnvaluepc_miln_che_pc                                     35.03     57.67   -49.64   128.91          3 3.14
lnvaluepc_miln_hc_lev_1_total_pc:hc_lev_1                 30.25     82.05  -148.65   195.88          5 2.25
lnvaluepc_miln_hp_lev_1_total_pc:hp_lev_1                 -6.83     23.92   -46.23    42.14          7 1.98
lnvaluepc_miln_hc_lev_2_total_pc:hc_lev_2                 55.92     92.04   -74.53   262.06          2 2.92
lnvaluepc_miln_hp_lev_2_total_pc:hp_lev_2                 -6.37     58.04  -167.19    90.81          4 2.03
lnvaluepc_miln_hc_lev_3_total_pc:hc_lev_3                 -4.58     63.79  -148.86    70.82          3 3.07
lnvaluepc_miln_hp_lev_3_total_pc:hp_lev_3                -12.97     67.57  -148.93   118.21          4 2.90
lnvaluepc_miln_che_pc:che_lev                            -33.55     56.10  -132.15    52.73          3 3.00
lnhclev3totalpc_miln_hc_lev_2_total_pc:sha_hc_lev3_num    -1.09      0.01    -1.11    -1.07        600 1.01
lnhplev3totalpc_miln_hp_lev_2_total_pc:sha_hp_lev3_num     0.02      0.02    -0.01     0.06        404 1.01
lnhclev2totalpc_miln_hc_lev_1_total_pc:sha_hc_lev2_num     0.05      0.07    -0.08     0.19        140 1.03
lnhplev2totalpc_miln_hp_lev_1_total_pc:sha_hp_lev2_num    -0.04      0.02    -0.06    -0.00        520 1.01
lnhclev1totalpc_miln_che_pc:sha_hc_lev1_num                0.17      0.20    -0.20     0.55        216 1.00
lnhplev1totalpc_miln_che_pc:sha_hp_lev1_num                1.53      0.22     1.05     1.94        107 1.04

Family Specific Parameters: 
                      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma_lnvaluepc           1.13      0.15     0.87     1.46        227 1.01
sigma_lnhclev3totalpc     0.08      0.01     0.05     0.11         60 1.06
sigma_lnhplev3totalpc     0.27      0.04     0.21     0.35        600 1.00
sigma_lnhclev2totalpc     0.01      0.00     0.01     0.01        119 1.04
sigma_lnhplev2totalpc     0.26      0.03     0.20     0.33        266 1.00
sigma_lnhclev1totalpc     0.01      0.00     0.01     0.02        105 1.04
sigma_lnhplev1totalpc     0.03      0.00     0.02     0.04        234 1.01
sigma_lnchepc             0.01      0.00     0.01     0.01        135 1.01

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
is a crude measure of effective sample size, and Rhat is the potential 
scale reduction factor on split chains (at convergence, Rhat = 1).
Warning messages:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results! 
We recommend running more iterations and/or setting stronger priors. 
2: There were 197 divergent transitions after warmup. Increasing adapt_delta above 0.85 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

data_subset.r (624.4 KB)
 

Making sure the dataset is here for my above comment.

data_subset.r (681.1 KB)

The data format seems to be invalid. In any case, I would expect that you forgot to wrap some variable in mi() although it contains missing values. If not, please try to provide a minimal reproducible example of your problem that strips away all the complexity of your model that is unrelated to the problem. Otherwise, it’s very hard to diagnose what the problem is.

Apologies for the complex model. I’ve created a more minimal version below and have attached a CSV of the data.


formula <- bf(ln_value_pc ~ mi(ln_hc_lev_3_total_pc) + mi(ln_hp_lev_3_total_pc) + mi(ln_che_pc)) + 
   bf(ln_hc_lev_3_total_pc | mi() ~ mi(ln_che_pc)*sha_hc_lev3_num + (1 | c_num)) +
   bf(ln_hp_lev_3_total_pc | mi() ~ mi(ln_che_pc)*sha_hp_lev3_num + (1 | c_num)) +
   bf(ln_che_pc | mi() ~ (1 | c_num) + ln_the_pc)

fit <- brm(formula, data = data_sub, chains = 1, iter = 300, warmup = 150, seed = 02152019, cores = c, control = list(max_treedepth = 15, adapt_delta = 0.85))
 

data_subset.csv (486.1 KB)

Your example works fine for me and returns 3625 observations, just as desired. Perhaps updating brms to the latest version could help, as I see you are using an outdated version that is a year old.
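
For example, something along these lines should be enough, assuming you can install packages on your cluster (the GitHub route requires the remotes package):

install.packages("brms")
# or the development version from GitHub:
# remotes::install_github("paul-buerkner/brms")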

Thanks, Paul. Yes, I spent many hours yesterday trying to figure out why brms 2.8.0 wouldn’t install on R 3.5.0, but I finally got an updated version of R on my cluster and am now working with your latest GitHub version. Thanks so much for verifying that the model works for you.