Inconsistent behaviour of model that includes measurement error term for a predictor

Hello,
I am analysing cross-sectional time-series aggregate data that summarise the annual effect of an environmental exposure (EE) on body-weight (bwt) in large groups of people (300-30000, median 1000). The same data are available as either the mean bwt (which I call xbwt) for each entire group, or as the number of people who exceed a pre-set bwt threshold in each group (which I call tbwt).

The environmental exposure is measured with error (ee.err) and I wish to account for this measurement error. So, I included me(EE, ee.err) in my analyses. The analyses also covary other aggregate population indices (ethnicity, income, etc) and random effects of large geographical regions and smaller socioeconomic areas (within regions) and I weight them by the inverse probability (ipwt) of receiving exposure, for each group.

My code for the negative binomial analysis is:-
mod1=brm(tbwt|weights(ipwt) ~ year+s(region,bs=‘mrf’,xt=list(polys=geom))
+(1|area)+ethnicity+income… etc… +me(EE,ee.err)+offset(log(N)), family=negbinomial(log))

My code for the analogous gaussian analysis of the aggregate mean bwt (xbwt) is:-
mod2=brm(xbwt|weights(ipwt) ~ year+s(region,bs=‘mrf’,xt=list(polys=geom))
+(1|area)+ethnicity+income… etc… +me(EE,ee.err), family=gaussian)

mod2 has the problem outlined below - but mod2a runs well:-
mod2a=brm(xbwt|weights(ipwt) ~ year+s(region,bs=‘mrf’,xt=list(polys=geom))
+(1|area)+ethnicity+income… etc… +EE, family=gaussian)

I run all the above models with:-
bwprior=c(set_prior(‘normal(0,1)’,class=‘b’),set_prior(‘normal(0,ee.err)’,class=‘meanme’))
brm(model, data = dat, chains = 4, cores = 4, iter=2500, prior=bwprior, future=F, control=list(adapt_delta=.95, max_treedepth = 15), inits=0, seed=123, save_mevars=T)

mod1 runs quickly (about an hour) and efficiently (all Rhats = 1.00 and all effective sample sizes >1000).
In contrast, mod2 runs slowly (3-8 hours) and extremely inefficiently (most Rhats>100 and most effective sample sizes<10).
mod2a runs as quickly and efficiently as the negative binomial model.

I am puzzled. If anything, I would have expected including the me(EE,ee.err) term to affect the negative binomial model, not the normal model.

Could you please explain what is happening here?

With many thanks, in anticipation of your help
Jonathan Williams

R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] brms_2.8.0 Rcpp_1.0.1

loaded via a namespace (and not attached):
[1] mvtnorm_1.0-8 lattice_0.20-35 zoo_1.8-5 gtools_3.5.0
[5] assertthat_0.2.0 digest_0.6.12 mime_0.5 R6_2.2.2
[9] plyr_1.8.4 backports_1.1.2 ggridges_0.5.0 stats4_3.4.3
[13] coda_0.19-1 ggplot2_2.2.1 colourpicker_1.0 pillar_1.3.1
[17] rlang_0.4.0 lazyeval_0.2.0 miniUI_0.1.1 Matrix_1.2-12
[21] DT_0.2 shinythemes_1.1.1 shinyjs_0.9.1 stringr_1.4.0
[25] htmlwidgets_1.3 loo_2.1.0 igraph_1.1.2 munsell_0.5.0
[29] shiny_1.0.5 compiler_3.4.3 httpuv_1.3.5 rstan_2.17.3
[33] pkgconfig_2.0.1 base64enc_0.1-3 rstantools_1.4.0 htmltools_0.3.6
[37] tidyselect_0.2.5 tibble_2.1.3 gridExtra_2.3 threejs_0.3.1
[41] matrixStats_0.52.2 crayon_1.3.4 dplyr_0.8.3 grid_3.4.3
[45] nlme_3.1-131 xtable_1.8-2 gtable_0.2.0 magrittr_1.5
[49] StanHeaders_2.17.2 scales_1.0.0 stringi_1.2.4 reshape2_1.4.2
[53] dygraphs_1.1.1.4 xts_0.11-2 tools_3.4.3 glue_1.3.0
[57] shinystan_2.4.0 markdown_0.8 purrr_0.3.2 crosstalk_1.0.0
[61] rsconnect_0.8.5 abind_1.4-5 parallel_3.4.3 inline_0.3.14
[65] colorspace_1.3-2 bridgesampling_0.4-0 bayesplot_1.5.0 Brobdingnag_1.2-4

Negative binomial models are hard to fit and sometimes this family shows this weird behavior of not wanting to converge. You can usually improve convergence drastically by specifying weakly informative priors (see ?set_prior) or by rescaling your predictor variables (e.g., standardizing) so that they are roughly on some unit scale. If you variables have very big scales, this may create problems for log-link models especially something like negbinomial.

If it still fails, you may also try a poisson distribution and add a random effect with as many levels as observations in the data. An example for this is given in the epilepsy data set that is shipped with brms.

Thanks - but in fact the negbinomial model runs well. It is the model with family=gaussian that shows the problem (see the original text). That’s why I am puzzled.

I am sorry, I read what I expected to read, now I am just as puzzled as you…

One problem that could have been happening is that the model predicts so well
when including the me() term that the residual SD sigma gets very close to 0,
which creates a posterior which is hard to sample from. So basically, it could be that
the model is too flexible. I don’t know exactly why this doesn’t affect the negbinomial
model though.

What does happen if you reduce the model complexity in some other way but keep the me()
term in the model? For instance remove the spatial smooth of region or some other covariates.

Can you explain why you set the prior for the meanme to ‘normal(0,ee.err)’? This shouldn’t
actually be working because ee.err is no variable in stan unless you pass it via stanvars. And even
then I am not sure why the meanme value should have the same SE as the individual values.
Perhaps I am misunderstanding something here?

Yes, your first explanation is correct: the model predicts to well when including the me() term that the residual SD is 0.00. If I halve the value of ee.err, then all is well, even though the model retains the spatial smooth of region.

The only difficulty is that I computed the value of ee.err in a separate analysis and it is likely that in fact the value that I included in the original analyses (that predicted too well) is itself an under-estimate of the true value. So, I wonder if there is a way to prevent the residual SD approaching 0.00?

Thanks again for your help

You can try to specify a prior on sigma that keeps it away from zero. For example, well specified truncated normal prior or a (inv-)gamma prior could do this job. Basically, you can play around with these distributions in R to see which looks like a reasonable choice to keep sigma away from fully approaching zero.