Hello all,
I used brms to test if there are differences in microbiome diversity (Shannon index) between spiders collected in natural and urban environments. Since the two environments have unequal variances, I also enable sigma to vary among the two.
I ran the following lognormal model with the default priors for now :
form3 <- brmsformula(Shannon ~ 0 + sample_env + (1 | location),
sigma ~ 0 + sample_env)
fit3 <- brm(formula = form3,
family = lognormal(),
#prior = priors3,
iter = 1500,
warmup = 500,
thin = 4,
chains = 4,
seed = 123,
control = list(adapt_delta = 0.99),
save_pars = save_pars(all = TRUE),
sample_prior = TRUE,
data = data)
which returns the following summary :
Family: lognormal
Links: mu = identity; sigma = log
Formula: Shannon ~ 0 + sample_env + (1 | location)
sigma ~ 0 + sample_env
Data: data (Number of observations: 14)
Draws: 4 chains, each with iter = 1500; warmup = 500; thin = 4;
total post-warmup draws = 1000
Group-Level Effects:
~location (Number of levels: 4)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.27 1.17 0.07 4.06 1.00 713 751
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sample_envdesert -1.79 1.86 -5.27 1.95 1.00 930 865
sample_envurban -0.41 1.18 -2.90 2.06 1.01 857 778
sigma_sample_envdesert 1.12 0.33 0.54 1.83 1.00 913 881
sigma_sample_envurban -0.22 0.34 -0.78 0.54 1.01 729 780
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
When I plot the conditional effects, I see that the model predicts extremely large values, for instance, 329910.8 (i.e. exp(12.70658)) when my raw data does not go higher than 1.77. However, it does so only for a handful of simulated posterior values. I found that 72 posterior observations had values (back transformed with exp()) higher than 5, which is roughly 3.6% of the whole posterior values.
I also assume that it may explain why the mean diversity for each sampling location is quite high, which seems unlikely since the raw data is low as I have stated above (between 0 and 1.8).
I have attached the raw data, the fitted model object, and the data produced by conditional_effects().
data.Rdata (675 Bytes)
model.Rdata (1.8 MB)
effects_data.Rdata (727 Bytes)
I would highly appreciate if someone could provide some guidance in how to understand the problem. Would specifying some priors on the group means help?
Thank you very much
Maxime
Here is my session info
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] data.table_1.14.2 ggplot2_3.3.6 phyloseq_1.38.0 brms_2.17.0 Rcpp_1.0.9
loaded via a namespace (and not attached):
[1] colorspace_2.0-3 ellipsis_0.3.2 ggridges_0.5.3 markdown_1.1 XVector_0.34.0
[6] base64enc_0.1-3 farver_2.1.1 rstan_2.21.5 DT_0.23 fansi_1.0.3
[11] mvtnorm_1.1-3 bridgesampling_1.1-2 codetools_0.2-18 splines_4.1.3 shinythemes_1.2.0
[16] bayesplot_1.9.0 ade4_1.7-19 jsonlite_1.8.0 cluster_2.1.3 shiny_1.7.2
[21] compiler_4.1.3 backports_1.4.1 Matrix_1.4-1 fastmap_1.1.0 cli_3.3.0
[26] later_1.3.0 htmltools_0.5.3 prettyunits_1.1.1 tools_4.1.3 igraph_1.3.2
[31] coda_0.19-4 gtable_0.3.0 glue_1.6.2 GenomeInfoDbData_1.2.7 reshape2_1.4.4
[36] dplyr_1.0.9 posterior_1.2.2 Biobase_2.54.0 vctrs_0.4.1 Biostrings_2.62.0
[41] rhdf5filters_1.6.0 multtest_2.50.0 ape_5.6-2 nlme_3.1-158 iterators_1.0.14
[46] crosstalk_1.2.0 tensorA_0.36.2 stringr_1.4.0 ps_1.7.1 mime_0.12
[51] miniUI_0.1.1.1 lifecycle_1.0.1 renv_0.15.5 gtools_3.9.3 zlibbioc_1.40.0
[56] MASS_7.3-58 zoo_1.8-10 scales_1.2.0 colourpicker_1.1.1 promises_1.2.0.1
[61] Brobdingnag_1.2-7 parallel_4.1.3 biomformat_1.22.0 rhdf5_2.38.1 inline_0.3.19
[66] shinystan_2.6.0 gridExtra_2.3 loo_2.5.1 StanHeaders_2.21.0-7 stringi_1.7.8
[71] S4Vectors_0.32.4 dygraphs_1.1.1.6 foreach_1.5.2 checkmate_2.1.0 permute_0.9-7
[76] BiocGenerics_0.40.0 pkgbuild_1.3.1 GenomeInfoDb_1.30.1 rlang_1.0.4 pkgconfig_2.0.3
[81] matrixStats_0.62.0 bitops_1.0-7 distributional_0.3.0 lattice_0.20-45 purrr_0.3.4
[86] Rhdf5lib_1.16.0 labeling_0.4.2 rstantools_2.2.0 htmlwidgets_1.5.4 processx_3.7.0
[91] tidyselect_1.1.2 plyr_1.8.7 magrittr_2.0.3 R6_2.5.1 IRanges_2.28.0
[96] generics_0.1.3 DBI_1.1.3 withr_2.5.0 pillar_1.8.0 mgcv_1.8-40
[101] xts_0.12.1 survival_3.3-1 abind_1.4-5 RCurl_1.98-1.7 tibble_3.1.7
[106] crayon_1.5.1 utf8_1.2.2 grid_4.1.3 callr_3.7.1 vegan_2.6-2
[111] threejs_0.3.3 digest_0.6.29 xtable_1.8-4 httpuv_1.6.5 RcppParallel_5.1.5
[116] stats4_4.1.3 munsell_0.5.0 shinyjs_2.1.0