I used brms to test if there are differences in microbiome diversity (Shannon index) between spiders collected in natural and urban environments. Since the two environments have unequal variances, I also enable sigma to vary among the two.
I ran the following lognormal model with the default priors for now :
form3 <- brmsformula(Shannon ~ 0 + sample_env + (1 | location), sigma ~ 0 + sample_env) fit3 <- brm(formula = form3, family = lognormal(), #prior = priors3, iter = 1500, warmup = 500, thin = 4, chains = 4, seed = 123, control = list(adapt_delta = 0.99), save_pars = save_pars(all = TRUE), sample_prior = TRUE, data = data)
which returns the following summary :
Family: lognormal Links: mu = identity; sigma = log Formula: Shannon ~ 0 + sample_env + (1 | location) sigma ~ 0 + sample_env Data: data (Number of observations: 14) Draws: 4 chains, each with iter = 1500; warmup = 500; thin = 4; total post-warmup draws = 1000 Group-Level Effects: ~location (Number of levels: 4) Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sd(Intercept) 1.27 1.17 0.07 4.06 1.00 713 751 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS sample_envdesert -1.79 1.86 -5.27 1.95 1.00 930 865 sample_envurban -0.41 1.18 -2.90 2.06 1.01 857 778 sigma_sample_envdesert 1.12 0.33 0.54 1.83 1.00 913 881 sigma_sample_envurban -0.22 0.34 -0.78 0.54 1.01 729 780 Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).
When I plot the conditional effects, I see that the model predicts extremely large values, for instance, 329910.8 (i.e. exp(12.70658)) when my raw data does not go higher than 1.77. However, it does so only for a handful of simulated posterior values. I found that 72 posterior observations had values (back transformed with exp()) higher than 5, which is roughly 3.6% of the whole posterior values.
I also assume that it may explain why the mean diversity for each sampling location is quite high, which seems unlikely since the raw data is low as I have stated above (between 0 and 1.8).
I would highly appreciate if someone could provide some guidance in how to understand the problem. Would specifying some priors on the group means help?
Thank you very much
Here is my session info
R version 4.1.3 (2022-03-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044) Matrix products: default locale:  LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252  LC_NUMERIC=C LC_TIME=English_Canada.1252 attached base packages:  stats graphics grDevices datasets utils methods base other attached packages:  data.table_1.14.2 ggplot2_3.3.6 phyloseq_1.38.0 brms_2.17.0 Rcpp_1.0.9 loaded via a namespace (and not attached):  colorspace_2.0-3 ellipsis_0.3.2 ggridges_0.5.3 markdown_1.1 XVector_0.34.0  base64enc_0.1-3 farver_2.1.1 rstan_2.21.5 DT_0.23 fansi_1.0.3  mvtnorm_1.1-3 bridgesampling_1.1-2 codetools_0.2-18 splines_4.1.3 shinythemes_1.2.0  bayesplot_1.9.0 ade4_1.7-19 jsonlite_1.8.0 cluster_2.1.3 shiny_1.7.2  compiler_4.1.3 backports_1.4.1 Matrix_1.4-1 fastmap_1.1.0 cli_3.3.0  later_1.3.0 htmltools_0.5.3 prettyunits_1.1.1 tools_4.1.3 igraph_1.3.2  coda_0.19-4 gtable_0.3.0 glue_1.6.2 GenomeInfoDbData_1.2.7 reshape2_1.4.4  dplyr_1.0.9 posterior_1.2.2 Biobase_2.54.0 vctrs_0.4.1 Biostrings_2.62.0  rhdf5filters_1.6.0 multtest_2.50.0 ape_5.6-2 nlme_3.1-158 iterators_1.0.14  crosstalk_1.2.0 tensorA_0.36.2 stringr_1.4.0 ps_1.7.1 mime_0.12  miniUI_0.1.1.1 lifecycle_1.0.1 renv_0.15.5 gtools_3.9.3 zlibbioc_1.40.0  MASS_7.3-58 zoo_1.8-10 scales_1.2.0 colourpicker_1.1.1 promises_220.127.116.11  Brobdingnag_1.2-7 parallel_4.1.3 biomformat_1.22.0 rhdf5_2.38.1 inline_0.3.19  shinystan_2.6.0 gridExtra_2.3 loo_2.5.1 StanHeaders_2.21.0-7 stringi_1.7.8  S4Vectors_0.32.4 dygraphs_18.104.22.168 foreach_1.5.2 checkmate_2.1.0 permute_0.9-7  BiocGenerics_0.40.0 pkgbuild_1.3.1 GenomeInfoDb_1.30.1 rlang_1.0.4 pkgconfig_2.0.3  matrixStats_0.62.0 bitops_1.0-7 distributional_0.3.0 lattice_0.20-45 purrr_0.3.4  Rhdf5lib_1.16.0 labeling_0.4.2 rstantools_2.2.0 htmlwidgets_1.5.4 processx_3.7.0  tidyselect_1.1.2 plyr_1.8.7 magrittr_2.0.3 R6_2.5.1 IRanges_2.28.0  generics_0.1.3 DBI_1.1.3 withr_2.5.0 pillar_1.8.0 mgcv_1.8-40  xts_0.12.1 survival_3.3-1 abind_1.4-5 RCurl_1.98-1.7 tibble_3.1.7  crayon_1.5.1 utf8_1.2.2 grid_4.1.3 callr_3.7.1 vegan_2.6-2  threejs_0.3.3 digest_0.6.29 xtable_1.8-4 httpuv_1.6.5 RcppParallel_5.1.5  stats4_4.1.3 munsell_0.5.0 shinyjs_2.1.0