Hello,
I am trying to fit poisson (or negative binomial) models with offset parameters in brms. I found that brms implements rate() which I found convenient.
When I used this with normal poisson regression, it worked. However, with zero_inflated variants, it throws an error. I assume this is just because how the model is parameterized.
Regular Poisson or negative binomial
- The following worked just fine.
f3 <- bf(n | rate(words) ~ 0 + c + d + cd + rid ,
c ~ 0 + Category ,
d ~ 0 + disciplinary_group,
cd ~ (0 + Category) : (0 + disciplinary_group),
rid ~ 1 + (1 |id) ,
nl = TRUE)
m4 <- brm(formula = f3,
family = negbinomial(link = "log", link_shape = "log"),
data = doc_small,
backend = 'cmdstanr',
prior = weak_priors,
cores = 8,
seed = 1234,
file = 'brm_models/example_nl4',
#sample_prior = "only"
)
Zero_inflated
- When I just apply the above formula with zero_inflated family, it says:
Error: Argument 'rate' is not supported for family 'zero_inflated_poisson(log)'.
So I am assuming that I need to use offset()
rather than rate()
for the Zero_inflated variants of the model.
m4.2 <- brm(formula = f3,
family = zero_inflated_poisson(link = "log", link_zi = "logit"),
data = doc_small,
backend = 'cmdstanr',
prior = weak_priors,
cores = 8,
seed = 1234,
file = 'brm_models/example_nl4.2',
#sample_prior = "only"
)
Refactoring the code
So I am trying to refactor the code with offset. But I was not sure where I am supposed to include the offset(of())
in the non-linear formulation. Does it go into every single “fixed” parameter…?
mf3 <- bf(n ~ 0 + c + d + cd + rid ,
c ~ 0 + Category + offset(log(words)),
d ~ 0 + disciplinary_group + offset(log(words)),
cd ~ (0 + Category) : (0 + disciplinary_group) + offset(log(words)),
rid ~ 1 + (1 |id) ,
nl = TRUE)
or should I include them in the random effects as well as the zi parameter too?
mf3_2 <- bf(n ~ 0 + c + d + cd + rid ,
c ~ 0 + Category + offset(log(words)),
d ~ 0 + disciplinary_group + offset(log(words)),
cd ~ (0 + Category) : (0 + disciplinary_group) + offset(log(words)),
rid ~ 1 + (1 |id) + offset(log(words)),
zi ~ 1 + offset(log(words)),
nl = TRUE)
I would really appreciate any insights on whether I am refactoring the model correct, or if there is any better way of implimenting the rate() in zero inflated variants of the poisson or negative binomial response distribution.
Note, that I also tried the following parameterization, and it did not compile the stan code. So, I am thinking how to include them to the non-linear formula.
> f4 <- bf(n ~ 0 + c + d + cd + rid + offset(log(words)),
+ c ~ 0 + Category ,
+ d ~ 0 + disciplinary_group,
+ cd ~ (0 + Category) : (0 + disciplinary_group),
+ rid ~ 1 + (1 |id) ,
+ zi ~ offset(log(words)),
+ nl = TRUE)
Warning: Rows containing NAs were excluded from the model.Semantic error in '/var/folders/zp/z590y19x4y7by4_bpngs84cr0000gn/T/RtmpSYh5QY/model_a9f42168efaf19a0019db2d4c4d31a35.stan', line 158, column 65 to column 84:
-------------------------------------------------
156: for (n in 1:N) {
157: // compute non-linear predictor values
158: mu[n] = 0 + nlp_c[n] + nlp_d[n] + nlp_cd[n] + nlp_rid[n] + offset(log(C_1[n]));
^
159: }
160: for (n in 1:N) {
-------------------------------------------------
A returning function was expected but an undeclared identifier 'offset' was supplied.
Error: Syntax error found! See the message above for more information.
Thanks for your help in advance!
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.2
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bayesplot_1.10.0 emmeans_1.8.1-1 tidybayes_3.0.3 brms_2.18.1 Rcpp_1.0.9 here_1.0.1
[7] forcats_0.5.2 stringr_1.4.1 dplyr_1.1.0 purrr_0.3.4 readr_2.1.3 tidyr_1.2.1
[13] tibble_3.1.8 ggplot2_3.4.1 tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] readxl_1.4.1 backports_1.4.1 plyr_1.8.7 igraph_1.3.5 splines_4.1.2
[6] svUnit_1.0.6 crosstalk_1.2.0 TH.data_1.1-1 rstantools_2.2.0 inline_0.3.19
[11] digest_0.6.30 htmltools_0.5.3 fansi_1.0.3 magrittr_2.0.3 checkmate_2.1.0
[16] googlesheets4_1.0.1 tzdb_0.3.0 modelr_0.1.9 RcppParallel_5.1.5 matrixStats_0.62.0
[21] vroom_1.6.0 xts_0.12.2 sandwich_3.0-2 prettyunits_1.1.1 colorspace_2.0-3
[26] rvest_1.0.3 ggdist_3.2.0 haven_2.5.1 xfun_0.33 callr_3.7.2
[31] crayon_1.5.2 jsonlite_1.8.2 survival_3.4-0 zoo_1.8-11 glue_1.6.2
[36] gtable_0.3.1 gargle_1.2.1 V8_4.2.1 distributional_0.3.1 pkgbuild_1.3.1
[41] rstan_2.26.13 abind_1.4-5 scales_1.2.1 mvtnorm_1.1-3 DBI_1.1.3
[46] miniUI_0.1.1.1 xtable_1.8-4 bit_4.0.4 stats4_4.1.2 StanHeaders_2.26.13
[51] DT_0.25 htmlwidgets_1.5.4 httr_1.4.4 threejs_0.3.3 arrayhelpers_1.1-0
[56] posterior_1.3.1 ellipsis_0.3.2 pkgconfig_2.0.3 loo_2.5.1 farver_2.1.1
[61] dbplyr_2.2.1 utf8_1.2.2 labeling_0.4.2 tidyselect_1.2.0 rlang_1.0.6
[66] reshape2_1.4.4 later_1.3.0 munsell_0.5.0 cellranger_1.1.0 tools_4.1.2
[71] cli_3.4.1 generics_0.1.3 broom_1.0.1 evaluate_0.16 fastmap_1.1.0
[76] yaml_2.3.6 bit64_4.0.5 processx_3.7.0 knitr_1.40 fs_1.5.2
[81] nlme_3.1-159 mime_0.12 xml2_1.3.3 compiler_4.1.2 shinythemes_1.2.0
[86] rstudioapi_0.14 curl_4.3.3 reprex_2.0.2 stringi_1.7.8 ps_1.7.1
[91] Brobdingnag_1.2-9 lattice_0.20-45 Matrix_1.5-1 markdown_1.1 shinyjs_2.1.0
[96] tensorA_0.36.2 vctrs_0.5.2 pillar_1.8.1 lifecycle_1.0.3.9000 bridgesampling_1.1-2
[101] estimability_1.4.1 data.table_1.14.4 httpuv_1.6.6 R6_2.5.1 promises_1.2.0.1
[106] gridExtra_2.3 codetools_0.2-18 colourpicker_1.1.1 MASS_7.3-58.1 gtools_3.9.3
[111] assertthat_0.2.1 rprojroot_2.0.3 withr_2.5.0 shinystan_2.6.0 multcomp_1.4-20
[116] parallel_4.1.2 hms_1.1.2 grid_4.1.2 coda_0.19-4 rmarkdown_2.16
[121] cmdstanr_0.5.3 googledrive_2.0.0 shiny_1.7.2 lubridate_1.8.0 base64enc_0.1-3
[126] dygraphs_1.1.1.6