How to specify rate() or offset in zero_inflated models using nl

Hello,

I am trying to fit poisson (or negative binomial) models with offset parameters in brms. I found that brms implements rate() which I found convenient.

When I used this with normal poisson regression, it worked. However, with zero_inflated variants, it throws an error. I assume this is just because how the model is parameterized.

Regular Poisson or negative binomial

  • The following worked just fine.
f3 <- bf(n | rate(words) ~ 0 + c + d + cd + rid ,
               c ~ 0 + Category ,
               d ~ 0 + disciplinary_group,
               cd ~ (0 + Category) : (0 + disciplinary_group),
               rid ~ 1 + (1 |id) ,
               nl = TRUE)

m4 <- brm(formula = f3,
          family = negbinomial(link = "log", link_shape = "log"),
          data = doc_small,
          backend = 'cmdstanr',
          prior = weak_priors,
          cores = 8,
          seed = 1234,
          file = 'brm_models/example_nl4',
          #sample_prior = "only"
          )

Zero_inflated

  • When I just apply the above formula with zero_inflated family, it says:
Error: Argument 'rate' is not supported for family 'zero_inflated_poisson(log)'.

So I am assuming that I need to use offset() rather than rate() for the Zero_inflated variants of the model.

m4.2 <- brm(formula = f3,
          family = zero_inflated_poisson(link = "log", link_zi = "logit"),
          data = doc_small,
          backend = 'cmdstanr',
          prior = weak_priors,
          cores = 8,
          seed = 1234,
          file = 'brm_models/example_nl4.2',
          #sample_prior = "only"
          )

Refactoring the code

So I am trying to refactor the code with offset. But I was not sure where I am supposed to include the offset(of()) in the non-linear formulation. Does it go into every single “fixed” parameter…?

mf3  <- bf(n ~ 0 + c + d + cd + rid ,
             c ~ 0 + Category + offset(log(words)),
             d ~ 0 + disciplinary_group + offset(log(words)),
             cd ~ (0 + Category) : (0 + disciplinary_group) + offset(log(words)),
             rid ~ 1 + (1 |id) ,
             nl = TRUE)

or should I include them in the random effects as well as the zi parameter too?

mf3_2  <- bf(n ~ 0 + c + d + cd + rid ,
             c ~ 0 + Category + offset(log(words)),
             d ~ 0 + disciplinary_group + offset(log(words)),
             cd ~ (0 + Category) : (0 + disciplinary_group) + offset(log(words)),
             rid ~ 1 + (1 |id) + offset(log(words)),
            zi ~ 1 + offset(log(words)),
             nl = TRUE)

I would really appreciate any insights on whether I am refactoring the model correct, or if there is any better way of implimenting the rate() in zero inflated variants of the poisson or negative binomial response distribution.

Note, that I also tried the following parameterization, and it did not compile the stan code. So, I am thinking how to include them to the non-linear formula.

> f4 <- bf(n ~ 0 + c + d + cd + rid + offset(log(words)),
+              c ~ 0 + Category ,
+              d ~ 0 + disciplinary_group,
+              cd ~ (0 + Category) : (0 + disciplinary_group),
+              rid ~ 1 + (1 |id) ,
+               zi ~ offset(log(words)),
+              nl = TRUE)
Warning: Rows containing NAs were excluded from the model.Semantic error in '/var/folders/zp/z590y19x4y7by4_bpngs84cr0000gn/T/RtmpSYh5QY/model_a9f42168efaf19a0019db2d4c4d31a35.stan', line 158, column 65 to column 84:
   -------------------------------------------------
   156:      for (n in 1:N) {
   157:        // compute non-linear predictor values
   158:        mu[n] = 0 + nlp_c[n] + nlp_d[n] + nlp_cd[n] + nlp_rid[n] + offset(log(C_1[n]));
                                                                          ^
   159:      }
   160:      for (n in 1:N) {
   -------------------------------------------------

A returning function was expected but an undeclared identifier 'offset' was supplied.

Error: Syntax error found! See the message above for more information.

Thanks for your help in advance!

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.6.2

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bayesplot_1.10.0 emmeans_1.8.1-1  tidybayes_3.0.3  brms_2.18.1      Rcpp_1.0.9       here_1.0.1      
 [7] forcats_0.5.2    stringr_1.4.1    dplyr_1.1.0      purrr_0.3.4      readr_2.1.3      tidyr_1.2.1     
[13] tibble_3.1.8     ggplot2_3.4.1    tidyverse_1.3.2 

loaded via a namespace (and not attached):
  [1] readxl_1.4.1         backports_1.4.1      plyr_1.8.7           igraph_1.3.5         splines_4.1.2       
  [6] svUnit_1.0.6         crosstalk_1.2.0      TH.data_1.1-1        rstantools_2.2.0     inline_0.3.19       
 [11] digest_0.6.30        htmltools_0.5.3      fansi_1.0.3          magrittr_2.0.3       checkmate_2.1.0     
 [16] googlesheets4_1.0.1  tzdb_0.3.0           modelr_0.1.9         RcppParallel_5.1.5   matrixStats_0.62.0  
 [21] vroom_1.6.0          xts_0.12.2           sandwich_3.0-2       prettyunits_1.1.1    colorspace_2.0-3    
 [26] rvest_1.0.3          ggdist_3.2.0         haven_2.5.1          xfun_0.33            callr_3.7.2         
 [31] crayon_1.5.2         jsonlite_1.8.2       survival_3.4-0       zoo_1.8-11           glue_1.6.2          
 [36] gtable_0.3.1         gargle_1.2.1         V8_4.2.1             distributional_0.3.1 pkgbuild_1.3.1      
 [41] rstan_2.26.13        abind_1.4-5          scales_1.2.1         mvtnorm_1.1-3        DBI_1.1.3           
 [46] miniUI_0.1.1.1       xtable_1.8-4         bit_4.0.4            stats4_4.1.2         StanHeaders_2.26.13 
 [51] DT_0.25              htmlwidgets_1.5.4    httr_1.4.4           threejs_0.3.3        arrayhelpers_1.1-0  
 [56] posterior_1.3.1      ellipsis_0.3.2       pkgconfig_2.0.3      loo_2.5.1            farver_2.1.1        
 [61] dbplyr_2.2.1         utf8_1.2.2           labeling_0.4.2       tidyselect_1.2.0     rlang_1.0.6         
 [66] reshape2_1.4.4       later_1.3.0          munsell_0.5.0        cellranger_1.1.0     tools_4.1.2         
 [71] cli_3.4.1            generics_0.1.3       broom_1.0.1          evaluate_0.16        fastmap_1.1.0       
 [76] yaml_2.3.6           bit64_4.0.5          processx_3.7.0       knitr_1.40           fs_1.5.2            
 [81] nlme_3.1-159         mime_0.12            xml2_1.3.3           compiler_4.1.2       shinythemes_1.2.0   
 [86] rstudioapi_0.14      curl_4.3.3           reprex_2.0.2         stringi_1.7.8        ps_1.7.1            
 [91] Brobdingnag_1.2-9    lattice_0.20-45      Matrix_1.5-1         markdown_1.1         shinyjs_2.1.0       
 [96] tensorA_0.36.2       vctrs_0.5.2          pillar_1.8.1         lifecycle_1.0.3.9000 bridgesampling_1.1-2
[101] estimability_1.4.1   data.table_1.14.4    httpuv_1.6.6         R6_2.5.1             promises_1.2.0.1    
[106] gridExtra_2.3        codetools_0.2-18     colourpicker_1.1.1   MASS_7.3-58.1        gtools_3.9.3        
[111] assertthat_0.2.1     rprojroot_2.0.3      withr_2.5.0          shinystan_2.6.0      multcomp_1.4-20     
[116] parallel_4.1.2       hms_1.1.2            grid_4.1.2           coda_0.19-4          rmarkdown_2.16      
[121] cmdstanr_0.5.3       googledrive_2.0.0    shiny_1.7.2          lubridate_1.8.0      base64enc_0.1-3     
[126] dygraphs_1.1.1.6  

After some test runs, it seems it makes sense to specify the model as follows:

f7 <- bf(n ~ a + c + d + cd,
             a ~ 1 + offset(log(words)) + (1 |id),
             c ~ 0 + Category ,
             d ~ 0 + disciplinary_group ,
             cd ~ (0 + Category) : (0 + disciplinary_group) ,
             zi ~ 1 + offset(log(words)),
             nl = TRUE)

I modelled this formula after the famous Dr. Kurz’s reimplimentation of the Dr. McElreath’s Statistical Rethinking.

But I am still unsure if the zero inflated part should also have the offset term, if so, I wonder if my implementation above is fine.

I forgot to explain, but Category and disciplinary_group are both unordered factor. I used this to derive effects for each category, not using dummy coding.