Prior for phi in beta-binomial when using custom response distribution


#1

My question is brms specific and builds directly upon the custom response distribution vignette.

Imagine I want to extend the beta-binomial model with an idiosyncratic phi per level of herd:
phi ~ (1|herd) (I am not sure how much this makes sense here, but it is hopefully easy to see that there are situations where this could make sense).

If we specify this model, we can see a somewhat suboptimal default prior:

make_stancode(bf(
  incidence ~ period + (1|p|herd),
  phi ~ (1|p|herd)),  data = cbpp, 
  family = beta_binomial2, stan_funs = stan_funs,
  stanvars = stanvars)
[...]
model { 
  [...]
  vector[N] phi = rep_vector(0, N) + temp_phi_Intercept; 
  for (n in 1:N) { 
    [...]
    phi[n] += r_1_phi_2[J_1[n]] * Z_1_phi_2[n];
    phi[n] = exp(phi[n]); 
    [...]
  } 
  // priors including all constants 
  [...]
  target += student_t_lpdf(temp_phi_Intercept | 3, 0, 10); 
  [...]
  // likelihood including all constants 
  if (!prior_only) { 
    for (n in 1:N) { 
      target += beta_binomial2_lpmf(Y[n] | mu[n], phi[n], trials[n]); 
    } 
  } 
} 
[...]

As specified in the custom_family, the exponential transformation is used to get phi from the unconstrained range onto the positive range. However, a t distribution with df = 3 and sd = 10 seems somewhat problematic and the model also will not converge successfully. If we choose a considerably smaller prior, it nevertheless works (although the model still has some divergent transitions):

fit4 <- brm(bf(
  incidence ~ period + (1|p|herd),
  phi ~ (1|p|herd)),  data = cbpp, 
  family = beta_binomial2, stan_funs = stan_funs,
  stanvars = stanvars,
  prior = set_prior("student_t(3, 0, 0.5)", dpar = "phi", class = "Intercept"),
  control = list(adapt_delta = 0.999)
)
summary(fit4)
#  Family: beta_binomial2 
#   Links: mu = logit; phi = log 
# Formula: incidence ~ period + (1 | p | herd) 
#          phi ~ (1 | p | herd)
#    Data: cbpp (Number of observations: 56) 
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
#          total post-warmup samples = 4000
#     ICs: LOO = NA; WAIC = NA; R2 = NA
#  
# Group-Level Effects: 
# ~herd (Number of levels: 15) 
#                              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# sd(Intercept)                    0.40      0.27     0.02     1.03        849 1.00
# sd(phi_Intercept)                0.85      0.98     0.03     3.40        991 1.00
# cor(Intercept,phi_Intercept)    -0.23      0.56    -0.98     0.90       2398 1.00
# 
# Population-Level Effects: 
#               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# Intercept        -1.33      0.28    -1.90    -0.79       2855 1.00
# phi_Intercept     2.51      0.89     1.28     4.55        728 1.00
# period2          -0.99      0.43    -1.84    -0.16       4000 1.00
# period3          -1.27      0.48    -2.25    -0.35       3288 1.00
# period4          -1.52      0.54    -2.67    -0.53       4000 1.00
# 
# Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
# is a crude measure of effective sample size, and Rhat is the potential 
# scale reduction factor on split chains (at convergence, Rhat = 1).
# Warning message:
# There were 4 divergent transitions after warmup. Increasing adapt_delta above 0.999 may help.
# See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Following the prior-choice recommendations, a better prior could be chosen:

The neg_binomial_2 distribution in Stan is parameterized so that the mean is mu and the variance is mu*(1 + mu/phi). […]

The generic prior [half-N(0,1)] works much much better on the parameter 1/phi. Even better, you can use 1/sqrt(phi). This can be motivated by noticing you can write a negative binomial a y | g ~ Poisson(g*mu), g ~ gamma(phi,phi) and the standard deviation of g is 1/sqrt(phi). Some more information is in the second-last section of this blog.

From my understanding, this would translate to something like (I still use t here):

target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1);

However, I seem to be unable to set such a prior with brms. If I pass this with prior_string(), this prior is added, but the original prior for this parameter is retained:

p <- prior_string(
  "target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1)", 
  dpar = "phi", class = "Intercept", check = FALSE)
make_stancode(bf(
  incidence ~ period + (1|p|herd),
  phi ~ (1|p|herd)),  data = cbpp, 
  family = beta_binomial2, stan_funs = stan_funs,
  stanvars = stanvars,
  prior = p)
model { 
  [...]
  // priors including all constants 
  target += student_t_lpdf(temp_phi_Intercept | 3, 0, 10); 
  [...]
  target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1);
  [...]
} 

My question: How can I set the prior somewhat flexibly for the additional parameters in my custom response distribution?


> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252
[4] LC_NUMERIC=C                        LC_TIME=German_Switzerland.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.20    brms_2.1.9    ggplot2_2.2.1 Rcpp_0.12.16 

loaded via a namespace (and not attached):
 [1] Brobdingnag_1.2-5    gtools_3.5.0         StanHeaders_2.17.2   threejs_0.3.1        shiny_1.0.5         
 [6] assertthat_0.2.0     highr_0.6            stats4_3.5.0         yaml_2.1.19          pillar_1.2.2        
[11] lattice_0.20-35      glue_1.2.0           digest_0.6.15        promises_1.0.1       colorspace_1.3-2    
[16] htmltools_0.3.6      httpuv_1.4.3         Matrix_1.2-14        plyr_1.8.4           dygraphs_1.1.1.4    
[21] pkgconfig_2.0.1      devtools_1.13.5      rstan_2.17.3         xtable_1.8-2         mvtnorm_1.0-7       
[26] scales_0.5.0         later_0.7.2          tibble_1.4.2         nleqslv_3.3.1        mgcv_1.8-23         
[31] bayesplot_1.5.0      DT_0.4               withr_2.1.2          shinyjs_1.0          lazyeval_0.2.1      
[36] magrittr_1.5         mime_0.5             evaluate_0.10.1      memoise_1.1.0        nlme_3.1-137        
[41] xts_0.10-2           xml2_1.2.0           colourpicker_1.0     rsconnect_0.8.8      tools_3.5.0         
[46] loo_2.0.0            matrixStats_0.53.1   stringr_1.3.1        munsell_0.4.3        bindrcpp_0.2.2      
[51] compiler_3.5.0       rlang_0.2.0          grid_3.5.0           ggridges_0.5.0       rstudioapi_0.7      
[56] htmlwidgets_1.2      crosstalk_1.0.0      igraph_1.2.1         miniUI_0.1.1         labeling_0.3        
[61] base64enc_0.1-3      testthat_2.0.0       codetools_0.2-15     gtable_0.2.0         abind_1.4-5         
[66] inline_0.3.14        roxygen2_6.0.1       markdown_0.8         reshape2_1.4.3       R6_2.2.2            
[71] gridExtra_2.3        rstantools_1.5.0     zoo_1.8-1            bridgesampling_0.4-0 dplyr_0.7.4         
[76] bindr_0.1.1          shinystan_2.5.0      commonmark_1.5       shinythemes_1.1.1    stringi_1.2.2       
[81] parallel_3.5.0       coda_0.19-1         

#2

If you set check=FALSE it will pass the prior as is and ignore the other arguments such as dpar or class. I suggest:

p <-  prior_string("", dpar = "phi", class = "Intercept") +
  prior_string("target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1)",  check = FALSE)

#3

I see, prior_string("", ...) suppresses the corresponding prior. I could not find it in the documentation (and NULL and other stuff I tried did not work).

Thanks!