Prior for phi in beta-binomial when using custom response distribution

My question is brms specific and builds directly upon the custom response distribution vignette.

Imagine I want to extend the beta-binomial model with an idiosyncratic phi per level of herd:
phi ~ (1|herd) (I am not sure how much this makes sense here, but it is hopefully easy to see that there are situations where this could make sense).

If we specify this model, we can see a somewhat suboptimal default prior:

make_stancode(bf(
  incidence ~ period + (1|p|herd),
  phi ~ (1|p|herd)),  data = cbpp, 
  family = beta_binomial2, stan_funs = stan_funs,
  stanvars = stanvars)
[...]
model { 
  [...]
  vector[N] phi = rep_vector(0, N) + temp_phi_Intercept; 
  for (n in 1:N) { 
    [...]
    phi[n] += r_1_phi_2[J_1[n]] * Z_1_phi_2[n];
    phi[n] = exp(phi[n]); 
    [...]
  } 
  // priors including all constants 
  [...]
  target += student_t_lpdf(temp_phi_Intercept | 3, 0, 10); 
  [...]
  // likelihood including all constants 
  if (!prior_only) { 
    for (n in 1:N) { 
      target += beta_binomial2_lpmf(Y[n] | mu[n], phi[n], trials[n]); 
    } 
  } 
} 
[...]

As specified in the custom_family, the exponential transformation is used to get phi from the unconstrained range onto the positive range. However, a t distribution with df = 3 and sd = 10 seems somewhat problematic and the model also will not converge successfully. If we choose a considerably smaller prior, it nevertheless works (although the model still has some divergent transitions):

fit4 <- brm(bf(
  incidence ~ period + (1|p|herd),
  phi ~ (1|p|herd)),  data = cbpp, 
  family = beta_binomial2, stan_funs = stan_funs,
  stanvars = stanvars,
  prior = set_prior("student_t(3, 0, 0.5)", dpar = "phi", class = "Intercept"),
  control = list(adapt_delta = 0.999)
)
summary(fit4)
#  Family: beta_binomial2 
#   Links: mu = logit; phi = log 
# Formula: incidence ~ period + (1 | p | herd) 
#          phi ~ (1 | p | herd)
#    Data: cbpp (Number of observations: 56) 
# Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1; 
#          total post-warmup samples = 4000
#     ICs: LOO = NA; WAIC = NA; R2 = NA
#  
# Group-Level Effects: 
# ~herd (Number of levels: 15) 
#                              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# sd(Intercept)                    0.40      0.27     0.02     1.03        849 1.00
# sd(phi_Intercept)                0.85      0.98     0.03     3.40        991 1.00
# cor(Intercept,phi_Intercept)    -0.23      0.56    -0.98     0.90       2398 1.00
# 
# Population-Level Effects: 
#               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
# Intercept        -1.33      0.28    -1.90    -0.79       2855 1.00
# phi_Intercept     2.51      0.89     1.28     4.55        728 1.00
# period2          -0.99      0.43    -1.84    -0.16       4000 1.00
# period3          -1.27      0.48    -2.25    -0.35       3288 1.00
# period4          -1.52      0.54    -2.67    -0.53       4000 1.00
# 
# Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
# is a crude measure of effective sample size, and Rhat is the potential 
# scale reduction factor on split chains (at convergence, Rhat = 1).
# Warning message:
# There were 4 divergent transitions after warmup. Increasing adapt_delta above 0.999 may help.
# See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Following the prior-choice recommendations, a better prior could be chosen:

The neg_binomial_2 distribution in Stan is parameterized so that the mean is mu and the variance is mu*(1 + mu/phi). […]

The generic prior [half-N(0,1)] works much much better on the parameter 1/phi. Even better, you can use 1/sqrt(phi). This can be motivated by noticing you can write a negative binomial a y | g ~ Poisson(g*mu), g ~ gamma(phi,phi) and the standard deviation of g is 1/sqrt(phi). Some more information is in the second-last section of this blog.

From my understanding, this would translate to something like (I still use t here):

target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1);

However, I seem to be unable to set such a prior with brms. If I pass this with prior_string(), this prior is added, but the original prior for this parameter is retained:

p <- prior_string(
  "target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1)", 
  dpar = "phi", class = "Intercept", check = FALSE)
make_stancode(bf(
  incidence ~ period + (1|p|herd),
  phi ~ (1|p|herd)),  data = cbpp, 
  family = beta_binomial2, stan_funs = stan_funs,
  stanvars = stanvars,
  prior = p)
model { 
  [...]
  // priors including all constants 
  target += student_t_lpdf(temp_phi_Intercept | 3, 0, 10); 
  [...]
  target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1);
  [...]
} 

My question: How can I set the prior somewhat flexibly for the additional parameters in my custom response distribution?


> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Switzerland.1252  LC_CTYPE=German_Switzerland.1252    LC_MONETARY=German_Switzerland.1252
[4] LC_NUMERIC=C                        LC_TIME=German_Switzerland.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.20    brms_2.1.9    ggplot2_2.2.1 Rcpp_0.12.16 

loaded via a namespace (and not attached):
 [1] Brobdingnag_1.2-5    gtools_3.5.0         StanHeaders_2.17.2   threejs_0.3.1        shiny_1.0.5         
 [6] assertthat_0.2.0     highr_0.6            stats4_3.5.0         yaml_2.1.19          pillar_1.2.2        
[11] lattice_0.20-35      glue_1.2.0           digest_0.6.15        promises_1.0.1       colorspace_1.3-2    
[16] htmltools_0.3.6      httpuv_1.4.3         Matrix_1.2-14        plyr_1.8.4           dygraphs_1.1.1.4    
[21] pkgconfig_2.0.1      devtools_1.13.5      rstan_2.17.3         xtable_1.8-2         mvtnorm_1.0-7       
[26] scales_0.5.0         later_0.7.2          tibble_1.4.2         nleqslv_3.3.1        mgcv_1.8-23         
[31] bayesplot_1.5.0      DT_0.4               withr_2.1.2          shinyjs_1.0          lazyeval_0.2.1      
[36] magrittr_1.5         mime_0.5             evaluate_0.10.1      memoise_1.1.0        nlme_3.1-137        
[41] xts_0.10-2           xml2_1.2.0           colourpicker_1.0     rsconnect_0.8.8      tools_3.5.0         
[46] loo_2.0.0            matrixStats_0.53.1   stringr_1.3.1        munsell_0.4.3        bindrcpp_0.2.2      
[51] compiler_3.5.0       rlang_0.2.0          grid_3.5.0           ggridges_0.5.0       rstudioapi_0.7      
[56] htmlwidgets_1.2      crosstalk_1.0.0      igraph_1.2.1         miniUI_0.1.1         labeling_0.3        
[61] base64enc_0.1-3      testthat_2.0.0       codetools_0.2-15     gtable_0.2.0         abind_1.4-5         
[66] inline_0.3.14        roxygen2_6.0.1       markdown_0.8         reshape2_1.4.3       R6_2.2.2            
[71] gridExtra_2.3        rstantools_1.5.0     zoo_1.8-1            bridgesampling_0.4-0 dplyr_0.7.4         
[76] bindr_0.1.1          shinystan_2.5.0      commonmark_1.5       shinythemes_1.1.1    stringi_1.2.2       
[81] parallel_3.5.0       coda_0.19-1         

If you set check=FALSE it will pass the prior as is and ignore the other arguments such as dpar or class. I suggest:

p <-  prior_string("", dpar = "phi", class = "Intercept") +
  prior_string("target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1)",  check = FALSE)

I see, prior_string("", ...) suppresses the corresponding prior. I could not find it in the documentation (and NULL and other stuff I tried did not work).

Thanks!

I am trying to fit a (zero-inflated) beta regression with brms and am having a few divergent transitions that I think are due to the prior on phi.
While attempting the solution in this thread I get an error:

phiPrior = prior_string("", dpar = "phi", class = "Intercept") +
           prior_string("target += student_t_lpdf(1/sqrt(exp(temp_phi_Intercept)) | 3, 0, 1)",  
                              check = FALSE)

ziBeta = brm(bf(y ~ 1 + g*s + (1|mi|m),
                         phi ~ 1 + (1|mi|m)),
                         data = data,
                         family = zero_inflated_beta,
                         prior = phiPrior,
                         control = list(adapt_delta = 0.95),
                         cores = 4,
                         chains = 4,
                         iter = 4000,
                         sample_prior = TRUE,
                         save_all_pars = TRUE)
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Info: left-hand side variable (name=phi) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Info: left-hand side variable (name=mu) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
  error in 'model1bd44b47caca_file1bd4501da915' at line 124, column 15
  -------------------------------------------------
   122:   real prior_sd_1 = student_t_rng(3,0,10);
   123:   real prior_cor_1 = lkj_corr_rng(M_1,1)[1, 2];
   124:   real prior_1/sqrt(exp(temp_phi_Intercept) = student_t_rng(3,0,1);
                      ^
   125:   // take only relevant parts of correlation matrix
  -------------------------------------------------

PARSER EXPECTED: ";"
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file1bd4501da915' due to the above error.
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

Info: left-hand side variable (name=phi) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
Info: left-hand side variable (name=mu) occurs on right-hand side of assignment, causing inefficient deep copy to avoid aliasing.
  error in 'model1bd4514aa434_file1bd4501da915' at line 124, column 15
  -------------------------------------------------
   122:   real prior_sd_1 = student_t_rng(3,0,10);
   123:   real prior_cor_1 = lkj_corr_rng(M_1,1)[1, 2];
   124:   real prior_1/sqrt(exp(temp_phi_Intercept) = student_t_rng(3,0,1);
                      ^
   125:   // take only relevant parts of correlation matrix
  -------------------------------------------------

PARSER EXPECTED: ";"
Error in stanc(model_code = paste(program, collapse = "\n"), model_name = model_cppname,  : 
  failed to parse Stan model 'file1bd4501da915' due to the above error.

Am I abusing brms syntax in any way?

If you turn of sampling from the priors it will work. Let me see if I can find a workaround to disable sampling from unchecked priors.

As a side note: What’s your motivation behind this prior you are using?

Thanks for looking into it. I understand the issue now.

The estimates are unstable with different priors for phi and I am looking for a principled way to make the choice as I do not yet have a grasp on what magnitudes are reasonable for my problem (student_t(3, 0, 1) and exponential(1) produce different results). Sensing that you are sceptical/surprised by this choice I plotted the prior and realised that indeed using this prior does not look like the right approach. The default prior at least does not produce divergent transitions so I might as well go with that.