Priors for binomial trials in generalized linear models


I am very new to Stan and to Bayesian inference itself. I was following Michael Betancourt’s lecture, Some Bayesian Modeling Techniques in Stan, and trying to replicate his example with data of my own that has the same structure. I have tried both the parametrized and non-parametrized versions; both run pretty quickly and end with really good Rhat values and high n_eff. However, when I compare the generated quantities (see code below) with the actual values, the fit does not look very good. Before getting into more complicated matters, I wanted to ask a simple question about priors. In Michael’s code, mu is normal(0,10) and sigma is cauchy(0,10) (in the parametrized model below). I have no idea why these values are 0 and 10. I assume they depend on the values of the continuous covariates, but I don’t know whether I need to transform my covariates somehow to suit these normal and cauchy priors, or change the priors to suit my covariates. I am completely lost with this, so any help would be greatly appreciated.

transformed parameters {
  // Per-individual intercept: global intercept plus one effect per category
  vector[N] alpha_indv;
  for (n in 1:N) {
    alpha_indv[n] = alpha_null + alpha_C1[indv_to_C1[n]]
                  + alpha_C2[indv_to_C2[n]]
                  + alpha_C3[indv_to_C3[n]]
                  + alpha_C4[indv_to_C4[n]]
                  + alpha_C5[indv_to_C5[n]];
  }
}
model {
  beta ~ normal(0, 10);
  alpha_null ~ normal(mu_alpha_null, sigma_alpha_null);
  mu_alpha_null ~ normal(0, 10);
  sigma_alpha_null ~ cauchy(0, 10);
  alpha_C1 ~ normal(0, sigma_alpha_C1);
  sigma_alpha_C1 ~ cauchy(0, 10);
  alpha_C2 ~ normal(0, sigma_alpha_C2);
  sigma_alpha_C2 ~ cauchy(0, 10);
  alpha_C3 ~ normal(0, sigma_alpha_C3);
  sigma_alpha_C3 ~ cauchy(0, 10);
  alpha_C4 ~ normal(0, sigma_alpha_C4);
  sigma_alpha_C4 ~ cauchy(0, 10);
  alpha_C5 ~ normal(0, sigma_alpha_C5);
  sigma_alpha_C5 ~ cauchy(0, 10);
  y ~ bernoulli_logit(X' * beta + alpha_indv);
}
generated quantities {
  // Posterior predictive draw for each observation
  int y_pred[N];
  for (n in 1:N) {
    y_pred[n] = bernoulli_logit_rng(X'[n] * beta + alpha_indv[n]);
  }
}


See for some discussion on setting scales for weakly informative priors. Ultimately the exact values you use will have to incorporate domain expertise for your specific problem.
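
To get a feel for what a scale of 10 means in a logit model, it can help to push prior draws through the inverse logit and look at the implied probabilities. The snippet below is only a rough sketch in plain Python, not part of the original model: the helper names `inv_logit` and `extreme_fraction` and the 0.01/0.99 cut-offs are my own choices for illustration. It counts how often a single normal(0, scale) draw on the log-odds scale lands at a probability below 0.01 or above 0.99.

```python
import math
import random


def inv_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def extreme_fraction(scale, draws=20000, seed=1):
    """Fraction of normal(0, scale) log-odds draws whose implied
    probability is below 0.01 or above 0.99 (illustrative cut-offs)."""
    rng = random.Random(seed)
    extreme = 0
    for _ in range(draws):
        p = inv_logit(rng.gauss(0.0, scale))
        if p < 0.01 or p > 0.99:
            extreme += 1
    return extreme / draws


for scale in (1.0, 10.0):
    print(f"normal(0, {scale}): fraction extreme = {extreme_fraction(scale):.3f}")
```

With a scale of 10, most of the prior mass sits on probabilities that are essentially 0 or 1, while a scale of 1 almost never produces them. Whether that is reasonable depends on the problem, which is where the domain expertise comes in.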


Thanks for the link and for answering so quickly! I think I’m asking something more basic than that, though. For instance, when you say “If we have chosen appropriate units then the scales reduce to unity and all of our weakly informative priors take a form like θ ∼ N(0,1) or θ ∼ Cauchy(0,1)”, what does “appropriate units” mean?


From the link,

“Once we have identified an appropriate parameterization we can determine the scales coherent with our prior knowledge of the system. Each scale partitions the parameters into extreme values above and reasonable values below. Perhaps the most straightforward way to reason about scales is to identify the units that one would use to describe the system of interest before the measurement. If we are building an experiment to study nanoscale effects then we wouldn’t use kiloscale units, right? Well we also wouldn’t want to put any significant prior probability on kiloscale effect sizes. In practice it is easier to make one last reparameterization into these natural units so that all of our scales are of order unity.”
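
One common way to get covariates onto a scale "of order unity" is to standardize them before they go into the model, so that a coefficient is the change in log-odds per standard deviation of the covariate. This is only one possible reading of "natural units" and an assumption on my part; the quoted passage is about choosing units from domain knowledge, which may suggest a different rescaling. A minimal sketch in plain Python, with a made-up `income` covariate and a hypothetical `standardize` helper:

```python
import statistics


def standardize(values):
    """Center a covariate at zero and scale it to unit sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("a constant covariate cannot be standardized")
    return [(v - mean) / sd for v in values]


# Made-up covariate in raw units (illustration only, not real data)
income = [21000.0, 35000.0, 48000.0, 52000.0, 90000.0]
income_std = standardize(income)

print([round(v, 2) for v in income_std])
```

After a rescaling like this, a prior such as normal(0, 1) on the matching coefficient says that a one standard deviation change in the covariate is unlikely to shift the log-odds by more than a couple of units. That is much easier to reason about than a prior on the effect of a single raw unit.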