Priors to regularize beta regression predicted values towards 0 and 1?


I recently encountered a paper analyzing longitudinal survey data where the questions are all of the form “what do you think is the probability that event X will happen in the next 10 years?”. So the responses are all probabilities between 0 and 1, which lend themselves nicely to beta regression.

But there’s a twist: the authors present a summary statistic of the timepoint-specific responses that accounts for ‘respondent under-confidence’ by converting each response to its associated odds, raising each one to a shared power 0 < α < ∞ (equivalently, multiplying the log odds by α), and using MLE to infer the value of α. Values of α above 1 scale responses to be more extreme, which is what we would want if we thought people were systematically under-confident in their responses. As the authors put it: “in this paper we…correct the systematic bias at a collective level by shifting each probability forecast closer to its nearest boundary point. If the probability forecast is less (more) than 0.5, it is moved away from its original point and closer to 0.0 (1.0).”
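To make the transformation concrete, here is a minimal sketch of one reading of it: raise the odds to the power α, which is the same as multiplying the log odds by α, then map back to a probability. The function name `extremize` and the value α = 2 are illustrative assumptions, not taken from the paper.

```python
import math

def extremize(p, alpha):
    """Multiply the log odds of p by alpha and map back to a probability.

    Assumed form of the adjustment: alpha > 1 pushes p away from 0.5,
    alpha = 1 leaves p unchanged, alpha < 1 pulls p toward 0.5.
    """
    log_odds = math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-alpha * log_odds))

# Illustrative responses and an illustrative alpha (both made up).
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(extremize(p, alpha=2.0), 3))
```

Note that 0.5 is a fixed point for every α, so the adjustment only changes how far a response sits from 0.5, never which side it is on.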

This struck me as a fun idea – usually we regularize to make values less extreme!


My question is: how might one implement this kind of (de?)regularization towards extreme responses in a Bayesian beta regression context?

Suppose we have the vanilla beta regression given below (borrowed from Michael Clark’s outstanding examples). How might one modify this Stan code such that the posterior predictions tend to be pulled towards the extremes 0 and 1 of the distribution? Can it be done with priors, or by altering the model structure in some way?

data {
  int<lower=1> N;                      // sample size
  int<lower=1> K;                      // K predictors
  vector<lower=0,upper=1>[N] y;        // response 
  matrix[N,K] X;                       // predictor matrix
}

parameters {
  vector[K] theta;                     // reg coefficients
  real<lower=0> phi;                   // dispersion parameter
}

transformed parameters{
  vector[K] beta;

  beta = theta * 5;                    // same as beta ~ normal(0, 5); fairly diffuse
}

model {
  // model calculations
  vector[N] LP;                        // linear predictor
  vector[N] mu;                        // transformed linear predictor
  vector[N] A;                         // parameter for beta distn
  vector[N] B;                         // parameter for beta distn

  LP = X * beta;
  for (i in 1:N) { 
    mu[i] = inv_logit(LP[i]);   
  }

  A = mu * phi;
  B = (1.0 - mu) * phi;

  // priors
  theta ~ normal(0, 1);   
  phi ~ cauchy(0, 5);                  // different options for phi  
  //phi ~ inv_gamma(.001, .001);
  //phi ~ uniform(0, 500);             // put upper on phi if using this

  // likelihood
  y ~ beta(A, B);
}

generated quantities {
  vector[N] y_rep;
  for (i in 1:N) { 
    real mu;
    real A;
    real B;
    mu = inv_logit(X[i] * beta);   
    A = mu * phi;
    B = (1.0 - mu) * phi;
    y_rep[i] = beta_rng(A, B); 
  }
}

The paper you link involves fitting the relationship between the human-predicted probability and the true outcome based on actual knowledge of the outcome of the events being predicted by the humans. Do you want to do something similar, or do you just want an ad hoc tool to push predicted probabilities towards something more extreme based on a user-specified “anti-regularizing” parameter?

The latter, but more specifically: I was imagining using beta regression as a way of summarizing the distribution of responses, but in a way that incorporates background knowledge around ‘under-confidence’ in those responses. So the inferred distribution is itself of interest, not just the posterior predictions.

Thinking again, it would also be nice to make the amount of under-confidence correction itself an inferred quantity that can be conditional on a set of predictors. Since my data is longitudinal, maybe I could make it a multilevel model clustering on timepoint? Then if the under-confidence adjustment is achieved via an “anti-regularizing” prior, the hyperparameters of that prior could be inferred. Maybe something like this?

  • pⱼₜ ~ Beta(μₜ, φ)
  • μₜ = αₜ
  • αₜ ~ anti-regularizing-prior(θ)
  • θ ~ some-distribution(my initial background knowledge about under-confidence)
  • φ ~ exp(1)
  • σ ~ exp(1)
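
As a rough illustration of the structural alternative (rather than a prior), the sketch below simulates responses whose mean is shrunk toward 0.5 on the logit scale by a factor α, then undoes that shrinkage. Everything here is an assumption made for illustration: the mean/precision reading of Beta(μₜ, φ), the helper `simulate_responses`, the per-timepoint effects, and the values of α and φ. It is a forward simulation, not a fitted model.

```python
import math
import random

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_responses(eta, alpha, phi, n, rng):
    """Draw n responses for one timepoint.

    eta   : logit of the hypothetical 'calibrated' mean for this timepoint
    alpha : under-confidence factor; alpha > 1 shrinks the reported mean toward 0.5
    phi   : beta precision, so the shapes are mu * phi and (1 - mu) * phi
    """
    mu_reported = inv_logit(eta / alpha)
    a = mu_reported * phi
    b = (1.0 - mu_reported) * phi
    return [rng.betavariate(a, b) for _ in range(n)]

rng = random.Random(1)
eta_t = {1: -1.5, 2: 0.0, 3: 2.0}   # made-up timepoint effects on the logit scale
alpha, phi = 2.0, 20.0              # made-up under-confidence factor and precision

for t, eta in eta_t.items():
    y = simulate_responses(eta, alpha, phi, n=500, rng=rng)
    mean_reported = sum(y) / len(y)
    # Undo the assumed shrinkage by scaling the logit of the mean by alpha.
    mean_corrected = inv_logit(alpha * math.log(mean_reported / (1.0 - mean_reported)))
    print(t, round(mean_reported, 3), round(mean_corrected, 3))
```

In a fitted version, α (or a timepoint-specific αₜ) would be a parameter with its own prior, but note that it is only identifiable if something else pins down the calibrated scale, such as known outcomes or an informative prior.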

Does this make any sense? Maybe using priors to anti-regularize isn’t the best approach? Thanks for your help, Jacob.