Priors to regularize beta regression predicted values towards 0 and 1?

alex.b.r · October 27, 2023, 3:33am

Background

I recently encountered a paper analyzing longitudinal survey data where the questions are all of the form “what do you think is the probability that event X will happen in the next 10 years?”. So the responses are all proportions between 0 and 1, which lends itself nicely to beta regression.

But there’s a twist: the authors present a summary statistic of the timepoint-specific responses that accounts for ‘respondent under-confidence’ by converting each response to its odds, raising each one to a shared power 0 < α < ∞ (equivalently, multiplying each log odds by α), and using MLE to infer the value of α. Values of α greater than 1 make responses more extreme, which is what we would want if we thought people were systematically under-confident in their responses. As the authors put it: “in this paper we…correct the systematic bias at a collective level by shifting each probability forecast closer to its nearest boundary point. If the probability forecast is less (more) than 0.5, it is moved away from its original point and closer to 0.0 (1.0).”
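
To make the transformation concrete, here is a minimal Stan sketch. It assumes the adjustment amounts to multiplying each response's log odds by α (equivalently, raising the odds to the power α). The function name `extremize` is hypothetical and is not taken from the paper.

```stan
functions {
  // Hypothetical helper (an assumption, not the paper's code):
  // multiply the log odds of p by alpha and map back to a probability.
  //   alpha > 1      pushes p away from 0.5, toward its nearest boundary
  //   alpha = 1      leaves p unchanged
  //   0 < alpha < 1  pulls p toward 0.5
  real extremize(real p, real alpha) {
    return inv_logit(alpha * logit(p));
  }
}
```

For example, with α = 2 a response of 0.7 has odds 7/3, which become 49/9, so it maps to 49/58 ≈ 0.845. By symmetry, 0.3 maps to about 0.155.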

This struck me as a fun idea – usually we regularize to make values less extreme!

Question

My question is: how might one implement this kind of (de?)regularization towards extreme responses in a Bayesian beta regression context?

Suppose we have the vanilla beta regression given below (borrowed from Michael Clark’s outstanding examples). How might one modify this Stan code such that the posterior predictions tend to be pulled towards the extremes 0 and 1 of the distribution? Can it be done with priors, or by altering the model structure in some way?

data {
  int<lower=1> N;                      // sample size
  int<lower=1> K;                      // K predictors
  vector<lower=0,upper=1>[N] y;        // response 
  matrix[N,K] X;                       // predictor matrix
}

parameters {
  vector[K] theta;                     // reg coefficients
  real<lower=0> phi;                   // dispersion parameter
}

transformed parameters{
  vector[K] beta;

  beta = theta * 5;                    // same as beta ~ normal(0, 5); fairly diffuse
}

model {
  // model calculations
  vector[N] LP;                        // linear predictor
  vector[N] mu;                        // transformed linear predictor
  vector[N] A;                         // parameter for beta distn
  vector[N] B;                         // parameter for beta distn

  LP = X * beta;
  
  for (i in 1:N) { 
    mu[i] = inv_logit(LP[i]);   
  }

  A = mu * phi;
  B = (1.0 - mu) * phi;

  // priors
  theta ~ normal(0, 1);   
  phi ~ cauchy(0, 5);                  // different options for phi  
  //phi ~ inv_gamma(.001, .001);
  //phi ~ uniform(0, 500);             // put upper on phi if using this

  // likelihood
  y ~ beta(A, B);
}

generated quantities {
  vector[N] y_rep;
  
  for (i in 1:N) { 
    real mu;
    real A;
    real B;
    
    mu = inv_logit(X[i] * beta);   
    
    A = mu * phi;
    B = (1.0 - mu) * phi;
    
    y_rep[i] = beta_rng(A, B); 
  }
}

jsocolar · October 27, 2023, 1:01pm

The paper you link involves fitting the relationship between the human-predicted probability and the true outcome based on actual knowledge of the outcome of the events being predicted by the humans. Do you want to do something similar, or do you just want an ad hoc tool to push predicted probabilities towards something more extreme based on a user-specified “anti-regularizing” parameter?

alex.b.r · October 27, 2023, 2:20pm

The latter, but more specifically: I was imagining using beta regression as a way of summarizing the distribution of responses, but in a way that incorporates background knowledge around ‘under-confidence’ in those responses. So the inferred distribution is itself of interest, not just the posterior predictions.

Thinking again, it would also be nice to make the amount of under-confidence correction itself an inferred quantity that can be conditional on a set of predictors. Since my data is longitudinal, maybe I could make it a multilevel model clustering on timepoint? Then if the under-confidence adjustment is achieved via an “anti-regularizing” prior, the hyperparameters of that prior could be inferred. Maybe something like this?

pⱼₜ ~ Beta(μₜ, φ)
μₜ = αₜ
αₜ ~ anti-regularizing-prior(θ)
θ ~ some-distribution(my initial background knowledge about under-confidence)
φ ~ exp(1)
σ ~ exp(1)
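
One possible reading of the sketch above in Stan, offered as a rough starting point rather than a definitive answer: put the timepoint means on the logit scale and infer the scale of their prior. A zero-centered normal prior on the logit scale with a standard deviation above roughly 1.4 places more prior mass near 0 and 1 than near 0.5. A hyperprior that favors larger scales therefore behaves like an “anti-regularizing” prior. The names `tt`, `z`, and `tau` and the lognormal hyperprior are assumptions made for illustration; they are not part of the model above.

```stan
data {
  int<lower=1> N;                       // number of responses
  int<lower=1> T;                       // number of timepoints
  array[N] int<lower=1, upper=T> tt;    // timepoint index for each response
  vector<lower=0, upper=1>[N] p;        // reported probabilities (must be strictly inside (0, 1))
}

parameters {
  vector[T] z;                          // standardized timepoint effects on the logit scale
  real<lower=0> tau;                    // prior scale of the logit means (assumed hyperparameter)
  real<lower=0> phi;                    // beta precision
}

transformed parameters {
  // timepoint means; larger tau spreads them toward 0 and 1
  vector[T] mu = inv_logit(tau * z);
}

model {
  z ~ std_normal();                     // implies logit(mu) ~ normal(0, tau)
  tau ~ lognormal(log(2), 0.5);         // placeholder for background knowledge about under-confidence
  phi ~ exponential(1);

  // mean-precision parameterization of the beta likelihood
  p ~ beta(mu[tt] * phi, (1 - mu[tt]) * phi);
}
```

The non-centered form (`z` scaled by `tau`) is used only because it tends to sample more easily when there are few timepoints. Predictors for the amount of correction could enter through `tau`, but that extension is not shown here.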

Does this make any sense? Maybe using priors to anti-regularize isn’t the best approach? Thanks for your help, Jacob.
