R squared with strong priors

This seems like an edge case, but I’ve been running into it, so I wanted to post to see if there is a simple fix. From Gelman et al. (2019), R^2 is bounded between 0 and 1. In other settings, R^2 can be negative, specifically when it is calculated as R^2 = 1 - \frac{SSE}{SST}, where SST = \sum (y_i - \bar{y})^2 and SSE = \sum (y_i - \hat{y}_i)^2. I was playing around with how shifting the prior affects the Bayesian R^2, and found some strange outcomes where a worse model gives a higher R^2, specifically because of the prior (see example below).
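To make the classical-R^2 point concrete, here is a small Python sketch (Python just for illustration, alongside the R code below) using the same toy data and a slope forced to -3, mimicking the strong wrong prior. The classical R^2 = 1 - SSE/SST goes far below zero when the fit is worse than predicting the mean:

```python
import numpy as np

# Toy data from the example below, centered the same way
y = np.array([1.7, 2.6, 2.5, 4.4, 3.8]) - 3
x = np.arange(1, 6) - 3

# Hypothetical "bad" fit: slope forced to -3, intercept 0
y_hat_bad = -3.0 * x

sse = np.sum((y - y_hat_bad) ** 2)   # residual sum of squares
sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_sq = 1 - sse / sst
print(r_sq)  # about -26.8: far below zero, the fit adds variance
```

So the classical definition has no floor at 0, which is what lets it flag a fit that is worse than the mean.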

I wonder if there is a way to rework the Gelman et al 2019 R^2 such that it can become negative, which might address this edge case?
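For contrast, the Gelman et al. (2019) definition is a per-draw ratio of variances, Var(fit) / (Var(fit) + Var(residuals)), so it cannot go negative by construction. A quick Python sketch with made-up posterior draws (all numbers hypothetical) shows the bound:

```python
import numpy as np

rng = np.random.default_rng(1)
# Fake posterior draws: 1000 draws x 5 observations (hypothetical numbers)
y_pred = rng.normal(0, 2, size=(1000, 5))   # per-draw fitted values
res = rng.normal(0, 1, size=(1000, 5))      # per-draw predictive residuals

var_pred = y_pred.var(axis=1)
var_res = res.var(axis=1)
r_sq = var_pred / (var_pred + var_res)      # per-draw Bayesian R^2

# A ratio of non-negative variances is always in [0, 1]
print(r_sq.min() >= 0 and r_sq.max() <= 1)  # True
```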

Stan Model 1: Strong (Bad) Prior

functions {
  real linear_predictor(real alpha, real beta, real x) {
    return alpha + beta * x;
  }
}

data {
  int<lower=0> N;              // number of observations
  vector[N] x;                 // predictor variable
  vector[N] y;                 // response variable
}

parameters {
  real alpha;                  // intercept
  real beta;                   // slope
  real<lower=0> sigma;         // error standard deviation
}

model {
  // Prior distributions
  alpha ~ normal(0, 0.2);      // prior for intercept
  beta ~ normal(-3, 0.01);     // strong (bad) prior for slope

  // Likelihood
  y ~ normal(alpha + beta * x, sigma);
}

generated quantities {
  vector[N] y_pred;            // linear predictor (fitted values)
  vector[N] y_pred_w_variance; // posterior predictive draws

  for (i in 1:N) {
    y_pred[i] = linear_predictor(alpha, beta, x[i]);
    y_pred_w_variance[i] = normal_rng(alpha + beta * x[i], sigma);
  }
}

Stan Model 2: Strong Prior around 0

functions {
  real linear_predictor(real alpha, real beta, real x) {
    return alpha + beta * x;
  }
}

data {
  int<lower=0> N;              // number of observations
  vector[N] x;                 // predictor variable
  vector[N] y;                 // response variable
}

parameters {
  real alpha;                  // intercept
  real beta;                   // slope
  real<lower=0> sigma;         // error standard deviation
}

model {
  // Prior distributions
  alpha ~ normal(0, 0.2);      // prior for intercept
  beta ~ normal(0, 0.01);      // strong prior for slope, centered at 0

  // Likelihood
  y ~ normal(alpha + beta * x, sigma);
}

generated quantities {
  vector[N] y_pred;            // linear predictor (fitted values)
  vector[N] y_pred_w_variance; // posterior predictive draws

  for (i in 1:N) {
    y_pred[i] = linear_predictor(alpha, beta, x[i]);
    y_pred_w_variance[i] = normal_rng(alpha + beta * x[i], sigma);
  }
}

Comparing the two, you can see that the median R^2 for the worse model (strong prior at -3) is ~0.4, while the model with the strong prior around 0 (still a poor fit) gives ~0.

library(rstan)
library(tidybayes)
library(dplyr)      # for sample_n()
library(magrittr)
library(ggplot2)
library(ggpubr)     # for ggarrange()

## generate a small dataset
x <- 1:5 - 3
y <- c(1.7, 2.6, 2.5, 4.4, 3.8) - 3
xy <- data.frame(x,y)

stan_model_strong <- stan_model("scripts/stan_model_r_sq_check_strong_prior.stan")
stan_model_weak <- stan_model("scripts/stan_model_r_sq_check_weak_prior.stan")

## fit the model with the strong (bad) prior
fit_strong <- sampling(stan_model_strong,
                       data = list(N = 5, x = xy$x, y = xy$y),
                       iter = 4000,
                       chains = 4)

## fit the model with the strong prior around 0
fit_weak <- sampling(stan_model_weak,
                     data = list(N = 5, x = xy$x, y = xy$y),
                     iter = 4000,
                     chains = 4)

## calculate r squared using the method from Gelman et al. (2019)
calculate_r_sq_gelman <- function(fit_out){
  y_pred_solved <- rstan::extract(fit_out)$y_pred
  residuals_y <- rstan::extract(fit_out)$y_pred - rstan::extract(fit_out)$y_pred_w_variance

  var_residuals <- apply(residuals_y, 1, var)
  var_pred_out <- apply(y_pred_solved, 1, var)

  hist(var_pred_out/(var_pred_out + var_residuals))
  return(median(var_pred_out/(var_pred_out + var_residuals)))
}
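The inflation can be reproduced without running Stan at all. The sketch below (Python for illustration; it simplifies by using the observed residuals y - fit instead of posterior predictive residual draws, and omits the intercept) mimics the two posteriors with draws concentrated at beta = -3 and beta = 0. The tight wrong prior makes Var(fit) large, so the variance ratio lands near 0.4 even though the fit is bad:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(1, 6) - 3.0
y = np.array([1.7, 2.6, 2.5, 4.4, 3.8]) - 3.0

def median_bayes_r2(beta_draws):
    # Per-draw R^2 = Var(fit) / (Var(fit) + Var(residual)),
    # as in the R function above, with observed residuals y - fit
    r2 = []
    for b in beta_draws:
        fit = b * x
        res = y - fit
        r2.append(fit.var() / (fit.var() + res.var()))
    return np.median(r2)

# Hypothetical stand-ins for the two Stan posteriors
beta_bad = rng.normal(-3, 0.01, 2000)   # dominated by the strong wrong prior
beta_zero = rng.normal(0, 0.01, 2000)   # strong prior centered at 0

print(median_bayes_r2(beta_bad))   # near 0.4, matching the Stan result
print(median_bayes_r2(beta_zero))  # near 0
```

With beta pinned at -3 the fitted line swings from +6 to -6 across the data, so Var(fit) is huge; the residual variance is also huge, but the ratio still comes out around 0.4 because the numerator grows with the (wrongly) confident slope.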

# plot the two r squared distributions
calculate_r_sq_gelman(fit_strong)
calculate_r_sq_gelman(fit_weak)

# plot the posterior predictions (100 posterior draws per model)
post_pred_strong <- as.data.frame(fit_strong) %>%
  sample_n(100) %>%
  ggplot() +
  theme_bw() +
  ggtitle(paste("R2 = ", round(calculate_r_sq_gelman(fit_strong), 2))) +
  geom_point(data = xy, aes(x = x, y = y)) +
  geom_abline(aes(slope = beta, intercept = alpha), alpha = 0.2)

post_pred_weak <- as.data.frame(fit_weak) %>%
  sample_n(100) %>%
  ggplot() +
  theme_bw() +
  ggtitle(paste("R2 = ", round(calculate_r_sq_gelman(fit_weak), 2))) +
  geom_point(data = xy, aes(x = x, y = y)) +
  geom_abline(aes(slope = beta, intercept = alpha), alpha = 0.2)

ggarrange(post_pred_strong, post_pred_weak)

Sorry this never got answered, but I’m afraid I’m not a regression expert, so I’m pinging @andrewgelman in case he has an opinion. I didn’t even realize R^2 can be negative: I thought of it as the proportion of variance explained, but I guess it can go negative when the regression just adds variance. This Stack Overflow answer was helpful for me:

Hi, I recommend these two papers on Bayesian R-squared, which I hope will give some insight:

[2018] R-squared for Bayesian regression models. *American Statistician* **73**, 307–309.

[2006] Bayesian measures of explained variance and pooling in multilevel (hierarchical) models. *Technometrics* **48**, 241–251.
