 # Bayesian R-squared posterior variance

Hi!

I am working with Bayesian models using brms, and I am interested in understanding the Bayesian R-squared proposed by Gelman et al. (2019). A really good explanation is given in "Bayesian R2 and LOO-R2".

In my attempt to understand how this Bayesian R-squared works, I have run several simulations and fitted models with different levels of accuracy. In particular, I use the same response variable for all the models, but with different covariates. Here is the code:

```r
library(brms)
library(dplyr)
library(ggplot2)

### data.frame1: x predicts y almost perfectly
set.seed(10)
n <- 100
x1 <- rnorm(n, 30, 1)

set.seed(100)
y <- 2 * x1 + rnorm(n, sd = 0.1)
data1 <- data.frame(x = x1, y = y)
fit1 <- brm(y ~ x, data = data1,
            iter = 1000,
            cores = 2,
            chains = 2)
plot(data1$x, data1$y)

### Refit with increasingly noisy versions of the covariate
fit <- list(fit1)
standar_devs <- c(seq(0.1, 0.7, 0.2), 1, 10, 100)
for (j in seq_along(standar_devs)) {
  set.seed(j * 100)
  x <- x1 + rnorm(n, sd = standar_devs[j])
  data2 <- data.frame(x = x, y = y)
  fit[[j + 1]] <- brm(y ~ x, data = data2,
                      iter = 1000,
                      cores = 2,
                      chains = 2)
}

### Bayes_R2: one column of posterior draws per fitted model
r_squared_test4 <- fit %>%
  sapply(function(x) brms::bayes_R2(x, summary = FALSE)) %>%
  as.data.frame() %>%
  tidyr::pivot_longer(cols = 1:(length(standar_devs) + 1),
                      names_to = "dataset", values_to = "R_squared")

r_squared_test4$dataset <- as.factor(r_squared_test4$dataset)

### Overlaid histograms of the R-squared draws, one colour per model
p_test4 <- r_squared_test4 %>%
  ggplot(aes(x = R_squared, fill = dataset)) +
  geom_histogram(color = "#e9ecef", alpha = 0.6, position = "identity", bins = 100) +
  theme_bw() +
  labs(fill = "") +
  xlim(c(0, 1.01))
p_test4
```

Here is the corresponding plot.

In the plot (attached), we can see that the greater the R-squared, the smaller the variability of its posterior distribution. I was wondering whether this always holds, or am I missing something?
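To put numbers on what the histograms show, one option is to tabulate the posterior mean and standard deviation of R-squared per model. This is only a rough sketch: `summarise_r2` is a hypothetical helper name, and the `toy` values are made up purely to show the output shape. It assumes a long data frame with `dataset` and `R_squared` columns, like `r_squared_test4`.

```r
# Hypothetical helper: posterior mean and sd of R-squared per model,
# sorted by mean so any trend in the spread is easy to read off.
summarise_r2 <- function(draws_long) {
  m <- tapply(draws_long$R_squared, draws_long$dataset, mean)
  s <- tapply(draws_long$R_squared, draws_long$dataset, sd)
  out <- data.frame(dataset = names(m),
                    mean_R2 = as.numeric(m),
                    sd_R2   = as.numeric(s))
  out[order(out$mean_R2), ]
}

# Made-up illustrative draws, NOT output from the models in this post
toy <- data.frame(
  dataset   = rep(c("V1", "V2"), each = 4),
  R_squared = c(0.98, 0.99, 0.98, 0.99, 0.40, 0.55, 0.50, 0.60)
)
res <- summarise_r2(toy)
print(res)
```

With the real draws, `summarise_r2(r_squared_test4)` would give one row per fitted model.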

Many thanks for your time,

Best,
Joaquín.

Probably quite often. For example, in your simulation you'll get a high R^2 only if the residual sigma is small, and then the coefficients are also determined with small variance. You could try examples with the number of covariates close to the number of observations; then you could sometimes get a high R^2 because, by chance, the observations would lie on a lower-dimensional plane (but then you should compute LOO-R^2 anyway).
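The first point can be checked without Stan, as a rough sketch in base R. It assumes the standard noninformative prior p(beta, sigma^2) ∝ 1/sigma^2 rather than the brms default priors, and it computes a residual-based R^2 per draw (variance of the fitted values over that variance plus the variance of the residuals). The names `posterior_r2`, `r2_sharp` and `r2_noisy` are made up for this illustration.

```r
# Posterior draws of R^2 for y ~ x under p(beta, sigma^2) ∝ 1 / sigma^2
# (an assumption for this sketch; these are not the brms default priors).
posterior_r2 <- function(x, y, ndraws = 2000) {
  X <- cbind(1, x)                         # design matrix with intercept
  n <- nrow(X)
  k <- ncol(X)
  XtX_inv  <- solve(crossprod(X))
  beta_hat <- drop(XtX_inv %*% crossprod(X, y))
  s2 <- sum((y - X %*% beta_hat)^2) / (n - k)

  # sigma^2 | y: scaled inverse chi-squared with n - k degrees of freedom
  sigma2 <- (n - k) * s2 / rchisq(ndraws, df = n - k)

  # beta | sigma^2, y: normal, mean beta_hat, covariance sigma^2 (X'X)^-1
  L <- t(chol(XtX_inv))                    # L %*% t(L) equals XtX_inv
  z <- matrix(rnorm(k * ndraws), nrow = k)
  beta <- beta_hat + sweep(L %*% z, 2, sqrt(sigma2), "*")

  mu <- X %*% beta                         # n x ndraws fitted values
  var_fit <- apply(mu, 2, var)             # variance of fit, per draw
  var_res <- apply(y - mu, 2, var)         # variance of residuals, per draw
  var_fit / (var_fit + var_res)
}

set.seed(1)
n  <- 100
x1 <- rnorm(n, 30, 1)
y  <- 2 * x1 + rnorm(n, sd = 0.1)

r2_sharp <- posterior_r2(x1 + rnorm(n, sd = 0.1), y)  # small residual sigma
r2_noisy <- posterior_r2(x1 + rnorm(n, sd = 1), y)    # larger residual sigma

print(c(mean = mean(r2_sharp), sd = sd(r2_sharp)))
print(c(mean = mean(r2_noisy), sd = sd(r2_noisy)))
```

The slope's posterior standard deviation scales with sigma, so a small residual sigma pins down the fitted values and with them R^2. For the second suggestion, brms provides `loo_R2()` alongside `bayes_R2()`, and it can be called on the same fitted models.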