Hi!
I am working with Bayesian models using brms and I am so interested in understand Bayesian r-squared proposed by Gelman et al. (2019). A really good explanation in Bayesian R2 and LOO-R2
In my attempt to understand how this Bayesian r-squared works, I have done different simulations and fittings with different accuracies. In particular, I use same response variable for all the models with different covariates. Here the code:
library(brms)
library(dplyr)
library(ggplot2)
### data.frame1
set.seed(10)
n <- 100
x1 <- rnorm(n, 30, 1)
set.seed(100)
y <- 2*x1 + rnorm(n, sd = 0.1)
data1 <- data.frame(x = x1, y = y)
fit1 <- brm(y ~ x, data = data1,
iter = 1000,
cores = 2,
chains = 2)
plot(data1$x, data1$y)
fit <- list(fit1)
standar_devs <- c(seq(0.1, 0.7, 0.2), 1, 10, 100)
for (j in 1:length(standar_devs))
{
set.seed(j*100)
x <- x1 + rnorm(n, sd = standar_devs[j])
data2 <- data.frame(x = x, y = y)
fit[[j + 1]] <- brm(y ~ x, data = data2,
iter = 1000,
cores = 2,
chains = 2)
}
### Bayes_R2
fit %>%
sapply(., function(x)brms::bayes_R2(x, summary=FALSE)) %>%
as.data.frame() %>%
tidyr::pivot_longer(., cols = 1:(length(standar_devs) + 1), names_to = "dataset", values_to = "R_squared")-> r_squared_test4
r_squared_test4$dataset <- as.factor(r_squared_test4$dataset)
p_test4 <- r_squared_test4 %>%
ggplot( aes(x = R_squared, fill = dataset)) +
geom_histogram( color="#e9ecef", alpha=0.6, position = 'identity', bins = 100) +
theme_bw() +
labs(fill="") +
xlim(c(0,1.01))
p_test4
Here the corresponding plot
In the plot (attached ), we can see that the greater is the R-squared, smaller is the variability of its distribution. I was wondering if it fulfills always, or am I missing something?
Many thanks for your time,
Best,
Joaquín.