Brms: understanding the basics of predictions

satpal · April 14, 2021, 4:04pm

Hi,

I am trying to understand the working of ‘predict’ function by manually calculating predictions and comparing them with those obtained by using the ‘brms predict’ function. However, even after setting the seed, the results differ. Unlike the predicted values, I was successful in getting the exact same fitted values.

A simple example is shown below

library(brms)
library(tidyverse)

set.seed(123)

# generate data
n = 100
x = sort(rnorm(n, 0, 1)) 
b0 = 10
b1 = 2
sd = 0.5

y = b0 + b1*x + rnorm(n = n, mean = 0, sd = sd)

dat <- data.frame(cbind(x,y))


# model fitting 
bfit <- brm(bf(y ~ x), backend = "rstan", data = dat,
             seed = 123, chains = 2, iter = 2000, cores = 2)


# set seed for reproducibility (predicted values)
set.seed(123)

#####################
# get fitted and predicted values by using brms functions 
#####################

# fitted estimates
bfitted_values <- fitted(bfit)
# head(bfitted_values[,1])

# predicted estimates
bpredict_values <- predict(bfit)
# head(bpredict_values[,1])


#####################
# manually calculate fitted and predicted values 
#####################
# parnames(bfit)
nsamples  <- nsamples(bfit)
psb       <- posterior_samples(bfit)

Xmat <- cbind(rep(1, nrow(dat)) , x)

fitted_values       <- matrix(0, nrow = nsamples , ncol = nrow(dat) )
predicted_values    <- matrix(0, nrow = nsamples , ncol = nrow(dat) )

for(j in 1:nsamples) {
  eb0              <- psb[j,"b_Intercept"]   %>% unlist() %>% as.matrix()
  eb1              <- psb[j,"b_x"]           %>% unlist() %>% as.matrix()
  mu               <- Xmat[,1] %*%  eb0 + Xmat[,2] %*% eb1
  sd <- psb[j,"sigma"]
  fitted_values[j,]   <- mu
  predicted_values[j,]   <- rnorm(nrow(dat) , mu, sd)
}

mfitted_values <-  apply(fitted_values, 2, mean, na.rm = TRUE)
# head(mfitted_values)
mpredicted_values <-  apply(predicted_values, 2, mean, na.rm = TRUE)
# head(mpredicted_values)


.fitted <- cbind(bfitted_values[,1], mfitted_values) %>% data.frame()
colnames(.fitted ) <- c("brms", "manual")

.predicted <- cbind(bpredict_values[,1], mpredicted_values) %>% data.frame()
colnames(.predicted ) <- c("brms", "manual")


# compare fitted values
head(.fitted)
      brms   manual
5.229839 5.229839
5.928913 5.928913
6.500177 6.500177
6.781684 6.781684
7.359954 7.359954
7.360638 7.360638

# compare predicted values
head(.predicted)
      brms   manual
5.243650 5.222802
5.922818 5.925199
6.500232 6.502756
6.775731 6.775499
7.352880 7.362450
7.372583 7.344495

Operating System:
brms Version: 2.15.0

martinmodrak · April 21, 2021, 1:21pm

Hi,
I think this is to be expected - the fitted result (which calls posterior_linpred) will give you the values of the linear predictor, which is without any measurement noise. But for predict (which calls posterior_predict), the measurement noise is added, which is random and will not be the same even between two calls of predict.

Does that make sense?

Best of luck with your model!

satpal · April 21, 2021, 2:18pm

Thanks @martinmodrak

I just wanted to make sure that what I am doing is the correct approach i.e., predicted_values[j,] <- rnorm(nrow(dat) , mu, sd)

Correct, the predicted values are not same between two calls but thought setting the seed, for example set.seed(123), perhaps will take care of that. For the brms predict function, reproducibility of results can be achieved by setting the seed.

martinmodrak · April 21, 2021, 2:37pm

Yes, this should be correct.

If you wanted to get the same results as predict, you would need to:

a) set the same seed before calling predict and before running your own prediction code
b) generate the noise in exactly the same order as brms does.

Noting that brms iterates over observations first and then over samples we can actually reproduce this:

set.seed(123)
# Avoid taking summaries for stricter control
bfitted_values <- fitted(bfit, summary = FALSE)
bpredict_values <- predict(bfit, cores = 1, summary = FALSE)

for(j in 1:nsamples) {
  eb0              <- psb[j,"b_Intercept"]   %>% unlist() %>% as.matrix()
  eb1              <- psb[j,"b_x"]           %>% unlist() %>% as.matrix()
  mu               <- Xmat[,1] %*%  eb0 + Xmat[,2] %*% eb1
  fitted_values[j,]   <- mu
}

set.seed(123)
for(i in 1:n) {
  predicted_values[,i]   <- rnorm(nsamples, fitted_values[,i], psb[,"sigma"])
}
  

# Should show all zeroes
unique(bfitted_values - fitted_values)
unique(bpredict_values - predicted_values)

satpal · April 21, 2021, 3:29pm

Dear @martinmodrak

Thank you very much for explaining it by showing an example. It’s Perfect!

Topic		Replies	Views
How to produce brms equivalent of se.fit = TRUE of predict() on a glm() fit? Modeling brms	1	387	October 8, 2021
Why am I getting noisy predictions from brms::predict? brms	4	959	June 3, 2022
Brms predictions inconsistent with custom model (likely user error) General	7	409	September 13, 2023
Fitted.brmsfit's output varies for the same model and data brms	2	148	May 26, 2024
Request to help understand an apparent discrepancy between tidybayes::add_predicted_draws and brms::posterior_predict brms brms	2	685	January 20, 2021

Brms: understanding the basics of predictions

Related topics