Is newdata2 in brms::predict doing what I think it's doing?

Hi List,

I am running a phylogenetic regression and then using predict() from brms to estimate the response for species that have predictor measurements but no response; I'll call them unknowns. I have phylogenetic information for the unknowns, and I would like to use the vcv matrix that includes them in the prediction step, so that the phylogenetic autocorrelation is taken into account. This led me to the newdata2 argument of prepare_predictions(). However, I am not sure that newdata2 is doing what I think it is, or what I want it to do. Here is an example:

library("brms")
library("dplyr")
library("ape")

# read data from brms phylo tutorial
phylo <- ape::read.nexus("https://paul-buerkner.github.io/data/phylo.nex")
A <- ape::vcv.phylo(phylo, corr = TRUE)
data_simple <- read.table("https://paul-buerkner.github.io/data/data_simple.txt", 
                          header = TRUE)

# remove first 5 species
data_mod <- data_simple %>% slice(-(1:5))

# keep first 5 species for prediction
data_pred <- data_simple %>% slice(1:5)

# quick model for demonstration
model_simple <- brm(
  phen ~ cofactor + (1|gr(phylo, cov = A)), 
  data = data_mod,  data2 = list(A = A),
  chains = 2,
  iter = 1000
)
# Model runs properly even though there are species in A that are not in data_mod

# Now Predict Missing Data
# Make newdata2. This matrix A contains all of the species (1-200)
# list elements must be named to match the name used in gr(phylo, cov = A)
phylst <- list(A = A)

# Run the prediction with the newdata2 argument
p1 <- predict(model_simple, 
              newdata = data_pred, 
              newdata2 = phylst, 
              allow_new_levels = TRUE)

# Run without the newdata2 argument
p2 <- predict(model_simple, 
              newdata = data_pred,
              allow_new_levels = TRUE)

These two predictions, p1 (with newdata2) and p2 (without it), give the same results, up to the random variation between runs. I am finding the same thing with my own data. Also, p1 runs just as fast as p2, whereas I expected it to be a bit slower. Together, these make me think the additional information in the vcv matrix is not being used in the prediction, but I have no way to tell whether the two approaches genuinely give the same result or whether the matrix is simply being ignored.
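One way to separate those two possibilities is to remove the run-to-run noise: reset the random seed before each call so both predictions consume the same random numbers. If newdata2 is ignored, the outputs should then be identical; if it is used, they should differ. A minimal sketch, where same_given_seed() is a hypothetical helper of my own and not part of brms:

```r
# Hypothetical helper (not part of brms): evaluate two prediction calls
# under the same random seed, so that Monte Carlo noise can neither hide
# nor fake a difference between them.
same_given_seed <- function(f, g, seed = 1) {
  set.seed(seed)
  a <- f()
  set.seed(seed)
  b <- g()
  isTRUE(all.equal(a, b))
}

# Sketch of usage with the objects defined above (not run here):
# same_given_seed(
#   function() predict(model_simple, newdata = data_pred,
#                      newdata2 = phylst, allow_new_levels = TRUE),
#   function() predict(model_simple, newdata = data_pred,
#                      allow_new_levels = TRUE)
# )
```

A TRUE result would suggest that, at least in this setup, the matrix passed through newdata2 has no effect on the predictions for the new species.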

Am I using newdata2 correctly? I can’t find any examples of its use anywhere! @paul.buerkner I am tagging you here because I think you may be the only one who knows the details of this.

Thanks!


Ah, I forgot that this problem has been addressed in another Discourse post HERE. Sorry to bother anyone. However, that post doesn't mention newdata2, so maybe a matrix is not the appropriate input for that argument?
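For reference, this is what "including the autocorrelation in the prediction" would mean for a Gaussian phylogenetic effect u ~ N(0, sd^2 A): the effect of an unknown species, conditional on the effects of the fitted species, has mean A_no A_oo^-1 u_obs and variance sd^2 (A_nn - A_no A_oo^-1 A_on), where "o" indexes fitted species and "n" unknown ones. The base-R sketch below assumes A is the correlation matrix from vcv.phylo(corr = TRUE), u_obs holds estimated effects for the fitted species, and sd_phylo is the phylogenetic SD; conditional_phylo_effect() is a hypothetical helper, not a brms function. To propagate uncertainty it would have to be applied once per posterior draw (for example to the draws returned by ranef(model_simple, summary = FALSE) together with the sd_phylo__Intercept draws), which is left out here.

```r
# Hypothetical helper (not part of brms): conditional distribution of the
# phylogenetic effects of unknown species given those of fitted species,
# assuming u ~ N(0, sd_phylo^2 * A).
conditional_phylo_effect <- function(A, u_obs, obs, new, sd_phylo) {
  A_oo <- A[obs, obs, drop = FALSE]  # fitted x fitted
  A_no <- A[new, obs, drop = FALSE]  # unknown x fitted
  A_nn <- A[new, new, drop = FALSE]  # unknown x unknown
  # Weights A_no %*% solve(A_oo), computed without an explicit inverse
  W <- t(solve(A_oo, t(A_no)))
  list(
    mean = drop(W %*% u_obs[obs]),
    cov  = sd_phylo^2 * (A_nn - W %*% t(A_no))
  )
}

# Toy 3-species correlation matrix: sp1 is closely related to sp2
A_toy <- matrix(c(1.0, 0.8, 0.2,
                  0.8, 1.0, 0.2,
                  0.2, 0.2, 1.0), nrow = 3,
                dimnames = list(paste0("sp", 1:3), paste0("sp", 1:3)))
res <- conditional_phylo_effect(A_toy, u_obs = c(sp2 = 1, sp3 = 0),
                                obs = c("sp2", "sp3"), new = "sp1",
                                sd_phylo = 1)
res$mean  # ~0.79: sp1 is pulled towards its close relative sp2
```

If the unknowns were instead treated as unrelated new levels, their mean effect would stay at 0 regardless of which species they are closest to, which is the difference I was hoping newdata2 would make.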
