Predict with a brms phylogenetic model for a new species with known phylogenetic position

I have fitted a phylogenetic model in brms with a phylogenetically structured random intercept. I was wondering whether predict.brmsfit would allow me to predict for a new species when it’s phylogenetic position is known? For example, if I provide the model with a phylogenetic covariance matrix that includes a species, but that species wasn’t used to fit the model, can I predict for that species using information about it’s phylogenetic position? Does the sample_new_levels argument have an option to do this?

Unfortunately, this is not yet possible in brms, but I agree it could be useful. The function that prepares the random effects for prediction (and samples new levels if required) does not make use of cov_ranef at all. For the existing levels, the correlations are part of the posterior distribution so do not need cov_ranef, but for new levels, we would of course need it.

Would you mind opening an issue at https://github.com/paul-buerkner/brms so that I don’t loose track of this feature? This is unlikely to be implemented soon though, as the related functions are quite complex and such a feature would likely require some amounts of refactoring.

Thanks Paul,

I’ve opened an issue on the GitHib repository.

Best wishes,

Louise

Hi Paul,

I understood that right now it’s not possible predict to new species with phylogenetic models.
To achieve this anyway, I wrote the same model in stan including the outcome values for new species as missing data. This seems to work fine with the test I did. However, I have to predict values for many new species and I’m afraid this will take ages.

What approach did you have in mind to predict to new species with brms? Is it possible to estimate phylogenetically structured intercept a posteriori by using a correlation matrix including all new species?
Thanks in advance for any thoughts/advice.

Ah, as you say it, you can also include missing values for the same purpose in brms (see vignette("brms_missings")). Perhaps that will help you doing it.

I am not sure how the solution in the brms post-processing would look like or if there is any solution there at all. I need to think of that a bit more.

1 Like

Is it possible to use predict.brmsfit for a new species when it’s phylogenetic position is NOT known?
This is what happens when I try

predict(model, newdata = data.frame(x = 1))
Error in get(g, data) : object 'phylo' not found

I haven’t thought of it. It could be that is produces some results, but I don’t think those results would necessarily be valid. With regard to the error, can you provide a minimal reproducible example for it so that I can see whether there is something to be fixed?

phylo <- ape::read.nexus("https://paul-buerkner.github.io/data/phylo.nex")
data_simple <- read.table(
  "https://paul-buerkner.github.io/data/data_simple.txt", 
  header = TRUE
)

A <- ape::vcv.phylo(phylo)

model_simple <- brm(
  phen ~ cofactor + (1|phylo), data = data_simple, 
  family = gaussian(), cov_ranef = list(phylo = A),
  prior = c(
    prior(normal(0, 10), "b"),
    prior(normal(0, 50), "Intercept"),
    prior(student_t(3, 0, 20), "sd"),
    prior(student_t(3, 0, 20), "sigma")
  )
)

newdata <- 10.3095881
predict(model_simple, newdata = data.frame(newdata))

`Error in get(g, data) : object 'phylo' not found`

I just want to use model_simple to predict using a cofactor value for a new species. But I don’t know its phylogenetic postition. Is that possible?

Below is my idea of a new prediction and I guess predict.brmsfit does something similar except I didn’t include a varying intercept since I don’t know the position for newdata.

post <- posterior_samples(model_simple)
mu.function <- function(cofactor) post$b_Intercept + post$b_cofactor*cofactor
mu <- sapply( newdata , mu.function )
mu.mean <- apply( mu , 2 , mean )

Your idea can be realized via

predict(model_simple, newdata = data.frame(cofactor = 10.3095881), re_formula = NA)

which implies the random intercept being fixed to zero in the predictions. Alternatively, you can use

predict(model_simple, newdata = data.frame(cofactor = 10.3095881), allow_new_levels = TRUE)

to include the uncertainty associated with the random intercept in the prediction.

Neither of these approaches take the phylogenetic structure explicitely into account but could still provide a reasonable

approximation to what you want to achieve.

Fantastic, thank you!

I would like to understand what is being calculated when allow_new_levels = TRUE? What is added to the my example above:

post <- posterior_samples(model_simple)
mu.function <- function(cofactor) post$b_Intercept + post$b_cofactor*cofactor
mu <- sapply( newdata , mu.function )
mu.mean <- apply( mu , 2 , mean )

Thanks again!

This is discussed in the doc of ?extract_draws 1

Do you mean here? https://www.rdocumentation.org/packages/brms/versions/2.7.0/topics/extract_draws.brmsfit

I cant understand how I get to know the calcultaions from there. Sorry for my ignorance, I’m not totally new to this but still kind of new.

Sorry, for the short answer. The sample_new_levels argument explains the different options. By default, we add random samples to the linear predictor as drawn from the empirical posterior distribution of the existing random effects (centered around zero) to take the uncertainty of these terms into account (since we don’t know the value of the new level).

1 Like