Predict with a brms phylogenetic model for a new species with known phylogenetic position

loubar · January 29, 2019, 9:04am

I have fitted a phylogenetic model in brms with a phylogenetically structured random intercept. I was wondering whether predict.brmsfit would allow me to predict for a new species when it’s phylogenetic position is known? For example, if I provide the model with a phylogenetic covariance matrix that includes a species, but that species wasn’t used to fit the model, can I predict for that species using information about it’s phylogenetic position? Does the sample_new_levels argument have an option to do this?

paul.buerkner · January 29, 2019, 11:09am

Unfortunately, this is not yet possible in brms, but I agree it could be useful. The function that prepares the random effects for prediction (and samples new levels if required) does not make use of cov_ranef at all. For the existing levels, the correlations are part of the posterior distribution so do not need cov_ranef, but for new levels, we would of course need it.

Would you mind opening an issue at https://github.com/paul-buerkner/brms so that I don’t loose track of this feature? This is unlikely to be implemented soon though, as the related functions are quite complex and such a feature would likely require some amounts of refactoring.

loubar · January 29, 2019, 11:21am

Thanks Paul,

I’ve opened an issue on the GitHib repository.

Best wishes,

Louise

nschiett · April 24, 2019, 12:21pm

Hi Paul,

I understood that right now it’s not possible predict to new species with phylogenetic models.
To achieve this anyway, I wrote the same model in stan including the outcome values for new species as missing data. This seems to work fine with the test I did. However, I have to predict values for many new species and I’m afraid this will take ages.

What approach did you have in mind to predict to new species with brms? Is it possible to estimate phylogenetically structured intercept a posteriori by using a correlation matrix including all new species?
Thanks in advance for any thoughts/advice.

paul.buerkner · April 24, 2019, 1:01pm

Ah, as you say it, you can also include missing values for the same purpose in brms (see vignette("brms_missings")). Perhaps that will help you doing it.

I am not sure how the solution in the brms post-processing would look like or if there is any solution there at all. I need to think of that a bit more.

Andreas · September 18, 2019, 4:51pm

Is it possible to use predict.brmsfit for a new species when it’s phylogenetic position is NOT known?
This is what happens when I try

predict(model, newdata = data.frame(x = 1))
Error in get(g, data) : object 'phylo' not found

paul.buerkner · September 28, 2019, 4:16pm

I haven’t thought of it. It could be that is produces some results, but I don’t think those results would necessarily be valid. With regard to the error, can you provide a minimal reproducible example for it so that I can see whether there is something to be fixed?

Andreas · September 28, 2019, 8:27pm

phylo <- ape::read.nexus("https://paul-buerkner.github.io/data/phylo.nex")
data_simple <- read.table(
  "https://paul-buerkner.github.io/data/data_simple.txt", 
  header = TRUE
)

A <- ape::vcv.phylo(phylo)

model_simple <- brm(
  phen ~ cofactor + (1|phylo), data = data_simple, 
  family = gaussian(), cov_ranef = list(phylo = A),
  prior = c(
    prior(normal(0, 10), "b"),
    prior(normal(0, 50), "Intercept"),
    prior(student_t(3, 0, 20), "sd"),
    prior(student_t(3, 0, 20), "sigma")
  )
)

newdata <- 10.3095881
predict(model_simple, newdata = data.frame(newdata))

`Error in get(g, data) : object 'phylo' not found`

I just want to use model_simple to predict using a cofactor value for a new species. But I don’t know its phylogenetic postition. Is that possible?

Below is my idea of a new prediction and I guess predict.brmsfit does something similar except I didn’t include a varying intercept since I don’t know the position for newdata.

post <- posterior_samples(model_simple)
mu.function <- function(cofactor) post$b_Intercept + post$b_cofactor*cofactor
mu <- sapply( newdata , mu.function )
mu.mean <- apply( mu , 2 , mean )

paul.buerkner · September 29, 2019, 9:10am

Your idea can be realized via

predict(model_simple, newdata = data.frame(cofactor = 10.3095881), re_formula = NA)

which implies the random intercept being fixed to zero in the predictions. Alternatively, you can use

predict(model_simple, newdata = data.frame(cofactor = 10.3095881), allow_new_levels = TRUE)

to include the uncertainty associated with the random intercept in the prediction.

Neither of these approaches take the phylogenetic structure explicitely into account but could still provide a reasonable

approximation to what you want to achieve.

Andreas · September 29, 2019, 6:39pm

Fantastic, thank you!

I would like to understand what is being calculated when allow_new_levels = TRUE? What is added to the my example above:

post <- posterior_samples(model_simple)
mu.function <- function(cofactor) post$b_Intercept + post$b_cofactor*cofactor
mu <- sapply( newdata , mu.function )
mu.mean <- apply( mu , 2 , mean )

Thanks again!

paul.buerkner · September 29, 2019, 6:47pm

This is discussed in the doc of ?extract_draws 1

Andreas · September 29, 2019, 9:17pm

Do you mean here? extract_draws.brmsfit function - RDocumentation

I cant understand how I get to know the calcultaions from there. Sorry for my ignorance, I’m not totally new to this but still kind of new.

paul.buerkner · September 30, 2019, 7:45am

Sorry, for the short answer. The sample_new_levels argument explains the different options. By default, we add random samples to the linear predictor as drawn from the empirical posterior distribution of the existing random effects (centered around zero) to take the uncertainty of these terms into account (since we don’t know the value of the new level).

vcgarnica · February 14, 2024, 12:36pm

Hey there , has this option been implemented yet?

Ax3man · February 14, 2024, 11:23pm

Hi Vinicius. Perhaps you can clarify whether you mean predicting for species with or without known phylogenetic position, since both are discussed in this thread.

vcgarnica · February 15, 2024, 4:59pm

Hi @Ax3man, thanks for the reply. I meant prediction for a new species with a known phylogenetic position. The way I see, allow_new_levels = TRUE is just accommodating uncertainty and not taking in account the new information about species relationship.

Ax3man · February 15, 2024, 6:06pm

As Paul hinted at above, you can use missing values for this. If you include the species in data and data2, set then outcome to NA and then use y | mi() ~ your_model, you can make predictions for that species.

I haven’t done this for a phylogenetic model, but I have done something similar with a quantitative genetic model (“animal model”), which has the same structure. Let me know if you get stuck.

jsocolar · February 15, 2024, 7:26pm

@Ax3man’s is the right answer if you have only a few new species to predict (if too many, then the added computational burden of inverting the enlarged covariance matrix at each leapfrog step will get nasty) and if model fitting isn’t too expensive overall (otherwise you might want to avoid a refit if possible).

It is feasible to do these predictions post-hoc if that should be relevant, but it isn’t particularly easy and cannot be done natively in brms. If that’s the position you find yourself in, then let us know. The scheme is roughly:

use brms to give the linear predictor for everything except for the species-specific effects, including the phylogenetic effect.
Sample the species-specific effects. For phylogenetic effects, you’d use the known correlation matrix, iteration-wise estimated standard deviation, and the observed samples for all other species to work out the sampling distribution for the new species (the joint sampling distribution if more than one species). This conditional distribution is given under the “conditional distributions” section here Multivariate normal distribution - Wikipedia
Add the species-specific effects back into the linear predictor
If desired, transform the linear predictor via the inverse link and sample from the error distribution.

Topic		Replies	Views
Phylogenetic models with unseen categories brms	2	520	December 23, 2019
Is newdata2 in brms::predict doing what I think it's doing? brms specification , phylogenetic , brms	1	755	September 8, 2021
Predict from a phylogenetic multilevel model trained on standardised data, but with new data that has not been standardised brms	9	508	June 29, 2020
Phylogenetic model with varying slopes and intercept using brms Modeling brms	9	1380	June 28, 2024
Specifying a nested phylogenetic model brms	7	1501	July 5, 2018

Predict with a brms phylogenetic model for a new species with known phylogenetic position

Related topics