Predict with a brms phylogenetic model for a new species with known phylogenetic position

I have fitted a phylogenetic model in brms with a phylogenetically structured random intercept. I was wondering whether predict.brmsfit would allow me to predict for a new species when it’s phylogenetic position is known? For example, if I provide the model with a phylogenetic covariance matrix that includes a species, but that species wasn’t used to fit the model, can I predict for that species using information about it’s phylogenetic position? Does the sample_new_levels argument have an option to do this?

Unfortunately, this is not yet possible in brms, but I agree it could be useful. The function that prepares the random effects for prediction (and samples new levels if required) does not make use of cov_ranef at all. For the existing levels, the correlations are part of the posterior distribution so do not need cov_ranef, but for new levels, we would of course need it.

Would you mind opening an issue at https://github.com/paul-buerkner/brms so that I don’t loose track of this feature? This is unlikely to be implemented soon though, as the related functions are quite complex and such a feature would likely require some amounts of refactoring.

1 Like

Thanks Paul,

I’ve opened an issue on the GitHib repository.

Best wishes,

Louise

Hi Paul,

I understood that right now it’s not possible predict to new species with phylogenetic models.
To achieve this anyway, I wrote the same model in stan including the outcome values for new species as missing data. This seems to work fine with the test I did. However, I have to predict values for many new species and I’m afraid this will take ages.

What approach did you have in mind to predict to new species with brms? Is it possible to estimate phylogenetically structured intercept a posteriori by using a correlation matrix including all new species?
Thanks in advance for any thoughts/advice.

Ah, as you say it, you can also include missing values for the same purpose in brms (see vignette("brms_missings")). Perhaps that will help you doing it.

I am not sure how the solution in the brms post-processing would look like or if there is any solution there at all. I need to think of that a bit more.

1 Like

Is it possible to use predict.brmsfit for a new species when it’s phylogenetic position is NOT known?
This is what happens when I try

predict(model, newdata = data.frame(x = 1))
Error in get(g, data) : object 'phylo' not found

I haven’t thought of it. It could be that is produces some results, but I don’t think those results would necessarily be valid. With regard to the error, can you provide a minimal reproducible example for it so that I can see whether there is something to be fixed?

phylo <- ape::read.nexus("https://paul-buerkner.github.io/data/phylo.nex")
data_simple <- read.table(
  "https://paul-buerkner.github.io/data/data_simple.txt", 
  header = TRUE
)

A <- ape::vcv.phylo(phylo)

model_simple <- brm(
  phen ~ cofactor + (1|phylo), data = data_simple, 
  family = gaussian(), cov_ranef = list(phylo = A),
  prior = c(
    prior(normal(0, 10), "b"),
    prior(normal(0, 50), "Intercept"),
    prior(student_t(3, 0, 20), "sd"),
    prior(student_t(3, 0, 20), "sigma")
  )
)

newdata <- 10.3095881
predict(model_simple, newdata = data.frame(newdata))

`Error in get(g, data) : object 'phylo' not found`

I just want to use model_simple to predict using a cofactor value for a new species. But I don’t know its phylogenetic postition. Is that possible?

Below is my idea of a new prediction and I guess predict.brmsfit does something similar except I didn’t include a varying intercept since I don’t know the position for newdata.

post <- posterior_samples(model_simple)
mu.function <- function(cofactor) post$b_Intercept + post$b_cofactor*cofactor
mu <- sapply( newdata , mu.function )
mu.mean <- apply( mu , 2 , mean )

Your idea can be realized via

predict(model_simple, newdata = data.frame(cofactor = 10.3095881), re_formula = NA)

which implies the random intercept being fixed to zero in the predictions. Alternatively, you can use

predict(model_simple, newdata = data.frame(cofactor = 10.3095881), allow_new_levels = TRUE)

to include the uncertainty associated with the random intercept in the prediction.

Neither of these approaches take the phylogenetic structure explicitely into account but could still provide a reasonable

approximation to what you want to achieve.

1 Like

Fantastic, thank you!

I would like to understand what is being calculated when allow_new_levels = TRUE? What is added to the my example above:

post <- posterior_samples(model_simple)
mu.function <- function(cofactor) post$b_Intercept + post$b_cofactor*cofactor
mu <- sapply( newdata , mu.function )
mu.mean <- apply( mu , 2 , mean )

Thanks again!

This is discussed in the doc of ?extract_draws 1

Do you mean here? extract_draws.brmsfit function - RDocumentation

I cant understand how I get to know the calcultaions from there. Sorry for my ignorance, I’m not totally new to this but still kind of new.

Sorry, for the short answer. The sample_new_levels argument explains the different options. By default, we add random samples to the linear predictor as drawn from the empirical posterior distribution of the existing random effects (centered around zero) to take the uncertainty of these terms into account (since we don’t know the value of the new level).

1 Like

Hey there , has this option been implemented yet?

Hi Vinicius. Perhaps you can clarify whether you mean predicting for species with or without known phylogenetic position, since both are discussed in this thread.

1 Like

Hi @Ax3man, thanks for the reply. I meant prediction for a new species with a known phylogenetic position. The way I see, allow_new_levels = TRUE is just accommodating uncertainty and not taking in account the new information about species relationship.

As Paul hinted at above, you can use missing values for this. If you include the species in data and data2, set then outcome to NA and then use y | mi() ~ your_model, you can make predictions for that species.

I haven’t done this for a phylogenetic model, but I have done something similar with a quantitative genetic model (“animal model”), which has the same structure. Let me know if you get stuck.

2 Likes

@Ax3man’s is the right answer if you have only a few new species to predict (if too many, then the added computational burden of inverting the enlarged covariance matrix at each leapfrog step will get nasty) and if model fitting isn’t too expensive overall (otherwise you might want to avoid a refit if possible).

It is feasible to do these predictions post-hoc if that should be relevant, but it isn’t particularly easy and cannot be done natively in brms. If that’s the position you find yourself in, then let us know. The scheme is roughly:

  • use brms to give the linear predictor for everything except for the species-specific effects, including the phylogenetic effect.
  • Sample the species-specific effects. For phylogenetic effects, you’d use the known correlation matrix, iteration-wise estimated standard deviation, and the observed samples for all other species to work out the sampling distribution for the new species (the joint sampling distribution if more than one species). This conditional distribution is given under the “conditional distributions” section here Multivariate normal distribution - Wikipedia
  • Add the species-specific effects back into the linear predictor
  • If desired, transform the linear predictor via the inverse link and sample from the error distribution.
2 Likes