Rescaling phylogenetic tree for non-Brownian phylogenetic models in brms?

Hi all,

I have been thinking about implementing phylogenetic models in brms using alternatives to Brownian motion, including (but not limited to) Ornstein-Uhlenbeck. I see that including an OU kernel for modeling gaussian processes is planned (Add more Gaussian process kernels · Issue #234 · paul-buerkner/brms · GitHub), but isn’t available yet.

Since with brms I have been taking the approach of setting the phylogenetic variance-covariance matrix via a grouping term (as described in the brms phylogenetics vignette Estimating Phylogenetic Multilevel Models with brms), I wonder if first rescaling the branch lengths using an OU (or other) transformation would approximate the structure?

What I mean is, instead of this:

phylo <-"")
data_simple <- read.table(
  header = TRUE

A <- ape::vcv.phylo(phylo)

model_simple <- brm(
  phen ~ cofactor + (1|gr(phylo, cov = A)),
  data = data_simple,
  family = gaussian(),
  data2 = list(A = A),
  iter = 100

(from the brms vignette)

modify the tree branch lengths first, to fit OU expectations:

rescale_phylo <- rescale(phylo, "OU", alpha = 1, sigsq = 1)
A_rescale <- ape::vcv.phylo(rescale_phylo)

model_rescale <- brm(
  phen ~ cofactor + (1|gr(phylo, cov = A_rescale)),
  data = data_simple,
  family = gaussian(),
  data2 = list(A_rescale = A_rescale),
  iter = 100

I suspect this is a naive question and that there is something I’m missing with respect to the variance structure in the model since the kernels seem to differ in taking the square of the distance. (For one thing, I don’t entirely understand the difference in the implementation of the vcv matrix as a grouping term (as in the vignette) vs. as described in terms of a gaussian process in Statistical Rethinking with brms (14 Adventures in Covariance | Statistical rethinking with brms, ggplot2, and the tidyverse: Second edition).)

I would appreciate any thoughts on whether or not this is a reasonable approach, and what I might be missing!


@dpascall @rmcminds

I’ve never heard of rescaling the branch lengths as an approach (but I guess if you could find a mapping that changes the tree such that if you apply the Brownian motion transformation to it you get the same answer as if you had directly generated the variance co-variance matrix under an OU process, that would be fine). The way I’m aware of generating a variance covariance matrix from a phylogenetic tree with evolution under an OU model is the OU.vcv function (OU.vcv: The variance covariance matrix for an Ornstein-Uhlenbeck... in PIGShift: Polygenic Inverse Gamma Shifts). I’ve never done this myself though, and I have no idea how you should know the degree of constraint to apply for the transformation without prior information, but maybe you have an idea for your specific use case.


I’ve used the rescaling approach with multilevel phylogenetic models for OU evolution. It should result in the appropriately pooled estimates for tips, and you can do the rescaling within Stan rather than beforehand if you want to estimate the OU parameter (alpha or theta) rather than guess it. Be aware that estimates associated with inner nodes in such a model should not be interpreted as ancestral states, though - these higher level effects represent weighted means of descendant effects, and are shrunk toward the mean by the OU process compared to the ancestral states that would be associated with those nodes.

Since I like to have ancestral state estimates and my brain works better when the model represents the process directly, I have lately tended to parameterize OU models without using the simple rescaling approach. In brief, you can estimate an ancestral state (centered, non centered, however), and then explicitly use the OU process equations to generate the prediction for a descendant node or tip, which will be shrunk toward the stationary mean. I believe it works out to be:

x_descendant ~ normal(x_ancestor * exp(-alpha * time), sigma * sqrt(1-square(exp(-alpha * time))))

Though you’ll want to double check that, and it can be re-written for greater efficiency and stability. I have some code I can share if I am reminded later (busy day today), but in general I can say these models are sometimes tough to parameterize well because some clades like to be centered, some like to be noncentered, and the degeneracy is influenced strongly by the alpha parameter (so if it’s estimated simultaneously you can get funnels all over the place that can change their shape throughout a chain).


Sorry, I’m realizing I wrote that mostly about implementing a custom model directly in Stan. I’m not super experienced with brms, but I can say:

  1. if you have a working Brownian phylogenetic model in brms, you should indeed be able to simply rescale the phylogeny and use the same model with the new branch lengths;
  2. if you do this, don’t interpret higher level node values as if they are ancestral states;
  3. always check your diagnostics and be aware that you might need to fiddle with things to get rid of divergences

Oh, also also: be careful if your tree is not ultrametric. I.e., if your tips were not all sampled at the same time, I’m not confident off the top of my head whether the branch rescaling approach handles that properly

@dpascall You’ve articulated my concern with PIGShift’s OU.vcv, which was indeed my first thought (but still left me wondering if there would be issues in the final model), but I don’t have a good estimate for theta. I know better what I’m doing with the alpha parameter in rescale.

@rmcminds I’m not very experienced with Stan directly, so your general breakdown is helpful. Would you mind saying more about doing the rescaling in Stan to estimate alpha, or pointing me toward a paper/discussion/link?

Since I’m not super concerned with the estimates for the internal nodes in this model, it sounds like the basic rescaling approach will work for me. I believe you’re correct about requiring an ultrametric tree. If the tips are at different distances from the root I think the covariances will be off when the vcv matrix is calculated because, in addition to the shared distance between tips (MRCA), the covariance also needs to take into account the difference in their lengths.

1 Like

I see on a re-read that you are in fact giving brms a precomputed covariance matrix - I interpreted your message at first as if the phylogenetic effects were themselves directly coded as a multilevel model, with grouping effects for each internal node. If you’re not doing that and have an ultrametric tree, I actually think the rescale-then-vcv.phylo approach is probably identical to the direct OU.vcv approach. You can check to see if the covariance matrices you wind up with are identical?

So then, yes, the theoretical problem we all agree on is deciding what value of alpha(/theta/whatever - I think these are just different names for the same parameter in these different packages?) you use beforehand. It might actually be possible to use brms to do that; is there a way to use custom functions in brms, and you could write one that copies the OU.vcv math into stan code so that you could calculate a new covariance matrix on-the-fly while adding alpha as an extra parameter? The problem I’ve run into with estimating alpha simultaneously while ignoring ancestral states (e.g. calculating just the vcv of the tips) is that you have to invert or decompose that covariance matrix every single iteration, and with a larger phylogeny it becomes extremely slow. Jarrod Hadfield’s MCMCglmm package vignette explains that with some fancy math, that inversion is actually computationally cheaper to do if you estimate the ancestral nodes (despite there being more parameters).

So, if your phylogeny is kinda small, hacking brms /might/ work (though I don’t have enough experience to say). Otherwise I just find it easier to go straight to Stan itself, and directly model the evolution.

I don’t have code that I can point you to except some models that have extensive further complications. I should make some, though. Just not today…

I was interpreting the theta in OU.vcv as the value of the optimum, whereas alpha is the strength of the pull toward it. If I’m wrong in that I’m happy to be corrected though. In this particular situation, I’m more comfortable guesstimating alpha than theta, but better to infer it from data if possible.

Thanks for the custom function suggestion. It looks like there is a way to do that in brms as long as I can to set it up in Stan, which I should be able to figure out. Whether or not that will play nicely with a tree with 200+ tips I don’t know. It’s becoming clear that I’ll want to shift a lot of my work to Stan directly at some point, but for now I’ll see what I can hack in brms.

I appreciate your time and thoughts! If you ever do make some Stan models like this that you’re comfortable sharing, I’d love an update.

I just took a look at the code for both Geiger’s .ou.phylo() (geiger source: R/utilities-phylo.R; used by rescale()), and PIGShift’s OU.vcv (PIGShift source: R/brownian_motion_package.r).

‘Theta’ in PIGShift seems to mean the same thing as ‘alpha’ in Geiger (both are the strength of attraction). I think both assume the optimum is zero - this makes sense especially in the case of ‘residual’-type effects in a phylogenetic linear model (you might interpret the linear model’s intercept as the true optimum that everything moves toward - so in a way you’re already estimating that extra parameter).

If you feel comfortable guessing and fixing the value of the strength of attraction, then using your existing brms code with a transformed covariance matrix, derived by either method, should work beautifully! Since the matrix is data, I would hope brms inverts or decomposes it only once, so it should work fine even for large trees.

I will put some effort into some phylogenetic models in the relatively near future that I could share. But if you can use pre-existing models and reasonable simplifying assumptions, I would definitely start there - my apologies for complicating things!

I don’t know why I didn’t think to just look at their code to untangle alpha and theta. When I run either vcv.phylo on the tree rescaled with alpha = 10-8 (arbitrary number close to 0) or OU.vcv with theta=0, I get the same matrix (makes sense either way). But when I run them both with alpha=1/theta=1, the results are substantially different. What’s more, the result of vcv(rescale(…)) is a symmetrix matrix, but the result of OU.vcv is not. This is something I might look into more later, but I think on this point I’ve probably gone beyond the scope of this forum!

So! I think eventually I will want to build a model that can infer alpha (or theta), but for now I’ll play with a few different values when rescaling the tree to get an idea of how strongly if affects the fit, since I was going to compare it to the standard Brownian motion one anyway. Theta I’m more nervous about playing with because the trait I’m working with is pretty squirrelly, and also because now I’m confused about what OU.vcv is doing under the hood.

Thank you for working through this with me! No need to apologize, you’ve both helped me to get an answer to the original question (rescaling the tree is probably an okay way to get at an OU-phylogenetic model) and sent me on an interesting digression. I know more about Stan and the math and theory behind OU models now!

I’m not sure why you would get an asymmetric matrix with OU.vcv and an ultrametric tree. I just went ahead and threw together some comparisons, and in my code the geiger and pigshift methods both return the same matrix (assuming alpha=theta and sigsq=1). Notably, I believe though each method is calculated by different means, both will be incorrect for non-ultrametric trees (the branch rescaling method is discussed in this paper:, and it looks to me like the pigshift method would be incorrect (and perhaps lead to asymmetric matrices) in such a case due to the way it calculates phylogenetic distances).

I generated a random tree, a random model matrix, and random data according to an OU model of evolution, and then recovered the input betas and sigmas using ‘classical’ frequentist methods (with package phyr) using both geiger and pigshift-generated covariances. I then was able to recover the same values using a stan model that takes the same precalculated covariance as phyr, and again got the same values with another stan model that calculated the covariance ‘from scratch’ using phylogenetic distances and a predetermined theta. Then, I used that same ‘from scratch’ code to complicate the model by estimating theta within the model rather than beforehand. Using improper priors there were some sampling difficulties, but adding a prior to theta tamed this a little (the model is quite slow, but does return the correct values).

I didn’t do the brms version, but you could easily plug that in with the same data to see how it compares. Should be similar, but with maybe better priors but no easy ability estimate theta.

To use the stan versions for a real analysis, I would think hard about reasonable, proper priors for all parameters (theta might be dependent on whether the tree has normalized branch lengths, for instance), and it could use optimization for efficiency. I didn’t yet implement the latent-ancestral-states explicit model of evolution version, since it will require some tweaking to indexing that always makes my brain hurt, but I plan to add that soon! I think it might be faster, and will have the added benefit of giving you the estimated ancestral states!

Here is a link to the code: analysis_templates/teaching_examples/phylogenetics_in_stan at main · McMinds-Lab/analysis_templates · GitHub

I just pushed the multilevel model version to the same github repo, and as I expected, it was MUCH faster, in addition to being easier for me to understand and getting ancestral states.

Thanks for the original question and the discussion, everyone - I’d actually been meaning to write up these kinds of comparisons for quite a while, now, and this was a great excuse to finally do it!


Just like the initiator of this thread I have been using brms for some time but am highly motivated to learn stan directly in part because I also want to use an OU-kernel. However, my math/programming skills are not good enough (yet) to fully grasp either the math or stan coding. I have read McElreath’s Rethinking and understand most of it.

I took a look at your code here
analysis_templates/teaching_examples/phylogenetics_in_stan at main · McMinds-Lab/analysis_templates · GitHub
and compared it to a model that I wrote in brms, but I let brms translate it to stan code. My model, I presume, are using a Brownian motion process and my hope was to “gaze” at your code and manipulate mine so to render it an OU-process. That was not so easy.

I do not mean to ask you to write my code but IF the manipulation of my code to make it an OU-process is small, then perhaps you can point me in the right direction?

In particular something that I don’t understand, when looking at McElreath explanation of the use of a Gaussian process in the context of phylogenetic regressions he writes that it can be “thought of as having a single, multi-variate outcome”. I understand that but my data has multiple observations per species. And somewhere in my brms Stan-code this must of course be specified. Does this circumstance severely restructure your code examples?

Here is the stan code that I want to have an OU-process:

functions {
data {
  int<lower=1> N;  // total number of observations
  vector[N] Y;  // response variable
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  matrix[N_1, N_1] Lcov_1;  // cholesky factor of known covariance matrix
  // group-level predictor values
  vector[N] Z_1_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_1;
  int prior_only;  // should the likelihood be ignored?
transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc;  // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i - 1];
parameters {
  vector[Kc] b;  // population-level effects
  real Intercept;  // temporary intercept for centered predictors
  real<lower=0> sigma;  // dispersion parameter
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  vector[N_2] z_2[M_2];  // standardized group-level effects
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  vector[N_2] r_2_1;  // actual group-level effects
  real lprior = 0;  // prior contributions to the log posterior
  r_1_1 = (sd_1[1] * (Lcov_1 * z_1[1]));
  r_2_1 = (sd_2[1] * (z_2[1]));
  lprior += normal_lpdf(b | 0, 10);
  lprior += normal_lpdf(Intercept | 0, 50);
  lprior += student_t_lpdf(sigma | 3, 0, 20)
    - 1 * student_t_lccdf(0 | 3, 0, 20);
  lprior += student_t_lpdf(sd_1 | 3, 0, 20)
    - 1 * student_t_lccdf(0 | 3, 0, 20);
  lprior += student_t_lpdf(sd_2 | 3, 0, 20)
    - 1 * student_t_lccdf(0 | 3, 0, 20);
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += Intercept;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_2_1[J_2[n]] * Z_2_1[n];
    target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
  target += std_normal_lpdf(z_2[1]);
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept - dot_product(means_X, b);

Hi Andreas -

It looks to me like phylogenetic effects are included in your model via the variable ‘Lcov_1’. Not sure exactly what brms does within R to generate this, but I would imagine that, yes, if you are providing a phylogentic tree, it is getting converted to a covariance matrix assuming strict Brownian motion, and that covariance matrix is then decomposed into a cholesky factor. My model ‘01_gaussian_ou_fixed_precalc.stan’ uses the same idea - the covariance is precalculated in R and simply fed to the Stan model as a static piece of data (in my case, the variable ‘L’). The primary difference there is that my model takes the covariance matrix itself as data, whereas your model takes the cholesky factor of the covariance matrix. The extra complications in your model (such as repeat observations) shouldn’t matter if you just focus on replacing that single covariance matrix with one based on an OU model.

You can compare ‘01_gaussian_ou_fixed_precalc.stan’ to ‘02_gaussian_ou_fixed_stancalc.stan’ to see how you can calculate the OU covariance within Stan rather than pre-calculating the Brownian covariance in R.

Hopefully that helps. Let me know if it doesn’t!