Phylogenetic model with varying slopes and intercept using brms

I am trying to construct a phylogenetic model using the brms package.
I would like to include random intercepts and slopes in the model, but fitting is very time-consuming, and I wonder whether my code is correct.
“examples” has about 10,000 records and 1,000 species.

bmodel <- brm(examples ~ Temperature + Precipitation +
                (1 + Temperature + Precipitation | gr(sn, cov = tree_vcv)),
              data = df,
              family = cumulative(link = "logit", threshold = "flexible"),
              data2 = list(tree_vcv = tree_vcv),
              prior = c(set_prior("normal(0, 10)", class = "b")),
              warmup = 200,
              iter = 1000,
              chains = 4,
              save_all_pars = TRUE,  # deprecated in recent brms; use save_pars = save_pars(all = TRUE)
              backend = "cmdstanr",
              control = list(adapt_delta = 0.99),
              save_model = "ex_stanscript",
              file = "ex")

When I tried the model with only a random intercept, the running time was reasonable (about 2 days).

Would you possibly check my code above?



A couple of things:

  1. Your warmup is almost certainly too short for a model this complicated, especially with adapt_delta as high as 0.99. Try increasing the warmup length dramatically (e.g. to the default of 1000 iterations). The model may well be able to adapt to the posterior geometry well enough to sample much, much faster.
  2. The random effects structure as written assumes very strong (“perfect”) phylogenetic signal. If your domain knowledge suggests that this is a good assumption, then that’s fine. But if it’s a poor assumption, it could lead to lack-of-fit, which in turn could create computational difficulties. One common extension of the model in this situation is to also include standard random effects of species everywhere you include the phylogenetic random effect. In this case, the model becomes a “Pagel’s lambda” model.
  3. It’s possible that the data just don’t really support a model this complex, which can lead to problems with identification and thus poor sampling. However, my gut feeling is that this is not likely to be a problem here.
  4. The math operations required to sample from these large multivariate random effects are computationally intensive, and this will create some hard limit on how fast this thing can run, even if it’s well adapted. On the other hand, this very same issue means that this model might be a good candidate for speedups with multi-threading, if you have 8 (= chains*2) or more cores available (make sure that you have 8 or more physical cores, not just virtual cores!).
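To make point 2 concrete, here is a hedged sketch of what that extension might look like in brms (not tested on your data). Following the construction in the brms phylogenetics vignette, it assumes you add a duplicate of the species column (here given the hypothetical name `sn_ind`) so the independent effect gets its own grouping factor:

```r
# Sketch only: phylogenetic species effect (covariance from the tree)
# plus an independent "standard" species effect on the same terms.
# `sn_ind` is a hypothetical duplicate of the species column:
df$sn_ind <- df$sn

bform <- examples ~ Temperature + Precipitation +
  # phylogenetic random intercept and slopes
  (1 + Temperature + Precipitation | gr(sn, cov = tree_vcv)) +
  # independent random intercept and slopes of species
  (1 + Temperature + Precipitation | sn_ind)
```

The relative sizes of the two estimated variance components then play the role of Pagel’s lambda.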

Just as a note, it is indeed expected for this model to be much slower than the intercept-only phylogenetic model, according to the phylogenetics vignette (Estimating Phylogenetic Multilevel Models with brms • brms):

In the above examples, we have only used a single group-level effect (i.e., a varying intercept) for the phylogenetic grouping factors. In brms, it is also possible to estimate multiple group-level effects (e.g., a varying intercept and a varying slope) for these grouping factors. However, it requires repeatedly computing Kronecker products of covariance matrices while fitting the model. This will be very slow especially when the grouping factors have many levels and matrices are thus large.

Also see jsocolar’s point 4.


Thank you very much for the helpful advice. I’m sorry for the late response.

  1. I’m sorry, my explanation was lacking. I changed the warmup and iteration counts to 200 and 1000 because I wanted to see whether the number of iterations was the cause of the long running time. In the real settings, I used 1000 warmups and 10000 iterations. Thank you for the sincere comments.

  2. I believe that accounting for phylogenetic relationships is valid for this data set. However, I find random intercepts difficult to interpret alongside random slopes, so I am considering including species as a categorical (fixed) variable instead. The phylogenetic random effect would then apply only to the slopes.

  3. I hope the data is enough to support the model complexity.

  4. Yes, I changed the cores setting as you suggested. My PC has 18 cores, so

         chains  = 4,
         cores   = 4,
         threads = threading(12),

And I set only one variable as a random slope to start. The running time was then about 20 hours (with 1000 warmups and 10000 iterations). There are 9 variables in total, so I hope the full model will have a more realistic running time.

Thanks a lot,

Yes, I read that explanation, but the model was already taking an unrealistic amount of time to fit, so I was wondering whether there was anything I could do to improve or work around the problem.

For what it’s worth, you almost certainly don’t need 10000 iterations post-warmup. In most models, 1000 (x 4 chains) will be sufficient.
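For example (a hedged sketch, assuming `bmodel` is the fitted object from above), you could refit with the brms defaults and check the effective sample sizes before deciding you need more draws:

```r
# Sketch: refit with default lengths (1000 warmup + 1000 sampling per chain)
# and inspect convergence diagnostics before increasing iter further.
fit <- update(bmodel, warmup = 1000, iter = 2000, chains = 4)
summary(fit)  # Rhat near 1 and Bulk_ESS/Tail_ESS in the hundreds usually
              # indicate that 4000 total post-warmup draws are enough
```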


I think there are a couple of things to unpack here. Let’s distinguish between “standard random effects (of species)” that are independent, “phylogenetic random effects (of species)” that are modeled based on a covariance matrix that is a priori known (e.g. the covariance structure implied by a Brownian motion model of character evolution), and fixed effects (of species) that are estimated completely independently for each species in the dataset. Each of these three classes of effect can be applied to an intercept or a coefficient (i.e. a slope; here I use “coefficient” to maintain clarity about the fact that there might be multiple coefficients in the model).

Whatever term(s) you apply the phylogenetic effect to, you cannot also apply the fixed effect of species on the same terms, because with suitably weak priors the fixed effect will soak up all of the variation. However, whatever terms you apply the phylogenetic effect to, you can also apply a standard random effect of species on the same terms. The model with just the phylogenetic random effect assumes strong phylogenetic signal, in the sense of \lambda = 1, where \lambda is Pagel’s lambda. If we additionally include a standard random effect of species, we now have a model that allows for alternative values of \lambda, and the true value is to be estimated from data.
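In equation form (a sketch assuming the phylogenetic correlation matrix C is scaled to unit diagonal, and writing \sigma_p^2 and \sigma_e^2 for the phylogenetic and independent species variances), the combined covariance of the species effects is

```latex
\Sigma = \sigma_p^2 C + \sigma_e^2 I
       = (\sigma_p^2 + \sigma_e^2)\,\bigl(\lambda C + (1 - \lambda) I\bigr),
\qquad
\lambda = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_e^2},
```

so the two estimated variance components together determine \lambda between 0 and 1.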


Thank you for the advice! I changed to 300 warmups and 1000 iterations.

Thank you very much for the advice.
When species is included as a fixed effect, sampling no longer moves past 0%. When I ran the model with phylogenetic information for both the random intercept and the random slope, I was able to perform the analysis. I suspect this is because the data represent plant population trends over two time periods and include populations that are going extinct, so including species as a fixed effect may prevent the model from converging.