Phylogenetic signal for bernoulli families by brms

Hi, I am analyzing phylogenetic signal for bernoulli families as the following:

model <- brm(
  trend ~ pop + art + (1 | gr(phylo, cov = A)) + (1 | sci_name),
  data = fish, data2 = list(A = A), family = bernoulli(),
  sample_prior = TRUE, prior = prior_my, 
  iter = 10000, warmup = 5000,
  chains = 4, cores = 4,
  thin = 10, 
  save_all_pars = TRUE,
  seed = T, 
  control = list(adapt_delta = 0.99)

And then I use


But I find

  1. there is no sigma with Bernoulli families;
  2. I do not know how to calculate the phylogenetic sigma;
  3. when I use the code: tab_model(model), there are some warnings and error
    Warning: Following potential variables could not be found in the data: grphylo, cov = A
    Warning: Following potential variables could not be found in the data: grphylo, cov = A
    Error in [.data.frame(insight::get_data(model), , insight::find_random(model, : **
    ** undefined columns selected

I don’t know how to solve these problems. Can somebody can help me?

1 Like

Hi @Xiaodan_Wang welcome to the Stan forums.

I have a few suggestions and a few questions:

Is there a reason you increased these above the default values? Were you having convergence problems or did you need a very large effective sample size? Usually for the algorithm in Stan you shouldn’t need this many sampling and warmup iterations.

My guess is that you don’t need to thin here. Thinning is really only useful if the size of the posterior is too big to fit in memory on your computer. Otherwise it just throws away information. It used to be necessary with algorithms that required running tons of iterations, but Stan doesn’t need that many so thinning isn’t usually recommended.

Is seed=T intentional? I guess this works because R will translate T to 1 but in general I recommend either letting Stan set the seed randomly or picking a particular integer. Weird things can happen in R if you use T or F in unexpected places (it’s happened to me several times in the past).

Right, the bernoulli distribution doesn’t have a standard deviation parameter. But you are fitting a hierarchical model here so there should be hierarchical standard deviation parameters.

I don’t know much about phylogeny unfortunately. Can you say more about what you’re trying to calculate? Hopefully we can help you figure it out.

I’m not familiar with tab_model. Which package is that from? I don’t think that is in any of our R packages, so you might want to check which package that is from and if they have a place where you can ask questions (they might not have a forum but they may have a GitHub page where you can open an issue).


The covariance matrix for the phylogenetic random effect is calculated outside of the statistical model. In R you can achieve this with A <- ape::vcv.phylo(phylogeny), where phylogeny is an object of class phylo. This yields the expected covariance across species under a Brownian motion model of trait evolution. Perhaps this covariance matrix is what you mean by the phylogenetic sigma?

People who are used to estimating phylogenetic linear mixed models might find phylogenetic generalized mixed models to be a bit unfamiliar initially. In the linear case with one observation per species, models may be fit with no error term other than the multivariate normal distribution encoded in the phylogenetic covariance matrix itself. In the generalized case, there is a separate error term (for example a Bernoulli error term) in addition to the phylogenetic covariance. The phylogenetic covariance is applied hierarchically to the link-scale location parameter of the error distribution.


Nice answer, thanks for bringing some actual phylogeny knowledge @jsocolar!

Thank you for reminding me. I do not have convergence problems. I did not know the goodness of default value. So I think the bigger sampling and warmup iterations are better to improve convergence. Thank you again.

1 Like

Thank you for explaining this.

Thanks, Jonah, I learn a lot from your reply.

In the picture, I show my result

We can get the sd of phylo and sd of sci_name. The former sd is for phylogenetic correlation and the latter for species repeat measurement.
I have read the website Estimating Phylogenetic Multilevel Models with brms ( which shows how to calculate the phylogenetic signal in Gaussian distribution by the algorithm: ** **hyp <- paste(** ** "sd_phylo__Intercept^2 /", ** ** "(sd_phylo__Intercept^2 + sd_species__Intercept^2 + sigma^2) = 0"** **)** **(hyp <- hypothesis(model_repeat2, hyp, class = NULL))** **
But for Bernoulli distribution, I have no idea.

The tab_model () function is in the sjPlot package, which can output table including conditional and marginal R2 and creates HTML tables from regression models. When I use the code tab_model (), there are some errors and warnings as shown in the picture:

, But if I use the code ** trend ~ pop + art + (1 | phylo) + (1 | sci_name),cov_ranef=list(phylo=A)** instead of trend ~ pop + art + (1 | gr(phylo, cov = A)) + (1 | sci_name) , the tab_model (model) function can work.

Thank you for detailed explanation.

Hi @Xiaodan_Wang,

HERE and HERE are some links to past conversations on the r-sig-phylo and r-sig-mixed-model listserves about this topic, though not exactly about the binary response.

From what I understand, the general formula for the multilevel-model version of Pagel’s lambda is lambda = var(phylo)/(var(phylo)+var(all other random effects + residuals)). The brms vignette and the chapter 11, page 292 from Modern Phylogenetic Comparative Methods (where Paul Buerkner got the information for the phylogenetic models in the brms vignette) squares the values in the equation.

I may be wrong here, but I think, since there is no sigma value in the model, your lambda value would be estimated as var(phylo)/(var(phylo)+var(all other random effects )) or lambda = sd_phylo / (sd_phylo + sd_sci_name). However, this value may not actually be the same as the Pagel’s lambda value since the Bernoulli family does not generate a sigma value… some experts would have to weigh in on that.

The goal of calculating phylogenetic signal is to quantify the error introduced to the model from the phylogenetic history versus the error in the model from other sources (like residual error or other group-level effects). It might be easier to think of this equation not as a precise calculation of lambda, but instead a comparison of the error each term introduces to your model. In your case, you have a high value for sd_sci_name and a low value for sd_phylo. I am guessing that the sci_name parameter means that you have multiple measurements per species? Your results suggest, to me, that most of the error in your model is from intraspecific variation (0.92) rather than variance from phylogenetic history (0.09). In other words, low “phylogenetic signal”, even if you don’t have a specific “lambda” or “K” value to report.

I hope that made sense…


1 Like

Thank you for your explanation in detail and very hopeful links. I will try to calculate the lambda by the equation lambda = sd_phylo / (sd_phylo + sd_sci_name) . sci_name parameter means that I have multiple measurements per species, and the bigger variance came from the interspecies than that phylogenetic history.
Thank you again

No problem. Again, I am not sure if that is actually “lambda”, but it should provide similar information.

After thinking about it, I think the equation would be lambda = sd_phylo^2 / (sd_phylo^2 + sd_sci_name^2) because brms provides the variance, not the variance^2. Good luck!

Yes, you are right. Thanks