Am I being slow or is my model: Accelerating/parallelizing brms phylogenetic multilevel model

Hi,

I am trying to run phylogenetic multilevel models in brms, but the models are taking a long time. I realise this might simply be down to their complexity, but I was hoping more informed parties could tell me whether I am implementing the multithreading code correctly, as I have my doubts that it is doing much for me. Any advice on acceleration would be greatly appreciated, as the models currently take more than 50 hours to run.

For context, I am trying to model whether particular plant pests are hosted by given plants from outside their native range, based upon the phylogenetic closeness of those plants to plants in the pests' native range and their known host status.

My model form is as follows.


fullform <- bf(mvbind(Adelges_piceae,
                      Megastigmus_milleri,
                      Megastigmus_pinus,
                      Megastigmus_rafni,
                      Megastigmus_suspectus,
                      Armillaria_gallica,
                      Armillaria_mellea,
                      Armillaria_ostoyae,
                      Heterobasidion_annosum,
                      Lymantria_dispar,
                      Lymantria_monacha,
                      Operophtera_brumata,
                      Agelastica_alni,
                      Chrysomela_aenea,
                      Eriophyes_laevis,
                      Eupterycyba_jucunda,
                      Fenusa_dohrnii,
                      Hemichroa_crocea,
                      Melampsoridium_hiratsukanum,
                      Pterocallis_alni,
                      Rhagium_mordax,
                      Rhynchaenus_testaceus,
                      Stereum_rugosum
)
               ~ 1 + (1|p|gr(Host, cov = A)),
               family = bernoulli())

The modelling works and seems to do everything I want it to, and 1000 iterations appear to be plenty to resolve the model; it is just very slow.

I have tried to implement within-chain multithreading to accelerate things, but I am not sure I have done it correctly and would really appreciate a sanity check and/or further advice on how to speed up my code.
I am not sure the current code actually makes use of the 8 cores I am requesting from the HPC, and I don't want to go queuing for more cores before checking that they'll actually be of use.

I’m using the cmdstanr backend (cmdstanr 0.8.1.9000, cmdstan-2.33.1) and everything else should be pretty much up to date as well.

fullfit <- brm(fullform, data = host_matrix, data2 = list(A=A) , prior = priors,
               chains = 1, init = 0,
               iter=1000, warmup=500,
               backend = "cmdstanr",
               threads = threading(8)
)
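
In case it helps, this is roughly how I have been trying to check whether threading is actually wired into the model. My (possibly wrong) understanding is that when threads is set, the generated Stan code should contain a reduce_sum() call:

# Generate the Stan code with threading requested and look for reduce_sum()
scode <- make_stancode(fullform, data = host_matrix, data2 = list(A = A),
                       prior = priors, threads = threading(8))
grepl("reduce_sum", scode)  # I expect TRUE if within-chain threading is active
# cat(scode)                # or inspect the full Stan program by eye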

TL;DR: I don't understand multithreading well enough to know whether I am doing it right and could use a hand making it work. There is no shortage of available processing resources, just a shortage of understanding of how to utilise them.

Switch to binomial? I can’t create a dummy version of the phylogenetic part of your model but the third model below is much faster.

library(tidyverse)
library(brms)

d = data.frame(a = rbinom(1000,1,.3),
               b = rbinom(1000,1,.5),
               c = rbinom(1000,1,.1),
               h = rep(letters[1:5], length.out = 1000))  # five group levels recycled across rows

d.1 = d %>% pivot_longer(1:3,
                         values_to = "v",
                         names_to = "p")

d.2 = d.1 %>% group_by(p,h) %>% 
  summarise(v = sum(v),
            tr = n())


m1 = brm(data = d,
         bf(mvbind(a,b,c) ~ 1 + (1|h)),
         family = bernoulli(),
         backend = "cmdstanr",
         chains = 4,
         threads = 3,
         iter = 4000,
         cores = 4)
# Total execution time: 13.3 seconds.
summary(m1)

m2 = brm(data = d.1,
         bf(v ~ 0 + p + (0+p|h)),
         family = bernoulli(),
         backend = "cmdstanr",
         chains = 4,
         threads = 3,
         iter = 4000,
         cores = 4)
# Total execution time: 19.5 seconds. (I did not expect that to be slower)
summary(m2)

m3 = brm(data = d.2,
         bf(v|trials(tr) ~ 0 + p + (0+p|h)),
         family = binomial(),
         backend = "cmdstanr",
         chains = 4,
         threads = 3,
         iter = 4000,
         cores = 4)
# Total execution time: 1.1 seconds.
summary(m3)

Edit:
I did not know this, but this works too!

d.3 = d %>% group_by(h) %>% 
  summarise(a = sum(a), 
            b = sum(b),
            c = sum(c),
            tr = n())
m4 = brm(data = d.3,
         bf(mvbind(a,b,c)|trials(tr) ~ 1 + (1|h)),
         family = binomial(),
         backend = "cmdstanr",
         chains = 4,
         threads = 3,
         iter = 4000,
         cores = 4)
# Total execution time: 1.2 seconds.
summary(m4)

Thanks, but I am not sure I can do that without changing what I am modelling or the meaning of the results.

The current model structure is such that each species' presence/absence is modelled separately for each host. Given this, Bernoulli seems more appropriate, because I am modelling the presence or absence of each species for each host individually.
i.e. for every threat and host combination I get an estimate between 0 and 1 of the likelihood of host status.

Wouldn't binomial just give me a count of the number of hosts where a threat species was found out of a total number of hosts sampled?
i.e. for a list of 100 hosts I would get a value representing the proportion that might be hosts, without telling me which species are hosts.

That said, I am slightly reassured that:

  1. the multithreading code you use looks the same as mine, so I might not be using it incorrectly.
  2. the data structure of my dataframe matches that of d rather than d.1, so the data format should hopefully not be making things even slower.

Look at the model outputs; they are all the same.

Both binomial and bernoulli give you the probability that a plant species is a host of a given pathogen, or maybe better, the probability that a pathogen infects a species.
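
If it helps, you can check this directly on the toy models above: the population-level and group-level estimates should match up to MCMC error (m1 is the per-row Bernoulli fit, m4 the aggregated binomial fit):

fixef(m1)    # intercepts on the logit scale, Bernoulli per-row model
fixef(m4)    # same quantities from the aggregated binomial model
VarCorr(m1)  # group-level standard deviations
VarCorr(m4)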

I can see that in your example cases that's true, but I'm afraid it is not for this. As far as I can see, binomial really is for a different use case with different input data. I realise I wasn't clear about my data structure: it doesn't have any repeats with which to establish trials. I have just found that I can upload files, so here is an example of the interaction matrix used as the input.
HSMforPlot.csv (8.5 KB)

As you can hopefully see, the data is arranged such that there is a single 1/0 entry for each unique host-threat combination, as the data is derived from a database of interactions rather than trial data. So I don't see how binomial could be applied to this dataset to get the results I need, which is why I thought Bernoulli was the only option.
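
For anyone who doesn't want to open the file, the layout is roughly like this (toy rows and invented values; the real matrix has one row per plant and one 0/1 column per threat):

# Toy sketch of the interaction matrix structure (values invented for illustration)
host_matrix_example <- data.frame(
  Host             = c("plant_species_1", "plant_species_2", "plant_species_3"),
  Adelges_piceae   = c(1, 0, 0),
  Lymantria_dispar = c(0, 1, 1),
  Stereum_rugosum  = c(0, 0, 1)
)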

Really, my question is whether anyone has advice for accelerating the modelling through the multithreading functions, and whether I am using them correctly. I was under the impression that the fastest approach would be to point as many cores as possible at a single chain and do within-chain parallelisation, but I just don't know if that is true and/or if I am doing it right. If it simply needs more cores, that is something I can source, but I just don't know whether I need to…
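
For what it's worth, this is the kind of quick comparison I was thinking of running to check whether the extra cores are doing anything. Short chains only, and the grainsize value is just a guess on my part (timings will also include compilation, so they are only a rough indication):

# Time a short serial run...
t_serial <- system.time(
  brm(fullform, data = host_matrix, data2 = list(A = A), prior = priors,
      chains = 1, init = 0, iter = 200, warmup = 100,
      backend = "cmdstanr")
)

# ...and the same short run with within-chain threading
t_threaded <- system.time(
  brm(fullform, data = host_matrix, data2 = list(A = A), prior = priors,
      chains = 1, init = 0, iter = 200, warmup = 100,
      backend = "cmdstanr",
      threads = threading(8, grainsize = 50))  # grainsize = 50 is only a guess
)

t_serial["elapsed"]
t_threaded["elapsed"]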