Help building a faster model

I’m working on a brms model of the number of late spring freezing events as a function of mean spring temperature, NAO, and elevation. I have about 1 billion rows of data, so fitting is very slow: a single run can take up to a week.

Here’s a made up dataframe:

# Simulated stand-in for the real data (150 rows)
cc <- sample(c(0, 1), replace = TRUE, size = 150)
species <- sample(c("Acer", "Betula", "Quercus", "Fagus"), replace = TRUE, size = 150)

df <- data.frame(freeze = rnorm(150, mean = 3, sd = 1),
                 mat = rnorm(150, mean = 0, sd = 5),
                 nao = rnorm(150, mean = 1, sd = 2),
                 elevation = rnorm(150, mean = 4, sd = 1),
                 cc = cc,
                 species = species)

And my current model:

library(brms)

# Population-level effects plus species-varying intercepts and slopes
fit <- brm(freeze ~ nao + mat + elevation + cc + nao:cc + mat:cc + elevation:cc +
             (nao + mat + elevation | species),
           data = df, control = list(max_treedepth = 12, adapt_delta = 0.99),
           chains = 4, cores = 4)

Do you have any suggestions to speed this up? Thanks!

With 1 billion observations and a Gaussian likelihood, you should check out stan_biglm in the rstanarm package, which uses sufficient statistics.
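
To illustrate why sufficient statistics help here, below is a minimal base R sketch. It is not rstanarm's implementation, only the underlying idea: for a Gaussian linear model the data enter the likelihood only through X'X, X'y, y'y, and n, so these can be accumulated one chunk at a time and the full data never has to sit in memory. The simulated data, the coefficient values, and the chunking into 10 pieces are all assumptions made up for the example.

```r
set.seed(1)
n <- 1000
# Hypothetical design matrix with an intercept and three predictors
X <- cbind(1, rnorm(n, 1, 2), rnorm(n, 0, 5), rnorm(n, 4, 1))
colnames(X) <- c("(Intercept)", "nao", "mat", "elevation")
y <- drop(X %*% c(3, 0.5, -0.2, 0.1)) + rnorm(n)

# Accumulate the sufficient statistics chunk by chunk
p <- ncol(X)
XtX <- matrix(0, p, p)
Xty <- numeric(p)
yty <- 0
chunk_id <- rep(1:10, each = n / 10)
for (k in 1:10) {
  rows <- chunk_id == k
  Xk <- X[rows, , drop = FALSE]
  yk <- y[rows]
  XtX <- XtX + crossprod(Xk)            # running X'X
  Xty <- Xty + drop(crossprod(Xk, yk))  # running X'y
  yty <- yty + sum(yk^2)                # running y'y
}

# Least-squares estimate and residual sum of squares from the summaries alone
beta_hat <- solve(XtX, Xty)
ssr <- yty - sum(beta_hat * Xty)

# Compare with a fit that sees all rows at once
full <- lm.fit(X, y)
print(all.equal(unname(beta_hat), unname(full$coefficients)))
print(all.equal(ssr, sum(full$residuals^2)))
```

The summaries have a fixed size (p by p, p, and two scalars) no matter how many rows are processed, which is what makes this approach attractive at a billion rows.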

Thanks, Ben, it looks great. Is there any way to do a mixed effects model with stan_biglm?

No, but you can do interaction terms with group indicators.

Hi Ben, just to be dense on language, do you mean adding a fixed effect for the group and interactions with it? So if species is the group, instead of:

fit <- brm(freeze ~ nao + mat + mat:cc +(nao + mat|species)...)

It would be something like:

fit <- brm(freeze ~ nao + mat + mat:cc + species + nao:species + mat:species...)
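
One way to check what such a formula actually asks for is to look at the design matrix it generates. The sketch below uses base R's model.matrix on simulated data; the data frame and seed are made up for illustration, and this is only one reading of the "interaction terms with group indicators" suggestion.

```r
set.seed(2)
n <- 150
df <- data.frame(
  freeze = rnorm(n, 3, 1),
  nao = rnorm(n, 1, 2),
  mat = rnorm(n, 0, 5),
  cc = sample(c(0, 1), n, replace = TRUE),
  species = factor(sample(c("Acer", "Betula", "Fagus", "Quercus"), n, replace = TRUE),
                   levels = c("Acer", "Betula", "Fagus", "Quercus"))
)

# Group indicators and their interactions in place of (nao + mat | species)
f <- freeze ~ nao + mat + mat:cc + species + nao:species + mat:species
X <- model.matrix(f, data = df)

# One intercept shift and one nao/mat slope shift per non-reference species
print(colnames(X))
```

With the default treatment contrasts, Acer is the reference level and every other species gets its own intercept and slope offsets. Unlike the (nao + mat | species) version, these offsets are estimated separately, with no partial pooling across species.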



Hi Ben, is there a way to use binomial or Poisson models with stan_biglm?


No. If the GLM is not Gaussian and/or the inverse link function is not the identity, then there are no sufficient statistics.
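
One way to see the difference is to write out the two log-likelihoods. This is a sketch using the standard textbook forms, with $X$ the $n \times p$ design matrix and $x_i^\top$ its $i$-th row. For the Gaussian model with identity link:

$$
\log p(y \mid \beta, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(y^\top y - 2\beta^\top X^\top y + \beta^\top X^\top X \beta\right)
$$

The data appear only through $n$, $y^\top y$, $X^\top y$, and $X^\top X$, all of which can be computed once. For a Poisson model with log link:

$$
\log p(y \mid \beta) = \beta^\top X^\top y - \sum_{i=1}^{n} \exp\left(x_i^\top \beta\right) - \sum_{i=1}^{n} \log(y_i!)
$$

The middle sum depends on every row $x_i$ and has to be recomputed for each new value of $\beta$, so the data cannot be collapsed into a fixed set of summaries ahead of time.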