Dynamically omit interactions from brm model

Currently I have the following brm model with a continous outcome variable y and a mixture of continuous and ordinal predictors:

original_model = brm(y ~ (var1 + var2)*(var3)+(var1+var2|ParticipantID),

I am now trying to make the formula dynamic so that at every iteration in a loop a different predictor can be left out. The purpose of this is for power calculations; for every predictor, I want to calculate bayes_R2 for the model with the predictor of interest and again for the model without the predictor of interest.

The difficulty is that sometimes I want to dynamically omit an interaction term. For example, in the model above I may want to include interaction (var1 x var3) but omit (var2 x var3) from the model. So far I have come up with the following solution:

# First I construct two strings that describe all the main and 
# interaction terms I want in each version of the model: 
formula_withKeyVar <- as.formula("y ~ var1 + var2 + var3 + (var1*var3) + (var2*var3) 
+ (var1 + var2 | ParticipantID")
formula_withoutKeyVar <- as.formula("y ~ var1 + var2 + var3 + (var1*var3) 
+ (var1 + var2 | ParticipantID") # omitting (var2*var3)

# Then I plug each formula into a model and calculate bayes_R2 for each:
model1 = brm(formula = formula_withKeyVar, data=df,family='gaussian',iter=1000,
model2 = brm(formula=formula_withoutKeyVar,data=df,family='gaussian',

So my question is: is the above method an appropriate way to make a brm formula dynamic in order to omit interaction terms? I am concerned because model1 takes a lot longer to run than original_model, and the power calculation seems erroneous, so I wonder if my approach is achieving what it’s intended to achieve. Thanks, I appreciate any advice.

I guess is the concern that as.formula isn’t working with brms?

If you do:

model1 = brm(formula = y ~ var1 + var2 + var3 + (var1*var3) + (var2*var3)  + (var1 + var2 | ParticipantID), data=df,family='gaussian',iter=1000,

do you get something different than:

formula_withKeyVar <- as.formula("y ~ var1 + var2 + var3 + (var1*var3) + (var2*var3) 
+ (var1 + var2 | ParticipantID"))

model1 = brm(formula = formula_withKeyVar, data=df,family='gaussian',iter=1000,


1 Like

Just a heads up, bayes_R2 may not be the optimal way of selecting the best model, because your model might overfit (i.e. if you have small amount of data, a model with some redundant parameter may fit to the noise & show better fit than the model that’s closest to the true data generating process). PSIS-LOO may be a better choice.

What do you mean by power calculations? Typically, power is used to describe the ability to detect some parameter of interest (as “significant” in frequentist setting), in the Bayesian setting I guess the equivalent goal would be to see how much data you need for narrow enough credible intervals around parameters of interest (i.e. have enough precision). How does R2 figure in that?

Thanks for your response. The purpose of the power calculations is to determine the sample size required to replicate the current significant result (a similar approach as described here: Power calculation for replication sample). I am using the function wp.regression() from the WebPower package to make the power calculation for each regression, and this function uses the statistic f2 which requires R^2 from the model both with and without the variable of interest. For this R^2 calculation, would you suggest I use the function brms::loo_R2() instead of brms::bayes_R2()?

As far as I know, you may be barking up the wrong tree, because for Bayesians, power is much less of a concern than for frequentists. In frequentist error-control setting, power tells you how likely you are to correctly reject a null hypothesis if an effect indeed exists. The key dilemma for frequentists is balancing type-I and type-II error: the probability that: a) you decide that an effect exists when in reality it doesn’t (type-I), and b) you decide that there’s no evidence to conclude that an effect exists when it does (type-II). Therefore, the ratio of alpha (ceiling on the type-I error level) to beta (type-II error level; 1 - power) is very important because if your beta is a lot higher then your alpha (i.e. you have low power), then in the long run you might end up with most of your positive findings being false positives.

In the Bayesian setting, power is not as much of an issue, for a couple of reasons. First, Bayesians don’t like to make binary decisions (“reject/fail to reject H0”) as much and instead include uncertainty about the presence/absence of effects in the model or average across many models. Second, Bayesians tend to summarize their findings on marginal posterior distributions for parameters of interest (i.e. how big or small an effect could get, under assumptions/priors), which automatically let you know the effect size, so they typically don’t base their inference just on whether an effect “exists/doesn’t exist” but also how big it is. P-value alone doesn’t tell you how big an effect is - you could have an inconsequentially small effect with really nice small p-value. Marginal posteriors shows you how big/small the effect may be, so e.g. you may find really convincing evidence that owning dogs leads to a longer life, but if your marginal credible interval goes from 0.01 - 0.02 years, that tempers how big of a claim you can make & get published. Thirdly, if you use weakly informative priors centered on the null (which is a good practice), that should regularize your findings, and so reduce the number of long run false positives compared to frequentist MLE.

There are probably a lot more reasons that I can’t think of right now, but these are the ones off-the-top of my head.


Thank you for this advice - you make some very interesting points. In light of power not being as much of an issue in the Bayesian setting I wonder what you might suggest for determining the sample size for a replication study to confirm our current results?

1 Like

Afaik there are a lot of different options. Theoretically, you may not need a stopping rule for sample size at all, and just gather data until you can reach some desired credibility, given that multiple comparisons aren’t as much of a big deal in the Bayesian framework (https://statmodeling.stat.columbia.edu/2016/08/22/bayesian-inference-completely-solves-the-multiple-comparisons-problem/ and http://www.stat.columbia.edu/~gelman/research/unpublished/multiple2.pdf). Then again, some argue that multiple comparisons can still be an issue, even in Bayes (see Kruschke’s book: https://books.google.co.nz/books?hl=en&lr=&id=FzvLAwAAQBAJ&oi=fnd&pg=PP1&dq=kruschke&ots=CfxpUXvh3K&sig=v3tGMdtzQv1tQ8QSmPEcrilSrKY&redir_esc=y#v=onepage&q=kruschke&f=false). Personally, I think that having both stopping rule + regularizing prior is always going to be cleaner than just one or the other.

However, even when doing Bayes with a stopping rule, you still have a lot of options open. The Kruschke’s book I linked has a whole chapter on it, have a read.

1 Like

Just a quick note (can expand if not clear): one thing I like to do is to simulate data a lot of datasets with an effect of interest and see how my posterior credible intervals are looking (e.g. how frequently does the 95% CI exclude zero and has the same sign as effect, but you can choose many metric depending on your goals, the width of some posterior CI and so on).

This tends to be computationally taxing, but works neatly.

Also for linear models, in the (few) models I tried that with, using a frequentist power calculation alignedquite well with results from simulations as described above (the 95% CI excluding zero criterion), and as those are quick and easy to calculate, you can use them as rough guides.

Best of luck!


I have a few blog posts on this topic. They’re based heavily on Kruschke’s text. You can find the first one here.