(Dummy) coding/scaling of binary variable and prior choice in brms

Hey,
I just want to run this by people before I submit a pre-registration of a project. Here is the model I'd like to fit:

value ∼ x * z + (1 + x + z | subject) 

The model will use a Gaussian response distribution (identity link). I've read the prior recommendations, and value will be scaled to unit scale. The variables x and z are binary and will be coded as zero and one if I pass them to brm as factors. Following the recommendations, I want to use N(0, 1) priors for the fixed effects (as I'd like to calculate Bayes factors for those effects being zero), but I am wondering whether I have to code or scale my predictors differently than plain dummy coding in order for the prior to be sensible.

I vaguely remember that a previous version of the wiki said something about also coding binary variables so they have a mean of zero, etc., but I can't find it. Is there a paper, tutorial, or recommendation you can point me to?

I will have another logistic model with the same structure:

success ∼ x * z + (1 + x + z | subject)

Can I assume the same (whatever the answer to the question above is) for this model, provided I use slightly different priors as suggested in the wiki?

I'm unsure about the requirements of N(0, 1) priors for the calculation of Bayes factors, but you might find one of @andrewgelman's papers on standardising regression coefficients useful: http://www.stat.columbia.edu/~gelman/research/published/standardizing7.pdf

A perfectly balanced binary variable will have a standard deviation of 0.5 (e.g. look at rbinom(1e4, 1, 0.5) for a simulated example), but continuous predictor variables are often scaled by 1 standard deviation, meaning that the regression coefficients for binary and continuous predictors will not be easily comparable. For this reason, Gelman argues for standardising continuous predictors by 2 standard deviations in the paper linked above.
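You can check both claims in a quick simulation (a minimal sketch; the variables and the 2-SD scaling are just illustrations, not anything from your model):

```r
# A balanced 0/1 variable has SD sqrt(p * (1 - p)) = 0.5 when p = 0.5
x <- rbinom(1e5, 1, 0.5)
sd(x)  # close to 0.5

# Gelman's suggestion: scale continuous predictors by 2 SDs so their
# coefficients are on a comparable footing with balanced binary predictors
z <- rnorm(1e5, mean = 10, sd = 3)
z_scaled <- (z - mean(z)) / (2 * sd(z))
sd(z_scaled)  # exactly 0.5 by construction
```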

IMO, whether you code the binary predictor as c(0, 1) or mean-center it, it won't have much influence on the choice of prior, because the change from level 1 to level 2 will be approximately 2 standard deviations, and the regression coefficient will describe the difference between the levels. However, binary variables are sometimes coded as c(-1, 1), in which case the regression coefficient describes the difference of each level from the overall mean of the dependent variable across the levels. In that case, we might expect the coefficient to be smaller than the coefficient for the c(0, 1) or mean-centered variable.
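A small simulated comparison makes the coding difference concrete (a sketch with made-up data; lm stands in for brm here since only the point estimates matter):

```r
set.seed(1)
n <- 1e4
group <- rep(c(0, 1), each = n / 2)
y <- 2 + 1 * group + rnorm(n, sd = 0.1)  # true difference between levels = 1

# c(0, 1) coding: the slope is the difference between the levels (about 1)
coef(lm(y ~ group))["group"]

# c(-1, 1) coding: the slope is half the difference (about 0.5),
# i.e. the distance of each level from the grand mean
group_pm1 <- ifelse(group == 1, 1, -1)
coef(lm(y ~ group_pm1))["group_pm1"]
```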

For the logistic regression, remember that the prior is on the log-odds scale, and so the reasonable values of the coefficient might be different, supporting a different prior.
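For intuition about what N(0, 1) implies on the log-odds scale, a rough back-of-the-envelope check:

```r
# A coefficient of b on the log-odds scale multiplies the odds by exp(b)
plogis(0)      # baseline probability 0.5 at log-odds 0
plogis(1)      # ~0.73: the probability after a shift of +1 log-odds
exp(1)         # odds ratio ~2.72 for a coefficient of 1

# Within 2 SDs, an N(0, 1) prior already spans odds ratios of roughly 1/7 to 7
exp(c(-2, 2))
```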


Thank you so much. For now, I will go with c(0, 1). Interestingly, in a simulation with a model y ~ x + (1 | id), I found that hypothesis() gives me very strange results for c(-0.5, 0.5) coding. I compared the results from ttestBF (from the BayesFactor package) with the BF I calculate myself via logspline density estimation:

library(polspline)

# Savage–Dickey density ratio at zero
fit.posterior <- logspline(posterior_samples(model)$b_x)  # posterior draws of the slope
posterior <- dlogspline(0, fit.posterior)  # posterior density at x = 0
prior <- dnorm(0, 0, 1)                    # N(0, 1) prior density at x = 0
BF <- prior / posterior                    # BF10: evidence against the point null

and with the results of hypothesis(model, 'x = 0'). ttestBF and the logspline density estimate give me very similar results, while the BF from hypothesis() is only about half the size of the others.

This problem does not exist for c(0, 1). When I have time, I will try to investigate this further, but hypothesis() sometimes gives really strange results.