I am new to brms and to Bayesian modeling in general, so I wanted to ask a basic question. I am planning to run mixed models of various types (binomial, zero-inflated Poisson, and zero-one-inflated beta). I have collected information on lesions of various types in various regions within individuals, so I have nested categorical variables (region and lesion type) and continuous covariates (age, disease duration, etc.). Is it advisable to scale the continuous predictors before running the models, or does brms handle this internally?
brms uses the predictors exactly as you supply them, so as far as I am aware, if you want the predictors scaled, you need to do so yourself before passing the data to brms.
I would add that, given your covariates, you probably do want to scale or center them in some way. Otherwise the intercept will by default describe a person of age 0 with 0 disease duration, and the other effects will be estimated relative to that reference point. That is probably not what you want.
No, the coefficients come back on the scale of the predictors you gave the model. Let’s say you scale age with a z-score (subtract the mean and divide by the standard deviation). Then 0 is the average age and +/-1 is +/-1 standard deviation in age from the average. That is how the coefficients will come back: the other parameters are estimated at the average age (0), and the age coefficient represents going up or down 1 SD in age. If you want to go back to the original scale, you could say that the SD of age is XYZ years, and that if we go up or down by this amount relative to the average age, we expect the outcome (on the scale of the linear predictor) to change by the amount given by the coefficient.
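To make the z-score arithmetic concrete, here is a minimal base R sketch. The data are simulated and the variable names are made up for illustration; `lm()` is used only as a fast stand-in for a model fit, on the assumption that the same transformation would be applied to the data frame before calling brms.

```r
set.seed(1)

# Simulated data (hypothetical): age in years and a continuous outcome.
n <- 200
age <- runif(n, 30, 80)
y <- 2 + 0.05 * age + rnorm(n, sd = 0.5)

# z-score: subtract the mean, then divide by the standard deviation.
age_mean <- mean(age)
age_sd <- sd(age)
age_z <- (age - age_mean) / age_sd

fit_raw <- lm(y ~ age)    # slope is per year, intercept refers to age 0
fit_z <- lm(y ~ age_z)    # slope is per SD of age, intercept refers to the mean age

slope_per_year <- unname(coef(fit_raw)["age"])
slope_per_sd <- unname(coef(fit_z)["age_z"])

# Going back to the original scale: 1 SD of age is age_sd years,
# so dividing the per-SD slope by age_sd gives the per-year slope.
slope_back <- slope_per_sd / age_sd
```

Here `slope_back` matches `slope_per_year` up to floating-point error, which is the "go back to the original scale" step described above.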
@JimBob is right about the technical aspects - there is no built-in scaling function, and one of the reasons is that which scaling is reasonable depends heavily on your data and application. There are multiple goals you might want to achieve with scaling, including:
1. Being able to easily set priors
2. Making the model coefficients easier to interpret
3. Making the model coefficients comparable across studies
I would generally advise against scaling by the SD of your data, as that doesn’t really help with either 1 or 3 (the SD is specific to your sample) and it is questionable whether it helps with 2. In many cases, there is a relatively reasonable way to scale the predictors without considering the actual data you collected. E.g. for age it might make sense to subtract 50 and divide by 10, so that your intercept corresponds to the response for a 50-year-old and your coefficient for age corresponds to the change with each decade. Presumably a clinician will find it easier to think about how lesions differ between people 10 years apart than 1 year apart, and it will prevent your coefficient from being super small.
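As a rough illustration of the "subtract 50, divide by 10" suggestion, the base R sketch below uses simulated data and hypothetical column names, with `lm()` again standing in for a real model. It only checks how the intercept and slope should be read after the transformation.

```r
set.seed(2)

# Simulated data (hypothetical): age in years and a continuous outcome.
n <- 200
age <- runif(n, 30, 80)
y <- 2 + 0.05 * age + rnorm(n, sd = 0.5)
d <- data.frame(y = y, age = age)

# Center at 50 years and measure age in decades. The constants 50 and 10
# come from domain knowledge, not from the collected data.
d$age_dec <- (d$age - 50) / 10

fit_raw <- lm(y ~ age, data = d)
fit_dec <- lm(y ~ age_dec, data = d)

# The intercept of fit_dec is the expected outcome for a 50-year-old.
pred_at_50 <- unname(predict(fit_raw, newdata = data.frame(age = 50)))
intercept_dec <- unname(coef(fit_dec)["(Intercept)"])

# The slope of fit_dec is the change per decade, i.e. 10 times the per-year slope.
slope_dec <- unname(coef(fit_dec)["age_dec"])
slope_year <- unname(coef(fit_raw)["age"])
```

Because the constants do not depend on the sample, a coefficient on `age_dec` means the same thing in any other study that uses the same transformation.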
If your data are informative, your inferences should not be affected very much by shifting and scaling the predictors. One way to make it easier to think about this is to not inspect model coefficients directly, but rather make predictions, e.g. “what is the difference in average lesion size of type A between 50-year-old and 70-year-old patients?” This quantity will not change however you scale your predictors (as long as you also rescale your priors to match). You can use posterior_epred/posterior_linpred/posterior_predict to make such predictions for any comparison you want.
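The invariance of predictions can be checked with a small sketch. This one uses a maximum-likelihood Poisson `glm()` on simulated data as a stand-in, so there are no priors and the agreement is exact up to numerical tolerance; in a Bayesian fit the priors would have to be rescaled along with the predictors. With a brms fit, `posterior_epred(fit, newdata = ...)` would take the place of `predict(..., type = "response")` and return posterior draws rather than a single value. All variable names are hypothetical.

```r
set.seed(3)

# Simulated data (hypothetical): lesion counts that increase with age.
n <- 300
age <- runif(n, 30, 80)
count <- rpois(n, lambda = exp(-1 + 0.03 * age))
d <- data.frame(count = count, age = age)

# Two different scalings of the same predictor.
d$age_dec <- (d$age - 50) / 10
d$age_z <- (d$age - mean(d$age)) / sd(d$age)

fit_dec <- glm(count ~ age_dec, family = poisson(), data = d)
fit_z <- glm(count ~ age_z, family = poisson(), data = d)

# Express the same two patients (50 and 70 years old) on each scale.
ages <- c(50, 70)
nd_dec <- data.frame(age_dec = (ages - 50) / 10)
nd_z <- data.frame(age_z = (ages - mean(d$age)) / sd(d$age))

# Expected counts on the response scale, then the 70 vs 50 difference.
diff_dec <- unname(diff(predict(fit_dec, newdata = nd_dec, type = "response")))
diff_z <- unname(diff(predict(fit_z, newdata = nd_z, type = "response")))
```

The coefficients of `fit_dec` and `fit_z` differ, but `diff_dec` and `diff_z` agree, because both models describe the same relationship between age in years and the expected count.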
Yes - I agree with @martinmodrak - I was just giving z-scoring as an example! Exactly which age you pick to center the variable may depend on the age range you are looking at, so it depends on your data. Other possibilities you might consider are the most typical age at which the disorders you are investigating develop, or some kind of middle age in your data rounded to the nearest 5 or 10 years. But I would certainly consider some kind of scaling/centering, just so that the coefficients aren’t estimated for people with age 0 or 0 duration of disease.
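For the "middle age rounded to the nearest 5 years" option, a minimal base R sketch could look like this. The ages are invented for illustration, and the median is just one possible choice of "middle".

```r
# Hypothetical ages in years.
age <- c(34, 41, 47, 52, 58, 63, 66, 71, 77)

# Median age rounded to the nearest 5 years (use 10 instead of 5 for decades).
center <- round(median(age) / 5) * 5

# Centered age: 0 now means "a patient who is `center` years old".
age_c <- age - center
```

With these made-up values the median is 58, so `center` is 60 and the intercept of a model using `age_c` would describe a 60-year-old.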