 # Non-normally distributed predictor and outcome variables

Does anyone have any suggested papers/vignettes for how to tackle a regression situation in which both predictor and outcome variables are non-normally distributed, in a way that could be applied in brms? E.g., both may be best represented as a beta distribution (alongside other categorical or normally distributed predictors), or more simply, one predictor may be beta distributed but the outcome variable is not. I have gotten to grips with a beta-distributed outcome variable already, but not with how to handle non-normally distributed predictors.

I can’t quite tell if something like this is what is being done in “Estimating Non-Linear Models with brms” (not a beta distribution, but as an example of the general principles). In that case it looks like you would put some sort of function of the predictor into the regression equation.

Any suggestions would be much appreciated! I also understand that people may not like the terms ‘outcome’ and ‘predictor’ variable, but I am just trying to make obvious which side of the regression equation the non-normally distributed variable is on.

• Operating System: MacOS 10.12.6

Hi @JimBob!

Short answer: the distribution matters for the outcome, but not for the predictors. You specify the outcome distribution via the `family` argument in brms.

Usually you don’t have to make any assumptions about the predictors’ distributions. Their location and scale (the first two moments) do affect the regression coefficients: a change in location usually shifts the intercept, and a change in scale changes the size of the predictor’s coefficient. A non-linear transformation of a predictor changes the interpretation of its coefficient; with a log transformation, for example, the coefficient is approximately the change in outcome per one-percent change in the predictor. All of this works without further (distributional) assumptions about the predictors.
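To see this concretely, here is a small pure-Python sketch (illustrative made-up numbers, not brms code): the predictor is clearly beta-distributed, yet ordinary least squares recovers the coefficients without any assumption about the predictor’s distribution, and rescaling the predictor only rescales its coefficient.

```python
import random

def ols(x, y):
    # Closed-form simple-regression estimates (intercept, slope).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)
    return my - beta * mx, beta

random.seed(1)
# A clearly non-normal (beta-distributed) predictor...
x = [random.betavariate(2, 5) for _ in range(5000)]
# ...generating the outcome: y = 1 + 3*x + normal noise.
y = [1 + 3 * xi + random.gauss(0, 0.1) for xi in x]

a1, b1 = ols(x, y)                       # recovers alpha near 1, beta near 3
a2, b2 = ols([10 * xi for xi in x], y)   # rescaled predictor: slope shrinks by 10
print(round(a1, 2), round(b1, 2), round(b2, 3))
```

The slope for the rescaled predictor is exactly the original slope divided by ten; the predictor’s skewed shape never enters the calculation.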

A non-linear regression usually (not always, but certainly in brms) means a regression that is not linear in the coefficients. Something like y = \alpha + \beta \log(x) + \epsilon or y = \alpha + \beta_1 x + \beta_2 x^2 + \epsilon would not count as non-linear regression. Something like y = \alpha + \beta_1 x^{\beta_2} + \epsilon, however, would. Again, there are no distributional assumptions*) about the predictors (not saying that location and scale of predictors don’t matter; they do!).
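As a sketch of the difference (pure Python, made-up values): y = \alpha + \beta_1 x^{\beta_2} cannot be written as a design matrix times a coefficient vector, so no single linear solve fits it. One simple workaround is to profile over \beta_2 on a grid and solve the remaining linear problem in (\alpha, \beta_1) at each candidate.

```python
def ols(x, y):
    # Closed-form simple-regression estimates (intercept, slope).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b

xs = [0.5 + 0.1 * i for i in range(40)]
ys = [2.0 + 3.0 * x ** 1.5 for x in xs]   # true alpha=2, beta1=3, beta2=1.5

best = None
for k in range(50, 301):                  # candidate beta2 values 0.50 .. 3.00
    b2 = k / 100
    z = [x ** b2 for x in xs]             # for fixed beta2 the model IS linear
    a, b1 = ols(z, ys)
    sse = sum((a + b1 * zi - yi) ** 2 for zi, yi in zip(z, ys))
    if best is None or sse < best[0]:
        best = (sse, a, b1, b2)

sse, a, b1, b2 = best
print(a, b1, b2)                          # recovers roughly 2, 3, 1.5
```

brms handles this kind of model properly (with full posterior inference) via its non-linear formula syntax; the grid search above is only meant to show where the non-linearity in parameters bites.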

Distributional assumptions about predictors become important if you do, for example, something like a measurement-error or missing-values model.

Hope this helps for now. Feel free to ask follow up questions.

Cheers!

*) Edit: well, in the log case you’re of course assuming that x > 0… But that’s not much of a distributional assumption…


Dear @Max_Mantei, thanks so much for your rapid response, and apologies for taking a while to get back to you! I’m somehow really surprised that the distribution of the predictors is not important. Maybe it’s just coming from a frequentist background where it’s like ‘non-normal = use a non-parametric test’! So it’s actually not an issue if I have a variable that is clearly beta-distributed predicting another variable that is beta-distributed; I only need to specify the distribution of the outcome variable? That is quite simple to do; I just find it surprising somehow!

If I understand you correctly, the non-linear regression in that vignette would be something like using a polynomial of a predictor, and is not related to the distribution of the predictor itself (the effect of the predictor is not linear, but that doesn’t say anything about how the predictor is distributed)?

Hey JimBob! This is actually not really a Bayesian vs. Frequentist thing.

Maybe it’s just coming from a frequentist background where it’s like ‘non-normal = use a non-parametric test’!

I heard this a lot as well (I’m from a PolSci/Econ background) and was a bit befuddled until I realized that people often don’t have a strong methods background and find everything non-parametric super appealing, because: fewer/weaker assumptions! (Seemingly!) And while there is some truth to being careful with methods that make very strong assumptions, people are often not really aware of what those assumptions are or what they are there for. For regression analysis you don’t need to make distributional assumptions about the predictors, in either a frequentist or a Bayesian setting.

If I understand you correctly, the non-linear regression in that vignette would be something like using a polynomial of a predictor, and is not related to the distribution of the predictor itself (the effect of the predictor is not linear, but that doesn’t say anything about how the predictor is distributed)?

This is basically correct, although your example of a polynomial would still be a linear model.

\begin{align} X &= [x, x^2] \\ \mathbf{\beta} &= [\beta_1,\beta_2]' \\ \mathbf{\hat{y}} &= \alpha + X\beta \end{align}

While this is non-linear in the predictor, it is a model that is linear in the parameters. Note that I couldn’t represent a model like y = \alpha + \beta_1 x^{\beta_2} as a system of linear equations; it’s non-linear in the parameters.
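To illustrate (pure Python, made-up numbers): because the polynomial model is linear in (\alpha, \beta_1, \beta_2), three data points determine the three parameters through an ordinary linear solve, even though the fitted curve is quadratic in x.

```python
def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

xs = [0.0, 1.0, 2.0]
true = (1.0, -2.0, 0.5)                              # alpha, beta1, beta2
ys = [true[0] + true[1] * x + true[2] * x * x for x in xs]

A = [[1.0, x, x * x] for x in xs]                    # design matrix [1, x, x^2]
alpha, beta1, beta2 = solve3(A, ys)
print(alpha, beta1, beta2)                           # recovers 1.0, -2.0, 0.5
```

The columns of the design matrix are non-linear functions of x, but the solve itself is the same one you would use for any linear model; that is what “linear in parameters” buys you.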

Sometimes people refer to Generalized Linear Models (GLMs) as non-linear regression and that can be confusing. A simple example would be a Poisson regression

\begin{align} \eta &= \alpha + \beta x \\ y &\sim \text{Poisson}(\exp(\eta)) \end{align}

where you have a non-linearity, \exp(), but \eta is still a linear function of the parameters (it is therefore also called the linear predictor of the model).
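A tiny numerical illustration (pure Python, arbitrary coefficient values) of what the log link does: the implied mean is always positive, and a unit change in x multiplies the mean by \exp(\beta), no matter where you start.

```python
import math

alpha, beta = 0.5, 0.3        # illustrative values for the linear predictor

def mean(x):
    # eta = alpha + beta*x is linear in the parameters;
    # the non-linearity is only in the inverse link exp().
    return math.exp(alpha + beta * x)

# A unit change in x multiplies the mean by exp(beta) at every x.
ratios = [mean(x + 1) / mean(x) for x in (0.0, 1.7, -4.2)]
print(ratios)
```

Every ratio equals exp(0.3), which is why GLM coefficients with a log link are read as multiplicative effects.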

As such, the beta regression you have in mind is “kind of” like a GLM (not exactly, but I don’t want to get into the specifics here). However, its model formula is likely to be linear in the parameters, so you probably won’t need brms’s non-linear formula functionality.
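For intuition, here is a pure-Python sketch of the mean/precision parameterization that beta regression typically uses (coefficient values are made up, and this is not brms code): a logit link keeps the mean in (0, 1), and the linear predictor is still linear in the parameters.

```python
import math
import random

alpha, beta, phi = -0.4, 1.2, 30.0     # illustrative coefficients and precision

def mu(x):
    # Logit link: maps the linear predictor into (0, 1).
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))

random.seed(7)
x = 0.8
m = mu(x)
# Beta(mu*phi, (1-mu)*phi) has mean mu; phi controls the spread.
draws = [random.betavariate(m * phi, (1 - m) * phi) for _ in range(20000)]
print(round(sum(draws) / len(draws), 3))   # close to mu(0.8)
```

All draws land strictly inside (0, 1), and their average sits near mu(x); only the outcome’s distribution needed to be specified, exactly as with `family = Beta()` in brms.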

Hope this helps! (If not you can also post model and code here, so we can have a closer look.)
Max

Yes indeed, if it’s not correct then it’s not correct. I guess what I meant was that I was trained in statistics in a pretty suboptimal way, which in my case happened to be frequentist. I was referring to the decision-tree approach discussed in books such as McElreath’s Statistical Rethinking, where you go through the assumptions and every yes/no answer leads to an apparently totally different type of test (t-test, ANOVA, Mann-Whitney test, etc.). Reading up more myself and seeing most of these as forms of regression really changed the way I view analyses, and I find it much clearer now; I learned that through Bayesian books such as McElreath’s and Kruschke’s. The decision-tree approach is very often ‘frequentist’, or at least that is exactly how I was taught frequentist statistics.