Running regressions with compositional data (summing to 1, leading to multicollinearity)

TA-White · December 20, 2024, 11:39am

Hi,

Context:
I have 7 time-series variables that represent compositional data - they are proportions that sum to 1 at each point in time. I want to use these as predictors in a regression model to explain a target variable.

The challenge is that these predictors exhibit perfect multicollinearity due to their compositional nature. At any time t, if we know the values of 6 of the predictors, the 7th is fully determined (it must be 1 minus the sum of the other 6).

```r
# Example model formula
target ~ regressor1 + regressor2 + regressor3 + regressor4 + regressor5 + regressor6 + regressor7
```
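A minimal NumPy sketch of where the multicollinearity comes from (Python rather than R, with simulated Dirichlet draws standing in for the real series): R formulas include an intercept by default, and a column of ones is exactly the row sum of the 7 proportions, so the design matrix has 8 columns but only rank 7.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_parts = 50, 7

# Simulated stand-in for the 7 proportions: every row sums to 1
X = rng.dirichlet(np.ones(n_parts), size=n_obs)

# Design matrix for the formula above: implicit intercept + 7 components
design = np.column_stack([np.ones(n_obs), X])

print(design.shape)                      # (50, 8)
print(np.linalg.matrix_rank(X))          # 7: the components alone are full rank
print(np.linalg.matrix_rank(design))     # 7: the intercept column is redundant
```

Removing either the intercept or one of the components restores full column rank.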

My question:

What approaches would you recommend for handling compositional predictors in a regression context using brms?
For now, I’ve been dropping one component in my regressions (but am concerned about interpretation).
My understanding is there might be some compositional regression methods out there, but I’m not sure where to start.
Has anyone implemented some solutions successfully with brms in a similar context? Any guidance on the pros/cons of different approaches would be greatly appreciated.
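On the interpretation worry: dropping one component is only a reparameterization of the no-intercept model that uses all 7 components. Since x7 = 1 - (x1 + ... + x6), we get b1*x1 + ... + b7*x7 = b7 + (b1 - b7)*x1 + ... + (b6 - b7)*x6, so the intercept becomes the effect of the dropped component and every slope becomes a difference from it. The sketch below checks this with ordinary least squares in NumPy on simulated data; the coefficient values are made up purely for illustration. In brms the likelihood is the same either way, but priors placed on the slopes then describe differences from the reference component.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, n_parts = 200, 7
X = rng.dirichlet(np.ones(n_parts), size=n_obs)

# Made-up per-component effects, used only to simulate a target
beta = np.array([2.0, -1.0, 0.5, 3.0, 0.0, 1.5, -0.5])
y = X @ beta + rng.normal(scale=0.1, size=n_obs)

# (a) No intercept, all 7 components
coef_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# (b) Intercept + first 6 components (component 7 dropped as the reference)
design_drop = np.column_stack([np.ones(n_obs), X[:, :6]])
coef_drop, *_ = np.linalg.lstsq(design_drop, y, rcond=None)

# Same fit, different parameterization
print(np.allclose(design_drop @ coef_drop, X @ coef_full))       # identical fitted values
print(np.allclose(coef_drop[0], coef_full[6]))                   # intercept = effect of component 7
print(np.allclose(coef_drop[1:], coef_full[:6] - coef_full[6]))  # slopes = differences from component 7
```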

Thanks a lot.

amynang · December 20, 2024, 2:35pm

Have you looked into this?

Bob_Carpenter · December 28, 2024, 11:54pm

Thanks for posting, @amynang. That’s a really cool approach. @spinkney just applied the isometric log ratio transform to our sum-to-zero vectors, which present a similar issue, and we’ll be using it to parameterize simplexes too. The basic idea is that it’s a cleverer way of turning a constrained vector of N entries (summing to one for a simplex, or to zero for a sum-to-zero vector) into N - 1 free values on the unconstrained scale.
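For readers unfamiliar with the transform, here is a standalone NumPy sketch of one isometric log-ratio (ILR) construction. It assumes "pivot" (Helmert-style) orthonormal contrasts; any orthonormal basis of the sum-to-zero subspace gives an isometric transform, and this is not Stan's internal implementation. The helper names (pivot_basis, clr, ilr, ilr_inv) are hypothetical.

```python
import numpy as np

def pivot_basis(D):
    """Orthonormal D x (D-1) basis of the sum-to-zero subspace (pivot / Helmert-style)."""
    V = np.zeros((D, D - 1))
    for i in range(D - 1):
        k = D - i                               # number of parts from position i to the end
        V[i, i] = np.sqrt((k - 1) / k)
        V[i + 1:, i] = -1.0 / np.sqrt(k * (k - 1))
    return V

def clr(x):
    """Centered log-ratio: log of each part minus the mean log over parts."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr(x, V):
    """D-part compositions -> D-1 unconstrained coordinates."""
    return clr(x) @ V

def ilr_inv(z, V):
    """D-1 coordinates -> compositions (softmax of the reconstructed clr)."""
    c = z @ V.T
    e = np.exp(c - c.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
X = rng.dirichlet(np.ones(7), size=5)   # 5 simulated 7-part compositions
V = pivot_basis(7)
Z = ilr(X, V)

print(Z.shape)                           # (5, 6)
print(np.allclose(ilr_inv(Z, V), X))     # True: the transform is invertible
```

The 6 columns of Z carry no sum constraint, so they can enter a regression together with an intercept, and the inverse maps any 6-vector back to a valid composition. With this basis, the first coordinate is sqrt(6/7) times the log-ratio of component 1 to the geometric mean of components 2 to 7.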

JohnnyZoom4H · December 30, 2024, 2:17am

The isometric log ratio method, which I like to call the Aitchison method (after John Aitchison, father of compositional data analysis), is a valid way to approach a problem like this.

Aitchison developed his approach for the case where the dependent variable is compositional. Having a compositional independent variable does not invalidate this approach in the least, but it does suggest approaching your problem as a mixture model (in the sense of mixture experiments, not a finite mixture of distributions), which is designed for exactly this situation.

The standard reference in this area is J. Cornell’s “Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data”. If I recall correctly, these methods were developed for cases where the mixture (here, your 7 time-series components) was set as part of the experimental design. Think food science recipes or pharmaceutical formulations. Again, if your components are observational, that still shouldn’t make the approach unworkable.

The key modeling idea is to omit the intercept term, because it would be collinear with the sum of your main effects: the component columns add up to 1 in every row, which is exactly the intercept column. Care must be taken when extending the model to, e.g., interactions, because the sum constraint limits how this can be done (but it still can be done).
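In the mixture-design literature the usual no-intercept forms are Scheffé's canonical polynomials: the linear model sum_i b_i*x_i and the quadratic model sum_i b_i*x_i + sum_{i<j} b_ij*x_i*x_j. The quadratic form also leaves out squared terms, because x_i^2 = x_i - sum_{j != i} x_i*x_j is already spanned by the other columns. In brms, the linear version corresponds to suppressing the intercept in the formula, e.g. target ~ 0 + regressor1 + ... + regressor7. The NumPy sketch below (scheffe_design is a hypothetical helper, data are simulated) checks that adding an intercept or squared terms to the quadratic design only adds redundant columns.

```python
import numpy as np
from itertools import combinations

def scheffe_design(X, quadratic=True):
    """Scheffé canonical polynomial design: no intercept, no squared terms."""
    cols = [X]
    if quadratic:
        pairs = combinations(range(X.shape[1]), 2)
        cols.append(np.column_stack([X[:, i] * X[:, j] for i, j in pairs]))
    return np.hstack(cols)

rng = np.random.default_rng(3)
n_obs, n_parts = 200, 7
X = rng.dirichlet(np.ones(n_parts), size=n_obs)

D_lin = scheffe_design(X, quadratic=False)   # 7 columns
D_quad = scheffe_design(X)                   # 7 + 21 = 28 columns

# An intercept or squared terms add columns but no new information
with_intercept = np.column_stack([np.ones(n_obs), D_quad])
with_squares = np.hstack([D_quad, X ** 2])

for name, M in [("linear", D_lin), ("quadratic", D_quad),
                ("quadratic + intercept", with_intercept),
                ("quadratic + squares", with_squares)]:
    print(f"{name}: {M.shape[1]} columns, rank {np.linalg.matrix_rank(M)}")
```

Both the linear and quadratic Scheffé designs are full rank, while the last two designs stay at rank 28 despite having 29 and 35 columns.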

While the main effects can’t be interpreted as simply as “how does the response change if I change one factor by one unit while keeping all others the same” (because, of course, you can’t do that), they can be interpreted as contributions to the response, and they can be compared to each other. Note that with the Aitchison approach there is no easy way to interpret the main effects individually, only in terms of ratios between components.
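The two kinds of interpretation can be written out for the 7-component case. In the linear Scheffé model, b_i is the expected response for a pure blend e_i (all of the mass on component i), and shifting an amount delta (no larger than x_j) from component j to component i changes the expected response by delta times (b_i - b_j):

$$
\mathbb{E}[y \mid x] = \sum_{i=1}^{7} \beta_i x_i,\quad \sum_{i=1}^{7} x_i = 1
\;\Rightarrow\;
\mathbb{E}[y \mid e_i] = \beta_i,\qquad
\mathbb{E}[y \mid x + \delta(e_i - e_j)] - \mathbb{E}[y \mid x] = \delta\,(\beta_i - \beta_j).
$$

With ILR coordinates, assuming the pivot basis as one possible choice, the first coordinate is a scaled log-ratio of component 1 to the geometric mean of the rest, so its coefficient is read per unit of that log-ratio:

$$
z_1 = \sqrt{\tfrac{6}{7}}\,\ln\frac{x_1}{\bigl(\prod_{j=2}^{7} x_j\bigr)^{1/6}},\qquad
\mathbb{E}[y \mid x] = \alpha + \sum_{k=1}^{6} \gamma_k z_k .
$$

Holding z_2, ..., z_6 fixed while increasing z_1 means scaling x_1 up relative to the other components while keeping their ratios to each other unchanged.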

I encourage you to try both.
