Collinearity

bbbales2 · February 21, 2018, 7:25pm

I’ve got a simple little regression:

y ~ normal(a * s + b * h + c, sigma)

The way the experiment was designed, s was picked as a function of h (without doing this, the measurement wouldn’t be possible; the experiment stops working).

Anyhow, s and h are collinear, and our posteriors on a and b end up being correlated (and it makes sense that they would be). Is there anything to be done in a situation like this? It seems like the answer is no, but I just wanted to be clear.

With that in mind, how best to report these variables? The target audience isn’t statisticians. I think a multivariate normal approximation to the two variables we care about in the posterior is fine. I could probably do a 2d density plot of the approximate posterior if that seems best too.
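A minimal sketch of that multivariate normal summary, assuming the posterior draws of a and b are available as two equal-length lists. The draws here are simulated stand-ins, not output from the actual model, and `summarize_pair` is a hypothetical helper name. It also computes the spread of b once a is fixed, which is one way to show a non-statistical audience why the two marginal intervals alone overstate the uncertainty.

```python
import math
import random

def summarize_pair(a_draws, b_draws):
    """Mean vector, covariance matrix and correlation of paired draws."""
    n = len(a_draws)
    mean_a = sum(a_draws) / n
    mean_b = sum(b_draws) / n
    var_a = sum((a - mean_a) ** 2 for a in a_draws) / (n - 1)
    var_b = sum((b - mean_b) ** 2 for b in b_draws) / (n - 1)
    cov_ab = sum((a - mean_a) * (b - mean_b)
                 for a, b in zip(a_draws, b_draws)) / (n - 1)
    return {
        "mean": (mean_a, mean_b),
        "cov": [[var_a, cov_ab], [cov_ab, var_b]],
        "corr": cov_ab / math.sqrt(var_a * var_b),
    }

# Fake, strongly anti-correlated draws standing in for real posterior samples.
rng = random.Random(0)
a_draws, b_draws = [], []
for _ in range(4000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    a_draws.append(1.5 + 0.4 * z1)
    b_draws.append(-0.7 + 0.8 * (-0.9 * z1 + math.sqrt(1 - 0.9 ** 2) * z2))

summary = summarize_pair(a_draws, b_draws)
sd_b = math.sqrt(summary["cov"][1][1])
# Spread of b once a is pinned down; much narrower than the marginal sd
# when the correlation is strong.
sd_b_given_a = sd_b * math.sqrt(1 - summary["corr"] ** 2)
print(summary["mean"], summary["corr"], sd_b, sd_b_given_a)
```

Reporting the mean vector together with the covariance (or the two standard deviations and the correlation) keeps the joint structure that separate intervals for a and b would drop.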

I saw somewhere in the comments of a Gelman blog where McElreath said something about preferring to make predictive plots to try to explain what’s happening (as opposed to reporting parameter values). We’re making a couple of those, but no way folks don’t ask us for numbers on the parameters themselves.

bgoodri · February 21, 2018, 7:35pm

Do the QR reparameterization (as long as s is not an exact linear function of h) to get a good n_eff and then when you do the posterior predictions for a given h you may have to choose values of s in the same way as in the experiment.
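To make the idea concrete, here is a rough sketch of what the QR reparameterization does, written in plain Python rather than Stan so it runs on its own. The data are made up, and the helper names (`thin_qr_two_columns`, `coefs_from_theta`) are hypothetical. The centered predictors are factored as X = QR, the fit happens on coefficients theta for the orthonormal columns of Q, and a and b are recovered by solving R beta = theta. In a real Stan model the sampler would explore theta instead of the least-squares projection used here, and the Stan User's Guide version also rescales Q and R, which is omitted for brevity.

```python
import math
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def thin_qr_two_columns(x1, x2):
    """Gram-Schmidt QR of the n x 2 matrix [x1, x2]; returns (q1, q2, R)."""
    r11 = math.sqrt(dot(x1, x1))
    q1 = [a / r11 for a in x1]
    r12 = dot(q1, x2)
    resid = [b - r12 * a for a, b in zip(q1, x2)]
    r22 = math.sqrt(dot(resid, resid))  # zero if x2 is exactly linear in x1
    q2 = [a / r22 for a in resid]
    return q1, q2, [[r11, r12], [0.0, r22]]

def coefs_from_theta(theta, R):
    """Solve R @ beta = theta by back substitution (R is upper triangular)."""
    b2 = theta[1] / R[1][1]
    b1 = (theta[0] - R[0][1] * b2) / R[0][0]
    return b1, b2

# Made-up design: 4 values of h, with s chosen as a noisy function of h.
rng = random.Random(1)
h = [float(k) for k in (1, 2, 3, 4) for _ in range(6)]
s = [2.0 * hk + rng.gauss(0.0, 0.3) for hk in h]

sc, hc = center(s), center(h)
q1, q2, R = thin_qr_two_columns(sc, hc)

# Noise-free response so the recovery of (a, b) can be checked exactly.
a_true, b_true = 1.5, -0.7
y = [a_true * si + b_true * hi for si, hi in zip(sc, hc)]

# With orthonormal columns, the fit in Q space is just two projections.
theta = (dot(q1, y), dot(q2, y))
a_hat, b_hat = coefs_from_theta(theta, R)
print(dot(q1, q2), a_hat, b_hat)
```

The division by `r22` is where the caveat about s not being an exact linear function of h shows up: with exact collinearity `r22` is zero and a and b cannot be separated.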

avehtari · February 24, 2018, 3:31pm

Ben’s QR proposal can help with sampling speed, but doesn’t help with explanation.

A 2d plot is useful for collinear predictors, as demonstrated here: https://rawgit.com/avehtari/modelselection_tutorial/master/colinear.html

You could also consider whether there is a sensible, easy-to-explain transformation of s that would make it more orthogonal to h.

You could plot predictions as a function of h, so that you would have y ~ normal(a*s(h) + b*h + c, sigma), i.e., when making predictions you use the same function you used to pick s (and if it’s not deterministic you can make a probabilistic model for that, too). Then you can plot h against y.
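A sketch of that prediction step, under loud assumptions: `s_of_h` is a hypothetical deterministic stand-in for however s was actually chosen, and the posterior draws of (a, b, c, sigma) are simulated here, not taken from a fitted model. For each h it simulates one y per draw at (s(h), h) and reports quantiles that could be plotted as a band against h.

```python
import random

def s_of_h(h):
    # Hypothetical stand-in for the rule used to pick s in the experiment.
    return 2.0 * h

def predictive_quantiles(draws, h, rng, probs=(0.05, 0.5, 0.95)):
    """Simulate one y per posterior draw at (s(h), h) and return quantiles."""
    ys = sorted(
        rng.gauss(a * s_of_h(h) + b * h + c, sigma)
        for a, b, c, sigma in draws
    )
    return [ys[min(len(ys) - 1, int(p * len(ys)))] for p in probs]

# Fake draws in which a and b are strongly anti-correlated, as in the thread.
rng = random.Random(2)
draws = []
for _ in range(2000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    a = 1.5 + 0.4 * z1
    b = -0.7 - 0.8 * z1 + 0.1 * z2
    c = rng.gauss(0.2, 0.05)
    sigma = abs(rng.gauss(0.3, 0.02))
    draws.append((a, b, c, sigma))

bands = {h: predictive_quantiles(draws, h, rng) for h in (1.0, 2.0, 3.0, 4.0)}
for h, (lo, med, hi) in bands.items():
    print(f"h={h}: 5%={lo:.2f}  50%={med:.2f}  95%={hi:.2f}")
```

Because each draw carries its own a and b together, the correlation between them is propagated automatically, so the band along the design line can be fairly tight even when a and b individually look poorly determined.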

bbbales2 · February 25, 2018, 12:40am

Thanks @bgoodri and @avehtari.

How much would this be preferred over predictions over a grid of s/h where I haven’t tried to model how s is chosen as a function of h?

The relationship isn’t deterministic, but I’m hesitant to try to build a model of it because there are only 4 values of h and 5-8 s values at each one. I just don’t think the margin between ending up with an overfit or an underfit model would be large enough to try to take a shot at that.

What we did so far was just plot some lines y vs. h for a few s values (it’s the b parameter we’re trying to show off here).

avehtari · February 25, 2018, 6:55am

Depends on how strong the dependency is, the specific phenomenon and the target audience.

This is what Andrew would very probably do.

bbbales2 · February 25, 2018, 2:17pm

If nothing else, if I do it badly then maybe someday I can be a blog post ^^
