Collinearity

I’ve got a simple little regression:

y ~ normal(a * s + b * h + c, sigma)

The way the experiment was designed, s was picked as a function of h (without doing this, the measurement wouldn’t be possible – the experiment stops working).

Anyhow, s and h are collinear, and our posteriors on a and b end up being correlated (and it makes sense that they would be). Is there anything to be done in a situation like this? It seems like the answer is no, but I just wanted to be clear.

With that in mind, how best to report these variables? The target audience isn’t statisticians. I think a multivariate normal approximation to the posterior of the two variables we care about is fine. I could probably do a 2d density plot of the approximate posterior if that seems best too.
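For concreteness, here is a rough sketch of what that multivariate normal summary could look like. The draws below are made up to stand in for real posterior draws of a and b, and `mvn_summary` is just a name I picked for illustration.

```python
import math
import random


def mvn_summary(a_draws, b_draws):
    """Mean, sd, and correlation of paired posterior draws of a and b."""
    n = len(a_draws)
    mean_a = sum(a_draws) / n
    mean_b = sum(b_draws) / n
    var_a = sum((x - mean_a) ** 2 for x in a_draws) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b_draws) / (n - 1)
    cov_ab = sum(
        (x - mean_a) * (y - mean_b) for x, y in zip(a_draws, b_draws)
    ) / (n - 1)
    return {
        "mean": (mean_a, mean_b),
        "sd": (math.sqrt(var_a), math.sqrt(var_b)),
        "corr": cov_ab / math.sqrt(var_a * var_b),
    }


# Made-up, negatively correlated draws (assumption: real draws would come
# from the fitted model, not from this generator).
rng = random.Random(1)
a_draws, b_draws = [], []
for _ in range(4000):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    a_draws.append(1.0 + 0.5 * z1)
    b_draws.append(2.0 - 0.4 * z1 + 0.1 * z2)

summary = mvn_summary(a_draws, b_draws)
print(summary)
```

Reporting the two means, the two standard deviations, and the one correlation is the whole approximation in the 2d case, so it fits in a small table next to the density plot.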

I saw in the comments of a Gelman blog post that McElreath said something about preferring to make predictive plots to explain what’s happening (as opposed to reporting parameter values). We’re making a couple of those, but there’s no way folks won’t ask us for numbers on the parameters themselves.


Do the QR reparameterization (as long as s is not an exact linear function of h) to get a good n_eff and then when you do the posterior predictions for a given h you may have to choose values of s in the same way as in the experiment.
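A minimal sketch of the algebra behind the QR reparameterization, in plain Python with made-up data (the values of s, h, and the true coefficients are assumptions for illustration). In Stan the decomposition would come from `qr_thin_Q` and `qr_thin_R` and the sampler would work on the coefficients of Q; here a hand-rolled Gram-Schmidt step on the centered predictors just shows that the Q columns are orthogonal and that a and b are recovered by back-solving through R.

```python
import math


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def thin_qr_two_columns(x1, x2):
    """Gram-Schmidt thin QR of the n-by-2 matrix with columns x1, x2."""
    r11 = math.sqrt(dot(x1, x1))
    q1 = [x / r11 for x in x1]
    r12 = dot(q1, x2)
    u = [x - r12 * q for x, q in zip(x2, q1)]
    r22 = math.sqrt(dot(u, u))  # zero only if x2 is an exact multiple of x1
    q2 = [x / r22 for x in u]
    return (q1, q2), (r11, r12, r22)


# Made-up design: s is a noisy function of h, so the two are collinear.
h = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
s = [2.1, 1.8, 4.3, 3.9, 5.7, 6.4, 8.2, 7.9]
a_true, b_true, c_true = 0.7, -1.2, 3.0
y = [a_true * si + b_true * hi + c_true for si, hi in zip(s, h)]  # noise-free

(q1, q2), (r11, r12, r22) = thin_qr_two_columns(center(s), center(h))

# Coefficients on the orthogonal columns (what the sampler would explore).
yc = center(y)
theta1, theta2 = dot(q1, yc), dot(q2, yc)

# Back-solve R * (a, b) = theta to return to the original scale.
b_hat = theta2 / r22
a_hat = (theta1 - r12 * b_hat) / r11
print(a_hat, b_hat)
```

The back-transformed a and b are exactly as correlated as before; only the space the sampler moves in has changed.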

Ben’s QR proposal can help with sampling speed, but doesn’t help with explanation.

A 2d plot is useful for collinear predictors, as demonstrated here: https://rawgit.com/avehtari/modelselection_tutorial/master/colinear.html

You could also consider whether there is a sensible, easy-to-explain transformation of s that would make it more orthogonal to h.
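One candidate transformation, offered only as an illustration and not necessarily the right one for this experiment, is to replace s by its residual after a straight-line fit on h ("how much s deviates from what the design would usually pick at this h"). The data and the helper names below are made up.

```python
def mean(v):
    return sum(v) / len(v)


def residualize(s, h):
    """Least-squares residual of s on h, plus the fitted slope."""
    ms, mh = mean(s), mean(h)
    slope = sum((hi - mh) * (si - ms) for si, hi in zip(s, h)) / sum(
        (hi - mh) ** 2 for hi in h
    )
    s_res = [si - (ms + slope * (hi - mh)) for si, hi in zip(s, h)]
    return s_res, slope


def corr(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    den = (
        sum((x - mu) ** 2 for x in u) * sum((y - mv) ** 2 for y in v)
    ) ** 0.5
    return num / den


# Made-up design values (assumption): four h levels, s chosen near 2 * h.
h = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
s = [2.1, 1.8, 4.3, 3.9, 5.7, 6.4, 8.2, 7.9]

s_res, slope = residualize(s, h)
print(corr(s, h), corr(s_res, h))
```

The catch is interpretation: since a*s + b*h + c equals a*s_res + (b + a*slope)*h plus a constant, the coefficient on h in the transformed model is b + a*slope rather than b, so it answers a different question than the original b.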

You could plot predictions as a function of h, so that you would have y ~ normal(a*s(h) + b*h + c, sigma), i.e. when making predictions you use the same function you used to pick s (and if it’s not deterministic you can make a probabilistic model for that, too). Then you can plot h against y.
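A rough sketch of that prediction step, assuming a deterministic design rule. `s_of_h` is a hypothetical stand-in for however s was actually chosen, and the posterior draws are faked so the snippet runs on its own.

```python
import random


def s_of_h(h):
    """Hypothetical design rule; replace with the rule used in the experiment."""
    return 2.0 * h


def predictive_interval(draws, h, rng, lo=0.05, hi=0.95):
    """Lower quantile, median, upper quantile of predictive draws of y at h."""
    ys = sorted(
        rng.gauss(a * s_of_h(h) + b * h + c, sigma)
        for a, b, c, sigma in draws
    )
    n = len(ys)
    return ys[int(lo * n)], ys[n // 2], ys[int(hi * n)]


# Fake posterior draws with strongly correlated a and b (assumption: real
# draws would be extracted from the fitted model).
rng = random.Random(7)
draws = []
for _ in range(2000):
    z1, z2, z3, z4 = (rng.gauss(0, 1) for _ in range(4))
    a = 0.7 + 0.2 * z1
    b = -1.2 - 0.38 * z1 + 0.05 * z2
    c = 3.0 + 0.1 * z3
    sigma = abs(0.5 + 0.05 * z4)
    draws.append((a, b, c, sigma))

intervals = {h: predictive_interval(draws, h, rng) for h in [1.0, 2.0, 3.0, 4.0]}
for h, (low, med, high) in intervals.items():
    print(h, round(low, 2), round(med, 2), round(high, 2))
```

Because each prediction uses a and b from the same draw, their posterior correlation is carried through automatically, which is the main reason to work from draws rather than from separate marginal summaries.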


Thanks @bgoodri and @avehtari.

How much would this be preferred over predictions over a grid of s/h where I haven’t tried to model how s is chosen as a function of h?

The relationship isn’t deterministic, but I’m hesitant to try to build a model of it because there are only 4 values of h and 5-8 s values at each one. I just don’t think the margin between ending up with an overfit or an underfit model would be large enough to take a shot at that.

What we did so far was just plot some lines y vs. h for a few s values (it’s the b parameter we’re trying to show off here).

Depends on how strong the dependency is, the specific phenomenon and the target audience.

This is what Andrew would very probably do.

If nothing else, if I do it badly then maybe someday I can be a blog post ^^