Correlated slopes and intercepts for a 5-level factor: too many correlation params

I have a grouping factor G and a five-level unordered population-level predictor x — call its levels xA, xB, xC, xD, xE. xA is the reference level (I want to stick with contrast coding because of the interpretive convenience). For those who care, it’s a categorical model that also includes 29 other covariates.

My model has varying intercepts for G, and now I want to add varying slopes for the non-reference levels of x, along with slope-intercept correlation parameters. The problem is that when I specify the formula as y ~ (1 + x | G), I end up not only with the desired parameters but also with correlation parameters between every pair of non-reference levels of x. I don’t want that s***. It makes the model extremely complex and hard to fit, and it may also draw explanatory credit away from the predictors of actual interest. How can I tell the model to only fit correlation parameters between 1 and xB, 1 and xC, 1 and xD, and 1 and xE?

I hope someone can help. My thesis deadline looms, and many models remain to be fit.


Explicitly expand your factor G into four indicator covariates that give the relevant four columns of the design matrix. Then use || (or the |<ID>| syntax if you want to retain some of the correlations but not all).
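A minimal sketch of what that looks like, assuming brms, a data frame dat with columns y, x, and G, and family = categorical() as in your model (all names hypothetical):

library(brms)

# expand the 5-level factor x into four 0/1 indicator covariates (treatment coding)
X <- model.matrix(~ x, data = dat)[, -1]    # drop the intercept column
colnames(X) <- sub("^x", "", colnames(X))   # "xB", "xC", "xD", "xE"
dat <- cbind(dat, as.data.frame(X))

# || suppresses all group-level correlations:
fit_nocor <- brm(
  y ~ xB + xC + xD + xE + (1 + xB + xC + xD + xE || G),
  family = categorical(), data = dat
)

# |<ID>| retains correlations only among terms sharing the same ID,
# here between the intercept and the xB slope:
fit_some <- brm(
  y ~ xB + xC + xD + xE + (1 | p | G) + (0 + xB | p | G) + (0 + xC + xD + xE || G),
  family = categorical(), data = dat
)

Note that the intercept term can only appear in one |<ID>| block, so this syntax cannot correlate the intercept with each slope separately while leaving the slopes mutually uncorrelated (more on that below).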


Could you show me exactly how it’s done? My datafile is attached, along with both the expanded and unexpanded versions of x. I can’t seem to figure out how to use the double-pipe syntax for targeted suppression/retention of correlation parameters.

The group-level parameters that I need to estimate are:

sd(muB_Intercept)
sd(muC_Intercept)
sd(muD_Intercept)
sd(muB_xB)
sd(muC_xB)
sd(muD_xB)
sd(muB_xC)
sd(muC_xC)
sd(muD_xC)
sd(muB_xD)
sd(muC_xD)
sd(muD_xD)
sd(muB_xE)
sd(muC_xE)
sd(muD_xE)
cor(muB_Intercept,muB_xB)
cor(muC_Intercept,muC_xB)
cor(muD_Intercept,muD_xB)
cor(muB_Intercept,muB_xC)
cor(muC_Intercept,muC_xC)
cor(muD_Intercept,muD_xC)
cor(muB_Intercept,muB_xD)
cor(muC_Intercept,muC_xD)
cor(muD_Intercept,muD_xD)
cor(muB_Intercept,muB_xE)
cor(muC_Intercept,muC_xE)
cor(muD_Intercept,muD_xE)

But no others.

anon.txt (32.0 KB)

I think that you already caught that I made a typo above. I meant to write “expand x”, not “expand G”.

I think the set of correlations you are asking for doesn’t make sense, because (for example) once you ask for cor(muB_Intercept,muB_xB) and cor(muB_Intercept,muB_xC), I don’t see how you avoid also needing to estimate the correlation between muB_xB and muB_xC.


So you’re saying, basically, that what I viewed as superfluous correlation parameters are not superfluous after all.

And as usual, you’re probably right.

The problem as I see it is that for some estimates of the non-superfluous correlations, it is literally impossible for the superfluous correlation to be zero (see this intuitively in the limit that the correlations approach 1). If you want to fix that superfluous correlation to zero (which I assume is what you mean when you say that you don’t want to estimate it), you impose a constraint on the other correlations in order to keep the correlation matrix positive semi-definite. So the question is what is the joint prior that you want on all of the correlations that respects the underlying constraint of a positive semi-definite matrix? When you try to answer this question, you will find that either you cannot fix these “superfluous” correlations to particular values (like zero), or else you will end up with possibly unexpected and pretty funky prior constraints on the correlations that you view as non-superfluous.
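A quick numerical illustration of that limit argument (toy numbers, not from any fit): take both intercept-slope correlations to be 0.9 and force the slope-slope correlation to zero; the resulting matrix has a negative eigenvalue and so is not a valid correlation matrix.

R <- matrix(c(1.0, 0.9, 0.9,
              0.9, 1.0, 0.0,
              0.9, 0.0, 1.0), nrow = 3, byrow = TRUE)
eigen(R)$values   # approx. 2.27, 1.00, -0.27: not positive semi-definite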


Can I ask a follow-up, @jsocolar (if/when you find the time)? In what sense are the non-estimated correlations “fixed to zero”?

I’ve been struggling with this notion a bit, since random intercepts and slopes will often be correlated even when the correlation parameter is not estimated.
(see here for the frequentist case: Doing Bayesian Data Analysis: Shrinkage in hierarchical models: random effects in lmer() with and without correlation)


The intercepts and slopes are virtually always correlated to some non-zero degree, so the important question is not whether they are correlated but how strongly. The implication of including vs. not including an explicit correlation parameter is that if you do include one, the random slopes and intercepts will be not only shrunk toward the mean of all groups but also adjusted toward the estimated correlation.

At least that’s how I’ve come to see it. The gurus will confirm or disconfirm.

Yeah, that’s my understanding as well: when a correlation parameter is estimated, it becomes a ‘source of shrinkage’, so to speak.

However, when NOT including such a parameter… can it be said to be “fixed at zero”, such that the true correlation is assumed to be zero? In that case, it would still be a ‘source of shrinkage’ (I’m sure there’s a better term), but one that pulls intercepts and slopes towards a correlation of zero.

EDIT: I don’t mean to derail the original question (which was about how to do it) with this specific point… I’m just trying to understand what happens when you do not include such parameters.


The sample correlations are not fixed to zero, but the underlying population correlation is fixed to zero. The assumption is that the fitted random-effect coefficients arise from a multivariate normal distribution whose covariance matrix has zeros at the relevant entries.
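A tiny simulation of that distinction (a sketch with made-up numbers): the population correlation below is exactly zero by construction, yet the sample correlation of any finite set of draws will fluctuate around zero rather than equal it.

set.seed(1)
sds <- c(0.5, 0.3)                    # hypothetical sd(Intercept), sd(slope)
Sigma <- diag(sds^2)                  # population covariance: zero off-diagonals
u <- MASS::mvrnorm(50, mu = c(0, 0), Sigma = Sigma)
cor(u)[1, 2]                          # non-zero sample correlation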


Btw, if you want to fix elements of a correlation matrix to zero and maintain the PD requirement, there is a way. @Seth_Axen worked it out over on the Turing GitHub: Covariance conditioned on undirected graph · TuringLang/Turing.jl · Discussion #2067 · GitHub.

The log absolute Jacobian isn’t worked out there, though.


The linked gist has been updated to include the absolute Jacobian determinant: Bijector from Cholesky factor of correlation matrix with structural zeros · GitHub


Even with the correct Jacobian adjustment, it’s important for the modeler to carefully consider whether the prior constraints on the remaining correlations are actually desired. Fixing certain correlations to zero because they are viewed as nuisance parameters does not merely yield a more parsimonious model with all else equal; rather, it yields a model with extra hard prior constraints on the joint distribution of the remaining correlations.
