Fixed-effect model matrix is rank deficient so dropping 1 column / coefficient

I am using rstanarm and the following model

mod1 <- stan_glmer(round( ~ bag.limitLS + min.sizeLS + bag.above.maxLS + max.sizeLS +
                       gdp + pop.densityL + med.age + med.income + unemp.rtL + price +
                       p.max + t.maxL + w.maxL + (1|state),
                   data = rd, family = neg_binomial_2)

The model converges, and check_collinearity() from the performance package reports a largest VIF of 4.9.

range(summary(mod1)[, "Rhat"])
[1] 0.9994929 1.0042284

so that looks OK.
When I run loo(mod1, k_threshold = 0.7) I get “fixed-effect model matrix is rank deficient so dropping 1 column / coefficient”. How do I figure out which column is causing the rank deficiency?

If a matrix is not of full column rank, then by definition any column can be written as a linear combination of the other columns (how many columns you need to express any other column depends on the rank), so there is no individual column that is the culprit. If the rank deficiency is one, any column can be eliminated, so you can drop whichever predictor is least useful. You are not losing any information, because a redundant column only hampers identifiability.

The warning should refer to the design matrix, so it will not be full rank regardless of parameter values, and you should be able to check if there are linear dependencies there (I’m guessing you can extract the design matrix, otherwise you can construct it manually). You have a few predictors, but for a rank deficiency of one it may still be feasible to check for combinations of them or predictors that are the same (e.g. income is simply a multiple of age, or something else that stands out).
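
As a rough sketch of that check, the snippet below builds a design matrix from toy data with base R's model.matrix(), compares its rank with its number of columns, and flags pairs of predictors that are perfectly correlated. The data frame `toy` and its columns are made up for illustration; for the fitted model you would substitute the real design matrix (I assume it can be rebuilt from the model formula and `rd`, but I have not checked how rstanarm exposes it).

```r
# Toy data standing in for the real predictors (all names are hypothetical).
set.seed(1)
toy <- data.frame(age = rnorm(50, 40, 5), price = rnorm(50, 10, 2))
toy$income <- 3 * toy$age          # exact multiple of age, so redundant

# Fixed-effects design matrix, including the intercept column.
X <- model.matrix(~ age + price + income, data = toy)

# Rank deficiency = number of columns minus the numerical rank.
deficiency <- ncol(X) - qr(X)$rank

# Flag pairs of predictors that are perfectly correlated (|r| ~ 1).
# This catches duplicates and exact multiples, not longer combinations.
r <- cor(X[, -1])                  # drop the constant intercept column
hits <- which(abs(r) > 1 - 1e-8 & upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(col1 = rownames(r)[hits[, "row"]],
                    col2 = colnames(r)[hits[, "col"]])

print(deficiency)
print(pairs)
```

A pairwise check like this only finds the simplest dependencies; a column that is a sum of two or more others will not show up here even though the rank comparison still reports the deficiency.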

More generally, if you have even lower rank, you need to choose a set of linearly independent columns that has the same number of columns as the rank.

(Disclaimer: I don’t use brms, so I may be missing something of what’s under the hood or what the warning messages mean)

This isn’t always true. Simplest counterexample is if we take a full-rank matrix and duplicate one of its columns. It is now rank deficient, but only dropping one of the duplicated columns restores it to a full-rank matrix. In general, if we take a full rank matrix and then add a column that is a linear combination of a subset of the columns, then to eliminate the rank deficiency we must drop either the new column or one of the subset.
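
To make the counterexample concrete, here is a small sketch with simulated data (the matrices `A` and `B` and the helper `rank_of` are hypothetical names). It duplicates one column of a full-rank matrix and then reports the rank after dropping each column in turn.

```r
set.seed(2)
A <- cbind(a = rnorm(20), b = rnorm(20), c = rnorm(20))  # full column rank
B <- cbind(A, a_copy = A[, "a"])                          # 4 columns, rank 3

# Numerical rank via base R's QR decomposition.
rank_of <- function(M) qr(M)$rank

# Rank of the remaining three columns after dropping each column in turn.
rank_after_drop <- sapply(colnames(B), function(nm) {
  rank_of(B[, colnames(B) != nm, drop = FALSE])
})
print(rank_after_drop)
```

The remaining three columns are full rank only when the dropped column is `a` or `a_copy`; dropping `b` or `c` leaves the duplicate pair in place, so the reduced matrix is still rank deficient.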

In R, Matrix::rankMatrix will give you the rank of a matrix. If the design matrix M has N columns, then the rank of the design matrix should be N. If the rank is less than N, then the goal is to drop a column (or iteratively drop multiple columns if necessary) such that the rank of the new reduced matrix does not decrease. This will always be possible to do until you arrive at a new matrix whose rank is equal to the number of columns.
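
A minimal sketch of that iterative procedure, assuming the design matrix has column names. The helper `drop_redundant_columns` is a hypothetical name, not something from rstanarm or Matrix, and it scans from the last column to the first, which is an arbitrary choice.

```r
library(Matrix)  # for rankMatrix()

# Hypothetical helper: drop columns whose removal leaves the rank unchanged,
# scanning from the last column to the first.
drop_redundant_columns <- function(M) {
  target <- as.integer(rankMatrix(M))
  dropped <- character(0)
  j <- ncol(M)
  while (ncol(M) > target && j >= 1) {
    reduced <- M[, -j, drop = FALSE]
    if (as.integer(rankMatrix(reduced)) == target) {
      dropped <- c(dropped, colnames(M)[j])  # column j was redundant
      M <- reduced
    }
    j <- j - 1
  }
  list(kept = M, dropped = dropped)
}

# Simulated example: one column is a sum of two others, one is a duplicate.
set.seed(3)
x1 <- rnorm(30)
x2 <- rnorm(30)
M <- cbind(intercept = 1, x1 = x1, x2 = x2, x_sum = x1 + x2, x1_copy = x1)

res <- drop_redundant_columns(M)
print(res$dropped)
print(colnames(res$kept))
```

Which columns end up dropped depends on the scan order, so if some predictors matter more to you than others, reorder the columns (or the scan) so that the less important ones are tried first.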

That’s right. I got ahead of myself and was thinking under common assumptions that don’t have to hold. Better to use the general approach: (i) find the column rank r of the matrix; (ii) get a set of r independent columns for the design matrix.
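
One way to sketch steps (i) and (ii) in base R is with the pivoted QR decomposition: qr() reports the numerical rank and a pivot vector whose leading entries index a set of linearly independent columns. The matrix below is simulated and its column names are made up; this is an illustration under those assumptions, not a statement about what rstanarm does internally.

```r
# Simulated design matrix with two redundant columns.
set.seed(4)
x1 <- rnorm(30)
x2 <- rnorm(30)
X <- cbind(intercept = 1, x1 = x1, x1_copy = x1, x2 = x2, x_sum = x1 + x2)

decomp <- qr(X)                 # base R QR with limited column pivoting
r <- decomp$rank                # (i) numerical column rank

# (ii) the first r pivot entries index r linearly independent columns
keep <- sort(decomp$pivot[seq_len(r)])
X_reduced <- X[, keep, drop = FALSE]

print(r)
print(colnames(X_reduced))
print(setdiff(colnames(X), colnames(X_reduced)))  # columns left out
```

Because qr() works through the columns from left to right and pushes dependent ones to the end, the columns it keeps depend on their order in the matrix.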
