# Logistic regression with a less-than-full-rank feature matrix

Hello and much love from a burgeoning Stan user.

I’m currently revisiting Richard McElreath’s wonderful book, *Statistical Rethinking*, and want a better understanding of how I should think about matrix rank. The rethinking R package contains the data I use in this question.

```r
# install.packages(c("devtools", "mvtnorm", "loo", "coda"), dependencies = TRUE)
# library(devtools)
# install_github("rmcelreath/rethinking", ref = "Experimental")
library(rethinking)
data(chimpanzees)
d <- chimpanzees

# Treatment index 1-4: the four combinations of prosoc_left (0/1) and condition (0/1)
d$treatment <- 1 + d$prosoc_left + 2 * d$condition

dat_list <- list(
  pulled_left  = as.integer(d$pulled_left),
  actor_id     = as.integer(d$actor),
  block_id     = as.integer(d$block),
  treatment_id = as.integer(d$treatment)
)
str(dat_list)
```

In the book, McElreath's logistic regression for whether a chimpanzee pulled the left lever (`pulled_left`) includes a parameter for every “level” of each categorical variable. Note that there are 7 actors, 6 blocks, and 4 treatments.

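```stan
data {
  array[504] int<lower=0, upper=1> pulled_left;
  array[504] int<lower=1, upper=7> actor_id;
  array[504] int<lower=1, upper=6> block_id;
  array[504] int<lower=1, upper=4> treatment_id;
}
parameters {
  vector[7] alpha;  // one intercept per actor
  vector[6] beta;   // one offset per block
  vector[4] theta;  // one offset per treatment
}
model {
  vector[504] p;
  theta ~ normal(0, 1.5);
  beta ~ normal(0, 1.5);
  alpha ~ normal(0, 1.5);
  for (i in 1:504) {
    p[i] = alpha[actor_id[i]] + beta[block_id[i]] + theta[treatment_id[i]];
    p[i] = inv_logit(p[i]);
  }
  pulled_left ~ binomial(1, p);
}
```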
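Writing the linear predictor out makes the redundancy explicit. Here $j$, $k$, $l$ index actor, block and treatment, and the constants $c$ and $d$ are introduced only for illustration (they are not in the book):

$$
\operatorname{logit}(p_i) = \alpha_{\text{actor}[i]} + \beta_{\text{block}[i]} + \theta_{\text{treatment}[i]}
$$

$$
(\alpha_j + c) + (\beta_k - c + d) + (\theta_l - d) = \alpha_j + \beta_k + \theta_l \qquad \text{for all } c, d \in \mathbb{R}
$$

So the likelihood is unchanged along two directions in the 17-dimensional parameter space: 7 + 6 + 4 = 17 parameters, but at most 15 linearly independent combinations of them.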

In the frequentist world, this would be a problem for maximum likelihood: each factor's indicator columns sum to a column of ones, so the design matrix is not of full column rank and the maximum likelihood estimate is not unique. R's `glm()` would drop one level from two of the three factors, assuming I don’t include an intercept term. Similarly, brms by default drops one level from the second and third categorical variables.
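One way to see the rank issue described above is to build both design matrices in base R. This is a minimal sketch on a hypothetical balanced grid (`grid`, built with `expand.grid`) that only mimics the level counts of the chimpanzee data; it is not the real data, and `X_full`, `X_default` and `v` are illustrative names, not anything from the book or from brms.

```r
# Hypothetical balanced grid with the same numbers of levels as the
# chimpanzee data (7 actors, 6 blocks, 4 treatments); not the real data.
grid <- expand.grid(
  actor = factor(1:7),
  block = factor(1:6),
  treatment = factor(1:4)
)

# One indicator column for every level of every factor, i.e. the design
# implied by alpha[actor] + beta[block] + theta[treatment].
X_full <- cbind(
  model.matrix(~ 0 + actor, grid),
  model.matrix(~ 0 + block, grid),
  model.matrix(~ 0 + treatment, grid)
)
ncol(X_full)     # 17 columns
qr(X_full)$rank  # 15: two columns are linearly redundant

# Each row has exactly one actor and one block indicator, so adding a
# constant to every alpha and subtracting it from every beta changes nothing.
v <- c(rep(1, 7), rep(-1, 6), rep(0, 4))
all(X_full %*% v == 0)  # TRUE: v lies in the null space of X_full

# R's default coding (as used by glm()) keeps every actor level but drops
# the first level of block and of treatment.
X_default <- model.matrix(~ 0 + actor + block + treatment, grid)
ncol(X_default)     # 15
qr(X_default)$rank  # 15: full column rank
```

The vector `v` is one of the two null-space directions; the other would shift all actor and treatment parameters in opposite directions instead.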

```r
library(brms)
library(tidyverse)

df <- as_tibble(dat_list)
fit <- brm(
  pulled_left ~ 0 + factor(actor_id) + factor(block_id) + factor(treatment_id),
  family = bernoulli, data = df, prior = ...
)
summary(fit)
```

So how should I interpret McElreath's model, which keeps every level? Does the rank of the design matrix affect MCMC exploration?

Thanks!