Hello and much love from a burgeoning Stan user.
I’m currently revisiting Richard McElreath’s wonderful book, Statistical Rethinking, and want to get a better understanding of how I should think about matrix rank. The rethinking R package contains the data I am using in this question:
#install.packages(c("devtools","mvtnorm","loo","coda"),dependencies=TRUE)
#library(devtools)
#install_github("rmcelreath/rethinking",ref="Experimental")
library(rethinking)
data(chimpanzees)
d <- chimpanzees
d$treatment <- 1 + d$prosoc_left + 2*d$condition
dat_list <- list(
  pulled_left = as.integer(d$pulled_left),
  actor_id = as.integer(d$actor),
  block_id = as.integer(d$block),
  treatment_id = as.integer(d$treatment)
)
str(dat_list)
In the book, McElreath includes every “level” of each of the categorical variables in his logistic regression to predict whether or not a chimp “pulled_left”. Note that there are 7 actors, 6 blocks, and 4 treatments.
model {
  vector[504] p;
  theta ~ normal( 0 , 1.5 );
  beta ~ normal( 0 , 1.5 );
  alpha ~ normal( 0 , 1.5 );
  for ( i in 1:504 ) {
    p[i] = alpha[actor_id[i]] + beta[block_id[i]] + theta[treatment_id[i]];
    p[i] = inv_logit(p[i]);
  }
  pulled_left ~ binomial( 1 , p );
}
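To make the level counts concrete, here is a minimal sketch in R of the indicator (dummy) matrix that a model like this implies. It uses a synthetic, fully crossed grid as a stand-in for the chimpanzees data, which is an assumption on my part: only the level counts (7, 6, and 4) come from the post, and the names grid, indicators, and X_full are purely illustrative.

```r
# Synthetic stand-in for the real data (assumption): every combination of
# 7 actors, 6 blocks, and 4 treatments appears exactly once.
grid <- expand.grid(
  actor_id = factor(1:7),
  block_id = factor(1:6),
  treatment_id = factor(1:4)
)

# Full indicator matrix for one factor: one column per level,
# with no reference level dropped.
indicators <- function(f) model.matrix(~ 0 + f)

# Stack the indicator columns of all three factors side by side.
X_full <- cbind(
  indicators(grid$actor_id),
  indicators(grid$block_id),
  indicators(grid$treatment_id)
)

n_cols <- ncol(X_full)        # 7 + 6 + 4 = 17 columns
rank_full <- qr(X_full)$rank  # 15: two columns are redundant

print(c(columns = n_cols, rank = rank_full))
```

The rank falls two short of the column count because the indicator columns of each factor sum to the same vector of ones, which gives two independent linear dependencies among the 17 columns.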
In the frequentist world, this parameterization would be a problem for maximum likelihood: the indicator columns of the three factors are linearly dependent, so the design matrix is rank deficient and the estimates are not unique. Assuming I don’t include an intercept term, glm() in R would drop one level from two of the three factors. Similarly, brms by default drops one level from each of the second and third categorical variables.
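As a rough illustration of that glm() behavior, the sketch below compares the design matrix R builds from a no-intercept formula with the full 17-column indicator matrix. The data are simulated (an assumption: a 7 x 6 x 4 grid repeated three times to reach 504 rows, with a coin-flip outcome), so only the column counts and ranks are meaningful, not the estimates.

```r
set.seed(1)

# Synthetic stand-in for the real data (assumption): 7 x 6 x 4 x 3 = 504 rows.
grid <- expand.grid(
  actor_id = factor(1:7),
  block_id = factor(1:6),
  treatment_id = factor(1:4),
  rep = 1:3
)
grid$pulled_left <- rbinom(nrow(grid), 1, 0.5)  # fake outcome, not real data

# Formula interface without an intercept: the first factor keeps all 7 levels,
# the other two each lose their first level (default treatment contrasts).
X_reduced <- model.matrix(~ 0 + actor_id + block_id + treatment_id, data = grid)
n_reduced <- ncol(X_reduced)        # 7 + 5 + 3 = 15
rank_reduced <- qr(X_reduced)$rank  # 15: full column rank

# Forcing every level in: glm() cannot estimate the redundant columns
# and reports their coefficients as NA.
indicators <- function(f) model.matrix(~ 0 + f)
X_full <- cbind(
  indicators(grid$actor_id),
  indicators(grid$block_id),
  indicators(grid$treatment_id)
)
fit_full <- glm(grid$pulled_left ~ 0 + X_full, family = binomial)
n_aliased <- sum(is.na(coef(fit_full)))  # 17 - 15 = 2 aliased coefficients

print(c(reduced_cols = n_reduced, reduced_rank = rank_reduced,
        aliased = n_aliased))
```

The Stan model above estimates all 17 coefficients, while the reduced formula estimates 15, so the two parameterizations are not directly comparable coefficient by coefficient.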
library(brms)
library(tidyverse)
df <- as_tibble(dat_list)
fit <- brm(
  pulled_left ~ 0 + factor(actor_id) + factor(block_id) + factor(treatment_id),
  family = bernoulli, data = df, prior = ...
)
summary(fit)
So how should I interpret the difference between these two parameterizations? Does the rank of the design matrix affect MCMC exploration?
Thanks!