I have a dataset that consists of subjects coming into the clinic (for treatment of another disease) and they are screened for Tuberclosis (as they are a high risk population). Every time they are screened, they are either screen positive or negative (1 or 0) and then for each visit, there is an assessment of TB disease. My colloborator wants to assess the utility of this screening test.
The problem I’m running into goes like this  -
Since each subject is screened multiple times, I incluide a random intercept for each subject and a fixed effect for the screen result and run a mixed model logistic regression
library(rstanarm)
m1<-stan_glmer(disease_binary ~ screen_binary + (1|chart_number_2),data=df,family="binomial")
There’s about 11000 subjects with a total observations of 55000. This took me about 30 minutes running all 4 chains in parrallel.
To assess the screening test, I would like to calculate an AUC which is done using below -
screen_preds <- posterior_linpred(m1,newdata=df,transform=TRUE) %>% tbl_df
prob_data <- screen_preds %>%
                                    map_dfr(~median(.x))%>%
                                      gather(.,key=id,value="prob")%>%
                                          mutate(id=as.numeric(id))%>%
                                              inner_join(.,dta_bayes,by="id") .  ### I take the median of the predicted probabilities and join them with the original dataframe which has the 'truth' (dta_bayes is the original df)
## Using the probabiilites and the true responses, I calculate the AUROC 
library(pROC)
roc_random<-roc(prob_data$disease_binary,prob_data$prob,ci=TRUE,boot.n=100,print.auc=TRUE,plot=TRUE)
auc(roc_random)
I get an AUC of 0.998. This is quite surprising and bizarre to me.
Instead, I took a different approach. I split the data into training and testing - randomly sampled 80% of the subjects and took all of their data (so 80% of the groups were sampled), and left 20% as a hold out test.
I repeated the above, instead fitting the model on the 80% data.
m_train <- stan_glmer(disease_binary~screen_binary+ (1|chart_number_2),data=train,family="binomial",cores=4)
## Using the m_train model to now predict on the hold out test set 
test_preds <- posterior_linpred(m_train,newdata = test,transform = TRUE)%>%tbl_df  ## test set probabilities 
## Using these test set probabilities - 
## take the median of the posterior predictions
median_preds<-test_preds%>%
  map_dfr(.,~median(.x))
### Map the probabilities to the test df 
test_prob<-median_preds%>%
  gather(.,key=id,value="prob")%>%
   mutate(id=as.numeric(id))%>%
   inner_join(.,test,by="id")       ### The test set probabilites are now mapped to the test data (with the true responses)
AUC
test_roc<-roc(test_prob$disease_binary,test_prob$prob,ci=TRUE,boot.n=100,print.auc=TRUE,plot=TRUE)
auc(roc_random)
The result is 0.603 (0.544-0.662) - quite a drastic drop from 0.998.
Alright, so we maybe overfit our model with so many random effects - how about we do the same train-test split but this time, ignore the correlation and treat each individual visit as an independent response?
train_naive <- glm(disease_binary~screen_binary,data=train,family="binomial")
prob_naive <- predict(train_naive,newdata = test,type="response")
prob_naive <- as.data.frame(prob_naive)
test_prob <- prob_naive%>%
  tbl_df %>%
  mutate(id=seq(1,nrow(.),by=1)) %>%
  inner_join(.,test_prob,by="id")
test_roc_naive<-roc(test_prob$disease_binary,test_prob$prob_naive,ci=TRUE,boot.n=100,print.auc=TRUE,plot=TRUE)
auc(test_roc_naive)
I get the same exact AUC - 0.603 (0.544-0.662).
The naive and the random effect model produce exactly the same estimate, which makes me wonder that the random effect model is essentially using the same coefficient(s) as the naive model.
What am I missing here?
Adding a made up df here -
screen_binary<-rbinom(1000,1,0.5)
diseas_binary<-rbinom(1000,1,0.05)
chart_number_2<-seq(1,150)
df<-as.data.frame(cbind(screen_binary,diseas_binary,chart_number_2))%>%
  tbl_df%>%
  arrange(.,chart_number_2)