I am having trouble with what I thought was a simple Bayesian repeated-measures logistic regression. I can get the model to run, but the diagnostics are terrible, and I was hoping for some tips on how to improve the model.
First, the data. There are two groups, not trained and trained, each measured twice: at baseline (pre-intervention) and at follow-up (post-intervention). The outcome is binary, incorrect (= 0) vs correct (= 1). I’m interested in whether prior training results in greater odds of being correct at follow-up.
# Here is the data
d <- data.frame(id = 1:94,
                training = c(0,1,1,0,0,0,0,1,0,1,1,1,0,1,0,0,1,1,0,0,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,1,0,0,1,0,0,0,0,1,0,1),
                baseline = c(1,1,1,1,1,0,1,1,1,1,0,1,0,1,1,1,1,1,0,1,1,1,0,1,0,1,0,0,0,1,1,1,1,0,0,1,1,1,1,0,1,1,0,1,0,1,1,1,0,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,0,1,1,0,1,0,1,0,1,0,1,0,1,0,0,1,0,1,0,1,0,1,1,1,1,1,1,1,1,0),
                followup = c(1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
Now let’s look at the proportions correct in each cell - baseline/not trained, baseline/trained, followup/not trained, followup/trained.
library(dplyr)
d %>% group_by(training) %>%
  summarise(countBase = sum(baseline),
            countFU = sum(followup),
            tot = n()) %>%
  mutate(percBase = round(100*countBase/tot, 2), percFU = round(100*countFU/tot, 2)) %>%
  transform(training = ifelse(training == 0, "not trained", "trained"))
# output
#      training countBase countFU tot percBase percFU
# 1 not trained        40      62  65    61.54  95.38
# 2     trained        26      28  29    89.66  96.55
It looks like training is associated with greater odds of being correct at baseline, before the intervention, but that this difference all but disappears at follow-up. That is unsurprising: the trained group was already at nearly 90% correct at baseline, so they had almost nowhere to go at follow-up (a ceiling effect).
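To put that in terms of odds, here is a quick sketch (the counts are just read off the table above):
# odds of being correct in each cell, from the counts above
oddsBaseNT <- 40 / (65 - 40)   # not trained, baseline: 1.60
oddsBaseT  <- 26 / (29 - 26)   # trained, baseline: ~8.67
oddsFuNT   <- 62 / (65 - 62)   # not trained, follow-up: ~20.67
oddsFuT    <- 28 / (29 - 28)   # trained, follow-up: 28
oddsBaseT / oddsBaseNT         # baseline odds ratio: ~5.42 in favour of trained
oddsFuT / oddsFuNT             # follow-up odds ratio: ~1.35, much closer to 1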
Let’s reshape to a long data frame for the repeated-measures logistic regression (gather() comes from tidyr):
library(tidyr)
longDF <- d %>%
  gather(key = time, value = correct, baseline, followup) %>%
  mutate(id = factor(id),
         training = factor(ifelse(training == 0, "not trained", "trained"), levels = c("not trained", "trained")),
         correct = factor(ifelse(correct == 0, "incorrect", "correct"), levels = c("incorrect", "correct")),
         time = factor(time))
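As a quick sanity check that the reshape worked, a cross-tabulation of the long data should reproduce the counts above:
# counts should match the table above (e.g. 40/65 correct at baseline, not trained)
xtabs(~ training + time + correct, data = longDF)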
Now for the model. I am using a Bernoulli logit model which estimates a grand mean (parameter ‘a’) and then estimates main effects of training and time, plus the training x time interaction. I also have an effect estimated for each subject. I have specified hyper-priors on the standard deviations of the deflections of the main and interaction effects from the grand mean: for each of these standard deviations I chose a normal distribution truncated at 0 (i.e. a half-normal) with sd = 2. Because this is on a log-odds scale, I have no idea whether that was the right thing to do.
library(rstan)
# put the data into a list
dList <- list(N = nrow(longDF),
              nID = nlevels(longDF$id),
              nTrain = nlevels(longDF$training),
              nTime = nlevels(longDF$time),
              sIndex = as.integer(longDF$id),
              trainingIndex = as.integer(longDF$training),
              timeIndex = as.integer(longDF$time),
              correct = as.integer(longDF$correct) - 1)
# now for the model string
mString <- "
data{
  int<lower=1> N;
  int<lower=1> nID;
  int<lower=1> nTrain;
  int<lower=1> nTime;
  int<lower=1,upper=nID> sIndex[N];
  int<lower=1,upper=nTrain> trainingIndex[N];
  int<lower=1,upper=nTime> timeIndex[N];
  int<lower=0,upper=1> correct[N];
}
parameters{
  real a;
  vector[nTrain] bTraining;   // main effect of training
  vector[nTime] bTime;        // main effect of time
  vector[nTime] bTxT[nTrain]; // interaction: one vector of time deflections per training level
  vector[nID] bSubj;          // subject-level effects
  real<lower=0> sigma_train;  // sd of training main-effect deflections
  real<lower=0> sigma_time;   // sd of time main-effect deflections
  real<lower=0> sigma_TxT;    // sd of interaction deflections
  real<lower=0> sigma_s;      // sd of subject deflections
}
model{
  // hyper-priors (half-normal because of the <lower=0> constraints)
  sigma_train ~ normal(0,2);
  sigma_time ~ normal(0,2);
  sigma_TxT ~ normal(0,2);
  sigma_s ~ normal(0,2);
  // priors
  a ~ normal(0,2);
  bSubj ~ normal(0, sigma_s);
  bTraining ~ normal(0, sigma_train);
  bTime ~ normal(0, sigma_time);
  for (j in 1:nTrain) { bTxT[j] ~ normal(0, sigma_TxT); }
  // likelihood
  for (i in 1:N) {
    correct[i] ~ bernoulli_logit(a + bTraining[trainingIndex[i]] + bTime[timeIndex[i]] + bTxT[trainingIndex[i]][timeIndex[i]] + bSubj[sIndex[i]]);
  }
}
"
# run the model
twoByTwoLogit <- stan(model_code = mString,
                      data = dList,
                      iter = 2e3,
                      warmup = 1e3,
                      cores = 3,
                      chains = 3)
Here is the printout of the model coefficients, along with the chain diagnostics:
print(twoByTwoLogit, pars = c("a", "bTraining", "bTime", "bTxT"), probs = c(0.025, 0.975))
# output
Inference for Stan model: 4d5f505643ec1a63680be017d6d7bcf4.
3 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=3000.
              mean se_mean   sd  2.5% 97.5% n_eff Rhat
a             1.43    0.07 0.79 -0.65  2.80   133 1.02
bTraining[1]  0.15    0.06 0.69 -0.84  2.47   141 1.01
bTraining[2]  0.14    0.04 0.62 -0.85  1.95   201 1.01
bTime[1]      0.03    0.06 0.52 -1.13  1.27    79 1.07
bTime[2]     -0.01    0.07 0.51 -1.20  1.22    56 1.08
bTxT[1,1]    -0.03    0.02 0.20 -0.61  0.32   102 1.03
bTxT[1,2]     0.01    0.02 0.21 -0.49  0.38    90 1.03
bTxT[2,1]    -0.01    0.01 0.18 -0.47  0.42   187 1.01
bTxT[2,2]     0.03    0.02 0.19 -0.32  0.45   133 1.00
Samples were drawn using NUTS(diag_e) at Tue Feb 25 10:48:35 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
As you can see, the effective sample sizes are really low (n_eff as low as 56 out of 3000 post-warmup draws) and several Rhats are well above 1.
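In case it helps, the chains can be inspected beyond this summary with the standard rstan helpers (I haven't pasted the plots here):
# check divergences, treedepth and E-BFMI
check_hmc_diagnostics(twoByTwoLogit)
# trace plots for the slowest-mixing parameters
traceplot(twoByTwoLogit, pars = c("a", "bTime", "bTxT"))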
Can anybody tell me what I’m doing wrong and what I could do better?