Hi, I am slowly self-learning Bayesian regression modelling using brms and Stan in R, and would be grateful for advise about whether I am specifying and interpreting my models correctly.
Say for example I have a cohort of adults with an eye disease that can affect one or both eyes. Surgeons operate on the eye(s), and after 6 months, the eye is either cured or not cured. We have data from eight surgeons working at two hospitals. We also think that covariates such as age, sex, eye disease severity, the presence of diabetes (and severity thereof, as measured by HBA1c), and the numbers of years of experience of the surgeon are all likely to be important factors in whether the eye disease is cured by the operation.
Thus, diseased eyes are nested within patients, who are nested within surgeons, and who are nested within hospitals. It seems to me that a Bayesian hierarchical regression model would be a good choice for analysing these data.
We want to ask two questions of the data:
- Are diseased eyes among diabetic adults with more advanced diabetes less likely to be cured than among non-diabetic adults, after adjusting for the effect of other eye disease, clinical, surgeon and hospital variables?
- Does eye cure among diabetic adults with more advanced diabetes occur equally in each hospital and for each operating surgeon?
My questions that I am looking for help with:
- Have I set up the hierarchical model structure correctly? If not, very grateful for advice about where I have gone wrong.
- How to include a term for diabetes severity (hba1c) within the model, noting that only a fraction of participants will have diabetes?
- Do I need to consider adding additional correlation matrix priors for random effects?
- Any further suggestions that might help me refine this and similar future models as I learn Bayesian hierarchical modelling?
Here are some made-up data
library(tidyverse)
library(brms)
set.seed(1234)
patients <- tibble(id = seq(1:300),
#Number of diseased eyes per patient
diseased_eyes = sample(c(1,2),300, replace=TRUE),
#Age of patient
age = round(runif(18,65,n=300)),
#Sex of patient
sex = sample(c("Male", "Female"), 300, replace=TRUE, prob=c(0.8,0.2)),
#Diabetes status of patient
diabetes = sample(c("Diabetes", "No diabetes", "Unknown"), 300,
replace=TRUE,
prob = c(0.3,0.6,0.1)),
#severity of diabetes (HBA1c)
hba1c = case_when(diabetes == "Diabetes" ~ floor(runif(300,0,10)),
diabetes == "No diabetes" ~ NA_real_,
diabetes == "Unknown" ~ NA_real_))
#Surgeons at Hospital 1
surgeon1 <- tibble(hospital = rep("Hospital 1", 200)) %>%
mutate(surgeon = sample(c("Mr A", "Mr B", "Mr C", "Mr D"), 200, replace=TRUE)) %>%
arrange(surgeon) %>%
#Years of experience of surgeons at Hospital 1
mutate(years_experience = rep(c(10,16,25,8), c(50,50,50,50)))
#Surgeons at Hospital 2
surgeon2 <- tibble(hospital = rep("Hospital 2", 100)) %>%
mutate(surgeon = sample(c("Mrs E", "Mr F", "Mrs G", "Prof H"), 100, replace=TRUE)) %>%
arrange(surgeon) %>%
#Years of experience of surgeons at Hospital 2
mutate(years_experience = rep(c(4,17,12,33), c(25,25,25,25)))
#Bind surgeons together, and generate a patient ID for merge
surgeons <- bind_rows(surgeon1, surgeon2) %>%
mutate(id= sample.int(300,300, replace=FALSE))
#Merge patients to surgeons
patients <- left_join(patients, surgeons, by=c("id"))
#Expand dataframe, to give one row per diseased eye
df <- patients %>%
uncount(diseased_eyes) %>%
group_by(id) %>%
mutate(diseased_eye = row_number()) %>%
ungroup()
#Add in severity of eye disease, and outcome variable (cured)
df <- df %>%
#Severity of eye disease (1: least severe, 4: most severe)
mutate(disease_grade = sample(c(1,2,3,4), 449, replace=TRUE,
prob = (c(0.50,0.20,0.20,0.10)))) %>%
#Was eye cured?
mutate(cured = sample(c("0","1"), 449, replace = TRUE,
prob= c(0.3, 0.7)))
#Convert some variables to factors for modelling
df$sex <- factor(df$sex)
df$diabetes <- factor(df$diabetes)
df$hospital <- factor(df$hospital)
df$surgeon <- factor(df$surgeon)
df$disease_grade <- factor(df$disease_grade)
Created on 2018-10-03 by the reprex
package (v0.2.0).
Now we try a model.
mb <- brm(data = df, family = bernoulli,
cured ~ 1 +
(1 | hospital) +
(years_experience | surgeon) +
(sex + age + diabetes | id) +
(id | diseased_eye) +
disease_grade + age + sex + diabetes + years_experience + surgeon,
prior = c(set_prior("normal(0, 10)", class = "b"),
set_prior("cauchy(0, 5)", class = "sd")),
chains = 1, iter = 1000, warmup = 200,
control = list(adapt_delta = 0.95))
summary(mb)
There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.95 may help.
See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup Family: bernoulli
Links: mu = logit
Formula: cured ~ 1 + (1 | hospital) + (years_experience | surgeon) + (sex + age + diabetes | id) + (id | diseased_eye) + disease_grade + age + sex + diabetes + years_experience + surgeon
Data: df (Number of observations: 449)
Samples: 1 chains, each with iter = 1000; warmup = 200; thin = 1;
total post-warmup samples = 800
Group-Level Effects:
~diseased_eye (Number of levels: 2)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 1.97 3.53 0.01 10.76 262 1.00
sd(id) 0.02 0.09 0.00 0.14 157 1.00
cor(Intercept,id) -0.04 0.62 -0.99 0.94 97 1.01
~hospital (Number of levels: 2)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 4.80 6.55 0.08 19.27 140 1.00
~id (Number of levels: 300)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample
sd(Intercept) 0.87 0.68 0.01 2.51 74
sd(sexMale) 1.05 0.71 0.03 2.84 69
sd(age) 0.02 0.02 0.00 0.07 17
sd(diabetesNodiabetes) 0.98 0.63 0.05 2.29 41
sd(diabetesUnknown) 2.44 1.69 0.13 6.02 23
cor(Intercept,sexMale) -0.07 0.43 -0.81 0.76 63
cor(Intercept,age) -0.16 0.41 -0.82 0.67 52
cor(sexMale,age) -0.10 0.44 -0.87 0.70 50
cor(Intercept,diabetesNodiabetes) -0.18 0.39 -0.80 0.60 101
cor(sexMale,diabetesNodiabetes) -0.13 0.40 -0.83 0.69 46
cor(age,diabetesNodiabetes) 0.00 0.43 -0.80 0.74 53
cor(Intercept,diabetesUnknown) 0.04 0.40 -0.73 0.77 102
cor(sexMale,diabetesUnknown) 0.04 0.36 -0.65 0.71 107
cor(age,diabetesUnknown) -0.02 0.40 -0.79 0.71 93
cor(diabetesNodiabetes,diabetesUnknown) 0.04 0.42 -0.75 0.82 87
Rhat
sd(Intercept) 1.00
sd(sexMale) 1.00
sd(age) 1.01
sd(diabetesNodiabetes) 1.01
sd(diabetesUnknown) 1.05
cor(Intercept,sexMale) 1.01
cor(Intercept,age) 1.01
cor(sexMale,age) 1.02
cor(Intercept,diabetesNodiabetes) 1.00
cor(sexMale,diabetesNodiabetes) 1.05
cor(age,diabetesNodiabetes) 1.00
cor(Intercept,diabetesUnknown) 1.00
cor(sexMale,diabetesUnknown) 1.00
cor(age,diabetesUnknown) 1.00
cor(diabetesNodiabetes,diabetesUnknown) 1.02
~surgeon (Number of levels: 8)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 2.58 2.02 0.12 7.43 161 1.00
sd(years_experience) 0.21 0.13 0.03 0.54 178 1.01
cor(Intercept,years_experience) -0.12 0.62 -0.96 0.95 80 1.00
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 1.18 6.04 -8.46 16.20 70 1.00
disease_grade2 -0.30 0.40 -1.18 0.44 221 1.00
disease_grade3 -0.41 0.41 -1.26 0.34 246 1.00
disease_grade4 0.98 0.60 -0.12 2.30 429 1.00
age 0.00 0.01 -0.02 0.03 263 1.00
sexMale 0.03 0.44 -0.80 1.00 269 1.01
diabetesNodiabetes 0.28 0.35 -0.42 0.99 247 1.00
diabetesUnknown 1.48 1.32 -0.32 4.68 71 1.03
years_experience -0.05 0.12 -0.32 0.19 153 1.04
surgeonMrB -0.59 3.20 -7.87 5.08 120 1.00
surgeonMrC 0.48 2.98 -5.91 5.92 128 1.00
surgeonMrD 1.11 3.48 -6.34 8.00 116 1.02
surgeonMrF -0.26 4.88 -9.92 8.69 75 1.00
surgeonMrsE -0.37 5.14 -10.96 9.91 192 1.00
surgeonMrsG 1.98 5.42 -9.50 12.77 86 1.01
surgeonProfH 0.67 6.29 -12.09 13.03 130 1.03
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Created on 2018-10-03 by the reprex
package (v0.2.0).
- Operating System: MacOS 10.13.6
- brms Version: 2.4.0