Variability around a prediction after fitting a random intercept model

Can I ask how to get variability around a prediction after fitting a random intercept model in stan_glmer please? I have fit the following model where my event is 0/1 and x1 and x2 are continuous covariates:

 model <- stan_glmer(binary event ~ x1 + x2 +(1 | group), family="binomial"

I attach an example – but essentially my goal is to plot the predicted probability for a new individual (who doesn’t belong to an observed group) and therefore I marginalise over the random effects distribution, for different values of x2 whilst holding x1 at its mean value, say. I can plot this but without variability. I can bootstrap my mcmc sample of 4000 draws but I’m not sure this is adding the right amount of variability. So my question is how/can I get variability on my prediction plot or is it just not possible/needed?

Any help would be greatly appreciated.

#generate some data as an example: 4 groups of people, 20 people in each group each with covariates x1 and x2
groupn <- 15 # sample size for new study
group1 <- data.frame(group=“1”,
id=1:15, # id’s
x1=seq(40, 70, length.out = 15),
x2=seq(0, 10, length.out = 15),
Event=c(0,1,0,0,1, 0,0,1,0,0, 0,0,0,0,0))
group2 <- data.frame(group=“2”,
id=16:30, # id’s
x1=seq(30, 60, length.out = 15),
x2=seq(5, 15, length.out = 15),
Event=c(1,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0))
group3 <- data.frame(group=“3”,
id=31:45, # id’s
x1=seq(20, 40, length.out = 15),
x2=seq(0, 5, length.out = 15),
Event=c(0,0,0,0,1, 0,0,0,0,0, 0,0,0,0,0))
group4 <- data.frame(group=“4”,
id=46:60, # id’s
x1=seq(30, 50, length.out = 15),
x2=seq(2, 9, length.out = 15),
Event=c(0,1,0,0,1, 0,0,0,0,0, 0,0,0,0,0))
dat<-rbind(group1, group2, group3, group4)

set.seed(123)
model <- stan_glmer(Event ~ x1 + x2 +(1 | group), data=dat, family=“binomial”,
prior_intercept=cauchy(0, 10),
prior=cauchy(0, 2.5),
prior_covariance=decov(regularization=1, concentration=1, shape=1, scale=1),
cores=4)
loomod <- loo(model)
print(model, digits=3)
print(loomod)

#7/60=0.116
#Estimates:
#Median MAD_SD
#(Intercept) -0.874 2.067
#x1 0.013 0.062
#x2 -0.403 0.232

#Error terms:
#Groups Name Std.Dev.
#group (Intercept) 1.111
#Num. levels: group 4

#Sample avg. posterior predictive
#distribution of y (X = xbar):
#Median MAD_SD
#mean_PPD 0.117 0.049

summary(dat$x1)
summary(dat$x2)
new_n<-1

out<-prop<-NULL
x<-seq(0,15,0.01)
new_n<-1
for(i in (x)){
nd <- data.frame(group=“NEW group”, # to identify prediction over new study
id=1:new_n, # patient id’s
Event=rep(0, new_n)) # column is needed for posterior_predict,
#actual values are ignored
#vary x2 while keeping x1 fixed at the mean
nd$x2 <- c(i)
nd$x1<- c(42.5)

#Number of simulations from the predictive distribution
draws<-4000

ppred1 <- posterior_predict(model, newdata=nd, re.form=~(1|group), draws=draws , seed=123)

##Look at the individual probabilities - columns
sumcol <- data.frame(iter=1:1, count=apply(ppred1, 2, sum) )
totalcol<- filter(sumcol)%>% mutate(prop=count/draws)
prop <- cbind(prop, totalcol$prop)
out <- cbind(out, as.vector(ppred1))
}
probc<-t(prop)
x2var<-cbind(x, probc)
x2var<-data.frame(x2var)

#plot
ggplot(x2var, aes(x=x, y=V2)) + geom_point(size=1, color=‘red’)+
scale_y_continuous(breaks = round(seq(0, 1.2, by = 0.2),1)) +
scale_x_continuous(breaks = round(seq(0, 15, by = 1),1))+
labs(x=“X2”, y=“Probability of an event”)+
coord_cartesian(ylim=c(0, 1), xlim=c(0, 15))

Hi @gemma, here’s one of many ways to make the kind of plot that I think you’re asking about (hopefully I’ve understood your question correctly):

# assuming you already have the objects 'model' and 'dat' that you created above

library(dplyr)
library(reshape2)
library(ggplot2)

# get matrix of predicted probabilities (iterations x data points matrix)
# and then melt it to a data frame
nd <-
  data.frame(
    x2 = seq(0, 15, 0.1), # let x2 vary
    x1 = mean(dat$x1), # fix x1 at mean
    id = 1,
    group = "NEW group"
  )
probs <- posterior_linpred(model, newdata = nd, transform = TRUE)
probs_df <- melt(probs, value.name = "prob", varnames = c("iter", "x2id"))
probs_df$x2 <- nd$x2[probs_df$x2id] # add x2 values to probs_df

# compute summaries and then plot
probs_df %>%
  group_by(x2) %>%
  summarise(
    p = mean(prob), 
    lo = p - sd(prob),
    hi = p + sd(prob)
  ) %>%
  ggplot(aes(x = x2, y = p, ymin = lo, ymax = hi)) + 
  geom_ribbon() + 
  geom_line(color = "red")

This would give you a plot like this:

example_plot

where the red line is the average (over the draws) of the predicted probability and the black ribbon gives ± 1SD bounds (you could also do this with quantiles instead of mean ± SD).

Hope that helps!

1 Like

Can this be used on a stanfit object or is it restricted to stanreg objects? I tried to run this on a stanfit object and got:

Error in UseMethod("posterior_linpred") : no applicable method for 'posterior_linpred' applied to an object of class "stanfit"

Is there a workaround or particular part of the stanfit that I need to be pointing to?

It does not work for a stanfit object. The posterior_linpred, posterior_predict, etc. S3 methods are defined in packages such as rstanarm and brms that know what the model is on the R side. That is not the case for arbitrary Stan models that are estimated via stan or sampling.

Bummer. So the solution is to re-run the stan model on the new data frame (i.e. one with x1 constant and x2 varying) and get the posterior predictions for p from the stanfit object?

(sorry if this is obvious, I am having to do this for a number of variables and models and was hoping to avoid re-running each of the stan models)

You don’t need to re-run a model. You just need do take the posterior draws for the slope, multiply them by the new predictors, add the posterior draws for the intercept and you have the posterior distribution of the linear predictor. Apply the inverse link function, and now you have the posterior distribution of the conditional mean. Draw from the distribution you use for the likelihood with that conditional mean, and you have draws from the posterior predictive distribution. That is the same logic used by posterior_linpred(transform = FALSE), posterior_linpred(transform = TRUE), and posterior_predict() respectively.

Ah hah!! Thank you. That’s very helpful