# Variability around a prediction after fitting a random intercept model

Can I ask how to get variability around a prediction after fitting a random intercept model in stan_glmer please? I have fit the following model where my event is 0/1 and x1 and x2 are continuous covariates:

`````` model <- stan_glmer(binary event ~ x1 + x2 +(1 | group), family="binomial"
``````

I attach an example – but essentially my goal is to plot the predicted probability for a new individual (who doesn’t belong to an observed group) and therefore I marginalise over the random effects distribution, for different values of x2 whilst holding x1 at its mean value, say. I can plot this but without variability. I can bootstrap my mcmc sample of 4000 draws but I’m not sure this is adding the right amount of variability. So my question is how/can I get variability on my prediction plot or is it just not possible/needed?

Any help would be greatly appreciated.

#generate some data as an example: 4 groups of people, 20 people in each group each with covariates x1 and x2
groupn <- 15 # sample size for new study
group1 <- data.frame(group=“1”,
id=1:15, # id’s
x1=seq(40, 70, length.out = 15),
x2=seq(0, 10, length.out = 15),
Event=c(0,1,0,0,1, 0,0,1,0,0, 0,0,0,0,0))
group2 <- data.frame(group=“2”,
id=16:30, # id’s
x1=seq(30, 60, length.out = 15),
x2=seq(5, 15, length.out = 15),
Event=c(1,0,0,0,0, 0,0,0,0,0, 0,0,0,0,0))
group3 <- data.frame(group=“3”,
id=31:45, # id’s
x1=seq(20, 40, length.out = 15),
x2=seq(0, 5, length.out = 15),
Event=c(0,0,0,0,1, 0,0,0,0,0, 0,0,0,0,0))
group4 <- data.frame(group=“4”,
id=46:60, # id’s
x1=seq(30, 50, length.out = 15),
x2=seq(2, 9, length.out = 15),
Event=c(0,1,0,0,1, 0,0,0,0,0, 0,0,0,0,0))
dat<-rbind(group1, group2, group3, group4)

set.seed(123)
model <- stan_glmer(Event ~ x1 + x2 +(1 | group), data=dat, family=“binomial”,
prior_intercept=cauchy(0, 10),
prior=cauchy(0, 2.5),
prior_covariance=decov(regularization=1, concentration=1, shape=1, scale=1),
cores=4)
loomod <- loo(model)
print(model, digits=3)
print(loomod)

#7/60=0.116
#Estimates:
#(Intercept) -0.874 2.067
#x1 0.013 0.062
#x2 -0.403 0.232

#Error terms:
#Groups Name Std.Dev.
#group (Intercept) 1.111
#Num. levels: group 4

#Sample avg. posterior predictive
#distribution of y (X = xbar):
#mean_PPD 0.117 0.049

summary(dat\$x1)
summary(dat\$x2)
new_n<-1

out<-prop<-NULL
x<-seq(0,15,0.01)
new_n<-1
for(i in (x)){
nd <- data.frame(group=“NEW group”, # to identify prediction over new study
id=1:new_n, # patient id’s
Event=rep(0, new_n)) # column is needed for posterior_predict,
#actual values are ignored
#vary x2 while keeping x1 fixed at the mean
nd\$x2 <- c(i)
nd\$x1<- c(42.5)

#Number of simulations from the predictive distribution
draws<-4000

ppred1 <- posterior_predict(model, newdata=nd, re.form=~(1|group), draws=draws , seed=123)

##Look at the individual probabilities - columns
sumcol <- data.frame(iter=1:1, count=apply(ppred1, 2, sum) )
totalcol<- filter(sumcol)%>% mutate(prop=count/draws)
prop <- cbind(prop, totalcol\$prop)
out <- cbind(out, as.vector(ppred1))
}
probc<-t(prop)
x2var<-cbind(x, probc)
x2var<-data.frame(x2var)

#plot
ggplot(x2var, aes(x=x, y=V2)) + geom_point(size=1, color=‘red’)+
scale_y_continuous(breaks = round(seq(0, 1.2, by = 0.2),1)) +
scale_x_continuous(breaks = round(seq(0, 15, by = 1),1))+
labs(x=“X2”, y=“Probability of an event”)+
coord_cartesian(ylim=c(0, 1), xlim=c(0, 15))

Hi @gemma, here’s one of many ways to make the kind of plot that I think you’re asking about (hopefully I’ve understood your question correctly):

``````# assuming you already have the objects 'model' and 'dat' that you created above

library(dplyr)
library(reshape2)
library(ggplot2)

# get matrix of predicted probabilities (iterations x data points matrix)
# and then melt it to a data frame
nd <-
data.frame(
x2 = seq(0, 15, 0.1), # let x2 vary
x1 = mean(dat\$x1), # fix x1 at mean
id = 1,
group = "NEW group"
)
probs <- posterior_linpred(model, newdata = nd, transform = TRUE)
probs_df <- melt(probs, value.name = "prob", varnames = c("iter", "x2id"))
probs_df\$x2 <- nd\$x2[probs_df\$x2id] # add x2 values to probs_df

# compute summaries and then plot
probs_df %>%
group_by(x2) %>%
summarise(
p = mean(prob),
lo = p - sd(prob),
hi = p + sd(prob)
) %>%
ggplot(aes(x = x2, y = p, ymin = lo, ymax = hi)) +
geom_ribbon() +
geom_line(color = "red")
``````

This would give you a plot like this:

where the red line is the average (over the draws) of the predicted probability and the black ribbon gives ± 1SD bounds (you could also do this with quantiles instead of mean ± SD).

Hope that helps!

1 Like

Can this be used on a `stanfit` object or is it restricted to `stanreg` objects? I tried to run this on a `stanfit` object and got:

`Error in UseMethod("posterior_linpred") : no applicable method for 'posterior_linpred' applied to an object of class "stanfit"`

Is there a workaround or particular part of the `stanfit` that I need to be pointing to?

It does not work for a `stanfit` object. The `posterior_linpred`, `posterior_predict`, etc. S3 methods are defined in packages such as rstanarm and brms that know what the model is on the R side. That is not the case for arbitrary Stan models that are estimated via `stan` or `sampling`.

Bummer. So the solution is to re-run the `stan` model on the new data frame (i.e. one with `x1` constant and `x2` varying) and get the posterior predictions for `p` from the stanfit object?

(sorry if this is obvious, I am having to do this for a number of variables and models and was hoping to avoid re-running each of the `stan` models)

You don’t need to re-run a model. You just need do take the posterior draws for the slope, multiply them by the new predictors, add the posterior draws for the intercept and you have the posterior distribution of the linear predictor. Apply the inverse link function, and now you have the posterior distribution of the conditional mean. Draw from the distribution you use for the likelihood with that conditional mean, and you have draws from the posterior predictive distribution. That is the same logic used by `posterior_linpred(transform = FALSE)`, `posterior_linpred(transform = TRUE)`, and `posterior_predict()` respectively.

Ah hah!! Thank you. That’s very helpful