Extremely fundamental (possibly stupid) question on relationship between predicted probabilities and predictors

I’m fitting a mixed-effects model in rstanarm of the form:

m1 <- stan_glmer(y ~ gender + country + rcs(age) + rcs(AnotherContinousVar) +
                   (1 | subjectID), data = dta, family = "binomial")

I’d like to plot the relationship between the restricted cubic spline of each continuous variable and the predicted probabilities, but I’m not sure I’m doing it correctly. Here’s what I do.

First, I create predictions on the same data using posterior_linpred(m1, transform = TRUE), which gives me predicted probabilities.

Next, I take the mean of these probabilities for each data point and store them in the original data frame so they match up with the original rows:

preds <- posterior_linpred(m1, transform = TRUE)  # draws x observations
pred <- colMeans(preds)                           # posterior mean per observation

dta %>%
  mutate(probability = pred) %>%
  ggplot(aes(age, probability)) +
  geom_smooth(method = "loess")

What I’m confused about is:

  1. Is this the probability given age only, or the probability given age AND all the other predictors? If the latter, at what levels are those other predictors held? Are these marginal effects or conditional ones?
  2. Is loess smoothing an appropriate way to visualize the relationship between the continuous variable and the predicted probabilities?
  3. Is there another way to visualize both the relationship and its uncertainty, and how would I interpret it?
  4. Should I use something like expand.grid() to create all possible combinations of the predictors and then get predicted probabilities on those data points instead? (See the sketch below for what I mean.)

Another thing I’m still not clear about: subjects are treated as a random effect with a separate intercept for each (there are multiple responses per subject). Do the predicted probabilities take the variance explained by these random effects into account?
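For question 4, here is roughly what I had in mind (a sketch only: the gender and country levels are made up, and re.form = NA is my guess at how to drop the subject-specific intercepts):

# hypothetical reference grid: vary age, hold the other predictors fixed
grid <- expand.grid(
  gender  = "female",   # made-up level
  country = "US",       # made-up level
  age     = seq(min(dta$age), max(dta$age), length.out = 50),
  AnotherContinousVar = median(dta$AnotherContinousVar)
)
# re.form = NA should drop the (1 | subjectID) intercepts, so the grid
# needs no subjectID column; transform = TRUE gives probabilities
p <- posterior_linpred(m1, newdata = grid, transform = TRUE, re.form = NA)
grid$probability <- colMeans(p)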

Re question 1: conditional on all other factors.

Re question 2: I think that is fine.

Re question 3: I would probably do

X <- model.matrix(m1)

and pull out the columns that are related to the cubic spline in question. Then do

betas <- as.matrix(m1)

and pull out the columns of coefficients that are related to the cubic spline in question. Matrix-multiply those two things together to get the posterior distribution of how the log-odds changes holding everything else constant, and then ggplot that however you want.
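For the m1 fit from the question, those steps might look something like this (a sketch; the "rcs(age)" column-name prefix and the 10%–90% interval are assumptions):

library(ggplot2)

X <- model.matrix(m1)
betas <- as.matrix(m1)
# spline basis columns are assumed to be prefixed with "rcs(age)"
keep_X <- startsWith(colnames(X), "rcs(age)")
keep_b <- startsWith(colnames(betas), "rcs(age)")
eta_age <- betas[ , keep_b] %*% t(X[ , keep_X])  # draws x observations, log-odds scale
df <- data.frame(
  age = dta$age,
  med = apply(eta_age, 2, median),
  lo  = apply(eta_age, 2, quantile, probs = 0.1),
  hi  = apply(eta_age, 2, quantile, probs = 0.9)
)
df <- df[order(df$age), ]  # sort so the ribbon draws cleanly
ggplot(df, aes(age, med)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.3) +
  geom_line() +
  ylab("log-odds contribution of rcs(age)")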

Re question 4: I don’t think so, if what you are really trying to show is the functional form of the spline.

For concreteness, here is an example with rstanarm::stan_lm:

library("rstanarm")
fit_rcs <- stan_lm(mpg ~ rms::rcs(wt) + qsec + am, data = mtcars,
                   prior = R2(0.75))
X <- model.matrix(fit_rcs)
# spline columns are prefixed with "rms" because the term is rms::rcs(wt)
X_rcs <- X[ , startsWith(colnames(X), "rms")]
betas <- as.matrix(fit_rcs)
betas_rcs <- betas[ , startsWith(colnames(betas), "rms")]
# internal helper; computes betas_rcs %*% t(X_rcs) -> draws x observations
eta_rcs <- rstanarm:::linear_predictor.matrix(betas_rcs, X_rcs)
colnames(eta_rcs) <- mtcars$wt  # label each column with its wt value
bayesplot::mcmc_intervals(eta_rcs) + ggplot2::coord_flip()

@jonah mcmc_intervals needs a better way to do continuous horizontal axes for situations like this.

Thanks Ben! This works perfectly.

How’s this?

library("bayesplot")
ribbon_plot <- ppc_ribbon(
  y = colMeans(eta_rcs), 
  yrep = eta_rcs, 
  x = mtcars$wt, 
  prob = 0.5, 
  prob_outer = 0.8
)
intervals_plot <- ppc_intervals(
  y = colMeans(eta_rcs), 
  yrep = eta_rcs, 
  x = mtcars$wt, 
  prob = 0.5, 
  prob_outer = 0.8
)

ribbon_plot + xlab("Weight") + legend_none()
intervals_plot + xlab("Weight") + legend_none()

That is better than what I wrote, but there should be some function that does this without (ab)using names like yrep when the thing is neither y nor a replication.

Definitely, this is just a short-term hack. Should we also add a pars argument to rstanarm::posterior_linpred() so that it can just return eta_rcs (in this case) directly?

I’m not a big fan of that option because this isn’t a linear predictor either. Maybe a generalization of plot_nonlinear that works beyond just stan_gamm4 output?