Extremely fundamental (possibly stupid) question on relationship between predicted probabilities and predictors

I’m fitting a mixed-effects model in rstanarm of the form:

m1 <- stan_glmer(y ~ gender + country + rcs(age) + rcs(AnotherContinousVar) +
                   (1 | subjectID), data = dta, family = "binomial")

I’d like to plot the relationship between the restricted cubic spline of each continuous variable and the predicted probabilities, but I’m not sure I’m doing it correctly. Here’s what I do.

First, I create predictions on the same data using posterior_linpred(m1, transform = TRUE), which gives me predicted probabilities.

Next, I take the mean of these probabilities for each data point and store them in the original data frame so they match up with the original rows:

preds <- posterior_linpred(m1, transform = TRUE)  # draws x observations
pred <- colMeans(preds)                           # posterior mean per observation

dta %>%
  mutate(probability = pred) %>%
  ggplot(aes(age, probability)) +
  geom_smooth(method = "loess")

What I’m confused about is:

  1. Is this the probability given age only, or the probability given age AND all the other predictors? If the latter, at what levels are those other predictors held? Are these marginal effects or conditional ones?
  2. Is loess smoothing an appropriate way to visualize the relationship between the continuous variable and the predicted probabilities?
  3. Is there another way to visualize both the relationship and its uncertainty, and how would I interpret it?
  4. Should I use something like expand.grid() to create all possible combinations of the predictors and then get predicted probabilities on those data points instead? (See the sketch below for what I mean.)

Another thing I’m still not clear about: subjects are treated as a random effect with a separate intercept for each (there are multiple responses per subject). Do the predicted probabilities take the variance explained by these random effects into account?
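For question 4, here is roughly what I had in mind (a sketch only: the gender and country levels are made up, and re.form = NA is my guess at how to drop the subject-specific intercepts):

# hypothetical reference grid: vary age, hold the other predictors fixed
grid <- expand.grid(
  gender  = "female",   # made-up level
  country = "US",       # made-up level
  age     = seq(min(dta$age), max(dta$age), length.out = 50),
  AnotherContinousVar = median(dta$AnotherContinousVar)
)
# re.form = NA should drop the (1 | subjectID) intercepts, so the grid
# needs no subjectID column; transform = TRUE gives probabilities
p <- posterior_linpred(m1, newdata = grid, transform = TRUE, re.form = NA)
grid$probability <- colMeans(p)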

Re question 1: conditional on all other factors.

Re question 2: I think that is fine.

Re question 3: I would probably do

X <- model.matrix(m1)

and pull out the columns that are related to the cubic spline in question. Then do

betas <- as.matrix(m1)

and pull out the columns of coefficients that are related to the cubic spline in question. Matrix-multiply those two things together to get the posterior distribution of how the log-odds changes holding everything else constant, and then ggplot that however you want.
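For the m1 fit from the question, those steps might look something like this (a sketch; the "rcs(age)" column-name prefix and the 10%–90% interval are assumptions):

library(ggplot2)

X <- model.matrix(m1)
betas <- as.matrix(m1)
# spline basis columns are assumed to be prefixed with "rcs(age)"
keep_X <- startsWith(colnames(X), "rcs(age)")
keep_b <- startsWith(colnames(betas), "rcs(age)")
eta_age <- betas[ , keep_b] %*% t(X[ , keep_X])  # draws x observations, log-odds scale
df <- data.frame(
  age = dta$age,
  med = apply(eta_age, 2, median),
  lo  = apply(eta_age, 2, quantile, probs = 0.1),
  hi  = apply(eta_age, 2, quantile, probs = 0.9)
)
df <- df[order(df$age), ]  # sort so the ribbon draws cleanly
ggplot(df, aes(age, med)) +
  geom_ribbon(aes(ymin = lo, ymax = hi), alpha = 0.3) +
  geom_line() +
  ylab("log-odds contribution of rcs(age)")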

Re question 4: I don’t think so, if what you are really trying to show is the functional form of the spline.

For concreteness, here is an example with rstanarm::stan_lm:

library("rstanarm")
fit_rcs <- stan_lm(mpg ~ rms::rcs(wt) + qsec + am, data = mtcars,
                   prior = R2(0.75))
X <- model.matrix(fit_rcs)
# spline columns are prefixed with "rms" because the term is rms::rcs(wt)
X_rcs <- X[ , startsWith(colnames(X), "rms")]
betas <- as.matrix(fit_rcs)
betas_rcs <- betas[ , startsWith(colnames(betas), "rms")]
# internal helper; computes betas_rcs %*% t(X_rcs) -> draws x observations
eta_rcs <- rstanarm:::linear_predictor.matrix(betas_rcs, X_rcs)
colnames(eta_rcs) <- mtcars$wt  # label each column with its wt value
bayesplot::mcmc_intervals(eta_rcs) + ggplot2::coord_flip()

@jonah mcmc_intervals needs a better way to do continuous horizontal axes for situations like this.

Thanks Ben! This works perfectly.

How’s this?

library("bayesplot")
ribbon_plot <- ppc_ribbon(
  y = colMeans(eta_rcs), 
  yrep = eta_rcs, 
  x = mtcars$wt, 
  prob = 0.5, 
  prob_outer = 0.8
)
intervals_plot <- ppc_intervals(
  y = colMeans(eta_rcs), 
  yrep = eta_rcs, 
  x = mtcars$wt, 
  prob = 0.5, 
  prob_outer = 0.8
)

ribbon_plot + xlab("Weight") + legend_none()
intervals_plot + xlab("Weight") + legend_none()

That is better than what I wrote, but there should be some function that does this without (ab)using names like yrep when the thing is neither y nor a replication.

Definitely, this is just a short-term hack. Should we also add a pars argument to rstanarm::posterior_linpred() so that it can just return eta_rcs (in this case) directly?

I’m not a big fan of that option because this isn’t a linear predictor either. Maybe a generalization of plot_nonlinear that works beyond just stan_gamm4 output?