Prior for logistic regression when having knowledge on outcome

strengejacke · July 27, 2017, 8:19pm

Hi,
first, I wanted to thank all the contributors to Stan and rstan(arm) and related packages for this great work! I’m quite new to Bayesian analysis, but for my current research I’d like to apply Bayesian methods. I have a binary outcome (event of “falls” - yes/no - of people with dementia during their hospital stay) and various independent variables, and now I’m struggling with which priors I should use.

I’m using rstanarm to fit my model. If I know that, in general, the prevalence of falls among dementia patients in hospitals is about 30%, how do I use this knowledge to define my priors? So the best prior knowledge I have is about my outcome variable. My guess is that I would use the default (normal) priors for the independent variables, but specify a different prior for the intercept in the stan_glm() call… Is this correct? And if so, how does a “30% prevalence rate” translate into a prior distribution for my logistic regression model?

Best
Daniel

bgoodri · July 28, 2017, 1:07am

Presuming you know that from past data rather than from the y you have now, you can use the fact that rstanarm runs with the covariates centered. So, you can interpret the intercept as the log-odds of the outcome conditional on all covariates being at their means, which is not the same thing as, but often not so different from, the unconditional log-odds of the outcome. So, you can set prior_intercept accordingly, perhaps with prior_intercept = normal(location = qlogis(0.3), scale = 0.5, autoscale = FALSE).
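As a rough illustration of this suggestion, the R sketch below translates the 30% prevalence into a location on the log-odds scale and checks which probabilities a normal prior with scale 0.5 treats as plausible. The simulated data frame `falls_data`, its columns `fall`, `age` and `female`, and the sampler settings are made up for the example; only the `prior_intercept` specification comes from the post.

```r
# Location of the intercept prior: 30% prevalence on the log-odds scale
prior_loc   <- qlogis(0.3)   # log(0.3 / 0.7), about -0.85
prior_scale <- 0.5

# Central 95% prior interval for the intercept, mapped back to probabilities
logit_interval <- prior_loc + qnorm(c(0.025, 0.975)) * prior_scale
prob_interval  <- plogis(logit_interval)   # roughly 0.14 to 0.53
print(round(prob_interval, 2))

# Hypothetical data, simulated only so that the sketch runs end to end
set.seed(1)
n <- 200
falls_data <- data.frame(age = rnorm(n, 80, 6), female = rbinom(n, 1, 0.6))
falls_data$fall <- rbinom(n, 1, plogis(qlogis(0.3) + 0.05 * (falls_data$age - 80)))

# Fit only if rstanarm is installed; covariate priors are left at their defaults
if (requireNamespace("rstanarm", quietly = TRUE)) {
  fit <- rstanarm::stan_glm(
    fall ~ age + female,
    data = falls_data,
    family = binomial(link = "logit"),
    prior_intercept = rstanarm::normal(location = prior_loc,
                                       scale = prior_scale,
                                       autoscale = FALSE),
    chains = 2, iter = 1000, refresh = 0, seed = 1
  )
  print(fit)
}
```

Looking at the implied probability interval is a quick way to judge whether scale = 0.5 is too tight or too loose for the prior knowledge at hand; a larger scale widens the range of prevalences the prior considers plausible.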

strengejacke · July 28, 2017, 6:42am

Thanks! The prior knowledge comes from previous research and literature reviews; it’s not based on the data I collected. But now I think I’m much clearer about the choice of prior distributions for logistic regressions, and that I have to “think” on the log-odds scale rather than in probabilities or odds ratios.

Two follow-up questions:

You said “runs with the covariates centered” - is centering automatically done by rstanarm or would I need to center my covariates before fitting the model?
If I also have prior knowledge for certain covariates, would I use rstan instead of rstanarm, to define my model in a way that lets me specify a prior distribution for each term separately?

bgoodri · July 28, 2017, 2:45pm

Centering is done internally by rstanarm, and in the output the intercept is shifted back so that it corresponds to the expected log-odds of the outcome when the original covariates are zero. You can specify a vector of locations and scales (and degrees of freedom or other hyperparameters) for the priors in rstanarm to convey different priors within the same parametric family. In the brm function in the brms R package, you can specify different prior families, and of course you can do that (or even use multivariate priors) if you use RStan directly.
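The shift between the centered and the reported intercept is an algebraic identity, so it can be checked with plain `glm` on simulated data. The sketch below does that and then, if rstanarm is installed, passes per-coefficient locations and scales to `normal()`. All variable names are hypothetical and the numbers in the prior vectors are placeholders, not recommendations.

```r
set.seed(2)
n  <- 500
x1 <- rnorm(n, 80, 6)
x2 <- rbinom(n, 1, 0.6)
y  <- rbinom(n, 1, plogis(-0.8 + 0.05 * (x1 - 80) + 0.3 * x2))

# Same model, once with raw and once with mean-centered covariates
fit_raw      <- glm(y ~ x1 + x2, family = binomial)
fit_centered <- glm(y ~ I(x1 - mean(x1)) + I(x2 - mean(x2)), family = binomial)

b_raw <- coef(fit_raw)
b_cen <- coef(fit_centered)

# Intercept at covariates = 0 is the centered intercept minus sum(beta * mean(x))
shifted_back <- unname(b_cen[1] - sum(b_cen[-1] * c(mean(x1), mean(x2))))
print(c(raw = unname(b_raw[1]), shifted_back = shifted_back))

# One location and scale per coefficient, in the order the terms appear in the formula
if (requireNamespace("rstanarm", quietly = TRUE)) {
  d <- data.frame(y = y, x1 = x1, x2 = x2)
  fit <- rstanarm::stan_glm(
    y ~ x1 + x2, data = d, family = binomial(link = "logit"),
    prior = rstanarm::normal(location = c(0.05, 0), scale = c(0.05, 1),
                             autoscale = FALSE),
    prior_intercept = rstanarm::normal(location = qlogis(0.3), scale = 0.5,
                                       autoscale = FALSE),
    chains = 2, iter = 1000, refresh = 0, seed = 2
  )
  print(rstanarm::prior_summary(fit))
}
```

Because the prior on the intercept applies to the centered parameterization, the reported intercept can sit far from qlogis(0.3) when the covariate means are far from zero, as with an age variable centered around 80.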

strengejacke · July 28, 2017, 3:23pm

Thanks a lot, that really helped! For my studies, the results from stan_glm and glm do not differ much; however, I feel more comfortable with the Bayesian way of inference than with the frequentist one.
