How to obtain P value from stan_glm() in rstanarm


#1

Hi!
I used the data “birthwt” in MASS package to constructed a model with stan_lm function in rstanarm package. I can get coefficients and Bayesian uncertainty intervals using coef() and posterior_interval() functions. In the models constructed with lm(), we can get P values from the summary() function. However, I can’t find any function to get P values for the variables in the model with stan_lm.

My codes are as follows:
data(“birthwt”, package = “MASS”)
birthwt$race <- factor(birthwt$race, levels = 1:3,
labels = c(“white”, “black”, “other”))
birthwt$bwt <- birthwt$bwt / 1000
birthwt$low <- factor(birthwt$low, levels = 0:1, labels = c(“no”, “yes”))
post1 <- stan_lm(-bwt ~ smoke + age + race + ptl + ht + ftv,
data = birthwt, prior = R2(0.5))
summary(post1)
post1$coefficients
posterior_interval(post1, prob = 0.95)


#2

That is intentional. With a Bayesian analysis, you cannot get the probability of seeing a more extreme estimate in a different dataset from the same population conditional on the parameter taking a given value. But that is not of interest anyway. What probability are you interested in computing?


#3

Hi bgoodri !
I want to investigate the association between the variable “bwt” and “smoke”," age" and " race". When I used lm() to constructed linear regression model, I can get the P Value, Odds Ratio and 95% confidential interval for “smoke”," age" and " race". So I can estimate their associations with “bwt” .

The code is :

post <- lm( -bwt ~ smoke + age + race + ptl + ht + ftv,birthwt)
summary(post)

Call:
lm(formula = -bwt ~ smoke + age + race + ptl + ht + ftv, data = birthwt)

Residuals:
Min 1Q Median 3Q Max
-1.55499 -0.51185 0.00166 0.47458 2.36292

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.276581 0.259169 -12.643 < 2e-16 ***
smoke 0.384130 0.111736 3.438 0.000728 ***
age -0.003435 0.009946 -0.345 0.730235
raceblack 0.416305 0.155529 2.677 0.008118 **
raceother 0.424521 0.118937 3.569 0.000458 ***
ptl 0.164074 0.104035 1.577 0.116518
ht 0.387411 0.205559 1.885 0.061078 .
ftv -0.003865 0.048631 -0.079 0.936745

exp(coef(post))
(Intercept) smoke age raceblack raceother ptl ht ftv
0.03775714 1.46833705 0.99657107 1.51634783 1.52885859 1.17830156 1.47316174 0.99614264
exp(confint(post))
2.5 % 97.5 %
(Intercept) 0.02264172 0.0629635
smoke 1.17781151 1.8305252
age 0.97720397 1.0163220
raceblack 1.11563364 2.0609909
raceother 1.20905559 1.9332515
ptl 0.95963359 1.4467965
ht 0.98197443 2.2100428
ftv 0.90499920 1.0964652

I can get Odds Ratio and 95% confidential interval with stan_lm() model. But I can’t get P value.


#4

I think the closest thing would be posterior_interval(post), but that is not a p-value or confidence interval. In fact, it is more useful since it is an interval that contains the the true parameter (under the model) with the probability that you specify. But it is not wise to construct a 95% or 99% credible interval because the endpoints are not estimated with very much precision with the default number of iterations.


#5

Hi bgoodri !
Thank you very much for your reply! Part of my problems are solved. Thanks!

Best,
Lisa