rstanarm - for my own learning, trying to match credible interval to lm() confidence interval

Hi all,

I’m a relative beginner with regard to Bayesian inference. I’ve worked my way through Statistical Rethinking (a fantastic book), and I see it’s been translated into brms, which is great, but I’m really enjoying using rstanarm and am learning to use it as my next step. I like to tinker with things to see how they work, and rstanarm makes that so easy because it’s so fast. I can change things and re-run code in seconds. No compiling needed! Seems like rstanarm doesn’t get as much love as it should, but I’m certainly digging it.

One of the things I’m trying to do is figure out how to set everything up so that I can create credible intervals that coincide with confidence intervals. This is strictly as a learning tool, since I’m a firm believer in using informative priors if possible, and regularizing priors if not. My understanding is that getting Cred-I and Conf-I to coincide is possible for a Gaussian linear model using flat priors. Below is my first attempt. I’ve spent quite a few hours going down paths and learning a lot, but I’m stuck at this point. Any help would be appreciated.

The fundamental question is, why does the credible interval not match the confidence interval from lm()? Many more details within the code as comments.

# First, create the simple 5-point dataset with mean 0 and sd of 2

outcome <- c(-2,-2,0,2,2)
dat <- data.frame(outcome = outcome)

# lm() --------------------------------------------------------------------

m.lm <- lm(outcome ~ 1, data=dat)
confint(m.lm, level = 0.95)

qt(p=0.025, df=4)
qt(p=0.025, df=4) * 2/sqrt(5)

# The SEM is 0.8944 for the lm() analysis. Matches my manual calculation of 0.8944272.
# The 95% confidence interval for the mean is -2.483328 2.483328

# rstanarm ----------------------------------------------------------------

# Ok, let's try this with stan_glm and flat priors


m.stan_glm <- stan_glm(
  formula = outcome ~ 1 ,
  family = gaussian ,
  prior_intercept = NULL ,
  prior_aux = NULL ,
  chains = 4 ,
  cores = 4 ,
  iter = 2e5 ,
  seed = 42 ,
  data = dat
)

print(m.stan_glm$stanfit, digits=8)

# sd of the intercept is 1.772181, far greater than 0.8944 from lm()
# The 95% interval is also far wider than for lm()
# -3.297692 to  +3.303978
# -2.483328 to  +2.483328 for lm()
# Is this because of sigma not being a point (without uncertainty)?  There's no
# uncertainty around sigma in the lm() analysis, right?  I'll try to fix that.

# What if I use a super-tight prior at 2 for sigma? As far as I can tell from my experimentation,
# the prior of a normal for sigma cuts off the left side at the mean, even if it's in positive
# territory. It's forced to be a half-normal, so I'll start it 1 standard deviation (sd=1e-6) below 2.

m.stan_glm2 <- stan_glm(
  formula = outcome ~ 1 ,
  family = gaussian ,
  prior_intercept = NULL ,
  prior_aux = normal(2-1e-6, 1e-6) ,
  chains = 4 ,
  cores = 4 ,
  iter = 2e5 ,
  seed = 42 ,
  data = dat
)

print(m.stan_glm2$stanfit, digits=8)

# sigma is now 2, and the Intercept sd is 0.892 (close to 0.8944)
# However, the 95% interval is far _narrower_ than the lm() interval
# -1.747336 to  +1.755980
# -2.483328 to  +2.483328 for lm()

posterior <- as.array(m.stan_glm2)
posterior_vector_intercept <- as.numeric(posterior[,1:4,1])
posterior_vector_sigma <- as.numeric(posterior[,1:4,2])
( posterior_intercept <- mean(posterior_vector_intercept) )
( posterior_sigma <- mean(posterior_vector_sigma) )
( posterior_intercept_sd <- sd(posterior_vector_intercept) )

xlim_min <- -5
xlim_max <- 5
hist(posterior_vector_intercept, breaks=200, freq=FALSE, xlim=c(xlim_min,xlim_max))
curve(dnorm(x, mean=posterior_intercept, sd=posterior_intercept_sd),
      from=xlim_min, to=xlim_max, col="red", add=TRUE)

# The histogram and the curve are on top of each other. So, the histogram is Gaussian.
# If the posterior_intercept_sd is 0.892, which is close to the lm() SEM of 0.8944,
# why are the uncertainty limits so different?
# -1.747    to  +1.756
# -2.483328 to  +2.483328 for lm()

min(posterior_vector_intercept)
#[1] -4.004504
max(posterior_vector_intercept)
#[1] 4.113033
# No severe outliers to mess with things.

# eof
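As an aside on where those two interval widths might come from arithmetically, here's a sketch that just redoes the quantile math, assuming the fixed-sigma posterior for the mean is Gaussian with sd 2/sqrt(5):

```r
# Standard error of the mean with sigma pinned at 2 and n = 5
sem <- 2 / sqrt(5)                   # 0.8944272

# Gaussian 95% interval -- what a fixed-sigma posterior implies
qnorm(c(0.025, 0.975)) * sem         # approx -1.753  1.753

# t-based 95% interval with 4 df -- what lm()/confint() uses
qt(c(0.025, 0.975), df = 4) * sem    # -2.483328  2.483328
```

The first pair is close to the -1.747 / +1.756 interval from m.stan_glm2 above, and the second matches confint(m.lm).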

Operating system: Windows 10
R version 4.0.2
rstanarm version 2.21.1
rstan version 2.21.2 (not used when using rstanarm?)

I suspect there are a couple of problems here:

  1. c(-2,-2,0,2,2) isn’t a very likely sample from N(0, 2). There’s much more weight in the tails than you’d expect. lm() just does a least squares fit and doesn’t worry about that.

  2. With only 5 data points the posterior distribution for a variance (or standard deviation) parameter will be heavily influenced by the prior.

See what happens if you set outcome like this:

outcome <- rnorm(1000, sd = 2)

I get

(-0.019, 0.108) from lm()
(-0.018, 0.110) from stan_glm()
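For reference, a sketch of the comparison (the exact numbers depend on the RNG state; the stan_glm() call is the same flat-prior one from the original post, shown commented out since it takes a while to run):

```r
set.seed(1)  # any seed; the point is only that the two intervals nearly agree
outcome <- rnorm(1000, sd = 2)
dat <- data.frame(outcome = outcome)

m.lm <- lm(outcome ~ 1, data = dat)
confint(m.lm, level = 0.95)

# m.stan_glm <- stan_glm(outcome ~ 1, family = gaussian,
#                        prior_intercept = NULL, prior_aux = NULL,
#                        seed = 42, data = dat)
# posterior_interval(m.stan_glm, prob = 0.95)
```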


Thanks for the reply. Been a while, but I’m reconsidering things again.

Right, I get that if I increase the sample size the differences wash out, but I’m more interested in figuring out where differences exist in method, which is brought to the foreground by using very small datasets. Using large datasets defeats the purpose of my learning exercise.

And, for example when you say “With only 5 data points the posterior distribution for a variance (or standard deviation) parameter will be heavily influenced by the prior.” - that’s why I was using flat priors, to try and eliminate any influence from priors. In fact, that’s the reason why my results were surprising…I expected to see the same results, but didn’t.

I think I’m beginning to have more insight, but I'm still not sure why the above is the case. Perhaps I just have more questions than previously.

Basically, how would I have to set up the stan_glm call to actually have the posterior_interval results match confint?

Or is it impossible? Should I be trying to match confint.default instead? Is the problem that confidence intervals obtained by what is essentially stan_glm’s approach to maximum likelihood won’t match those from lm(), because of this note in the confint documentation:

For objects of class “lm” the direct formulae based on t values are used.

Those are the questions still on my mind.
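To partly answer my own question about confint vs confint.default, they can be compared directly on the 5-point model (confint.lm uses t quantiles; confint.default uses Normal quantiles on the same standard error):

```r
outcome <- c(-2, -2, 0, 2, 2)
m.lm <- lm(outcome ~ 1)

confint(m.lm)          # t-based:      -2.483328  2.483328
confint.default(m.lm)  # Normal-based: approx -1.753  1.753

# confint.default() is just coef(m.lm) + c(-1, 1) * qnorm(0.975) * SE,
# which is approximately the interval the fixed-sigma stan_glm fit gave.
```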

outcome <- c(-2,-2,0,2,2)

That’s a very unlikely outcome for a normal distribution with a variance of 4 (standard deviation of 2). Here’s one example:

sort(rnorm(5, mean = 0, sd = 2))
[1] -2.3408569 -1.5308027 -0.5612416  0.5482218  0.5976292

stan_glm() assumes that the error is Gaussian.

Moreover, a flat prior is still a prior. And unless I’m mistaken, for a variance parameter, the flat prior will be on the unconstrained space for a standard deviation. @bgoodri can probably help more on that point than I can.


That one is flat over the non-negative real numbers.
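Given that, the marginal posterior of the intercept under flat priors on both the intercept and sigma can be worked out by integrating sigma out: it is a scaled t with n - 2 = 3 degrees of freedom. A sketch of that arithmetic (my algebra, so treat it as a check rather than gospel):

```r
# Flat priors on mu and on sigma (sigma > 0): integrating sigma out of
# sigma^(-n) * exp(-S(mu) / (2 sigma^2)) leaves a t with n - 2 df,
# scaled by (s / sqrt(n)) * sqrt((n - 1) / (n - 2)).
n <- 5; s <- 2
scale <- (s / sqrt(n)) * sqrt((n - 1) / (n - 2))
qt(c(0.025, 0.975), df = n - 2) * scale
# approx -3.287  3.287 -- close to the -3.298 / +3.304 from the first stan_glm fit
```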

All that said, to get a posterior distribution (conditional on the data) of the coefficients that is multivariate Student t and thus identical to the estimated sampling distribution of the OLS estimator (across datasets with a fixed \mathbf{X}), you have to put a Jeffreys prior on the precision. The rstanarm package does not support that prior (which does not make a lot of sense anyway) and doesn’t need to because you can get this result analytically.
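Concretely, a sketch of that analytic result: with a flat prior on the intercept and a Jeffreys prior on the precision, the marginal posterior of the intercept is xbar + (s / sqrt(n)) times a t with n - 1 df, so the 95% credible interval reproduces confint():

```r
outcome <- c(-2, -2, 0, 2, 2)
n <- length(outcome); xbar <- mean(outcome); s <- sd(outcome)

# 95% credible interval under the Jeffreys-on-precision analysis
xbar + qt(c(0.025, 0.975), df = n - 1) * s / sqrt(n)
# -2.483328  2.483328 -- identical to confint(lm(outcome ~ 1))
```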


Ah, excellent. I went back and re-watched your Linear models with rstanarm (GR5065 2019-02-26) video, where you also cover this (though I didn’t comprehend before), and I think I finally get it. Sometimes I just need multiple explanations before I understand something. Thanks.