Location of samples behind the coefficients

sgutreuter · May 7, 2021, 10:53pm

After fitting a negative binomial model containing two factor predictors and their interaction, I want to compute posterior summaries of reparameterized coefficients which give the relative risks for each combination of the two factors.

My guess was that the posterior samples for the coefficients in the linear predictor are in obj$stanfit@sim$sample[i], where obj is the stanreg object produced by stan_glm, and i indexes the chains. Those contain vectors named alpha[1], beta[1], …, beta[5]. So I collected those betas from the four chains in a dataframe pchains, and expected to replicate coef(obj) using lapply(pchains, quantile, probs = 0.5). To my surprise, the results do not match. I hope someone can tell me what I am doing wrong.

Thx!

andrjohns · May 8, 2021, 2:13am

To extract the posterior samples for coefficients, you can use the as.matrix or as.data.frame functions.

For example:

mtcars$mpg10 <- mtcars$mpg / 10
fit <- stan_glm(mpg10 ~ wt + cyl + am, data = mtcars,refresh=0) 

# Inspect posterior means
> summary(fit)[,1]
  (Intercept)            wt           cyl            am         sigma      mean_PPD log-posterior 
   3.93703268   -0.31331126   -0.14999492    0.01788247    0.27119846    2.00920654   -8.48375309 

# Extract coefficient posteriors
obj = as.data.frame(fit,pars=c("(Intercept)","wt","cyl","am"))

# Check posterior means match
> colMeans(obj)
(Intercept)          wt         cyl          am 
 3.93703268 -0.31331126 -0.14999492  0.01788247

sgutreuter · May 8, 2021, 6:33pm

Thank you Andrew. That solves my problem.

However, I am still puzzled by the content of fit$stanfit@sim$samples, which produces different results. Try, for example,

mtcars$mpg10 <- mtcars$mpg / 10
fit <- stan_glm(mpg10 ~ wt + cyl + am, data = mtcars,refresh=0)
summary(fit)[,1]

obj = as.data.frame(fit,pars=c("(Intercept)","wt","cyl","am"))
colMeans(obj)

## Get alpha[1], beta[1], ..., beta[3] from fit$stanfit@sim$samples and
## summarize
wtf <- data.frame(fit$stanfit@sim$samples)[ , 1:4]
colMeans(wtf)
lapply(wtf, quantile,  probs =  0.5)

The differences are small, but small differences always pique my curiosity. stanreg objects are rich and complex, and I have not found a complete description of their contents.

Again, thanks for the help!

jonah · May 8, 2021, 10:56pm

Yeah, the parameters in the underlying stanfit object (which comes from the RStan interface) are related to but don’t exactly correspond to the named parameters in the stanreg object. This is for various reasons including transformations that are done to improve efficiency, avoid numerical issues, etc (e.g., rescaling, centering, and various other things). The particulars depend on the model being fit. We don’t recommend dealing directly with the stanfit object for this reason, we just need to include it in the stanreg object so that internal functions can access it. Anything you need after fitting the model you should be able to get from the stanreg object itself or one of the various methods provided in the package. If you find yourself needing to use the stanfit object then we’ve mistakenly omitted some functionality from rstanarm that we should include (definitely let us know if this happens). But I think in this case you just need what @andrjohns mentioned.

jonah · May 8, 2021, 11:06pm

Actually, what I said is true, but in the particular example you gave (the mtcars simple linear regression) we can just compare the samples directly like this:

# for example check that chain 1 samples for "(Intercept)" are the same
all.equal(
  fit$stanfit@sim$samples[[1]]$`alpha[1]`, 
  as.array(fit, pars = "(Intercept)")[,1,]
) 

# check that chain 4 samples for "cyl" are the same
all.equal(
  fit$stanfit@sim$samples[[4]]$`beta[2]`, 
  as.array(fit, pars = "cyl")[,4,]
)

The structure of stanfit@sim is quite complicated, so it’s not really intended to be used other than by internal functions.

Topic		Replies	Views
Help drawing from the posterior after standardizing predictors and outputs Modeling	2	462	April 4, 2018
Mismatch when predicting inside and outside of STAN Modeling posterior-predictive	4	807	May 20, 2020
Prediction using point estimates in rstan Modeling	13	1658	October 22, 2017
Posterior_predict() for a specific value of predictor in rstanarm? rstanarm	6	666	April 17, 2018
How to know the exact predictors rstanarm priors	1	450	September 25, 2020

Location of samples behind the coefficients

Related topics