How correlation is computed in rstanarm output

dbernett · February 24, 2020, 5:22pm

I have a question about how the correlation between varying slopes and intercepts in computed in the output that is displayed when you call print() on your fitted model object in rstanarm.

The posterior samples matrix displays a sample for each of the following parameters: intercept variance, slope variance, intercept-slope covariance.

It seems to me that the correct way to get an overall point estimate for the correlation is to compute the correlation for each posterior sample (e.g. compute covariance / (sqrt(intercept variance) * sqrt(slope variance) row-wise on the posterior samples matrix), then to take those 4,000 sampled correlations and compute their mean to get your point estimate.

However, I’ve tried doing it this way, and it does not match the output that rstanarm gives when you call print on your model object. Rather the value that rstanarm outputs is equal to what you’d get if you first found the mean of each of the three parameters across the 4,000 samples (i.e. computing a single point estimate for each of intercept variance, slope variance, and intercept-slope covariance), and then took those three point estimates and computed the correlation.

In other words, it seems to me that the correct way should be to compute the correlation on each posterior sample first and then summarize with the mean, but rstanarm appears to be doing it in the reverse order, first summarizing each parameter with it’s mean, and then computing the correlation.

Attached is a documented attempt to compute the correlations in these different ways. Correlations in rstanarm.pdf (98.6 KB)

Operating System: mac OS 10.5.3
rstanarm Version: 2.19.2

Ara_Winter · February 24, 2020, 7:05pm

I am going to put your pdf inline here so it’s easier for folks to look at.

Import the data (using the radon data from Gelman and Hill, 2007):

d_radon <- read_csv("radon.csv") %>%
select(county, floor, log.activity, uranium) %>%
mutate(log.uranium = log(uranium)) %>%
select(-uranium)

Fit the model and extract posterior samples:

m <- stan_glmer (log.activity ~ log.uranium * floor + (floor | county), data = d_rado
n)

m_samples <- as.data.frame(m)
View the rstanarm’s parameter estimates

print(m)

First, calculate the correlation for each posterior sample. Second, find the mean of the 4,000 correlations:

m_samples %>%
rename(intercept_variance = `Sigma[county:(Intercept),(Intercept)]`,
covariance = `Sigma[county:floor,(Intercept)]`,
slope_variance = `Sigma[county:floor,floor]`) %>%
mutate(intercept_sd = sqrt(intercept_variance),
slope_sd = sqrt(slope_variance),
r = covariance / (slope_sd * intercept_sd)) %>%
summarise(r_point_estimate = mean(r))

First, calculate the mean for each of the three parameters (covariance and the two variances) across all 4,000 samples.

m_samples %>%
rename(intercept_variance = `Sigma[county:(Intercept),(Intercept)]`,
covariance = `Sigma[county:floor,(Intercept)]`,
slope_variance = `Sigma[county:floor,floor]`) %>%
summarise_all(.funs = mean) %>%
mutate(intercept_sd = sqrt(intercept_variance),
slope_sd = sqrt(slope_variance),
r_point_estimate = covariance / (slope_sd * intercept_sd)) %>%
select(r_point_estimate)

Topic		Replies	Views
Rstanarm: extracting variance components rstanarm	14	4688	May 18, 2021
Question about hierarchical effects rstanarm	2	773	February 15, 2018
Intercept only model rstanarm rstan	7	1656	August 26, 2020
Clarifying how to calculate predicted response values from fit model rstanarm specification , posterior-predictive	4	196	June 24, 2024
What are the rstanarm::stan_glmer Error Term standard deviations reporting? Modeling rstanarm	1	484	March 16, 2021

How correlation is computed in rstanarm output

Related topics