How correlation is computed in rstanarm output

I have a question about how the correlation between varying slopes and intercepts in computed in the output that is displayed when you call print() on your fitted model object in rstanarm.

The posterior samples matrix displays a sample for each of the following parameters: intercept variance, slope variance, intercept-slope covariance.

It seems to me that the correct way to get an overall point estimate for the correlation is to compute the correlation for each posterior sample (e.g. compute covariance / (sqrt(intercept variance) * sqrt(slope variance) row-wise on the posterior samples matrix), then to take those 4,000 sampled correlations and compute their mean to get your point estimate.

However, I’ve tried doing it this way, and it does not match the output that rstanarm gives when you call print on your model object. Rather the value that rstanarm outputs is equal to what you’d get if you first found the mean of each of the three parameters across the 4,000 samples (i.e. computing a single point estimate for each of intercept variance, slope variance, and intercept-slope covariance), and then took those three point estimates and computed the correlation.

In other words, it seems to me that the correct way should be to compute the correlation on each posterior sample first and then summarize with the mean, but rstanarm appears to be doing it in the reverse order, first summarizing each parameter with it’s mean, and then computing the correlation.

Attached is a documented attempt to compute the correlations in these different ways. Correlations in rstanarm.pdf (98.6 KB)

  • Operating System: mac OS 10.5.3
  • rstanarm Version: 2.19.2
1 Like

I am going to put your pdf inline here so it’s easier for folks to look at.

Import the data (using the radon data from Gelman and Hill, 2007):

d_radon <- read_csv("radon.csv") %>%
select(county, floor, log.activity, uranium) %>%
mutate(log.uranium = log(uranium)) %>%
select(-uranium)

Fit the model and extract posterior samples:

m <- stan_glmer (log.activity ~ log.uranium * floor + (floor | county), data = d_rado
n)

m_samples <- as.data.frame(m)
View the rstanarm’s parameter estimates

print(m)

First, calculate the correlation for each posterior sample. Second, find the mean of the 4,000 correlations:

m_samples %>%
rename(intercept_variance = `Sigma[county:(Intercept),(Intercept)]`,
covariance = `Sigma[county:floor,(Intercept)]`,
slope_variance = `Sigma[county:floor,floor]`) %>%
mutate(intercept_sd = sqrt(intercept_variance),
slope_sd = sqrt(slope_variance),
r = covariance / (slope_sd * intercept_sd)) %>%
summarise(r_point_estimate = mean(r))

First, calculate the mean for each of the three parameters (covariance and the two variances) across all 4,000 samples.

m_samples %>%
rename(intercept_variance = `Sigma[county:(Intercept),(Intercept)]`,
covariance = `Sigma[county:floor,(Intercept)]`,
slope_variance = `Sigma[county:floor,floor]`) %>%
summarise_all(.funs = mean) %>%
mutate(intercept_sd = sqrt(intercept_variance),
slope_sd = sqrt(slope_variance),
r_point_estimate = covariance / (slope_sd * intercept_sd)) %>%
select(r_point_estimate)
1 Like