Understanding output from 1.13 Multivariate priors for hierarchical models example

I am trying to make sure I understand the outputs for the model presented in 1.13. I have created the model in an R package (Files · 2.0 · UMESC / quant-ecology / fishStan · GitLab) that I mentioned in a previous post fishStan v 2.0 - Publicity - The Stan Forums (mc-stan.org). To make sure I understand the outputs, I decided to compare the model to the radon model from Chapter 13 of Gelman and Hill (2006).

I have included the model outputs and also a comparison to lme4::lmer like in the book. As a side note, lmer appears to have changed slightly and now gives slightly different outputs compared to the book’s model fits.

My questions are about how the outputs compare, and specifically, the correlations between coefficients. Also, I apologize for using “fixed-” and “random-effects”, but even Gelman used to use those terms in the old book!

Load data and packages as well as fit the models:

## load and format
library(tidyverse)
library(fishStan)
library(lme4)
data(radon, package = "rstanarm")
options(mc.cores = parallel::detectCores())

radon_2 <-
  radon %>%
  as_tibble() %>%
  mutate(jj = as.numeric(county))

## fit the lmer model
M3 <- lmer(log_radon  ~ floor + (1 + floor | jj), radon_2) 
summary(M3)

## fit the fishStan model
radon_county <-
  radon_2 %>%
  group_by(county, jj) %>%
  summarize(mean = mean(log_radon),
            n = n(),
            floor = mean(floor), .groups = "drop")

## Takes ~0.5 minutes to run on a Dell 2021 laptop
radom_fit_2 <-
  h_lm_stan(
    y = radon_2 %>% pull(log_radon),
    jj = radon_2 %>% pull(jj),
    ind_dat = radon,
    ind_formula = ~ 1 + floor,
    group_dat = radon_county,
    group_formula = ~ 1,
    n_proj = 30,
    x_projection = NULL,
    gamma_prior = NULL,
    gamma_sd_prior = 1,
    iter = 2000
    )

Compare results

fixed-effects

The fixed-effects are correct.

> ## "fixed-effects"
> fixef(M3)
(Intercept)       floor 
   1.494522   -0.654808 
> print(radom_fit_2$model_par, pars = c("gamma"), probs = 0.5)
Inference for Stan model: h_lm.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

            mean se_mean   sd   50% n_eff Rhat
gamma[1,1]  1.49       0 0.05  1.49  1631    1
gamma[1,2] -0.65       0 0.08 -0.65  2629    1

Samples were drawn using NUTS(diag_e) at Wed Oct 13 13:24:22 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Random-effects

The random-effects also seem to make sense.

> print(radom_fit_2$model_par, pars = c("beta[1,1]", "beta[1,2]",
+                                       "beta[2,1]", "beta[2,2]"), probs = 0.5)
Inference for Stan model: h_lm.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

           mean se_mean   sd   50% n_eff Rhat
beta[1,1]  1.19       0 0.26  1.20  4621    1
beta[1,2] -0.55       0 0.30 -0.59  3681    1
beta[2,1]  0.99       0 0.10  0.99  6091    1
beta[2,2] -0.71       0 0.25 -0.69  4802    1

Samples were drawn using NUTS(diag_e) at Wed Oct 13 13:24:22 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
> head(coef(M3)$jj, 2)
  (Intercept)      floor
1   1.1817875 -0.5122226
2   0.9840312 -0.6957749

model “sigma”

This also seems to make sense.

> print(radom_fit_2$model_par, pars = c("sigma"), prob = 0.5)
Inference for Stan model: h_lm.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

      mean se_mean   sd  50% n_eff Rhat
sigma 0.72       0 0.02 0.72  3422    1

Samples were drawn using NUTS(diag_e) at Wed Oct 13 13:24:22 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
> sigma(M3)
[1] 0.716817

Variance/covariance structure

I am confused here. What outputs from Stan compare to the correlation between regression coefficients?

Is this the L_Omega and tau?

Should I be grabbing something else?

Specifically, what from the stan model is the Corr of -0.371?

> print(radom_fit_2$model_par, pars = c("L_Omega", "tau"), prob = 0.5)
Inference for Stan model: h_lm.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

              mean se_mean   sd   50% n_eff Rhat
L_Omega[1,1]  1.00     NaN 0.00  1.00   NaN  NaN
L_Omega[1,2]  0.00     NaN 0.00  0.00   NaN  NaN
L_Omega[2,1] -0.22    0.01 0.31 -0.25  2807 1.00
L_Omega[2,2]  0.92    0.00 0.10  0.96  2034 1.00
tau[1]        0.34    0.00 0.05  0.33  1353 1.00
tau[2]        0.27    0.01 0.14  0.28   608 1.01

Samples were drawn using NUTS(diag_e) at Wed Oct 13 13:24:22 2021.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).
> VarCorr(M3)
 Groups   Name        Std.Dev. Corr  
 jj       (Intercept) 0.33709        
          floor       0.32848  -0.371
 Residual             0.71682    
1 Like

I have figured out the answer to my own question.

The end of 10.3 Fitting a Gaussian process | Stan User’s Guide (mc-stan.org) includes code that shows the relationship between Omega and L_Omega:

generated quantities {
  matrix[D, D] Omega;
  Omega = L_Omega * L_Omega';
}

I will edit my answer to include the numerical results once my computer is free.

Would the Stan team please consider adding in an explanation of back transformation to section 1.13 for people who are not familiar with Cholesky factors.

2 Likes

Hey Richard! You could also use the multiply_lower_tri_self_transpose function in this case as described in 25 Correlation Matrix Distributions | Stan Functions Reference

corr_matrix[D] Omega = multiply_lower_tri_self_transpose(L_Omega);