I have been working on a fairly complicated hierarchical model in Stan, and everything was going fine until I introduced a random slope. At that point, I started observing Rhat > 1.01 and low effective sample sizes.
In an attempt to figure out what was going on, I started reducing the complexity of my simulation and the corresponding model. However, I’ve now wound up all the way back at a basic random intercept and slope simulation, which I’m trying to fit in rstanarm, and I’m still observing the same issues.
Below, I’ve included my simulation code, code for running it, code for fitting the corresponding lme4 and rstanarm models, and my results. This file hlm_sim.R (2.3 KB) contains all of the code.
I am hoping someone might be able to provide some insight into why I’m observing less than optimal sampling and mixing.
For observation y_{tj}, where t \in \{1, \dots, T=50\} indexes observations within region j \in \{1, \dots, J=50\}, the DGP is:
y_{tj} \sim N(\mu_{tj}, \sigma_e)
\mu_{tj} = \beta_{0j} + x_{tj}\,\beta_{1j}
\beta_{0j} = \gamma_{00} + u_{0j}
\beta_{1j} = \gamma_{10} + u_{1j}
\left( \begin{array}{c} u_{0j} \\ u_{1j} \end{array} \right) \sim N\left( \left( \begin{array}{c} 0 \\ 0 \end{array} \right), \left( \begin{array}{cc} \sigma_{u0j}^2 & \rho\sigma_{u0j}\sigma_{u1j} \\ \rho\sigma_{u0j}\sigma_{u1j} & \sigma_{u1j}^2 \end{array} \right) \right)
where \gamma_{00} = .5, \gamma_{10} = 1, \sigma_{u0j} = .2, \sigma_{u1j} = .2, \rho = .3, and \sigma_e = .2.
Finally, x_{tj} \sim N(0, 1).
This is the function I am using for the simulation:
library(dplyr)

simulate_panel <- function(J, T,
                           gamma_00, gamma_10,
                           sigma_u0j,
                           sigma_u1j,
                           cor_u0j_u1j,
                           sigma_e,
                           seed) {
  set.seed(seed)
  # Calculate the random-effect covariance from the SDs and correlation
  cov_u0j_u1j <- cor_u0j_u1j * sigma_u0j * sigma_u1j
  # Generate Sigma
  Sigma <- matrix(c(sigma_u0j^2, cov_u0j_u1j,
                    cov_u0j_u1j, sigma_u1j^2), ncol = 2)
  # Draw random effects from the multivariate normal distribution
  u <- MASS::mvrnorm(n = J, mu = c(0, 0), Sigma = Sigma)
  simulated_panel <- tibble(region = 1:J,  # constant region values
                            gamma_00,
                            gamma_10,
                            u_0j = u[, 1],
                            u_1j = u[, 2],
                            sigma_e) %>%
    slice(rep(1:n(), each = T)) %>%  # replicate each region's row T times
    mutate(x = rnorm(n(), 0, 1),  # sample x
           mu = gamma_00 + u_0j + x * (gamma_10 + u_1j),  # calculate linear predictor
           e = rnorm(n(), 0, sigma_e),  # draw residual
           y = mu + e)  # calculate y
  return(simulated_panel)
}
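As a sanity check on the covariance construction (this isn’t part of the original script), the empirical moments of a large mvrnorm draw should match the DGP inputs:

# With many draws, the empirical moments should match the inputs
Sigma_check <- matrix(c(.2^2, .3 * .2 * .2,
                        .3 * .2 * .2, .2^2), ncol = 2)
u_check <- MASS::mvrnorm(n = 1e5, mu = c(0, 0), Sigma = Sigma_check)
apply(u_check, 2, sd)  # both SDs should be close to .2
cor(u_check)[1, 2]     # correlation should be close to .3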
And, this generates a simulation from the DGP:
simulated_panel <- simulate_panel(J = 50,
                                  T = 50,
                                  gamma_00 = .5,
                                  gamma_10 = 1,
                                  sigma_u0j = .2,
                                  sigma_u1j = .2,
                                  cor_u0j_u1j = .3,
                                  sigma_e = .2,
                                  seed = 123)
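As a quick check on the simulated data (again, not part of the original script), per-region OLS intercepts and slopes should have means near \gamma_{00} = .5 and \gamma_{10} = 1, and SDs roughly near .2 (slightly inflated by estimation noise):

simulated_panel %>%
  group_by(region) %>%
  summarize(intercept = coef(lm(y ~ x))[1],  # per-region OLS intercept
            slope = coef(lm(y ~ x))[2]) %>%  # per-region OLS slope
  summarize(mean_intercept = mean(intercept), sd_intercept = sd(intercept),
            mean_slope = mean(slope), sd_slope = sd(slope))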
Using lme4, I can recover the parameters specified in the DGP:
library(lme4)

lmer.m <- lmer(y ~ 1 + x + (1 + x | region), data = simulated_panel)
summary(lmer.m)
Random effects:
Groups Name Variance Std.Dev. Corr
region (Intercept) 0.03379 0.1838
x 0.03272 0.1809 0.34
Residual 0.04016 0.2004
Number of obs: 2500, groups: region, 50
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.48580 0.02631 18.46
x 1.01698 0.02592 39.24
However, when I fit this model using rstanarm, ESS is low and Rhat is high.
library(rstanarm)

stan_lmer.m <- stan_lmer(y ~ 1 + x + (1 + x | region),
                         data = simulated_panel, chains = 4, cores = 4)
summary(stan_lmer.m,
        pars = c('(Intercept)', 'x', 'sigma',
                 'Sigma[region:(Intercept),(Intercept)]',
                 'Sigma[region:x,(Intercept)]', 'Sigma[region:x,x]'),
        digits = 2)
Estimates:
mean sd 2.5% 97.5%
(Intercept) 0.48 0.03 0.43 0.54
x 1.02 0.02 0.97 1.06
sigma 0.20 0.00 0.20 0.21
Sigma[region:(Intercept),(Intercept)] 0.04 0.01 0.02 0.05
Sigma[region:x,(Intercept)] 0.01 0.01 0.00 0.02
Sigma[region:x,x] 0.03 0.01 0.02 0.05
Diagnostics:
mcse Rhat n_eff
(Intercept) 0.00 1.01 115
x 0.00 1.02 144
sigma 0.00 1.00 3157
Sigma[region:(Intercept),(Intercept)] 0.00 1.01 333
Sigma[region:x,(Intercept)] 0.00 1.01 272
Sigma[region:x,x] 0.00 1.01 356
I know that I can crank up the number of iterations and increase warmup; however, I have fit much more complicated models than this without observing these kinds of issues. Of course, if that’s the only solution, I can live with it. But my full model is more complex, and I am concerned about how many iterations I’ll need to estimate it.
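For what it’s worth, the only other knob I know of besides iterations is adapt_delta; a sketch of what I could try (the 0.99 value is illustrative, not something I’ve tuned):

# Same model with a higher adapt_delta target (illustrative value)
stan_lmer.m2 <- stan_lmer(y ~ 1 + x + (1 + x | region),
                          data = simulated_panel,
                          chains = 4, cores = 4,
                          adapt_delta = 0.99)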
I initially assumed that something was wrong with my simulation (and still haven’t ruled that out); however, I’ve been beating my head against this for a while and haven’t found a bug. It’s also worth noting that the model recovers the parameters from the DGP; it’s just that the sampling and mixing seem to be less than optimal.
I should also note that I did recently update rstan and rstanarm, so maybe I’m observing the consequences of changes to how Rhat and n_eff are calculated?
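For reference, the versions I’m running can be reported with:

packageVersion("rstan")
packageVersion("rstanarm")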
UPDATE:
I just ran the same rstanarm model for 10,000 iterations and observed the following diagnostics:
Diagnostics:
mcse Rhat n_eff
(Intercept) 0.00 1.01 645
x 0.00 1.01 857
sigma 0.00 1.00 15314
Sigma[region:(Intercept),(Intercept)] 0.00 1.00 1408
Sigma[region:x,(Intercept)] 0.00 1.01 1177
Sigma[region:x,x] 0.00 1.00 1641
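For reference, that run was the same call with iter increased (rstanarm’s iteration counts include warmup), i.e. roughly:

# Same model, more iterations (object name here is just illustrative)
stan_lmer.m.10k <- stan_lmer(y ~ 1 + x + (1 + x | region),
                             data = simulated_panel,
                             chains = 4, cores = 4, iter = 10000)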
I should also clarify that my concern is driven by my beliefs that:
- Sampling for this model should be more efficient
- Rhat > 1.01 shouldn’t just be disregarded.
Maybe these beliefs are unfounded and/or too extreme?
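In case it helps with diagnosis, the mixing of the two slowest parameters can be visualized with trace plots (assuming the bayesplot package is installed; stan_lmer.m is the fit from above):

library(bayesplot)
# Trace plots for the parameters with the lowest n_eff
mcmc_trace(as.array(stan_lmer.m), pars = c('(Intercept)', 'x'))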