In Data Analysis Using Regression and Multilevel/Hierarchical Models (ARM) by Gelman & Hill, Section 12.5 discusses five ways to write the same model.

The final example is a large regression with correlated errors. It defines the covariance matrix of the errors as:

For any unit i: var(error_i) = var_y + var_alpha

For any units i,k within the same group j: cov(error_i, error_k) = var_alpha

For any units i,k in different groups: cov(error_i, error_k) = 0
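
For concreteness, here is a minimal NumPy sketch of that covariance structure. The helper name `block_cov`, the group labels, and the variance values are all made up for illustration; none of them come from the book.

```python
import numpy as np

def block_cov(group, var_y, var_alpha):
    """Covariance of the combined errors for units labelled by `group`."""
    group = np.asarray(group)
    # True wherever units i and k belong to the same group (including i == k)
    same_group = group[:, None] == group[None, :]
    # var_alpha on every same-group pair, plus var_y on the diagonal only
    return var_alpha * same_group + var_y * np.eye(len(group))

group = [0, 0, 1, 1, 1]  # hypothetical group index j[i] for 5 units
Sigma = block_cov(group, var_y=1.0, var_alpha=0.5)
print(Sigma)
```

The result is block diagonal: one block per group, with zeros everywhere between groups.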

What if these are instead adjusted to:

For any unit i: var(error_i) = var_y + var_alpha + var_base

For any units i,k within the same group j: cov(error_i, error_k) = var_alpha + var_base

For any units i,k in different groups: cov(error_i, error_k) = var_base

where var_base is some base level of covariance shared by every pair of units.
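
As a sanity check on what this adjusted structure implies, the sketch below builds the matrix two ways: directly from the three rules, and as the sum of an independent error, a group-level effect, and a single effect shared by all units. The group labels and variance values are hypothetical, and reading the shared term as one latent draw common to every unit is my own interpretation, not something from the book.

```python
import numpy as np

group = np.array([0, 0, 1, 1, 1])  # hypothetical group index j[i]
n, J = len(group), group.max() + 1
var_y, var_alpha, var_base = 1.0, 0.5, 0.2  # illustrative values only

# Direct construction from the three rules
same_group = group[:, None] == group[None, :]
Sigma_direct = (np.where(same_group, var_alpha + var_base, var_base)
                + var_y * np.eye(n))

# Latent-variable construction: error_i = eps_i + alpha_j[i] + b,
# with eps_i ~ N(0, var_y), alpha_j ~ N(0, var_alpha), b ~ N(0, var_base)
Z = np.zeros((n, J))
Z[np.arange(n), group] = 1.0  # group indicator matrix
ones = np.ones((n, 1))
Sigma_latent = (var_y * np.eye(n)
                + var_alpha * (Z @ Z.T)
                + var_base * (ones @ ones.T))

print(np.allclose(Sigma_direct, Sigma_latent))
```

If the two matrices agree, the adjusted model is the usual varying-intercept model plus one extra random intercept shared by all units, so it could be written in Stan with latent effects rather than an explicit covariance matrix.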

I would like to do this in Stan without having to write out the explicit covariance matrix. I made some naive attempts to adapt a model similar to the radon_no_pool.stan example model to account for it, but didn’t have much luck (divergences, tree depth warnings, low Bayesian fraction of missing information (BFMI) warnings… the works). I suspect it’s because I was basically introducing a latent variable that couldn’t be identified properly.
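
A quick numerical check seems consistent with that suspicion. In the sketch below (hypothetical group labels and variance values again), subtracting the overall mean from the errors removes the var_base term from their covariance entirely. If that is right, then within a single cross-section the deviations from the overall mean carry no information about var_base, and only the overall level of y does, which the regression intercept also has to explain.

```python
import numpy as np

group = np.array([0, 0, 1, 1, 1])  # hypothetical group index j[i]
n = len(group)
var_y, var_alpha, var_base = 1.0, 0.5, 0.2  # illustrative values only

same_group = group[:, None] == group[None, :]
Sigma_orig = var_alpha * same_group + var_y * np.eye(n)
Sigma_adj = Sigma_orig + var_base * np.ones((n, n))  # add the shared term

# Centering matrix: M @ y subtracts the overall mean from every element of y
M = np.eye(n) - np.ones((n, n)) / n

# Covariance of the demeaned errors under each model
cov_centered_orig = M @ Sigma_orig @ M
cov_centered_adj = M @ Sigma_adj @ M

print(np.allclose(cov_centered_orig, cov_centered_adj))
```

This only checks the covariance algebra. It does not by itself explain the sampler warnings, which could also come from the parameterization of the group effects.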

So I suppose the big question is: am I wasting my time thinking about this?

I’m fitting a cross-sectional model for simplicity, but the full dataset is a panel. I was just curious whether I can incorporate into the cross-section the correlation structure that is evident when looking at the full panel.