# Accounting for serial correlation of residuals in panel data model

I’m wondering whether, and if so, how, one ought to account for possible serial correlation in residuals in a Bayesian model for panel data. In classical panel data models, people often cluster standard errors by entity in order to make them appropriately conservative. Is there an analogous step one should or could take in Stan? Or maybe this question is misguided because in a Bayesian approach one just doesn’t worry about serially correlated residuals in the same way?

Below is my model. My data are locales (l) and time periods (t), where the panel is perfectly balanced. The model is doing IV with a joint distribution. I’m instrumenting an endogenous earnings growth variable that I then use to model community college enrollment growth. I’m also including county and period effects.

I’m sure this model is not specified in the most succinct way possible, but it’s running, so I’ve been content with it.

``````data {
int<lower=1> L; // number of locales
int<lower=1> T; // number of time periods
vector[T] E[L]; // array of vectors of enrollment growth
vector[T] W[L]; // array of vectors of labor earnings growth
vector[T] Z[L]; // array of vectors of Bartik instrument
int<lower=1,upper=T> X[L, T]; // array of period indices; each vector should be ordered, 1 to T
real<lower=0> sigma_alpha_E_sd; // SD on prior for sigma_alpha_E
real<lower=0> sigma_pi_E_sd; // SD on prior for sigma_pi_E
real<lower=0> mu_beta_E_sd; // SD on prior for mu_beta_E
real<lower=0> sigma_beta_E_sd; // SD on prior for sigma_beta_E
real<lower=0> phi_E_sd; // SD on prior for phi_E
real<lower=0> sigma_alpha_W_sd; // SD on prior for sigma_alpha_W
real<lower=0> sigma_pi_W_sd; // SD on prior for sigma_pi_W
real<lower=0> mu_beta_W_sd; // SD on prior for mu_beta_W
real<lower=0> sigma_beta_W_sd; // SD on prior for sigma_beta_W
real<lower=0> phi_W_sd;
real<lower=0> theta_sd; // SD on prior for vector theta
}

parameters {
vector[L] alpha_E_nc; // non-centered intercepts for enrollment growth
vector[T] pi_E_nc; // non-centered time period effects for enrollment growth
vector[L] beta_E_nc; // non-centered slope on labor earnings growth determining mean enrollment growth
real<lower=0> sigma_alpha_E; // SD parameter for distribution of alpha_E
real<lower=0> sigma_pi_E; // SD parameter for distribution of pi_E
real mu_beta_E; // mean parameter for distribution of beta_E
real<lower=0> sigma_beta_E; // SD parameter for distribution of beta_E
real phi_E; // mean parameter for county and period effects on enrollment growth

vector[L] alpha_W_nc; // non-centered intercepts for labor earnings growth
vector[T] pi_W_nc; // non-centered time period effects for labor earnings growth
vector[L] beta_W_nc; // non-centered slope on Bartik instrument modeling mean labor earnings growth
real<lower=0> sigma_alpha_W; // SD parameter for distribution of alpha_W
real<lower=0> sigma_pi_W; // SD parameter for distribution of pi_W
real mu_beta_W; // mean parameter for distribution of beta_W
real<lower=0> sigma_beta_W; // SD parameter for distribution of beta_W
real phi_W; // mean parameter for county and period effects on labor earnings growth

vector<lower=0>[2] theta; // vector of conditional SD parameters for E, W
corr_matrix[2] Rho; // correlation matrix
}

transformed parameters {
vector[L] alpha_E; // intercepts for enrollment growth
vector[T] pi_E; // vector of time period effects for enrollment growth
vector[L] beta_E; // slope on labor earnings growth modeling mean enrollment growth
vector[L] alpha_W; // intercepts for labor earnings growth
vector[T] pi_W; // vector of time period effects for labor earnings growth
vector[L] beta_W; // slope on Bartik instrument modeling mean labor earnings growth

alpha_E = alpha_E_nc*sigma_alpha_E; // mean 0 with presence of phi_E
pi_E = pi_E_nc*sigma_pi_E; // mean 0 with presence of phi_E
beta_E = mu_beta_E + beta_E_nc*sigma_beta_E;

alpha_W = alpha_W_nc*sigma_alpha_W; // mean 0 with presence of phi_W
pi_W = pi_W_nc*sigma_pi_W; // mean 0 with presence of phi_W
beta_W = mu_beta_W + beta_W_nc*sigma_beta_W;
}

model {
vector[L*T] mu_E;
vector[L*T] mu_W;
vector[2] Y[L*T];
vector[2] Mu[L*T];

// this is a little hacky, oh well for now
for (l in 1:L) {
for (t in 1:T) {
mu_E[(l-1)*T + t] = phi_E + alpha_E[l] + pi_E[X[l][t]] + beta_E[l]*W[l][t];
mu_W[(l-1)*T + t] = phi_W + alpha_W[l] + pi_W[X[l][t]] + beta_W[l]*Z[l][t];
}
}

// this is a little hacky, oh well for now
for (l in 1:L) {
for (t in 1:T) {
Y[(l-1)*T + t] = [E[l][t], W[l][t]]';
Mu[(l-1)*T + t] = [mu_E[(l-1)*T + t], mu_W[(l-1)*T + t]]';
}
}

alpha_E_nc ~ normal(0, 1);
pi_E_nc ~ normal(0, 1);
beta_E_nc ~ normal(0, 1);
sigma_alpha_E ~ normal(0, sigma_alpha_E_sd);
sigma_pi_E ~ normal(0, sigma_pi_E_sd);
mu_beta_E ~ normal(0, mu_beta_E_sd);
sigma_beta_E ~ normal(0, sigma_beta_E_sd);
phi_E ~ normal(0, phi_E_sd);

alpha_W_nc ~ normal(0, 1);
pi_W_nc ~ normal(0, 1);
beta_W_nc ~ normal(0, 1);
sigma_alpha_W ~ normal(0, sigma_alpha_W_sd);
sigma_pi_W ~ normal(0, sigma_pi_W_sd);
mu_beta_W ~ normal(0, mu_beta_W_sd);
sigma_beta_W ~ normal(0, sigma_beta_W_sd);
phi_W ~ normal(0, phi_W_sd);

theta ~ normal(0, theta_sd);
Rho ~ lkj_corr(2);

}
``````

Hi, @brijo:

A Bayesian approach is going to be similar to a frequentist approach when we’re talking about observables. If there is structure in the residuals for a regression, it helps to model it. A typical approach is to use a time-series model for the residuals if they’re autocorrelated (i.e., the error at one time step is related to the error at previous time steps).

Presumably what you mean here is that the errors in `Y` are autocorrelated? If you want to model that, you need to figure out what the time series is to model the correlation. A vector autoregressive (VAR) prior would work in place of the `multi_normal` currently specified. Or you could go even further and adopt some kind of stochastic volatility model where the covariance would also vary, but those can be very hard to fit.

I’m not sure what “IV with a joint distribution” means. What’s “IV”?

With the part you document as “a little hacky”, it might be easier to define `Y` as an `L x T` matrix and then fill that and use `to_vector` to compute to a sequence (though you have to be careful of order, because our to_vector is column-major for matrices and row-major for arrays, so use whichever one’s appropriate).

You can code a non-centered parameterization directly now without all the intermediate transformed parameter fiddling. For example, this encodes a non-centered parameterization for `beta_E` whereas the version without the affine transform implemented a centered parameterization.

``````real<offset = mu_beta_E, multiplier = sigma_beta_E> beta_E;
...
beta_E ~ normal(mu_beta_E, sigma_beta_E);
``````

It’s much more efficient to use a Cholesky factor formulation of the multivariate normal. You’re using Rho for the correlation matrix and `theta` for the scale, so you can just declare `L_Rho` as `cholesky_factor_corr` type and then just multiply it once by `theta` rather than using the quadratic form, and then use `multi_normal_cholesky`. We explain how to do this pretty early on in the User’s Guide chapter on regression.

Thank you, this is very helpful. I need to think more about what that model for the residuals would be.

Apologies for the excessive abbreviation. By “IV”, I just meant instrumental variables. Here, I do the IV with a joint normal distribution, where everything is estimated at once, rather than the 2SLS kind of deal.

I did not know about the multivariate normal implementation that uses Cholesky factors or about the new way to do non-centered parameterization. Very useful. Thank you.