Accounting for serial correlation of residuals in panel data model

I’m wondering whether, and if so, how, one ought to account for possible serial correlation in residuals in a Bayesian model for panel data. In classical panel data models, people often cluster standard errors by entity in order to make them appropriately conservative. Is there an analogous step one should or could take in Stan? Or maybe this question is misguided because in a Bayesian approach one just doesn’t worry about serially correlated residuals in the same way?

Below is my model. My data are locales (l) and time periods (t), where the panel is perfectly balanced. The model is doing IV with a joint distribution. I’m instrumenting an endogenous earnings growth variable that I then use to model community college enrollment growth. I’m also including county and period effects.

I’m sure this model is not specified in the most succinct way possible, but it’s running, so I’ve been content with it.

data {
  int<lower=1> L; // number of locales
  int<lower=1> T; // number of time periods
  vector[T] E[L]; // array of vectors of enrollment growth
  vector[T] W[L]; // array of vectors of labor earnings growth
  vector[T] Z[L]; // array of vectors of Bartik instrument
  int<lower=1,upper=T> X[L, T]; // array of period indices; each vector should be ordered, 1 to T
  real<lower=0> sigma_alpha_E_sd; // SD on prior for sigma_alpha_E
  real<lower=0> sigma_pi_E_sd; // SD on prior for sigma_pi_E  
  real<lower=0> mu_beta_E_sd; // SD on prior for mu_beta_E
  real<lower=0> sigma_beta_E_sd; // SD on prior for sigma_beta_E
  real<lower=0> phi_E_sd; // SD on prior for phi_E
  real<lower=0> sigma_alpha_W_sd; // SD on prior for sigma_alpha_W
  real<lower=0> sigma_pi_W_sd; // SD on prior for sigma_pi_W  
  real<lower=0> mu_beta_W_sd; // SD on prior for mu_beta_W
  real<lower=0> sigma_beta_W_sd; // SD on prior for sigma_beta_W
  real<lower=0> phi_W_sd; 
  real<lower=0> theta_sd; // SD on prior for vector theta
}

parameters {
  vector[L] alpha_E_nc; // non-centered intercepts for enrollment growth
  vector[T] pi_E_nc; // non-centered time period effects for enrollment growth
  vector[L] beta_E_nc; // non-centered slope on labor earnings growth determining mean enrollment growth
  real<lower=0> sigma_alpha_E; // SD parameter for distribution of alpha_E
  real<lower=0> sigma_pi_E; // SD parameter for distribution of pi_E
  real mu_beta_E; // mean parameter for distribution of beta_E
  real<lower=0> sigma_beta_E; // SD parameter for distribution of beta_E
  real phi_E; // mean parameter for county and period effects on enrollment growth
  
  vector[L] alpha_W_nc; // non-centered intercepts for labor earnings growth
  vector[T] pi_W_nc; // non-centered time period effects for labor earnings growth
  vector[L] beta_W_nc; // non-centered slope on Bartik instrument modeling mean labor earnings growth  
  real<lower=0> sigma_alpha_W; // SD parameter for distribution of alpha_W
  real<lower=0> sigma_pi_W; // SD parameter for distribution of pi_W
  real mu_beta_W; // mean parameter for distribution of beta_W
  real<lower=0> sigma_beta_W; // SD parameter for distribution of beta_W
  real phi_W; // mean parameter for county and period effects on labor earnings growth

  vector<lower=0>[2] theta; // vector of conditional SD parameters for E, W
  corr_matrix[2] Rho; // correlation matrix
}

transformed parameters {
  vector[L] alpha_E; // intercepts for enrollment growth
  vector[T] pi_E; // vector of time period effects for enrollment growth
  vector[L] beta_E; // slope on labor earnings growth modeling mean enrollment growth
  vector[L] alpha_W; // intercepts for labor earnings growth
  vector[T] pi_W; // vector of time period effects for labor earnings growth
  vector[L] beta_W; // slope on Bartik instrument modeling mean labor earnings growth
  
  alpha_E = alpha_E_nc*sigma_alpha_E; // mean 0 with presence of phi_E
  pi_E = pi_E_nc*sigma_pi_E; // mean 0 with presence of phi_E
  beta_E = mu_beta_E + beta_E_nc*sigma_beta_E;
  
  alpha_W = alpha_W_nc*sigma_alpha_W; // mean 0 with presence of phi_W
  pi_W = pi_W_nc*sigma_pi_W; // mean 0 with presence of phi_W
  beta_W = mu_beta_W + beta_W_nc*sigma_beta_W; 
}

model {
  vector[L*T] mu_E; 
  vector[L*T] mu_W; 
  vector[2] Y[L*T];
  vector[2] Mu[L*T];
  
  // this is a little hacky, oh well for now
  for (l in 1:L) {
    for (t in 1:T) {
      mu_E[(l-1)*T + t] = phi_E + alpha_E[l] + pi_E[X[l][t]] + beta_E[l]*W[l][t]; 
      mu_W[(l-1)*T + t] = phi_W + alpha_W[l] + pi_W[X[l][t]] + beta_W[l]*Z[l][t];
    }
  }
  
  // this is a little hacky, oh well for now
  for (l in 1:L) {
    for (t in 1:T) {
      Y[(l-1)*T + t] = [E[l][t], W[l][t]]'; 
      Mu[(l-1)*T + t] = [mu_E[(l-1)*T + t], mu_W[(l-1)*T + t]]';
    }
  }  
  
  alpha_E_nc ~ normal(0, 1);
  pi_E_nc ~ normal(0, 1);
  beta_E_nc ~ normal(0, 1);
  sigma_alpha_E ~ normal(0, sigma_alpha_E_sd);
  sigma_pi_E ~ normal(0, sigma_pi_E_sd);
  mu_beta_E ~ normal(0, mu_beta_E_sd);
  sigma_beta_E ~ normal(0, sigma_beta_E_sd);
  phi_E ~ normal(0, phi_E_sd);
  
  alpha_W_nc ~ normal(0, 1);
  pi_W_nc ~ normal(0, 1);
  beta_W_nc ~ normal(0, 1);
  sigma_alpha_W ~ normal(0, sigma_alpha_W_sd);
  sigma_pi_W ~ normal(0, sigma_pi_W_sd);
  mu_beta_W ~ normal(0, mu_beta_W_sd);
  sigma_beta_W ~ normal(0, sigma_beta_W_sd);
  phi_W ~ normal(0, phi_W_sd);
  
  theta ~ normal(0, theta_sd);
  Rho ~ lkj_corr(2);
  
  Y ~ multi_normal(Mu, quad_form_diag(Rho, theta));
}

Hi, @brijo:

A Bayesian approach is going to be similar to a frequentist approach when we’re talking about observables. If there is structure in the residuals for a regression, it helps to model it. A typical approach is to use a time-series model for the residuals if they’re autocorrelated (i.e., the error at one time step is related to the error at previous time steps).

Presumably what you mean here is that the errors in Y are autocorrelated? If you want to model that, you need to figure out what the time series is to model the correlation. A vector autoregressive (VAR) prior would work in place of the multi_normal currently specified. Or you could go even further and adopt some kind of stochastic volatility model where the covariance would also vary, but those can be very hard to fit.

I’m not sure what “IV with a joint distribution” means. What’s “IV”?

With the part you document as “a little hacky”, it might be easier to define Y as an L x T matrix and then fill that and use to_vector to compute to a sequence (though you have to be careful of order, because our to_vector is column-major for matrices and row-major for arrays, so use whichever one’s appropriate).

You can code a non-centered parameterization directly now without all the intermediate transformed parameter fiddling. For example, this encodes a non-centered parameterization for beta_E whereas the version without the affine transform implemented a centered parameterization.

real<offset = mu_beta_E, multiplier = sigma_beta_E> beta_E;
...
beta_E ~ normal(mu_beta_E, sigma_beta_E);

It’s much more efficient to use a Cholesky factor formulation of the multivariate normal. You’re using Rho for the correlation matrix and theta for the scale, so you can just declare L_Rho as cholesky_factor_corr type and then just multiply it once by theta rather than using the quadratic form, and then use multi_normal_cholesky. We explain how to do this pretty early on in the User’s Guide chapter on regression.

Thank you, this is very helpful. I need to think more about what that model for the residuals would be.

Apologies for the excessive abbreviation. By “IV”, I just meant instrumental variables. Here, I do the IV with a joint normal distribution, where everything is estimated at once, rather than the 2SLS kind of deal.

I did not know about the multivariate normal implementation that uses Cholesky factors or about the new way to do non-centered parameterization. Very useful. Thank you.