Hierarchical model for panel data with many time points and assumed autocorrelation

I am hoping for some modeling advice regarding how to approach a specific data-generating process (DGP).

@rtrangucci & @James_Savage: based on things I’ve read from both of you, I think you both might have experience with Stan models directly relevant to this question. I hope you don’t mind the @!

Here’s the problem:

Given panel data for 49 units, many time points, and unknown, unit-varying serial autocorrelation, how would you estimate (1) the unit-varying treatment effect of an exogenous intervention D_t administered to all units at the same time, and (2) the degree to which the treatment effect is moderated by a time-invariant cross-sectional covariate M?

That is, I’d like to estimate a model like:

\log(y_{it}) = \alpha + f_i(y_{i,t-1}, y_{i,t-2}, \dots) + D_{it} \delta_i + D_{it} M_i \beta + u_i

where \delta_i is the unit-varying treatment effect, \beta is the change in the treatment effect per unit change in M_i, and f_i(\cdot) is an a priori unknown serial autocorrelation function for unit i (i.e., some function of unit i's past outcomes).
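
For concreteness, if f_i were simplified to a unit-specific AR(1) term (purely an illustrative assumption, not something I'm committed to), a single unit's equation would look like:

\log(y_{it}) = \alpha + \rho_i \log(y_{i,t-1}) + D_{it} \delta_i + D_{it} M_i \beta + u_i + \varepsilon_{it}

so the total effect of the intervention for unit i would be \delta_i + \beta M_i, and \beta is the quantity of interest for the moderation question.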

I’ve identified three possible options:

  1. A hierarchical dynamic panel model with simplifying assumptions about the structure of the serial autocorrelation (as discussed here: Dynamic panel data models with Stan?)

  2. A hierarchical state-space (e.g. structural time series) model

  3. A hierarchical Gaussian process model

Option one seems the simplest, but identifying the autocorrelation function for each unit seems daunting. Options 2 and 3 would be new to me. They also seem promising, but I understand that they can be difficult to fit and computationally intensive, especially with many observations. I have 24 years of observations that could be aggregated into periods as small as a day; so T, and hence the total number of observations, could be quite large.
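
To make option 1 concrete, here is a minimal Stan sketch of the kind of model I have in mind, under the (possibly too strong) assumption that f_i reduces to a unit-specific AR(1) term. All variable names and priors are placeholders, and \rho_i could of course be partially pooled as well:

```stan
// Sketch of option 1: hierarchical dynamic panel with an AR(1)
// simplification of f_i(.). Names and priors are illustrative only.
data {
  int<lower=1> N;                    // number of units (49)
  int<lower=2> T;                    // number of time points
  matrix[N, T] log_y;                // log outcome, log(y_{it})
  matrix[N, T] D;                    // treatment indicator D_{it}
  vector[N] M;                       // time-invariant moderator M_i
}
parameters {
  real alpha;                        // common intercept
  vector[N] u;                       // unit intercepts
  real<lower=0> sigma_u;
  vector<lower=-1, upper=1>[N] rho;  // unit-specific AR(1) coefficients
  vector[N] delta;                   // unit-varying treatment effects
  real mu_delta;
  real<lower=0> sigma_delta;
  real beta;                         // moderation of treatment effect by M
  real<lower=0> sigma;               // idiosyncratic noise
}
model {
  // weakly informative priors; would need tuning for the application
  alpha ~ normal(0, 5);
  u ~ normal(0, sigma_u);
  sigma_u ~ normal(0, 1);
  rho ~ normal(0.5, 0.5);
  delta ~ normal(mu_delta, sigma_delta);
  mu_delta ~ normal(0, 1);
  sigma_delta ~ normal(0, 1);
  beta ~ normal(0, 1);
  sigma ~ normal(0, 1);

  // likelihood, conditioning on each unit's first observation
  for (i in 1:N) {
    for (t in 2:T) {
      log_y[i, t] ~ normal(alpha + u[i]
                           + rho[i] * log_y[i, t - 1]
                           + delta[i] * D[i, t]
                           + beta * D[i, t] * M[i],
                           sigma);
    }
  }
}
```

In this sketch the first observation of each unit is conditioned on rather than modeled, which keeps the likelihood simple at the cost of one time point per unit; the unit treatment effects \delta_i are partially pooled, while \beta captures the moderation by M_i.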

I realize this is an open-ended question, but if one of the above options should be avoided, it would be nice to know that now rather than realize it later! I’ve read quite a bit about frequentist models for DGPs like the one I’ve described, but I’ve seen less discussion of Bayesian approaches.

Presumably, all three of the approaches I note could work. Accordingly, I’m hoping someone might have suggestions about how best to trade off pragmatism and precision.

Also, just to be clear, these models will not be used for forecasting. I just want to estimate the treatment and moderator effects.