Mundlak- or Chamberlain-type correlated random effects regression models




I have a few general questions regarding Mundlak- or Chamberlain-type so called correlated random effects (CRE) models and how to best deal with them in Stan.

Sorry, this got a bit long. The actual questions are at the bottom of this post.

Maybe a bit of background on what the general idea is, although I think that this is trivial for most of you (feel free to skip). In the applied (frequentist) econometrics literature the default for dealing with unobserved variation is to use “fixed effects” (I know these terms are very ambiguous across and sometimes within disciplines), by either demeaning, differencing, or using dummy variables. I think the dummy variable approach translates most naturally to a Bayesian setting, so here is a very simple example:

y_{it} = \alpha_i + \beta x_{it} + \epsilon_{it},

where i is a subject observed in time t. Let’s say that inference about \beta is important here. To account for unobserved variability across the i's, one estimates a varying intercept \alpha_i by including a matrix of dummy/indicator variables for each i. So far, so good.

I think I don’t have to explain the random effects approach, as it probably the norm in this community, since it is the most basic form of hierarchical model. :)

y_{it} = \alpha_0 + \alpha_i + \beta x_{it} + \epsilon_{it}, with \alpha_i \sim \text{Normal}(0, \sigma_{\text{subject}})

Since econometricians really like consistency, they usually prefer the fixed effects approach, because the random effects model might be inconsistent, if \alpha_i is correlated with x_{it}. If there is no correlation, the random effects model is more efficient, but this is usually not the case.

The fixed effects model comes with a few draw-backs. The most obvious is not being able to estimate some of the parameters that might be of interest, e.g. in a fixed effects model y_{it} = \alpha_i + \beta_1 x_{it} + \beta_2 z_i + \epsilon_{it}, where z_i is a subject-specific, time-invariant covariate, \beta_2 can’t be estimated.

The solution to the correlation problem is fairly simple (see this blog post by Andrew Gelman, or this one), and sometimes called Mundlak device, because of his 1978 Econometrica paper: Just include the subject-level mean of the time-varying covariate.

y_{it} = \alpha_0 + \alpha_i + \beta x_{it} + \gamma \bar{x}_i + \epsilon_{it}, with \alpha_i \sim \text{Normal}(0, \sigma_{\text{subject}}),
y_{it} = \alpha_0 + \alpha_i + \beta x_{it} + \epsilon_{it}, with \alpha_i \sim \text{Normal}(\gamma \bar{x}_i, \sigma_{\text{subject}}).

Noting the correlation between x_{it} and \bar{x}_i, some propose the “within-between” model, which can be viewed as a reformulation of the Mundlak model:

y_{it} = \alpha_0 + \alpha_i + \beta_{\text{within}} (x_{it} - \bar{x}_i) + \beta_{\text{between}} \bar{x}_i + \epsilon_{it},

with coefficients for the effect within a subject and between subjects respectively. This is the same as

y_{it} = \alpha_0 + \alpha_i + \beta_{\text{within}} x_{it} + \gamma \bar{x}_i + \epsilon_{it}, with \gamma = \beta_{\text{between}} - \beta_{\text{within}},

and these kind of models run pretty well in Stan using non-centered parametarization. This can also easily be extended to interactions between dimensions, e.g. having \alpha_0 + \alpha_i + \alpha_t + \alpha_{it}.

I was asking myself:

Is there a “more naturally Bayesian” approach to include \bar{x}_i? Just computing the sample mean and including it is discarding uncertainty about it, right?
I thought about including something like modeling it as \bar{x}_i \sim \text{Normal}(\mu_i,s_i/\sqrt T_i), where \mu_i is the sample mean as above and s_i is the sample standard deviation and T_i is the number of observations that go into the calculation of \mu and s. In an unbalanced panel data set, T might be different for each i.

Another question, that I had:

The Chamberlain-type model is a extension/generalization to the Mundlak. Instead of \alpha_i \sim \text{Normal}(\gamma \bar{x}_i, \sigma_{\text{subject}}), one estimates:

\alpha_i \sim \text{Normal}(\lambda_1 x_{i1} + \lambda_2 x_{i2} + \dots + \lambda_T x_{iT} , \sigma_{\text{subject}}).

Which seems to be really straight forward to estimate as an hierarchical model. Is there any good tricks to make it work for unbalanced panel data? For example if some of the x_{it} are missing for some i? I think explicitly modeling the missing data is opening another can of worms…

Is there any other flexible way of projecting x_{it} on \alpha_i, that is amenable to unbalanced panel data?
It feels like I am missing something here…


Interesting not trivial questions.

If we assume a time varying co-variate then we have to specify the time model to derive the variance. If we assume a random-walk with n steps then we have \sqrt n * \sigma_{subject} as standard-deviation. I mention it, because in the beginning of the post its says, subject i was observed in time t.

Is there a “more naturally Bayesian” approach to include ¯xi\bar{x}_i? Just computing the sample mean and including it is discarding uncertainty about it, right?

One could create a unified co-variance matrix with the kronecker product between \alpha_i and \beta.

I have the student-t distribution with adjusting the standard deviation in mind.

I hope somebody else with catch up with this post.