# Marginal Models for Multivariate (e.g. Longitudinal/Panel) Binary Data

We have binary longitudinal survey data, with some people measured repeatedly through time:
y_{ij} \in \{0,1\} for person i and measurement j taken at time t_{ij}

We fit a model for E[y_{ij}\ |\ x_{ij}, t_{ij}], which we will then project to the target population distribution of p(x_{ij}) (for some time t_{ij} = t) as in Ghitza and Gelman (2020).

So in the language of Longitudinal Data Analysis (p.9), we want marginal (population-averaged) models E[y_{ij}\ |\ x_{ij}, t_{ij}], not subject-specific (“random effects”/“mixed”) models E[y_{ij}\ |\ x_{ij}, t_{ij}, b_i] where b_i is person i's “random effect”.

Two options:

1. Fit a “random effects” model and marginalize over the person effects b_i. We have to be careful about mis-specifying the distribution p(b_i), see Antonelli et al. (2016). Has anyone implemented this non-parametric person effects model in Stan ?.

2. Fit the marginal model. But how to model dependence among a person’s measurements ? What is a convenient joint distribution for binary multivariate data ? One option uses the latent logistic variables (see O’Brien and Dunson (2004)):
y_{ij} = I(L_{ij} \leq f(x_{ij},t_{ij}))
(L_{i1},...,L_{in_i}) \sim \text{Multivariate Logistic}(0,R)
where marginally E[y_{ij}\ |\ x_{ij}, t_{ij}] = \text{logit}^{-1}(f(x_{ij},t_{ij})).
Has anyone implemented this multivariate logistic in Stan ?

Does anyone have another idea for marginal models for longitudinal binary data in Stan ?

Thank you !

4 Likes

This is an idea (but could be wrong):

E{ y[i,j] | x[i,j], t[i,j] } = E_b[i]{ E{ y[i,j] | x[i,j], t[i,j], b[i] }}

Furthermore, E{ y[i,j] | x[i,j], t[i,j], b[i] } has a closed form (if you are using logit) and the only random parameter being conditioned is b[i].

So I think you can take the average of inv_logit(x[i,j], t[i,j], b[i]), which is the same as marginalizing out b[i].

Sorry I cannot enter Latex.

1 Like

To my knowledge (but somebody please correct me if I’m wrong; @Bob_Carpenter) the Dirichlet Process Prior has not gotten any easier to implement in Stan since this was written https://groups.google.com/g/stan-users/c/uo2Y4x05hwU.

One possible way forward is to choose some candidate distribution for the random effect and then assess whether that choices looks okay via mixed predictive checking. For example, with a Gaussian random effect distribution, you could choose as your discrepancy measure a Shapiro-Wilk test statistic computed over your fitted random effects, and then ask how this compares per-iteration to the Shapiro-Wilk statistic computed over a random draw from the hyperparameters. I’ve done some of this with huge data sets, and what I find in general is that it is quite hard to reject the assumption of normality unless a substantial fraction of the parameters are informed by at least several Bernoulli trials. If a subset of the parameters are strongly informed and a lot of the parameters are weakly informed, it can be more powerful to compute the discrepancy measure just over the strongly informed parameters. So depending on your data this test might not be all that sensitive to misspecification, but on the other hand if a test like this isn’t sensitive to misspecification then presumably a more flexible class of random-effect distribution will lead to inference that is weakly identified at best. If you are able to detect misspecification like this, then you could graphically examine the distributions of the random effect parameters to get ideas for a better parametric distribution of the random effect.

Finally, this is a bit off-the-wall, but I just figured I’d mention that if the outcome is zero-inflated (i.e. if the random effect distribution is negative-infinity-inflated), then this model is mathematically equivalent to what ecologists call an “occupancy model” and you might find some useful resources in that direction. To be clear, traditionally occupancy models include only a zero-inflation term and not an additional subject-level random effect, but there’s no reason in principle that you couldn’t then include an additional random effect in the linear predictor for the mean conditional on not being in the zero-inflation state.

2 Likes

Thanks, @sonicking !

Yes, the marginal (population-averaged) model we want E[y_{ij}\ |\ x_{ij}, t_{ij}], is equal by iterated expectations to

E \left [ E[ y_{ij}\ |\ x_{ij}, t_{ij}, b_i] \ \bigg |\ x_{ij}, t_{ij} \right ]

The inner expectation is the subject-specific (“random effects”/“mixed”) model, which as you say can be logistic, with closed form e.g.: E[y_{ij}\ |\ x_{ij}, t_{ij}, b_i] = \text{logit}^{-1}(\beta x_{ij} + \alpha_{t_{ij}} + b_i).

The outer expectation is over the “random effect” distribution of b_i |\ x_{ij}, t_{ij}. Taking this expectation doesn’t in general have a closed form. More seriously, it is sensitive to the distribution of b_i |\ x_{ij}, t_{ij}, see Antonelli et al. (2016).

1 Like

Thanks so much, @jsocolar !

Yes, we did some posterior predictive checks for quantities we care about, which alerted us to the possible misspecification of the “random effects”.

And plotting the distribution of estimated “random effects” (as you suggest) reveals multi-modality.

Really interesting pointer to the “occupancy models” ! Our outcome isn’t zero-inflated, but perhaps the technique points to helpful ideas to more flexibly model “random effects” in general. Thanks so much !

Not sure if this package, brmsmargins, helps you. Good luck and please report back! Marginal Effects for Mixed Effects Models • brmsmargins

This is a shot in the dark, but if you have relatively few time points then it may be feasible to structure the model as categorical sequences of 0s/1s rather than multivariate logistic. For example, if any given respondent has at most 3 observations across time, you’d be predicting membership into one of eight sequences. For respondents who are missing observations, you would then have to marginalize over all remaining possible sequences. Instead of assuming a distribution for the random effects, you could freely model the baseline probability of being in each trajectory. I assume this would address the issue you highlight regarding normally-distributed random effects.

So, I experimented with an approximation approach to this problem that, after very limited testing, doesn’t appear to fail terribly. At some point I might test it more thoroughly. I use an extended Kalman filter to update the latent states and transform them to the linear predictor, then a) compute the likelihood conditional on the predictor exactly, and b) update the latent states assuming the residuals are normally distributed, which they aren’t. You can see an example with limited explanation at Binary data in state space models | Charles Driver

1 Like