Suppose that there are multiple outcome variables of different types, e.g., suppose we have a linear model for Y_1 and a binary logistic model for Y_2. Fitting a multivariate model would logically be done using copulas, but that is too messy. Were the two outcomes repeatedly measured, we could have random effects in each, and could model a correlation between the two sets of random effects. But how does one think about this when there are no random effects in either univariate model, in order to allow dependence between Y_1 and Y_2 without using a copula?
Following your example, suppose
Y_{1, n} \sim \textrm{normal}(\alpha + \beta \cdot x_n, \sigma)
and
Y_{2, n} \sim \textrm{bernoulli}(\textrm{logit}^{-1}(\gamma + \delta \cdot u_n))
and we want to define a joint model p_Y(y \mid \alpha, \beta, \gamma, \delta, \sigma).
As you mention, one way to model correlation between Y_1 and Y_2 that doesn’t arise from correlation between x_n and u_n is to add a varying effect (error) term \epsilon_n \in \mathbb{R}^3 with a multivariate normal prior and extended likelihoods
Y_{1, n} \sim \textrm{normal}(\epsilon_{n,1} + \alpha + \beta \cdot x_n, \exp(\log \sigma + \epsilon_{n, 2}))
and
Y_{2, n} \sim \textrm{bernoulli}(\textrm{logit}^{-1}(\epsilon_{n,3} + \gamma + \delta \cdot u_n))
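If it helps, here's a minimal Stan sketch of that joint model with a non-centered trivariate varying effect. The data names (N, x, u, y1, y2) and the priors are placeholders I made up, not anything from your setup:

```stan
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] u;
  vector[N] y1;
  array[N] int<lower=0, upper=1> y2;
}
parameters {
  real alpha;
  real beta;
  real gamma;
  real delta;
  real log_sigma;
  cholesky_factor_corr[3] L_Omega;  // correlation among the three varying effects
  vector<lower=0>[3] tau;           // scales of the varying effects
  matrix[3, N] z;                   // standardized varying effects (non-centered)
}
transformed parameters {
  matrix[3, N] epsilon = diag_pre_multiply(tau, L_Omega) * z;
}
model {
  // weakly informative priors, for illustration only
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  gamma ~ normal(0, 5);
  delta ~ normal(0, 5);
  log_sigma ~ normal(0, 1);
  tau ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  to_vector(z) ~ std_normal();

  // two likelihoods sharing the correlated varying effects
  y1 ~ normal(epsilon[1]' + alpha + beta * x,
              exp(log_sigma + epsilon[2]'));
  y2 ~ bernoulli_logit(epsilon[3]' + gamma + delta * u);
}
```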
If you don’t want to add a random effect like this, I don’t know what you can do other than use a copula. I don’t think it’d be easy to marginalize out the \epsilon here and get a direct definition w/o the varying effect. Also, HMC/NUTS scales very well with the dimensionality of random effects.
But even fitting a copula is going to require multiple n, isn’t it? Also, I’m curious what gets messy with a copula. I’ve never used one myself. I should teach myself copulas better by writing a chapter for the user’s guide. That’s how I’ve learned most of the models I know (don’t worry—I always get adult supervision before publishing!).
Thanks Bob!
One of the difficulties with copulas is that when you mix discrete and continuous response variables the copula is not unique and somewhat arbitrary decisions have to be made. Then there is the restriction that copulas place on corr(Y_{1}, Y_{2}).
What is the rationale for having a random error in the variance of Y_1?
With these n \times 3 random errors I would guess that the effective number of parameters being estimated is quite large and sampling may be slow as a result.
I may not need to marginalize to be able to use the models, but I assume your point is that if you were to marginalize out the random errors, the resulting marginal model for one outcome may not resemble the standard models. That’s one problem that copulas don’t have.
One of the difficulties with copulas is that when you mix discrete and continuous response variables the copula is not unique and somewhat arbitrary decisions have to be made.
If it helps, I’m currently reworking and extending some methods for working with copulas in Stan. Here’s the section on mixing continuous and discrete responses in a pretty trivial way.
The examples are limited to Gaussian copulas at the moment, but I’m currently adding Student-t examples and then starting to look at vine copulas.
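As a flavor of the basic building block, here's a generic bivariate Gaussian-copula log density as a Stan function. This is just a sketch (not the code from that section), and the function name is mine:

```stan
functions {
  // Log density of a bivariate Gaussian copula with correlation rho,
  // evaluated at uniform margins u (each component in (0, 1)).
  real gauss_copula_lpdf(vector u, real rho) {
    real z1 = inv_Phi(u[1]);
    real z2 = inv_Phi(u[2]);
    return -0.5 * log1m(square(rho))
           - (square(rho) * (square(z1) + square(z2)) - 2 * rho * z1 * z2)
             / (2 * (1 - square(rho)));
  }
}
```

For a continuous margin you feed in u as the marginal CDF evaluated at the observation and add the marginal log density; for a discrete margin the CDF is a step function, so some latent augmentation or interval choice is needed, which is exactly where the non-uniqueness above comes in.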
This is great to hear. I like the ability to stick to standard single-outcome models when using copulas. I’d like to use your approach if you can make it apply to any number of outcomes (up to, say, 4 at least) of mixed types: continuous, time-to-event, and ordinal (proportional odds model).
I can’t wait to read your article. This may be a lot to ask, but I hope this can get more and more user-friendly. One way to accomplish this would be for the user to provide a series of Stan code fragments that stand alone for univariate analyses, which some code could then combine into a copula model, with the combined code compiled by Stan.
I just thought you wanted all the parameters to covary.
HMC scales really well in dimension as long as the geometry isn’t too bad. I’ve fit models for RNA sequencing data with 400K random effects (20K gene effects across 20 experimental conditions) with no problem (as in it fits in like 30 minutes, not 3 days). I’ve fit million-parameter random effects in other cases. So I don’t think it’ll be an issue, especially if it’s not implicated in variance.
Hi, @andrjohns. Would you consider writing a short chapter for the User’s Guide? Otherwise, I might try so that I can learn copulas and then ask you for feedback.
We tend to bounce between the extremes of “write it yourself in Stan” and “we’ll encapsulate it all for you in brms”. The trick to doing something in between is coming up with useful reusable functions, as we don’t have a lot of other angles of attack.
Thanks again Bob. I think I would avoid extra random effects parameters that allow variances to be affected by another model’s outcome variable values.
I’m still trying to wrap my head around why a result would not be unstable with so many parameters that don’t have severe constraints on them. In a similar spirit, it seems like the problem of trying to add random effects to handle repeated measures when only a few subjects have more than one Y measurement.
@harrelfe I remembered I had an old proof-of-concept implementing what (I think) you’re after in brms: CorrelatedDiscrete.md · GitHub, using random effects to model correlated binomial and Poisson outcomes.
It’s a few years old, so the Stan/brms code could be out-of-date (and the math/approach could be completely wrong)
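The gist itself is brms-based, but the underlying structure is roughly this long-format Stan sketch, with an outcome-type indicator per row and correlated person-level effects (made-up variable names, not the actual gist code):

```stan
data {
  int<lower=1> N;                          // number of persons
  int<lower=0> M;                          // rows in long format (one per person-outcome pair)
  array[M] int<lower=1, upper=N> person;
  array[M] int<lower=1, upper=2> outcome;  // 1 = binomial row, 2 = Poisson row
  array[M] int<lower=0> y;
  array[M] int<lower=1> trials;            // binomial denominators (ignored for Poisson rows)
}
parameters {
  vector[2] intercept;                     // one intercept per outcome type
  array[N] vector[2] b;                    // person-level effects, one per outcome type
  cholesky_factor_corr[2] L_Omega;         // correlation between the two effect types
  vector<lower=0>[2] tau;
}
model {
  intercept ~ normal(0, 5);
  tau ~ normal(0, 1);
  L_Omega ~ lkj_corr_cholesky(2);
  b ~ multi_normal_cholesky(rep_vector(0, 2), diag_pre_multiply(tau, L_Omega));
  for (m in 1:M) {
    real eta = intercept[outcome[m]] + b[person[m], outcome[m]];
    if (outcome[m] == 1)
      y[m] ~ binomial_logit(trials[m], eta);
    else
      y[m] ~ poisson_log(eta);
  }
}
```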
That is really wild to model different outcome variables as different observations and induce within-person correlations using standard random-effects methods. I wonder if that would work if two outcomes have dramatically different scales. I also guess that, unlike Bob’s approach, it would not be too easy to have totally different predictors apply to different outcomes.
This might be too obvious or trivial, but for completeness, another option is to include the outcome (or residual) of one model as a covariate in the other (and vice versa if desired). While this seems trivial, I thought it’d be worth mentioning since it’s similar to the “autologistic” structure. When a binary response is autocorrelated, an autoregressive model built on latent residuals is generally not well identified. One solution is to use the lagged outcomes as covariates. In the present case, we again want some correlation structure that we might ordinarily achieve with latent residuals, but we can’t readily do this with a binary outcome, so perhaps there’s some useful analogy here.
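To make that concrete, here's a minimal Stan sketch of conditioning the binary model on the residual of the continuous model. Variable names are made up; the coefficient lambda carries the residual dependence between the two outcomes:

```stan
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] u;
  vector[N] y1;
  array[N] int<lower=0, upper=1> y2;
}
parameters {
  real alpha;
  real beta;
  real gamma;
  real delta;
  real lambda;              // dependence of y2 on the residual of y1
  real<lower=0> sigma;
}
model {
  // weakly informative priors, for illustration only
  alpha ~ normal(0, 5);
  beta ~ normal(0, 5);
  gamma ~ normal(0, 5);
  delta ~ normal(0, 5);
  lambda ~ normal(0, 2);
  sigma ~ normal(0, 5);

  // marginal model for the continuous outcome
  y1 ~ normal(alpha + beta * x, sigma);
  // binary model conditions on the residual of y1 (autologistic-style conditioning)
  y2 ~ bernoulli_logit(gamma + delta * u + lambda * (y1 - (alpha + beta * x)));
}
```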
Interesting. In Markov models, including lags as covariates is standard and makes model fitting really easy. The marginalization required to compute state occupancy probabilities (SOPs) is not very difficult if measurement times are regular. When modeling transition probabilities, intent-to-treat (ITT) in a randomized trial is preserved for the first follow-up time period. After the first time period ITT is no longer preserved, but SOPs return us to the ITT state. When you condition on another event, there may be problems marginalizing it back out to get ITT, and there may be problems with different outcomes being on different follow-up schedules and with censored time-to-event outcomes. For example, if modeling an ordinal outcome at 2 months and time-to-death with up to 1 year of follow-up, for those who haven’t died it’s hard to know what to put into the ordinal model. It’s also difficult to know which outcome is conditioned on. In a Markov model there is a simple forward flow of information; in general that’s not the case.
@harrelfe As far as I know, these models have been used before. For instance, this is how Chib & Winkelmann (2001) model correlated count data, which I’ve used in Stan with success.
Chib, S. & Winkelmann, R. (2001). Markov chain Monte Carlo analysis of correlated count data. https://doi.org/10.1198/07350010152596673
Excellent. Did you use special factorizations as in their paper or did you let Stan do all the work?
Yes, I used a multivariate normal distribution on the log latent effects/random intercepts, and just let Stan estimate the correlations across my 11 Poisson variables.
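For anyone curious, the structure looks roughly like the following Stan sketch. This is a trimmed-down, made-up-names version (not my actual code), with the fixed-effect linear predictors passed in as data for brevity:

```stan
data {
  int<lower=0> N;
  int<lower=1> K;                    // number of count outcomes (e.g., 11)
  array[N, K] int<lower=0> y;
  matrix[N, K] log_mu;               // fixed-effect linear predictors, passed as data for brevity
}
parameters {
  cholesky_factor_corr[K] L_Omega;   // correlations among the latent log effects
  vector<lower=0>[K] tau;
  matrix[K, N] z;                    // standardized latent effects (non-centered)
}
transformed parameters {
  matrix[K, N] b = diag_pre_multiply(tau, L_Omega) * z;  // correlated log-scale random effects
}
model {
  L_Omega ~ lkj_corr_cholesky(2);
  tau ~ normal(0, 1);
  to_vector(z) ~ std_normal();
  for (n in 1:N)
    y[n] ~ poisson_log(to_vector(log_mu[n]) + col(b, n));
}
```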