Joint Modeling of Mixed Outcome Types

Hi all!

Broad question:
In Stan, is it possible to jointly model a multivariate response vector of mixed data types?

I borrow heavily from these two papers for the generalities:
Gueorguieva_Agresti_2001.pdf (1.3 MB)
Zhang_EtAl_2014.pdf (1.5 MB)

Suppose I have an observed response vector Y_i = (C_i, O_i) where each individual/row in my dataset has a series of continuous measurements C and discrete ordinal scores O. Separately, these models are rather straightforward (although certainly not trivial) in Stan.

In the continuous case C we can simply design a MVN model with a mean vector \mathbf{\mu} and covariance structure \Sigma. The code here MVN code demonstrates an example from my own use case. Further, in the ordinal case O, one can use @bgoodri 's parameterization here that is a parameterization of a tMVN and the GHK algorithm.

The crux of this post is if I can combine these two models into a single model. That is, instead of modeling the continuous and ordinal models separately, can they be modeled jointly in a way that captures the correlation/relationship between each outcome according to some predictor x. Back to Zhang et al., from above we assumes the continuous variables C_i follows a MVN with some mean function \mu and covariance structure \Sigma. We also assume the ordinals variables O_i follow a multivariate probit model with continuous normal latent variable ZO_i also with some mean \mu and covariance \Sigma. Thus I have a final response vector Z_i = (C_i, ZO_i). At the end of the day I want to model Z_i \sim MVN(\mu_i, \Sigma).

This is where I am stuck. Looking through the forum, @CerulloE has adapted the probit code for both binary and polychotomous responses. My questions is if I can extend it or adapt it to incorportate continuous and ordinal (both binary and polychotomous) responses.

Thank you!

You can do this with a normal copula. Code for mixed discrete is at Helpful Stan Functions: Mixed Discrete-Continuous Gaussian Copula Functions.

It does not have ordinal outcomes yet. @andrjohns ported this to brms and has a branch where you can use brms R code to model the mixed outcomes. He may have added ordinal. Anyway, you can follow the idea and code up the ordinal marginal to use with the normal marginals.

1 Like

Thanks @spinkney! The key limitation with the mixed-type outcome modelling is that it requires the CDF for each outcome type. Unfortunately, Stan does not yet have an ordinal CDF.

It’s something that I’m planning on adding eventually, but I can’t say for certain when I’ll get to it.

Thank you, @spinkney and @andrjohns !

So I will first admit, I’m a novice at the usage and implementation of copulas - apologies if I make 0 sense or way off base with my train of thought.

I’ve been scanning your helpful Stan functions as well as your example code for a general multi_normal copula here. Please correct me if I am wrong, but essentially the workflow is as follows:

  1. use the normal marginal function from your link on the continuous variables in my dataset - this returns a 2D array of matrices where rtn[1] is the centered random variable and rtn[2] is the jacobian adjustment.

  2. use some sort of “ordinal” marginal function on the ordinal variables in my dataset. Now, I know you said this doesn’t currently exist technically. But, looking at @bgoodri 's tMVN/GHK parameterization and comparing it to some of your other marginals like poisson and bernoulli, it appears I want the variable z which represents my random variable and the operation on bound, which represents the jacobian adjustment. So like above, I end up with a 2D array of matrices with the random variable z and jacobian adjustment.

  3. The marginals from the normal marginal and the “ordinal” marginal can then be fed in as arguments in your centered_gaussian_copula_cholesky_lpdf log prob function.

3b) Essentially the construction here is like your code here. The exception is your likelihood statement: my for loop would be something like:

for(n to N){
y[cont,N] = normal_marginal(...)
y[ord,N] = ordinal_marginal(...) // here I mean z and bound from the tMVN code

y ~ multi_normal_cholesky_copula(L)

Am I close at all?

Yes! Take a look at this more fleshed out example by @andrjohns Gaussian Copulas · Issue #1317 · paul-buerkner/brms · GitHub

@andrjohns As I slowly try to figure out how to code up a CDF for probit outcomes (maybe similar to @bgoodri 's multi-probit code??) I was perusing here Seemingly unrelated / Multivariate Probit Regression #1366 and here Gaussian Copulas #1317.

Correct me if I am wrong, but it may be possible that in the coming weeks you’ll have a possible probit/ordinal cdf?