Suppose I have an observed response vector Y_i = (C_i, O_i) where each individual/row in my dataset has a series of continuous measurements C and discrete ordinal scores O. Separately, these models are rather straightforward (although certainly not trivial) in Stan.
In the continuous case C we can simply design a MVN model with a mean vector \mathbf{\mu} and covariance structure \Sigma. The code here MVN code demonstrates an example from my own use case. Further, in the ordinal case O, one can use @bgoodri 's parameterization here that is a parameterization of a tMVN and the GHK algorithm.
The crux of this post is if I can combine these two models into a single model. That is, instead of modeling the continuous and ordinal models separately, can they be modeled jointly in a way that captures the correlation/relationship between each outcome according to some predictor x. Back to Zhang et al., from above we assumes the continuous variables C_i follows a MVN with some mean function \mu and covariance structure \Sigma. We also assume the ordinals variables O_i follow a multivariate probit model with continuous normal latent variable ZO_i also with some mean \mu and covariance \Sigma. Thus I have a final response vector Z_i = (C_i, ZO_i). At the end of the day I want to model Z_i \sim MVN(\mu_i, \Sigma).
This is where I am stuck. Looking through the forum, @CerulloE has adapted the probit code for both binary and polychotomous responses. My questions is if I can extend it or adapt it to incorportate continuous and ordinal (both binary and polychotomous) responses.
It does not have ordinal outcomes yet. @andrjohns ported this to brms and has a branch where you can use brms R code to model the mixed outcomes. He may have added ordinal. Anyway, you can follow the idea and code up the ordinal marginal to use with the normal marginals.
Thanks @spinkney! The key limitation with the mixed-type outcome modelling is that it requires the CDF for each outcome type. Unfortunately, Stan does not yet have an ordinal CDF.
It’s something that I’m planning on adding eventually, but I can’t say for certain when I’ll get to it.
So I will first admit, I’m a novice at the usage and implementation of copulas - apologies if I make 0 sense or way off base with my train of thought.
I’ve been scanning your helpful Stan functions as well as your example code for a general multi_normal copula here. Please correct me if I am wrong, but essentially the workflow is as follows:
use the normal marginal function from your link on the continuous variables in my dataset - this returns a 2D array of matrices where rtn[1] is the centered random variable and rtn[2] is the jacobian adjustment.
use some sort of “ordinal” marginal function on the ordinal variables in my dataset. Now, I know you said this doesn’t currently exist technically. But, looking at @bgoodri 's tMVN/GHK parameterization and comparing it to some of your other marginals like poisson and bernoulli, it appears I want the variable z which represents my random variable and the operation on bound, which represents the jacobian adjustment. So like above, I end up with a 2D array of matrices with the random variable z and jacobian adjustment.
The marginals from the normal marginal and the “ordinal” marginal can then be fed in as arguments in your centered_gaussian_copula_cholesky_lpdf log prob function.
3b) Essentially the construction here is like your code here. The exception is your likelihood statement: my for loop would be something like:
for(n to N){
y[cont,N] = normal_marginal(...)
y[ord,N] = ordinal_marginal(...) // here I mean z and bound from the tMVN code
}
y ~ multi_normal_cholesky_copula(L)