Feature request: copulas for multivariate responses of mixed types

harrelfe · May 23, 2018, 11:38am

Paul I see that you list general multivariate models as an area for future development in brms. In randomized clinical trials there is a significant need for multivariate modeling of mixtures of univariate outcomes, including binary, ordinal, continuous, and time-to-event outcomes. Multivariate copulas seem to be the best way to go because researchers wish to get the usual marginal interpretation of treatment effect for each of the component outcomes. This paper by Costa and Drury is excellent, relating to a bivariate situation - a continuous outcome and a binary outcome joined with a copula. They even go so far as to allow the copula dependence parameter to vary by treatment, as in the placebo group the two responses can be decoupled (that would be a feature for the more distant future). I hope you’ll consider implementing copulas, at least ones with dependence parameters with specified priors but the parameter not varying with treatment. Thanks for considering!

paul.buerkner · May 23, 2018, 11:42am

Hi Frank,

I agree this will be an important extention for brms. Does anyone have any experience with copulas in Stan in general? My own knowledge in this area is still rather limited. Also, if we end up agreeing this is doable, we should open an issue in github (https://github.com/paul-buerkner/brms) where I keep track of all the feature requests.

harrelfe · May 23, 2018, 11:44am

Great. Sorry I didn’t think of opening an issue in Github. I hope some people with Stan experience with copulas respond. The paper I referenced does not provide any code and I’m not sure which software system they used. I’ll contact them. I’ll add this request to Github.

andre.pfeuffer · May 30, 2018, 3:18am

There is an excellent paper about using copulas on count data:

A PRIMER ON COPULAS FOR COUNT DATA
BY CHRISTIAN GENEST AND JOHANNA NESLEHOVA

Advantage of copulas are to model tail dependencies, multinormal gaussian only consider
linear dependencies between random variables.

Some work had only been done in Stan by Ben Lampert, it’s based upon the Poisson
distribution, but can easily enhanced to other discrete distributions.

Copulas are one way to go - a very flexible. This comes to a price though. Technically demanding,
but solvable. Then also there’s the need of more data or stronger priors.
One has to consider if not multivariate extensions, eg. Laguerre Polynom(s) already may be
suitable. Then it comes to the point where we have to analyse your data,
happily, if not, we already have had been replaced by some sophisticated deep learning
whats’o’ever.
Your question is to vague to give an answer. (By no means I want to offend anybody)
Just my 2 cents.

Bob_Carpenter · June 5, 2018, 2:00am

@bgoodri has been talking about adding them to Stan. He is behind most of our multivariate stats.

The usual obstacles to new features are a clear design (what does this look like to you functionally when done) and someone to do it.

bgoodri · June 5, 2018, 2:19am

More the latter. They are basically just density functions for a multivariate random variable with uniform margins, so it is not as if there is much discretion in the design.

Bob_Carpenter · June 5, 2018, 2:33am

I wasn’t talking about anything fancy here, just:

function signature,
mathematical definition, and
naming conventions.

Is there a math lib issue somewhere?

More importantly, is it something you (Ben) think we should do?

bgoodri · June 5, 2018, 5:02am

I think it should be done, but there is no issue.

The signature would be a bit different from what we currently have in Stan Math because the random variable is bivariate or multivariate. I suppose we could require a vector or array of vectors that is exactly of length two, but it would look better in the Stan language if it were something like target += clayton_copula_lpdf(foo, bar | tau);.

The generic definition is

but there are dozens of specific ones within that definition that each have their own density functions. The names are reasonably straightforward because they tend to be named after people, but there are various parameterizations. A sane library would try to parameterize as many of them as possible in terms of Kendall’s \tau or something.

Bob_Carpenter · June 5, 2018, 5:06am

That’s definitely beyond what the language will support now. How would you feel if that had to be a tuple, as in:

clayton_copula_lpdf( (foo, bar) | tau);

or

(foo, bar) ~ clayton_copula_lpdf(tau);

For now, I’d be OK with one that took an array or vector of size two.

Is the size two constraint because that’s the only CDFs we’ll be able to implement?

bgoodri · June 5, 2018, 5:14am

I think a tuple of size two or an array of tuples is fine. Most known copulas are bivariate, but some are multivariate (Gaussian is the most common multivariate one). Also, some bivariate ones have multivariate extensions, which will put some stress on our naming conventions.

Bob_Carpenter · June 5, 2018, 5:20am

This is what I meant by a naming convention design. I don’t mind must putting multi_ in front of something or overloading. If you could write that up as a math lib issue with a single example to do first, we might be able to find someone to code it.

harrelfe · June 6, 2018, 11:45am

I’m very glad to see this discussed. I can’t emphasize enough how many applications would benefit from copulas. It is the norm in randomized clinical trials, for example, to analyze all patient endpoints separately with no borrowing of information, not to mention using ad hoc frequentist type I error control based on independence of endpoints. In clinical trials costing tens of millions of $ we don’t even learn how the various patient outcomes ‘run together’ nor do we profit from the correlations in terms of frequentist (or Bayesian) power. At FDA I’m pushing the utility of computing P(drug benefits outcome 1 and/or drug benefits outcome 2) and especially for the “and” the dependence modeling is crucial.

Bob_Carpenter · June 6, 2018, 8:39pm

Looks like we’re about to get a bunch of copula functions ported to Stan from the vinecopulib package. The first target copulas are

Gaussian, student-t, clayton, gumbel, frank, bb1, bb6, bb7, bb8

if that means anything to you. There’s no strict timeframe, as we rely on volunteer contributions.

harrelfe · June 28, 2018, 11:41am

This is great news Bob. Sorry for the delayed response. I hope you get volunteers. I am not familiar with most of those copulas. I think what is important is a general framework that allows combinations of dependent variables of mixed categorical / ordinal / continuous types.

donny · September 26, 2020, 1:06pm

An alternative approach is to not use a particular copula or to assume the marginals follow a specific distribution. Rather, there is a semi-parametric copula approach based upon the rank likelihood. In this case, “Any continuous multivariate distribution can be used to form a copula model via
an inverse-CDF transformation” (p. 270). Then, similar to a probit model, the ranks are used to sample the underlying latent data. The nice aspects are that there is no need to sample the cutpoints for ordinal data and this works for “mixed data”. And, in fact, many are now using this approach in lieu of a multivariate probit model. See the R packages BGGM and BDgraph.

Here is the paper.

“EXTENDING THE RANK LIKELIHOOD FOR SEMIPARAMETRIC
COPULA ESTIMATION”

robert · February 27, 2021, 6:14pm

Hi Bob, has there been any progress towards porting those vinecopulib functions into Stan since 2018?

vianeylb · March 1, 2021, 10:07pm

Hi @Bob_Carpenter, @Robert is working on HMMs + copulas for his dissertation and we’re wondering if there are any plans to port those into Stan? Any chance y’all might know @Jonah @charlesm93 @stevebronder?

charlesm93 · March 5, 2021, 9:23pm

Hey. I for one don’t know what the plan for this is. Either someone responds to this thread, else I’ll bring it up at the next Stan meeting (next Thursday) and report here. Meetings are open and anyone is welcome to swing by if they want to discuss a topic.

stevebronder · March 6, 2021, 9:30pm

@Bob_Carpenter from your 2018 post where you talking about the stuff like the Gaussian Copula here? Is there a math issue for these? I think just writing up how to add these would be useful to getting someone to add them. A lot of these seem reasonable where the parameters in the class would just be added to the pdf etc. function signature.

spinkney · March 7, 2021, 1:12am

I’ve coded up the multivariate normal copula with an example, stan function, and R code.

I’ve gone through the math and we can do the derivatives as well as they’re not so different than the multi normal cholesky lpdf.

Topic		Replies	Views
Brms: multivariate response, different distributions brms	8	3221	July 29, 2020
Marginal_effects for multivariate models with mixed continuous/ordinal responses brms	4	483	May 15, 2019
Rsidual correlation in multivariate non gaussian model Modeling brms	1	116	March 6, 2024
Possible to model mixed families with set_rescor(TRUE)? brms	10	735	November 30, 2023
Multivariate formula with different number of observations brms cognitive-science	28	1998	August 5, 2020

Feature request: copulas for multivariate responses of mixed types

Related Topics