Feature request: copulas for multivariate responses of mixed types

Paul I see that you list general multivariate models as an area for future development in brms. In randomized clinical trials there is a significant need for multivariate modeling of mixtures of univariate outcomes, including binary, ordinal, continuous, and time-to-event outcomes. Multivariate copulas seem to be the best way to go because researchers wish to get the usual marginal interpretation of treatment effect for each of the component outcomes. This paper by Costa and Drury is excellent, relating to a bivariate situation - a continuous outcome and a binary outcome joined with a copula. They even go so far as to allow the copula dependence parameter to vary by treatment, as in the placebo group the two responses can be decoupled (that would be a feature for the more distant future). I hope youā€™ll consider implementing copulas, at least ones with dependence parameters with specified priors but the parameter not varying with treatment. Thanks for considering!

4 Likes

Hi Frank,

I agree this will be an important extention for brms. Does anyone have any experience with copulas in Stan in general? My own knowledge in this area is still rather limited. Also, if we end up agreeing this is doable, we should open an issue in github (https://github.com/paul-buerkner/brms) where I keep track of all the feature requests.

1 Like

Great. Sorry I didnā€™t think of opening an issue in Github. I hope some people with Stan experience with copulas respond. The paper I referenced does not provide any code and Iā€™m not sure which software system they used. Iā€™ll contact them. Iā€™ll add this request to Github.

There is an excellent paper about using copulas on count data:

A PRIMER ON COPULAS FOR COUNT DATA
BY CHRISTIAN GENEST AND JOHANNA NESLEHOVA

Advantage of copulas are to model tail dependencies, multinormal gaussian only consider
linear dependencies between random variables.

Some work had only been done in Stan by Ben Lampert, itā€™s based upon the Poisson
distribution, but can easily enhanced to other discrete distributions.

Copulas are one way to go - a very flexible. This comes to a price though. Technically demanding,
but solvable. Then also thereā€™s the need of more data or stronger priors.
One has to consider if not multivariate extensions, eg. Laguerre Polynom(s) already may be
suitable. Then it comes to the point where we have to analyse your data,
happily, if not, we already have had been replaced by some sophisticated deep learning
whatsā€™oā€™ever.
Your question is to vague to give an answer. (By no means I want to offend anybody)
Just my 2 cents.

1 Like

@bgoodri has been talking about adding them to Stan. He is behind most of our multivariate stats.

The usual obstacles to new features are a clear design (what does this look like to you functionally when done) and someone to do it.

More the latter. They are basically just density functions for a multivariate random variable with uniform margins, so it is not as if there is much discretion in the design.

I wasnā€™t talking about anything fancy here, just:

  • function signature,
  • mathematical definition, and
  • naming conventions.

Is there a math lib issue somewhere?

More importantly, is it something you (Ben) think we should do?

I think it should be done, but there is no issue.

The signature would be a bit different from what we currently have in Stan Math because the random variable is bivariate or multivariate. I suppose we could require a vector or array of vectors that is exactly of length two, but it would look better in the Stan language if it were something like target += clayton_copula_lpdf(foo, bar | tau);.

The generic definition is


but there are dozens of specific ones within that definition that each have their own density functions. The names are reasonably straightforward because they tend to be named after people, but there are various parameterizations. A sane library would try to parameterize as many of them as possible in terms of Kendallā€™s \tau or something.

Thatā€™s definitely beyond what the language will support now. How would you feel if that had to be a tuple, as in:

clayton_copula_lpdf( (foo, bar) | tau);

or

(foo, bar) ~ clayton_copula_lpdf(tau);

For now, Iā€™d be OK with one that took an array or vector of size two.

Is the size two constraint because thatā€™s the only CDFs weā€™ll be able to implement?

I think a tuple of size two or an array of tuples is fine. Most known copulas are bivariate, but some are multivariate (Gaussian is the most common multivariate one). Also, some bivariate ones have multivariate extensions, which will put some stress on our naming conventions.

This is what I meant by a naming convention design. I donā€™t mind must putting multi_ in front of something or overloading. If you could write that up as a math lib issue with a single example to do first, we might be able to find someone to code it.

Iā€™m very glad to see this discussed. I canā€™t emphasize enough how many applications would benefit from copulas. It is the norm in randomized clinical trials, for example, to analyze all patient endpoints separately with no borrowing of information, not to mention using ad hoc frequentist type I error control based on independence of endpoints. In clinical trials costing tens of millions of $ we donā€™t even learn how the various patient outcomes ā€˜run togetherā€™ nor do we profit from the correlations in terms of frequentist (or Bayesian) power. At FDA Iā€™m pushing the utility of computing P(drug benefits outcome 1 and/or drug benefits outcome 2) and especially for the ā€œandā€ the dependence modeling is crucial.

5 Likes

Looks like weā€™re about to get a bunch of copula functions ported to Stan from the vinecopulib package. The first target copulas are

Gaussian, student-t, clayton, gumbel, frank, bb1, bb6, bb7, bb8

if that means anything to you. Thereā€™s no strict timeframe, as we rely on volunteer contributions.

5 Likes

This is great news Bob. Sorry for the delayed response. I hope you get volunteers. I am not familiar with most of those copulas. I think what is important is a general framework that allows combinations of dependent variables of mixed categorical / ordinal / continuous types.

An alternative approach is to not use a particular copula or to assume the marginals follow a specific distribution. Rather, there is a semi-parametric copula approach based upon the rank likelihood. In this case, ā€œAny continuous multivariate distribution can be used to form a copula model via
an inverse-CDF transformationā€ (p. 270). Then, similar to a probit model, the ranks are used to sample the underlying latent data. The nice aspects are that there is no need to sample the cutpoints for ordinal data and this works for ā€œmixed dataā€. And, in fact, many are now using this approach in lieu of a multivariate probit model. See the R packages BGGM and BDgraph.

Here is the paper.

ā€œEXTENDING THE RANK LIKELIHOOD FOR SEMIPARAMETRIC
COPULA ESTIMATIONā€

3 Likes

Hi Bob, has there been any progress towards porting those vinecopulib functions into Stan since 2018?

4 Likes

Hi @Bob_Carpenter, @Robert is working on HMMs + copulas for his dissertation and weā€™re wondering if there are any plans to port those into Stan? Any chance yā€™all might know @Jonah @charlesm93 @stevebronder?

5 Likes

Hey. I for one donā€™t know what the plan for this is. Either someone responds to this thread, else Iā€™ll bring it up at the next Stan meeting (next Thursday) and report here. Meetings are open and anyone is welcome to swing by if they want to discuss a topic.

2 Likes

@Bob_Carpenter from your 2018 post where you talking about the stuff like the Gaussian Copula here? Is there a math issue for these? I think just writing up how to add these would be useful to getting someone to add them. A lot of these seem reasonable where the parameters in the class would just be added to the pdf etc. function signature.

Iā€™ve coded up the multivariate normal copula with an example, stan function, and R code.

Iā€™ve gone through the math and we can do the derivatives as well as theyā€™re not so different than the multi normal cholesky lpdf.

15 Likes