Paul, I see that you list general multivariate models as an area for future development in `brms`. In randomized clinical trials there is a significant need for multivariate modeling of mixtures of univariate outcomes, including binary, ordinal, continuous, and time-to-event outcomes. Multivariate copulas seem to be the best way to go because researchers wish to get the usual marginal interpretation of the treatment effect for each of the component outcomes. This paper by Costa and Drury is excellent; it covers a bivariate situation: a continuous outcome and a binary outcome joined with a copula. They even go so far as to allow the copula dependence parameter to vary by treatment, as in the placebo group the two responses can be decoupled (that would be a feature for the more distant future). I hope you'll consider implementing copulas, at least ones whose dependence parameter has a specified prior but does not vary with treatment. Thanks for considering!
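To make the request concrete, here is a minimal Python sketch of one common way to join a continuous and a binary outcome with a Gaussian copula. It is not taken from the Costa and Drury paper. The function name `joint_loglik`, the normal margin for the continuous outcome, the Bernoulli margin for the binary outcome, and the choice of a Gaussian copula are all assumptions made for illustration.

```python
from math import log, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for the copula's latent scale


def joint_loglik(y, z, mu, sigma, p, rho):
    """Log joint likelihood of continuous y and binary z (hypothetical helper).

    Assumed margins: y ~ Normal(mu, sigma), z ~ Bernoulli(p).
    Assumed dependence: Gaussian copula with correlation rho in (-1, 1).
    """
    a = (y - mu) / sigma                      # normal score of the continuous outcome
    log_fy = log(NormalDist(mu, sigma).pdf(y))
    q0 = _STD.inv_cdf(1.0 - p)                # latent threshold: z = 1 above q0
    # P(z = 0 | y) is the partial derivative of the Gaussian copula in its first argument
    p0 = _STD.cdf((q0 - rho * a) / sqrt(1.0 - rho * rho))
    return log_fy + log(p0 if z == 0 else 1.0 - p0)
```

Both margins keep their usual interpretation (`mu`, `sigma`, and `p` are marginal parameters), while `rho` alone carries the dependence. The sketch does not guard against `p0` rounding to exactly 0 or 1 for extreme inputs.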

# Feature request: copulas for multivariate responses of mixed types

**harrelfe**#1

Hi Frank,

I agree this will be an important extension for brms. Does anyone have any experience with copulas in Stan in general? My own knowledge in this area is still rather limited. Also, if we end up agreeing this is doable, we should open an issue on GitHub (https://github.com/paul-buerkner/brms), where I keep track of all the feature requests.

**harrelfe**#3

Great. Sorry I didn't think of opening an issue on GitHub. I hope some people with Stan experience with copulas respond. The paper I referenced does not provide any code and I'm not sure which software system they used. I'll contact them. I'll add this request to GitHub.

There is an excellent paper about using copulas on count data: "A Primer on Copulas for Count Data" by Christian Genest and Johanna Neslehova.

An advantage of copulas is that they can model tail dependence, whereas a multivariate Gaussian only captures linear dependence between random variables.

Some work has been done in Stan by Ben Lampert. It is based on the Poisson distribution, but can easily be extended to other discrete distributions.

Copulas are one way to go, and a very flexible one. This comes at a price, though: they are technically demanding, but solvable. They also need more data or stronger priors.

One has to consider whether multivariate extensions, e.g. Laguerre polynomial(s), may already be suitable. Then it comes to the point where we have to analyse your data, happily, if we have not already been replaced by some sophisticated deep learning whatsoever.

Your question is too vague to give an answer. (By no means do I want to offend anybody.)

Just my 2 cents.

@bgoodri has been talking about adding them to Stan. He is behind most of our multivariate stats.

The usual obstacles to new features are a clear design (what does this look like to you functionally when done) and someone to do it.

**bgoodri**#6

More the latter. They are basically just density functions for a multivariate random variable with uniform margins, so it is not as if there is much discretion in the design.
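As a small illustration of "a multivariate random variable with uniform margins", the sketch below draws pairs from a bivariate Clayton copula by conditional inversion. The Clayton family and the function names are chosen here for illustration only; `theta > 0` is the usual Clayton dependence parameter.

```python
import random


def clayton_h(u, v, theta):
    """Conditional CDF P(V <= v | U = u) under a Clayton copula, theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return u ** (-theta - 1.0) * s ** (-1.0 / theta - 1.0)


def clayton_h_inverse(u, w, theta):
    """Solve clayton_h(u, v, theta) = w for v (closed form for Clayton)."""
    inner = (w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0
    return inner ** (-1.0 / theta)


def clayton_sample(n, theta, seed=0):
    """Draw n pairs (u, v); each coordinate is marginally Uniform(0, 1)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # lies in (0, 1], avoids an exact zero
        w = 1.0 - rng.random()
        pairs.append((u, clayton_h_inverse(u, w, theta)))
    return pairs
```

Pushing `u` and `v` through the inverse CDFs of any two margins then gives a joint distribution with those margins and Clayton dependence.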

I wasn't talking about anything fancy here, just:

- function signature,
- mathematical definition, and
- naming conventions.

Is there a math lib issue somewhere?

More importantly, is it something you (Ben) think we should do?

**bgoodri**#8

I think it should be done, but there is no issue.

The signature would be a bit different from what we currently have in Stan Math because the random variable is bivariate or multivariate. I suppose we could require a vector or array of vectors that is exactly of length two, but it would look better in the Stan language if it were something like `target += clayton_copula_lpdf(foo, bar | tau);`.
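For reference, the density behind such a function is short. Below is a hedged Python sketch of a bivariate Clayton copula log density. It borrows the name `clayton_copula_lpdf` from the proposed signature but is not Stan code and not an existing Stan Math function, and it takes the native Clayton parameter `theta > 0` rather than Kendall's tau.

```python
import math


def clayton_copula_lpdf(u, v, theta):
    """Log density of the bivariate Clayton copula at (u, v), theta > 0.

    c(u, v) = (1 + theta) * (u * v)^(-theta - 1)
              * (u^-theta + v^-theta - 1)^(-(2 * theta + 1) / theta)
    """
    if not (0.0 < u < 1.0 and 0.0 < v < 1.0):
        raise ValueError("u and v must lie strictly inside (0, 1)")
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (
        math.log1p(theta)
        - (theta + 1.0) * (math.log(u) + math.log(v))
        - (2.0 + 1.0 / theta) * math.log(s)
    )
```

In a model, `u` and `v` would be the marginal CDF values of the two responses, and this term would be added to the sum of the marginal log densities.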

The generic definition is

$$
C(u_1, \dots, u_d) = \Pr(U_1 \le u_1, \dots, U_d \le u_d), \qquad U_j \sim \mathrm{Uniform}(0, 1),
$$

but there are dozens of specific ones within that definition that each have their own density functions. The names are reasonably straightforward because they tend to be named after people, but there are various parameterizations. A sane library would try to parameterize as many of them as possible in terms of Kendall's $\tau$ or something.
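To illustrate what a common Kendall's tau parameterization could look like, here is a short Python sketch using the standard closed-form relationships for three families. The function names are made up for this example; only the formulas are standard.

```python
import math


def clayton_theta_from_tau(tau):
    """Clayton: tau = theta / (theta + 2), valid for 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("Clayton requires 0 < tau < 1")
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta_from_tau(tau):
    """Gumbel: tau = 1 - 1 / theta, valid for 0 <= tau < 1."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("Gumbel requires 0 <= tau < 1")
    return 1.0 / (1.0 - tau)


def gaussian_rho_from_tau(tau):
    """Gaussian: tau = (2 / pi) * asin(rho), valid for -1 <= tau <= 1."""
    if not -1.0 <= tau <= 1.0:
        raise ValueError("Gaussian requires -1 <= tau <= 1")
    return math.sin(math.pi * tau / 2.0)
```

One practical upside of this design is that a single prior on tau means the same strength of dependence regardless of which family is chosen, although the admissible range of tau differs by family.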

That's definitely beyond what the language will support now. How would you feel if that had to be a tuple, as in:

```
clayton_copula_lpdf( (foo, bar) | tau);
```

or

```
(foo, bar) ~ clayton_copula(tau);
```

For now, I'd be OK with one that took an array or vector of size two.

Is the size two constraint because those are the only CDFs we'll be able to implement?

**bgoodri**#10

I think a tuple of size two or an array of tuples is fine. Most known copulas are bivariate, but some are multivariate (Gaussian is the most common multivariate one). Also, some bivariate ones have multivariate extensions, which will put some stress on our naming conventions.

**Bob_Carpenter**#11

This is what I meant by a naming convention design. I donāt mind must putting `multi_`

in front of something or overloading. If you could write that up as a math lib issue with a single example to do first, we might be able to find someone to code it.

**harrelfe**#12

I'm very glad to see this discussed. I can't emphasize enough how many applications would benefit from copulas. It is the norm in randomized clinical trials, for example, to analyze all patient endpoints separately with no borrowing of information, not to mention using ad hoc frequentist type I error control based on independence of endpoints. In clinical trials costing tens of millions of $ we don't even learn how the various patient outcomes "run together", nor do we profit from the correlations in terms of frequentist (or Bayesian) power. At FDA I'm pushing the utility of computing P(drug benefits outcome 1 and/or drug benefits outcome 2), and especially for the "and" the dependence modeling is crucial.
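Once a joint model has been fitted, these probabilities are simple summaries of the joint posterior draws. The Python sketch below assumes each draw is a pair of treatment effects on the two outcomes, with a positive value meaning benefit. The function name and the toy numbers are invented purely for illustration and are not results from any trial.

```python
def benefit_probabilities(draws):
    """Posterior P(benefit on both) and P(benefit on at least one).

    draws: list of (effect_1, effect_2) pairs from the JOINT posterior;
    a positive effect is taken to mean the drug benefits that outcome.
    """
    n = len(draws)
    both = sum(1 for d1, d2 in draws if d1 > 0 and d2 > 0)
    either = sum(1 for d1, d2 in draws if d1 > 0 or d2 > 0)
    return {"and": both / n, "or": either / n}


# Made-up draws, only to show the call; real draws would come from the fit.
toy_draws = [(0.4, 0.1), (0.2, -0.3), (-0.1, 0.5), (0.3, 0.2), (-0.2, -0.1)]
probs = benefit_probabilities(toy_draws)
```

If the two effects were instead drawn from separate, independent fits, the "and" probability would just be the product of the two marginal probabilities, which is exactly the information loss described above.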

**Bob_Carpenter**#13

Looks like we're about to get a bunch of copula functions ported to Stan from the vinecopulib package. The first target copulas are

Gaussian, Student-t, Clayton, Gumbel, Frank, BB1, BB6, BB7, BB8

if that means anything to you. There's no strict timeframe, as we rely on volunteer contributions.