Paul I see that you list general multivariate models as an area for future development in brms
. In randomized clinical trials there is a significant need for multivariate modeling of mixtures of univariate outcomes, including binary, ordinal, continuous, and time-to-event outcomes. Multivariate copulas seem to be the best way to go because researchers wish to get the usual marginal interpretation of treatment effect for each of the component outcomes. This paper by Costa and Drury is excellent, relating to a bivariate situation - a continuous outcome and a binary outcome joined with a copula. They even go so far as to allow the copula dependence parameter to vary by treatment, as in the placebo group the two responses can be decoupled (that would be a feature for the more distant future). I hope youāll consider implementing copulas, at least ones with dependence parameters with specified priors but the parameter not varying with treatment. Thanks for considering!
Hi Frank,
I agree this will be an important extention for brms. Does anyone have any experience with copulas in Stan in general? My own knowledge in this area is still rather limited. Also, if we end up agreeing this is doable, we should open an issue in github (https://github.com/paul-buerkner/brms) where I keep track of all the feature requests.
Great. Sorry I didnāt think of opening an issue in Github. I hope some people with Stan experience with copulas respond. The paper I referenced does not provide any code and Iām not sure which software system they used. Iāll contact them. Iāll add this request to Github.
There is an excellent paper about using copulas on count data:
A PRIMER ON COPULAS FOR COUNT DATA
BY CHRISTIAN GENEST AND JOHANNA NESLEHOVA
Advantage of copulas are to model tail dependencies, multinormal gaussian only consider
linear dependencies between random variables.
Some work had only been done in Stan by Ben Lampert, itās based upon the Poisson
distribution, but can easily enhanced to other discrete distributions.
Copulas are one way to go - a very flexible. This comes to a price though. Technically demanding,
but solvable. Then also thereās the need of more data or stronger priors.
One has to consider if not multivariate extensions, eg. Laguerre Polynom(s) already may be
suitable. Then it comes to the point where we have to analyse your data,
happily, if not, we already have had been replaced by some sophisticated deep learning
whatsāoāever.
Your question is to vague to give an answer. (By no means I want to offend anybody)
Just my 2 cents.
@bgoodri has been talking about adding them to Stan. He is behind most of our multivariate stats.
The usual obstacles to new features are a clear design (what does this look like to you functionally when done) and someone to do it.
More the latter. They are basically just density functions for a multivariate random variable with uniform margins, so it is not as if there is much discretion in the design.
I wasnāt talking about anything fancy here, just:
- function signature,
- mathematical definition, and
- naming conventions.
Is there a math lib issue somewhere?
More importantly, is it something you (Ben) think we should do?
I think it should be done, but there is no issue.
The signature would be a bit different from what we currently have in Stan Math because the random variable is bivariate or multivariate. I suppose we could require a vector or array of vectors that is exactly of length two, but it would look better in the Stan language if it were something like target += clayton_copula_lpdf(foo, bar | tau);
.
The generic definition is
but there are dozens of specific ones within that definition that each have their own density functions. The names are reasonably straightforward because they tend to be named after people, but there are various parameterizations. A sane library would try to parameterize as many of them as possible in terms of Kendallās \tau or something.
Thatās definitely beyond what the language will support now. How would you feel if that had to be a tuple, as in:
clayton_copula_lpdf( (foo, bar) | tau);
or
(foo, bar) ~ clayton_copula_lpdf(tau);
For now, Iād be OK with one that took an array or vector of size two.
Is the size two constraint because thatās the only CDFs weāll be able to implement?
I think a tuple of size two or an array of tuples is fine. Most known copulas are bivariate, but some are multivariate (Gaussian is the most common multivariate one). Also, some bivariate ones have multivariate extensions, which will put some stress on our naming conventions.
This is what I meant by a naming convention design. I donāt mind must putting multi_
in front of something or overloading. If you could write that up as a math lib issue with a single example to do first, we might be able to find someone to code it.
Iām very glad to see this discussed. I canāt emphasize enough how many applications would benefit from copulas. It is the norm in randomized clinical trials, for example, to analyze all patient endpoints separately with no borrowing of information, not to mention using ad hoc frequentist type I error control based on independence of endpoints. In clinical trials costing tens of millions of $ we donāt even learn how the various patient outcomes ārun togetherā nor do we profit from the correlations in terms of frequentist (or Bayesian) power. At FDA Iām pushing the utility of computing P(drug benefits outcome 1 and/or drug benefits outcome 2) and especially for the āandā the dependence modeling is crucial.
Looks like weāre about to get a bunch of copula functions ported to Stan from the vinecopulib package. The first target copulas are
Gaussian, student-t, clayton, gumbel, frank, bb1, bb6, bb7, bb8
if that means anything to you. Thereās no strict timeframe, as we rely on volunteer contributions.
This is great news Bob. Sorry for the delayed response. I hope you get volunteers. I am not familiar with most of those copulas. I think what is important is a general framework that allows combinations of dependent variables of mixed categorical / ordinal / continuous types.
An alternative approach is to not use a particular copula or to assume the marginals follow a specific distribution. Rather, there is a semi-parametric copula approach based upon the rank likelihood. In this case, āAny continuous multivariate distribution can be used to form a copula model via
an inverse-CDF transformationā (p. 270). Then, similar to a probit model, the ranks are used to sample the underlying latent data. The nice aspects are that there is no need to sample the cutpoints for ordinal data and this works for āmixed dataā. And, in fact, many are now using this approach in lieu of a multivariate probit model. See the R packages BGGM and BDgraph.
Here is the paper.
āEXTENDING THE RANK LIKELIHOOD FOR SEMIPARAMETRIC
COPULA ESTIMATIONā
Hi Bob, has there been any progress towards porting those vinecopulib functions into Stan since 2018?
Hi @Bob_Carpenter, @Robert is working on HMMs + copulas for his dissertation and weāre wondering if there are any plans to port those into Stan? Any chance yāall might know @Jonah @charlesm93 @stevebronder?
Hey. I for one donāt know what the plan for this is. Either someone responds to this thread, else Iāll bring it up at the next Stan meeting (next Thursday) and report here. Meetings are open and anyone is welcome to swing by if they want to discuss a topic.
@Bob_Carpenter from your 2018 post where you talking about the stuff like the Gaussian Copula here? Is there a math issue for these? I think just writing up how to add these would be useful to getting someone to add them. A lot of these seem reasonable where the parameters in the class would just be added to the pdf etc. function signature.
Iāve coded up the multivariate normal copula with an example, stan function, and R code.
Iāve gone through the math and we can do the derivatives as well as theyāre not so different than the multi normal cholesky lpdf.