New formatting lpdf(.. | .., ..) for for multiple conditionalities

specification

#1

Hello Community,

as much as I like the formulation lpdf(… | …, …)

it doesn’t accommodate cases in which two variables are conditional to other variables

lpdf(x, y | w, z, p, q) like in this case Gamma process - is this feasible?

indeed to make stan work i need to write lpdf(x | y, w, z, p, q) even it doesn’t reflect the conditionality


#2

Not necessarily. All you have to do is: vector[2] obs = [x, y]', then *_lpdf(obs | w, z, p, q).

That’s ultimately what you want.


#3

Thanks,

didn’t know that was possible. I changed the title.


#4

Daniel’s right in that you can return a vector or array now. We’ll introduce a tuple<T1, T2, ..., TN> type soon that will allow for grouping in function arguments and returns (including densities).

Another notation problem is when we have a mixed distribution that’s neither a pdf or pmf (that is, it’s mixed and has both discrete and continuous components). I don’t know what we’ll do about that yet. @betanalpha?


#5

Any such mixed densities would always be of the form pdf( continuous | discrete) pmf(discrete) or pmf(discrete | continuous) pdf(continuous) and it would be far more natural to keep two densities separated.


#6

From a programming perspective, I could see someone generating a pair of a simplex and a multinomial draw. Sure, it’d be drawing the simplex from some distribution then the multinomial conditional on that, but it’d make sense to put the shared functionality together if some client code wanted access to the outcome and the latent parameter in a Dirichlet-multinomial. I could imagine cases where it would be motivated by efficiency or encapsulation. I can also see using it for something like the sufficient statistics form of the multivariate normal where you get a discrete count and then the first and second sample moments or something like that; but then for that case, the discreteness of the count isn’t particularly meaningful as we wouldn’t be modeling it.

The posterior is just such a mixed density when we have discrete RNGs in generated quantities. It’s factored according to Michael’s second case Michael suggests.


#7

I am sorry to reactivate this thread, but I disagree that all mixed densities necessarily are necessarily combinations of the pdf and pmf. A case in point is the already implemented Wiener distribution which is a bivariate distribution defined over pairs of strictly positive RTs and a binary variable. The most natural way for the Wiener would indeed be:
wiener_lpdf(rt, response| alpha, tau, beta, delta)
Currently, the PDF has to be called twice for each response with (1-beta) and -delta for rts hitting the lower bound. This was already discussed on the list some time ago where the original author of the Wiener PDF jumped in and said the same: https://groups.google.com/d/msg/stan-users/-6wJfA-t2cQ/Y0xLTZlZCAAJ

Especially for bivariate PDFs with possibly more than two responses (e.g., the LBA), a specification via a tuple or explicitly with two values before the | would be ideal.


#8

In theory, if you have a discrete component and continuous component, you can always decompose their joint probability function p(a,b) into p(a|b) * p(b). In Stan as of v2.15, you pretty much have to build up such a thing the way Michael suggests. One example would be building a mixture of a Bernoulli and a normal. You could apply it to data, but you can’t give a parameter that distribution because Stan can’t sample from distributions with discrete components.