Target+= and multivariate likelihood


#1

This query arises from something I was wondering about in another thread (Multiple outcomes hurdle model - exists?), where at one stage I’m trying to build a multivariate lognormal model. Because of the change of variables involved, I understand I need to make an adjustment to the model log density. My confusion is about how exactly to do this in the multivariate case, which in turn stems from my poor understanding of the structure of the model log density (i.e. is it simply a single number, or, because I have a multivariate outcome, is the model log density also multivariate?).

Full model here: linear_mvlognorm.stan (1.8 KB). To put it into context, the salient parts of my model are:

data{
    int<lower=0> NvarsY;   // num dependent variables
    ....
    vector[NvarsY] y [N];  // data for dependent vars
}

...

model{
    ....
    //likelihood
    log(y) ~ multi_normal_cholesky(mu, L_Sigma);
    for (dv in 1:NvarsY) 
        target += -log(y[dv]);
}

What I’m unclear on is the last two lines: I don’t know whether I’m adding the -log(y) term correctly given that y is multivariate, and I don’t know what form target += should take. Should I be looping over 1:N, or over 1:NvarsY as in the example, or is -log(y) as a whole entity sufficient, or should I be summing things over y?


#2

Sorry for not replying in that earlier thread.

The log (posterior) probability is always “only” one number for each iteration in Stan. Remember that (putting it really simply and omitting the normalizing constant) we are looking for p(y|\theta)p(\theta), where p(y|\theta) is the “likelihood” and p(\theta) is the prior. This whole thing p(y|\theta)p(\theta) is the posterior probability (again, omitting the normalizer). Taking the log of this gives us the log posterior: \log p(y|\theta) + \log p(\theta). This is what we are interested in; it’s the target. And you see that the log turns all the products into additions. So as long as we are on the log scale, we can just happily add to this target. For example, if the likelihood has two parameters, then we just add another prior to the target, e.g. \log p(y|\theta,\phi) + \log p(\theta) + \log p(\phi).
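As a quick numerical illustration of “the target is one number per iteration” (a Python sketch, not Stan; the data and priors here are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

# Toy data and one hypothetical candidate parameter draw
y = np.array([1.2, 0.7, 2.3])
mu, sigma = 1.0, 0.5

log_lik = norm.logpdf(y, loc=mu, scale=sigma).sum()   # log p(y | mu, sigma)
log_prior = norm.logpdf(mu, 0, 10) + norm.logpdf(sigma, 0, 5)  # made-up priors

# The whole log posterior (up to a constant) is just a sum of scalars
target = log_lik + log_prior
```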

Take the univariate log-normal as a simple example. Let’s say we transform y_i to \log(y_i), with i=1,\dots,N observations. The log absolute Jacobian determinant is -\log(y_i), so

\sum_{i=1}^N \left( \log p(\log(y_i)|\mu,\sigma) -\log(y_i) \right) + \log p(\mu) + \log p(\sigma)

or

\sum_{i=1}^N (\log p(\log(y_i)|\mu,\sigma)) - \sum_{i=1}^N(\log(y_i)) + \log p(\mu) + \log p(\sigma)

to make it more explicit. \sum_{i=1}^N \log p(\log(y_i)|\mu,\sigma) is target += normal_lpdf(log(y) | mu, sigma) – note that this is vectorized in Stan (over all i) when we have vector[N] y in the data block! Note that log(y) ~ normal(mu, sigma) is essentially the same as the expression above; the sampling (~) statement drops constants, but this is not really important here. So this evaluates to a single number (per Stan iteration).
(The terms \log p(\mu) and \log p(\sigma) are the priors on \mu and \sigma, but this is not really important here, note that they also evaluate to one number for each iteration in Stan.)
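To see the change of variables at work numerically, here is a quick check in Python (using scipy, with toy values of my own choosing): the log-normal log density of y equals the normal log density of log(y) plus the Jacobian term -log(y).

```python
import numpy as np
from scipy.stats import norm, lognorm

y = np.array([0.5, 1.0, 2.5])   # toy positive outcomes
mu, sigma = 0.3, 0.8

# Direct log-normal log density (scipy parameterization: s=sigma, scale=exp(mu))
direct = lognorm.logpdf(y, s=sigma, scale=np.exp(mu)).sum()

# Normal log density of log(y) plus the Jacobian adjustment -log(y)
via_change_of_vars = (norm.logpdf(np.log(y), loc=mu, scale=sigma) - np.log(y)).sum()
```

Both routes give the same log density, which is exactly why the -log(y) correction is needed when you write the model on the log(y) scale.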
So, now coming to your question, you might think that -\sum_{i=1}^N \log(y_i) would translate to target += -sum(log(y));. And you would be correct! Actually, you could also make it even more explicit and do

for (n in 1:N)
  target += -log(y[n]);

…same thing! And finally, the “simple” target += -log(y) evaluates to the same number/increase in the target as the two other formulations (when you pass a vector to target +=, Stan sums its elements).
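A small numerical sanity check (a Python stand-in for the Stan statements, with toy data): the loop, the explicit sum, and the vectorized form all add the same scalar to the target.

```python
import numpy as np

y = np.array([0.5, 1.0, 2.5])    # toy positive outcomes

looped = 0.0
for n in range(len(y)):          # mirrors: for (n in 1:N) target += -log(y[n]);
    looped += -np.log(y[n])

summed = -np.sum(np.log(y))      # mirrors: target += -sum(log(y));
vectorized = np.sum(-np.log(y))  # mirrors: target += -log(y); (Stan sums the vector)
```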

Long story short: if you have a multivariate normal (or log-normal) model, the log posterior probability still evaluates to one number per iteration. Since everything is on the log scale you simply add stuff (+=), and so for each dependent variable in vector[NvarsY] y[N]; you just add the Jacobian correction by looping over NvarsY.

EDIT: the last bit is wrong… see below


#3

Great answer @Max_Mantei - again thank you very much.


#4

Sorry, I got something wrong in the last bit of the previous post.

When you have vector[NvarsY] y[N], then y[n] gives you the nth entry along the N dimension – so the result is a vector of length NvarsY. The correct way is to properly loop over N, thus:

for (n in 1:N)
  target += -log(y[n]);
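A Python mock-up of the indexing (assuming y is stored as an N x NvarsY array, matching vector[NvarsY] y[N]): looping over N and summing each length-NvarsY slice accumulates the full Jacobian correction over all N*NvarsY entries, which is what looping over NvarsY fails to do when NvarsY < N.

```python
import numpy as np

rng = np.random.default_rng(0)
N, NvarsY = 5, 3
y = rng.lognormal(size=(N, NvarsY))  # stand-in for vector[NvarsY] y[N]

target = 0.0
for n in range(N):
    # y[n] is a length-NvarsY row; Stan sums a vector passed to target +=
    target += -np.log(y[n]).sum()
```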

Sorry, that bit did not carry over as seamlessly from the univariate case as I thought. So, here are two things to be careful about:

  • Correct indexing (maybe use the print() statement to debug)
  • The sum of logs is not the same as the log of sums… (Think about what comes first: first work out the target for one observation, then sum over all observations.)
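The second bullet can be demonstrated in a couple of lines of Python (toy numbers of my own choosing):

```python
import numpy as np

y = np.array([0.5, 1.0, 2.5])

sum_of_logs = np.sum(np.log(y))  # what the Jacobian correction needs
log_of_sum = np.log(np.sum(y))   # a different quantity entirely
```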