Pointwise log-likelihood for a multi-response Poisson model

Consider a multi-response (i.e., multivariate) model with correlations between responses j \in \{1, \ldots, J\} captured by correlated observation-level random effects \nu, and a Poisson observation model for each response.

y_{i,j} \sim \text{Poisson}(\lambda_{i,j})
\text{log}(\lambda_{i,j}) = b_{0_j} + \nu_{i,j}
\nu_i \sim \text{MultiNormal}(\vec{0}, \Sigma) ~\forall~ i \in I

To calculate the pointwise log-likelihood in a way that is sensible for LOO-CV, I am thinking that you need to include both the latent \nu and the observed y. But of course then you have the problem of the non-factorizability of the multivariate normal. However, we can derive the conditional means and standard deviations using the approach from: BĂĽrkner, P. C., Gabry, J., & Vehtari, A. (2021). Efficient leave-one-out cross-validation for Bayesian non-factorized normal and Student-t models. Computational Statistics, 36(2), 1243–1261.

So, putting the pieces together, the pointwise log-likelihood would be:

\text{log_lik}_{i,j} = \text{normal_lpdf}(\nu_{i,j} ~|~ \text{conditional_mu}, \text{conditional_sigma}) \\ + \text{poisson_lpmf}(y_{i,j} ~|~ \text{exp}(b_{0_j} + \nu_{i,j}))
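Here, by conditional_mu and conditional_sigma I mean the usual Gaussian conditionals of \nu_{i,j} given the remaining components \nu_{i,-j} (zero mean, since \nu_i \sim \text{MultiNormal}(\vec{0}, \Sigma); the -j notation is mine):

\text{conditional_mu}_{i,j} = \Sigma_{j,-j} \Sigma_{-j,-j}^{-1} \nu_{i,-j}
\text{conditional_sigma}_{j} = \sqrt{\Sigma_{j,j} - \Sigma_{j,-j} \Sigma_{-j,-j}^{-1} \Sigma_{-j,j}}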

So conditional_mu and conditional_sigma are based solely on the latent \nu values and the covariance matrix \Sigma. This formulation takes into account both the latent observation-level parameters and the observed values. Am I on the right track here?

Given that \nu is a matrix, I’m assuming you mean that it’s mapped over the first dimension, so that it’s

\nu_i \sim \text{MultiNormal}(0, \Sigma) for i \in \{ 1, \ldots, I \},

where \nu_i = (\nu_{i, 1}, \ldots, \nu_{i, J}). That means \Sigma is J \times J and b_{0_j} must be a scalar or an I-vector.
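In Stan declarations, that reading would look something like the following sketch (taking each b_{0_j} to be a scalar, so b_0 is a J-vector; names are illustrative):

parameters {
  matrix[I, J] nu;       // observation-level random effects, row i holds nu_i
  cov_matrix[J] Sigma;   // covariance across the J responses
  vector[J] b0;          // per-response intercepts b_{0_j}
}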

Rather than trying to do leave-one-out over the y_{i, j}, with multivariate data it usually makes more sense to do leave-one-group-out over the y_i = (y_{i,1}, \ldots, y_{i, J}), mainly because that's the unit that's usually (conditionally) exchangeable. It is also common in this case to marginalize out the \nu to leave just the dependency on \Sigma and b_0, but you get that for free with Stan if you just report

\mathcal{L}(y_i) = \log \text{Poisson}(y_i \mid \lambda_i) = \sum_{j=1}^J \log \text{Poisson}(y_{i, j} \mid \lambda_{i, j}),

which can be coded using Stan's vectorization directly (writing lambda[i] here for the log-scale linear predictor b_0 + \nu_i) as

log_lik[i] = poisson_lpmf(y[i] | exp(lambda[i]));

but it is more efficient to avoid the exp() by using the Poisson with a log-scale parameter, as follows:

log_lik[i] = poisson_log_lpmf(y[i] | lambda[i]);
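In a generated quantities block this could look something like the following sketch (assuming y is declared as array[I, J] int, b0 as vector[J], and nu as matrix[I, J], so that b0 + nu[i]' is the log-scale predictor that lambda[i] denotes above; names are illustrative):

generated quantities {
  // one entry per exchangeable unit y[i]; the vectorized lpmf sums over the J responses
  vector[I] log_lik;
  for (i in 1:I) {
    log_lik[i] = poisson_log_lpmf(y[i] | b0 + nu[i]');
  }
}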

The multivariate normal marginalizes cleanly, but the level of exchangeability is usually over the whole vector. There was just a discussion of how to do this in a recent thread about cross-validation.

Hi Bob,

Thank you for your reply! I’ll try to restate what you wrote to check my understanding:

(1) It usually makes more sense to consider the “point” in the pointwise likelihood to be the vector for each row y_i rather than each observation y_{i,j}. The Bürkner et al. paper that I referenced in the original post considers cases where we truly do want a pointwise likelihood over each observation y_{i,j}.

(2) Assuming we did want to calculate the pointwise likelihood over observations rather than rows, it is not necessary to include the latent observation-level random effects, because we already marginalize over \nu when using poisson_log_lpmf(y_{i,j} | b_{0_j} + \nu_{i,j}). This differs from the multivariate normal and Student-t case, because here the latent variables and the observation model are already partitioned.

I have a follow-up question. Let's say that instead of multiple Poisson responses, j = 1 is a Gaussian response and j = 2 is a Poisson response, in which case:

\begin{bmatrix} y_{i1} - b_{0_1} \\ \nu_{i2} \end{bmatrix} \sim \text{MultiNormal}\left(\mathbf{0}, \Sigma\right) \quad \forall~ i \in I
y_{i2} \sim \text{Poisson}(\exp(b_{0_2} + \nu_{i2}))

Now, say we want to calculate the likelihood for each row i. Here I am thinking we need the conditional mean and standard deviation for the Gaussian response?

log_lik[i] = normal_lpdf(y_{i1} - b_{0_1} | cond_mean, cond_sd) + poisson_log_lpmf(y_{i2} | b_{0_2} + \nu_{i,2})
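Here, to spell out what I mean by cond_mean and cond_sd, I have in mind the standard bivariate-normal conditional of the Gaussian component given \nu_{i2}, writing the entries of \Sigma as \sigma_1^2, \sigma_{12}, \sigma_2^2 (notation is mine):

\text{cond_mean}_i = \frac{\sigma_{12}}{\sigma_2^2} \nu_{i2}, \qquad \text{cond_sd} = \sqrt{\sigma_1^2 - \frac{\sigma_{12}^2}{\sigma_2^2}}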

You include the effect, but you don’t include the densities of the random effect.

If you’re just fitting a multivariate normal to data, then there are no random effects, so nothing to marginalize out.

Yes, I think what you wrote is correct here. Technically, you’re generating the data y_i, which has two components, y_{i,1} and y_{i,2}. I would’ve written \text{normal}(y_{i,1} | \text{condMean} + b_{0_1}, \text{condSd}) to make that clear, but it works out to the same value as what you wrote. So you marginalize the \nu_{i,2} value out as that’s not part of the observed data.

This is getting subtle enough that I’d be more comfortable if @avehtari could check my work.

P.S. I think it’d be easier if you wrote

[y_{i,1} \ \ \nu_{i, 2}]^\top \sim \text{MultiNormal}(b_0, \Sigma).


If you are sampling \lambda_{i,j}, you can easily do LOO-CV using

log_lik[i,j] = poisson_lpmf(y[i, j] | lambda[i,j]);

and then at the end convert this to a vector of length I*J so that you can use it with the loo package.
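For example, in generated quantities this could be something like the following sketch (assuming y is declared as array[I, J] int and lambda as matrix[I, J] of rates; adapt the names to your model):

generated quantities {
  // observation-wise log-likelihood, stored as one vector of length I * J for loo
  vector[I * J] log_lik;
  for (i in 1:I) {
    for (j in 1:J) {
      log_lik[(i - 1) * J + j] = poisson_lpmf(y[i, j] | lambda[i, j]);
    }
  }
}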

If you would prefer to leave the ith row out (I didn't see you mentioning what i is indexing), you can do

log_lik[i] = poisson_lpmf(y[i] | lambda[i]);

where poisson_lpmf() sums the lpmfs over j. Using this log_lik with the loo package then gives you the expected log joint predictive probability, which will be more sensitive to the correlation part of the model. It is likely that this latter approach would be better for model comparison.

In these cases, explicitly presenting \lambda_{ij} makes the likelihood factorized and easy to handle, and the situation is different from the non-factorized case presented in the paper you mention. The non-factorized case can really be solved only for normal and Student-t models; for other observation models you need to present \lambda_{ij} (or the corresponding parameter) explicitly or resort to normal approximations (as the INLA software does).


Thanks, @avehtari!


Thanks, @avehtari and @Bob_Carpenter! I had been thinking about some subtleties with observation-level random effects and had started to think I had been calculating the pointwise likelihood wrong in the past, but what you say here makes sense. The more complex case of one Normal and one Poisson response in my first reply is what motivated the delve into factorization.