This post is especially for @avehtari and others who are well-versed in questions about information criteria.
I have a Poisson regression model that, in principle, has a single response variable. However, because of a few somewhat complicated random effects structures (not show in this example), it made sense to split up the estimation into multiple groups/functions. For simplicity, imagine we are estimating the respective outcomes for males and females via separate likelihoods. Assuming different samples sizes for the two (N_A1 and N_A2), the model then looks something like this:
model {
//group 1 -- males
for ( i in 1:N_A1 ) {
lambda_A1[i] = a
;
}
//group 2 -- females
for ( i in 1:N_A2 ) {
lambda_A2[i] = a
;
}
y_A1 ~ poisson_log( lambda_A1 );
y_A2 ~ poisson_log( lambda_A2 );
}
Meanwhile, my colleagues are interested in having information criteria (LOOIC, WAIC, etc.) for the models that we estimate. In general, I understand that the log_likelihood is not well defined for multivariate Poisson models. However, question number 1 – my assumption is that since this isn’t truly a multivariate response, it would be okay to estimate the log_lik for each of the functions. Is that right?
Assuming that it would be okay, I’m not quite sure how to generate log_lik quantities that are compatible with some of the convenience functions that take the log_lik from Stan models to generate summary calculations. In other words, the following represents my first pass at doing this, with the evident need to generate a separate log_lik quantity for each of the functions:
generated quantities{
vector[N_A1] log_lik_A1;
vector[N_A2] log_lik_A2;
real lambda_A1;
real lambda_A2;
lambda_A1 = a;
lambda_A1 = exp(lambda_A1);
lambda_A2 = a;
lambda_A2 = exp(lambda_A2);
for ( i in 1:N_A1 ) log_lik_A1[i] = poisson_lpmf( y_A1[i] | lambda_A1 );
for ( i in 1:N_A2 ) log_lik_A2[i] = poisson_lpmf( y_A2[i] | lambda_A2 );
}
However, it seems that the log_lik for the respective observations would ideally end up as a single vector of log_liks rather than the two separate vectors that are defined here. Question number 2 – is there a way to accomplish that?
(The alternative would be for me to recode the model as a single likelihood function, but the indexing will get somewhat intense because there are actually closer to 30 sets of correlated random effects, not 2, but with common variances/covariances and fixed effect parameters across the functions.)