Calculating the likelihood for loo in a binomial mixture model



This is my first real attempt at using Stan (coming from a BUGS background). I am an ecologist trying to run a binomial mixture model through rstan. I think I have the model running well, but I’m trying to capture the pointwise likelihood for use in loo. [EDIT] Below is my code:

// Binomial mixture model with covariates
data {
  int<lower=0> R;       // Number of transects
  int<lower=0> T;       // Number of temporal replications
  int<lower=0> y[R, T]; // Counts
  vector[R] elev;          // Covariate
  vector[R] elev2;          // Covariate
  vector[R] litter;          // Covariate
  matrix[R, T] RH;      // Covariate
  matrix[R, T] temp;      // Covariate
  matrix[R, T] temp2;      // Covariate
  vector[R] gcover;      // Covariate
  vector[R] gcover2;      // Covariate
  int<lower=0> K;       // Upper bound of population size
}

transformed data {
  int<lower=0> max_y[R];
  int<lower=0> N_ll;
  int foo[R];

  for (i in 1:R)
    max_y[i] = max(y[i]);
  for (i in 1:R)
    foo[i] = K - max_y[i] + 1;
  N_ll = sum(foo);
}

parameters {
  real alpha0;
  real alpha1;
  real alpha2;
  real alpha3;
  real beta0;
  real beta1;
  real beta2;
  real beta3;
  real beta4;
  real beta5;
}

transformed parameters {
  vector[R] log_lambda; // Log population size
  matrix[R, T] logit_p; // Logit detection probability

  log_lambda = alpha0 + alpha1 * elev + alpha2 * elev2 + alpha3 * litter;
  // beta5 * gcover2 included so the declared quadratic term is actually used
  logit_p = beta0 + beta1 * RH + beta2 * temp + beta3 * temp2
            + rep_matrix(beta4 * gcover + beta5 * gcover2, T);
}

model {
  // Priors
  // Improper flat priors are implicitly used on
  // alpha0 ... alpha3 and beta0 ... beta5.

  // Likelihood
  for (i in 1:R) {
    vector[K - max_y[i] + 1] lp;

    for (j in 1:(K - max_y[i] + 1))
      lp[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
              + binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
    target += log_sum_exp(lp);
  }
}

generated quantities {
  int<lower=0> N[R];    // Abundance
  int totalN;
  vector[R] log_lik;
  real mean_abundance;
  matrix[R, T] p;

  // Likelihood for use in loo-cv
  for (i in 1:R) {
    vector[K - max_y[i] + 1] ll;

    for (j in 1:(K - max_y[i] + 1))
      ll[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
              + binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
    log_lik[i] = log_sum_exp(ll);
  }

  for (i in 1:R) {
    N[i] = poisson_log_rng(log_lambda[i]);
    p[i, 1:T] = inv_logit(logit_p[i, 1:T]);
  }
  totalN = sum(N);  // Total pop. size across all sites
  mean_abundance = exp(alpha0);
}

But I’m not sure how to capture the log likelihood. I’m currently trying the following in the generated quantities block:

generated quantities {
  vector[R] log_lik;

  for (i in 1:R) {
    vector[K - max_y[i] + 1] ll;

    for (j in 1:(K - max_y[i] + 1))
      ll[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
              + binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
    log_lik[i] = log_sum_exp(ll);
  }
}

I’m not sure if this is capturing what loo needs, or even whether it computes the likelihood properly. I am not even sure whether I need R log-likelihood values, or as many values as the total length of the ll vectors across all R sites (N_ll of them, I think)?

My other question about this model is whether the binomial part of lp is specified correctly. I am assuming it computes the probability for the whole series (a vector of length T) at each of the R sites, as opposed to needing another for loop over the T visits to each site (making it y[i, t] and logit_p[i, t]).

Thank you for any help while I try to learn these models better along with Stan and the associated samplers. I can also submit my whole script and data if needed, but they are both a mess at the moment.


Hi Daniel, could you include the whole Stan code? Without the data and parameters blocks there is too much guessing. I think I could figure out that R is the number of sites and T is the number of visits? What is K? Do you want to predict for a new site, or for old sites but new visits?


I added the full Stan code as I currently have it. You are correct that R is the number of sites and T is the number of visits to each site. K is the maximum possible (or larger) population size at any site. The likelihood can then be calculated for each value from the minimum population size (max_y, the maximum number counted on any single visit) to K, with the goal of estimating the true population size N that lies somewhere between max_y and K.
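For intuition, here is a minimal pure-Python sketch (not Stan; the helper names are my own and all numbers are made up) of the marginalization described above: for one site, the latent abundance N is summed out from max_y up to the bound K.

```python
import math

def log_sum_exp(xs):
    # Numerically stable log(sum(exp(xs))), as Stan's built-in computes it.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def poisson_log_lpmf(n, log_lam):
    # log Poisson pmf with log-rate parameterization, matching Stan's
    # poisson_log_lpmf(n | alpha) = n * alpha - exp(alpha) - log(n!).
    return n * log_lam - math.exp(log_lam) - math.lgamma(n + 1)

def binomial_lpmf(y, n, p):
    # log Binomial(y | n, p) for a single visit.
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))

def site_log_lik(y, log_lam, p, K):
    """log p(y | lambda, p) for one site with N marginalized out,
    mirroring the inner loop of the model block."""
    max_y = max(y)
    lp = []
    for N in range(max_y, K + 1):
        lp.append(poisson_log_lpmf(N, log_lam)
                  + sum(binomial_lpmf(yt, N, p) for yt in y))
    return log_sum_exp(lp)

# Hypothetical numbers: T = 3 visits, log(lambda) = 2, constant detection p = 0.5
ll = site_log_lik([3, 5, 2], 2.0, 0.5, K=50)
```

As long as K is comfortably above the plausible range of N, raising it further changes the marginal likelihood only negligibly, which is a quick way to check that the chosen K is "large enough".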

My goals are to 1) estimate the parameters in the regressions (done well currently, I think), 2) estimate N[i] and, to a lesser extent, p[i, t] (also done well, I think), 3) evaluate model fit (Bayesian p-value, calculation not included) and loo-cv, and 4) compare models with WAIC. Predicting to new sites is probably of some interest.



Excellent, it’s clearer now.

loo-cv and WAIC estimate the same thing, and there is no need to compute them both. PSIS-LOO in the loo package has better diagnostics than WAIC and is thus more reliable. In easy cases PSIS-LOO and WAIC give very similar results. In difficult cases WAIC is more likely to fail, and if the results differ, PSIS-LOO is either more accurate or gives a clear indication that both are failing. Thus computing WAIC does not give you any additional information.

Neither loo-cv nor WAIC measures some arbitrary notion of model fit. In the usual case they both correspond to estimating the predictive performance for a single observation at one of the current sites, that is, how well you would predict the observation for the next visit to one of the current sites.

I now noticed that you have

binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);

which already sums over the visits.
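To see that summing numerically, here is a pure-Python sketch (hypothetical counts and detection values; the scalar lpmf is reimplemented by hand): Stan's vectorized `binomial_logit_lpmf(y[i] | N, logit_p[i])` over the T visits returns exactly the sum of the T scalar log-pmfs.

```python
import math

def binomial_logit_lpmf(y, n, alpha):
    """log Binomial(y | n, inv_logit(alpha)) for a single visit."""
    p = 1.0 / (1.0 + math.exp(-alpha))
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + y * math.log(p) + (n - y) * math.log1p(-p))

y_i = [3, 5, 2]               # counts on T = 3 visits to site i (made up)
logit_p_i = [0.1, -0.4, 0.3]  # per-visit detection on the logit scale (made up)
N = 8                         # one candidate population size

# Stan's vectorized call returns the sum of the per-visit terms:
joint = sum(binomial_logit_lpmf(y, N, a) for y, a in zip(y_i, logit_p_i))
```

So no inner loop over t is needed in the Stan code; the vectorized call already produces the joint log-probability of the whole visit series for that site.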
In the model block

target += log_sum_exp(lp);

is matched in the generated quantities block by (although the variable name has changed from lp to ll)

log_lik[i] = log_sum_exp(ll);

As each log_lik[i] includes the likelihood contributions from all the observations for site i, using these log_lik values for LOO (or WAIC) corresponds to a leave-one-site-out predictive performance estimate.
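A small sketch of the bookkeeping, in pure Python with invented lp values: per posterior draw you get one log_lik value per site (R of them, not N_ll), and these are the rows of the draws-by-sites matrix that the loo package expects. The max-shifted log_sum_exp (which is how Stan's built-in works) also matters numerically, since the raw exponentials of large negative lp entries underflow to zero.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(xs))): shift by the max before exponentiating."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Hypothetical lp vectors for R = 2 sites (one entry per candidate N).
lp_site = [[-410.2, -409.7, -411.5],
           [-1002.1, -1001.3, -1003.8]]

# generated quantities: one pointwise value per site -> a length-R log_lik,
# i.e. one row of the (draws x sites) matrix passed to loo.
log_lik = [log_sum_exp(lp) for lp in lp_site]

# model block: target += log_sum_exp(lp) over sites adds up to the same total.
total = sum(log_lik)
```

A naive `log(sum(exp(lp)))` would return -inf here, because exp(-1000) underflows to 0.0 in double precision; the shifted form stays finite.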


Thanks! Leave-one-site-out predictive performance is definitely what I was primarily looking for. I’m glad to hear that I had it working correctly despite my recent and very brief introduction to Stan. I truly appreciate the amazing documentation, examples, and forums that you have all created.