This is my first real attempt at using Stan (coming from a BUGS background). I am an ecologist trying to run a binomial mixture model. I’m running things through rstan
. I think I have the model running well but I’m trying to capture the pointwise likelihood for use in loo
. [EDIT] below is my code:
// Binomial mixture model with covariates
data {
int<lower=0> R; // Number of transects
int<lower=0> T; // Number of temporal replications
int<lower=0> y[R, T]; // Counts
vector[R] elev; // Covariate
vector[R] elev2; // Covariate
vector[R] litter; // Covariate
matrix[R, T] RH; // Covariate
matrix[R, T] temp; // Covariate
matrix[R, T] temp2; // Covariate
vector[R] gcover; // Covariate
vector[R] gcover2; // Covariate
int<lower=0> K; // Upper bound of population size
}
transformed data {
int<lower=0> max_y[R];
int<lower=0> N_ll;
int foo[R];
for (i in 1:R)
max_y[i] = max(y[i]);
for (i in 1:R) {
foo[i] = K - max_y[i] + 1;
}
N_ll = sum(foo);
}
parameters {
real alpha0;
real alpha1;
real alpha2;
real alpha3;
real beta0;
real beta1;
real beta2;
real beta3;
real beta4;
real beta5;
}
transformed parameters {
vector[R] log_lambda; // Log population size
matrix[R, T] logit_p; // Logit detection probability
log_lambda = alpha0 + alpha1 * elev + alpha2 * elev2 + alpha3 * litter;
logit_p = beta0 + beta1 * RH + beta2 * temp + beta3 * temp2 + rep_matrix(beta4 * gcover, T);
}
model {
// Priors
// Improper flat priors are implicitly used on
// alpha0, alpha1, beta0 and beta1.
// Likelihood
for (i in 1:R) {
vector[K - max_y[i] + 1] lp;
for (j in 1:(K - max_y[i] + 1))
lp[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
+ binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
target += log_sum_exp(lp);
}
}
generated quantities {
int<lower=0> N[R]; // Abundance
int totalN;
vector[R] log_lik;
real mean_abundance;
matrix[R, T] p;
// Likelihood for use in loo-cv
for (i in 1:R) {
vector[K - max_y[i] + 1] ll;
for (j in 1:(K - max_y[i] + 1)) {
ll[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
+ binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
}
log_lik[i] = log_sum_exp(ll);
}
for (i in 1:R) {
N[i] = poisson_log_rng(log_lambda[i]);
p[i, 1:T] = inv_logit(logit_p[i, 1:T]);
}
totalN = sum(N); // Total pop. size across all sites
mean_abundance = exp(alpha0);
}
But I’m not sure how to capture the log likelihood. I’m currently trying the following in the generated quantities:
generated quantities {
for (i in 1:R) {
vector[K - max_y[i] + 1] ll;
for (j in 1:(K - max_y[i] + 1)) {
ll[j] = poisson_log_lpmf(max_y[i] + j - 1 | log_lambda[i])
+ binomial_logit_lpmf(y[i] | max_y[i] + j - 1, logit_p[i]);
}
log_lik[i] = log_sum_exp(ll);
}
}
I’m not sure if this is capturing what I want or need for loo
in general or even the likelihood properly. I am not even sure if I need R
log likelihood values or if I need the sum of the lengths of the ll
across all R
sites (N_ll
I think)?
My other question related to this model is if the binomial part of the lp
is specified correctly? I am assuming it’s getting the probability for the series (vector of length T) for each of the R sites. As opposed to having to specify another for loop through the T visits to each site (making y[i,t]
and logit_p[i,t]
.
Thank you for any help while I try to learn the models better along with Stan and the associate samplers. I can also submit my whole script and data in needed but they are both a mess at the moment.