Thanks for posting. I took a look through the first one, and overall, I’d suggest adding some more text to connect the dots on the models for readers. I imagine even readers who use these models are not going to be well acquainted with why they are set up the way they are.
I’d suggest reading and citing this paper from @monnahc—it’s aimed at ecologists and it has some really nice evaluations.
I also wrote the occupancy case study for Stan, which is ancient and may not even run any more: Multiple Species-Site Occupancy Model
Rather than mixing the frequentist likelihood notation \mathcal{L} and conditioning |, I would suggest just following Gelman et al. in Bayesian Data Analysis and writing densities as p(y \mid \psi, p).
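For example (my guess at the single-season setup you're using, so double-check against your definitions), the marginal likelihood for site $i$ in that notation would be

$$
p(y_i \mid \psi_i, p_i) = \psi_i \prod_{j=1}^{J_i} \operatorname{Bernoulli}(y_{ij} \mid p_{ij}) + (1 - \psi_i)\, \mathbb{I}\!\left[\textstyle\sum_j y_{ij} = 0\right],
$$

which is the two-component mixture that the Stan code below assembles with `log_sum_exp` on the log scale.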
> I’m using mostly flat or weakly informative priors.
People new to Bayes fret endlessly about priors. I’d stick to weakly informative versions and explain that they’re better both for computation and for statistics. Gelman has a great paper on this, where he analyzes priors on variance: https://sites.stat.columbia.edu/gelman/research/published/taumain.pdf
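As a concrete illustration (just a sketch, not tied to your model), the difference is a single line in the model block:

```stan
parameters {
  real<lower=0> sigma;   // a scale parameter, e.g., a hierarchical standard deviation
}
model {
  sigma ~ normal(0, 1);  // weakly informative half-normal (via the lower bound);
                         // deleting this line leaves an improper flat prior on sigma
}
```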
Before asking people to install and run a package like `pacman`, I’d suggest explaining what it is. The fewer dependencies there are in a case study, the longer lived it will tend to be.
Rather than this:
```stan
vector[I] log_psi = log(psi), log1m_psi = log1m(psi);
for (i in 1:I) {
  if (Q[i]) {
    target += log_psi[i] + bernoulli_lpmf(y[i, 1:J[i]] | p[i, 1:J[i]]);
  } else {
    target += log_sum_exp(log1m_psi[i],
                          log_psi[i] + bernoulli_lpmf(y[i, 1:J[i]] | p[i, 1:J[i]]));
  }
}
```
you can write this:
```stan
for (n in 1:I) {
  int len = J[n];
  array[len] int ys = y[n, 1:len];
  row_vector[len] ps = p[n, 1:len];
  if (Q[n]) {
    target += log(psi[n]) + bernoulli_lpmf(ys | ps);
  } else {
    target += log_mix(psi[n], bernoulli_lpmf(ys | ps), 0);
  }
}
```
To make this even more efficient, you could sort out the indexes for which `Q[n] != 0` in the transformed data block and cut out the conditional. It’s right on the edge of being worthwhile, but I’d be tempted to write this function:
```stan
array[] int get_vals(array[,] int y, array[] int lengths, int n) {
  int len = lengths[n];
  return y[n, 1:len];
}
```
and similarly for real values, then replace all of the indexing in the code with calls to `get_vals`.
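The real-valued version would be something along these lines (assuming `p` is a matrix as in your code; the name `get_vals_r` is just a placeholder):

```stan
row_vector get_vals_r(matrix p, array[] int lengths, int n) {
  // pull out the first lengths[n] entries of row n
  return p[n, 1:lengths[n]];
}
```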
It might be clearer to write `beta_lupdf(psi_a | 1, 1)` as `uniform_lupdf(psi_a | 0, 1)` if your audience isn’t familiar with beta distributions. This is implicitly a standard logistic prior on the log odds, and you can frame it that way, too, especially if you want to do hierarchical modeling, which is much easier on the unconstrained scale.
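If it helps readers, the equivalent framings could be shown side by side (a sketch; `logit_psi_a` is a name I’m making up for the unconstrained parameter):

```stan
// three equivalent ways to state this prior (pick one):
psi_a ~ beta(1, 1);       // flat on the probability scale
psi_a ~ uniform(0, 1);    // identical density
// or declare the parameter on the unconstrained scale,
//   real logit_psi_a;  ...  real psi_a = inv_logit(logit_psi_a);
// and give the log odds a standard logistic prior:
logit_psi_a ~ logistic(0, 1);
```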
Not your fault—it’s R, or more specifically the tidyverse, but I really dislike the term `tibble` and that it wastes a line reminding you of it in every output.
Usually one would use `loo` to compare models, though estimating log predictive density also works. Are the draws really iid, though? I think they’re all different sizes.
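If you do go the `loo` route, the only Stan-side addition is a pointwise log-likelihood. A rough sketch mirroring the model block above (same assumed data structures; `psi` and `p` would need to be visible here, e.g., defined in transformed parameters):

```stan
generated quantities {
  vector[I] log_lik;  // one entry per site, for loo/PSIS
  for (n in 1:I) {
    int len = J[n];
    if (Q[n]) {
      log_lik[n] = log(psi[n]) + bernoulli_lpmf(y[n, 1:len] | p[n, 1:len]);
    } else {
      log_lik[n] = log_mix(psi[n], bernoulli_lpmf(y[n, 1:len] | p[n, 1:len]), 0);
    }
  }
}
```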
I don’t see the connection between the multinomial and the occupancy model. The occupancy model was a mixture and I don’t see that in the multinomial model. If this is still something like a mixture, shouldn’t this be regular addition rather than log-sum-exp?
```stan
lp[1] = log_sum_exp(log1m(psi_a), lp[1]);
```
I think it might make more sense to talk about mixtures first, since you use one in the first example.
`log_mix(lambda, lps)` can be used for multi-component mixtures, where `lambda` is a simplex and `lps` is a vector (or maybe array) of log densities.
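Something like this for a K-component mixture (a generic sketch, not your model; `theta`, `mu`, `sigma`, and `x_n` are made-up names):

```stan
// theta: simplex[K] of mixture weights; lps[k]: log density of x_n under component k
vector[K] lps;
for (k in 1:K) {
  lps[k] = normal_lpdf(x_n | mu[k], sigma[k]);
}
target += log_mix(theta, lps);
```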
I like the illustration of setting `K`.
This is doing a lot of extra work. The way to make Stan code more efficient is to remove redundant calculations. For instance,
```stan
vector[I] psi = rep_vector(psi_a, I);
vector[I] lambda = rep_vector(lambda_a, I);
matrix[I, J_max] p = rep_matrix(p_a, I, J_max);
// log-transform occupancy probabilities
vector[I] log_psi = log(psi), log1m_psi = log1m(psi);
```
can be
```stan
vector[I] log_psi = rep_vector(log(psi_a), I);
vector[I] log1m_psi = rep_vector(log1m(psi_a), I);
```
Even better, avoid putting these into matrix form and rely on broadcasting.
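Concretely, something like this (a sketch; it assumes `p_a` is the shared detection probability, as in the declarations above):

```stan
real log_psi_a = log(psi_a);
real log1m_psi_a = log1m(psi_a);
for (n in 1:I) {
  int len = J[n];
  if (Q[n]) {
    target += log_psi_a + bernoulli_lpmf(y[n, 1:len] | p_a);  // scalar p_a broadcasts over y
  } else {
    target += log_mix(psi_a, bernoulli_lpmf(y[n, 1:len] | p_a), 0);
  }
}
```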
I would start the last section by mentioning that it’s going to take the form of a survival model with right censoring.
I found the \min notation in the definition of y confusing, and I don’t know what `pmin` does in R. I think what you’re saying is that z_{m, n} \sim \text{exponential}(...) and y_{m, n} = \min(z_{m, n}, \Delta_{m, n}). I didn’t understand the reasoning behind setting \Delta = 1. Nor am I clear on why there is even a minimum value here. It looks like the units of y are number of detections, but \Delta is time. I could use some help with the units. In Stan, the \min operation is going to introduce a discontinuity in the model, which can be difficult for optimization and HMC.
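For what it’s worth, the usual way to code right censoring in Stan avoids the \min entirely: keep the censoring threshold in the data and use the complementary CDF for the censored cases. A sketch (the names `t_obs`, `t_cens`, and `lambda` are mine, not from the case study):

```stan
// observed event times contribute the full exponential density
target += exponential_lpdf(t_obs | lambda);
// right-censored times only tell us z > Delta, so use the log survival function (CCDF)
target += exponential_lccdf(t_cens | lambda);
```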