I am reaching out to share a new Stan-based R package, eDNAjoint, that I recently developed. The software allows users to jointly model environmental DNA (eDNA) and “traditional” survey data in a Bayesian framework to understand the relative sensitivities of the two survey methods.
The paper is open access and I found it very easy to follow (a bio journal that allows real methodology in the main paper!). The model looks very well suited to Stan.
P.S. I took a look at the underlying Stan code on GitHub and have some comments, though overall the code looks fine as is.
(negbin == 1) ? 1 : 0 is equivalent to negbin == 1. Given that negbin only takes two values, 0 or 1, this is also equivalent to just negbin in this case. There are other instances of this in the code.
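To spell out the three equivalent forms (flag1 through flag3 are hypothetical names, assuming negbin is declared as int data):

int flag1 = (negbin == 1) ? 1 : 0;  // original form
int flag2 = negbin == 1;            // the conditional is redundant
int flag3 = negbin;                 // simplest, since negbin is already 0 or 1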
You can declare-define in a single line, so here and in many other places you can collapse a separate declaration and a later assignment into one statement, along the lines of this sketch (the names are placeholders):
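// rather than:
//   real mu;
//   mu = alpha + beta * x;
real mu = alpha + beta * x;  // declaration and definition in one statement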
We don’t usually recommend artificial boundary-avoiding constraints. If you have problems using lower=-1, it’s almost always a sign of a bigger problem that lower=-0.99999 won’t directly solve (though it may eliminate warning messages). This shows up here:
vector<lower = -0.99999>[nparams] q_trans;
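If -1 really is the lower bound of the parameter’s support, the direct declaration would be:

vector<lower = -1>[nparams] q_trans;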
You might want to calculate get_lambda_count on the log scale to avoid round trips through log and exp; then you can use neg_binomial_2_log directly. The reason is that any time you exponentiate, you risk arithmetic instability: only a narrow range of inputs avoids shooting off into the tails or overflowing/underflowing.
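As a sketch (eta, X, beta, and phi are placeholders, assuming a log-linear model for the count mean):

vector[N] eta = X * beta;          // log(lambda), computed directly on the log scale
y ~ neg_binomial_2_log(eta, phi);  // same as neg_binomial_2(exp(eta), phi), but stabler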
I’m not sure why you need to convert coef to an array. Can’t you just write the function to take a vector input?
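Something along these lines (a hypothetical signature, not the package’s actual function):

// take the coefficients as a vector so no array conversion is needed
real linear_term(vector coef, matrix X, int i) {
  return X[i] * coef;  // row_vector times vector yields a real
}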
You can vectorize the sampling statements in the model block (but not in generated quantities, where you want pointwise log_lik values), which will speed up the code a bit. For example, you can replace the loops with single vectorized statements, as in the sketch below (variable names are placeholders):
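// model block: one vectorized statement replaces the loop
y ~ neg_binomial_2_log(eta, phi);

// generated quantities: keep the loop for pointwise log_lik
for (n in 1:N)
  log_lik[n] = neg_binomial_2_log_lpmf(y[n] | eta[n], phi);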
I think you can probably replace the loops in calc_mu with matrix operations in the calculation of mu, but I didn’t work out the details. If so, this would be a pretty big speedup.
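Purely as a speculative illustration of the kind of rewrite I mean: if mu happened to factor into row and column effects, the nested loops would collapse to an outer product (u, v, I, and J are placeholders):

matrix[I, J] mu = u * v';  // u is vector[I], v is vector[J]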