I am reaching out to share a new Stan-based R package, eDNAjoint, I recently developed. The software allows users to jointly model environmental DNA (eDNA) and “traditional” survey data in a Bayesian framework to understand the relative sensitivities of the two survey methods.
The paper is open access and I found it very easy to follow (a bio journal that allows real methodology in the main paper!). The model looks very well suited to Stan.
P.S. I took a look at the underlying Stan code on GitHub and have some comments, though overall the code looks fine as is.
(negbin == 1) ? 1 : 0 is equivalent to negbin == 1. Given that negbin only takes two values, 0 or 1, this is also equivalent to just negbin in this case. There are other instances of this in the code.
You can declare-define in a single line, so you can use the following here and in many other places:
We don’t usually recommend artificial boundary-avoiding constraints. If you have problems using lower=-1 it’s almost always a sign of a bigger problem that lower=-0.99999 won’t directly solve (though it may eliminate warning messages). This shows up here:
vector<lower = -0.99999>[nparams] q_trans;
You might want to calculate get_lambda_count on the log scale to avoid round trips of log and exponent. Then you can use neg_binomial_2_log directly. The reason for this is that any time you exponentiate it can cause arithmetic instability because there’s only a narrow range that doesn’t shoot off into the tails or overflow/underflow.
I’m not sure why you need to convert coef to an array. Can’t you just write the function to take a vector input?
You can vectorize the sampling statements in the model (but not in generated quantities where you want individual log_lik values), which will accelerate the code a bit. For example, replace the loops with
I think you can probably replace the loops in calc_mu with matrix operations on the outside in the calculation of mu, but I didn’t work it out. If so, this is a pretty big speedup.
@Bob_Carpenter Thanks again for your tips here! A bit slow to actually implement these, but I have a few questions:
Regarding your point 4: I understand how overflow/underflow can happen when exponentiating, but in my model code now, there is no exponentiation for the parameters of neg_binomial_2. Although, I can see how this would be useful for a negative binomial regression. Are you referring to exponentiation in the sampling process? Or is this just a tip if there is exponentiation in model code?
Regarding your point 6: During sampling, I am getting the following error when vectorizing: n_E ~ neg_binomial_2(lambda, phi);.
Chain 1: Exception: neg_binomial_2_lpmf: Failures variable has size = 1274, but Precision parameter has size 1; and they must be the same size.
In my code, n_E and lambda are arrays of length N, and phi is a scalar.
Do you suggest broadcasting phi somehow? Or just keeping the loop?
Thanks so much again for your help and for looking so closely at the code!