New Stan-based R package: eDNAjoint for modeling environmental DNA data

Hi Stan community!

I am reaching out to share a new Stan-based R package, eDNAjoint, I recently developed. The software allows users to jointly model environmental DNA (eDNA) and “traditional” survey data in a Bayesian framework to understand the relative sensitivities of the two survey methods.

A manuscript about the package was recently published in Methods in Ecology and Evolution. If you need help getting started, check out the package user guide and a MEElive! webinar with an example workflow.

Also, for any Stan users interested in contributing code, I have highlighted ways to get involved in this rOpenSci blog post.

Thanks,
Abby Keller

5 Likes

Thanks for posting, @abigailkeller1.

The paper is open access and I found it very easy to follow (a bio journal that allows real methodology in the main paper!). The model looks very well suited to Stan.

P.S. I took a look at the underlying Stan code on GitHub and have some comments, though overall the code looks fine as is.

  1. (negbin == 1) ? 1 : 0 is equivalent to negbin == 1. Given that negbin only takes two values, 0 or 1, this is also equivalent to just negbin in this case. There are other instances of this in the code.
  2. You can declare-define in a single line, so you can use the following here and in many other places:
  vector<lower = 0, upper = 1>[Nloc_trad] p11_trad
    = calc_p11(Nloc_trad, mu_trad, mat_site, trad_ind, alpha);
  1. We don’t usually recommend artificial boundary-avoiding constraints. If you have problems using lower=-1 it’s almost always a sign of a bigger problem that lower=-0.99999 won’t directly solve (though it may eliminate warning messages). This shows up here:
 vector<lower = -0.99999>[nparams] q_trans;
  1. You might want to calculate get_lambda_count on the log scale to avoid round trips of log and exponent. Then you can use neg_binomial_2_log directly. The reason for this is that any time you exponentiate it can cause arithmetic instability because there’s only a narrow range that doesn’t shoot off into the tails or overflow/underflow.
  2. I’m not sure why you need to convert coef to an array. Can’t you just write the function to take a vector input?
  3. You can vectorize the sampling statements in the model (but not in generated quantities where you want individual log_lik values), which will accelerate the code a bit. For example, replace the loops with
n_E ~ neg_binomial_2(lambda, phi);
...
K_dna ~ binomial(N_dna, p_dna[L_dna]);
  1. I think you can probably replace the loops in calc_mu with matrix operations on the outside in the calculation of mu, but I didn’t work it out. If so, this is a pretty big speedup.
  2. The body of calc_p11 can just be
  return mu_trad ./ (mu_trad + exp(mat_site[trad_ind, ] * alpha));

P.S. I really like the fishing-themed hex logo.

2 Likes

@Bob_Carpenter Thanks so much for this review! This was so quick. :)

I’m planning to work on some updates over the next month, so I will definitely be implementing your advice. Thank you!

Very cool! Thanks for letting us know about this.