New Stan-based R package: eDNAjoint for modeling environmental DNA data

abigailkeller1 · April 10, 2025, 3:35pm

Hi Stan community!

I am reaching out to share a new Stan-based R package, eDNAjoint, I recently developed. The software allows users to jointly model environmental DNA (eDNA) and “traditional” survey data in a Bayesian framework to understand the relative sensitivities of the two survey methods.

A manuscript about the package was recently published in Methods in Ecology and Evolution. If you need help getting started, check out the package user guide and a MEElive! webinar with an example workflow.

Also, for any Stan users interested in contributing code, I have highlighted ways to get involved in this rOpenSci blog post.

Thanks,
Abby Keller

Bob_Carpenter · April 10, 2025, 5:01pm

Thanks for posting, @abigailkeller1.

The paper is open access and I found it very easy to follow (a bio journal that allows real methodology in the main paper!). The model looks very well suited to Stan.

P.S. I took a look at the underlying Stan code on GitHub and have some comments, though overall the code looks fine as is.

(negbin == 1) ? 1 : 0 is equivalent to negbin == 1. Given that negbin only takes two values, 0 or 1, this is also equivalent to just negbin in this case. There are other instances of this in the code.
You can declare-define in a single line, so you can use the following here and in many other places:

  vector<lower = 0, upper = 1>[Nloc_trad] p11_trad
    = calc_p11(Nloc_trad, mu_trad, mat_site, trad_ind, alpha);

We don’t usually recommend artificial boundary-avoiding constraints. If you have problems using lower=-1 it’s almost always a sign of a bigger problem that lower=-0.99999 won’t directly solve (though it may eliminate warning messages). This shows up here:

 vector<lower = -0.99999>[nparams] q_trans;

You might want to calculate get_lambda_count on the log scale to avoid round trips of log and exponent. Then you can use neg_binomial_2_log directly. The reason for this is that any time you exponentiate it can cause arithmetic instability because there’s only a narrow range that doesn’t shoot off into the tails or overflow/underflow.
I’m not sure why you need to convert coef to an array. Can’t you just write the function to take a vector input?
You can vectorize the sampling statements in the model (but not in generated quantities where you want individual log_lik values), which will accelerate the code a bit. For example, replace the loops with

n_E ~ neg_binomial_2(lambda, phi);
...
K_dna ~ binomial(N_dna, p_dna[L_dna]);

I think you can probably replace the loops in calc_mu with matrix operations on the outside in the calculation of mu, but I didn’t work it out. If so, this is a pretty big speedup.
The body of calc_p11 can just be

  return mu_trad ./ (mu_trad + exp(mat_site[trad_ind, ] * alpha));

P.S. I really like the fishing-themed hex logo.

abigailkeller1 · April 10, 2025, 5:12pm

@Bob_Carpenter Thanks so much for this review! This was so quick. :)

I’m planning to work on some updates over the next month, so I will definitely be implementing your advice. Thank you!

jonah · April 10, 2025, 7:50pm

Very cool! Thanks for letting us know about this.

Topic		Replies	Views
Occupancy modeling in Stan Publicity ecology	7	2287	November 3, 2021
Workshop: Intro to rstan & rstanarm for (ecology) grad students Publicity ecology	1	1109	May 8, 2017
Blog posts on ecological modeling in Stan Modeling	6	245	February 11, 2025
CmdStan & Stan 2.36 release candidate General	2	407	December 3, 2024
Beta release of our 'posterior' R package Announcements posterior-package	2	736	June 15, 2020

New Stan-based R package: eDNAjoint for modeling environmental DNA data

Related topics